Sequence assembly and annotation MCQs With Answers
This quiz set on sequence assembly and annotation is tailored for M.Pharm students working in bioinformatics and computational biotechnology. It covers core concepts such as sequencing read types, assembly strategies (overlap-layout-consensus and de Bruijn graphs), metrics like N50 and coverage, scaffolding, error correction and hybrid assembly approaches. The annotation questions address gene prediction methods (ab initio vs evidence-based), structural and functional annotation, common tools (e.g., Prokka, AUGUSTUS, InterProScan), file formats (GFF/GTF) and completeness assessment with BUSCO. These MCQs are intended to reinforce practical understanding and decision-making when designing assembly and annotation pipelines in pharmaceutical research contexts.
Q1. What does the N50 metric represent in genome assembly?
- The length of the longest contig in the assembly
- The average contig length weighted by GC content
- The contig length such that 50% of the assembly is contained in contigs of that length or longer
- The percentage of reads that successfully mapped back to contigs
Correct Answer: The contig length such that 50% of the assembly is contained in contigs of that length or longer
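To make the definition concrete, here is a minimal Python sketch of one common way to compute N50: sort contigs from longest to shortest and accumulate lengths until at least half of the total assembly size is reached. The function name `n50` and the example lengths are illustrative assumptions, not output from any real assembler.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contig first
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to >= 50% defines N50.
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical assembly of six contigs (total length 300).
example = [100, 80, 50, 30, 20, 20]
print(n50(example))  # 80, because 100 + 80 = 180 >= 150
```

Note that N50 describes contiguity only; a misassembled genome can still have a high N50.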
Q2. Which assembly approach breaks reads into overlapping k-mers and reconstructs the sequence from paths through the resulting graph?
- Overlap-Layout-Consensus (OLC)
- De Bruijn graph-based assembly
- Greedy extension assembly
- Hierarchical shotgun assembly
Correct Answer: De Bruijn graph-based assembly
Q3. Which sequencing technologies are characterized by very long reads but, historically, higher per-read error rates?
- Illumina short-read sequencing
- Sanger sequencing
- Pacific Biosciences (PacBio) and Oxford Nanopore long-read sequencing
- Microarray hybridization
Correct Answer: Pacific Biosciences (PacBio) and Oxford Nanopore long-read sequencing
Q4. What is the main purpose of read error correction before assembly?
- To remove adapter sequences only
- To convert RNA reads to DNA sequences
- To reduce sequencing errors that can fragment the assembly and produce false branches in graphs
- To increase GC content artificially
Correct Answer: To reduce sequencing errors that can fragment the assembly and produce false branches in graphs
Q5. Paired-end reads are particularly useful in assembly because they:
- Always have lower error rates than single reads
- Provide orientation and approximate distance information to link contigs across repeats
- Are only used for transcriptome assembly
- Do not require quality trimming
Correct Answer: Provide orientation and approximate distance information to link contigs across repeats
Q6. Which file format is commonly used to store genome feature annotations like gene coordinates?
- FASTQ
- BAM
- GFF/GTF
- FASTA
Correct Answer: GFF/GTF
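As an illustration of what a GFF record holds, the sketch below parses a single GFF3 feature line, which has nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes). The `Feature` class, the `parse_gff3_line` helper and the example record are hypothetical; a production pipeline would more likely rely on an established GFF parsing library.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    seqid: str
    source: str
    ftype: str
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    strand: str
    attributes: dict


def parse_gff3_line(line):
    """Parse one GFF3 feature line; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, _score, strand, _phase, attrs = cols
    # Column 9 holds semicolon-separated key=value pairs.
    attributes = dict(
        pair.split("=", 1) for pair in attrs.split(";") if "=" in pair
    )
    return Feature(seqid, source, ftype, int(start), int(end), strand, attributes)


# Hypothetical record, written in the style of a prokaryotic annotation.
example = "contig_1\tProkka\tCDS\t201\t1100\t.\t+\t0\tID=gene0001;product=hypothetical protein"
feat = parse_gff3_line(example)
print(feat.ftype, feat.start, feat.end, feat.attributes["ID"])
print(feat.end - feat.start + 1)  # feature length in bp (coordinates are inclusive)
```

Because GFF coordinates are 1-based and inclusive, the feature length is `end - start + 1`, a frequent source of off-by-one errors when converting to 0-based formats such as BED.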
Q7. BUSCO is a tool used to:
- Assemble reads into contigs using de Bruijn graphs
- Assess assembly and annotation completeness using evolutionarily conserved single-copy orthologs
- Predict promoter regions in bacterial genomes
- Mask repetitive sequences prior to annotation
Correct Answer: Assess assembly and annotation completeness using evolutionarily conserved single-copy orthologs
Q8. Which approach describes ab initio gene prediction?
- Using only experimental evidence (RNA-seq) to define exons
- Predicting genes based on intrinsic signals and statistical models without external evidence
- Transferring annotations from a closely related species using alignment
- Annotating only non-coding RNAs
Correct Answer: Predicting genes based on intrinsic signals and statistical models without external evidence
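The simplest intrinsic signal is an open reading frame (ORF): a start codon followed in frame by a stop codon. The toy sketch below scans the three forward-strand frames for such ORFs. It is only an illustration of the idea of using sequence-intrinsic signals; real ab initio predictors such as AUGUSTUS use trained statistical models, and the function name `find_orfs`, the `min_codons` threshold and the example sequence are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) 0-based half-open coordinates of forward-strand ORFs.

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon, with the stop codon included in the reported span.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # keep ORFs above the length cutoff
                start = None                    # close the ORF and keep scanning
    return sorted(orfs)


# Hypothetical sequence containing one ORF: ATG AAA TTT TAG
example = "CCATGAAATTTTAGGG"
print(find_orfs(example))  # [(2, 14)]
```

A fuller sketch would also scan the reverse complement and would score candidates (for example by codon usage) rather than accept every ORF above a length cutoff.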
Q9. RepeatMasker is primarily used to:
- Predict coding sequences (CDS)
- Mask or identify repetitive DNA elements before annotation
- Correct sequencing errors in reads
- Align protein sequences to the genome
Correct Answer: Mask or identify repetitive DNA elements before annotation
Q10. Which of the following tools is widely used for prokaryotic genome annotation?
- Prokka
- TopHat
- Cufflinks
- Bowtie
Correct Answer: Prokka
Q11. In de Bruijn graph assembly, what is a k-mer?
- A read aligned to a reference
- An exact DNA subsequence of length k used as nodes or edges in the graph
- A scoring metric for coverage depth
- The longest contig assembled
Correct Answer: An exact DNA subsequence of length k used as nodes or edges in the graph
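The short Python sketch below shows how reads are decomposed into k-mers and how those k-mers can define a de Bruijn graph. It assumes one common convention, in which nodes are (k-1)-mers and each distinct k-mer is an edge from its prefix to its suffix; the function names and the two example reads are hypothetical, and real assemblers add error handling, k-mer counting and graph simplification on top of this.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return every overlapping substring of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn_edges(reads, k):
    """Build an adjacency map: (k-1)-mer prefix -> set of (k-1)-mer suffixes."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])  # one edge per distinct k-mer
    return graph


print(kmers("ATGGCG", 3))  # ['ATG', 'TGG', 'GGC', 'GCG']

# Two hypothetical overlapping reads that share the k-mer GGC.
graph = de_bruijn_edges(["ATGGC", "GGCGT"], 3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Following the single path AT, TG, GG, GC, CG, GT through this graph spells out ATGGCGT, the sequence that both reads were drawn from.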
Q12. What is scaffolding in genome assembly?
- The process of trimming low-quality bases from reads
- Ordering and orienting contigs into larger structures using linking information such as mate-pair or long reads
- Annotating genes with functional descriptions
- Converting RNA alignments into gene models
Correct Answer: Ordering and orienting contigs into larger structures using linking information such as mate-pair or long reads
Q13. Which of the following best describes hybrid assembly?
- Using only short reads for assembly
- Combining different sequencing data types (e.g., short and long reads) to improve assembly quality
- Annotating genomes using both ab initio and evidence-based methods
- Assembling transcriptomes from single-cell RNA-seq
Correct Answer: Combining different sequencing data types (e.g., short and long reads) to improve assembly quality
Q14. Functional annotation commonly uses which of the following to assign putative functions to predicted proteins?
- BLAST searches against curated protein databases and InterProScan domain searches
- Only GC content analysis
- RepeatMasker output
- k-mer frequency distributions
Correct Answer: BLAST searches against curated protein databases and InterProScan domain searches
Q15. What role does Pilon play in genome workflows?
- It performs ab initio gene prediction
- It polishes assemblies by correcting bases, fixing small indels and improving consensus using read alignments
- It annotates tRNAs and rRNAs
- It generates de Bruijn graphs from raw reads
Correct Answer: It polishes assemblies by correcting bases, fixing small indels and improving consensus using read alignments
Q16. Which tool is commonly used to predict transfer RNA genes in a genome?
- tRNAscan-SE
- GATK
- SPAdes
- MAFFT
Correct Answer: tRNAscan-SE
Q17. In annotation, what is a pseudogene?
- A gene that encodes a functional protein in all species
- A non-functional genomic sequence resembling a gene, often disrupted by stop codons or frameshifts
- An RNA gene with known structure and function
- A short repeat element in the genome
Correct Answer: A non-functional genomic sequence resembling a gene, often disrupted by stop codons or frameshifts
Q18. Which statement about reference-guided (mapping-based) assembly is correct?
- It builds a genome de novo without any external reference
- It aligns reads to a known reference and reconstructs sequence, useful when a close reference exists
- It always outperforms de novo assembly in novel species discovery
- It cannot detect structural variants
Correct Answer: It aligns reads to a known reference and reconstructs sequence, useful when a close reference exists
Q19. N50 measures contiguity rather than correctness. Which of the following are commonly used alongside it to evaluate assembly quality?
- GC skew only
- Read mapping rate, number of misassemblies, and BUSCO completeness
- Only the number of raw reads
- Protein isoelectric point distribution
Correct Answer: Read mapping rate, number of misassemblies, and BUSCO completeness
Q20. During functional annotation, InterProScan is mainly used to:
- Assemble raw sequencing reads
- Detect protein domains and signatures to help infer protein functions
- Align genomic reads to a reference genome
- Mask low-complexity regions in DNA
Correct Answer: Detect protein domains and signatures to help infer protein functions

I am a Registered Pharmacist under the Pharmacy Act, 1948, and the founder of PharmacyFreak.com. I hold a Bachelor of Pharmacy degree from Rungta College of Pharmaceutical Science and Research. With a strong academic foundation and practical knowledge, I am committed to providing accurate, easy-to-understand content to support pharmacy students and professionals. My aim is to make complex pharmaceutical concepts accessible and useful for real-world application.
Mail- Sachin@pharmacyfreak.com

