Sequence assembly and annotation MCQs With Answer

This quiz set focuses on sequence assembly and annotation tailored for M.Pharm students engaged in bioinformatics and computational biotechnology. It covers core concepts such as sequencing read types, assembly strategies (overlap-layout-consensus and de Bruijn graphs), metrics like N50 and coverage, scaffolding, error correction and hybrid assembly approaches. The annotation section addresses gene prediction methods (ab initio vs evidence-based), structural and functional annotation, common tools (e.g., Prokka, AUGUSTUS, InterProScan), file formats (GFF/GTF) and quality assessment metrics (BUSCO). These MCQs are intended to reinforce practical understanding and decision-making when designing assembly and annotation pipelines in pharmaceutical research contexts.

Q1. What does the N50 metric represent in genome assembly?

  • The length of the longest contig in the assembly
  • The average contig length weighted by GC content
  • The contig length such that 50% of the assembly is contained in contigs of that length or longer
  • The percentage of reads that successfully mapped back to contigs

Correct Answer: The contig length such that 50% of the assembly is contained in contigs of that length or longer
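
The N50 definition above can be worked through in a few lines of Python. This is a minimal sketch, not the implementation of any particular assembler or QC tool; the function name `n50` and the contig lengths are made up for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and their lengths are
    accumulated; N50 is the length of the contig at which the running
    total first reaches at least half of the total assembly size.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:  # at least 50% of the assembly is covered
            return length


# Hypothetical assembly: five contigs, 300 bp in total.
lengths = [100, 80, 60, 40, 20]
print(n50(lengths))  # 100 + 80 = 180 >= 150, so this prints 80
```

Because N50 only describes contiguity, an assembly with a high N50 can still contain misjoined contigs.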

Q2. Which assembly strategy breaks reads into overlapping k-mers and connects them in a graph, as in short-read assemblers such as SPAdes and Velvet?

  • Overlap-Layout-Consensus (OLC)
  • De Bruijn graph-based assembly
  • Greedy extension assembly
  • Hierarchical shotgun assembly

Correct Answer: De Bruijn graph-based assembly

Q3. Which sequencing technologies are characterized by very long reads but, in their raw form, higher per-read error rates than short-read platforms?

  • Illumina short-read sequencing
  • Sanger sequencing
  • Pacific Biosciences (PacBio) and Oxford Nanopore long-read sequencing
  • Microarray hybridization

Correct Answer: Pacific Biosciences (PacBio) and Oxford Nanopore long-read sequencing

Q4. What is the main purpose of read error correction before assembly?

  • To remove adapter sequences only
  • To convert RNA reads to DNA sequences
  • To reduce sequencing errors that can fragment the assembly and produce false branches in graphs
  • To increase GC content artificially

Correct Answer: To reduce sequencing errors that can fragment the assembly and produce false branches in graphs

Q5. Paired-end reads are particularly useful in assembly because they:

  • Always have lower error rates than single reads
  • Provide orientation and approximate distance information to link contigs across repeats
  • Are only used for transcriptome assembly
  • Do not require quality trimming

Correct Answer: Provide orientation and approximate distance information to link contigs across repeats

Q6. Which file format is commonly used to store genome feature annotations like gene coordinates?

  • FASTQ
  • BAM
  • GFF/GTF
  • FASTA

Correct Answer: GFF/GTF
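
As an illustration of what a GFF record holds, the sketch below splits one GFF3 line into its nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes). The helper name `parse_gff3_line` and the example record are hypothetical, and the parser ignores comment lines, URL-escaped characters and the GTF attribute syntax, so it is a teaching aid rather than a full parser.

```python
def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dictionary.

    Assumes a well-formed feature line with exactly nine tab-separated
    columns and key=value attributes separated by semicolons.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("a GFF3 feature line has exactly 9 columns")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = dict(
        pair.split("=", 1) for pair in attrs.split(";") if pair
    )
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),  # GFF coordinates are 1-based
        "end": int(end),      # and the end position is inclusive
        "strand": strand,
        "attributes": attributes,
    }


# Hypothetical record for illustration only.
record = "contig1\tProkka\tCDS\t100\t400\t.\t+\t0\tID=gene0001;product=hypothetical protein"
feature = parse_gff3_line(record)
feature_length = feature["end"] - feature["start"] + 1
print(feature["type"], feature_length, feature["attributes"]["ID"])
```

The `+ 1` in the length calculation follows from the 1-based, inclusive coordinates; formats such as BED use 0-based, half-open intervals instead.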

Q7. BUSCO is a tool used to:

  • Assemble reads into contigs using de Bruijn graphs
  • Assess assembly and annotation completeness using evolutionarily conserved single-copy orthologs
  • Predict promoter regions in bacterial genomes
  • Mask repetitive sequences prior to annotation

Correct Answer: Assess assembly and annotation completeness using evolutionarily conserved single-copy orthologs

Q8. Which approach describes ab initio gene prediction?

  • Using only experimental evidence (RNA-seq) to define exons
  • Predicting genes based on intrinsic signals and statistical models without external evidence
  • Transferring annotations from a closely related species using alignment
  • Annotating only non-coding RNAs

Correct Answer: Predicting genes based on intrinsic signals and statistical models without external evidence

Q9. RepeatMasker is primarily used to:

  • Predict coding sequences (CDS)
  • Mask or identify repetitive DNA elements before annotation
  • Correct sequencing errors in reads
  • Align protein sequences to the genome

Correct Answer: Mask or identify repetitive DNA elements before annotation

Q10. Which of the following tools is widely used for prokaryotic genome annotation?

  • Prokka
  • TopHat
  • Cufflinks
  • Bowtie

Correct Answer: Prokka

Q11. In de Bruijn graph assembly, what is a k-mer?

  • A read aligned to a reference
  • An exact DNA subsequence of length k used as nodes or edges in the graph
  • A scoring metric for coverage depth
  • The longest contig assembled

Correct Answer: An exact DNA subsequence of length k used as nodes or edges in the graph
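
To make the k-mer idea concrete, the following sketch extracts k-mers from reads and builds a toy de Bruijn graph in which each k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function names and the example read are assumptions for illustration; real assemblers also handle reverse complements, sequencing errors and graph simplification, which are omitted here.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return every overlapping substring of length k in seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn_edges(reads, k):
    """Build a toy de Bruijn graph as an adjacency list.

    Nodes are (k-1)-mers; each k-mer contributes one edge from its
    prefix node to its suffix node.
    """
    graph = defaultdict(list)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Hypothetical 5-base read, k = 3.
print(kmers("ATGGC", 3))              # ['ATG', 'TGG', 'GGC']
print(de_bruijn_edges(["ATGGC"], 3))  # {'AT': ['TG'], 'TG': ['GG'], 'GG': ['GC']}
```

A single base error in a read changes up to k of its k-mers, which is how uncorrected errors create the false branches mentioned in Q4.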

Q12. What is scaffolding in genome assembly?

  • The process of trimming low-quality bases from reads
  • Ordering and orienting contigs into larger structures using linking information such as mate-pair or long reads
  • Annotating genes with functional descriptions
  • Converting RNA alignments into gene models

Correct Answer: Ordering and orienting contigs into larger structures using linking information such as mate-pair or long reads

Q13. Which of the following best describes hybrid assembly?

  • Using only short reads for assembly
  • Combining different sequencing data types (e.g., short and long reads) to improve assembly quality
  • Annotating genomes using both ab initio and evidence-based methods
  • Assembling transcriptomes from single-cell RNA-seq

Correct Answer: Combining different sequencing data types (e.g., short and long reads) to improve assembly quality

Q14. Functional annotation commonly uses which of the following to assign putative functions to predicted proteins?

  • BLAST searches against curated protein databases and InterProScan domain searches
  • Only GC content analysis
  • RepeatMasker output
  • k-mer frequency distributions

Correct Answer: BLAST searches against curated protein databases and InterProScan domain searches

Q15. What role does Pilon play in genome workflows?

  • It performs ab initio gene prediction
  • It polishes assemblies by correcting bases, fixing small indels and improving consensus using read alignments
  • It annotates tRNAs and rRNAs
  • It generates de Bruijn graphs from raw reads

Correct Answer: It polishes assemblies by correcting bases, fixing small indels and improving consensus using read alignments

Q16. Which tool is commonly used to predict transfer RNA genes in a genome?

  • tRNAscan-SE
  • GATK
  • SPAdes
  • MAFFT

Correct Answer: tRNAscan-SE

Q17. In annotation, what is a pseudogene?

  • A gene that encodes a functional protein in all species
  • A non-functional genomic sequence resembling a gene, often disrupted by stop codons or frameshifts
  • An RNA gene with known structure and function
  • A short repeat element in the genome

Correct Answer: A non-functional genomic sequence resembling a gene, often disrupted by stop codons or frameshifts
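
The two disruptions named above can be screened for with a very simple check on a candidate coding sequence. This is a rough sketch assuming the standard genetic code and a sequence that is already in frame from its start codon; the function names and sequences are hypothetical, and real pseudogene annotation relies on alignment to functional homologs rather than on this test alone.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def premature_stops(cds):
    """Return codon indices of in-frame stop codons before the final codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def length_suggests_frameshift(cds):
    """A CDS whose length is not a multiple of 3 cannot be read cleanly in frame."""
    return len(cds) % 3 != 0


# Hypothetical sequences for illustration only.
intact = "ATGAAAGGCTAA"        # ATG AAA GGC TAA
disrupted = "ATGAAATAGGGCTAA"  # ATG AAA TAG GGC TAA

print(premature_stops(intact))     # []
print(premature_stops(disrupted))  # [2]
print(length_suggests_frameshift("ATGAAAGGCTA"))  # True
```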

Q18. Which statement about reference-guided (mapping-based) assembly is correct?

  • It builds a genome de novo without any external reference
  • It aligns reads to a known reference and reconstructs sequence, useful when a close reference exists
  • It always outperforms de novo assembly in novel species discovery
  • It cannot detect structural variants

Correct Answer: It aligns reads to a known reference and reconstructs sequence, useful when a close reference exists

Q19. Which of the following is a common set of measures for evaluating assembly quality beyond N50, which reflects contiguity only?

  • GC skew only
  • Read mapping rate, number of misassemblies, and BUSCO completeness
  • Only the number of raw reads
  • Protein isoelectric point distribution

Correct Answer: Read mapping rate, number of misassemblies, and BUSCO completeness

Q20. During functional annotation, InterProScan is mainly used to:

  • Assemble raw sequencing reads
  • Detect protein domains and signatures to help infer protein functions
  • Align genomic reads to a reference genome
  • Mask low-complexity regions in DNA

Correct Answer: Detect protein domains and signatures to help infer protein functions
