Sequence assembly and annotation MCQs With Answers

This quiz set focuses on sequence assembly and annotation tailored for M.Pharm students engaged in bioinformatics and computational biotechnology. It covers core concepts such as sequencing read types, assembly strategies (overlap-layout-consensus and de Bruijn graphs), metrics like N50 and coverage, scaffolding, error correction and hybrid assembly approaches. The annotation section addresses gene prediction methods (ab initio vs evidence-based), structural and functional annotation, common tools (e.g., Prokka, AUGUSTUS, InterProScan), file formats (GFF/GTF) and quality assessment metrics (BUSCO). These MCQs are intended to reinforce practical understanding and decision-making when designing assembly and annotation pipelines in pharmaceutical research contexts.

Q1. What does the N50 metric represent in genome assembly?

The length of the longest contig in the assembly
The average contig length weighted by GC content
The contig length such that 50% of the assembly is contained in contigs of that length or longer
The percentage of reads that successfully mapped back to contigs

Correct Answer: The contig length such that 50% of the assembly is contained in contigs of that length or longer
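
As an illustration of this definition, here is a minimal Python sketch that computes N50 from a list of contig lengths. The function name n50 and the example lengths are hypothetical and chosen only for demonstration. Real assembly statistics tools typically report additional values alongside N50.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    # Sort contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that brings the running sum to at least half
        # of the total assembly length defines the N50.
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths (total = 300, so half = 150).
example_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_lengths))  # 80 + 70 = 150 reaches half, so this prints 70
```

In this example, the contigs of length 70 or longer (80 and 70) together hold 150 of the 300 bases, which is exactly 50% of the assembly.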

Q2. Which assembly strategy decomposes reads into k-mers and connects them through overlaps of k-1 bases?

Overlap-Layout-Consensus (OLC)
De Bruijn graph-based assembly
Greedy extension assembly
Hierarchical shotgun assembly

Correct Answer: De Bruijn graph-based assembly

Q3. Which sequencing technologies are characterized by very long reads but, in their raw form, higher per-read error rates than short-read platforms?

Illumina short-read sequencing
Sanger sequencing
Pacific Biosciences (PacBio) and Oxford Nanopore long-read sequencing
Microarray hybridization

Correct Answer: Pacific Biosciences (PacBio) and Oxford Nanopore long-read sequencing

Q4. What is the main purpose of read error correction before assembly?

To remove adapter sequences only
To convert RNA reads to DNA sequences
To reduce sequencing errors that can fragment the assembly and produce false branches in graphs
To increase GC content artificially

Correct Answer: To reduce sequencing errors that can fragment the assembly and produce false branches in graphs

Q5. Paired-end reads are particularly useful in assembly because they:

Always have lower error rates than single reads
Provide orientation and approximate distance information to link contigs across repeats
Are only used for transcriptome assembly
Do not require quality trimming

Correct Answer: Provide orientation and approximate distance information to link contigs across repeats

Q6. Which file format is commonly used to store genome feature annotations like gene coordinates?

FASTQ
BAM
GFF/GTF
FASTA

Correct Answer: GFF/GTF
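
To show what such a file stores, the sketch below parses one GFF3-style feature line into its nine tab-separated columns. The function name parse_gff3_line and the example record (contig name, source, and attributes) are hypothetical. The attribute parsing assumes the GFF3 key=value style; GTF writes its attributes differently (key "value";), so this sketch would need adapting for GTF files.

```python
def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict; return None for comments or blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = fields
    # Column 9 holds semicolon-separated key=value pairs in GFF3.
    attributes = {}
    for pair in attrs.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),  # coordinates are 1-based and inclusive
        "end": int(end),
        "score": score,
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }


# Hypothetical feature line, for illustration only.
example = "contig_1\texample_source\tCDS\t101\t400\t.\t+\t0\tID=gene_0001;product=hypothetical protein"
feature = parse_gff3_line(example)
feature_length = feature["end"] - feature["start"] + 1
print(feature["type"], feature["strand"], feature_length, feature["attributes"]["ID"])
```

Because the coordinates are 1-based and inclusive, the feature length is end - start + 1 rather than end - start.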

Q7. BUSCO is a tool used to:

Assemble reads into contigs using de Bruijn graphs
Assess assembly and annotation completeness using evolutionarily conserved single-copy orthologs
Predict promoter regions in bacterial genomes
Mask repetitive sequences prior to annotation

Correct Answer: Assess assembly and annotation completeness using evolutionarily conserved single-copy orthologs

Q8. Which approach describes ab initio gene prediction?

Using only experimental evidence (RNA-seq) to define exons
Predicting genes based on intrinsic signals and statistical models without external evidence
Transferring annotations from a closely related species using alignment
Annotating only non-coding RNAs

Correct Answer: Predicting genes based on intrinsic signals and statistical models without external evidence

Q9. RepeatMasker is primarily used to:

Predict coding sequences (CDS)
Mask or identify repetitive DNA elements before annotation
Correct sequencing errors in reads
Align protein sequences to the genome

Correct Answer: Mask or identify repetitive DNA elements before annotation

Q10. Which of the following tools is widely used for prokaryotic genome annotation?

Prokka
TopHat
Cufflinks
Bowtie

Correct Answer: Prokka

Q11. In de Bruijn graph assembly, what is a k-mer?

A read aligned to a reference
An exact DNA subsequence of length k used as nodes or edges in the graph
A scoring metric for coverage depth
The longest contig assembled

Correct Answer: An exact DNA subsequence of length k used as nodes or edges in the graph
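
The following minimal Python sketch illustrates the idea under one common convention: nodes are (k-1)-mers and each distinct k-mer contributes an edge from its prefix to its suffix. The function names and the three short reads are hypothetical. Production assemblers also handle reverse complements, k-mer counts, and error pruning, all of which are omitted here.

```python
from collections import defaultdict


def kmers(sequence, k):
    """Return every overlapping substring of length k, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def de_bruijn_edges(reads, k):
    """Build a simple de Bruijn graph: (k-1)-mer prefix -> set of (k-1)-mer suffixes."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            # Each k-mer is an edge from its first k-1 bases to its last k-1 bases.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Hypothetical overlapping reads, for illustration only.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = de_bruijn_edges(reads, k=4)
for prefix in sorted(graph):
    print(prefix, "->", sorted(graph[prefix]))
```

With these error-free reads every node has a single outgoing edge, so the graph is one unbranched path. A sequencing error in any read would introduce extra k-mers and therefore a false branch, which is why error correction before assembly matters.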

Q12. What is scaffolding in genome assembly?

The process of trimming low-quality bases from reads
Ordering and orienting contigs into larger structures using linking information such as mate-pair or long reads
Annotating genes with functional descriptions
Converting RNA alignments into gene models

Correct Answer: Ordering and orienting contigs into larger structures using linking information such as mate-pair or long reads

Q13. Which of the following best describes hybrid assembly?

Using only short reads for assembly
Combining different sequencing data types (e.g., short and long reads) to improve assembly quality
Annotating genomes using both ab initio and evidence-based methods
Assembling transcriptomes from single-cell RNA-seq

Correct Answer: Combining different sequencing data types (e.g., short and long reads) to improve assembly quality

Q14. Functional annotation commonly uses which of the following to assign putative functions to predicted proteins?

BLAST searches against curated protein databases and InterProScan domain searches
Only GC content analysis
RepeatMasker output
k-mer frequency distributions

Correct Answer: BLAST searches against curated protein databases and InterProScan domain searches

Q15. What role does Pilon play in genome workflows?

It performs ab initio gene prediction
It polishes assemblies by correcting bases, fixing small indels and improving consensus using read alignments
It annotates tRNAs and rRNAs
It generates de Bruijn graphs from raw reads

Correct Answer: It polishes assemblies by correcting bases, fixing small indels and improving consensus using read alignments

Q16. Which tool is commonly used to predict transfer RNA genes in a genome?

tRNAscan-SE
GATK
SPAdes
MAFFT

Correct Answer: tRNAscan-SE

Q17. In annotation, what is a pseudogene?

A gene that encodes a functional protein in all species
A non-functional genomic sequence resembling a gene, often disrupted by stop codons or frameshifts
An RNA gene with known structure and function
A short repeat element in the genome

Correct Answer: A non-functional genomic sequence resembling a gene, often disrupted by stop codons or frameshifts
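
The two disruptions named in this answer can be screened for with a very simple check. The sketch below is only an illustration, assuming the standard genetic code stop codons (TAA, TAG, TGA) and a candidate coding sequence given in frame from its start codon. The function names and example sequences are hypothetical, and real pseudogene detection relies on alignment to functional homologs rather than this test alone.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def internal_stop_positions(cds):
    """Return 0-based codon indices of in-frame stop codons before the final codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def looks_disrupted(cds):
    """Flag a sequence whose length is not a multiple of 3 or that has a premature stop."""
    return len(cds) % 3 != 0 or bool(internal_stop_positions(cds))


# Hypothetical sequences, for illustration only.
intact = "ATGAAAGGCTAA"         # ATG AAA GGC TAA
premature = "ATGAAATAGGGCTAA"   # ATG AAA TAG GGC TAA
print(looks_disrupted(intact), looks_disrupted(premature))
```

A length that is not a multiple of three is used here as a crude proxy for a frameshift. It can also simply indicate a truncated or mis-annotated gene model.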

Q18. Which statement about reference-guided (mapping-based) assembly is correct?

It builds a genome de novo without any external reference
It aligns reads to a known reference to reconstruct the sequence, which is useful when a closely related reference exists
It always outperforms de novo assembly in novel species discovery
It cannot detect structural variants

Correct Answer: It aligns reads to a known reference to reconstruct the sequence, which is useful when a closely related reference exists

Q19. Which of the following are commonly used to evaluate assembly quality besides N50?

GC skew only
Read mapping rate, number of misassemblies, and BUSCO completeness
Only the number of raw reads
Protein isoelectric point distribution

Correct Answer: Read mapping rate, number of misassemblies, and BUSCO completeness

Q20. During functional annotation, InterProScan is mainly used to:

Assemble raw sequencing reads
Detect protein domains and signatures to help infer protein functions
Align genomic reads to a reference genome
Mask low-complexity regions in DNA

Correct Answer: Detect protein domains and signatures to help infer protein functions

G S Sachin

I am a Registered Pharmacist under the Pharmacy Act, 1948, and the founder of PharmacyFreak.com. I hold a Bachelor of Pharmacy degree from Rungta College of Pharmaceutical Science and Research. With a strong academic foundation and practical knowledge, I am committed to providing accurate, easy-to-understand content to support pharmacy students and professionals. My aim is to make complex pharmaceutical concepts accessible and useful for real-world application.

Mail- Sachin@pharmacyfreak.com
