DNA Sequencing Analysis MCQs With Answers

Introduction:

This quiz set on DNA sequencing analysis is tailored for M.Pharm students specializing in Advanced Pharmaceutical Biotechnology. It covers core concepts including sequencing platforms, data formats, quality metrics, alignment and assembly algorithms, variant calling, and downstream interpretation relevant to drug development and pharmacogenomics. Questions are designed to deepen practical understanding of experimental design, bioinformatics workflows, error profiles, and common pitfalls encountered during sequence data analysis. Each MCQ tests both theoretical knowledge and applied reasoning to prepare students for research, regulatory submissions, and industry work where accurate sequence interpretation is critical for safety and efficacy assessments.

Q1. Which file format stores raw base calls together with per-base quality scores and is the primary output of most next-generation sequencing runs?

  • FASTA
  • BAM
  • FASTQ
  • VCF

Correct Answer: FASTQ
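
As an illustration of the format, a FASTQ record occupies four lines: an "@" header, the sequence, a "+" separator, and a quality string of the same length as the sequence. The following is a minimal sketch of a reader for that simple four-line layout; the function name parse_fastq and the example record are made up for illustration, and multi-line FASTQ variants are not handled.

```python
from typing import Iterable, Iterator, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records.

    Assumes each record is exactly four lines; a truncated input raises an error.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        plus = next(it).rstrip("\n")
        qual = next(it).rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality strings differ in length")
        yield header[1:], seq, qual


# Hypothetical single-record input
example = ["@read1\n", "ACGT\n", "+\n", "IIII\n"]
records = list(parse_fastq(example))
```

For real data, an established parser (for example Biopython or pysam) is usually a safer choice than a hand-written one.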

Q2. Which score expresses the probability that a base call is incorrect on a logarithmic scale (Q = -10 × log10 P) and is widely used for assessing per-base quality?

  • MAPQ
  • Phred score
  • Qubit score
  • RPKM

Correct Answer: Phred score
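
The relationship Q = -10 × log10(P) can be worked through in a few lines. This sketch assumes the Phred+33 ASCII encoding used by current Illumina FASTQ files; the function names are illustrative only.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P to a Phred score Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_quality(qual: str, offset: int = 33) -> list:
    """Decode an ASCII quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in qual]


scores = decode_quality("I5")  # 'I' is ASCII 73, '5' is ASCII 53
```

A score of Q20 therefore corresponds to a 1 in 100 chance of a wrong base call, and Q30 to 1 in 1000.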

Q3. Which aligner is optimized for mapping short reads to a large reference genome and builds its index with the Burrows-Wheeler transform?

  • BWA
  • Needleman-Wunsch
  • BLAST
  • ClustalW

Correct Answer: BWA

Q4. In SAM/BAM files, which field describes the confidence that a read is mapped to the reported position?

  • CIGAR string
  • FLAG
  • MAPQ
  • QUAL

Correct Answer: MAPQ

Q5. Which sequencing technology is known for very long reads with higher raw error rates but useful for resolving structural variants?

  • Illumina sequencing by synthesis
  • Sanger capillary sequencing
  • Oxford Nanopore
  • Pyrosequencing

Correct Answer: Oxford Nanopore

Q6. What is the primary purpose of adapter trimming in raw read preprocessing?

  • To increase GC content of reads
  • To remove sequencing platform-specific primer and adapter sequences that interfere with alignment
  • To convert FASTQ to FASTA format
  • To merge paired-end reads into longer contiguous sequences

Correct Answer: To remove sequencing platform-specific primer and adapter sequences that interfere with alignment
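
To make the idea concrete, here is a simplified sketch of 3' adapter trimming using exact string matching only. It is not how any particular tool works: production trimmers such as Trimmomatic or cutadapt also tolerate mismatches and use quality information. The function name, the min_overlap parameter, and the example reads are assumptions for illustration.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read using exact matching (illustrative only)."""
    # Case 1: the full adapter occurs inside the read; cut from its start.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: only a prefix of the adapter is present at the 3' end of the read.
    longest = min(len(adapter), len(read)) - 1
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read  # no adapter detected


ADAPTER = "AGATCGGAAGAGC"  # example adapter prefix, used here only for illustration
full = trim_adapter("ACGTACGTAGATCGGAAGAGCTTT", ADAPTER)
partial = trim_adapter("ACGTACGTAGATCG", ADAPTER)
clean = trim_adapter("ACGTACGT", ADAPTER)
```

A small min_overlap removes more adapter remnants but also risks clipping genuine bases that match the adapter prefix by chance.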

Q7. Which metric describes the minimum contig length such that contigs of that length or longer contain at least 50% of the total assembly length?

  • NG50
  • N50
  • Q50
  • LC50

Correct Answer: N50
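
The difference between N50 and NG50 is easiest to see in code: N50 uses the total assembly length as the denominator, whereas NG50 uses an estimated genome size. The sketch below uses made-up contig lengths and a hypothetical genome size of 100.

```python
from typing import Optional, Sequence


def nx50(lengths: Sequence[int], reference_total: int) -> Optional[int]:
    """Return the contig length at which the cumulative sum of contigs
    (sorted longest first) reaches half of reference_total, or None if never."""
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= reference_total:
            return length
    return None


def n50(lengths: Sequence[int]) -> Optional[int]:
    """N50: the threshold is half of the total assembly length."""
    return nx50(lengths, sum(lengths))


def ng50(lengths: Sequence[int], genome_size: int) -> Optional[int]:
    """NG50: the threshold is half of the estimated genome size."""
    return nx50(lengths, genome_size)


contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up contig lengths, total 54
assembly_n50 = n50(contigs)
assembly_ng50 = ng50(contigs, genome_size=100)  # hypothetical genome size
```

Here the assembly covers only about half of the assumed genome, so NG50 (3) is much lower than N50 (8).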

Q8. In variant calling, which file format is the standard for reporting discovered single nucleotide variants and indels along with annotations?

  • FASTQ
  • BED
  • VCF
  • BAM

Correct Answer: VCF

Q9. Which of the following describes an advantage of paired-end sequencing over single-end sequencing?

  • Provides two independent measurements of the same base quality only
  • Helps resolve repeat regions and improves mapping across structural variants by providing insert size information
  • Eliminates the need for quality trimming
  • Reduces total sequencing run time by half

Correct Answer: Helps resolve repeat regions and improves mapping across structural variants by providing insert size information

Q10. Which approach to de novo assembly uses k-mers and constructs a graph to efficiently assemble short-read data?

  • Overlap-layout-consensus
  • de Bruijn graph
  • Hidden Markov Model
  • Suffix tree assembly

Correct Answer: de Bruijn graph
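
As a rough illustration of the graph construction step only, each k-mer in the reads can be treated as an edge from its (k-1)-base prefix to its (k-1)-base suffix. The sketch below ignores k-mer counts, reverse complements, error correction, and graph traversal, all of which real assemblers need; the reads are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix node to its suffix node
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two overlapping made-up reads; with k=3 the graph spells out ACGTA
graph = de_bruijn(["ACGT", "CGTA"], k=3)
```

Because identical k-mers from different reads collapse into one edge, the graph size depends on genome complexity rather than on the number of reads, which is why this approach suits high-coverage short-read data.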

Q11. What is the effect of PCR duplicates on variant calling and how are they commonly handled?

  • Increase coverage uniformly; handled by downsampling randomly
  • Introduce false enrichment of reads from the same molecule; removed or marked using duplicate-marking tools (e.g., Picard)
  • Create additional true variants; preserved for sensitivity
  • Convert reads to FASTA; resolved by recalibration

Correct Answer: Introduce false enrichment of reads from the same molecule; removed or marked using duplicate-marking tools (e.g., Picard)

Q12. Which quality control tool provides per-base sequence quality, GC content, adapter content and duplication levels for sequencing libraries?

  • GATK
  • FastQC
  • BCFtools
  • Trimmomatic

Correct Answer: FastQC

Q13. Which concept describes the number of times, on average, a nucleotide is sequenced and is critical for sensitivity in detecting low-frequency variants?

  • Coverage depth
  • Read length
  • GC bias
  • Base composition

Correct Answer: Coverage depth
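
Mean coverage depth is commonly estimated as (number of reads × read length) / genome size. The sketch below also shows why depth matters for low-frequency variants, using a simple binomial model that assumes independent reads and no sequencing errors; the function names and example numbers are illustrative.

```python
from math import comb


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected average depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def prob_min_alt_reads(depth: int, vaf: float, min_alt: int) -> float:
    """Probability of seeing at least min_alt variant-supporting reads at a site,
    assuming each of `depth` reads independently carries the variant with
    probability `vaf` (binomial model, sequencing errors ignored)."""
    p_fewer = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i) for i in range(min_alt)
    )
    return 1.0 - p_fewer


depth = mean_coverage(num_reads=1_000_000, read_length=150, genome_size=5_000_000)
p_detect_low = prob_min_alt_reads(depth=100, vaf=0.01, min_alt=1)
p_detect_high = prob_min_alt_reads(depth=1000, vaf=0.01, min_alt=1)
```

Under this model a 1% variant is seen in at least one read far more reliably at 1000x than at 100x, which is the intuition behind deep sequencing for rare variants.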

Q14. In the context of variant interpretation, what does “phasing” refer to?

  • Assigning each variant to a specific chromosome copy (haplotype) to determine if variants are in cis or trans
  • Converting variant coordinates between genome builds
  • Applying base quality recalibration
  • Annotating variants with clinical significance

Correct Answer: Assigning each variant to a specific chromosome copy (haplotype) to determine if variants are in cis or trans

Q15. Which error mode is particularly problematic for sequencing platforms that infer homopolymer length from signal intensity?

  • Substitution errors in GC-rich regions
  • Homopolymer-associated indel errors
  • Base-calling phasing errors only in Illumina
  • Adapter ligation errors

Correct Answer: Homopolymer-associated indel errors

Q16. When aligning reads, what does the CIGAR string encode in a SAM/BAM record?

  • Base quality scores for each read
  • Mapping quality score only
  • A compact description of alignment operations (matches, insertions, deletions, clipping)
  • Read group and sample metadata

Correct Answer: A compact description of alignment operations (matches, insertions, deletions, clipping)
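
A CIGAR string is a sequence of length/operation pairs. The sketch below parses one and computes how many reference and read bases the alignment consumes, following the operation table in the SAM specification (M, D, N, = and X consume the reference; M, I, S, = and X consume the read). The example CIGAR is made up.

```python
import re
from typing import List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REFERENCE = set("MDN=X")
CONSUMES_QUERY = set("MIS=X")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REFERENCE)


def query_length(cigar: str) -> int:
    """Number of read bases described (soft clips included, hard clips excluded)."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_QUERY)


# 5 soft-clipped bases, 90 aligned, a 2-base insertion, a 3-base deletion, 5 aligned
ops = parse_cigar("5S90M2I3D5M")
```

For this example the alignment spans 98 reference bases while the read itself is 102 bases long.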

Q17. Which practice improves variant calling accuracy by recalibrating base quality scores using known variant sites?

  • Local realignment
  • Base Quality Score Recalibration (BQSR)
  • Adapter trimming
  • De novo assembly

Correct Answer: Base Quality Score Recalibration (BQSR)

Q18. For targeted sequencing panels, which metric describes the proportion of target bases covered by at least one read, or covered to at least a specified depth?

  • On-target rate
  • Coverage breadth
  • Read length distribution
  • GC content bias

Correct Answer: Coverage breadth
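
Given per-base depths across the target region (for example from a depth report), breadth at a threshold is simply the fraction of positions meeting that depth. A minimal sketch with made-up depth values:

```python
from typing import Sequence


def coverage_breadth(depths: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of target positions with depth >= min_depth."""
    if not depths:
        raise ValueError("no target positions supplied")
    return sum(d >= min_depth for d in depths) / len(depths)


# Hypothetical per-base depths over a 10-base target
depths = [0, 5, 12, 30, 0, 25, 40, 8, 19, 22]
breadth_1x = coverage_breadth(depths, min_depth=1)
breadth_20x = coverage_breadth(depths, min_depth=20)
```

Reporting breadth at several thresholds (for example 1x, 20x, 100x) gives a fuller picture than mean depth alone, since a high mean can hide uncovered positions.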

Q19. Which variant calling challenge is most effectively addressed by using unique molecular identifiers (UMIs) during library preparation?

  • Improving read length
  • Distinguishing true low-frequency variants from PCR and sequencing errors by tracking original molecules
  • Eliminating the need for alignment
  • Increasing GC content uniformity

Correct Answer: Distinguishing true low-frequency variants from PCR and sequencing errors by tracking original molecules
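
The principle can be sketched as follows: reads that share a UMI and mapping position are assumed to derive from the same original molecule, so a per-position majority vote within each family suppresses errors that appear in only a minority of copies. This is a simplified illustration with made-up reads; it assumes equal-length reads within a family and ignores UMI sequencing errors, which dedicated tools handle.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Read = Tuple[str, int, str]  # (umi, mapping_position, sequence)


def umi_consensus(reads: Iterable[Read]) -> Dict[Tuple[str, int], str]:
    """Collapse reads into one majority-vote consensus per (UMI, position) family."""
    families = defaultdict(list)
    for umi, pos, seq in reads:
        families[(umi, pos)].append(seq)
    consensus = {}
    for key, seqs in families.items():
        # Majority base at each column; assumes all reads in a family are equal length
        consensus[key] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


reads = [
    ("AAC", 100, "ACGT"),
    ("AAC", 100, "ACGT"),
    ("AAC", 100, "ACTT"),  # G>T seen in one of three copies: likely an error
    ("GGT", 100, "ACTT"),  # a separate original molecule carrying T
]
collapsed = umi_consensus(reads)
```

In this toy example the T in the first family is outvoted, while the T in the second family survives as evidence from an independent molecule.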

Q20. Which coordinate system is used by VCF files for variant positions and differs from the 0-based coordinate system used in some alignment formats?

  • Half-open 0-based
  • 1-based inclusive
  • 0-based exclusive
  • Coordinate-free

Correct Answer: 1-based inclusive
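
Off-by-one errors between coordinate systems are a common pitfall. The sketch below converts a VCF position (1-based) and REF allele into a 0-based, half-open interval as used by BED; the function name and example coordinates are illustrative.

```python
from typing import Tuple


def vcf_to_bed(chrom: str, pos: int, ref: str) -> Tuple[str, int, int]:
    """Convert a VCF POS (1-based) and REF allele to a BED-style interval
    (0-based start, exclusive end) covering the reference allele."""
    start = pos - 1          # shift from 1-based to 0-based
    end = start + len(ref)   # half-open: end is one past the last base
    return chrom, start, end


snv = vcf_to_bed("chr1", 100, "A")
deletion = vcf_to_bed("chr1", 100, "ACG")
```

Note that the end coordinate of a single-base variant equals the VCF position, because the BED end is exclusive.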
