Introduction:
This quiz set on DNA sequencing analysis is tailored for M.Pharm students specializing in Advanced Pharmaceutical Biotechnology. It covers core concepts including sequencing platforms, data formats, quality metrics, alignment and assembly algorithms, variant calling, and downstream interpretation relevant to drug development and pharmacogenomics. Questions are designed to deepen practical understanding of experimental design, bioinformatics workflows, error profiles, and common pitfalls encountered during sequence data analysis. Each MCQ tests both theoretical knowledge and applied reasoning to prepare students for research, regulatory submissions, and industry work where accurate sequence interpretation is critical for safety and efficacy assessments.
Q1. Which file format stores raw base calls along with per-base quality scores commonly used as the primary output of next-generation sequencing runs?
- FASTA
- BAM
- FASTQ
- VCF
Correct Answer: FASTQ
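Illustration: a FASTQ record spans four lines (an "@" header, the bases, a "+" separator, and one quality character per base). The sketch below reads such records from a text stream. It assumes unwrapped, four-line records; the names read_fastq and FastqRecord are hypothetical helpers written for this example, not part of any library.

```python
import io
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream that uses exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) stops the iteration
            return
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Usage with an in-memory example instead of a real file
example = io.StringIO("@read1\nACGT\n+\nIIII\n")
for record in read_fastq(example):
    print(record.name, record.sequence, record.quality)
```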
Q2. Which score expresses the probability that a base call is incorrect on a logarithmic scale (Q = -10 log10 P) and is widely used for assessing per-base quality?
- MAPQ
- Phred score
- Qubit score
- RPKM
Correct Answer: Phred score
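Illustration: the Phred relationship Q = -10 log10(P) can be checked with a few lines of Python. This is a minimal sketch; the function names are hypothetical, and the default offset of 33 assumes the Sanger / Illumina 1.8+ quality encoding.

```python
import math
from typing import List


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred score into the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability into a Phred score."""
    return -10 * math.log10(p)


def decode_quality_string(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality]


# Q30 corresponds to roughly a 1 in 1000 chance of an incorrect base call
print(phred_to_error_prob(30))
print(decode_quality_string("I5!"))  # 'I' -> 40, '5' -> 20, '!' -> 0
```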
Q3. Which alignment tool is optimized for mapping short reads to a large reference genome using an index based on the Burrows-Wheeler transform?
- BWA
- Needleman-Wunsch
- BLAST
- ClustalW
Correct Answer: BWA
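Illustration: the Burrows-Wheeler transform itself can be shown on a toy sequence. The sketch below uses the naive "sort all rotations" construction, which is fine for teaching but is not how BWA builds its index; real aligners use suffix-array based construction and an FM-index for searching. The function names are hypothetical.

```python
def bwt(text: str) -> str:
    """Return the Burrows-Wheeler transform of text, using '$' as the end marker."""
    if "$" in text:
        raise ValueError("Input must not already contain the '$' sentinel")
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed: str) -> str:
    """Recover the original text by repeatedly prepending the BWT column and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(transformed[i] + table[i] for i in range(len(transformed)))
    original = next(row for row in table if row.endswith("$"))
    return original[:-1]


print(bwt("ACAACG"))           # GC$AAAC
print(inverse_bwt("GC$AAAC"))  # ACAACG
```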
Q4. In SAM/BAM files, which field describes the confidence that a read is mapped to the reported position?
- CIGAR string
- FLAG
- MAPQ
- QUAL
Correct Answer: MAPQ
Q5. Which sequencing technology is known for very long reads with higher raw error rates but useful for resolving structural variants?
- Illumina sequencing by synthesis
- Sanger capillary sequencing
- Oxford Nanopore
- Pyrosequencing
Correct Answer: Oxford Nanopore
Q6. What is the primary purpose of adapter trimming in raw read preprocessing?
- To increase GC content of reads
- To remove sequencing platform-specific primer and adapter sequences that interfere with alignment
- To convert FASTQ to FASTA format
- To merge paired-end reads into longer contiguous sequences
Correct Answer: To remove sequencing platform-specific primer and adapter sequences that interfere with alignment
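Illustration: a simplified picture of 3' adapter trimming. The sketch removes an exact copy of the adapter (and everything after it), or a partial adapter prefix hanging off the end of the read. It assumes exact matching only; dedicated trimmers also tolerate mismatches and use quality information. The function name, the min_overlap parameter, and the example reads are hypothetical.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read using exact matching only."""
    # Case 1: the full adapter occurs inside the read
    index = read.find(adapter)
    if index != -1:
        return read[:index]
    # Case 2: only the first few adapter bases made it into the end of the read
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read


ADAPTER = "AGATCGGAAGAGC"  # adapter sequence chosen for illustration
print(trim_adapter("ACGTACGTAGATCGGAAGAGCTT", ADAPTER))  # ACGTACGT
print(trim_adapter("ACGTACGTAGATC", ADAPTER))            # ACGTACGT
print(trim_adapter("ACGTACGT", ADAPTER))                 # ACGTACGT (unchanged)
```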
Q7. Which metric describes the contig length such that contigs of that length or longer together contain at least 50% of the total assembly length?
- NG50
- N50
- Q50
- LC50
Correct Answer: N50
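Illustration: N50 and NG50 differ only in the denominator. N50 uses the total assembly length, while NG50 uses an estimated genome size. The sketch below computes both from a list of contig lengths; the function names and the example lengths are made up for this illustration.

```python
from typing import Optional, Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the assembly."""
    if not contig_lengths:
        raise ValueError("At least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def ng50(contig_lengths: Sequence[int], genome_size: int) -> Optional[int]:
    """Same idea, but relative to an estimated genome size; None if never reached."""
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= genome_size:
            return length
    return None


lengths = [100, 80, 60, 40, 20]        # total assembly length = 300
print(n50(lengths))                    # 80  (100 + 80 = 180 >= 150)
print(ng50(lengths, genome_size=400))  # 60  (100 + 80 + 60 = 240 >= 200)
```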
Q8. In variant calling, which file format is the standard for reporting discovered single nucleotide variants and indels along with annotations?
- FASTQ
- BED
- VCF
- BAM
Correct Answer: VCF
Q9. Which of the following describes an advantage of paired-end sequencing over single-end sequencing?
- Provides two independent measurements of the same base quality only
- Helps resolve repeat regions and improves mapping across structural variants by providing insert size information
- Eliminates the need for quality trimming
- Reduces total sequencing run time by half
Correct Answer: Helps resolve repeat regions and improves mapping across structural variants by providing insert size information
Q10. Which approach to de novo assembly uses k-mers and constructs a graph to efficiently assemble short-read data?
- Overlap-layout-consensus
- de Bruijn graph
- Hidden Markov Model
- Suffix tree assembly
Correct Answer: de Bruijn graph
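Illustration: in a de Bruijn graph, each k-mer in the reads becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The sketch below only builds that graph; real assemblers also correct errors, simplify the graph, and traverse it to produce contigs. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_de_bruijn(reads: Iterable[str], k: int) -> Dict[str, List[str]]:
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it."""
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])  # repeated k-mers give repeated edges
    return dict(graph)


# The 3-mers of ACGTC are ACG, CGT, GTC
print(build_de_bruijn(["ACGTC"], k=3))
# {'AC': ['CG'], 'CG': ['GT'], 'GT': ['TC']}
```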
Q11. What is the effect of PCR duplicates on variant calling and how are they commonly handled?
- Increase coverage uniformly; handled by downsampling randomly
- Introduce false enrichment of reads from the same molecule; removed or marked using duplicate-marking tools (e.g., Picard)
- Create additional true variants; preserved for sensitivity
- Convert reads to FASTA; resolved by recalibration
Correct Answer: Introduce false enrichment of reads from the same molecule; removed or marked using duplicate-marking tools (e.g., Picard)
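Illustration: the core idea behind duplicate marking is that reads sharing the same chromosome, start position, and strand probably came from the same original molecule. The sketch below keeps the read with the highest summed base quality in each group and flags the rest. It is a teaching simplification, not the actual logic of Picard or any other tool; the dictionary keys and read names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def find_duplicates(reads: List[dict]) -> Set[str]:
    """Return names of reads flagged as duplicates of a better-quality read."""
    best: Dict[Tuple[str, int, str], dict] = {}
    for read in reads:
        key = (read["chrom"], read["pos"], read["strand"])
        if key not in best or read["qual_sum"] > best[key]["qual_sum"]:
            best[key] = read
    return {
        read["name"]
        for read in reads
        if best[(read["chrom"], read["pos"], read["strand"])] is not read
    }


reads = [
    {"name": "r1", "chrom": "chr1", "pos": 100, "strand": "+", "qual_sum": 300},
    {"name": "r2", "chrom": "chr1", "pos": 100, "strand": "+", "qual_sum": 350},
    {"name": "r3", "chrom": "chr1", "pos": 100, "strand": "-", "qual_sum": 320},
    {"name": "r4", "chrom": "chr2", "pos": 100, "strand": "+", "qual_sum": 310},
]
print(find_duplicates(reads))  # {'r1'}
```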
Q12. Which quality control tool provides per-base sequence quality, GC content, adapter content and duplication levels for sequencing libraries?
- GATK
- FastQC
- BCFtools
- Trimmomatic
Correct Answer: FastQC
Q13. Which concept describes the number of times, on average, a nucleotide is sequenced and is critical for sensitivity in detecting low-frequency variants?
- Coverage depth
- Read length
- GC bias
- Base composition
Correct Answer: Coverage depth
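Illustration: average depth can be estimated as (number of reads x read length) / genome or target size. The sketch below applies that formula and its inverse. It assumes every base of every read maps, so real experiments usually need some margin on top; the function names and numbers are hypothetical.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected average depth, assuming all sequenced bases map to the genome."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach a target average depth."""
    return math.ceil(target_coverage * genome_size / read_length)


# 1 million reads of 150 bp over a 5 Mb genome give about 30x
print(mean_coverage(1_000_000, 150, 5_000_000))  # 30.0
print(reads_needed(30, 150, 5_000_000))          # 1000000
```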
Q14. In the context of variant interpretation, what does “phasing” refer to?
- Assigning each variant to a specific chromosome copy (haplotype) to determine if variants are in cis or trans
- Converting variant coordinates between genome builds
- Applying base quality recalibration
- Annotating variants with clinical significance
Correct Answer: Assigning each variant to a specific chromosome copy (haplotype) to determine if variants are in cis or trans
Q15. Which error mode is particularly problematic for sequencing platforms that infer the length of a homopolymer run from signal intensity, such as pyrosequencing?
- Substitution errors in GC-rich regions
- Homopolymer-associated indel errors
- Base-calling phasing errors only in Illumina
- Adapter ligation errors
Correct Answer: Homopolymer-associated indel errors
Q16. When aligning reads, what does the CIGAR string encode in a SAM/BAM record?
- Base quality scores for each read
- Mapping quality score only
- A compact description of alignment operations (matches, insertions, deletions, clipping)
- Read group and sample metadata
Correct Answer: A compact description of alignment operations (matches, insertions, deletions, clipping)
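Illustration: a CIGAR string is a run of length/operation pairs such as 5S90M2I3D. The sketch below splits a CIGAR into its operations and derives how many reference and read bases the alignment covers, following the SAM convention that M, D, N, = and X consume the reference while M, I, S, = and X consume the read. The function names are hypothetical.

```python
import re
from typing import List, Tuple

CIGAR_PATTERN = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    if cigar == "*":  # SAM uses '*' when no alignment is available
        return []
    operations = [(int(n), op) for n, op in CIGAR_PATTERN.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in operations) != cigar:
        raise ValueError(f"Invalid CIGAR string: {cigar!r}")
    return operations


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


def query_length(cigar: str) -> int:
    """Number of read bases described by the CIGAR (soft clips included)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")


print(parse_cigar("5S90M2I3D"))     # [(5, 'S'), (90, 'M'), (2, 'I'), (3, 'D')]
print(reference_span("5S90M2I3D"))  # 93
print(query_length("5S90M2I3D"))    # 97
```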
Q17. Which practice improves variant calling accuracy by recalibrating base quality scores using known variant sites?
- Local realignment
- Base Quality Score Recalibration (BQSR)
- Adapter trimming
- De novo assembly
Correct Answer: Base Quality Score Recalibration (BQSR)
Q18. For targeted sequencing panels, which metric describes the proportion of target bases covered by at least one read or by a specified depth?
- On-target rate
- Coverage breadth
- Read length distribution
- GC content bias
Correct Answer: Coverage breadth
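Illustration: coverage breadth is the fraction of target positions whose depth meets a chosen threshold. The sketch below computes it from a list of per-base depths; the function name and the depth values are invented for this example.

```python
from typing import Sequence


def coverage_breadth(depths: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of target bases covered by at least min_depth reads."""
    if not depths:
        raise ValueError("At least one target position is required")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


per_base_depth = [0, 5, 12, 30, 0, 25, 40, 18, 22, 3]
print(coverage_breadth(per_base_depth))                # 0.8 (8 of 10 bases covered)
print(coverage_breadth(per_base_depth, min_depth=20))  # 0.4 (4 of 10 bases at >= 20x)
```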
Q19. Which variant calling challenge is most effectively addressed by using unique molecular identifiers (UMIs) during library preparation?
- Improving read length
- Distinguishing true low-frequency variants from PCR and sequencing errors by tracking original molecules
- Eliminating the need for alignment
- Increasing GC content uniformity
Correct Answer: Distinguishing true low-frequency variants from PCR and sequencing errors by tracking original molecules
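Illustration: reads carrying the same UMI descend from one original molecule, so a base seen in only a minority of a UMI family is more likely a PCR or sequencing error than a true variant. The sketch below builds a per-family consensus base at a single position. The thresholds (min_family_size, min_agreement) are arbitrary values chosen for illustration, and real tools also correct errors in the UMI sequence itself and take mapping position into account.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def umi_consensus(
    observations: Iterable[Tuple[str, str]],
    min_family_size: int = 2,
    min_agreement: float = 0.7,
) -> Dict[str, str]:
    """Collapse (umi, base) observations at one position into one base per UMI family."""
    families = defaultdict(list)
    for umi, base in observations:
        families[umi].append(base)

    consensus = {}
    for umi, bases in families.items():
        if len(bases) < min_family_size:
            continue  # singletons cannot be cross-checked, so they are skipped here
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agreement:
            consensus[umi] = base
    return consensus


observations = [
    ("AACG", "A"), ("AACG", "A"), ("AACG", "A"), ("AACG", "T"),  # one stray T: likely error
    ("GGTC", "T"), ("GGTC", "T"),                                # consistent T: candidate variant
    ("TTAG", "T"),                                               # singleton family
]
print(umi_consensus(observations))  # {'AACG': 'A', 'GGTC': 'T'}
```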
Q20. Which coordinate system is used by VCF files for variant positions, in contrast to the 0-based coordinates used by some other formats such as BED?
- Half-open 0-based
- 1-based inclusive
- 0-based exclusive
- Coordinate-free
Correct Answer: 1-based inclusive
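Illustration: off-by-one errors between coordinate systems are a common pitfall. The sketch below converts an interval between 0-based half-open coordinates (as in BED) and 1-based inclusive coordinates (as in VCF). The function names are hypothetical.

```python
from typing import Tuple


def zero_based_to_one_based(start: int, end: int) -> Tuple[int, int]:
    """Convert a 0-based half-open interval [start, end) to 1-based inclusive."""
    return start + 1, end


def one_based_to_zero_based(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based inclusive interval [start, end] to 0-based half-open."""
    return start - 1, end


# A single base at VCF POS 100 is the interval 99-100 in 0-based half-open terms
print(zero_based_to_one_based(99, 100))   # (100, 100)
print(one_based_to_zero_based(100, 100))  # (99, 100)
```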

I am a Registered Pharmacist under the Pharmacy Act, 1948, and the founder of PharmacyFreak.com. I hold a Bachelor of Pharmacy degree from Rungta College of Pharmaceutical Science and Research. With a strong academic foundation and practical knowledge, I am committed to providing accurate, easy-to-understand content to support pharmacy students and professionals. My aim is to make complex pharmaceutical concepts accessible and useful for real-world application.
Email: Sachin@pharmacyfreak.com

