Introduction: This quiz collection covers sequence data collection and storage, a core area of bioinformatics and computational biotechnology for M.Pharm students. It addresses how nucleotide and protein sequences are generated, quality-controlled, formatted, annotated, archived and shared. Questions cover sequencing platforms (short- and long-read), common file formats (FASTA, FASTQ, SAM/BAM/CRAM), metadata standards, public repositories, and practical concerns such as compression, checksums, data provenance, privacy and laboratory information management systems (LIMS). Designed to reinforce classroom learning and practical skills, these MCQs emphasize real-world considerations for managing sequence data in pharmaceutical research and regulatory settings.
Q1. Which file format is primarily used to store raw sequencing reads together with their per-base quality scores?
- FASTA
- GFF
- FASTQ
- BAM
Correct Answer: FASTQ
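Illustration: a FASTQ record normally spans four lines: a header starting with "@", the sequence, a separator starting with "+", and a quality string of the same length as the sequence. The sketch below reads such records in Python. It assumes unwrapped four-line records, and the names `FastqRecord` and `read_fastq` are hypothetical, chosen only for this example.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or an empty line) stops the reader
            return
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Two made-up reads, used only to exercise the parser.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!5?I\n")
records = list(read_fastq(example))
for record in records:
    print(record.read_id, record.sequence, record.quality)
```

For production work, an established parser is usually a safer choice than a hand-written reader.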
Q2. What does the quality score in a FASTQ file (Phred score) represent?
- The position of the read in the sequencing run
- The estimated probability that a base call is incorrect, on a logarithmic scale (Q = -10 log10 P)
- The GC content of the read
- The length of homopolymer runs
Correct Answer: The estimated probability that a base call is incorrect, on a logarithmic scale (Q = -10 log10 P)
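Illustration: a Phred score Q relates to the error probability P by Q = -10 log10(P), so Q20 corresponds to a 1 in 100 chance of error and Q30 to 1 in 1000. The sketch below decodes a quality string, assuming the widely used Phred+33 ASCII offset (some older files use other offsets). The function names are hypothetical.

```python
from typing import List


def decode_quality_string(quality: str, offset: int = 33) -> List[int]:
    """Convert ASCII quality characters to integer Phred scores."""
    return [ord(char) - offset for char in quality]


def phred_to_error_probability(score: int) -> float:
    """Invert Q = -10 * log10(P) to recover the error probability P."""
    return 10 ** (-score / 10)


scores = decode_quality_string("!5?I")
probabilities = [phred_to_error_probability(q) for q in scores]
for q, p in zip(scores, probabilities):
    print(f"Q{q}: error probability {p:.4f}")
```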
Q3. Which public repository is primarily used to deposit raw high-throughput sequencing data and is part of the International Nucleotide Sequence Database Collaboration?
- Protein Data Bank (PDB)
- Sequence Read Archive (SRA)
- UniProt
- RefSeq
Correct Answer: Sequence Read Archive (SRA)
Q4. Which of the following formats stores sequence alignments and can be indexed for random access?
- FASTQ
- SAM/BAM
- TXT
- FASTA
Correct Answer: SAM/BAM
Q5. CRAM format is often preferred over BAM because it:
- Removes all read names
- Is a human-readable plain text format
- Provides more efficient compression by referring to a reference genome
- Only stores consensus sequences
Correct Answer: Provides more efficient compression by referring to a reference genome
Q6. When submitting sequence data to public databases, which piece of information is considered essential metadata?
- Sequencer operator’s home address
- Library construction method and sample source
- Preferred file compression algorithm
- Name of the laboratory instrument vendor only
Correct Answer: Library construction method and sample source
Q7. Which checksum algorithm is commonly used to verify integrity of downloaded sequence files (e.g., from SRA or GenBank)?
- ROT13
- SHA-512
- MD5
- Base64
Correct Answer: MD5
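Illustration: a downloaded file can be checked by computing its MD5 digest and comparing it with the value published alongside the file. The sketch below uses Python's standard `hashlib` module and reads the file in chunks so that large sequence files need not fit in memory. The function names and chunk size are assumptions made for this example.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected: str) -> bool:
    """Compare a file's digest with a published value, ignoring case and whitespace."""
    return md5_of_file(path) == expected.strip().lower()
```

MD5 is suitable for detecting accidental corruption during transfer; it is not a defence against deliberate tampering.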
Q8. Adapter contamination in sequencing reads is best removed by which preprocessing step?
- Indexing
- Trimming
- Annotation
- Assembly
Correct Answer: Trimming
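Illustration: trimming removes adapter sequence from the 3' end of a read, cutting the quality string at the same position. The sketch below is a deliberately simplified, exact-match version: it looks for the full adapter anywhere in the read, then for a partial adapter prefix at the read's end. Dedicated tools such as Cutadapt or Trimmomatic also tolerate mismatches, which this sketch does not. The function names, the `min_overlap` default, and the example adapter string are assumptions for illustration.

```python
from typing import Tuple


def find_adapter_start(sequence: str, adapter: str, min_overlap: int = 3) -> int:
    """Return the index where the adapter begins, or len(sequence) if absent."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    full_match = sequence.find(adapter)
    if full_match != -1:
        return full_match
    # Check whether the read ends with a prefix of the adapter, longest first.
    longest = min(len(adapter) - 1, len(sequence))
    for overlap in range(longest, min_overlap - 1, -1):
        if sequence.endswith(adapter[:overlap]):
            return len(sequence) - overlap
    return len(sequence)


def trim_record(sequence: str, quality: str, adapter: str) -> Tuple[str, str]:
    """Trim the sequence and its quality string at the same position."""
    cut = find_adapter_start(sequence, adapter)
    return sequence[:cut], quality[:cut]


ADAPTER = "AGATCGGAAGAGC"  # example adapter prefix, used here for illustration
print(trim_record("ACGTACGTAGATCGGAAGAGCTTT", "I" * 24, ADAPTER))
print(trim_record("ACGTACGTAGATC", "I" * 13, ADAPTER))
```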
Q9. Which standard or guideline is commonly used to describe sequence metadata to improve reproducibility and data reuse?
- MIxS (Minimum Information about any (x) Sequence)
- HTML5
- ISO-9001
- SMTP
Correct Answer: MIxS (Minimum Information about any (x) Sequence)
Q10. Paired-end sequencing differs from single-end sequencing primarily because paired-end reads:
- Are always longer than single-end reads
- Consist of two reads from opposite ends of the same DNA fragment
- Contain quality scores while single-end does not
- Do not require alignment
Correct Answer: Consist of two reads from opposite ends of the same DNA fragment
Q11. Which of the following describes the primary difference between raw and processed sequence data?
- Raw data has been aligned; processed data is unaligned
- Raw data is instrument output without significant transformation; processed data has undergone QC, trimming, alignment or assembly
- Processed data is always larger in file size than raw data
- Raw data cannot be stored in public repositories
Correct Answer: Raw data is instrument output without significant transformation; processed data has undergone QC, trimming, alignment or assembly
Q12. Which laboratory information system feature is most important for provenance tracking of sequence datasets in a pharmaceutical lab?
- Automated invoicing module
- Versioned sample and workflow audit trails
- Graphical color themes
- Email notification frequency settings
Correct Answer: Versioned sample and workflow audit trails
Q13. Which compression tool is commonly applied to FASTQ files to reduce storage while maintaining compatibility with many bioinformatics tools?
- gzip
- tar
- zip (with proprietary extensions)
- 7zip exclusive format
Correct Answer: gzip
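Illustration: gzip-compressed FASTQ files can be read directly as a stream, without first decompressing them to disk. The sketch below uses Python's standard `gzip` module to write a small compressed file and count its reads, assuming four lines per record. The function names are hypothetical.

```python
import gzip


def write_fastq_gz(path: str, fastq_text: str) -> None:
    """Write FASTQ text to a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="ascii") as handle:
        handle.write(fastq_text)


def count_reads_gz(path: str) -> int:
    """Count reads in a gzip-compressed FASTQ file by streaming its lines."""
    with gzip.open(path, "rt", encoding="ascii") as handle:
        line_count = sum(1 for _ in handle)
    if line_count % 4 != 0:
        raise ValueError("Line count is not a multiple of 4; file may be truncated")
    return line_count // 4
```

A typical use would be `count_reads_gz("sample.fastq.gz")`, where the file name is a placeholder.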
Q14. Ethical considerations when sharing human sequencing data often require which additional protection?
- Publishing raw reads with full patient identifiers
- De-identification and controlled-access repository deposit
- Removal of quality scores only
- Conversion of FASTQ to plain text CSV
Correct Answer: De-identification and controlled-access repository deposit
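Q15. Which accession format is commonly associated with GenBank nucleotide sequence records?
- PDB
- SAM
- Accession strings like MN123456 (a letter prefix followed by digits)
- UNI
Correct Answer: Accession strings like MN123456 (a letter prefix followed by digits). Note that prefixes with an underscore, such as NC_, denote RefSeq records rather than GenBank submissions.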
Q16. Indexing a BAM file (creating a .bai) is important because it:
- Makes the file human-readable
- Allows efficient retrieval of alignments from specific genomic regions
- Encrypts the data for security
- Converts it to FASTQ
Correct Answer: Allows efficient retrieval of alignments from specific genomic regions
Q17. Which of the following best describes “data provenance” in the context of sequence data management?
- A log of software UI color changes
- Record of the origin, processing steps, parameters and versions used to generate the data
- A list of publications citing the dataset only
- Random metadata unrelated to the sequencing experiment
Correct Answer: Record of the origin, processing steps, parameters and versions used to generate the data
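Illustration: a provenance entry can be as simple as a structured record stored next to each output file. The sketch below builds one as JSON, capturing a checksum of the input, the tool, its version, the parameters, and a timestamp. All field names, the tool name `example-trimmer`, and the version string are hypothetical and do not follow any particular standard.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict


def build_provenance_record(
    input_bytes: bytes, tool: str, version: str, parameters: Dict[str, Any]
) -> Dict[str, Any]:
    """Assemble a minimal provenance record for one processing step."""
    return {
        "input_md5": hashlib.md5(input_bytes).hexdigest(),
        "tool": tool,
        "tool_version": version,
        "parameters": parameters,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


record = build_provenance_record(
    b"@read1\nACGT\n+\nIIII\n",  # stand-in for the real input file contents
    tool="example-trimmer",      # hypothetical tool name
    version="0.0.0",             # hypothetical version
    parameters={"min_overlap": 3},
)
print(json.dumps(record, indent=2, sort_keys=True))
```

In a regulated setting such records would normally be generated and versioned by the LIMS or workflow manager rather than written by hand.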
Q18. Which ontology or controlled vocabulary would help standardize sample attributes like organism, tissue, and disease state?
- Gene Ontology (GO)
- Medical Subject Headings (MeSH) and ontologies like EFO or Uberon
- JPEG
- SMTP
Correct Answer: Medical Subject Headings (MeSH) and ontologies like EFO or Uberon
Q19. Which practice reduces the chance of accidental loss when storing large sequencing datasets?
- Keeping a single copy on the local instrument only
- Implementing automated off-site backups and checksums
- Uploading to social media platforms
- Renaming files daily without tracking
Correct Answer: Implementing automated off-site backups and checksums
Q20. Which factor is most important when choosing cloud storage for long-term archiving of sequence data in a regulated pharmaceutical environment?
- Lowest possible latency for streaming videos
- Compliance with regulatory standards (e.g., HIPAA/GxP), encryption, and auditability
- Availability of free emoticons
- Support for legacy proprietary office formats only
Correct Answer: Compliance with regulatory standards (e.g., HIPAA/GxP), encryption, and auditability

I am a Registered Pharmacist under the Pharmacy Act, 1948, and the founder of PharmacyFreak.com. I hold a Bachelor of Pharmacy degree from Rungta College of Pharmaceutical Science and Research. With a strong academic foundation and practical knowledge, I am committed to providing accurate, easy-to-understand content to support pharmacy students and professionals. My aim is to make complex pharmaceutical concepts accessible and useful for real-world application.
Email: Sachin@pharmacyfreak.com

