Sequence Data Collection and Storage MCQs With Answers

Introduction: This quiz collection focuses on sequence data collection and storage — a crucial area in bioinformatics and computational biotechnology for M.Pharm students. It covers how nucleotide and protein sequences are generated, quality-controlled, formatted, annotated, archived and shared. Questions address sequencing platforms (short- and long-read), common file formats (FASTA, FASTQ, SAM/BAM/CRAM), metadata standards, public repositories, and practical concerns like compression, checksums, data provenance, privacy and laboratory information management systems. Designed to reinforce classroom learning and practical skills, these MCQs emphasize real-world considerations for managing sequence data in pharmaceutical research and regulatory settings.

Q1. Which file format is primarily used to store raw sequencing reads together with their per-base quality scores?

FASTA
GFF
FASTQ
BAM

Correct Answer: FASTQ
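To make the layout concrete: each FASTQ record occupies four lines (an `@` header, the bases, a `+` separator, and one quality character per base). The following is a minimal Python sketch of a reader under that four-line assumption; the names `FastqRecord` and `read_fastq` are illustrative and do not come from any particular library, and multi-line FASTQ variants are not handled.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream that uses four lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Two made-up records, for illustration only.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n")
records = list(read_fastq(example))
```

Here `records[0]` holds the name `read1`, the bases `ACGT`, and the quality string `IIII`.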

Q2. What does the quality score in a FASTQ file (Phred score) represent?

The position of the read in the sequencing run
The probability that a base call is incorrect
The GC content of the read
The length of homopolymer runs

Correct Answer: The probability that a base call is incorrect
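The relationship is logarithmic: a Phred score Q corresponds to an error probability P = 10^(-Q/10), so Q20 means a 1 in 100 chance of a wrong call and Q30 means 1 in 1000. A short Python sketch, assuming the common Phred+33 ASCII encoding (older Illumina files used an offset of 64); the function names are illustrative.

```python
from typing import List


def phred_to_error_probability(q: int) -> float:
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def decode_quality_string(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


scores = decode_quality_string("!5I")  # [0, 20, 40]
probabilities = [phred_to_error_probability(q) for q in scores]
```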

Q3. Which public repository is primarily used to deposit raw high-throughput sequencing data and is part of the International Nucleotide Sequence Database Collaboration?

Protein Data Bank (PDB)
Sequence Read Archive (SRA)
UniProt
RefSeq

Correct Answer: Sequence Read Archive (SRA)

Q4. Which of the following formats stores sequence alignments and can be indexed for random access?

FASTQ
SAM/BAM
TXT
FASTA

Correct Answer: SAM/BAM

Q5. CRAM format is often preferred over BAM because it:

Removes all read names
Is a human-readable plain text format
Provides more efficient compression by referring to a reference genome
Only stores consensus sequences

Correct Answer: Provides more efficient compression by referring to a reference genome

Q6. When submitting sequence data to public databases, which piece of information is considered essential metadata?

Sequencer operator’s home address
Library construction method and sample source
Preferred file compression algorithm
Name of the laboratory instrument vendor only

Correct Answer: Library construction method and sample source

Q7. Which checksum algorithm is commonly used to verify integrity of downloaded sequence files (e.g., from SRA or GenBank)?

ROT13
SHA-512
MD5
Base64

Correct Answer: MD5
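In practice, the repository publishes an MD5 hex digest next to each file, and the downloader recomputes it locally and compares the two. A minimal sketch using Python's standard `hashlib`; the helper names `file_md5` and `verify_md5` are illustrative. MD5 is adequate for detecting accidental corruption but is not considered secure against deliberate tampering.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with the published value (case-insensitive)."""
    return file_md5(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat even for multi-gigabyte sequence files.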

Q8. Adapter contamination in sequencing reads is best removed by which preprocessing step?

Indexing
Trimming
Annotation
Assembly

Correct Answer: Trimming
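The idea behind adapter trimming can be sketched in a few lines: locate the adapter (or the start of it) at the 3' end of the read and cut both the bases and the quality string at that point. The Python sketch below uses exact matching only, which is a simplification; dedicated tools such as Cutadapt or Trimmomatic also tolerate mismatches. The function name, the `min_overlap` parameter, and the example adapter sequence are assumptions for illustration.

```python
from typing import Tuple


def trim_adapter(sequence: str, quality: str, adapter: str,
                 min_overlap: int = 3) -> Tuple[str, str]:
    """Remove an exact 3' adapter match (full or partial) from a read."""
    cut = sequence.find(adapter)  # full adapter anywhere in the read
    if cut == -1:
        # Otherwise look for the longest adapter prefix that ends the read.
        longest = min(len(adapter) - 1, len(sequence))
        for overlap in range(longest, min_overlap - 1, -1):
            if sequence.endswith(adapter[:overlap]):
                cut = len(sequence) - overlap
                break
    if cut == -1:
        return sequence, quality  # no adapter found
    return sequence[:cut], quality[:cut]


ADAPTER = "AGATCGGAAGAGC"  # example adapter sequence, for illustration
read = "ACGTACGT" + ADAPTER + "TT"
trimmed_seq, trimmed_qual = trim_adapter(read, "I" * len(read), ADAPTER)
```

Trimming the quality string together with the bases keeps the record valid FASTQ, since both must stay the same length.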

Q9. Which standard or guideline is commonly used to describe sequence metadata to improve reproducibility and data reuse?

MIxS (Minimum Information about any (x) Sequence)
HTML5
ISO-9001
SMTP

Correct Answer: MIxS (Minimum Information about any (x) Sequence)

Q10. Paired-end sequencing differs from single-end sequencing primarily because paired-end reads:

Are always longer than single-end reads
Consist of two reads from opposite ends of the same DNA fragment
Contain quality scores while single-end does not
Do not require alignment

Correct Answer: Consist of two reads from opposite ends of the same DNA fragment
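A toy simulation shows what "opposite ends of the same fragment" means. It assumes the common forward/reverse orientation, in which read 1 is read from one end of the fragment and read 2 from the other end on the opposite strand, so read 2 appears as the reverse complement of the fragment's tail. The function names are illustrative.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_read_pair(fragment: str, read_length: int) -> Tuple[str, str]:
    """Simulate an error-free read pair from the two ends of one fragment."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment[-read_length:])
    return read1, read2


read1, read2 = simulate_read_pair("AACCGGTTAC", 4)  # ("AACC", "GTAA")
```

Because both reads come from one fragment, their expected distance and orientation give aligners and assemblers extra information that single-end reads lack.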

Q11. Which of the following describes the primary difference between raw and processed sequence data?

Raw data has been aligned; processed data is unaligned
Raw data is instrument output without significant transformation; processed data has undergone QC, trimming, alignment or assembly
Processed data is always larger in file size than raw data
Raw data cannot be stored in public repositories

Correct Answer: Raw data is instrument output without significant transformation; processed data has undergone QC, trimming, alignment or assembly

Q12. Which laboratory information system feature is most important for provenance tracking of sequence datasets in a pharmaceutical lab?

Automated invoicing module
Versioned sample and workflow audit trails
Graphical color themes
Email notification frequency settings

Correct Answer: Versioned sample and workflow audit trails

Q13. Which compression tool is commonly applied to FASTQ files to reduce storage while maintaining compatibility with many bioinformatics tools?

gzip
tar
zip (with proprietary extensions)
7zip exclusive format

Correct Answer: gzip
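Many tools read `.fastq.gz` files directly, and a gzip stream can be read line by line without decompressing to disk first. A minimal sketch with Python's standard `gzip` module; the helper names are illustrative, and the record count assumes four lines per record.

```python
import gzip


def write_gzipped_text(path: str, text: str) -> None:
    """Write text to a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="ascii") as handle:
        handle.write(text)


def count_fastq_records(path: str) -> int:
    """Count records in a gzipped FASTQ file (four lines per record assumed)."""
    with gzip.open(path, "rt", encoding="ascii") as handle:
        return sum(1 for _ in handle) // 4
```

For example, after `write_gzipped_text("reads.fastq.gz", data)`, calling `count_fastq_records("reads.fastq.gz")` streams through the compressed file and returns the number of reads.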

Q14. Ethical considerations when sharing human sequencing data often require which additional protection?

Publishing raw reads with full patient identifiers
De-identification and controlled-access repository deposit
Removal of quality scores only
Conversion of FASTQ to plain text CSV

Correct Answer: De-identification and controlled-access repository deposit

Q15. Which accession identifier format is commonly associated with GenBank nucleotide sequence records?

PDB
SAM
Letter-and-digit accession strings like MN123456 (two letters followed by six digits)
UNI

Correct Answer: Letter-and-digit accession strings like MN123456 (two letters followed by six digits)
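Accession styles can be told apart by pattern. Standard GenBank/INSDC nucleotide accessions are one letter plus five digits, two letters plus six digits, or two letters plus eight digits, optionally followed by a version suffix such as `.1`. Accessions with a two-letter prefix and an underscore (for example `NC_`) are RefSeq identifiers. The Python sketch below is a simplification that leaves out other accession types such as WGS records; the pattern and function names are illustrative.

```python
import re

# One letter + 5 digits, two letters + 6 digits, or two letters + 8 digits.
GENBANK_NUCLEOTIDE = re.compile(
    r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(?:\.\d+)?$"
)
# Two-letter prefix, underscore, then the identifier (e.g. NC_000001.11).
REFSEQ = re.compile(r"^[A-Z]{2}_[A-Z0-9]+(?:\.\d+)?$")


def classify_accession(accession: str) -> str:
    """Classify an accession string by its pattern (simplified)."""
    if REFSEQ.match(accession):
        return "RefSeq-style"
    if GENBANK_NUCLEOTIDE.match(accession):
        return "GenBank/INSDC-style nucleotide"
    return "unrecognized"
```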

Q16. Indexing a BAM file (creating a .bai) is important because it:

Makes the file human-readable
Allows efficient retrieval of alignments from specific genomic regions
Encrypts the data for security
Converts it to FASTQ

Correct Answer: Allows efficient retrieval of alignments from specific genomic regions

Q17. Which of the following best describes “data provenance” in the context of sequence data management?

A log of software UI color changes
Record of the origin, processing steps, parameters and versions used to generate the data
A list of publications citing the dataset only
Random metadata unrelated to the sequencing experiment

Correct Answer: Record of the origin, processing steps, parameters and versions used to generate the data
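A provenance record can be as simple as a structured document stored alongside the data. The sketch below builds one in Python; every field name, the sample identifier, and the tool name `example-trimmer` are hypothetical placeholders, not a published schema. A real laboratory would follow its LIMS or a community standard.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, List


def build_provenance_record(sample_id: str, input_bytes: bytes,
                            steps: List[Dict[str, str]]) -> dict:
    """Assemble a minimal provenance record (hypothetical field names)."""
    return {
        "sample_id": sample_id,
        # A digest ties the record to the exact input data.
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        # Each step records the tool, its version, and the parameters used.
        "steps": steps,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


record = build_provenance_record(
    "SAMPLE-001",
    b"@read1\nACGT\n+\nIIII\n",
    [{"tool": "example-trimmer", "version": "0.0.0", "parameters": "--min-overlap 3"}],
)
serialized = json.dumps(record, indent=2)
```

Recording tool versions and parameters with each step is what lets someone else reproduce the processed data from the raw input later.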

Q18. Which ontology or controlled vocabulary would help standardize sample attributes like organism, tissue, and disease state?

Gene Ontology (GO)
Medical Subject Headings (MeSH) and ontologies like EFO or Uberon
JPEG
SMTP

Correct Answer: Medical Subject Headings (MeSH) and ontologies like EFO or Uberon

Q19. Which practice reduces the chance of accidental loss when storing large sequencing datasets?

Keeping a single copy on the local instrument only
Implementing automated off-site backups and checksums
Uploading to social media platforms
Renaming files daily without tracking

Correct Answer: Implementing automated off-site backups and checksums

Q20. Which factor is most important when choosing cloud storage for long-term archiving of sequence data in a regulated pharmaceutical environment?

Lowest possible latency for streaming videos
Compliance with regulatory standards (e.g., HIPAA/GxP), encryption, and auditability
Availability of free emoticons
Support for legacy proprietary office formats only

Correct Answer: Compliance with regulatory standards (e.g., HIPAA/GxP), encryption, and auditability

G S Sachin

I am a Registered Pharmacist under the Pharmacy Act, 1948, and the founder of PharmacyFreak.com. I hold a Bachelor of Pharmacy degree from Rungta College of Pharmaceutical Science and Research. With a strong academic foundation and practical knowledge, I am committed to providing accurate, easy-to-understand content to support pharmacy students and professionals. My aim is to make complex pharmaceutical concepts accessible and useful for real-world application.

Mail: Sachin@pharmacyfreak.com
