Protein sequence databases MCQs With Answer

Introduction: This set of Protein sequence databases MCQs With Answer is designed for M.Pharm students who need a strong conceptual and practical understanding of the protein sequence repositories used in drug discovery and biopharmaceutical research. The multiple-choice questions cover primary and secondary sequence databases, annotation standards, accession systems, redundancy reduction, and practical considerations for querying and interpreting sequence data. The emphasis is on UniProt, NCBI RefSeq, PDB, and specialized resources, together with the FASTA format, evidence codes, and cross-references, all of which affect downstream analyses such as target validation, homology modeling, and pharmacogenomics. The questions also highlight common pitfalls and current best practices.

Q1. What is the primary difference between UniProtKB/Swiss-Prot and UniProtKB/TrEMBL?

Swiss-Prot contains only protein structures while TrEMBL contains only sequences
Swiss-Prot entries are manually reviewed and curated, TrEMBL entries are computationally annotated
Swiss-Prot is a nucleotide database and TrEMBL is a protein database
TrEMBL provides high-quality functional annotation whereas Swiss-Prot contains raw sequencing reads

Correct Answer: Swiss-Prot entries are manually reviewed and curated, TrEMBL entries are computationally annotated
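
UniProtKB's FASTA headers make this distinction visible: the first field of the header is `sp` for reviewed Swiss-Prot entries and `tr` for unreviewed TrEMBL entries. Below is a minimal Python sketch, assuming headers follow UniProt's `>db|Accession|EntryName description` layout. The function name `uniprot_section` is hypothetical, the first header is trimmed, and the second header is a made-up placeholder rather than a real entry.

```python
def uniprot_section(header: str) -> str:
    """Return 'Swiss-Prot' or 'TrEMBL' from a UniProtKB FASTA header line."""
    db = header.lstrip(">").split("|", 1)[0]
    if db == "sp":
        return "Swiss-Prot"  # reviewed, manually curated entry
    if db == "tr":
        return "TrEMBL"  # unreviewed, computationally annotated entry
    raise ValueError(f"not a UniProtKB header: {header!r}")


headers = [
    # Trimmed real-style header (PE= and SV= fields omitted).
    ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1",
    # Hypothetical placeholder, not a real entry.
    ">tr|A0A000|A0A000_EXAMPLE Placeholder unreviewed protein",
]
for h in headers:
    print(uniprot_section(h))
```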

Q2. Which database provides a non-redundant, curated set of reference sequences for genomes, transcripts, and proteins maintained by NCBI?

UniProtKB
RefSeq
GenBank
PDB

Correct Answer: RefSeq

Q3. What is an accession number in the context of sequence databases?

A sequence quality score assigned during BLAST searches
A unique, stable identifier assigned to a database entry for tracking and citation
A database-specific encryption key for secure data access
An automatic annotation label indicating experimental evidence

Correct Answer: A unique, stable identifier assigned to a database entry for tracking and citation
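
UniProtKB accessions have a fixed syntax, and they are easy to confuse with mnemonic entry names such as HBA_HUMAN, which are not guaranteed to stay the same. This small Python sketch checks syntax only, using the accession pattern published in UniProt's documentation (worth re-checking against the current help page). Passing the check does not prove that the entry exists; `is_uniprot_accession` is a hypothetical helper name.

```python
import re

# Accession pattern as published in UniProt's help pages (6 or 10 characters).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(text: str) -> bool:
    """True if the whole string is a syntactically valid UniProtKB accession."""
    return UNIPROT_ACCESSION.fullmatch(text) is not None


# An accession, a 10-character accession, an entry name, and a truncated string.
for candidate in ["P69905", "A0A023GPI8", "HBA_HUMAN", "P6990"]:
    print(candidate, is_uniprot_accession(candidate))
```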

Q4. Which format is most commonly used to exchange and submit protein sequences to public databases?

FASTQ
GFF
FASTA
PDBML

Correct Answer: FASTA
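
A FASTA record is a single `>` header line followed by one or more sequence lines. The sketch below is a minimal, dependency-free Python reader written for illustration (`read_fasta` is a hypothetical helper). In real projects a maintained parser such as Biopython's `SeqIO.parse(handle, "fasta")` is the safer choice. The example sequences are toy data.

```python
from io import StringIO
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; sequences may wrap over many lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []  # drop the '>' marker
        else:
            if header is None:
                raise ValueError("sequence data before the first '>' header")
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


example = StringIO(
    ">seq1 toy peptide\n"
    "MVLSPADKTN\n"
    "VKAAWGKVGA\n"
    ">seq2 another toy peptide\n"
    "MKTAYIAKQR\n"
)
for name, seq in read_fasta(example):
    print(name, len(seq))
```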

Q5. What is UniRef and why is it useful in large-scale sequence analyses?

A structural database; useful for 3D visualization of proteins
A clustering of UniProt sequences at various identity thresholds to reduce redundancy for faster similarity searches
An expression database for protein abundance measurements in tissues
A repository of raw sequencing reads for metagenomics

Correct Answer: A clustering of UniProt sequences at various identity thresholds to reduce redundancy for faster similarity searches
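
The idea behind UniRef-style redundancy reduction can be sketched as greedy clustering: longer sequences become cluster representatives, and every other sequence joins the first representative it matches at or above the identity threshold. This is a toy Python illustration, not UniRef's actual pipeline, which relies on alignment-based clustering with its own identity and overlap criteria. Here "identity" is a crude ungapped, position-by-position comparison, and all names and sequences are hypothetical.

```python
from typing import Dict, List


def toy_identity(a: str, b: str) -> float:
    """Identical residues at the same positions divided by the longer length.
    An ungapped stand-in for real alignment-based percent identity."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(seqs: Dict[str, str], threshold: float) -> Dict[str, List[str]]:
    """Map each representative name to the names of its cluster members."""
    clusters: Dict[str, List[str]] = {}
    # Longest sequences are considered first, so they become representatives.
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if toy_identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]  # no match: start a new cluster
    return clusters


seqs = {
    "a": "MKTAYIAKQRQISFVKSHFS",
    "b": "MKTAYIAKQRQISFVKSHFA",  # one substitution relative to "a"
    "c": "MKTAYIAKQRQISFVKSHF",   # "a" missing its last residue
    "d": "GSHMLEDPVAGKTLLVVDDE",  # unrelated toy sequence
}
for t in (0.9, 1.0):
    print(t, greedy_cluster(seqs, t))
```

Raising the threshold toward 1.0 yields more, smaller clusters (less redundancy removed), while lowering it merges more sequences, which mirrors the move from UniRef100 toward UniRef50.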

Q6. Which of the following best describes “evidence codes” in UniProt annotations?

Numerical scores indicating sequence quality
Labels indicating the type of supporting evidence for an annotation, such as experimental or computational
Encryption tags for secure data transfer
Taxonomic identifiers for species of origin

Correct Answer: Labels indicating the type of supporting evidence for an annotation, such as experimental or computational
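
UniProtKB records evidence using terms from the Evidence & Conclusion Ontology (ECO). The Python sketch below filters a protein's annotations down to those with experimental support. The three ECO codes and labels are a small subset included for illustration and should be checked against the ECO browser; the annotation list and helper name are hypothetical.

```python
from typing import List, Tuple

# Small illustrative subset of ECO codes seen in UniProtKB; not exhaustive.
ECO_LABELS = {
    "ECO:0000269": "experimental evidence used in manual assertion",
    "ECO:0000250": "sequence similarity evidence used in manual assertion",
    "ECO:0000256": "match to sequence model evidence used in automatic assertion",
}
EXPERIMENTAL = {"ECO:0000269"}


def experimentally_supported(annotations: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep (feature, eco_code) pairs whose evidence code is experimental."""
    return [(feature, eco) for feature, eco in annotations if eco in EXPERIMENTAL]


# Hypothetical annotations for a single protein entry.
annotations = [
    ("catalytic activity", "ECO:0000269"),
    ("binding site", "ECO:0000250"),
    ("subcellular location", "ECO:0000256"),
]
for feature, eco in experimentally_supported(annotations):
    print(feature, "->", ECO_LABELS[eco])
```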

Q7. In the context of protein databases, what does “cross-reference” mean?

The checksum used to verify sequence integrity
A link or pointer from one database entry to related entries in other resources (e.g., PDB, GO, RefSeq)
A way to compare two sequences using global alignment
The process of annotating sequence features manually

Correct Answer: A link or pointer from one database entry to related entries in other resources (e.g., PDB, GO, RefSeq)
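
In the UniProtKB flat-file format, cross-references appear on `DR` lines of the form `DR   DATABASE; IDENTIFIER; OPTIONAL INFORMATION.` A minimal Python sketch that groups them by target database follows. The identifiers are placeholders, not taken from a real entry, and `parse_dr` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List

# Placeholder DR lines in UniProtKB flat-file layout (identifiers are made up).
dr_lines = [
    "DR   PDB; 9XYZ; X-ray; 2.00 A; A=1-141.",
    "DR   RefSeq; NP_999999.1; NM_999999.1.",
    "DR   GO; GO:0000000; F:placeholder function; IDA:UniProtKB.",
]


def parse_dr(lines: List[str]) -> Dict[str, List[str]]:
    """Group the primary identifier of each DR line by its database name."""
    xrefs: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        if not line.startswith("DR   "):
            continue  # not a cross-reference line
        # Drop the 'DR   ' tag and the trailing period, then split on ';'.
        fields = [f.strip() for f in line[5:].rstrip(".").split(";")]
        database, identifier = fields[0], fields[1]
        xrefs[database].append(identifier)
    return dict(xrefs)


print(parse_dr(dr_lines))
```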

Q8. Which database would you primarily consult for experimentally determined three-dimensional protein structures?

UniProtKB
PDB (Protein Data Bank)
RefSeq
KEGG

Correct Answer: PDB (Protein Data Bank)

Q9. What is the main purpose of UniParc?

A curated reference of enzymatic reactions
A non-redundant archive that stores unique protein sequences from many databases to track history and source cross-references
A genome assembly repository for microorganisms
A tool for multiple sequence alignment visualization

Correct Answer: A non-redundant archive that stores unique protein sequences from many databases to track history and source cross-references
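
The UniParc idea of storing each unique sequence once while keeping every source cross-reference can be sketched with a checksum as the lookup key. This Python illustration uses an MD5 digest of the upper-cased sequence; that key choice, the helper names, and the source records are assumptions for the example, not a description of UniParc's internals.

```python
import hashlib
from typing import Dict, Iterable, Tuple


def sequence_key(seq: str) -> str:
    """MD5 hex digest of the normalized (stripped, upper-case) sequence."""
    return hashlib.md5(seq.strip().upper().encode("ascii")).hexdigest()


def build_archive(records: Iterable[Tuple[str, str, str]]) -> Dict[str, dict]:
    """Store each unique sequence once, collecting (source_db, source_id) xrefs."""
    archive: Dict[str, dict] = {}
    for source_db, source_id, seq in records:
        entry = archive.setdefault(
            sequence_key(seq), {"sequence": seq.strip().upper(), "xrefs": []}
        )
        entry["xrefs"].append((source_db, source_id))
    return archive


# Hypothetical source records: the first two carry the same sequence.
records = [
    ("UniProtKB", "HYPO_1", "MKTAYIAKQR"),
    ("RefSeq", "HYPO_2", "mktayiakqr"),
    ("EMBL", "HYPO_3", "MVLSPADKTN"),
]
archive = build_archive(records)
for entry in archive.values():
    print(entry["sequence"], entry["xrefs"])
```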

Q10. Which identifier system used by NCBI was deprecated and replaced by accession.version to provide stable tracking?

GI numbers
UniProt IDs
DOIs
EC numbers

Correct Answer: GI numbers

Q11. Which resource is most appropriate for obtaining functional annotations mapped to controlled vocabularies like molecular function and biological process?

Gene Ontology (GO) annotations linked via UniProt
Raw FASTQ files in SRA
Electron density maps in EMDB
Taxonomy records in NCBI Taxonomy without functional terms

Correct Answer: Gene Ontology (GO) annotations linked via UniProt

Q12. When submitting a protein sequence to a public database, which metadata elements are most critical for useful annotation in pharmacological research?

Experimental method, organism/taxonomy, tissue/source, functional evidence, and publication references
Only the sequence length and molecular weight
The submitter’s email and laboratory address only
Raw chromatogram files without contextual information

Correct Answer: Experimental method, organism/taxonomy, tissue/source, functional evidence, and publication references

Q13. What does the “NM_” prefix typically indicate in NCBI RefSeq records?

A non-coding RNA record
A curated mRNA (nucleotide) reference sequence
A mitochondrial genome sequence
An enzymatic activity annotation

Correct Answer: A curated mRNA (nucleotide) reference sequence
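
Because RefSeq encodes the molecule type in the accession prefix, a lookup table is enough to classify records. The Python sketch covers only a subset of the prefixes listed in NCBI's RefSeq documentation, and the accessions in the example are placeholders. The N versus X first letter is the practical signal here: NM_, NR_ and NP_ are the main reference records, while XM_, XR_ and XP_ are computationally predicted models.

```python
# Subset of NCBI RefSeq accession prefixes; not the full table.
REFSEQ_PREFIXES = {
    "NM_": "mRNA",
    "NR_": "non-coding RNA",
    "NP_": "protein",
    "XM_": "mRNA (computationally predicted model)",
    "XR_": "non-coding RNA (computationally predicted model)",
    "XP_": "protein (computationally predicted model)",
    "NC_": "complete genomic molecule",
    "WP_": "non-redundant protein (prokaryotes)",
}


def refseq_molecule_type(accession: str) -> str:
    """Classify a RefSeq accession by its three-character prefix."""
    return REFSEQ_PREFIXES.get(accession[:3], "unknown or not in this subset")


# Placeholder accessions, chosen only to exercise the prefixes.
for acc in ["NM_123456.1", "XP_123456.1", "NC_000000.1"]:
    print(acc, "->", refseq_molecule_type(acc))
```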

Q14. Which of the following statements about sequence versioning is correct?

Accession numbers change every time the entry is viewed
Version suffixes (e.g., .1, .2) indicate updates to the sequence, allowing citation of the exact sequence version used
Versioning is only used in structural databases, not sequence databases
Version numbers reflect the number of publications citing the entry

Correct Answer: Version suffixes (e.g., .1, .2) indicate updates to the sequence, allowing citation of the exact sequence version used
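
NCBI-style identifiers combine a stable accession with a version suffix, so a tool can tell whether two identifiers refer to the same record and which one is newer. A minimal Python sketch follows; the helper names and accession strings are hypothetical. UniProtKB tracks sequence versions separately (for example the `SV=` field in its FASTA headers), so this split applies to NCBI-style `accession.version` strings.

```python
from typing import Tuple


def split_accession_version(accession_version: str) -> Tuple[str, int]:
    """Split 'NP_123456.2' into ('NP_123456', 2)."""
    accession, sep, version = accession_version.rpartition(".")
    if not sep or not version.isdigit():
        raise ValueError(f"no version suffix in {accession_version!r}")
    return accession, int(version)


def is_newer(a: str, b: str) -> bool:
    """True if a is a later version of the same accession as b."""
    acc_a, ver_a = split_accession_version(a)
    acc_b, ver_b = split_accession_version(b)
    return acc_a == acc_b and ver_a > ver_b


print(split_accession_version("NP_123456.2"))
print(is_newer("NP_123456.2", "NP_123456.1"))
```

Citing the full accession.version string in methods sections is what makes the exact sequence recoverable later.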

Q15. For homology searches to identify potential off-targets of a drug target protein, which database and clustering level would you typically choose for a balance of sensitivity and speed?

UniRef100 for fastest searches with maximal redundancy
UniRef50 for maximum sequence diversity but slower searches
UniRef90 for a balance between redundancy reduction and retained sensitivity
UniParc because it stores redundant raw sequences for exhaustive matches

Correct Answer: UniRef90 for a balance between redundancy reduction and retained sensitivity
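
After searching a target protein against a UniRef90 database, for example with a BLAST+ command such as `blastp -query target.fasta -db uniref90 -evalue 1e-5 -outfmt 6 -out hits.tsv` (assuming a local database built with `makeblastdb`), the tabular output can be screened for candidate off-targets. The Python sketch below parses the 12 default `-outfmt 6` columns and keeps hits that pass an e-value and percent-identity cutoff. Both cutoffs, the helper names, and the result lines are illustrative assumptions, not recommended values.

```python
from typing import List, NamedTuple

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


class Hit(NamedTuple):
    subject: str
    pident: float
    evalue: float
    bitscore: float


def candidate_off_targets(tsv_text: str, max_evalue: float = 1e-5,
                          min_pident: float = 40.0) -> List[Hit]:
    """Filter tabular BLAST hits and sort the survivors by bit score."""
    hits = []
    for line in tsv_text.strip().splitlines():
        row = dict(zip(COLUMNS, line.split("\t")))
        hit = Hit(row["sseqid"], float(row["pident"]),
                  float(row["evalue"]), float(row["bitscore"]))
        if hit.evalue <= max_evalue and hit.pident >= min_pident:
            hits.append(hit)
    return sorted(hits, key=lambda h: h.bitscore, reverse=True)


# Placeholder result lines; a real file would come from a blastp run.
blast_tsv = "\n".join([
    "target\tUniRef90_HYPO1\t62.5\t300\t110\t2\t1\t300\t5\t304\t1e-120\t410",
    "target\tUniRef90_HYPO2\t45.0\t280\t150\t6\t10\t290\t8\t288\t1e-20\t95.1",
    "target\tUniRef90_HYPO3\t48.2\t150\t75\t3\t40\t190\t30\t180\t1e-2\t40.3",
    "target\tUniRef90_HYPO4\t30.0\t310\t210\t8\t1\t310\t1\t305\t1e-30\t120",
])
for hit in candidate_off_targets(blast_tsv):
    print(hit.subject, hit.pident, hit.evalue)
```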

Q16. What is the role of curated annotation in M.Pharm research when using protein sequence databases?

Curated annotation is irrelevant; only raw sequences are needed
Curated annotation improves the reliability of functional inference, target validation, and pathway mapping relevant to pharmacology
Curated annotation only provides aesthetic labels and has no scientific value
It slows down computational analysis and should be avoided

Correct Answer: Curated annotation improves the reliability of functional inference, target validation, and pathway mapping relevant to pharmacology

Q17. Which of the following best explains “primary” versus “secondary” sequence databases?

Primary databases store experimentally determined sequences submitted directly by researchers; secondary databases provide value-added annotations, curation, and cross-links
Primary databases are private while secondary databases are public
Primary databases contain only nucleotide data, secondary contain only protein data
There is no difference; both terms mean the same thing

Correct Answer: Primary databases store experimentally determined sequences submitted directly by researchers; secondary databases provide value-added annotations, curation, and cross-links

Q18. Why are taxonomic identifiers (taxIDs) important in protein sequence databases for drug research?

They are used to generate sequence quality scores
TaxIDs link sequences to specific organisms, which is critical for host-pathogen studies, species-specific pharmacology, and avoiding cross-species annotation errors
They encrypt the sequence to prevent misuse
TaxIDs determine the color codes used in sequence visualizers

Correct Answer: TaxIDs link sequences to specific organisms, which is critical for host-pathogen studies, species-specific pharmacology, and avoiding cross-species annotation errors
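
UniProtKB FASTA headers carry the NCBI taxonomy identifier in the `OX=` field, so species filtering can be done before any analysis. Below is a minimal Python sketch that keeps only human (taxID 9606) records; the second header is a placeholder entry, and the helper names are hypothetical.

```python
import re
from typing import Iterable, List

OX_FIELD = re.compile(r"\bOX=(\d+)\b")


def taxid_from_header(header: str) -> int:
    """Extract the NCBI taxonomy ID from a UniProtKB FASTA header."""
    match = OX_FIELD.search(header)
    if match is None:
        raise ValueError(f"no OX= field in {header!r}")
    return int(match.group(1))


def keep_taxon(headers: Iterable[str], taxid: int) -> List[str]:
    """Keep only headers whose OX= taxID equals the requested one."""
    return [h for h in headers if taxid_from_header(h) == taxid]


headers = [
    ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1",
    # Placeholder mouse entry, not a real record.
    ">tr|Q00000|Q00000_MOUSE Placeholder protein OS=Mus musculus OX=10090",
]
print(keep_taxon(headers, 9606))
```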

Q19. Which database or tool integrates sequence data with metabolic and signaling pathway information useful for drug target contextualization?

KEGG (Kyoto Encyclopedia of Genes and Genomes)
EMDB (Electron Microscopy Data Bank)
UniParc
SRA (Sequence Read Archive)

Correct Answer: KEGG (Kyoto Encyclopedia of Genes and Genomes)
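
KEGG exposes its gene-to-pathway links through a REST interface, where a `link` request returns tab-separated `gene<TAB>pathway` lines. The Python sketch builds the request URL and parses a response. The gene identifier is only an input example, the response text is a placeholder rather than real KEGG output, and `fetch` is defined but not called so the example runs offline. Check KEGG's API documentation before querying it programmatically.

```python
from collections import defaultdict
from typing import Dict, List
from urllib.request import urlopen

KEGG_REST = "https://rest.kegg.jp"


def link_pathway_url(gene_ids: List[str]) -> str:
    """Build a KEGG REST 'link' URL from genes to pathways ('+' joins entries)."""
    return f"{KEGG_REST}/link/pathway/{'+'.join(gene_ids)}"


def fetch(url: str) -> str:
    """Download a REST response as text (requires network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_link_response(text: str) -> Dict[str, List[str]]:
    """Turn 'gene<TAB>pathway' lines into {gene: [pathway, ...]}."""
    mapping: Dict[str, List[str]] = defaultdict(list)
    for line in text.strip().splitlines():
        gene, pathway = line.split("\t")
        mapping[gene].append(pathway)
    return dict(mapping)


print(link_pathway_url(["hsa:10458"]))
# Placeholder response text with made-up pathway IDs, not real KEGG output.
sample = "hsa:10458\tpath:hsa00000\nhsa:10458\tpath:hsa99999\n"
print(parse_link_response(sample))
```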

Q20. In the context of sequence database searches, what is the primary advantage of using an annotated reference like RefSeq or Swiss-Prot over raw GenBank entries?

Annotated references typically include curated features, consistent identifiers, and reduced redundancy, improving accuracy of functional inferences and reproducibility in pharmacological studies
GenBank entries are always experimentally validated whereas RefSeq is not
Annotated references do not support BLAST searches
There is no practical difference; both provide identical annotations

Correct Answer: Annotated references typically include curated features, consistent identifiers, and reduced redundancy, improving accuracy of functional inferences and reproducibility in pharmacological studies


G S Sachin

I am a Registered Pharmacist under the Pharmacy Act, 1948, and the founder of PharmacyFreak.com. I hold a Bachelor of Pharmacy degree from Rungta College of Pharmaceutical Science and Research. With a strong academic foundation and practical knowledge, I am committed to providing accurate, easy-to-understand content to support pharmacy students and professionals. My aim is to make complex pharmaceutical concepts accessible and useful for real-world application.

Mail: Sachin@pharmacyfreak.com
