Introduction: Protein Sequence Databases MCQs With Answer is designed for M.Pharm students who want a strong conceptual and practical understanding of the protein sequence repositories used in drug discovery and biopharmaceutical research. This set of curated multiple-choice questions covers primary and secondary sequence databases, annotation standards, accession systems, redundancy-reduction methods, and practical considerations for querying and interpreting sequence data. It emphasizes UniProt, NCBI RefSeq, PDB, and specialized resources, along with file formats (FASTA), evidence codes, and cross-references that affect downstream analyses such as target validation, homology modeling, and pharmacogenomics. The questions also touch on common pitfalls and current best practices.
Q1. What is the primary difference between UniProtKB/Swiss-Prot and UniProtKB/TrEMBL?
- Swiss-Prot contains only protein structures while TrEMBL contains only sequences
- Swiss-Prot entries are manually reviewed and curated, TrEMBL entries are computationally annotated
- Swiss-Prot is a nucleotide database and TrEMBL is a protein database
- TrEMBL provides high-quality functional annotation whereas Swiss-Prot contains raw sequencing reads
Correct Answer: Swiss-Prot entries are manually reviewed and curated, TrEMBL entries are computationally annotated
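In UniProt FASTA downloads, the first field of each header records the section an entry comes from: `sp` for Swiss-Prot (reviewed) and `tr` for TrEMBL (unreviewed). The minimal Python sketch below assumes headers follow UniProt's documented `>db|Accession|EntryName Description ...` layout. The helper name `uniprot_section` is hypothetical, and the `tr` header is an invented example.

```python
def uniprot_section(header: str) -> str:
    """Report which UniProtKB section a FASTA header belongs to.

    Assumes the '>db|Accession|EntryName Description ...' layout, where
    db is 'sp' (Swiss-Prot, reviewed) or 'tr' (TrEMBL, unreviewed).
    """
    db = header.lstrip(">").split("|", 1)[0]
    if db == "sp":
        return "Swiss-Prot (reviewed)"
    if db == "tr":
        return "TrEMBL (unreviewed)"
    raise ValueError(f"not a UniProtKB FASTA header: {header!r}")


# Illustrative headers (the second one is made up for this example).
headers = [
    ">sp|P04637|P53_HUMAN Cellular tumor antigen p53 OS=Homo sapiens OX=9606",
    ">tr|A0A000EXA1|A0A000EXA1_HUMAN Hypothetical example protein OS=Homo sapiens OX=9606",
]
for h in headers:
    print(h.split("|")[1], "->", uniprot_section(h))
```

A quick check like this is useful before target work, because a TrEMBL hit usually needs more independent validation than a reviewed Swiss-Prot entry.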
Q2. Which database provides a non-redundant, curated set of reference sequences for genomes, transcripts, and proteins maintained by NCBI?
- UniProtKB
- RefSeq
- GenBank
- PDB
Correct Answer: RefSeq
Q3. What is an accession number in the context of sequence databases?
- A sequence quality score assigned during BLAST searches
- A unique, stable identifier assigned to a database entry for tracking and citation
- A database-specific encryption key for secure data access
- An automatic annotation label indicating experimental evidence
Correct Answer: A unique, stable identifier assigned to a database entry for tracking and citation
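Because accessions are stable identifiers, pipelines often check that a string at least looks like a valid accession before using it. The sketch below uses the regular expression UniProt documents for UniProtKB accessions (6 or 10 characters). It is worth re-checking against UniProt's current help page. The check is purely syntactic and cannot tell whether the entry actually exists.

```python
import re

# UniProtKB accession pattern as documented by UniProt:
# 6-character accessions (e.g. P04637) or 10-character ones (e.g. A0A024R161).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(text: str) -> bool:
    """True if text is syntactically a UniProtKB accession (no version suffix)."""
    return UNIPROT_ACCESSION.fullmatch(text) is not None


for candidate in ["P04637", "A0A024R161", "P04637.2", "NP_000537"]:
    print(candidate, is_uniprot_accession(candidate))
```

A RefSeq accession such as `NP_000537` is rejected here because it follows NCBI's prefix-based scheme rather than UniProt's.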
Q4. Which format is most commonly used to exchange and submit protein sequences to public databases?
- FASTQ
- GFF
- FASTA
- PDBML
Correct Answer: FASTA
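A FASTA record is a single `>` header line followed by one or more sequence lines. The minimal parser below joins wrapped lines into one string per header. It is a teaching sketch: duplicate headers are not handled (the last one wins), and the sequences are invented. For real work, Biopython's `SeqIO.parse(handle, "fasta")` is a well-tested alternative.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}.

    Header lines start with '>'; a sequence may be wrapped over several lines.
    Blank lines are skipped. Duplicate headers overwrite earlier ones.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)  # close the final record
    return records


example = """>seq1 hypothetical example
MKTAYIAKQR
QISFVKSHFS
>seq2 another example
MEEPQSDPSV
"""
for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```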
Q5. What is UniRef and why is it useful in large-scale sequence analyses?
- A structural database; useful for 3D visualization of proteins
- A clustering of UniProt sequences at various identity thresholds to reduce redundancy for faster similarity searches
- An expression database for protein abundance measurements in tissues
- A repository of raw sequencing reads for metagenomics
Correct Answer: A clustering of UniProt sequences at various identity thresholds to reduce redundancy for faster similarity searches
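UniRef keeps one representative per cluster at 100%, 90%, or 50% identity. The real clusters are built with dedicated clustering software that uses alignment-based identity and extra overlap criteria. The sketch below only illustrates the underlying greedy "representative plus members" idea. Its position-by-position identity function is a deliberately crude stand-in for a real alignment, and the sequences are invented.

```python
def ungapped_identity(a: str, b: str) -> float:
    """Fraction of identical positions compared without gaps,
    divided by the longer length (toy proxy for alignment identity)."""
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def greedy_cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Greedy clustering: each sequence joins the first representative
    it matches at >= threshold identity, otherwise it becomes a new one.
    Returns {representative_id: [member_ids]}."""
    clusters: dict[str, list[str]] = {}
    # Longest sequences first, so representatives tend to be the longest members.
    for sid in sorted(seqs, key=lambda s: len(seqs[s]), reverse=True):
        for rep in clusters:
            if ungapped_identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            clusters[sid] = [sid]
    return clusters


seqs = {"a": "MKTAYIAKQR", "b": "MKTAYIAKQK", "c": "GGGGGGGGGG"}
for t in (1.0, 0.9):
    print(t, greedy_cluster(seqs, t))
```

Lowering the threshold merges more sequences into each cluster, so the search database shrinks. This is the trade-off behind choosing UniRef100, UniRef90, or UniRef50.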
Q6. Which of the following best describes “evidence codes” in UniProt annotations?
- Numerical scores indicating sequence quality
- Labels indicating the type of supporting evidence for an annotation, such as experimental or computational
- Encryption tags for secure data transfer
- Taxonomic identifiers for species of origin
Correct Answer: Labels indicating the type of supporting evidence for an annotation, such as experimental or computational
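UniProt expresses evidence with codes from the Evidence and Conclusion Ontology (ECO). A common step before target validation is to keep only experimentally supported statements. The sketch below assumes annotations have already been extracted as `(text, ECO code)` pairs. The annotation texts are hypothetical, and only two ECO codes are listed; consult the ECO and UniProt documentation for the full set.

```python
# Two ECO codes used in UniProt annotations (a small subset, for illustration).
ECO_LABELS = {
    "ECO:0000269": "experimental evidence used in manual assertion",
    "ECO:0000256": "match to sequence model evidence used in automatic assertion",
}
EXPERIMENTAL_CODES = {"ECO:0000269"}

# Hypothetical annotations for one protein.
annotations = [
    ("Function: example kinase activity", "ECO:0000269"),
    ("Domain: example predicted domain", "ECO:0000256"),
]


def experimentally_supported(items):
    """Keep only annotation texts backed by an experimental ECO code."""
    return [text for text, eco in items if eco in EXPERIMENTAL_CODES]


for text, eco in annotations:
    print(text, "|", ECO_LABELS.get(eco, "unlisted code"))
print(experimentally_supported(annotations))
```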
Q7. In the context of protein databases, what does “cross-reference” mean?
- The checksum used to verify sequence integrity
- A link or pointer from one database entry to related entries in other resources (e.g., PDB, GO, RefSeq)
- A way to compare two sequences using global alignment
- The process of annotating sequence features manually
Correct Answer: A link or pointer from one database entry to related entries in other resources (e.g., PDB, GO, RefSeq)
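In the UniProt text (flat-file) format, cross-references appear on `DR` lines of the form `DR   DATABASE; IDENTIFIER; optional fields.`. The sketch below collects them into a per-database dictionary. It assumes that line layout, and the entry and identifiers are placeholders, not a real record.

```python
from collections import defaultdict


def parse_dr_lines(flat_text: str) -> dict[str, list[str]]:
    """Collect cross-references from UniProt flat-file 'DR' lines.

    Assumes lines like 'DR   DATABASE; IDENTIFIER; info...; info.'
    and returns {database: [identifier, ...]} in order of appearance.
    """
    xrefs: dict[str, list[str]] = defaultdict(list)
    for line in flat_text.splitlines():
        if not line.startswith("DR   "):
            continue  # only cross-reference lines are of interest
        fields = line[5:].rstrip().rstrip(".").split("; ")
        if len(fields) >= 2:
            xrefs[fields[0]].append(fields[1])
    return dict(xrefs)


# Placeholder entry in the flat-file style (identifiers are illustrative).
entry = """ID   EXAMPLE_HUMAN           Reviewed;         100 AA.
DR   PDB; 1ABC; X-ray; 2.00 A; A=1-100.
DR   RefSeq; NP_000000.1; NM_000000.1.
DR   GO; GO:0005634; C:nucleus; IDA:UniProtKB.
DR   GO; GO:0003677; F:DNA binding; IEA:InterPro.
"""
print(parse_dr_lines(entry))
```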
Q8. Which database would you consult primarily for experimentally determined three-dimensional protein structures?
- UniProtKB
- PDB (Protein Data Bank)
- RefSeq
- KEGG
Correct Answer: PDB (Protein Data Bank)
Q9. What is the main purpose of UniParc?
- A curated reference of enzymatic reactions
- A non-redundant archive that stores unique protein sequences from many databases to track history and source cross-references
- A genome assembly repository for microorganisms
- A tool for multiple sequence alignment visualization
Correct Answer: A non-redundant archive that stores unique protein sequences from many databases to track history and source cross-references
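UniParc stores each distinct sequence once and records every source database that carries it. The sketch below illustrates that idea by keying sequences on an MD5 digest of their residues. Using MD5 as the lookup key is an assumption made for illustration, not a description of UniParc's internal pipeline. The source accessions are placeholders.

```python
import hashlib
from collections import defaultdict


def collapse_identical(records):
    """Group (source, accession, sequence) records by exact sequence identity.

    Each unique sequence is keyed by the MD5 hex digest of its upper-cased
    residues; the value lists every (source, accession) pair carrying it.
    """
    archive = defaultdict(list)
    for source, accession, seq in records:
        key = hashlib.md5(seq.upper().encode("ascii")).hexdigest()
        archive[key].append((source, accession))
    return dict(archive)


records = [
    ("UniProtKB", "EXAMPLE1", "MKTAYIAKQR"),
    ("RefSeq", "EXAMPLE2", "MKTAYIAKQR"),
    ("PDB", "EXAMPLE3", "GGGGGGGGGG"),
]
archive = collapse_identical(records)
for key, sources in archive.items():
    print(key[:8], sources)
```

Two records with the same residues collapse to one archive entry that points back to both sources. This is the kind of bookkeeping that lets an archive trace where a sequence has appeared.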
Q10. Which identifier system used by NCBI was deprecated and replaced by accession.version to provide stable tracking?
- GI numbers
- UniProt IDs
- DOIs
- EC numbers
Correct Answer: GI numbers
Q11. Which resource is most appropriate for obtaining functional annotations mapped to controlled vocabularies like molecular function and biological process?
- Gene Ontology (GO) annotations linked via UniProt
- Raw FASTQ files in SRA
- Electron density maps in EMDB
- Taxonomy records in NCBI Taxonomy without functional terms
Correct Answer: Gene Ontology (GO) annotations linked via UniProt
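GO annotations fall into three aspects: molecular function (F), biological process (P), and cellular component (C). Each carries an evidence code; for example, IEA means "Inferred from Electronic Annotation". The sketch below groups one protein's annotations by aspect and can optionally drop IEA terms. The annotation tuples are hypothetical and assume the data was already downloaded and parsed.

```python
from collections import defaultdict

ASPECTS = {"F": "molecular function", "P": "biological process", "C": "cellular component"}

# Hypothetical (GO ID, term name, aspect letter, evidence code) tuples for one protein.
go_annotations = [
    ("GO:0003677", "DNA binding", "F", "IDA"),
    ("GO:0006915", "apoptotic process", "P", "IMP"),
    ("GO:0005634", "nucleus", "C", "IEA"),
]


def group_by_aspect(annotations, drop_electronic=False):
    """Group GO terms by aspect; optionally skip IEA (electronic) annotations."""
    grouped = defaultdict(list)
    for go_id, name, aspect, evidence in annotations:
        if drop_electronic and evidence == "IEA":
            continue
        grouped[ASPECTS[aspect]].append(f"{go_id} {name}")
    return dict(grouped)


print(group_by_aspect(go_annotations))
print(group_by_aspect(go_annotations, drop_electronic=True))
```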
Q12. When submitting a protein sequence to a public database, which metadata elements are most critical for useful annotation in pharmacological research?
- Experimental method, organism/taxonomy, tissue/source, functional evidence, and publication references
- Only the sequence length and molecular weight
- The submitter’s email and laboratory address only
- Raw chromatogram files without contextual information
Correct Answer: Experimental method, organism/taxonomy, tissue/source, functional evidence, and publication references
Q13. What does the “NM_” prefix typically indicate in NCBI RefSeq records?
- A non-coding RNA record
- A curated mRNA (nucleotide) reference sequence
- A mitochondrial genome sequence
- An enzymatic activity annotation
Correct Answer: A curated mRNA (nucleotide) reference sequence
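The two-letter prefix of a RefSeq accession encodes the molecule type. It also shows whether a record is a known record (N*) or a computationally predicted model (X*). The lookup below covers only a subset of NCBI's prefix table, and the accessions used with it are placeholders. The full list is in NCBI's RefSeq documentation.

```python
# Subset of NCBI RefSeq accession prefixes (see NCBI's RefSeq documentation for the full table).
REFSEQ_PREFIXES = {
    "NM_": "mRNA (usually curated)",
    "NR_": "non-coding RNA (usually curated)",
    "NP_": "protein (usually curated)",
    "XM_": "predicted (model) mRNA",
    "XR_": "predicted (model) non-coding RNA",
    "XP_": "predicted (model) protein",
    "NC_": "complete genomic molecule",
    "WP_": "non-redundant protein (mainly prokaryotic)",
}


def refseq_molecule_type(accession: str) -> str:
    """Describe a RefSeq accession from its three-character prefix."""
    return REFSEQ_PREFIXES.get(accession[:3], "unknown or not a RefSeq accession")


for acc in ["NM_000000.1", "XP_000000.1", "P04637"]:
    print(acc, "->", refseq_molecule_type(acc))
```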
Q14. Which of the following statements about sequence versioning is correct?
- Accession numbers change every time the entry is viewed
- Version suffixes (e.g., .1, .2) indicate updates to the sequence, allowing citation of the exact sequence version used
- Versioning is only used in structural databases, not sequence databases
- Version numbers reflect the number of publications citing the entry
Correct Answer: Version suffixes (e.g., .1, .2) indicate updates to the sequence, allowing citation of the exact sequence version used
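For reproducibility, the full `ACCESSION.VERSION` string should be recorded, because two records sharing an accession but differing in version can have different sequences. The small sketch below splits and compares versioned accessions. The function names and the placeholder accession are hypothetical.

```python
def split_accession_version(versioned: str) -> tuple[str, int]:
    """Split 'ACCESSION.VERSION' (e.g. 'NP_000000.2') into ('NP_000000', 2)."""
    accession, sep, version = versioned.rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"expected ACCESSION.VERSION, got {versioned!r}")
    return accession, int(version)


def same_sequence_version(a: str, b: str) -> bool:
    """True only if both strings name the same accession AND the same version."""
    return split_accession_version(a) == split_accession_version(b)


print(split_accession_version("NP_000000.2"))
print(same_sequence_version("NP_000000.1", "NP_000000.2"))
```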
Q15. For homology searches to identify potential off-targets of a drug target protein, which database and clustering level would you typically choose for a balance of sensitivity and speed?
- UniRef100 for fastest searches with maximal redundancy
- UniRef50 for maximum sequence diversity but slower searches
- UniRef90 for a balance between redundancy reduction and retained sensitivity
- UniParc because it stores redundant raw sequences for exhaustive matches
Correct Answer: UniRef90 for a balance between redundancy reduction and retained sensitivity
Q16. What is the role of curated annotation in M.Pharm research when using protein sequence databases?
- Curated annotation is irrelevant; only raw sequences are needed
- Curated annotation improves the reliability of functional inference, target validation, and pathway mapping relevant to pharmacology
- Curated annotation only provides aesthetic labels and has no scientific value
- It slows down computational analysis and should be avoided
Correct Answer: Curated annotation improves the reliability of functional inference, target validation, and pathway mapping relevant to pharmacology
Q17. Which of the following best explains “primary” versus “secondary” sequence databases?
- Primary databases store experimentally-determined sequences submitted directly by researchers; secondary databases provide value-added annotations, curation, and cross-links
- Primary databases are private while secondary databases are public
- Primary databases contain only nucleotide data, secondary contain only protein data
- There is no difference; both terms mean the same thing
Correct Answer: Primary databases store experimentally-determined sequences submitted directly by researchers; secondary databases provide value-added annotations, curation, and cross-links
Q18. Why are taxonomic identifiers (taxIDs) important in protein sequence databases for drug research?
- They are used to generate sequence quality scores
- TaxIDs link sequences to specific organisms, which is critical for host-pathogen studies, species-specific pharmacology, and avoiding cross-species annotation errors
- They encrypt the sequence to prevent misuse
- TaxIDs determine the color codes used in sequence visualizers
Correct Answer: TaxIDs link sequences to specific organisms, which is critical for host-pathogen studies, species-specific pharmacology, and avoiding cross-species annotation errors
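UniProt FASTA headers carry the NCBI taxonomy identifier in an `OX=` field, for example 9606 for Homo sapiens and 10090 for Mus musculus. The sketch below extracts it and filters a list of headers to one species, which can help avoid mixing orthologs from different species in a target analysis. The headers are shortened, illustrative examples, and the second accession is a placeholder.

```python
import re
from typing import Optional

OX_FIELD = re.compile(r"\bOX=(\d+)")


def taxid_from_header(header: str) -> Optional[int]:
    """Return the NCBI taxonomy ID from a UniProt-style 'OX=' field, if present."""
    match = OX_FIELD.search(header)
    return int(match.group(1)) if match else None


def keep_taxid(headers, taxid: int):
    """Keep only headers whose OX= taxonomy ID equals taxid."""
    return [h for h in headers if taxid_from_header(h) == taxid]


headers = [
    ">sp|P04637|P53_HUMAN Cellular tumor antigen p53 OS=Homo sapiens OX=9606",
    ">sp|EXMPL1|EXAMPLE_MOUSE Example protein OS=Mus musculus OX=10090",
]
print(keep_taxid(headers, 9606))
```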
Q19. Which database or tool integrates sequence data with metabolic and signaling pathway information useful for drug target contextualization?
- KEGG (Kyoto Encyclopedia of Genes and Genomes)
- EMDB (Electron Microscopy Data Bank)
- UniParc
- SRA (Sequence Read Archive)
Correct Answer: KEGG (Kyoto Encyclopedia of Genes and Genomes)
Q20. In the context of sequence database searches, what is the primary advantage of using an annotated reference like RefSeq or Swiss-Prot over raw GenBank entries?
- Annotated references typically include curated features, consistent identifiers, and reduced redundancy, improving accuracy of functional inferences and reproducibility in pharmacological studies
- GenBank entries are always experimentally validated whereas RefSeq is not
- Annotated references do not support BLAST searches
- There is no practical difference; both provide identical annotations
Correct Answer: Annotated references typically include curated features, consistent identifiers, and reduced redundancy, improving accuracy of functional inferences and reproducibility in pharmacological studies

I am a Registered Pharmacist under the Pharmacy Act, 1948, and the founder of PharmacyFreak.com. I hold a Bachelor of Pharmacy degree from Rungta College of Pharmaceutical Science and Research. With a strong academic foundation and practical knowledge, I am committed to providing accurate, easy-to-understand content to support pharmacy students and professionals. My aim is to make complex pharmaceutical concepts accessible and useful for real-world application.
Mail: Sachin@pharmacyfreak.com

