Chemical, biochemical and pharmaceutical databases are essential tools for B.Pharm students studying drug discovery, medicinal chemistry, pharmacology and regulatory affairs. This introduction covers core keywords like PubChem, ChEMBL, DrugBank, PDB, UniProt, cheminformatics, bioinformatics, SMILES, InChIKey, fingerprints, ADMET and database curation. Understanding database search strategies, structure and substructure search, bioassay data, pharmacokinetic properties and data formats (SDF, CSV) helps in literature mining, lead identification and formulation research. These MCQs with answers will strengthen your practical skills in querying chemical and biochemical repositories, interpreting database records, and applying database-derived information to pharmaceutical research. Now let’s test your knowledge with 30 MCQs on this topic.
Q1. Which public database primarily provides chemical structures, biological activities and bioassay results aggregated from multiple sources?
- DrugBank
- PubChem
- UniProt
- Protein Data Bank
Correct Answer: PubChem
Q2. What identifier is a compact, hashed representation commonly used to index chemical structures in search engines?
- SMILES
- InChI
- InChIKey
- CAS Registry Number
Correct Answer: InChIKey
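To illustrate Q2: a standard InChIKey is a fixed-length, 27-character string made of a 14-letter block hashed from the connectivity layer, a 10-letter block (eight hashed letters plus a flag letter and a version letter), and a final protonation character. The sketch below only checks that layout. The function names `looks_like_inchikey` and `connectivity_block` are hypothetical, and passing the check does not prove that the key belongs to a real structure.

```python
import re

# Layout of an InChIKey: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def looks_like_inchikey(text: str) -> bool:
    """Return True if text has the 27-character InChIKey layout."""
    return bool(INCHIKEY_PATTERN.match(text.strip()))


def connectivity_block(inchikey: str) -> str:
    """Return the first 14-letter block, which is hashed from connectivity."""
    return inchikey.strip().split("-")[0]


aspirin_key = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"   # InChIKey of aspirin
aspirin_smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"   # a SMILES string, not an InChIKey

print(looks_like_inchikey(aspirin_key))
print(looks_like_inchikey(aspirin_smiles))
print(connectivity_block(aspirin_key))
```

Because the key has a fixed length and uses only uppercase letters and hyphens, it works well as an exact-match index term, which is why search engines and databases favour it over the longer InChI string.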
Q3. Which database specializes in drug-centred data for pharmaceutical research, including drug approvals, targets, mechanisms and chemical structures?
- ChEMBL
- PubChem
- DrugBank
- KEGG
Correct Answer: DrugBank
Q4. In cheminformatics, which fingerprint type is widely used for similarity searching and machine learning models?
- ECFP (circular fingerprints)
- RMSD fingerprint
- NMR fingerprint
- FASTA fingerprint
Correct Answer: ECFP (circular fingerprints)
Q5. Which file format is commonly used to store one or more 2D/3D chemical structures and associated properties for database exchange?
- FASTA
- SDF (Structure-Data File)
- PDBML
- CSV only
Correct Answer: SDF (Structure-Data File)
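To make Q5 concrete: in an SDF, each record holds a molfile block followed by data items introduced by a `> <FIELD>` header line, and each record ends with a `$$$$` line. The sketch below reads only the title line and the data items, using the standard library. The function names and the two-record sample are illustrative assumptions. For real work a cheminformatics toolkit would normally parse the structure block as well.

```python
def _parse_record(lines: list[str]) -> dict:
    """Collect the title line and the '> <FIELD>' data items of one record."""
    record = {"title": lines[0].strip() if lines else "", "properties": {}}
    field, values = None, []
    for line in lines:
        if line.startswith(">"):
            # Header such as "> <FORMULA>": keep the text between < and >.
            field = line[line.find("<") + 1:line.rfind(">")]
            values = []
        elif field is not None:
            if line.strip() == "":
                # A blank line closes the current data item.
                record["properties"][field] = "\n".join(values)
                field = None
            else:
                values.append(line)
    if field is not None:  # data item not followed by a blank line
        record["properties"][field] = "\n".join(values)
    return record


def parse_sdf_properties(sdf_text: str) -> list[dict]:
    """Split SDF text on '$$$$' delimiter lines and parse each record."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    return records


# Illustrative two-record sample (single-atom molfile blocks).
SAMPLE_SDF = "\n".join([
    "methane", "  sketch", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END", "> <NAME>", "methane", "", "> <FORMULA>", "CH4", "", "$$$$",
    "water", "  sketch", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END", "> <NAME>", "water", "", "> <FORMULA>", "H2O", "", "$$$$",
])

for rec in parse_sdf_properties(SAMPLE_SDF):
    print(rec["title"], rec["properties"])
```

Keeping structure and properties together in one record is what distinguishes SDF from a plain CSV export, where the structure has to be squeezed into a single text column.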
Q6. Which database contains experimentally determined 3D structures of proteins, nucleic acids and complexes useful for structure-based drug design?
- UniProt
- PDB (Protein Data Bank)
- ChEMBL
- PubChem BioAssay
Correct Answer: PDB (Protein Data Bank)
Q7. What does ADMET stand for in pharmaceutical databases and drug development contexts?
- Absorption, Distribution, Metabolism, Excretion, Toxicity
- Active Drug Metabolism and Efficacy Testing
- Analytical Data, Methods, Equipment, Testing
- Accessibility, Delivery, Manufacturing, Evaluation, Transfer
Correct Answer: Absorption, Distribution, Metabolism, Excretion, Toxicity
Q8. Which database is most appropriate for retrieving curated bioactivity data (IC50, Ki) linked to targets and assay descriptions for QSAR studies?
- ChEMBL
- PubMed
- GenBank
- ClinicalTrials.gov
Correct Answer: ChEMBL
Q9. What is the principal difference between SMILES and canonical SMILES?
- SMILES includes 3D coordinates, canonical SMILES does not
- Canonical SMILES is a unique SMILES representation for a structure
- SMILES is used only for proteins, canonical SMILES for small molecules
- There is no difference; both are identical by definition
Correct Answer: Canonical SMILES is a unique SMILES representation for a structure
Q10. Which identifier uniquely identifies a protein entry in the UniProt database?
- PDB ID
- UniProt accession number
- PubChem CID
- CAS RN
Correct Answer: UniProt accession number
Q11. Which search type finds all molecules that contain a given fragment as part of their structure?
- Exact match search
- Substructure search
- Similarity search
- Text search
Correct Answer: Substructure search
Q12. Which public resource links genes, proteins and small molecules into biochemical pathways useful for pharmacology studies?
- KEGG
- DrugBank only
- CAS
- PubChem Compound Summary
Correct Answer: KEGG
Q13. Which metric is commonly used to quantify chemical similarity between two fingerprint vectors?
- Euclidean distance
- Tanimoto coefficient
- Pearson correlation
- Root mean square deviation (RMSD)
Correct Answer: Tanimoto coefficient
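For Q13, the Tanimoto coefficient of two binary fingerprints is the number of bits set in both, divided by the number of bits set in either: T = c / (a + b - c), where a and b are the on-bit counts of each fingerprint and c is the count of shared on-bits. The sketch below represents each fingerprint as a set of on-bit positions. The bit positions are made up for illustration, and returning 0.0 for two empty fingerprints is a convention assumed here.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not bits_a and not bits_b:
        return 0.0  # assumed convention for two empty fingerprints
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


# Hypothetical on-bit positions for two molecules.
fp_query = {1, 4, 7, 9}
fp_hit = {1, 4, 8, 9, 12}

print(tanimoto(fp_query, fp_hit))    # 3 shared bits / 6 bits in the union = 0.5
print(tanimoto(fp_query, fp_query))  # identical fingerprints give 1.0
```

The value ranges from 0 (no shared bits) to 1 (identical fingerprints), so a similarity search typically keeps hits above a chosen threshold.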
Q14. When retrieving compound records, which field often provides the authoritative registry number used in publications and regulatory documents?
- PubChem CID
- InChIKey
- CAS Registry Number
- SMILES string
Correct Answer: CAS Registry Number
Q15. Which database would you use to find clinical trial information, status and outcomes for investigational drugs?
- ClinicalTrials.gov
- ChEMBL
- PubChem
- Protein Data Bank
Correct Answer: ClinicalTrials.gov
Q16. What is a major curation-related challenge in chemical databases that can affect data quality?
- Too many 3D structures
- Stereochemistry and tautomer standardization
- Excessive use of fingerprints
- Absence of SMILES strings
Correct Answer: Stereochemistry and tautomer standardization
Q17. Which database is primarily designed for enzyme information including kinetics, substrates and inhibitors?
- BRENDA
- DrugBank
- PubChem Compound
- UniProt
Correct Answer: BRENDA
Q18. What does a PubChem CID refer to?
- Clinical identification code for trials
- Compound Identifier assigned by PubChem
- Canonical InChI string
- A type of chemical fingerprint
Correct Answer: Compound Identifier assigned by PubChem
Q19. Which database would you consult to investigate drug–target interactions and mechanism of action with literature links?
- DrugBank
- FASTA
- EMBL
- PDB only
Correct Answer: DrugBank
Q20. In chemical registries, why are canonical identifiers (InChI/InChIKey) preferred for data integration?
- They encode bioactivity data directly
- They provide a standardized, machine-readable representation for structure matching
- They are human-readable chemical names
- They replace the need for any other metadata
Correct Answer: They provide a standardized, machine-readable representation for structure matching
Q21. Which databases house spectral reference data (NMR, MS) useful for compound identification?
- NMRShiftDB / MassBank
- ClinicalTrials.gov
- UniProt
- KEGG Pathway
Correct Answer: NMRShiftDB / MassBank
Q22. Which search approach finds molecules with similar overall structural features rather than exact substructure matches?
- Exact structure search
- Similarity search using fingerprints
- Text-based search only
- Elemental composition search
Correct Answer: Similarity search using fingerprints
Q23. What is the role of cross-references in major pharmaceutical databases?
- They restrict access to paid users
- They link a record to related entries across databases for integrated information
- They convert SMILES to InChI automatically
- They duplicate entries within the same database
Correct Answer: They link a record to related entries across databases for integrated information
Q24. Which database is a primary public repository for DNA and RNA sequence data useful for pharmacogenomics?
- GenBank
- PubChem Compound
- DrugBank
- PDB
Correct Answer: GenBank
Q25. When performing virtual screening, which database feature reduces false positives by filtering compounds by molecular weight, logP, and hydrogen bond counts?
- Visualization tools
- Property filters or ADME filters
- Text-mining engines
- Sequence alignment
Correct Answer: Property filters or ADME filters
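Q25 does not name a specific filter. One widely used example is Lipinski's rule of five, which flags compounds with molecular weight above 500, logP above 5, more than 5 hydrogen bond donors or more than 10 hydrogen bond acceptors. The sketch below applies those limits to a small in-memory library. The compound IDs and property values are hypothetical, and the `max_violations` parameter is an assumption that lets a workflow tolerate a chosen number of violations.

```python
# Upper limits from Lipinski's rule of five.
RULE_OF_FIVE_LIMITS = {
    "mol_weight": 500.0,
    "logp": 5.0,
    "h_bond_donors": 5,
    "h_bond_acceptors": 10,
}


def count_violations(compound: dict) -> int:
    """Count how many rule-of-five limits the compound exceeds."""
    return sum(1 for key, limit in RULE_OF_FIVE_LIMITS.items()
               if compound[key] > limit)


def passes_filter(compound: dict, max_violations: int = 0) -> bool:
    """Keep a compound if it exceeds at most max_violations limits."""
    return count_violations(compound) <= max_violations


# Hypothetical screening library; values are illustrative, not measured.
library = [
    {"id": "cmpd-A", "mol_weight": 320.4, "logp": 2.1,
     "h_bond_donors": 2, "h_bond_acceptors": 5},
    {"id": "cmpd-B", "mol_weight": 712.9, "logp": 6.3,
     "h_bond_donors": 4, "h_bond_acceptors": 12},
    {"id": "cmpd-C", "mol_weight": 455.0, "logp": 5.4,
     "h_bond_donors": 3, "h_bond_acceptors": 7},
]

strict = [c["id"] for c in library if passes_filter(c)]
lenient = [c["id"] for c in library if passes_filter(c, max_violations=1)]
print(strict)   # only cmpd-A exceeds no limit
print(lenient)  # cmpd-C exceeds only the logP limit, so it is kept here too
```

Applying such a filter before docking or similarity ranking removes compounds that are unlikely to be developable, which is how it cuts down false positives in virtual screening.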
Q26. Which line notation encodes 2D connectivity as a linear text string used for quick structure searches?
- InChIKey
- SMILES
- PDB code
- UniProt ID
Correct Answer: SMILES
Q27. Which repository would you use to deposit and retrieve macromolecular structural models and related experimental data?
- PubChem Substance
- Protein Data Bank (PDB)
- ChEMBL compound library
- ClinicalTrials.gov
Correct Answer: Protein Data Bank (PDB)
Q28. What is the significance of assay metadata (assay type, conditions, endpoint) in bioactivity databases?
- It is only needed for visualization
- It determines the biological relevance and comparability of activity data
- It is irrelevant for QSAR modelling
- It increases file size without benefit
Correct Answer: It determines the biological relevance and comparability of activity data
Q29. Which databases are good starting points for pharmacists to find marketed drug formulations, patents and regulatory status?
- DrugBank and FDA Orange Book
- UniProt and PDB
- ChEMBL only
- MassBank only
Correct Answer: DrugBank and FDA Orange Book
Q30. For integrating chemical and biological data into cheminformatics workflows, which practice improves reproducibility and interoperability?
- Using proprietary, undocumented file formats
- Annotating records with standardized identifiers (InChIKey, UniProt ID, PubChem CID)
- Removing all metadata to reduce complexity
- Mixing multiple naming conventions without cross-reference
Correct Answer: Annotating records with standardized identifiers (InChIKey, UniProt ID, PubChem CID)
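The point of Q30 can be sketched in a few lines: when every record carries an InChIKey, data from sources that use different names for the same compound can be joined on that key. The helper `merge_by_inchikey` and the two source lists are hypothetical. The InChIKeys and PubChem CIDs shown are those of aspirin and caffeine.

```python
def merge_by_inchikey(*sources: list[dict]) -> dict[str, dict]:
    """Merge records from several sources, using the InChIKey as the join key."""
    merged: dict[str, dict] = {}
    for source in sources:
        for record in source:
            key = record["inchikey"]
            entry = merged.setdefault(key, {"inchikey": key, "names": []})
            name = record.get("name")
            if name and name not in entry["names"]:
                entry["names"].append(name)
            # Copy any other annotation, keeping the first value seen.
            for field, value in record.items():
                if field not in ("inchikey", "name"):
                    entry.setdefault(field, value)
    return merged


# Hypothetical in-house list that uses trivial names.
lab_records = [
    {"name": "aspirin", "inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"},
    {"name": "caffeine", "inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA-N"},
]
# Hypothetical export from a public database that uses other names.
public_records = [
    {"name": "acetylsalicylic acid",
     "inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "pubchem_cid": 2244},
    {"name": "1,3,7-trimethylxanthine",
     "inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA-N", "pubchem_cid": 2519},
]

merged = merge_by_inchikey(lab_records, public_records)
for entry in merged.values():
    print(entry)
```

Matching on names alone would have treated "aspirin" and "acetylsalicylic acid" as different compounds, whereas the shared identifier links them without any manual curation.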

