Artificial Intelligence in Pharma MCQs With Answers
This quiz collection is tailored for M.Pharm students studying Computer Aided Drug Development (MPH 203T). It focuses on how artificial intelligence (AI) techniques are applied across drug discovery and development, from virtual screening and QSAR modeling to ADMET prediction, de novo molecule generation, and synthesis planning. Questions emphasize practical understanding of algorithms (machine learning, deep learning, graph neural networks), data representation (SMILES, molecular graphs, fingerprints), model evaluation, and regulatory and ethical challenges. Each multiple-choice item tests both conceptual depth and application-level reasoning to prepare students for advanced coursework, research, and industry roles where AI accelerates pharmaceutical innovation.
Q1. What is the primary advantage of graph neural networks (GNNs) over traditional fingerprint-based methods in molecular property prediction?
- GNNs require far less training data than fingerprint methods
- GNNs can learn task-specific representations directly from molecular graph structure
- Fingerprint methods inherently capture 3D conformations better than GNNs
- GNNs always produce interpretable mechanistic rules for activity
Correct Answer: GNNs can learn task-specific representations directly from molecular graph structure
Q2. In de novo molecular design, which generative model is best known for producing continuous latent representations that can be sampled to generate novel compounds?
- Random forest
- Variational autoencoder (VAE)
- Support vector machine (SVM)
- K-means clustering
Correct Answer: Variational autoencoder (VAE)
Q3. Which molecular representation encodes molecules as linear text strings commonly used with sequence-based deep learning models?
- 3D atomic coordinates (PDB)
- Simplified Molecular Input Line Entry System (SMILES)
- Molecular orbital coefficients
- Extended-connectivity fingerprints (ECFP)
Correct Answer: Simplified Molecular Input Line Entry System (SMILES)
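Illustration for Q3: sequence models usually read a SMILES string as a list of tokens, not raw characters, so that multi-character atoms stay intact. The sketch below is a minimal tokenizer under stated assumptions: the regular expression and the function name `tokenize_smiles` are illustrative choices, not a standard API, and the pattern only handles bracket atoms, the two-letter halogens Cl and Br, and two-digit ring closures.

```python
import re

# Illustrative pattern (an assumption, not a standard): bracket atoms first,
# then two-letter halogens, then two-digit ring closures, then any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")

def tokenize_smiles(smiles):
    """Split a SMILES string into tokens suitable for a sequence model."""
    return SMILES_TOKEN.findall(smiles)

aspirin = "CC(=O)Oc1ccccc1C(=O)O"
tokens = tokenize_smiles(aspirin)

# Bracket atoms and 'Cl' are kept as single tokens.
example = tokenize_smiles("C[NH3+]Cl")
```

Joining the tokens back together reproduces the input string, which is a quick sanity check that nothing was dropped.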
Q4. Which of the following metrics is most appropriate for evaluating a binary classifier on an imbalanced dataset where the positive class is rare, such as predicting toxic compounds?
- Accuracy
- ROC AUC (Area Under the Receiver Operating Characteristic curve)
- Mean squared error (MSE)
- Silhouette score
Correct Answer: ROC AUC (Area Under the Receiver Operating Characteristic curve)
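Illustration for Q4: the sketch below shows why accuracy can mislead when positives are rare. It uses a made-up dataset of 5 toxic and 95 non-toxic compounds and a useless model that scores every compound identically. The helper names `accuracy` and `roc_auc` are hypothetical, and ROC AUC is computed directly from its rank interpretation (the chance that a random positive is scored above a random negative).

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical data: 5 toxic (1) and 95 non-toxic (0) compounds.
y_true = [1] * 5 + [0] * 95

# A useless model: it calls everything non-toxic and gives every compound the same score.
predicted_labels = [0] * 100
predicted_scores = [0.0] * 100

acc = accuracy(y_true, predicted_labels)   # high, despite finding no toxic compound
auc = roc_auc(y_true, predicted_scores)    # chance level
```

Here the accuracy is 0.95 while the ROC AUC is 0.5, so only the second number reveals that the model cannot separate the classes.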
Q5. Which AI approach is particularly useful for optimizing multi-parameter objectives (e.g., potency, selectivity, ADMET) in compound design?
- Single-target QSAR
- Multi-objective reinforcement learning
- Principal component analysis (PCA)
- Hierarchical clustering
Correct Answer: Multi-objective reinforcement learning
Q6. In virtual screening, what is a key limitation of docking scores that AI-based rescoring methods aim to address?
- Docking always provides exact binding free energies
- Docking scores often poorly correlate with experimental affinities due to simplified scoring functions
- Docking cannot generate ligand conformations
- Docking is immune to protein flexibility
Correct Answer: Docking scores often poorly correlate with experimental affinities due to simplified scoring functions
Q7. Which technique helps reduce overfitting in deep learning models trained on limited molecular datasets?
- Increasing model depth without regularization
- Data augmentation, transfer learning, and dropout
- Using unnormalized noisy labels
- Removing validation sets
Correct Answer: Data augmentation, transfer learning, and dropout
Q8. What does the term ‘transfer learning’ mean in the context of AI for drug discovery?
- Training a model from scratch for every new bioassay
- Using model weights pre-trained on a large dataset and fine-tuning on a smaller task-specific dataset
- Transferring compounds between companies
- Converting 3D structures to SMILES strings
Correct Answer: Using model weights pre-trained on a large dataset and fine-tuning on a smaller task-specific dataset
Q9. Which explainability method provides feature importance values for model predictions and is commonly applied to chemoinformatics models?
- PCA decomposition
- SHAP (Shapley Additive Explanations)
- t-SNE visualization
- Ensemble averaging
Correct Answer: SHAP (Shapley Additive Explanations)
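Illustration for Q9: SHAP attributions are based on Shapley values, which average each feature's marginal contribution over all orderings in which features can be "switched on" from a baseline. The sketch below computes exact Shapley values by brute force for a tiny model. It is not the SHAP library's API: `shapley_values`, the toy linear `predict` function, and the all-zero baseline are assumptions for illustration, and the enumeration is only practical for a handful of features.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: mean marginal contribution of each feature over all orderings."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)          # start from the baseline input
        previous = predict(current)
        for i in order:
            current[i] = instance[i]      # switch feature i to its actual value
            value = predict(current)
            totals[i] += value - previous # marginal contribution of feature i
            previous = value
    return [t / len(orderings) for t in totals]

# Hypothetical model on three descriptors (coefficients are made up).
def predict(x):
    return 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2]

instance = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(predict, instance, baseline)
```

For this linear model each value equals the coefficient times the feature's shift from the baseline, and the values sum to the difference between the prediction for the instance and the prediction for the baseline.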
Q10. In ADMET prediction, which endpoint is typically predicted using regression models rather than classification?
- Acute toxicity (binary toxic/non-toxic)
- Blood-brain barrier permeability as a continuous logBB value
- Presence or absence of a specific metabolite
- Compound patentability
Correct Answer: Blood-brain barrier permeability as a continuous logBB value
Q11. What is the main advantage of federated learning for pharmaceutical companies collaborating on model training?
- It eliminates the need for model validation
- It allows shared model training without exchanging raw proprietary data
- It guarantees identical data distributions across partners
- It reduces model complexity to linear models only
Correct Answer: It allows shared model training without exchanging raw proprietary data
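Illustration for Q11: one common aggregation scheme in federated learning is a weighted average of the partners' model parameters, where only parameters and dataset sizes leave each site. The sketch below assumes each partner has already trained locally and reports a flat list of weights; the function name `federated_average` and all numbers are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average model parameters across partners, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Hypothetical locally trained parameters from two partners (raw data never shared).
partner_weights = [[0.2, 1.0], [0.4, 0.0]]
partner_sizes = [300, 100]  # number of local training compounds

global_weights = federated_average(partner_weights, partner_sizes)
```

With these numbers the shared model ends up at roughly [0.25, 0.75], pulled toward the partner with more data.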
Q12. Which of the following best describes ‘active learning’ in the context of experimental planning for drug discovery?
- Randomly selecting compounds to test in the lab
- Selecting the most informative compounds for experimental testing based on model uncertainty
- Using only historical data and avoiding new experiments
- Clustering compounds and testing only cluster centroids
Correct Answer: Selecting the most informative compounds for experimental testing based on model uncertainty
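Illustration for Q12: a simple form of active learning is uncertainty sampling, where the next compounds to test are those the current classifier is least sure about. The sketch below assumes a model has already produced a probability of activity for each untested compound; the compound IDs, probabilities, and the function name `select_most_uncertain` are hypothetical.

```python
def select_most_uncertain(predicted_probs, batch_size):
    """Return the compound IDs whose predicted probability is closest to 0.5."""
    ranked = sorted(predicted_probs, key=lambda cid: abs(predicted_probs[cid] - 0.5))
    return ranked[:batch_size]

# Hypothetical model outputs: probability that each untested compound is active.
predicted_probs = {
    "cmpd_A": 0.97,
    "cmpd_B": 0.52,
    "cmpd_C": 0.10,
    "cmpd_D": 0.41,
}

to_test = select_most_uncertain(predicted_probs, batch_size=2)
```

The confident predictions (cmpd_A, cmpd_C) are skipped, and the two borderline compounds are sent to the lab because their results should teach the model the most.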
Q13. When evaluating regression models for pIC50 prediction, which metric summarizes the typical magnitude of prediction errors on the same scale as the predicted quantity?
- Area under precision-recall curve
- Root mean squared error (RMSE)
- Cohen’s kappa
- Adjusted Rand index
Correct Answer: Root mean squared error (RMSE)
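Illustration for Q13: RMSE is the square root of the mean squared difference between measured and predicted values, so it is expressed on the same scale as the target. The sketch below uses made-up pIC50 values; the function name `rmse` is a hypothetical helper.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical measured and predicted pIC50 values.
measured = [6.0, 7.0, 8.0, 5.0]
predicted = [6.5, 6.5, 8.5, 4.5]

error = rmse(measured, predicted)
```

Every prediction here is off by 0.5 log units, so the RMSE is also 0.5. Because errors are squared before averaging, a single large miss would raise RMSE more than several small ones.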
Q14. Which AI method is commonly used for retrosynthetic route planning and predicting synthesis steps?
- Convolutional neural networks on images only
- Sequence-to-sequence models or reinforcement learning for reaction prediction and planning
- Unsupervised clustering of spectra
- Principal component analysis of reagents
Correct Answer: Sequence-to-sequence models or reinforcement learning for reaction prediction and planning
Q15. In QSAR modeling, what is the danger of using highly correlated descriptors without feature selection?
- Improved model generalizability
- Multicollinearity leading to unstable coefficient estimates and overfitting
- Guaranteed better interpretation of mechanistic causality
- Reduced computational cost
Correct Answer: Multicollinearity leading to unstable coefficient estimates and overfitting
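Illustration for Q15: one simple feature-selection step is to drop any descriptor that is almost perfectly correlated with one already kept. The sketch below is a greedy filter under stated assumptions: the 0.95 threshold, the function names, and the descriptor values are illustrative, and it assumes no descriptor is constant (a constant column would make the correlation undefined).

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def drop_correlated(descriptors, threshold=0.95):
    """Keep a descriptor only if |r| with every already-kept descriptor is below threshold."""
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical descriptor table for four compounds.
descriptors = {
    "mol_weight": [180.2, 206.3, 151.2, 194.2],
    "heavy_atoms": [13, 15, 11, 14],
    "logP": [1.2, 3.5, 0.5, -0.1],
}

kept = drop_correlated(descriptors)
```

Heavy-atom count tracks molecular weight almost exactly in this toy table, so it is removed, while logP carries different information and is retained.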
Q16. Which statement about generative adversarial networks (GANs) in molecule generation is correct?
- GANs provide an explicit likelihood and easy posterior inference for molecules
- GANs consist of a generator and discriminator trained adversarially to produce realistic samples
- GANs always outperform VAEs on molecular novelty and validity metrics
- GANs do not require any hyperparameter tuning
Correct Answer: GANs consist of a generator and discriminator trained adversarially to produce realistic samples
Q17. Which preclinical safety endpoint is often predicted using in silico models incorporating structural alerts and machine learning?
- Market exclusivity duration
- hERG channel inhibition risk (cardiotoxicity)
- Exact human therapeutic dose
- Packaging requirements
Correct Answer: hERG channel inhibition risk (cardiotoxicity)
Q18. What is a key regulatory consideration when deploying AI models in drug development workflows?
- Models must always be black-box to protect intellectual property
- Documentation of training data provenance, performance, and validation is required for transparency and reproducibility
- Regulators never require evidence for in silico predictions
- Only the most complex models are acceptable to regulators
Correct Answer: Documentation of training data provenance, performance, and validation is required for transparency and reproducibility
Q19. Which data curation step is most critical before training AI models on assay data?
- Ignoring duplicate records to speed up training
- Standardizing chemical structures, removing duplicates, and reconciling inconsistent activity units
- Randomly shuffling SMILES strings without validation
- Converting all activities to categorical labels without retaining units
Correct Answer: Standardizing chemical structures, removing duplicates, and reconciling inconsistent activity units
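Illustration for Q19: the sketch below shows two of the curation steps named in the answer, reconciling activity units and merging duplicate records. It assumes structures have already been standardized upstream so that identical compounds share the same key (for example, a canonical SMILES string). The unit table, the function names, and the records are hypothetical, and averaging replicates is only one possible merging policy.

```python
import math
from collections import defaultdict

# Conversion factors to nanomolar (assumed unit spellings).
TO_NANOMOLAR = {"nM": 1.0, "uM": 1e3, "mM": 1e6, "M": 1e9}

def to_pic50(value, unit):
    """Convert an IC50 in the given unit to pIC50 = -log10(IC50 in mol/L)."""
    nanomolar = value * TO_NANOMOLAR[unit]
    return 9.0 - math.log10(nanomolar)

def curate(records):
    """Merge duplicate structures by averaging their pIC50 values."""
    grouped = defaultdict(list)
    for structure_key, value, unit in records:
        grouped[structure_key].append(to_pic50(value, unit))
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}

# Hypothetical assay records: (standardized structure key, IC50 value, unit).
records = [
    ("CCO", 1.0, "uM"),       # same compound reported in micromolar...
    ("CCO", 1000.0, "nM"),    # ...and again in nanomolar
    ("c1ccccc1", 10.0, "nM"),
]

curated = curate(records)
```

After conversion, the two records for the first structure agree (pIC50 of about 6) and collapse into one entry, which would not happen if the raw numbers 1.0 and 1000.0 were compared without their units.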
Q20. For discovery of novel scaffolds, which strategy leverages AI to explore chemical space efficiently?
- Exhaustive high-throughput screening of all purchasable compounds only
- Generative models combined with property filters and active learning to propose and prioritize novel scaffolds
- Relying solely on medicinal chemists to enumerate all possibilities manually
- Using only 2D similarity searches against a single known active
Correct Answer: Generative models combined with property filters and active learning to propose and prioritize novel scaffolds

I am a Registered Pharmacist under the Pharmacy Act, 1948, and the founder of PharmacyFreak.com. I hold a Bachelor of Pharmacy degree from Rungta College of Pharmaceutical Science and Research. With a strong academic foundation and practical knowledge, I am committed to providing accurate, easy-to-understand content to support pharmacy students and professionals. My aim is to make complex pharmaceutical concepts accessible and useful for real-world application.
Email: Sachin@pharmacyfreak.com

