Quantitative Structure-Activity Relationship (QSAR) modelling is a key computational technique in modern drug design that links chemical structure to biological activity quantitatively, using molecular descriptors, statistical models, and validation strategies. B. Pharm students should master descriptor types (constitutional, topological, electronic, hydrophobic), regression and machine-learning approaches, model validation (R2, Q2, RMSE, external test sets), the applicability domain, pharmacophore concepts, and ADME/toxicity prediction. Understanding descriptor selection, data preprocessing, overfitting, and ethical model use builds practical skills for research and industry. This focused MCQ set emphasizes theory, calculations, interpretation, and critical evaluation for exam readiness. Now let’s test your knowledge with 30 MCQs on this topic.
Q1. What is the primary purpose of QSAR?
- To determine the crystal structure of a drug
- To relate chemical structure quantitatively to biological activity
- To perform clinical trials
- To synthesize new chemical compounds
Correct Answer: To relate chemical structure quantitatively to biological activity
Q2. Which of the following is a common molecular descriptor representing lipophilicity?
- Topological polar surface area (TPSA)
- LogP
- Number of rotatable bonds
- Molecular weight
Correct Answer: LogP
Q3. Which descriptor type captures 2D connectivity and graph-based information?
- Geometrical descriptors
- Topological descriptors
- Quantum chemical descriptors
- 3D pharmacophore descriptors
Correct Answer: Topological descriptors
Q4. pIC50 is defined as which of the following?
- Negative log10 of IC50 in molar units
- Log10 of IC50 in micromolar units
- IC50 multiplied by 10
- Inverse of potency expressed in percentage
Correct Answer: Negative log10 of IC50 in molar units
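As a quick worked illustration of this definition, the sketch below converts an IC50 value to pIC50. The function name `pic50_from_nm` and the choice of nanomolar input are assumptions made for the example only.

```python
import math

def pic50_from_nm(ic50_nm: float) -> float:
    """Return pIC50 = -log10(IC50 in mol/L), given IC50 in nanomolar."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    ic50_molar = ic50_nm * 1e-9  # convert nM to M before taking the log
    return -math.log10(ic50_molar)

print(round(pic50_from_nm(10.0), 2))    # 10 nM   -> 8.0
print(round(pic50_from_nm(1000.0), 2))  # 1000 nM -> 6.0
```

A higher pIC50 means a more potent compound, and each unit corresponds to a tenfold change in IC50.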
Q5. Which classical statistical method is most often used for simple QSAR model building?
- Multiple linear regression (MLR)
- Principal component analysis (PCA)
- Hierarchical clustering
- Docking simulation
Correct Answer: Multiple linear regression (MLR)
Q6. Which validation metric represents cross-validated predictive ability (often for internal validation)?
- R-squared (R2)
- Q-squared (Q2)
- Mean molecular weight
- Number of descriptors
Correct Answer: Q-squared (Q2)
Q7. What does the applicability domain of a QSAR model describe?
- The computational cost of the model
- The chemical space where model predictions are considered reliable
- The user interface of the software
- The number of descriptors used
Correct Answer: The chemical space where model predictions are considered reliable
Q8. What is the purpose of Y-randomization (Y-scrambling) in QSAR?
- To increase descriptor dimensionality
- To test whether model correlation is due to chance
- To improve docking accuracy
- To normalize descriptor values
Correct Answer: To test whether model correlation is due to chance
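The idea behind Y-randomization can be sketched in a few lines: shuffle the activity values, recompute the fit statistic, and compare it with the value obtained from the real data. The example below is a minimal sketch with a single descriptor and made-up numbers; the helper names `r_squared` and `y_randomization` are hypothetical, and a real study would refit the full model in every round.

```python
import random
from statistics import mean

def r_squared(x, y):
    """Squared Pearson correlation between one descriptor and activity."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

def y_randomization(x, y, n_rounds=200, seed=0):
    """Return the true r2 and the r2 values obtained with shuffled activities."""
    rng = random.Random(seed)
    scrambled = []
    for _ in range(n_rounds):
        y_perm = y[:]          # copy so the original activities stay intact
        rng.shuffle(y_perm)    # break the structure-activity pairing
        scrambled.append(r_squared(x, y_perm))
    return r_squared(x, y), scrambled

# Made-up logP and pIC50 values, for illustration only.
logp = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
pic50 = [5.1, 5.4, 6.0, 6.2, 6.9, 7.1, 7.6, 8.0]

true_r2, scrambled_r2 = y_randomization(logp, pic50)
print("true r2:", round(true_r2, 3))
print("mean scrambled r2:", round(mean(scrambled_r2), 3))
```

If the scrambled models score nearly as well as the real one, the original correlation is probably a chance result.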
Q9. Which technique is commonly used to reduce multicollinearity among descriptors?
- Leave-one-out cross-validation
- Principal component analysis (PCA)
- Y-randomization
- Hansch analysis
Correct Answer: Principal component analysis (PCA)
Q10. A sign of model overfitting is:
- High external R2 and low training R2
- High training R2 but poor external/test set performance
- Low descriptor count
- Consistent performance on training and test sets
Correct Answer: High training R2 but poor external/test set performance
Q11. Which method is a classic 3D-QSAR technique?
- Hansch analysis
- CoMFA (Comparative Molecular Field Analysis)
- Linear free energy relationship (LFER)
- LogP estimation
Correct Answer: CoMFA (Comparative Molecular Field Analysis)
Q12. Which descriptor is calculated from 3D molecular conformation rather than 2D structure?
- Topological polar surface area (TPSA)
- Number of hydrogen bond donors
- Molecular volume
- Number of heavy atoms
Correct Answer: Molecular volume
Q13. External validation of a QSAR model typically involves:
- Using the same training set for testing
- Testing model performance on an independent test set
- Shuffling descriptor columns
- Removing outliers from prediction set only
Correct Answer: Testing model performance on an independent test set
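A minimal sketch of external validation follows: the model is fitted on the training compounds only and then scored on compounds it has never seen. The data are made up, the one-descriptor model is a simplification, and the predictive R2 here is referenced to the training-set mean, which is one of several conventions in use.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for one descriptor: returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def external_r2(x_train, y_train, x_test, y_test):
    """Predictive R2 on held-out compounds (assumed convention: training mean)."""
    slope, intercept = fit_line(x_train, y_train)
    y_pred = [slope * v + intercept for v in x_test]
    ss_res = sum((obs - pred) ** 2 for obs, pred in zip(y_test, y_pred))
    y_ref = mean(y_train)
    ss_ref = sum((obs - y_ref) ** 2 for obs in y_test)
    return 1 - ss_res / ss_ref

# Made-up logP (descriptor) and pIC50 (activity) values, for illustration only.
x_train = [1.0, 1.5, 2.0, 3.0, 3.5, 4.5]
y_train = [5.1, 5.4, 6.0, 6.9, 7.1, 8.0]
x_test = [2.5, 4.0]   # held out: never used for fitting
y_test = [6.2, 7.6]

r2_ext = external_r2(x_train, y_train, x_test, y_test)
print("external R2:", round(r2_ext, 3))
```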
Q14. Which of the following is NOT a recognized descriptor class?
- Constitutional descriptors
- Topological descriptors
- Electronic descriptors
- Rhetorical descriptors
Correct Answer: Rhetorical descriptors
Q15. Hansch analysis is best described as:
- A technique to predict solubility using 3D fields
- A method correlating biological activity with physicochemical properties using linear free energy relationships
- A clustering method for chemical libraries
- A quantum mechanical calculation of HOMO energies
Correct Answer: A method correlating biological activity with physicochemical properties using linear free energy relationships
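For reference, a commonly quoted general form of the Hansch equation is shown below. The coefficient symbols are generic placeholders fitted by regression, and individual textbooks may group or name the terms differently.

```latex
\log\frac{1}{C} = k_1 \log P - k_2 (\log P)^2 + \rho\,\sigma + \delta\,E_s + k_3
```

Here C is the molar concentration producing a defined biological response, log P measures hydrophobicity, σ is the Hammett electronic substituent constant, and E_s is the Taft steric parameter. The squared log P term allows for an optimum lipophilicity.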
Q16. Why is pKa important in QSAR and drug design?
- It determines the molecular weight
- It influences ionization state, permeability, and ADME properties
- It measures protein binding directly
- It is a type of topological descriptor
Correct Answer: It influences ionization state, permeability, and ADME properties
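The link between pKa and ionization state can be made concrete with the Henderson-Hasselbalch relationship. The sketch below estimates the ionized fraction of a monoprotic acid or base at a given pH; the function name `fraction_ionized` and the example pKa are assumptions for illustration.

```python
def fraction_ionized(pka: float, ph: float, acid: bool = True) -> float:
    """Fraction of a monoprotic compound in its ionized form at a given pH.

    Acid: ionized (A-) fraction = 1 / (1 + 10**(pKa - pH))
    Base: ionized (BH+) fraction = 1 / (1 + 10**(pH - pKa))
    """
    exponent = (pka - ph) if acid else (ph - pka)
    return 1.0 / (1.0 + 10.0 ** exponent)

# Hypothetical weak acid with pKa 4.5 at gastric and plasma pH.
print(round(fraction_ionized(4.5, 1.5), 4))  # mostly un-ionized in the stomach
print(round(fraction_ionized(4.5, 7.4), 4))  # mostly ionized in plasma
```

Because the un-ionized form usually crosses membranes more readily, the same compound can show very different passive permeability at different pH values.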
Q17. Which machine learning algorithm is typically non-linear and ensemble-based?
- Multiple linear regression (MLR)
- Random forest
- Ordinary least squares
- Simple linear regression
Correct Answer: Random forest
Q18. Which statistic is commonly used to detect multicollinearity among descriptors?
- Root mean square error (RMSE)
- Variance inflation factor (VIF)
- Q-squared (Q2)
- Leverage (hat) value
Correct Answer: Variance inflation factor (VIF)
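In general, the VIF of descriptor j is 1 / (1 - Rj^2), where Rj^2 comes from regressing that descriptor on all the others. With only two descriptors, Rj^2 reduces to their squared Pearson correlation, which is the simplified case sketched here with made-up values. The function name `vif_two_descriptors` is hypothetical.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two descriptor columns."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def vif_two_descriptors(x, y):
    """VIF for a pair of descriptors: 1 / (1 - r^2). Two-descriptor case only."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)

# Made-up descriptor values, for illustration only.
mol_weight = [180, 220, 250, 300, 340, 400]
heavy_atoms = [13, 16, 18, 22, 25, 29]   # nearly redundant with molecular weight
logp = [1.2, 3.1, 0.8, 2.5, 1.9, 2.2]

print("VIF (MW vs heavy atoms):", round(vif_two_descriptors(mol_weight, heavy_atoms), 1))
print("VIF (MW vs logP):", round(vif_two_descriptors(mol_weight, logp), 2))
```

A VIF above roughly 5 to 10 is a commonly used rule-of-thumb warning that a descriptor is largely explained by the others.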
Q19. Why is feature scaling (normalization/standardization) important in QSAR modelling?
- To convert chemical names into numbers
- To bring descriptors to comparable scales and prevent dominance by large-valued descriptors
- To define the applicability domain
- To compute pIC50 values
Correct Answer: To bring descriptors to comparable scales and prevent dominance by large-valued descriptors
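One common way to do this is z-score standardization, which gives every descriptor a mean of 0 and a standard deviation of 1. The sketch below uses only the standard library; the descriptor values are made up, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score scaling: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed convention)
    if sigma == 0:
        raise ValueError("constant descriptor: cannot be standardized")
    return [(v - mu) / sigma for v in values]

# Made-up descriptors on very different numeric scales.
mol_weight = [180.2, 250.3, 310.4, 420.5]
logp = [1.2, 2.5, 3.1, 4.0]

mw_scaled = standardize(mol_weight)
logp_scaled = standardize(logp)
print([round(v, 2) for v in mw_scaled])
print([round(v, 2) for v in logp_scaled])
```

In practice the mean and standard deviation should be computed on the training set only and then reused for test compounds, so that no information leaks from the test set.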
Q20. Leave-one-out cross-validation (LOO-CV) means:
- Removing one descriptor from the model
- Using each compound once as the test set while training on the rest
- Leaving one model parameter unspecified
- Running a validation with one external dataset only
Correct Answer: Using each compound once as the test set while training on the rest
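The procedure can be sketched as follows: each compound is left out in turn, the model is refitted on the rest, and the squared prediction errors are summed into PRESS, giving Q2 = 1 - PRESS / SS_total. This is a minimal one-descriptor sketch with made-up data; SS_total is taken around the overall mean activity, which is an assumed (though common) convention.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for one descriptor: returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def loo_q2(x, y):
    """Leave-one-out cross-validated Q2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(x)):
        # Train on every compound except compound i.
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        # Predict the single left-out compound and accumulate its squared error.
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    y_mean = mean(y)
    ss_total = sum((v - y_mean) ** 2 for v in y)
    return 1 - press / ss_total

# Made-up logP and pIC50 values, for illustration only.
logp = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
pic50 = [5.1, 5.4, 6.0, 6.2, 6.9, 7.1, 7.6, 8.0]

q2 = loo_q2(logp, pic50)
print("LOO Q2:", round(q2, 3))
```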
Q21. Which error metric directly expresses the average prediction error in the same units as the activity?
- Q2
- R2
- Root mean square error (RMSE)
- Descriptor count
Correct Answer: Root mean square error (RMSE)
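As a short worked example, RMSE is the square root of the mean squared difference between observed and predicted activities. The values below are made up, and `rmse` is simply an illustrative helper name.

```python
import math

def rmse(observed, predicted):
    """Root mean square error, in the same units as the activity values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up observed and predicted pIC50 values.
observed = [6.0, 7.0, 8.0]
predicted = [6.5, 6.5, 8.0]
print(round(rmse(observed, predicted), 3))  # sqrt((0.25 + 0.25 + 0) / 3) -> 0.408
```

Here the result is in pIC50 units, so an RMSE of about 0.4 means predictions are typically off by roughly 0.4 log units.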
Q22. The leverage method (hat values) is used to:
- Measure descriptor autocorrelation
- Identify compounds outside the applicability domain
- Compute logP values
- Normalize activity data
Correct Answer: Identify compounds outside the applicability domain
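For a model with one descriptor and an intercept, the leverage of a compound is h = 1/n + (x - mean)^2 / Sxx, and a frequently used warning limit is h* = 3(p + 1)/n, where p is the number of descriptors and n the number of training compounds. The sketch below applies this simplified one-descriptor case to made-up data; multi-descriptor models need the full hat matrix, and the function names are hypothetical.

```python
from statistics import mean

def leverage(x_train, x_query):
    """Leverage of a query compound for a one-descriptor model with intercept."""
    n = len(x_train)
    mx = mean(x_train)
    sxx = sum((v - mx) ** 2 for v in x_train)
    return 1.0 / n + (x_query - mx) ** 2 / sxx

def warning_leverage(n_compounds, n_descriptors):
    """Commonly used threshold h* = 3 (p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_compounds

# Made-up training descriptor values (e.g. logP), for illustration only.
x_train = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
h_star = warning_leverage(len(x_train), n_descriptors=1)

for x_query in (3.0, 8.0):
    h = leverage(x_train, x_query)
    status = "inside" if h <= h_star else "outside"
    print(f"x = {x_query}: h = {h:.3f}, h* = {h_star:.2f} -> {status} the domain")
```

A query compound whose descriptor value lies far from the training data gets a high leverage, so its prediction is an extrapolation and should be treated with caution.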
Q23. Which of the following is an internal validation technique?
- External test set evaluation
- Cross-validation (e.g., k-fold CV)
- Prospective clinical trial
- Independent experimental assay
Correct Answer: Cross-validation (e.g., k-fold CV)
Q24. TPSA (topological polar surface area) is most relevant for predicting:
- Lipophilicity (LogP)
- Protein crystallization propensity
- Cell permeability and oral absorption
- Exact 3D conformation energies
Correct Answer: Cell permeability and oral absorption
Q25. QSPR stands for:
- Quantitative Structure-Property Relationship
- Qualitative Structure-Parameter Regression
- Quantum Structure Potential Ranking
- Quick Structure Prediction Routine
Correct Answer: Quantitative Structure-Property Relationship
Q26. Which practice improves interpretability of a QSAR model?
- Using hundreds of correlated descriptors
- Selecting a small set of relevant, chemically meaningful descriptors
- Applying Y-randomization repeatedly without reporting
- Hiding descriptor definitions
Correct Answer: Selecting a small set of relevant, chemically meaningful descriptors
Q27. Which tool is typically used to calculate 3D molecular descriptors?
- Spreadsheet software only
- Molecular modeling or cheminformatics software
- Text editor
- Hand calculation without geometry
Correct Answer: Molecular modeling or cheminformatics software
Q28. When is log transformation of activity data recommended?
- When activity values are negative
- When activity values span several orders of magnitude
- Never; raw values are always best
- Only for categorical endpoints
Correct Answer: When activity values span several orders of magnitude
Q29. What is a key ethical consideration when publishing QSAR models?
- Omitting validation metrics to simplify presentation
- Ensuring reproducibility, transparency, and reporting applicability domain to avoid misleading predictions
- Using proprietary descriptors without documentation
- Overstating predictive power without test data
Correct Answer: Ensuring reproducibility, transparency, and reporting applicability domain to avoid misleading predictions
Q30. Which approach helps prevent chance correlations during model development?
- Y-randomization and proper external validation
- Maximizing number of descriptors regardless of relevance
- Using only the training set for reporting performance
- Ignoring applicability domain
Correct Answer: Y-randomization and proper external validation

I am a Registered Pharmacist under the Pharmacy Act, 1948, and the founder of PharmacyFreak.com. I hold a Bachelor of Pharmacy degree from Rungta College of Pharmaceutical Science and Research. With a strong academic foundation and practical knowledge, I am committed to providing accurate, easy-to-understand content to support pharmacy students and professionals. My aim is to make complex pharmaceutical concepts accessible and useful for real-world application.
Email: Sachin@pharmacyfreak.com

