Introduction:
This quiz collection covers the statistical methods and validation metrics used in QSAR (Quantitative Structure–Activity Relationship) modelling and is aimed at M.Pharm students studying Computer Aided Drug Design. It covers core concepts such as regression techniques, descriptor selection, multicollinearity, cross-validation strategies, and the external validation metrics used to judge model robustness and predictivity. Emphasis is placed on practical validation criteria (R2, Q2, RMSE, CCC, applicability domain, leverage and Tropsha’s rules) so that students can critically evaluate the reliability of QSAR models and avoid pitfalls such as overfitting and chance correlations. These MCQs reinforce theoretical understanding and prepare students to apply rigorous statistical checks in real QSAR projects.
Q1. Which statistical metric primarily measures the proportion of variance in the observed activity explained by the QSAR model?
- Root Mean Square Error (RMSE)
- Concordance Correlation Coefficient (CCC)
- Coefficient of determination (R2)
- Mean Absolute Error (MAE)
Correct Answer: Coefficient of determination (R2)
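The proportion of explained variance is the plain R2; adjusted R2 is a related quantity that penalises R2 for the number of descriptors. A minimal Python sketch of both follows. The activity values and the descriptor count are hypothetical, chosen only for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_compounds, n_descriptors):
    """Penalise R2 for the number of descriptors used in the model."""
    return 1.0 - (1.0 - r2) * (n_compounds - 1) / (n_compounds - n_descriptors - 1)


# Hypothetical pIC50 values, for illustration only
observed = [5.0, 6.0, 7.0, 8.0, 9.0]
predicted = [5.2, 5.9, 7.1, 7.8, 9.0]

r2 = r_squared(observed, predicted)
# Assumes the model used 2 descriptors (an illustrative choice)
r2_adj = adjusted_r_squared(r2, n_compounds=5, n_descriptors=2)
print(round(r2, 3), round(r2_adj, 3))
```

Adjusted R2 is never larger than R2, and the gap widens as descriptors are added to a small data set.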
Q2. Which cross-validation method leaves out one compound at a time to estimate internal predictivity?
- k-fold cross-validation
- Leave-One-Out (LOO) cross-validation
- Bootstrapping
- External validation
Correct Answer: Leave-One-Out (LOO) cross-validation
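As an illustration of how LOO yields Q2, the sketch below refits a one-descriptor linear model with each compound left out in turn and accumulates the squared prediction errors (PRESS). It assumes a single descriptor and uses the overall mean activity in the denominator; some texts use the per-fold training mean instead. The data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for one descriptor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(x, y):
    """Q2 = 1 - PRESS / SS_tot, with each compound predicted by a model
    that was fitted without it."""
    press = 0.0
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and activities
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
q2 = loo_q2(descriptor, activity)
print(round(q2, 3))
```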
Q3. Which metric indicates how closely predicted values agree with observed values considering both precision and accuracy?
- Q2 (cross-validated R2)
- Concordance Correlation Coefficient (CCC)
- Variance Inflation Factor (VIF)
- Area Under Curve (AUC)
Correct Answer: Concordance Correlation Coefficient (CCC)
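To see why CCC captures accuracy as well as precision, the sketch below implements Lin's formula, CCC = 2·cov / (var_obs + var_pred + (mean_obs − mean_pred)^2), using population (1/n) variances. The values are hypothetical; the second prediction set is perfectly correlated with the observations but shifted by one log unit.

```python
def concordance_cc(observed, predicted):
    """Lin's concordance correlation coefficient (population variances)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


observed = [5.0, 6.0, 7.0, 8.0, 9.0]
shifted = [o + 1.0 for o in observed]  # Pearson r = 1, but biased by +1

ccc_identical = concordance_cc(observed, observed)
ccc_shifted = concordance_cc(observed, shifted)
print(ccc_identical, ccc_shifted)
```

The shifted predictions keep a perfect correlation yet score a CCC below 1, because the systematic offset is penalised.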
Q4. In QSAR model validation, what does a high Variance Inflation Factor (VIF > 10) indicate?
- The model has excellent predictive performance
- Severe multicollinearity among descriptors
- Low external predictivity
- Good agreement between observed and predicted values
Correct Answer: Severe multicollinearity among descriptors
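In general, VIF for descriptor j is 1 / (1 − Rj^2), where Rj^2 comes from regressing descriptor j on all the other descriptors. The sketch below covers only the two-descriptor case, where Rj^2 reduces to the squared Pearson correlation between the pair. The descriptor values are hypothetical.

```python
def squared_correlation(a, b):
    """Squared Pearson correlation between two descriptor columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return sab ** 2 / (saa * sbb)


def vif_two_descriptors(a, b):
    """VIF when the model contains exactly two descriptors."""
    return 1.0 / (1.0 - squared_correlation(a, b))


# Hypothetical descriptor columns
logp = [1.0, 2.0, 3.0, 4.0, 5.0]
molar_refractivity = [1.1, 2.0, 2.9, 4.2, 4.8]  # nearly collinear with logp
h_bond_donors = [2.0, 5.0, 1.0, 4.0, 3.0]       # almost uncorrelated with logp

vif_collinear = vif_two_descriptors(logp, molar_refractivity)
vif_independent = vif_two_descriptors(logp, h_bond_donors)
print(round(vif_collinear, 1), round(vif_independent, 2))
```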
Q5. Which of the following tests is used to detect chance correlation by randomly permuting response values?
- Y-randomization (Y-scrambling)
- Williams plot
- Leverage analysis
- Principal Component Analysis (PCA)
Correct Answer: Y-randomization (Y-scrambling)
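A rough sketch of Y-randomization for a one-descriptor linear model is shown below. For such a model, the fitted R2 equals the squared Pearson correlation, so no explicit refit is needed. The number of trials, the seed and the data are illustrative assumptions.

```python
import random


def pearson_r2(x, y):
    """R2 of a one-descriptor least-squares fit (squared Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def y_randomization(x, y, n_trials=50, seed=0):
    """Return the real R2 and the R2 values obtained after shuffling y."""
    rng = random.Random(seed)
    scrambled = []
    for _ in range(n_trials):
        y_perm = y[:]          # copy, so the original order is preserved
        rng.shuffle(y_perm)    # break the descriptor-activity pairing
        scrambled.append(pearson_r2(x, y_perm))
    return pearson_r2(x, y), scrambled


# Hypothetical data set of ten compounds
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
activity = [4.6, 4.9, 5.6, 6.1, 6.4, 7.1, 7.4, 8.1, 8.4, 9.1]

real_r2, scrambled_r2 = y_randomization(descriptor, activity)
mean_scrambled = sum(scrambled_r2) / len(scrambled_r2)
print(round(real_r2, 3), round(mean_scrambled, 3))
```

If the scrambled models scored close to the real one, the original correlation would be suspect.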
Q6. Tropsha’s external validation criterion includes comparing R2 and R0^2; what does R0^2 refer to?
- R-squared for model fitted with shuffled descriptors
- R-squared for regression of predicted vs observed with intercept forced to zero
- Cross-validated R-squared (Q2)
- R-squared adjusted for number of descriptors
Correct Answer: R-squared for regression of predicted vs observed with intercept forced to zero
Q7. Which metric quantifies average magnitude of prediction errors without emphasizing large errors?
- Root Mean Square Error (RMSE)
- Mean Absolute Error (MAE)
- Coefficient of determination (R2)
- Leverage
Correct Answer: Mean Absolute Error (MAE)
Q8. In applicability domain analysis, which plot displays standardized residuals versus leverage to identify outliers and influential compounds?
- ROC curve
- Williams plot
- Box plot
- Scree plot
Correct Answer: Williams plot
Q9. Which method reduces descriptor dimensionality by creating orthogonal linear combinations of descriptors?
- Multiple Linear Regression (MLR)
- Principal Component Analysis (PCA)
- Y-randomization
- Leverage calculation
Correct Answer: Principal Component Analysis (PCA)
Q10. Which validation metric is most appropriate for binary classification QSAR models and measures discrimination capability?
- Concordance Correlation Coefficient (CCC)
- Area Under the ROC Curve (AUC)
- RMSE
- Adjusted R-squared
Correct Answer: Area Under the ROC Curve (AUC)
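AUC can be read as the probability that a randomly chosen active is scored above a randomly chosen inactive. The sketch below computes it by direct pairwise comparison, counting ties as half. The labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (active, inactive) pairs ranked correctly."""
    actives = [s for l, s in zip(labels, scores) if l == 1]
    inactives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5  # ties count as half
    return wins / (len(actives) * len(inactives))


# Hypothetical classifier output: 1 = active, 0 = inactive
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```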
Q11. Which of the following is a recommended threshold for an acceptable cross-validated Q2 in QSAR predictive models?
- Q2 > 0.1
- Q2 > 0.9
- Q2 > 0.5
- Q2 < 0.0
Correct Answer: Q2 > 0.5
Q12. Which descriptor selection technique uses evolutionary processes (selection, crossover, mutation) to find an optimal subset?
- Stepwise regression
- Genetic algorithm (GA)
- Principal Component Regression (PCR)
- Leverage-based pruning
Correct Answer: Genetic algorithm (GA)
Q13. For external validation, which statistic quantifies the prediction error on the test set, so that a value much larger than the training-set error signals overfitting?
- Root Mean Square Error of Prediction (RMSEP)
- Q2 (leave-one-out)
- Variance Inflation Factor (VIF)
- k-fold cross-validation score
Correct Answer: Root Mean Square Error of Prediction (RMSEP)
Q14. Which of the following indicates an influential compound in leverage analysis?
- Leverage value much lower than warning leverage (h*)
- Standardized residual close to zero
- Leverage value greater than the warning leverage (h*)
- High Q2 value
Correct Answer: Leverage value greater than the warning leverage (h*)
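The warning leverage is commonly taken as h* = 3(p + 1)/n, where p is the number of descriptors and n the number of training compounds. The sketch below handles only a one-descriptor model with an intercept, where the leverage has the closed form h_i = 1/n + (x_i − mean)^2 / Σ(x_j − mean)^2. The descriptor values are hypothetical.

```python
def leverages_one_descriptor(x):
    """Hat-matrix diagonal for a model with intercept and one descriptor."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]


def warning_leverage(n_compounds, n_descriptors):
    """Commonly used threshold h* = 3(p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_compounds


# Hypothetical descriptor values; the last compound lies far from the rest
descriptor = [1.0, 1.2, 1.1, 0.9, 1.3, 1.0, 1.1, 0.8, 1.2, 5.0]

h = leverages_one_descriptor(descriptor)
h_star = warning_leverage(n_compounds=len(descriptor), n_descriptors=1)
flagged = [i for i, hi in enumerate(h) if hi > h_star]
print(h_star, flagged)
```

Plotting these leverages against standardized residuals gives the Williams plot.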
Q15. Matthews Correlation Coefficient (MCC) is preferred for imbalanced classification because it:
- Only measures sensitivity
- Combines TP, TN, FP, FN into a single balanced metric
- Is identical to accuracy
- Depends solely on prevalence
Correct Answer: Combines TP, TN, FP, FN into a single balanced metric
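The sketch below computes MCC from the four confusion-matrix counts and compares it with accuracy on a hypothetical imbalanced test set (10 actives, 90 inactives). Returning 0 when the denominator is zero is a common convention, assumed here.

```python
import math


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when any marginal total is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical imbalanced result: only half of the actives are found
tp, fn, tn, fp = 5, 5, 85, 5

accuracy = (tp + tn) / (tp + tn + fp + fn)
mcc = matthews_cc(tp, tn, fp, fn)
print(accuracy, round(mcc, 3))
```

Accuracy looks high because inactives dominate, while MCC reflects the weak recovery of actives.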
Q16. Which regression approach is most appropriate when predictors are highly collinear and the number of descriptors exceeds samples?
- Ordinary Least Squares (OLS)
- Partial Least Squares (PLS)
- Univariate linear regression
- Y-randomization
Correct Answer: Partial Least Squares (PLS)
Q17. In external validation, what does a slope k close to 1 in the regression of predicted versus observed imply?
- Severe systematic underestimation
- Model bias toward the mean
- Good agreement without systematic scaling bias
- An overfitted model
Correct Answer: Good agreement without systematic scaling bias
Q18. Which metric is most sensitive to large individual prediction errors because it squares residuals?
- Mean Absolute Error (MAE)
- Root Mean Square Error (RMSE)
- R2
- Concordance Correlation Coefficient (CCC)
Correct Answer: Root Mean Square Error (RMSE)
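The contrast between MAE and RMSE can be shown with two hypothetical prediction sets that have the same MAE. In the first, the error is spread evenly. In the second, it is concentrated in a single compound.

```python
import math


def mean_absolute_error(observed, predicted):
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_square_error(observed, predicted):
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


observed = [5.0, 6.0, 7.0, 8.0]
even_errors = [5.5, 6.5, 7.5, 8.5]   # every compound off by 0.5
one_outlier = [5.0, 6.0, 7.0, 10.0]  # one compound off by 2.0

mae_even = mean_absolute_error(observed, even_errors)
mae_outlier = mean_absolute_error(observed, one_outlier)
rmse_even = root_mean_square_error(observed, even_errors)
rmse_outlier = root_mean_square_error(observed, one_outlier)
print(mae_even, mae_outlier, rmse_even, rmse_outlier)
```

MAE is identical for both sets, but RMSE doubles for the set containing the single large error.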
Q19. What is the main purpose of using an external test set separate from training during QSAR modelling?
- To increase model complexity
- To estimate true predictive performance on unseen compounds
- To perform descriptor scaling
- To calculate variance inflation
Correct Answer: To estimate true predictive performance on unseen compounds
Q20. According to Golbraikh and Tropsha, which set of conditions on the external test set indicates an acceptably predictive QSAR model?
- High VIF values for descriptors
- R2 between predicted and observed test-set activities > 0.6, (R2 - R0^2)/R2 < 0.1 and slope 0.85 ≤ k ≤ 1.15
- Very low RMSE only for training set
- Q2 < 0.2
Correct Answer: R2 between predicted and observed test-set activities > 0.6, (R2 - R0^2)/R2 < 0.1 and slope 0.85 ≤ k ≤ 1.15
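The commonly cited Golbraikh and Tropsha test-set conditions are R2 > 0.6, (R2 − R0^2)/R2 < 0.1 and 0.85 ≤ k ≤ 1.15, alongside Q2 > 0.5 for the training set. The sketch below checks the three test-set conditions. It assumes that observed values are regressed on predicted values through the origin; published formulations also consider the reverse regression (k' and R'0^2), which is omitted here. The data are hypothetical.

```python
def golbraikh_tropsha_check(observed, predicted):
    """Evaluate R2, slope k and R0^2 for an external test set."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))

    r2 = s_op ** 2 / (s_oo * s_pp)  # squared Pearson correlation
    # Slope of the regression through the origin: observed = k * predicted
    k = sum(o * p for o, p in zip(observed, predicted)) / sum(p * p for p in predicted)
    # Coefficient of determination with the intercept forced to zero
    r0_2 = 1.0 - sum((o - k * p) ** 2 for o, p in zip(observed, predicted)) / s_oo

    return {
        "r2": r2,
        "k": k,
        "r0_2": r0_2,
        "r2_ok": r2 > 0.6,
        "r0_ok": (r2 - r0_2) / r2 < 0.1,
        "k_ok": 0.85 <= k <= 1.15,
    }


# Hypothetical external test set
observed = [5.0, 6.0, 7.0, 8.0, 9.0]
predicted = [5.2, 5.9, 7.1, 7.8, 9.0]
result = golbraikh_tropsha_check(observed, predicted)
print(result)
```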

I am a Registered Pharmacist under the Pharmacy Act, 1948, and the founder of PharmacyFreak.com. I hold a Bachelor of Pharmacy degree from Rungta College of Pharmaceutical Science and Research. With a strong academic foundation and practical knowledge, I am committed to providing accurate, easy-to-understand content to support pharmacy students and professionals. My aim is to make complex pharmaceutical concepts accessible and useful for real-world application.
Mail- Sachin@pharmacyfreak.com
