Statistical Methods in QSAR and Validation Metrics: MCQs with Answers

Introduction:

This quiz collection covers statistical methods in QSAR (Quantitative Structure–Activity Relationship) modelling and the associated validation metrics, and is aimed at M.Pharm students studying Computer Aided Drug Design. It addresses core concepts such as regression techniques, descriptor selection, multicollinearity, cross-validation strategies, and the external validation metrics used to judge model robustness and predictivity. Emphasis is placed on practical validation criteria (R2, Q2, RMSE, CCC, applicability domain, leverage, and the Golbraikh-Tropsha criteria) so that students can critically evaluate QSAR models for reliability and avoid pitfalls such as overfitting and chance correlation. These MCQs reinforce theoretical understanding and prepare students to apply rigorous statistical checks in real QSAR projects.

Q1. Which statistical metric primarily measures the proportion of variance in the observed activity explained by the QSAR model?

  • Root Mean Square Error (RMSE)
  • Concordance Correlation Coefficient (CCC)
  • Coefficient of determination (R2)
  • Mean Absolute Error (MAE)

Correct Answer: Coefficient of determination (R2)
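
As a rough illustration, the sketch below computes R2 (the proportion of variance in the observed activity explained by the model) and, for comparison, adjusted R2, which penalises R2 for the number of descriptors. The function names and the activity values are hypothetical and chosen only for this example.

```python
def r_squared(observed, fitted):
    """Proportion of variance in the observed activity explained by the model."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_compounds, n_descriptors):
    """Penalise R2 for the number of descriptors used in the model."""
    return 1.0 - (1.0 - r2) * (n_compounds - 1) / (n_compounds - n_descriptors - 1)


# Hypothetical observed and fitted activities for six compounds
observed = [5.1, 6.3, 7.0, 5.8, 6.9, 7.4]
fitted = [5.3, 6.1, 6.8, 6.0, 7.1, 7.2]

r2 = r_squared(observed, fitted)
r2_adj = adjusted_r_squared(r2, n_compounds=6, n_descriptors=2)
print(round(r2, 3), round(r2_adj, 3))
```

Adjusted R2 is never larger than R2 when at least one descriptor is used, which is why it is preferred when comparing models with different numbers of descriptors.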

Q2. Which cross-validation method leaves out one compound at a time to estimate internal predictivity?

  • k-fold cross-validation
  • Leave-One-Out (LOO) cross-validation
  • Bootstrapping
  • External validation

Correct Answer: Leave-One-Out (LOO) cross-validation
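
A minimal sketch of leave-one-out cross-validation for a one-descriptor linear model is given below. It assumes the common definition Q2 = 1 - PRESS / SS_total with SS_total taken about the mean of the full training set (some texts use the mean of each reduced set instead). The descriptor and activity values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for one descriptor: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(x, y):
    """Leave-one-out Q2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(x)):
        # Refit the model without compound i, then predict compound i
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot


# Hypothetical descriptor (e.g. logP) and activity values for seven compounds
logp = [1.0, 1.5, 2.1, 2.8, 3.3, 3.9, 4.4]
activity = [4.9, 5.4, 5.8, 6.6, 6.9, 7.7, 8.0]
print(round(loo_q2(logp, activity), 3))
```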

Q3. Which metric indicates how closely predicted values agree with observed values considering both precision and accuracy?

  • Q2 (cross-validated R2)
  • Concordance Correlation Coefficient (CCC)
  • Variance Inflation Factor (VIF)
  • Area Under Curve (AUC)

Correct Answer: Concordance Correlation Coefficient (CCC)
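
The difference between correlation and concordance can be seen in a short sketch of Lin's CCC, written here with population (1/n) variances. The data are hypothetical: the second prediction set is perfectly correlated with the observed values but shifted by one log unit, so its CCC falls below 1.

```python
def ccc(observed, predicted):
    """Lin's concordance correlation coefficient."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    # The squared mean difference penalises systematic bias (accuracy),
    # while the covariance term reflects precision.
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


observed = [5.0, 6.0, 7.0, 8.0]
shifted = [o + 1.0 for o in observed]  # perfectly correlated but biased by +1

print(ccc(observed, observed))            # identical values: 1.0
print(round(ccc(observed, shifted), 3))   # biased predictions: 0.714
```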

Q4. In QSAR model validation, what does a high Variance Inflation Factor (VIF > 10) indicate?

  • The model has excellent predictive performance
  • Severe multicollinearity among descriptors
  • Low external predictivity
  • Good agreement between observed and predicted values

Correct Answer: Severe multicollinearity among descriptors
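
As a simplified sketch, VIF for descriptor j is 1 / (1 - Rj^2), where Rj^2 comes from regressing descriptor j on the remaining descriptors. With only two descriptors, Rj^2 reduces to the squared Pearson correlation between them, which is the case shown here. The descriptor values are hypothetical.

```python
def pearson_r(a, b):
    """Pearson correlation coefficient between two descriptor columns."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def vif_two_descriptors(a, b):
    """VIF = 1 / (1 - R_j^2); with only two descriptors R_j^2 equals r^2."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical, strongly related descriptors: molecular weight and molar refractivity
mol_weight = [180.0, 220.0, 260.0, 310.0, 350.0, 400.0]
molar_refractivity = [45.0, 56.0, 64.0, 79.0, 87.0, 101.0]

vif = vif_two_descriptors(mol_weight, molar_refractivity)
print("severe multicollinearity" if vif > 10 else "acceptable")
```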

Q5. Which of the following tests is used to detect chance correlation by randomly permuting response values?

  • Y-randomization (Y-scrambling)
  • Williams plot
  • Leverage analysis
  • Principal Component Analysis (PCA)

Correct Answer: Y-randomization (Y-scrambling)
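
The idea can be sketched for a one-descriptor model as follows: the activities are shuffled many times, the model is refitted each time, and the resulting R2 values are compared with the R2 of the real model. The helper names, the number of trials, the seed, and the data are all assumptions made for this illustration.

```python
import random


def r2_single_descriptor(x, y):
    """R2 of a one-descriptor linear fit equals the squared Pearson correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def y_randomization(x, y, n_trials=100, seed=0):
    """Return the R2 values obtained after shuffling the activities n_trials times."""
    rng = random.Random(seed)
    scrambled_r2 = []
    for _ in range(n_trials):
        y_shuffled = y[:]  # copy so the original order is kept
        rng.shuffle(y_shuffled)
        scrambled_r2.append(r2_single_descriptor(x, y_shuffled))
    return scrambled_r2


# Hypothetical descriptor and activity values
logp = [1.0, 1.5, 2.1, 2.8, 3.3, 3.9, 4.4]
activity = [4.9, 5.4, 5.8, 6.6, 6.9, 7.7, 8.0]

true_r2 = r2_single_descriptor(logp, activity)
scrambled = y_randomization(logp, activity)
mean_scrambled = sum(scrambled) / len(scrambled)
print(round(true_r2, 3), round(mean_scrambled, 3))
```

A real model whose R2 is far above the scrambled R2 values is unlikely to be the result of chance correlation.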

Q6. Tropsha’s external validation criterion includes comparing R2 and R0^2; what does R0^2 refer to?

  • R-squared for model fitted with shuffled descriptors
  • R-squared for regression of predicted vs observed with intercept forced to zero
  • Cross-validated R-squared (Q2)
  • R-squared adjusted for number of descriptors

Correct Answer: R-squared for regression of predicted vs observed with intercept forced to zero
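
A minimal sketch of the through-origin statistics follows. It assumes one common convention, in which observed values are regressed on predicted values with the intercept fixed at zero (observed = k × predicted); the literature also uses the reverse regression, giving k' and R0'^2. The data and function name are hypothetical.

```python
def through_origin_stats(observed, predicted):
    """Slope k and R0^2 for observed = k * predicted (intercept forced to zero)."""
    k = sum(o * p for o, p in zip(observed, predicted)) / sum(p * p for p in predicted)
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - k * p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return k, 1.0 - ss_res / ss_tot


# Hypothetical test-set activities and model predictions
observed = [5.1, 6.3, 7.0, 5.8, 6.9, 7.4]
predicted = [5.3, 6.1, 6.8, 6.0, 7.1, 7.2]

k, r0_squared = through_origin_stats(observed, predicted)
print(round(k, 3), round(r0_squared, 3))
```

When R0^2 is close to the ordinary R2 and k is close to 1, the predicted-versus-observed line lies near the ideal line of unit slope through the origin.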

Q7. Which metric quantifies the average magnitude of prediction errors without giving extra weight to large errors?

  • Root Mean Square Error (RMSE)
  • Mean Absolute Error (MAE)
  • Coefficient of determination (R2)
  • Leverage

Correct Answer: Mean Absolute Error (MAE)
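
The contrast between MAE and RMSE is easiest to see with two hypothetical prediction sets that have the same total absolute error: one spreads the error evenly, the other concentrates it in a single compound.

```python
import math


def mae(observed, predicted):
    """Mean absolute error: every error contributes in proportion to its size."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error: squaring gives large errors extra weight."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


observed = [6.0, 6.0, 6.0, 6.0]
even_errors = [6.5, 5.5, 6.5, 5.5]      # four errors of 0.5
one_large_error = [6.0, 6.0, 6.0, 8.0]  # a single error of 2.0

print(mae(observed, even_errors), mae(observed, one_large_error))    # 0.5 0.5
print(rmse(observed, even_errors), rmse(observed, one_large_error))  # 0.5 1.0
```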

Q8. In applicability domain analysis, which plot displays standardized residuals versus leverage to identify outliers and influential compounds?

  • ROC curve
  • Williams plot
  • Box plot
  • Scree plot

Correct Answer: Williams plot

Q9. Which method reduces descriptor dimensionality by creating orthogonal linear combinations of descriptors?

  • Multiple Linear Regression (MLR)
  • Principal Component Analysis (PCA)
  • Y-randomization
  • Leverage calculation

Correct Answer: Principal Component Analysis (PCA)

Q10. Which validation metric is most appropriate for binary classification QSAR models and measures discrimination capability?

  • Concordance Correlation Coefficient (CCC)
  • Area Under the ROC Curve (AUC)
  • RMSE
  • Adjusted R-squared

Correct Answer: Area Under the ROC Curve (AUC)
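
AUC can be read as the probability that a randomly chosen active compound receives a higher score than a randomly chosen inactive one. The sketch below computes it directly from that definition by comparing every active-inactive pair; the labels (1 = active, 0 = inactive) and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of active-inactive pairs ranked correctly."""
    actives = [s for label, s in zip(labels, scores) if label == 1]
    inactives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(actives) * len(inactives))


# Hypothetical classifier scores for four actives and four inactives
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.85, 0.70, 0.65, 0.40, 0.35, 0.20, 0.10]

print(roc_auc(labels, scores))  # 0.9375 (15 of 16 pairs ranked correctly)
```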

Q11. Which of the following is a recommended threshold for an acceptable cross-validated Q2 in QSAR predictive models?

  • Q2 > 0.1
  • Q2 > 0.9
  • Q2 > 0.5
  • Q2 < 0.0

Correct Answer: Q2 > 0.5

Q12. Which descriptor selection technique uses evolutionary processes (selection, crossover, mutation) to find an optimal subset?

  • Stepwise regression
  • Genetic algorithm (GA)
  • Principal Component Regression (PCR)
  • Leverage-based pruning

Correct Answer: Genetic algorithm (GA)

Q13. For external validation, which statistic summarizes prediction error on the test set and, when compared with the training-set error, can reveal overfitting or systematic bias?

  • Root Mean Square Error of Prediction (RMSEP)
  • Q2 (leave-one-out)
  • Variance Inflation Factor (VIF)
  • k-fold cross-validation score

Correct Answer: Root Mean Square Error of Prediction (RMSEP)

Q14. Which of the following indicates an influential compound in leverage analysis?

  • Leverage value much lower than warning leverage (h*)
  • Standardized residual close to zero
  • Leverage value greater than the warning leverage (h*)
  • High Q2 value

Correct Answer: Leverage value greater than the warning leverage (h*)
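
For a one-descriptor model with an intercept, leverage has a simple closed form, h_i = 1/n + (x_i - mean)^2 / Sxx, and the warning leverage is commonly taken as h* = 3(p + 1)/n, where p is the number of descriptors and n the number of training compounds. The sketch below uses hypothetical logP values in which the last compound lies far from the rest.

```python
def leverages(x):
    """Leverage of each compound for a one-descriptor model with intercept."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]


def warning_leverage(n_compounds, n_descriptors):
    """Commonly used threshold h* = 3(p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_compounds


# Hypothetical descriptor values; the last compound is structurally atypical
logp = [1.0, 1.2, 1.4, 1.5, 1.7, 1.9, 2.0, 2.1, 2.3, 6.5]

h = leverages(logp)
h_star = warning_leverage(n_compounds=len(logp), n_descriptors=1)
flagged = [i for i, hi in enumerate(h) if hi > h_star]
print(flagged)  # [9]: only the last compound exceeds h*
```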

Q15. Matthews Correlation Coefficient (MCC) is preferred for imbalanced classification because it:

  • Only measures sensitivity
  • Combines TP, TN, FP, FN into a single balanced metric
  • Is identical to accuracy
  • Depends solely on prevalence

Correct Answer: Combines TP, TN, FP, FN into a single balanced metric
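
A short sketch shows why accuracy alone can mislead on imbalanced data. The confusion-matrix counts are hypothetical: a screen with 10 actives and 90 inactives in which the model recovers only 2 of the actives.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when a row or column of the confusion matrix is empty
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all compounds classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts: 10 actives (2 found, 8 missed), 90 inactives (88 correct)
tp, fn, tn, fp = 2, 8, 88, 2

print(accuracy(tp, tn, fp, fn))       # 0.9
print(round(mcc(tp, tn, fp, fn), 3))  # 0.272
```

Accuracy looks high because the inactives dominate, whereas MCC stays low because most actives are missed.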

Q16. Which regression approach is most appropriate when descriptors are highly collinear and the number of descriptors exceeds the number of compounds?

  • Ordinary Least Squares (OLS)
  • Partial Least Squares (PLS)
  • Univariate linear regression
  • Y-randomization

Correct Answer: Partial Least Squares (PLS)

Q17. In external validation, what does a slope k close to 1 in the regression of predicted versus observed imply?

  • Severe systematic underestimation
  • Model bias toward the mean
  • Good agreement without systematic scaling bias
  • An overfitted model

Correct Answer: Good agreement without systematic scaling bias

Q18. Which metric is most sensitive to large individual prediction errors because it squares residuals?

  • Mean Absolute Error (MAE)
  • Root Mean Square Error (RMSE)
  • R2
  • Concordance Correlation Coefficient (CCC)

Correct Answer: Root Mean Square Error (RMSE)

Q19. What is the main purpose of using an external test set separate from training during QSAR modelling?

  • To increase model complexity
  • To estimate true predictive performance on unseen compounds
  • To perform descriptor scaling
  • To calculate variance inflation

Correct Answer: To estimate true predictive performance on unseen compounds

Q20. According to Golbraikh and Tropsha, which set of conditions indicates an acceptably predictive QSAR model?

  • High VIF values for descriptors
  • Q2 > 0.5, test-set R2 > 0.6, (R2 - R0^2)/R2 < 0.1, and 0.85 ≤ k ≤ 1.15
  • Very low RMSE only for training set
  • Q2 < 0.2

Correct Answer: Q2 > 0.5, test-set R2 > 0.6, (R2 - R0^2)/R2 < 0.1, and 0.85 ≤ k ≤ 1.15
