Introduction: Regression analysis is essential in pharmaceutical applications for modeling relationships between drug dose, concentration, and therapeutic response. B. Pharm students are expected to master simple and multiple linear regression, logistic and nonlinear models, model validation, goodness-of-fit measures (R-squared and adjusted R-squared), residual analysis, multicollinearity, and predictive modeling for pharmacokinetics, formulation optimization, bioavailability, and quality control. Practical skills include interpreting coefficients and interaction terms, constructing confidence and prediction intervals, applying variable-selection methods, and avoiding overfitting through cross-validation and penalized regression. These concepts support data-driven decision-making in drug development, dose-response analysis, and regulatory submissions. Now let’s test your knowledge with 30 MCQs on this topic.
Q1. What does the slope coefficient in a simple linear regression model represent in a pharmacokinetic dose-response study?
- Change in response per unit change in dose
- Baseline response when dose is zero
- Random variation not explained by the model
- Overall fit of the regression line
Correct Answer: Change in response per unit change in dose
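The slope interpretation above can be sketched with the closed-form OLS formulas; the dose-response numbers here are hypothetical, for illustration only.

```python
# Closed-form simple linear regression:
# slope = Sxy / Sxx, intercept = ybar - slope * xbar.
# The slope is the change in response per unit change in dose.

def fit_line(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return slope, intercept

doses = [10, 20, 30, 40, 50]        # mg (hypothetical)
responses = [12, 21, 33, 39, 52]    # response units (hypothetical)
slope, intercept = fit_line(doses, responses)
# slope = 0.98: each extra mg of dose raises the predicted response by ~0.98 units
```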
Q2. Which assumption is NOT required for ordinary least squares (OLS) regression to produce unbiased coefficient estimates?
- Linearity of relationship between predictors and outcome
- No multicollinearity among predictors
- Errors are normally distributed
- Errors have zero mean and are uncorrelated with predictors
Correct Answer: Errors are normally distributed
Q3. In multiple linear regression for predicting drug plasma concentration, what does an interaction term between dose and formulation indicate?
- That dose and formulation affect concentration only additively (independently)
- That effect of dose on concentration depends on formulation
- That multicollinearity is present
- That residuals are heteroscedastic
Correct Answer: That effect of dose on concentration depends on formulation
Q4. Which metric penalizes adding irrelevant predictors and is preferable to R-squared for model comparison?
- Adjusted R-squared
- Pearson correlation
- Mean squared error
- Cook’s distance
Correct Answer: Adjusted R-squared
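The penalty in adjusted R-squared can be made concrete with its formula; the R-squared values below are hypothetical, chosen only to show an irrelevant predictor lowering the adjusted value.

```python
# Adjusted R-squared = 1 - (1 - R2) * (n - 1) / (n - p - 1),
# where n = sample size and p = number of predictors.
# Unlike R2, it falls when an added predictor contributes too little.

def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical comparison: a 4th (irrelevant) predictor nudges R2
# from 0.850 to 0.851 but lowers adjusted R2.
adj3 = adjusted_r2(0.850, n=30, p=3)
adj4 = adjusted_r2(0.851, n=30, p=4)
# adj3 > adj4, so the 3-predictor model is preferred
```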
Q5. In regression diagnostics, a high Variance Inflation Factor (VIF) indicates what problem?
- Heteroscedasticity
- Autocorrelation
- Multicollinearity
- Nonlinearity
Correct Answer: Multicollinearity
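A minimal VIF sketch: with only two predictors, the R-squared from regressing one on the other equals the squared Pearson correlation, so VIF = 1 / (1 - r²). The predictor values are hypothetical.

```python
# VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing predictor j
# on the remaining predictors. Values above ~5-10 are commonly flagged
# as problematic collinearity.

def pearson_r(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical, strongly correlated predictors (e.g. dose and exposure)
x1 = [1, 2, 3, 4, 5]
x2 = [1.1, 2.0, 3.2, 3.9, 5.1]
r = pearson_r(x1, x2)
vif = 1 / (1 - r ** 2)   # far above 10 here, signalling multicollinearity
```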
Q6. Which regression method helps prevent overfitting by shrinking coefficients and can be used in QSAR or formulation modeling?
- Ordinary least squares
- Ridge regression
- Principal component analysis
- Kaplan-Meier estimation
Correct Answer: Ridge regression
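The shrinkage idea can be shown in the simplest case: with a single centered predictor, the ridge slope is Sxy / (Sxx + λ), which moves toward zero as the penalty λ grows. Data are hypothetical.

```python
# Ridge regression shrinks coefficients toward zero. For one centered
# predictor, beta_ridge = Sxy / (Sxx + lam); lam = 0 recovers OLS.

def ridge_slope(x, y, lam):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / (sxx + lam)

x = [10, 20, 30, 40, 50]   # hypothetical predictor values
y = [12, 21, 33, 39, 52]
b_ols = ridge_slope(x, y, lam=0.0)      # ordinary least squares slope
b_ridge = ridge_slope(x, y, lam=100.0)  # larger penalty, smaller slope
```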
Q7. For a binary outcome like adverse event occurrence, which regression is most appropriate?
- Linear regression
- Logistic regression
- Poisson regression
- Proportional hazards regression
Correct Answer: Logistic regression
Q8. What does a residual plot showing increasing spread with fitted values suggest in a dissolution study model?
- Linearity holds perfectly
- Heteroscedasticity
- Multicollinearity
- High R-squared
Correct Answer: Heteroscedasticity
Q9. When developing an IVIVC (in vitro–in vivo correlation), which regression outcome is most relevant?
- Predicting batch manufacturing cost
- Predicting in vivo plasma concentration from in vitro dissolution
- Estimating pKa of the drug substance
- Assessing stability under accelerated conditions
Correct Answer: Predicting in vivo plasma concentration from in vitro dissolution
Q10. What does a 95% prediction interval from a regression model represent for a new patient’s drug concentration?
- Range where 95% of future observed concentrations will fall
- Range where 95% of coefficient estimates lie
- Range where the mean concentration lies with 95% confidence
- Range of residuals used to fit the model
Correct Answer: Range where 95% of future observed concentrations will fall
Q11. In stepwise variable selection, what is a major drawback when applied to pharmacological datasets?
- Produces unbiased coefficients
- Always selects the true biological predictors
- Can produce unstable models and overfit
- Eliminates multicollinearity
Correct Answer: Can produce unstable models and overfit
Q12. Which test is used to detect autocorrelation in residuals from a time-series PK model?
- Durbin-Watson test
- Shapiro-Wilk test
- Breusch-Pagan test
- Levene’s test
Correct Answer: Durbin-Watson test
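The Durbin-Watson statistic itself is simple to compute from a residual sequence; the residuals below are hypothetical, drifting slowly from positive to negative to mimic positive autocorrelation.

```python
# Durbin-Watson statistic: DW = sum_{t>=2}(e_t - e_{t-1})^2 / sum(e_t^2).
# Values near 2 suggest no autocorrelation; values well below 2 suggest
# positive autocorrelation (common in time-series PK residuals).

def durbin_watson(resid):
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

# Hypothetical residuals that drift smoothly (positively autocorrelated)
resid = [0.6, 0.5, 0.3, 0.1, -0.1, -0.3, -0.5, -0.6]
dw = durbin_watson(resid)   # well below 2 here
```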
Q13. In dose-response modeling, which regression form is better suited to an S-shaped curve?
- Simple linear regression
- Logistic (sigmoidal) or Hill equation (non-linear)
- Poisson regression
- Proportional odds model
Correct Answer: Logistic (sigmoidal) or Hill equation (non-linear)
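The Hill equation behind that answer is E = Emax · Cⁿ / (EC50ⁿ + Cⁿ); the Emax, EC50, and n values below are hypothetical, chosen only to show the S-shape from near-zero effect to saturation.

```python
# Hill equation: E = Emax * C**n / (EC50**n + C**n).
# At C = EC50 the effect is exactly half of Emax, and the curve
# saturates as C grows, giving the sigmoidal dose-response shape.

def hill(conc, emax=100.0, ec50=10.0, n=2.0):
    return emax * conc ** n / (ec50 ** n + conc ** n)

low = hill(1.0)     # far below EC50: effect near zero
half = hill(10.0)   # at EC50: exactly half of Emax
high = hill(100.0)  # far above EC50: effect saturates near Emax
```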
Q14. What is the primary purpose of cross-validation when building predictive models for drug formulation?
- To increase the sample size artificially
- To evaluate model generalizability on unseen data
- To measure multicollinearity between predictors
- To compute Cook’s distance for outliers
Correct Answer: To evaluate model generalizability on unseen data
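A k-fold split can be sketched with index bookkeeping alone: each observation serves exactly once as validation data. Any model-fitting routine could be plugged into the resulting loop; this is an index-only sketch, not a full pipeline.

```python
# Minimal k-fold cross-validation split over n observations.
# Each fold holds out a disjoint validation set; the model would be
# fit on `train` and scored on `val` in each iteration.

def kfold_indices(n, k):
    folds = []
    for i in range(k):
        val = list(range(i, n, k))                 # every k-th index held out
        train = [j for j in range(n) if j not in val]
        folds.append((train, val))
    return folds

folds = kfold_indices(n=10, k=5)
# 5 folds; each validation set has 2 points and each training set 8
```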
Q15. Which diagnostic identifies influential observations that disproportionately affect regression coefficients?
- Variance Inflation Factor (VIF)
- Cook’s distance
- R-squared
- Adjusted R-squared
Correct Answer: Cook’s distance
Q16. In PK regression, log-transforming the concentration is often used to address which issue?
- Autocorrelation
- Nonlinearity and heteroscedasticity
- Multicollinearity
- Low sample size
Correct Answer: Nonlinearity and heteroscedasticity
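A worked illustration of why the log transform linearizes PK data: first-order elimination C(t) = C0·exp(−k·t) is nonlinear in t, but ln C(t) = ln C0 − k·t is a straight line. The C0 and k values are hypothetical.

```python
import math

# Simulated first-order elimination (hypothetical C0 and k).
c0, k = 100.0, 0.2
times = [0, 1, 2, 4, 8]
conc = [c0 * math.exp(-k * t) for t in times]

# On the log scale the points fall exactly on a line with slope -k,
# so ordinary least squares applies after the transform.
log_conc = [math.log(c) for c in conc]
slopes = [(log_conc[i + 1] - log_conc[i]) / (times[i + 1] - times[i])
          for i in range(len(times) - 1)]
# every pairwise slope equals -k
```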
Q17. What does an R-squared value of 0.85 indicate in a predictive model for dissolution rate?
- 85% of variability in dissolution rate is explained by predictors
- Model predictions are 85% accurate for individual samples
- 85% probability that the model is valid
- 85% of predictors are significant
Correct Answer: 85% of variability in dissolution rate is explained by predictors
Q18. Which approach is preferred to handle many correlated descriptors in QSAR modeling?
- Stepwise selection without regularization
- Principal component regression or penalized methods
- Ignore correlations and use OLS
- Use univariate regressions only
Correct Answer: Principal component regression or penalized methods
Q19. When comparing two nested regression models, which test assesses whether adding predictors significantly improves fit?
- t-test for coefficients
- F-test for nested models
- Breusch-Pagan test
- Shapiro-Wilk test
Correct Answer: F-test for nested models
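The nested-model F statistic can be computed directly from the two residual sums of squares; the RSS values, sample size, and predictor counts below are hypothetical.

```python
# F = ((RSS_reduced - RSS_full) / q) / (RSS_full / (n - p_full - 1)),
# where q is the number of predictors added to the reduced model.
# A large F (vs the F(q, n - p_full - 1) critical value) means the
# extra predictors significantly improve fit.

def nested_f(rss_reduced, rss_full, q, n, p_full):
    num = (rss_reduced - rss_full) / q
    den = rss_full / (n - p_full - 1)
    return num / den

f_stat = nested_f(rss_reduced=120.0, rss_full=80.0, q=2, n=30, p_full=5)
# f_stat = 6.0; compare with the F(2, 24) critical value
```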
Q20. In logistic regression predicting adverse reaction (yes/no), what does an odds ratio greater than 1 signify for a predictor?
- Predictor decreases odds of adverse reaction
- Predictor has no effect
- Predictor increases odds of adverse reaction
- Predictor is collinear with outcome
Correct Answer: Predictor increases odds of adverse reaction
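The link between a logistic coefficient and its odds ratio is OR = exp(β); the coefficient below is hypothetical (say, for a one-unit dose increase).

```python
import math

# In logistic regression, the odds ratio for a predictor is exp(beta).
# beta > 0  =>  OR > 1  =>  the predictor increases the odds of the event.

beta = 0.7                     # hypothetical fitted coefficient
odds_ratio = math.exp(beta)    # ~2.01: odds roughly double per unit increase
```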
Q21. Which method estimates coefficients when the response variable follows a Poisson distribution (e.g., count of side effects)?
- Linear regression with OLS
- Poisson regression (generalized linear model)
- Kaplan-Meier method
- ANOVA
Correct Answer: Poisson regression (generalized linear model)
Q22. What is the effect of omitting an important confounder from a regression model in a clinical study?
- No impact on coefficients
- Can bias coefficient estimates
- Always increases R-squared
- Eliminates heteroscedasticity
Correct Answer: Can bias coefficient estimates
Q23. Which validation metric is preferable for imbalanced binary classification in pharmacovigilance signal detection?
- Accuracy
- Area under the ROC curve (AUC)
- R-squared
- Mean absolute error
Correct Answer: Area under the ROC curve (AUC)
Q24. For time-to-event data like time to treatment failure, which regression framework is most appropriate?
- Linear regression
- Cox proportional hazards model
- Logistic regression
- Poisson regression
Correct Answer: Cox proportional hazards model
Q25. In model interpretation, what does a small p-value for a regression coefficient indicate?
- Strong evidence that the coefficient differs from zero
- That the predictor is clinically important regardless of effect size
- That the model has poor fit
- That multicollinearity is present
Correct Answer: Strong evidence that the coefficient differs from zero
Q26. Which technique helps select variables while accounting for model complexity by shrinking some coefficients to exactly zero?
- Ridge regression
- Lasso regression
- PCA only
- Ordinary least squares
Correct Answer: Lasso regression
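The mechanism that lets the lasso zero out coefficients is soft-thresholding; this is an illustrative coordinate-wise sketch on hypothetical standardized coefficients, not a full lasso solver.

```python
# Soft-thresholding: coefficients whose magnitude falls below the
# penalty lam are set exactly to zero, which is how the lasso performs
# variable selection (ridge, by contrast, only shrinks them).

def soft_threshold(b, lam):
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

ols_coefs = [2.5, -0.3, 0.1, -1.8]   # hypothetical standardized OLS coefficients
lasso_coefs = [soft_threshold(b, lam=0.5) for b in ols_coefs]
# small coefficients (-0.3 and 0.1) are dropped from the model entirely
```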
Q27. When building a predictive model for bioavailability, why is external validation on an independent dataset important?
- To improve the model’s R-squared on the training set
- To assess how well the model generalizes to new data
- To reduce the number of predictors automatically
- To guarantee causal inference
Correct Answer: To assess how well the model generalizes to new data
Q28. In regression, what is leverage and why is it important in pharmaceutical data analysis?
- Measure of heteroscedasticity; important for model variance only
- Measure of how far an observation’s predictor values are from the mean; identifies points that can strongly influence fit
- A regularization parameter used in ridge regression
- Another term for residuals
Correct Answer: Measure of how far an observation’s predictor values are from the mean; identifies points that can strongly influence fit
Q29. Why might nonlinear regression be preferred over linear regression for modeling enzyme kinetics in drug metabolism?
- Enzyme kinetics often follow saturable (Michaelis-Menten) relationships that are inherently nonlinear
- Nonlinear regression requires fewer data points always
- Linear regression cannot compute residuals
- Nonlinear regression eliminates confounding
Correct Answer: Enzyme kinetics often follow saturable (Michaelis-Menten) relationships that are inherently nonlinear
Q30. Which approach helps address heteroscedastic residuals when modeling concentration-time data?
- Ignore heteroscedasticity because coefficients remain unbiased
- Use weighted least squares or transform the response (e.g., log transform)
- Reduce sample size to stabilize variance
- Use only univariate regressions
Correct Answer: Use weighted least squares or transform the response (e.g., log transform)
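The weighted-least-squares option can be sketched in closed form for one predictor: each point gets weight wᵢ ∝ 1/varianceᵢ, so noisier concentrations count less. The concentration-time data and weighting scheme below are hypothetical.

```python
# WLS slope with weighted means:
# beta = sum w*(x - xw)*(y - yw) / sum w*(x - xw)^2,
# where xw, yw are weighted means and w_i ~ 1 / variance_i.

def wls_slope(x, y, w):
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    return num / den

times = [1, 2, 4, 8]               # h (hypothetical)
conc = [80, 60, 35, 12]            # ng/mL (hypothetical)
weights = [1 / c for c in conc]    # assumed variance grows with concentration
slope = wls_slope(times, conc, weights)   # negative: concentration declines
```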

