Regression modeling – simple and multiple regression hypothesis testing MCQs with Answers

Introduction: In pharmaceutical research, regression modeling is a vital statistical tool for exploring relationships between drug responses and predictors. Simple regression examines how one independent variable (for example, dose) affects an outcome (for example, plasma concentration), while multiple regression evaluates several predictors (dose, age, formulation, and co-medication) simultaneously. Hypothesis testing in regression—using t-tests for individual coefficients and F-tests for overall model fit—helps determine whether predictors significantly influence outcomes. Understanding core assumptions (linearity, independence, homoscedasticity, normality) and diagnostics (residual analysis, variance inflation factor, Durbin–Watson test) ensures valid conclusions in formulation development and pharmacokinetics. Now let’s test your knowledge with 30 MCQs on this topic.

Q1. What is the primary difference between simple regression and multiple regression?

  • Simple regression uses categorical predictors while multiple regression uses continuous predictors
  • Simple regression models one predictor variable; multiple regression models two or more predictor variables
  • Simple regression uses hypothesis testing; multiple regression does not
  • Simple regression requires normality; multiple regression does not require any assumptions

Correct Answer: Simple regression models one predictor variable; multiple regression models two or more predictor variables

Q2. In testing the slope coefficient (β1) in simple linear regression, the null hypothesis is usually:

  • β1 = 1
  • β1 ≠ 0
  • β1 = 0
  • β1 > 0

Correct Answer: β1 = 0

Q3. Which test is commonly used to assess whether an individual regression coefficient differs significantly from zero?

  • Chi-square test
  • t-test
  • Fisher’s exact test
  • Z-test

Correct Answer: t-test
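As a quick numerical illustration of that t-test, here is a pure-Python sketch with made-up dose/response values: the t-statistic is simply the slope estimate divided by its standard error.

```python
import math

x = [1, 2, 3, 4, 5]                      # hypothetical dose levels
y = [2.1, 3.9, 6.2, 7.8, 10.1]           # hypothetical responses
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx                           # slope estimate
b0 = ybar - b1 * xbar                    # intercept estimate

# residual variance on n - 2 degrees of freedom (intercept and slope fitted)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se_b1 = math.sqrt(sse / (n - 2) / sxx)   # standard error of b1

t_stat = b1 / se_b1                      # large |t| -> reject H0: beta1 = 0
```

The t-statistic is then compared with a t distribution on n − 2 degrees of freedom to obtain a p-value.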

Q4. What does the F-test assess in multiple regression?

  • Whether the residuals are normally distributed
  • Whether at least one predictor variable is significantly related to the response
  • The presence of multicollinearity
  • Whether the slope is equal to 1

Correct Answer: Whether at least one predictor variable is significantly related to the response

Q5. Which statement best describes R-squared (R²)?

  • Proportion of variance in predictors explained by the response
  • Proportion of variance in the response explained by the model
  • Average squared residual
  • Standardized regression coefficient

Correct Answer: Proportion of variance in the response explained by the model

Q6. Why is adjusted R-squared often preferred to R-squared in multiple regression?

  • Adjusted R-squared increases with every added predictor regardless of relevance
  • Adjusted R-squared penalizes adding non-informative predictors and adjusts for number of predictors
  • Adjusted R-squared is always equal to R-squared
  • Adjusted R-squared measures multicollinearity directly

Correct Answer: Adjusted R-squared penalizes adding non-informative predictors and adjusts for number of predictors
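The two definitions above can be checked numerically; this sketch (same made-up data, one predictor, so k = 1) computes both quantities from the sums of squares.

```python
# Simple one-predictor fit on illustrative data, then R^2 and adjusted R^2.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n, k = len(x), 1
xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
sst = sum((yi - ybar) ** 2 for yi in y)                        # total

r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalizes extra predictors
```

Adjusted R² is always at most R², and the gap widens as uninformative predictors are added.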

Q7. A variance inflation factor (VIF) greater than which value is commonly taken as evidence of problematic multicollinearity?

  • 1
  • 2
  • 5
  • 10

Correct Answer: 10
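A sketch of how VIF is computed: VIFⱼ = 1 / (1 − Rⱼ²), where Rⱼ² comes from regressing predictor j on the remaining predictors. With two deliberately collinear made-up predictors, that auxiliary regression is just a simple regression of x2 on x1.

```python
x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 4.0, 6.1, 7.9, 10.0]          # roughly 2 * x1: a near-duplicate predictor
n = len(x1)
m1, m2 = sum(x1) / n, sum(x2) / n

# auxiliary regression of x2 on x1
b = (sum((a - m1) * (c - m2) for a, c in zip(x1, x2))
     / sum((a - m1) ** 2 for a in x1))
a0 = m2 - b * m1

sse = sum((c - (a0 + b * a)) ** 2 for a, c in zip(x1, x2))
sst = sum((c - m2) ** 2 for c in x2)
r2_aux = 1 - sse / sst

vif = 1 / (1 - r2_aux)                   # far above the common cutoff of 10
```

Here the auxiliary R² is close to 1, so the VIF explodes, flagging x2 as redundant given x1.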

Q8. Which diagnostic test is commonly used to detect heteroscedasticity (non-constant variance of residuals)?

  • Durbin-Watson test
  • Breusch-Pagan test
  • Levene’s test for equality of variances between groups
  • Kruskal-Wallis test

Correct Answer: Breusch-Pagan test

Q9. The Durbin–Watson statistic is used to detect which issue in regression residuals?

  • Heteroscedasticity
  • Non-linearity
  • Autocorrelation (serial correlation)
  • Multicollinearity

Correct Answer: Autocorrelation (serial correlation)
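The statistic itself is simple to compute from a residual series; the residuals below are illustrative, alternating in sign to mimic negative autocorrelation.

```python
# DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2.
# Values near 2 suggest no first-order autocorrelation;
# near 0, positive autocorrelation; near 4, negative.
resid = [0.5, -0.3, 0.4, -0.2, 0.3, -0.4]

num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
den = sum(e ** 2 for e in resid)
dw = num / den                           # > 2 here, hinting at negative autocorrelation
```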

Q10. How are categorical predictors typically included in a regression model?

  • As continuous variables scaled 0–1 without change
  • Using dummy (indicator) variables
  • They cannot be included in regression
  • By converting them to z-scores

Correct Answer: Using dummy (indicator) variables
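A minimal sketch of dummy coding, using hypothetical formulation labels: one indicator column per non-reference level, with the reference level ('A' here) coded as all zeros.

```python
levels = ["B", "C"]                      # non-reference levels
formulation = ["A", "B", "C", "B", "A"]  # hypothetical categorical predictor

# 'A' -> [0, 0], 'B' -> [1, 0], 'C' -> [0, 1]
dummies = [[1 if f == lvl else 0 for lvl in levels] for f in formulation]
```

Each dummy coefficient is then interpreted as the difference from the reference level, holding other predictors constant.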

Q11. What does an interaction term in multiple regression represent?

  • The sum of two predictors
  • The multiplicative effect showing that the effect of one predictor depends on the level of another predictor
  • An error in the model
  • A method to remove collinearity

Correct Answer: The multiplicative effect showing that the effect of one predictor depends on the level of another predictor

Q12. Which test is commonly used to check residuals for normality in small to moderate sample sizes?

  • Shapiro-Wilk test
  • Kolmogorov-Smirnov test with no parameters
  • ANOVA
  • Chi-square goodness-of-fit

Correct Answer: Shapiro-Wilk test

Q13. A 95% confidence interval for a regression slope β1 that does not include zero implies:

  • The slope estimate is biased
  • The corresponding predictor is statistically significant at approximately α = 0.05
  • The predictor has no practical significance
  • The residuals are heteroscedastic

Correct Answer: The corresponding predictor is statistically significant at approximately α = 0.05
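The interval is b₁ ± t·SE(b₁); a sketch with the same made-up data (the critical value 3.182 is the tabulated t(0.975) for df = 3):

```python
import math

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se_b1 = math.sqrt(sse / (n - 2) / sxx)

t_crit = 3.182                           # tabulated t(0.975, df = n - 2 = 3)
lo, hi = b1 - t_crit * se_b1, b1 + t_crit * se_b1
significant = not (lo <= 0 <= hi)        # interval excludes 0 -> significant at ~5%
```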

Q14. What is the purpose of calculating standardized (beta) coefficients in regression?

  • To test normality of residuals
  • To compare the relative importance of predictors measured on different scales
  • To produce categorical predictors
  • To reduce heteroscedasticity

Correct Answer: To compare the relative importance of predictors measured on different scales

Q15. Which measure identifies influential observations that can unduly affect regression estimates?

  • Cook’s distance
  • VIF
  • R-squared
  • Adjusted R-squared

Correct Answer: Cook’s distance

Q16. In pharmacokinetic modeling, applying a log transformation to concentration data is often useful because:

  • It always makes data categorical
  • It stabilizes variance and linearizes exponential decay relationships
  • It increases heteroscedasticity
  • It removes the need to check assumptions

Correct Answer: It stabilizes variance and linearizes exponential decay relationships
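The linearization is easy to see numerically: for first-order decay C = C₀·e^(−kt), ln C = ln C₀ − kt, so a least-squares fit on the log scale recovers −k. The parameter values below are hypothetical.

```python
import math

C0, k = 100.0, 0.5                       # hypothetical PK parameters
t = [0, 1, 2, 3, 4]                      # sampling times
conc = [C0 * math.exp(-k * ti) for ti in t]

logc = [math.log(c) for c in conc]       # exactly linear in t
tbar, lbar = sum(t) / len(t), sum(logc) / len(logc)
slope = (sum((ti - tbar) * (li - lbar) for ti, li in zip(t, logc))
         / sum((ti - tbar) ** 2 for ti in t))
# least-squares slope on the log scale recovers -k
```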

Q17. One criticism of stepwise variable selection methods is that they:

  • Always find the true causal predictors
  • Can produce models that capitalize on random noise and lack reproducibility
  • Are guaranteed to minimize prediction error on new data
  • Do not require hypothesis testing

Correct Answer: Can produce models that capitalize on random noise and lack reproducibility

Q18. Which statement best distinguishes prediction from causation in regression?

  • Good predictive performance implies causation
  • Regression coefficients always indicate causal effects
  • Prediction focuses on accurate forecasts; causal inference requires design or assumptions to support cause-effect claims
  • There is no difference between prediction and causation

Correct Answer: Prediction focuses on accurate forecasts; causal inference requires design or assumptions to support cause-effect claims

Q19. A p-value associated with a regression coefficient indicates:

  • The probability that the null hypothesis is true
  • The probability of observing data at least as extreme as that observed, assuming the null hypothesis is true
  • The magnitude of the effect
  • The sample size required

Correct Answer: The probability of observing data at least as extreme as that observed, assuming the null hypothesis is true

Q20. A Type I error in regression hypothesis testing means:

  • Failing to detect a true effect
  • Incorrectly concluding a predictor has an effect when it does not
  • The model has perfect fit
  • Residuals are normally distributed

Correct Answer: Incorrectly concluding a predictor has an effect when it does not

Q21. For a multiple regression with n observations and k predictors (excluding intercept), the residual degrees of freedom is:

  • n
  • n – 1
  • n – k – 1
  • k

Correct Answer: n – k – 1

Q22. Which effect is expected when multicollinearity among predictors increases?

  • Standard errors of coefficient estimates increase, making inference less precise
  • R-squared decreases dramatically
  • Model always becomes more accurate for prediction
  • Residual variance becomes zero

Correct Answer: Standard errors of coefficient estimates increase, making inference less precise

Q23. Which pattern suggests overfitting when comparing training and test performance?

  • High R-squared on training data but low R-squared on test data
  • Low training error and low test error
  • Equal performance on training and test sets
  • High bias and low variance on training data

Correct Answer: High R-squared on training data but low R-squared on test data

Q24. Why is centering continuous predictors (subtracting the mean) useful when including interaction terms?

  • It eliminates the need for dummy variables
  • It reduces multicollinearity between main effects and interaction terms and eases interpretation
  • It guarantees residual normality
  • It increases VIF values

Correct Answer: It reduces multicollinearity between main effects and interaction terms and eases interpretation
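The collinearity-reduction effect can be demonstrated with a quadratic term, the simplest case of a product term (made-up values; a two-variable interaction behaves analogously):

```python
import math

def corr(u, v):
    """Pearson correlation between two equal-length lists."""
    n = len(u)
    ub, vb = sum(u) / n, sum(v) / n
    cov = sum((a - ub) * (b - vb) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - ub) ** 2 for a in u)
                           * sum((b - vb) ** 2 for b in v))

x = [1, 2, 3, 4, 5]
raw_corr = corr(x, [xi * xi for xi in x])          # main effect vs. product term

xbar = sum(x) / len(x)
xc = [xi - xbar for xi in x]                       # centered predictor
centered_corr = corr(xc, [xi * xi for xi in xc])   # collinearity largely removed
```

On the raw scale the main effect and the product term are almost perfectly correlated; after centering (with these symmetric values) the correlation drops to zero.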

Q25. The null hypothesis for the overall F-test in multiple regression is:

  • All residuals are normally distributed
  • All regression coefficients (except intercept) are zero
  • At least one coefficient is non-zero
  • The model explains 100% variance

Correct Answer: All regression coefficients (except intercept) are zero

Q26. In regression decomposition, the relationship SST = SSR + SSE means:

  • Total sum of squares equals explained sum of squares plus unexplained sum of squares
  • Standard sum of terms equals square root of SSR times SSE
  • Sample size equals sum of squares
  • SST is always less than SSR

Correct Answer: Total sum of squares equals explained sum of squares plus unexplained sum of squares
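The decomposition can be verified numerically on the same made-up simple fit:

```python
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)               # total
ssr = sum((yh - ybar) ** 2 for yh in yhat)            # explained (regression)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # unexplained (error)

assert abs(sst - (ssr + sse)) < 1e-9                  # SST = SSR + SSE holds
```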

Q27. When is multiple regression particularly preferable to simple regression in pharmaceutical studies?

  • When only one predictor is available
  • When researchers want to control for confounding variables and assess independent effects of several predictors
  • When assumptions of linearity are violated
  • When sample size is extremely small (n < 10)

Correct Answer: When researchers want to control for confounding variables and assess independent effects of several predictors

Q28. Which graphical diagnostic is most useful to assess the linearity assumption between predictors and response?

  • Histogram of predictors
  • Residuals vs. fitted values plot
  • Bar chart of categorical counts
  • Pareto chart

Correct Answer: Residuals vs. fitted values plot

Q29. In a multiple regression model, the coefficient for a predictor represents:

  • The unadjusted correlation between predictor and response
  • The expected change in the response for a one-unit change in the predictor, holding other predictors constant
  • The variance explained by that predictor alone
  • The p-value of the predictor

Correct Answer: The expected change in the response for a one-unit change in the predictor, holding other predictors constant

Q30. Between AIC and BIC for model selection, which statement is true?

  • AIC penalizes model complexity more strongly than BIC
  • BIC penalizes model complexity more strongly than AIC, favoring simpler models as sample size increases
  • Both criteria always select the same model
  • Lower BIC indicates worse model fit

Correct Answer: BIC penalizes model complexity more strongly than AIC, favoring simpler models as sample size increases
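For Gaussian OLS the two criteria differ only in the complexity penalty, which this sketch makes explicit (SSE and n are illustrative values):

```python
import math

# Up to an additive constant:
#   AIC = n * ln(SSE / n) + 2 * p
#   BIC = n * ln(SSE / n) + p * ln(n)
# BIC's per-parameter penalty ln(n) exceeds AIC's 2 once n > e^2 (about 7.4),
# so BIC leans toward simpler models as the sample grows.
n, sse, p = 100, 25.0, 4                 # illustrative sample size, fit, parameters
aic = n * math.log(sse / n) + 2 * p
bic = n * math.log(sse / n) + p * math.log(n)
```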
