Record linkage systems and databases MCQs With Answer

Introduction: Record linkage is a core methodology in pharmacoepidemiology and pharmacoeconomics. It combines patient-level information from disparate sources (electronic health records, pharmacy dispensing, insurance claims, registries, and vital statistics) into longitudinal datasets that support drug safety, utilization, and cost-effectiveness studies. Accurate linkage reduces duplication, helps identify true exposure-outcome relationships, and minimizes misclassification bias. These MCQs are designed for M.Pharm students to deepen their understanding of deterministic and probabilistic linkage, blocking strategies, privacy-preserving techniques, evaluation metrics, and the practical impact of linkage errors on pharmacoepidemiologic estimates and decision-making.

Q1. Which of the following best describes probabilistic record linkage as used in pharmacoepidemiologic research?

  • Matching records only when a unique identifier such as national ID is identical
  • Using predefined deterministic rules requiring exact matches on several fields
  • Assigning weighted agreement scores across multiple fields and classifying matches based on thresholds
  • Manually reviewing all record pairs to determine matches

Correct Answer: Assigning weighted agreement scores across multiple fields and classifying matches based on thresholds

Q2. In the Fellegi–Sunter probabilistic linkage model, what do the m and u probabilities represent?

  • m = probability of clerical review; u = probability of deterministic match
  • m = probability fields agree given records are matches; u = probability fields agree given records are non-matches
  • m = missing data proportion; u = unmatched record proportion
  • m = match threshold; u = unmatch threshold

Correct Answer: m = probability fields agree given records are matches; u = probability fields agree given records are non-matches
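Worked sketch for Q2: one common way to use m and u is to turn them into log2 agreement and disagreement weights and sum them across fields. The field names and the m/u values below are hypothetical, chosen only to illustrate the arithmetic; in practice they are estimated from the data.

```python
import math

def agreement_weight(m: float, u: float) -> float:
    """Weight added when a field agrees: log2(m / u)."""
    return math.log2(m / u)

def disagreement_weight(m: float, u: float) -> float:
    """Weight added when a field disagrees: log2((1 - m) / (1 - u))."""
    return math.log2((1 - m) / (1 - u))

# Hypothetical (m, u) values per field, for illustration only.
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.98, 0.02),
    "sex": (0.99, 0.50),
}

def match_score(agreements: dict) -> float:
    """Sum the field weights for one record pair."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PARAMS[field]
        total += agreement_weight(m, u) if agrees else disagreement_weight(m, u)
    return total

score_all_agree = match_score({"surname": True, "birth_year": True, "sex": True})
score_sex_only = match_score({"surname": False, "birth_year": False, "sex": True})
print(f"all fields agree: {score_all_agree:.2f}")
print(f"only sex agrees:  {score_sex_only:.2f}")
```

Agreement on sex adds little weight because u is high (two unrelated people agree on sex about half the time), whereas agreement on surname adds much more.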

Q3. Which blocking strategy helps improve scalability while minimizing missed matches during linkage of large healthcare databases?

  • No blocking: compare every pair of records regardless of size
  • Single strict block on full postal address
  • Multi-pass blocking using different keys such as phonetic surname and birth year
  • Blocking only on the first name exact match

Correct Answer: Multi-pass blocking using different keys such as phonetic surname and birth year
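Worked sketch for Q3: the idea of multi-pass blocking can be illustrated by taking the union of candidate pairs produced by several blocking keys. The records, field names, and the two key functions below are hypothetical; a lowercase surname stands in for a real phonetic encoding to keep the example short.

```python
from collections import defaultdict

def candidate_pairs(left, right, key_funcs):
    """Union of (left_id, right_id) pairs that share a block in any pass."""
    pairs = set()
    for key_func in key_funcs:
        blocks = defaultdict(list)
        for rec in right:
            blocks[key_func(rec)].append(rec["id"])
        for rec in left:
            for right_id in blocks.get(key_func(rec), []):
                pairs.add((rec["id"], right_id))
    return pairs

# Hypothetical records, for illustration only.
clinical = [
    {"id": "c1", "surname": "Smith", "birth_year": 1980, "postcode": "AB1"},
    {"id": "c2", "surname": "Jones", "birth_year": 1975, "postcode": "CD2"},
]
pharmacy = [
    {"id": "p1", "surname": "Smyth", "birth_year": 1980, "postcode": "AB1"},
    {"id": "p2", "surname": "Jones", "birth_year": 1976, "postcode": "CD2"},
    {"id": "p3", "surname": "Brown", "birth_year": 1990, "postcode": "EF3"},
]

def by_surname_and_year(rec):
    return (rec["surname"].lower(), rec["birth_year"])

def by_postcode(rec):
    return rec["postcode"]

single_pass = candidate_pairs(clinical, pharmacy, [by_surname_and_year])
multi_pass = candidate_pairs(clinical, pharmacy, [by_surname_and_year, by_postcode])
all_pairs = len(clinical) * len(pharmacy)
print(len(single_pass), len(multi_pass), all_pairs)
```

In this toy data the single strict pass misses both plausible pairs (a spelling variant and a birth-year discrepancy), while the second pass on postcode recovers them and still compares fewer pairs than the full cross product.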

Q4. Which error type in record linkage occurs when two records belonging to the same individual are not linked?

  • False match (false positive)
  • Missed match (false negative)
  • Clerical match
  • Duplicate false positive

Correct Answer: Missed match (false negative)

Q5. Which metric is most informative about the proportion of identified links that are true matches?

  • Sensitivity
  • Specificity
  • Positive predictive value (PPV)
  • Negligible error rate

Correct Answer: Positive predictive value (PPV)

Q6. Privacy-preserving record linkage (PPRL) often employs which of the following techniques to allow linkage without revealing raw identifiers?

  • Transferring full plaintext identifiers between institutions
  • Use of cryptographic hashes or Bloom filters representing quasi-identifiers
  • Manual exchange of masked spreadsheets
  • Publishing all identifiers in open-access repositories

Correct Answer: Use of cryptographic hashes or Bloom filters representing quasi-identifiers

Q7. When evaluating a linkage algorithm against a gold standard, which combination of metrics would best describe both completeness and correctness of linkage?

  • Specificity and negative predictive value
  • Sensitivity and positive predictive value
  • Clerical effort and runtime
  • Number of blocks and block size

Correct Answer: Sensitivity and positive predictive value
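Worked sketch for Q5 and Q7: given a gold-standard set of true links, sensitivity and PPV can be computed from simple set operations. The link pairs below are hypothetical.

```python
def linkage_metrics(predicted_links, true_links):
    """Sensitivity (completeness) and PPV (correctness) against a gold standard."""
    predicted, truth = set(predicted_links), set(true_links)
    tp = len(predicted & truth)   # links that are true matches
    fp = len(predicted - truth)   # false matches
    fn = len(truth - predicted)   # missed matches
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn, "sensitivity": sensitivity, "ppv": ppv}

# Hypothetical gold standard and algorithm output.
gold = {(1, "a"), (2, "b"), (3, "c"), (4, "d")}
linked = {(1, "a"), (2, "b"), (3, "x")}

metrics = linkage_metrics(linked, gold)
print(metrics)
```

Specificity is rarely informative here because the number of true non-matching pairs is enormous, so it sits close to 1 for almost any algorithm.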

Q8. Among deterministic linkage rules commonly used to link pharmacy claims, which is likely to be the most specific but the least sensitive?

  • Exact match on encrypted national identifier plus exact birthdate
  • Match on surname phonetic encoding and birth year within one year
  • Match on postal code and gender
  • Match on medication dispensing dates within ±7 days

Correct Answer: Exact match on encrypted national identifier plus exact birthdate
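Worked sketch for Q8: the contrast between a strict and a relaxed deterministic rule can be seen on three toy records. The field names (`id_hash`, `dob`, `surname_code`) and the values are hypothetical.

```python
def strict_rule(a, b):
    """Exact agreement on a hashed identifier and the full date of birth."""
    return (
        a["id_hash"] is not None
        and a["id_hash"] == b["id_hash"]
        and a["dob"] == b["dob"]
    )

def relaxed_rule(a, b):
    """Same phonetic surname code and birth years within one year."""
    year_a, year_b = int(a["dob"][:4]), int(b["dob"][:4])
    return a["surname_code"] == b["surname_code"] and abs(year_a - year_b) <= 1

# Hypothetical records, for illustration only.
claim = {"id_hash": "9f2c", "dob": "1980-07-03", "surname_code": "S530"}
same_person_dob_typo = {"id_hash": "9f2c", "dob": "1980-07-30", "surname_code": "S530"}
different_person = {"id_hash": "41ab", "dob": "1981-02-11", "surname_code": "S530"}

for other in (same_person_dob_typo, different_person):
    print(strict_rule(claim, other), relaxed_rule(claim, other))
```

The strict rule rejects the same person because of a single date typo (a missed match), while the relaxed rule accepts both that record and an unrelated person with a similar surname (a false match).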

Q9. How can linkage error specifically bias pharmacoepidemiologic estimates of drug adverse event incidence?

  • Only increases statistical power without affecting incidence estimates
  • Missed linkage of outcome records to exposure records can lead to underestimation of incidence
  • False matches always reduce incidence estimates
  • Linkage errors rarely affect time-to-event analyses

Correct Answer: Missed linkage of outcome records to exposure records can lead to underestimation of incidence
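Worked sketch for Q9: under the simplifying assumptions that there are no false matches and that outcome records are missed independently of everything else, the expected observed incidence is the true incidence multiplied by the linkage sensitivity. The counts below are hypothetical.

```python
def observed_incidence(true_events: int, exposed: int, linkage_sensitivity: float) -> float:
    """Expected incidence when only a fraction of outcome records is linked."""
    expected_linked_events = true_events * linkage_sensitivity
    return expected_linked_events / exposed

# Hypothetical cohort: 200 true adverse events among 10,000 exposed patients.
true_incidence = observed_incidence(200, 10_000, 1.0)
with_missed_links = observed_incidence(200, 10_000, 0.85)
print(f"true: {true_incidence:.4f}, observed at 85% sensitivity: {with_missed_links:.4f}")
```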

Q10. Which preprocessing steps are most important before performing record linkage across clinical and pharmacy datasets?

  • Standardizing name formats, normalizing dates, and addressing missing values
  • Randomly shuffling records to anonymize order only
  • Removing all quasi-identifiers to prevent linkage
  • Converting all fields to binary indicator variables

Correct Answer: Standardizing name formats, normalizing dates, and addressing missing values
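Worked sketch for Q10: a minimal preprocessing pass might standardize names, normalize dates to ISO format, and map blanks to an explicit missing value. The helper names and the accepted date formats are assumptions for illustration; in particular, reading `03/07/1980` as day/month/year is a choice that must be checked against each source system.

```python
import unicodedata
from datetime import datetime

def normalize_name(name):
    """Uppercase, strip accents and punctuation, collapse whitespace."""
    if name is None:
        return None
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = "".join(ch if ch.isalpha() else " " for ch in text)
    cleaned = " ".join(text.upper().split())
    return cleaned or None  # empty strings become explicit missing values

# Assumed input formats: ISO, then day/month/year.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

def normalize_date(value):
    """Return an ISO date string, or None when the value is missing or unparseable."""
    if value is None or not value.strip():
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

print(normalize_name("  O'Brien-José "), normalize_date("03/07/1980"))
```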

Q11. In the context of record linkage, what is “blocking” primarily intended to achieve?

  • To anonymize identifiers using cryptography
  • To partition data into smaller candidate sets to reduce pairwise comparisons
  • To ensure that only exact matches are considered
  • To merge datasets after linkage is complete

Correct Answer: To partition data into smaller candidate sets to reduce pairwise comparisons

Q12. Which software tool is specifically designed for probabilistic record linkage and is often used in public health research?

  • SPSS Statistics
  • Link Plus
  • Microsoft Excel
  • PowerPoint

Correct Answer: Link Plus

Q13. What is a primary advantage of linking pharmacy dispensing data with hospitalization records for pharmacoepidemiology studies?

  • It eliminates the need for outcome adjudication
  • It allows assessment of medication exposure prior to clinical events and improves outcome ascertainment
  • It guarantees 100% capture of all adverse events
  • It removes all confounding by indication

Correct Answer: It allows assessment of medication exposure prior to clinical events and improves outcome ascertainment

Q14. Which of the following describes clerical review in the linkage process?

  • Fully automated deterministic matching with no human input
  • Manual inspection of uncertain pairs to decide match status
  • Generating blocking keys automatically
  • Encrypting identifiers before hashing

Correct Answer: Manual inspection of uncertain pairs to decide match status

Q15. When setting thresholds in probabilistic linkage, what trade-off does lowering the match threshold create?

  • Increases specificity and decreases sensitivity
  • Increases sensitivity but may decrease positive predictive value due to more false matches
  • Reduces clerical workload automatically with no accuracy change
  • Guarantees fewer missed matches with no increase in false matches

Correct Answer: Increases sensitivity but may decrease positive predictive value due to more false matches
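Worked sketch for Q15: the trade-off can be seen by applying two thresholds to the same scored pairs. The scores and true match statuses below are hypothetical.

```python
# Hypothetical (match score, is_true_match) pairs, for illustration only.
SCORED_PAIRS = [
    (12.1, True), (9.4, True), (7.8, True), (6.5, False),
    (5.2, True), (3.9, False), (2.0, False), (1.1, True),
]

def sensitivity_and_ppv(scored_pairs, threshold):
    """Classify pairs with score >= threshold as links and evaluate them."""
    linked = [is_match for score, is_match in scored_pairs if score >= threshold]
    total_true = sum(is_match for _, is_match in scored_pairs)
    true_links = sum(linked)
    sensitivity = true_links / total_true
    ppv = true_links / len(linked) if linked else float("nan")
    return sensitivity, ppv

high_threshold = sensitivity_and_ppv(SCORED_PAIRS, threshold=7.0)
low_threshold = sensitivity_and_ppv(SCORED_PAIRS, threshold=3.0)
print("threshold 7.0:", high_threshold)
print("threshold 3.0:", low_threshold)
```

Lowering the threshold from 7.0 to 3.0 picks up one more true match but also admits two false matches, so sensitivity rises while PPV falls.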

Q16. Which field combination is most robust for linking records when national identifiers are absent but pharmacy and clinical datasets are available?

  • First name only and medication name
  • Surname phonetic code, date of birth, and gender
  • Medication dose and insurance plan ID only
  • Admission diagnosis text only

Correct Answer: Surname phonetic code, date of birth, and gender
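Worked sketch for Q16: a phonetic code such as American Soundex lets spelling variants of a surname share the same key. The implementation below is a compact version of the usual Soundex rules, and the `link_key` helper is a hypothetical name introduced for illustration.

```python
SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit

def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    encoded = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != previous:
            encoded += code
        previous = code  # vowels reset the run, so a repeated code is kept
    return (encoded + "000")[:4]

def link_key(surname: str, date_of_birth: str, gender: str):
    """Composite key: phonetic surname code, date of birth, and gender."""
    return (soundex(surname), date_of_birth, gender.upper())

print(link_key("Smith", "1980-07-03", "f"))
print(link_key("Smyth", "1980-07-03", "F"))
```

Both spellings produce the same key, so the pair would be compared or linked even without a national identifier.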

Q17. What role does sampling-based clerical review play in large-scale linkage projects?

  • It is unnecessary if blocking is used
  • It provides an estimate of linkage error rates and helps calibrate thresholds without reviewing all pairs
  • It increases the number of false positives intentionally
  • It replaces the need for any probabilistic modeling

Correct Answer: It provides an estimate of linkage error rates and helps calibrate thresholds without reviewing all pairs
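Worked sketch for Q17: one way to use a review sample is to draw a random subset of accepted links, have reviewers judge each one, and report the estimated PPV with a Wilson score interval. The link list and the reviewer decisions below are hypothetical.

```python
import math
import random

def sample_for_review(links, n, seed=0):
    """Draw a simple random sample of links for manual review."""
    rng = random.Random(seed)
    return rng.sample(links, min(n, len(links)))

def estimate_ppv(review_decisions, z=1.96):
    """Point estimate and Wilson score interval for the share of true matches."""
    n = len(review_decisions)
    p = sum(review_decisions) / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, centre - half_width, centre + half_width

# Hypothetical accepted links and reviewer decisions (True = confirmed match).
candidate_links = [(f"c{i}", f"p{i}") for i in range(1000)]
review_sample = sample_for_review(candidate_links, 100)
decisions = [True] * 92 + [False] * 8

ppv, lower, upper = estimate_ppv(decisions)
print(f"estimated PPV {ppv:.2f} (95% interval {lower:.3f} to {upper:.3f})")
```

Reviewing 100 pairs instead of all 1,000 gives an error-rate estimate whose uncertainty is explicit, which can then guide whether the threshold needs adjusting.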

Q18. In privacy-preserving record linkage, what advantage do Bloom filters offer over plain hashing of identifiers?

  • Bloom filters reveal raw identifiers but compress them
  • Bloom filters allow approximate matching on tokenized identifiers without disclosing exact values
  • Bloom filters are reversible and thus less private than hashing
  • Bloom filters prevent any matching by design

Correct Answer: Bloom filters allow approximate matching on tokenized identifiers without disclosing exact values
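Worked sketch for Q6 and Q18: a Bloom filter encoding can be illustrated by hashing the character bigrams of an identifier into bit positions with a keyed hash and comparing encodings with the Dice coefficient. The secret key, filter size, and number of hash functions below are arbitrary illustrative choices, not recommended settings.

```python
import hashlib
import hmac

SECRET_KEY = b"shared-secret-for-illustration"  # hypothetical key agreed by data holders

def bigrams(value: str) -> set:
    """Padded character bigrams, e.g. 'smith' -> {'_s', 'sm', 'mi', ...}."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def bloom_encode(value: str, size: int = 1024, num_hashes: int = 4) -> set:
    """Set of bit positions switched on for the value's bigrams."""
    positions = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            message = f"{k}|{gram}".encode("utf-8")
            digest = hmac.new(SECRET_KEY, message, hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % size)
    return positions

def dice_similarity(a: set, b: set) -> float:
    """Dice coefficient between two sets of bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def plain_hash(value: str) -> str:
    """Keyed hash of the whole value: any typo changes the entire digest."""
    return hmac.new(SECRET_KEY, value.lower().encode("utf-8"), hashlib.sha256).hexdigest()

smith, smyth, brown = (bloom_encode(n) for n in ("Smith", "Smyth", "Brown"))
print(plain_hash("Smith") == plain_hash("Smyth"))
print(dice_similarity(smith, smyth), dice_similarity(smith, brown))
```

With plain hashing the two spellings simply fail to match, whereas the Bloom filter encodings of similar names stay similar. Bloom filter encodings are not immune to re-identification attacks, so they are usually combined with other safeguards.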

Q19. In evaluating whether linked data can be used to estimate medication adherence over time, which linkage property is most critical?

  • High specificity only
  • High longitudinal linkage continuity (consistent person-level linkage across time)
  • Small block sizes regardless of match quality
  • High rate of clerical rejections

Correct Answer: High longitudinal linkage continuity (consistent person-level linkage across time)

Q20. Which statement best describes the effect of misclassification due to linkage error on comparative safety studies (e.g., hazard ratios)?

  • Random missed matches always bias hazard ratios away from the null
  • Differential linkage error by exposure or outcome can induce bias in either direction, potentially altering conclusions
  • Linkage error has no impact if sample size is large
  • False matches always lead to conservative estimates of association

Correct Answer: Differential linkage error by exposure or outcome can induce bias in either direction, potentially altering conclusions
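Worked sketch for Q20: the direction of bias can be illustrated with a risk ratio, used here as a simpler stand-in for a hazard ratio. The sketch assumes there are no false matches and that the only error is missed outcome links, so each group's observed risk is its true risk multiplied by that group's linkage sensitivity. All numbers are hypothetical.

```python
def observed_risk_ratio(risk_exposed, risk_unexposed, sens_exposed, sens_unexposed):
    """Risk ratio after outcome links are missed at group-specific rates."""
    observed_exposed = risk_exposed * sens_exposed
    observed_unexposed = risk_unexposed * sens_unexposed
    return observed_exposed / observed_unexposed

# Hypothetical true risks giving a true risk ratio of 2.0.
TRUE_RISKS = (0.04, 0.02)

non_differential = observed_risk_ratio(*TRUE_RISKS, 0.8, 0.8)
toward_null = observed_risk_ratio(*TRUE_RISKS, 0.6, 0.9)
away_from_null = observed_risk_ratio(*TRUE_RISKS, 0.9, 0.6)
print(non_differential, toward_null, away_from_null)
```

Under these assumptions equal linkage sensitivity in both groups leaves the ratio unchanged, while poorer linkage among the exposed pulls it toward the null and poorer linkage among the unexposed pushes it away.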
