This repository provides a Monte Carlo simulation tool for estimating how many lines (items) need to be double-coded to achieve reliable estimates of Cohen’s κ.
It is based on the paper “A Less Overconservative Method for Reliability Estimation for Cohen’s Kappa” by He, M., Baker, R. S., Hutt, S., & Zhang, J. (2024), presented at the Fourth International Conference on Quantitative Ethnography.
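For reference, the quantity being estimated is the standard Cohen’s κ. For a binary code, let p_o be the proportion of lines on which the two coders agree and p_1, p_2 the proportion of lines each coder marks as present. Chance agreement p_e is computed from those base rates, so prevalence directly shapes κ:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_e = p_1 p_2 + (1 - p_1)(1 - p_2)
```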
Given an expected population κ (POPULATION_KAPPA) and a known total number of lines (POPULATION_SIZE), the code determines the minimum sample size such that, if the true population κ were only POPULATION_KAPPA, the probability of observing a sample κ above a specified threshold (SAMPLE_THRESHOLD) would be below a chosen tolerance (TARGET_PROB).
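Stated compactly, using notation introduced here only for illustration (N = POPULATION_SIZE, κ_pop = POPULATION_KAPPA, κ_thr = SAMPLE_THRESHOLD, α = TARGET_PROB, and κ̂_n the κ computed on n lines drawn at random from the population), the search looks for:

```latex
n^{*} = \min\left\{ n \le N \;:\; \Pr\!\left(\hat{\kappa}_n > \kappa_{\mathrm{thr}} \,\middle|\, \kappa = \kappa_{\mathrm{pop}}\right) < \alpha \right\}
```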
In other words, the code identifies a sample size at which a high sample κ would be very unlikely if the true agreement were meaningfully lower. If the observed sample κ then exceeds SAMPLE_THRESHOLD, this is evidence that the full-dataset κ is above POPULATION_KAPPA, and therefore not far below the target threshold.
The simulation also accounts for each coder’s observed base rate (PREV_R1, PREV_R2), that is, the proportion of lines each coder marks as positive, so that it can model realistically how prevalence affects κ.
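To make the procedure concrete, here is a minimal Python sketch of the whole pipeline under stated assumptions. It is not the repository’s code: the function names (`cohens_kappa`, `build_population`, `prob_exceeds`, `min_sample_size`) and the `n_sims`, `step`, and `seed` parameters are hypothetical. It assumes a binary code, builds a fixed population whose 2×2 agreement table matches POPULATION_KAPPA and the base rates PREV_R1 and PREV_R2 up to rounding, draws samples without replacement, and scans sample sizes in increments of `step`, which presumes the exceedance probability falls as n grows. Samples where κ is undefined (both coders give one identical constant code) are counted as not exceeding the threshold; this is also an assumption.

```python
import random


def cohens_kappa(r1, r2):
    """Cohen's kappa for two equal-length lists of binary (0/1) codes."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n  # observed agreement
    p1, p2 = sum(r1) / n, sum(r2) / n              # each coder's base rate
    p_e = p1 * p2 + (1 - p1) * (1 - p2)            # chance agreement
    if p_e >= 1.0:
        return float("nan")  # undefined: both coders used one identical constant code
    return (p_o - p_e) / (1 - p_e)


def build_population(kappa, p1, p2, size):
    """Hypothetical helper: a fixed list of (coder1, coder2) pairs whose 2x2
    table matches the given kappa and base rates up to rounding."""
    p_e = p1 * p2 + (1 - p1) * (1 - p2)
    p_o = kappa * (1 - p_e) + p_e      # invert kappa = (p_o - p_e) / (1 - p_e)
    both1 = (p_o - 1 + p1 + p2) / 2    # P(1,1), because p_o = P(1,1) + P(0,0)
    # Cell probabilities in the order (1,1), (1,0), (0,1), (0,0).
    cells = [both1, p1 - both1, p2 - both1, 1 - p1 - p2 + both1]
    if min(cells) < 0:
        raise ValueError("kappa is not attainable with these base rates")
    counts = [round(c * size) for c in cells[:3]]
    counts.append(size - sum(counts))  # absorb rounding error into the (0,0) cell
    if counts[3] < 0:
        raise ValueError("rounding produced a negative cell count")
    pairs = [(1, 1), (1, 0), (0, 1), (0, 0)]
    return [pair for pair, c in zip(pairs, counts) for _ in range(c)]


def prob_exceeds(pop, n, threshold, n_sims, rng):
    """Share of simulated size-n samples (without replacement) with kappa > threshold."""
    hits = 0
    for _ in range(n_sims):
        sample = rng.sample(pop, n)
        k = cohens_kappa([a for a, _ in sample], [b for _, b in sample])
        if k > threshold:  # NaN compares False, so undefined kappa is not a hit
            hits += 1
    return hits / n_sims


def min_sample_size(population_kappa, population_size, sample_threshold,
                    target_prob, prev_r1, prev_r2, n_sims=1000, step=10, seed=0):
    """Smallest scanned n whose simulated exceedance probability is below target_prob."""
    rng = random.Random(seed)
    pop = build_population(population_kappa, prev_r1, prev_r2, population_size)
    sizes = list(range(step, population_size, step)) + [population_size]
    for n in sizes:
        if prob_exceeds(pop, n, sample_threshold, n_sims, rng) < target_prob:
            return n
    return None  # no scanned size met the tolerance


if __name__ == "__main__":
    # Illustrative values only: population kappa 0.5, 1000 lines, threshold 0.6.
    n = min_sample_size(0.5, 1000, 0.6, 0.05, 0.3, 0.3, n_sims=200, step=25)
    print("Minimum sample size:", n)
```

The linear scan keeps the sketch easy to follow; for a large population, a bisection over n would need far fewer simulations, at the cost of leaning more heavily on the monotonicity assumption. Increasing `n_sims` reduces Monte Carlo noise in the estimated probability near TARGET_PROB.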