
Power analysis to determine the number of lines necessary to code to achieve a certain level of kappa based on the expected base rate and an initial level of agreement observed between the coders


pcla-code/KappaPowerAnalysis


Kappa Power Analysis

This repository provides a Monte Carlo simulation tool for estimating how many lines (items) need to be double-coded to achieve reliable estimates of Cohen’s κ.
It is based on the paper "A Less Overconservative Method for Reliability Estimation for Cohen’s Kappa" by He, M., Baker, R. S., Hutt, S., & Zhang, J. (2024), presented at the Fourth International Conference on Quantitative Ethnography.

Given an expected population κ (POPULATION_KAPPA) and a known total number of lines (POPULATION_SIZE), the code finds the minimum sample size such that, if the true population κ were only POPULATION_KAPPA, the probability of observing a sample κ above a specified threshold (SAMPLE_THRESHOLD) would fall below a chosen tolerance (TARGET_PROB).

In other words, this code identifies a sample size that makes it very unlikely to observe a high sample κ if the true agreement were meaningfully lower, providing evidence that the full dataset κ is not far below the target threshold.
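The procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's actual implementation: it assumes a binary code, builds a finite population whose 2x2 agreement table matches the requested κ and base rates, and draws samples without replacement. The function names (`cell_probabilities`, `sample_kappas`, `exceed_probability`, `minimum_sample_size`) and the `n_sims`, `step`, and `seed` arguments are hypothetical; only the upper-case parameter names come from this README.

```python
import numpy as np

def cell_probabilities(kappa, prev_r1, prev_r2):
    """Joint probabilities [both yes, R1 only, R2 only, both no] for a binary code."""
    p_e = prev_r1 * prev_r2 + (1 - prev_r1) * (1 - prev_r2)  # chance agreement
    p_o = p_e + kappa * (1 - p_e)                            # observed agreement implied by kappa
    p11 = (p_o - 1 + prev_r1 + prev_r2) / 2
    cells = np.array([p11, prev_r1 - p11, prev_r2 - p11, 1 - prev_r1 - prev_r2 + p11])
    if np.any(cells < 0):
        raise ValueError("kappa and base rates are incompatible")
    return cells

def sample_kappas(counts):
    """Cohen's kappa for each row of 2x2 counts [n11, n10, n01, n00]."""
    counts = counts.astype(float)
    n = counts.sum(axis=1)
    p_o = (counts[:, 0] + counts[:, 3]) / n
    r1 = (counts[:, 0] + counts[:, 1]) / n
    r2 = (counts[:, 0] + counts[:, 2]) / n
    denom = 1 - (r1 * r2 + (1 - r1) * (1 - r2))
    safe = np.where(denom > 0, denom, 1.0)
    # Kappa is undefined when both coders use a single category; treat it as 0 here (an assumption).
    return np.where(denom > 0, (p_o - (1 - denom)) / safe, 0.0)

def exceed_probability(pop_counts, n, threshold, n_sims, rng):
    """Monte Carlo estimate of P(sample kappa > threshold) for samples of size n."""
    draws = rng.multivariate_hypergeometric(pop_counts, n, size=n_sims)
    return float(np.mean(sample_kappas(draws) > threshold))

def minimum_sample_size(population_kappa, population_size, sample_threshold,
                        target_prob, prev_r1, prev_r2,
                        n_sims=5000, step=10, seed=0):
    """Smallest n on a grid of `step` whose exceedance probability is below target_prob."""
    rng = np.random.default_rng(seed)
    cells = cell_probabilities(population_kappa, prev_r1, prev_r2)
    counts = np.round(cells * population_size).astype(np.int64)
    counts[np.argmax(counts)] += population_size - counts.sum()  # absorb rounding error
    for n in range(step, population_size + 1, step):
        if exceed_probability(counts, n, sample_threshold, n_sims, rng) < target_prob:
            return n
    return None

# Illustrative values only; they are not taken from the paper or the repository.
n_needed = minimum_sample_size(population_kappa=0.6, population_size=2000,
                               sample_threshold=0.8, target_prob=0.05,
                               prev_r1=0.3, prev_r2=0.3)
print(n_needed)
```

Because the estimate is noisy and sample κ is discrete, the exceedance probability need not fall monotonically with n, so the first grid point below TARGET_PROB is only an approximate answer. Increasing `n_sims` or shrinking `step` trades runtime for precision.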

The simulation also accounts for each coder’s observed base rates (PREV_R1, PREV_R2), allowing realistic modeling of prevalence effects on κ.
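To see why base rates matter, the short sketch below computes κ for the same raw agreement under two different prevalence settings. It assumes a binary code and uses the standard definition κ = (p_o − p_e) / (1 − p_e); the helper name `cohen_kappa` and the numeric values are illustrative assumptions, not part of the repository.

```python
def cohen_kappa(p_o, prev_r1, prev_r2):
    """Cohen's kappa from observed agreement and each coder's base rate (binary code)."""
    p_e = prev_r1 * prev_r2 + (1 - prev_r1) * (1 - prev_r2)  # agreement expected by chance
    return (p_o - p_e) / (1 - p_e)

# Same 90% raw agreement, different base rates.
balanced = cohen_kappa(0.90, prev_r1=0.5, prev_r2=0.5)  # about 0.80
skewed = cohen_kappa(0.90, prev_r1=0.9, prev_r2=0.9)    # about 0.44

print(round(balanced, 2), round(skewed, 2))
```

With skewed base rates, chance agreement is already high, so the same raw agreement yields a much lower κ. This is the prevalence effect that a simulation using PREV_R1 and PREV_R2 can reflect.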
