I am interested in the mechanistic interpretability of LLM components. I am also interested in building robust detectors of AI-generated content and investigating the fundamental differences between human-written and LLM-generated texts.
📝 My publications and preprints
- Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts
- Listening to the Wise Few: Select-and-Copy Attention Heads for Multiple-Choice QA
- Quantifying Logical Consistency in Transformers via Query-Key Alignment
- Leveraging Transfer Learning for Detecting Boundaries of Machine-Generated Texts