Welcome to the public research archive of Metanthropic Lab. This repository hosts pre-prints, technical reports, and white papers authored by Ekjot Singh and collaborators.
Our goal is to make our research artifacts openly accessible, supplementing official publication venues and promoting transparency in AI development.
1. The Fragility of Guardrails: Cognitive Jamming and Repetition Collapse in Safety-Steered LLMs
- Author: Ekjot Singh
- Abstract: Investigates the mechanistic underpinnings of safety guardrails in LLMs using Sparse Autoencoders. We identify specific features responsible for "refusal" behaviors and demonstrate "Cognitive Jamming," where over-steering these safety features induces catastrophic "Repetition Collapse," highlighting the brittleness of current alignment paradigms.
- Links: 📄 Read PDF
1. Dataset Distillation for the Pre-Training Era: Cross-Model Generalization via Linear Gradient Matching
- Author: Ekjot Singh
- Abstract: Introduces a dataset distillation approach tailored to the pre-training era, targeting cross-model generalization through linear gradient matching.
- Links: 📄 Read PDF | 💻 Code Repository
2. Revisiting AlexNet: Achieving High-Accuracy on CIFAR-10 with Modern Optimization Techniques
- Author: Ekjot Singh
- Abstract: We revisit the original AlexNet architecture, adapt it for CIFAR-10, and reach 95.7% accuracy by incorporating modern techniques such as Batch Normalization, the Adam optimizer, and advanced regularization.
- Links: 📄 Read PDF | 💻 Code Repository
Metanthropic Lab is focused on pushing the boundaries of machine learning research.
- Website: metanthropic.vercel.app
- Contact: metanthropiclabs@gmail.com
Ekjot Singh
- Email: ekjotmakhija@gmail.com
- Website: https://ekjot.me/
- GitHub: https://github.com/ekjotsinghmakhija
- LinkedIn: https://www.linkedin.com/in/ekjot-singh-153110268/
The source code, datasets, and technical artifacts in this repository are released under the Apache License 2.0, which permits reuse with attribution and includes an explicit patent grant from contributors.
Research papers and documentation are provided for academic and educational purposes. If you use the methodologies or findings presented here, please cite the respective authors and Metanthropic Lab.
For commercial licensing inquiries, partnership opportunities, or usage beyond standard open-source terms, please contact us directly.