Metanthropic - Research Papers

🔬 Overview

Welcome to the public research archive of Metanthropic Lab. This repository hosts pre-prints, technical reports, and white papers authored by Ekjot Singh and collaborators.

Our goal is to make our research artifacts openly accessible, supplementing official publication venues and promoting transparency in AI development.


📚 Publications & Pre-prints

2025

Computation and Language (cs.CL)

1. The Fragility of Guardrails: Cognitive Jamming and Repetition Collapse in Safety-Steered LLMs

  • Author: Ekjot Singh
  • Abstract: Investigates the mechanistic underpinnings of safety guardrails in LLMs using Sparse Autoencoders. We identify specific features responsible for "refusal" behaviors and demonstrate "Cognitive Jamming," where over-steering these safety features induces catastrophic "Repetition Collapse," highlighting the brittleness of current alignment paradigms.
  • Links: 📄 Read PDF

Computer Vision (cs.CV)

1. Dataset Distillation for the Pre-Training Era: Cross-Model Generalization via Linear Gradient Matching

  • Author: Ekjot Singh
  • Abstract: Introduces a novel dataset distillation approach for the pre-training era that targets cross-model generalization through linear gradient matching.
  • Links: 📄 Read PDF | 💻 Code Repository

2. Revisiting AlexNet: Achieving High-Accuracy on CIFAR-10 with Modern Optimization Techniques

  • Author: Ekjot Singh
  • Abstract: We revisit the original AlexNet architecture, adapting it for CIFAR-10 and achieving 95.7% accuracy by incorporating modern techniques such as Batch Normalization, the Adam optimizer, and advanced regularization.
  • Links: 📄 Read PDF | 💻 Code Repository

🌐 About the Lab

Metanthropic Lab is focused on pushing the boundaries of machine learning research.

Founder & Director, Principal Researcher

Ekjot Singh


⚖️ License & Citation

The source code, datasets, and technical artifacts in this repository are released under the Apache License 2.0, which permits reuse with attribution and includes an explicit patent grant from contributors.

Research papers and documentation are provided for academic and educational purposes. If you use the methods or findings presented here, please cite the respective authors and Metanthropic Lab.

For commercial licensing inquiries, partnership opportunities, or usage beyond standard open-source terms, please contact us directly.
