- Pittsburgh, PA
- https://yu-shi.github.io/
- https://orcid.org/0000-0001-6335-1076
Highlights
- Pro
Starred repositories
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
verl: Volcano Engine Reinforcement Learning for LLMs
Textbook on reinforcement learning from human feedback
Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"
A collection of LogitsProcessors to customize and enhance LLM behavior for specific tasks.
Lime: Explaining the predictions of any machine learning classifier
Open source code of the paper: "OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain"
Helper for managing arXiv papers in Zotero
commoncrawl / nutch
Forked from Aloisius/nutch
Common Crawl fork of Apache Nutch
Universal Python binding for the LMDB 'Lightning' Database
A library that provides an embeddable, persistent key-value store for fast storage.
GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs
A scalable, mature and versatile web crawler based on Apache Storm
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper
Zstandard - Fast real-time compression algorithm
Puzzles for learning Triton, play it with minimal environment configuration!
A Visual Studio Code extension with support for the Ruff linter.
Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024]
This is the code repo for the paper "RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards".
The repository for the code of the UltraFastBERT paper
Acceptance rates for the major AI conferences