    Repositories list

    • docetl

      Public
      A system for agentic LLM-powered data processing and ETL
      Python
      2542.6k246 Updated Aug 1, 2025
    • TWIX

      Public
      TWIX is an open-source data extraction tool that reconstructs structured data from documents at scale, accurately and at low cost, by inferring the shared underlying visual template across documents
      Python
      1220031 Updated May 29, 2025
    • BARGAIN

      Public
      Low-Cost LLM-Powered Data Processing with Theoretical Guarantees
      Python
      32000 Updated May 1, 2025
    • Examples of docetl pipelines
      Python
      0200 Updated Apr 22, 2025
    • Parse PDFs using computer vision, layout analysis, and other state-of-the-art document intelligence techniques. WebApp implemented in Flask/Jinja2 with infer and train pipelines managed by FlorDB
      JavaScript
      1800 Updated Jul 26, 2024
    • Introduction to Flordb with PyTorch and TensorFlow
      Jupyter Notebook
      0000 Updated Apr 9, 2024