61 changes: 43 additions & 18 deletions .github/workflows/ci.yaml
@@ -50,10 +50,15 @@ jobs:
with:
python-version: "3.11"

- name: Install format tools
- name: Install uv
uses: astral-sh/setup-uv@v5
with:
version: "latest"
enable-cache: true

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -e ".[dev]"
uv pip install --system -e ".[dev]"

- name: Check formatting
id: format-check
@@ -102,11 +107,15 @@ jobs:
with:
python-version: ${{ matrix.python-version }}

- name: Install uv
uses: astral-sh/setup-uv@v5
with:
version: "latest"

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install tox
pip install -e ".[dev,test]"
uv pip install --system tox
uv pip install --system -e ".[dev,test]"

- name: Run unit tests and linting
run: |
@@ -132,11 +141,15 @@ jobs:
with:
python-version: "3.11"

- name: Install uv
uses: astral-sh/setup-uv@v5
with:
version: "latest"

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install tox
pip install -e ".[dev,test]"
uv pip install --system tox
uv pip install --system -e ".[dev,test]"

- name: Run live API tests
env:
@@ -189,10 +202,14 @@ jobs:
with:
python-version: "3.11"

- name: Install uv
uses: astral-sh/setup-uv@v5
with:
version: "latest"

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install tox
uv pip install --system tox

- name: Run plugin smoke test
run: tox -e plugin-smoke
@@ -249,10 +266,14 @@ jobs:
- name: Pull gemma2 model
run: docker exec ollama ollama pull gemma2:2b || true

- name: Install uv
uses: astral-sh/setup-uv@v5
with:
version: "latest"

- name: Install tox
run: |
python -m pip install --upgrade pip
pip install tox
uv pip install --system tox

- name: Run Ollama integration tests
run: tox -e ollama-integration
@@ -331,11 +352,15 @@ jobs:
with:
python-version: "3.11"

- name: Install uv
uses: astral-sh/setup-uv@v5
with:
version: "latest"

- name: Install format tools
run: |
python -m pip install --upgrade pip
# Install formatter tools with pinned versions
pip install pyink==24.3.0 isort==5.13.2 lint-imports==0.3.1
uv pip install --system pyink==24.3.0 isort==5.13.2 lint-imports==0.3.1

- name: Validate PR formatting
run: |
@@ -394,9 +419,9 @@ jobs:
echo "::notice::Live API tests skipped - API keys not configured"
exit 0
fi
python -m pip install --upgrade pip
pip install tox
pip install -e ".[dev,test]"
# uv is already installed and in PATH from previous step
uv pip install --system tox
uv pip install --system -e ".[dev,test]"
GEMINI_API_KEY="${{ secrets.GEMINI_API_KEY }}" \
LANGEXTRACT_API_KEY="${{ secrets.GEMINI_API_KEY }}" \
OPENAI_API_KEY="${{ secrets.OPENAI_API_KEY }}" \
4 changes: 1 addition & 3 deletions .pylintrc
@@ -19,9 +19,7 @@
# number of processors available to use.
jobs=0

# When enabled, pylint would attempt to guess common misconfiguration and emit
# user-friendly hints instead of false-positive error messages.
suggestion-mode=yes


# Pickle collected data for later comparisons.
persistent=yes
8 changes: 5 additions & 3 deletions CONTRIBUTING.md
@@ -38,7 +38,7 @@ Please provide as much detail as possible to help us understand and address your

### 1. Development Setup

To get started, clone the repository and install the necessary dependencies for development and testing. Detailed instructions can be found in the [Installation from Source](https://github.com/google/langextract#from-source) section of the `README.md`.
To get started, clone the repository and install the necessary dependencies for development and testing. We recommend using `uv` for faster and more reliable dependency management, but `pip` is also supported. Detailed instructions can be found in the [Installation from Source](https://github.com/google/langextract#from-source) section of the `README.md`.

**Windows Users**: The formatting scripts use bash. Please use one of:
- Git Bash (comes with Git for Windows)
@@ -72,8 +72,8 @@ formatting virtual environments or other non-source directories.
For automatic formatting checks before each commit:

```bash
# Install pre-commit
pip install pre-commit
# Install pre-commit (using uv)
uv pip install pre-commit

# Install the git hooks
pre-commit install
@@ -105,6 +105,8 @@ For full testing across Python versions:
tox # runs pylint + pytest on Python 3.10 and 3.11
```

If you have `uv` installed, tox will automatically use it for faster virtual environment creation and dependency installation.
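
The "use `uv` when it is available, otherwise fall back to `pip`" idea can be sketched in a few lines of Python. This is only an illustration, not how tox or LangExtract selects an installer: the helper name `pick_installer` is hypothetical, and the sketch only builds a command without running it.

```python
import shutil
import sys
from typing import List


def pick_installer() -> List[str]:
  """Return the argv prefix for installing packages (hypothetical helper)."""
  if shutil.which("uv"):
    # uv is on PATH: use its pip-compatible interface.
    return ["uv", "pip", "install"]
  # Fall back to the pip that belongs to the running interpreter.
  return [sys.executable, "-m", "pip", "install"]


if __name__ == "__main__":
  # Print the command that would install pre-commit; nothing is executed.
  print(" ".join(pick_installer() + ["pre-commit"]))
```

Building an argument list rather than a shell string keeps the choice easy to inspect before it is handed to `subprocess.run`.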

### 5. Adding Custom Model Providers

If you want to add support for a new LLM provider, please refer to the [Provider System Documentation](langextract/providers/README.md). The recommended approach is to create an external plugin package rather than modifying the core library. This allows for:
5 changes: 4 additions & 1 deletion Dockerfile
@@ -4,8 +4,11 @@ FROM python:3.10-slim
# Set working directory
WORKDIR /app

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

# Install LangExtract from PyPI
RUN pip install --no-cache-dir langextract
RUN uv pip install --system --no-cache langextract

# Set default command
CMD ["python"]
38 changes: 35 additions & 3 deletions README.md
@@ -162,12 +162,26 @@ See an example of the Vertex AI Batch API usage in [this example](docs/examples/

### From PyPI

**Using uv (Recommended)**:
```bash
uv pip install langextract
```

**Using pip**:
```bash
pip install langextract
```

*Recommended for most users. For isolated environments, consider using a virtual environment:*
*For isolated environments, consider using a virtual environment:*

**With uv**:
```bash
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv pip install langextract
```

**With pip**:
```bash
python -m venv langextract_env
source langextract_env/bin/activate # On Windows: langextract_env\Scripts\activate
@@ -176,11 +190,29 @@ pip install langextract

### From Source

LangExtract uses modern Python packaging with `pyproject.toml` for dependency management:
### From Source

LangExtract uses modern Python packaging with `pyproject.toml` for dependency management.

**Using uv (Recommended)**:
```bash
git clone https://github.com/google/langextract.git
cd langextract

*Installing with `-e` puts the package in development mode, allowing you to modify the code without reinstalling.*
# For basic installation:
uv pip install -e .

# For development (includes linting tools):
uv pip install -e ".[dev]"

# For testing (includes pytest):
uv pip install -e ".[test]"

# To sync all dependencies with lock file:
uv sync --all-extras
```

**Using pip**:
```bash
git clone https://github.com/google/langextract.git
cd langextract
4 changes: 2 additions & 2 deletions langextract/providers/__init__.py
@@ -42,8 +42,8 @@
]

# Track provider loading for lazy initialization
_plugins_loaded = False
_builtins_loaded = False
_plugins_loaded = False # pylint: disable=invalid-name
_builtins_loaded = False # pylint: disable=invalid-name


def load_builtins_once() -> None:
138 changes: 138 additions & 0 deletions scripts/benchmark_install.py
@@ -0,0 +1,138 @@
import os
from pathlib import Path
import shutil
import subprocess
import sys
import time
from typing import Dict, Optional


def run_cmd(cmd: str, cwd: Path = Path("."), desc: str = "") -> Optional[float]:
  """
  Execute a shell command and measure its execution time.

  Args:
    cmd: The shell command to execute.
    cwd: Current working directory for the command.
    desc: Description of the task for logging.

  Returns:
    Duration in seconds if successful, None if failed.
  """
  print(f"Running: {desc}...")
  start_time = time.time()
  try:
    subprocess.run(
        cmd,
        cwd=cwd,
        shell=True,  # Kept as True for complex chaining (source && pip)
        # Alternatively, we could call the venv binary directly, but 'source' is idiomatic for venvs.
        executable="/bin/bash" if sys.platform != "win32" else None,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
  except subprocess.CalledProcessError:
    print(f"Error running command: {cmd}")
    return None

  duration = time.time() - start_time
  print(f" -> Time: {duration:.2f}s")
  return duration


def setup_venv(path: Path) -> None:
  """Create a fresh virtual environment at the specified path."""
  if path.exists():
    shutil.rmtree(path)
  subprocess.run([sys.executable, "-m", "venv", str(path)], check=True)


def benchmark() -> None:
  """Run the complete pip vs uv benchmark suite."""
  base_dir = Path.cwd()
  venv_pip = base_dir / "bench_env_pip"
  venv_uv = base_dir / "bench_env_uv"

  results: Dict[str, Optional[float]] = {}

  print("=== Starting Benchmark: pip vs uv ===\n")

  # --- Scenario 1: pip (Cold) ---
  setup_venv(venv_pip)
  pip_cmd = (
      f"source {venv_pip}/bin/activate && pip install .[all,dev,test]"
      " --no-cache-dir"
  )
  results["pip_cold"] = run_cmd(pip_cmd, desc="pip install (Cold Cache)")

  # --- Scenario 2: pip (Warm) ---
  setup_venv(venv_pip)
  # Relies on an already-populated ~/.cache/pip; the cold run above bypassed it.
  pip_cmd_warm = (
      f"source {venv_pip}/bin/activate && pip install .[all,dev,test]"
  )
  results["pip_warm"] = run_cmd(pip_cmd_warm, desc="pip install (Warm Cache)")

  # --- Scenario 3: uv pip (Cold) ---
  setup_venv(venv_uv)
  # --no-cache enforces a clean install for this command
  uv_cmd = (
      f"source {venv_uv}/bin/activate && uv pip install .[all,dev,test]"
      " --no-cache"
  )
  results["uv_cold"] = run_cmd(uv_cmd, desc="uv pip install (Cold Cache)")

  # --- Scenario 4: uv pip (Warm) ---
  setup_venv(venv_uv)
  # Warm run uses the default uv cache (not populated by the --no-cache run)
  uv_cmd_warm = (
      f"source {venv_uv}/bin/activate && uv pip install .[all,dev,test]"
  )
  results["uv_warm"] = run_cmd(uv_cmd_warm, desc="uv pip install (Warm Cache)")

  # --- Scenario 5: uv sync (Cold-ish) ---
  # uv sync creates its own managed .venv
  uv_venv = base_dir / ".venv"
  if uv_venv.exists():
    shutil.rmtree(uv_venv)

  uv_sync_cmd = "uv sync --all-extras --no-cache"
  results["uv_sync_cold"] = run_cmd(uv_sync_cmd, desc="uv sync (Cold Cache)")

  # --- Scenario 6: uv sync (Warm) ---
  if uv_venv.exists():
    shutil.rmtree(uv_venv)

  uv_sync_warm = "uv sync --all-extras"
  results["uv_sync_warm"] = run_cmd(uv_sync_warm, desc="uv sync (Warm Cache)")

  # Cleanup
  if venv_pip.exists():
    shutil.rmtree(venv_pip)
  if venv_uv.exists():
    shutil.rmtree(venv_uv)

  print("\n=== Benchmark Results ===")
  print("| Method | Scenario | Time (s) | Speedup vs pip |")
  print("|--------|----------|----------|----------------|")

  pip_c = results.get("pip_cold")
  # Fall back to 0.0 so the speedup column shows "-" if the pip baseline failed
  baseline = pip_c if pip_c is not None else 0.0

  for key, val in results.items():
    if val is None:
      val_display = "Failed"
      speedup = "-"
    else:
      val_display = f"{val:.2f}"
      speedup = f"{baseline / val:.1f}x" if val > 0 and baseline > 0 else "-"

    scenario = key.replace("_", " ").title()
    method_name = key.split("_")[0]
    print(f"| {method_name} | {scenario} | {val_display} | {speedup} |")


if __name__ == "__main__":
  benchmark()
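
The comment in `run_cmd` mentions calling the venv binary directly as an alternative to `source ... && pip`. A minimal sketch of that alternative follows, assuming the standard venv layout (`bin/` on POSIX, `Scripts\` on Windows). The helper names `venv_python`, `pip_install_argv` and `uv_install_argv` are hypothetical and not part of this PR; the uv variant relies on the `--python` option of `uv pip install` to target the environment.

```python
import os
from pathlib import Path
from typing import List


def venv_python(venv_dir: Path) -> Path:
  """Return the interpreter path inside a venv (hypothetical helper)."""
  if os.name == "nt":
    return venv_dir / "Scripts" / "python.exe"
  return venv_dir / "bin" / "python"


def pip_install_argv(venv_dir: Path, target: str) -> List[str]:
  """Build a pip command that installs into venv_dir without activation."""
  return [str(venv_python(venv_dir)), "-m", "pip", "install", target]


def uv_install_argv(venv_dir: Path, target: str) -> List[str]:
  """Build a uv command that targets venv_dir through uv's --python option."""
  return ["uv", "pip", "install", "--python", str(venv_python(venv_dir)), target]


if __name__ == "__main__":
  # Only print the commands; pass them to subprocess.run(argv, check=True).
  print(pip_install_argv(Path("bench_env_pip"), ".[all,dev,test]"))
  print(uv_install_argv(Path("bench_env_uv"), ".[all,dev,test]"))
```

Passing an argument list avoids `shell=True` and the dependency on `/bin/bash`, so the same benchmark could in principle run on Windows, where `source` is not available.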