56 commits
2ce2399
docs(pypi): Improve README display and badge reliability
aksg87 Jul 22, 2025
4fe7580
feat: add trusted publishing workflow and prepare v1.0.0 release
aksg87 Jul 22, 2025
e696a48
Fix: Resolve libmagic ImportError (#6)
aksg87 Aug 1, 2025
5447637
docs: clarify output_dir behavior in medication_examples.md
kleeena Aug 1, 2025
9c47b34
Merge pull request #11 from google/fix/libmagic-dependency-issue
aksg87 Aug 1, 2025
175e075
Removed inline comment in medication example
kleeena Aug 2, 2025
9472099
Merge pull request #15 from kleeena/docs/update-medication_examples.md
aksg87 Aug 2, 2025
e6c3dcd
docs: add output_dir="." to all save_annotated_documents examples
aksg87 Aug 2, 2025
1fb1f1d
Merge pull request #17 from google/fix/output-dir-consistency
aksg87 Aug 2, 2025
7905f93
Fix typo in Ollama API parameter name
Mirza-Samad-Ahmed-Baig Aug 2, 2025
06afc9c
Fix security vulnerability and bugs in Ollama API integration
Mirza-Samad-Ahmed-Baig Aug 2, 2025
13fbd2c
build: add formatting & linting pipeline with pre-commit integration
aksg87 Aug 3, 2025
c8d2027
style: apply pyink, isort, and pre-commit formatting
aksg87 Aug 3, 2025
146a095
ci: enable format and lint checks in tox
aksg87 Aug 3, 2025
aa6da18
Merge pull request #24 from google/feat/code-formatting-pipeline
aksg87 Aug 3, 2025
ed65bca
Add LangExtractError base exception for centralized error handling
aksg87 Aug 3, 2025
6c4508b
Merge pull request #26 from google/feat/exception-hierarchy
aksg87 Aug 3, 2025
8b85225
fix: Remove LangFun and pylibmagic dependencies (v1.0.2)
aksg87 Aug 3, 2025
88520cc
Merge pull request #28 from google/fix/remove-breaking-dep-langfun
aksg87 Aug 3, 2025
75a6f12
Fix save_annotated_documents to handle string paths
aksg87 Aug 3, 2025
a415b94
Merge pull request #29 from google/fix-save-annotated-documents-mkdir
aksg87 Aug 3, 2025
8289b3a
feat: Add OpenAI language model support
aksg87 Aug 3, 2025
c8ef723
Merge pull request #31 from google/feature/add-oai-inference
aksg87 Aug 3, 2025
dfe8188
fix(ui): prevent current highlight border from being obscured. Chan…
tonebeta Aug 4, 2025
0d76530
Merge branch 'google:main' into fix-ollama-num-threads-typo
Mirza-Samad-Ahmed-Baig Aug 4, 2025
87c511e
feat: Add live API integration tests (#39)
aksg87 Aug 4, 2025
dc61372
Add PR template validation workflow (#45)
aksg87 Aug 4, 2025
7fc809f
Merge branch 'main' into fix-ollama-num-threads-typo
Mirza-Samad-Ahmed-Baig Aug 5, 2025
da771e6
fix: Change OllamaLanguageModel parameter from 'model' to 'model_id' …
aksg87 Aug 5, 2025
e83d5cf
feat: Add CITATION.cff file for proper software citation
aksg87 Aug 5, 2025
337beee
feat: Add Ollama integration with Docker examples and CI tests (#62)
aksg87 Aug 5, 2025
a7ef0bd
chore: Bump version to 1.0.4 for release
aksg87 Aug 5, 2025
87beb4f
build(deps): bump tj-actions/changed-files (#66)
dependabot[bot] Aug 5, 2025
db140d1
Add PR validation workflows and update contribution guidelines (#74)
aksg87 Aug 5, 2025
ed97f73
Fix custom comment in linked issue check (#77)
aksg87 Aug 5, 2025
ad1f27b
Add infrastructure file protection workflow (#76)
aksg87 Aug 5, 2025
41bc9ed
Allow maintainers to bypass community support requirement
aksg87 Aug 5, 2025
54e57db
Add manual trigger capability to validation workflows (#75)
aksg87 Aug 5, 2025
25ebc17
Fix fork PR labeling by using pull_request_target
aksg87 Aug 5, 2025
1290d63
Add workflow_dispatch trigger to CI workflow
aksg87 Aug 6, 2025
42687fc
Add secure label-based testing for fork PRs
aksg87 Aug 6, 2025
234081e
Add base_url to OpenAILanguageModel (#51)
mariano Aug 6, 2025
46b4f0d
Fix validation workflows that were skipping all checks
aksg87 Aug 6, 2025
6fb66cf
Add commit status to revalidation workflow
aksg87 Aug 6, 2025
47a251e
Fix boolean comparison in revalidation workflow
aksg87 Aug 7, 2025
b28e673
Add maintenance scripts for PR management
aksg87 Aug 7, 2025
6b02efb
Fix IPython import warnings and notebook detection (#86)
aksg87 Aug 7, 2025
e6dcc8e
Fix CI to validate PR branch formatting directly
aksg87 Aug 7, 2025
1c3c1a2
Add PR update automation workflows
aksg87 Aug 7, 2025
b60f0b2
Fix workflow formatting
aksg87 Aug 7, 2025
f888bd8
Minor changes
Mirza-Samad-Ahmed-Baig Aug 7, 2025
8659ef3
Merge branch 'fix-ollama-num-threads-typo'
Mirza-Samad-Ahmed-Baig Aug 7, 2025
ea71754
Fix chunking bug and improve test documentation (#88)
aksg87 Aug 7, 2025
82c6644
Fix: Resolve merge conflict and update docstrings in inference.py
Mirza-Samad-Ahmed-Baig Aug 7, 2025
ce0caa5
Changes
Mirza-Samad-Ahmed-Baig Aug 7, 2025
792fd3e
Merge branch 'main' into fix-ollama-num-threads-typo
Mirza-Samad-Ahmed-Baig Aug 7, 2025
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/config.yml
@@ -24,4 +24,4 @@ contact_links:
url: https://g.co/vulnz
about: >
To report a security issue, please use https://g.co/vulnz. The Google Security Team will
-respond within 5 working days of your report on https://g.co/vulnz.
+respond within 5 working days of your report on https://g.co/vulnz.
39 changes: 39 additions & 0 deletions .github/scripts/add-new-checks.sh
@@ -0,0 +1,39 @@
#!/bin/bash
# Copyright 2025 Google LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Script to add new required status checks to an existing branch protection rule.
# This preserves all your current settings and just adds the new checks

echo "Adding new PR validation checks to existing branch protection..."

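# Add the new checks to existing ones; stop if the API call fails so that
# the success message below is only printed when the checks were added
echo "Adding new checks: enforce, size, and protect-infrastructure..."
if ! gh api repos/:owner/:repo/branches/main/protection/required_status_checks/contexts \
  --method POST \
  --input - <<< '["enforce", "size", "protect-infrastructure"]'; then
  echo "✗ Failed to add new checks" >&2
  exit 1
fi

echo ""
echo "✓ New checks added!"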
echo ""
echo "Updated required status checks will include:"
echo "- test (3.10) [existing]"
echo "- test (3.11) [existing]"
echo "- test (3.12) [existing]"
echo "- Validate PR Template [existing]"
echo "- live-api-tests [existing]"
echo "- ollama-integration-test [existing]"
echo "- enforce [NEW - linked issue validation]"
echo "- size [NEW - PR size limit]"
echo "- protect-infrastructure [NEW - infrastructure file protection]"
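The "existing plus new" list that the script prints could also be computed rather than hard-coded. The following is a minimal sketch, assuming a hypothetical helper `merge_checks` that is not part of this PR; it takes check names as arguments and prints their union in first-seen order, so re-running with a check that is already required does not list it twice.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): print the union of the given
# check names, one per line, keeping first-seen order and dropping duplicates.
merge_checks() {
  local seen=$'\n' name
  for name in "$@"; do
    case "$seen" in
      *$'\n'"$name"$'\n'*) ;;  # already printed, skip
      *) printf '%s\n' "$name"; seen+="$name"$'\n' ;;
    esac
  done
}

# Check names copied from the echo lines of add-new-checks.sh
existing=("test (3.10)" "test (3.11)" "test (3.12)" "Validate PR Template" \
  "live-api-tests" "ollama-integration-test")
new=("enforce" "size" "protect-infrastructure")

merge_checks "${existing[@]}" "${new[@]}"
```

This only previews the list locally; it does not call the GitHub API.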
55 changes: 55 additions & 0 deletions .github/scripts/add-size-labels.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
#!/bin/bash
# Copyright 2025 Google LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Add size labels to PRs based on their change count

echo "Adding size labels to PRs..."

# Get all open PRs with their additions and deletions
gh pr list --limit 50 --json number,additions,deletions --jq '.[]' | while read -r pr_data; do
  pr_number=$(echo "$pr_data" | jq -r '.number')
  additions=$(echo "$pr_data" | jq -r '.additions')
  deletions=$(echo "$pr_data" | jq -r '.deletions')
  total_changes=$((additions + deletions))

  # Determine size label
  if [ "$total_changes" -lt 50 ]; then
    size_label="size/XS"
  elif [ "$total_changes" -lt 150 ]; then
    size_label="size/S"
  elif [ "$total_changes" -lt 600 ]; then
    size_label="size/M"
  elif [ "$total_changes" -lt 1000 ]; then
    size_label="size/L"
  else
    size_label="size/XL"
  fi

  echo "PR #$pr_number: $total_changes lines -> $size_label"

  # Remove any existing size labels first. The names are comma-joined so that
  # a PR carrying more than one size label is handled in a single call.
  existing_labels=$(gh pr view "$pr_number" --json labels \
    --jq '[.labels[].name | select(startswith("size/"))] | join(",")')
  if [ -n "$existing_labels" ]; then
    echo "  Removing existing label(s): $existing_labels"
    gh pr edit "$pr_number" --remove-label "$existing_labels"
  fi

  # Add the new size label
  gh pr edit "$pr_number" --add-label "$size_label"

  sleep 1  # Avoid rate limiting
done

echo "Done adding size labels!"
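The size thresholds in add-size-labels.sh can be checked in isolation. Below is a minimal sketch, assuming a hypothetical function `size_label` (the script itself inlines this logic in the loop); the cut-offs 50, 150, 600, and 1000 are taken from the script.

```shell
#!/bin/bash
# Hypothetical standalone version of the threshold logic in add-size-labels.sh.
# Takes the total number of changed lines (additions + deletions).
size_label() {
  local total=$1
  if (( total < 50 )); then
    echo "size/XS"
  elif (( total < 150 )); then
    echo "size/S"
  elif (( total < 600 )); then
    echo "size/M"
  elif (( total < 1000 )); then
    echo "size/L"
  else
    echo "size/XL"
  fi
}

# Show the label on each side of every boundary
for n in 49 50 149 150 599 600 999 1000; do
  echo "$n lines -> $(size_label "$n")"
done
```

Each upper bound is exclusive, so a PR with exactly 50 changed lines is labeled size/S rather than size/XS.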
42 changes: 42 additions & 0 deletions .github/scripts/revalidate-all-prs.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
#!/bin/bash
# Copyright 2025 Google LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Revalidate all open PRs

echo "Fetching all open PRs..."
PR_NUMBERS=$(gh pr list --limit 50 --json number --jq '.[].number')
TOTAL=$(echo "$PR_NUMBERS" | wc -w | tr -d ' ')

echo "Found $TOTAL open PRs"
echo "Starting revalidation..."
echo ""

COUNT=0
for pr in $PR_NUMBERS; do
  COUNT=$((COUNT + 1))
  echo "[$COUNT/$TOTAL] Triggering revalidation for PR #$pr..."
  gh workflow run revalidate-pr.yml -f "pr_number=$pr"

  # Small delay to avoid rate limiting
  sleep 2
done

echo ""
echo "All workflows triggered!"
echo ""
echo "To monitor progress:"
echo " gh run list --workflow=revalidate-pr.yml --limit=$TOTAL"
echo ""
echo "To see results, check comments on each PR"
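Because revalidate-all-prs.sh dispatches one workflow run per open PR, it may be useful to preview the commands first. The following is a sketch only: the `DRY_RUN` variable and the `trigger_revalidation` function are assumptions for illustration and do not exist in the PR, and the PR numbers in the loop are placeholders.

```shell
#!/bin/bash
# Hypothetical dry-run wrapper; DRY_RUN is an assumed variable, not part of the PR.
# Defaults to 1 so that nothing is dispatched unless explicitly requested.
DRY_RUN=${DRY_RUN:-1}

trigger_revalidation() {
  local pr=$1
  if [ "$DRY_RUN" = "1" ]; then
    # Only print the command that would be executed
    echo "would run: gh workflow run revalidate-pr.yml -f pr_number=$pr"
  else
    gh workflow run revalidate-pr.yml -f "pr_number=$pr"
  fi
}

for pr in 101 102; do  # placeholder PR numbers for illustration
  trigger_revalidation "$pr"
done
```

Setting `DRY_RUN=0` would dispatch the workflow runs for real.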
166 changes: 166 additions & 0 deletions .github/workflows/auto-update-pr.yaml
@@ -0,0 +1,166 @@
# Copyright 2025 Google LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: Auto Update PR

on:
  push:
    branches: [main]
  schedule:
    # Run daily at 2 AM UTC to catch stale PRs
    - cron: '0 2 * * *'
  workflow_dispatch:
    inputs:
      pr_number:
        description: 'PR number to update (optional, updates all if not specified)'
        required: false
        type: string

permissions:
  contents: write  # Required for updateBranch API
  pull-requests: write
  issues: write

jobs:
  update-prs:
    runs-on: ubuntu-latest
    concurrency:
      group: auto-update-pr-${{ github.event_name }}
      cancel-in-progress: true
    steps:
      - name: Update PRs that are behind main
        uses: actions/github-script@v7
        with:
          script: |
            const prNumber = context.payload.inputs?.pr_number;

            // Get list of open PRs
            const prs = prNumber
              ? [(await github.rest.pulls.get({
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  pull_number: parseInt(prNumber)
                })).data]
              : await github.paginate(github.rest.pulls.list, {
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  state: 'open',
                  sort: 'updated',
                  direction: 'desc'
                });

            console.log(`Found ${prs.length} open PRs to check`);

            // Constants for comment flood control
            const UPDATE_COMMENT_COOLDOWN_DAYS = 7;
            const COOLDOWN_MS = UPDATE_COMMENT_COOLDOWN_DAYS * 24 * 60 * 60 * 1000;

            for (const pr of prs) {
              // Skip bot PRs and drafts
              if (pr.user.login.includes('[bot]')) {
                console.log(`Skipping bot PR #${pr.number} from ${pr.user.login}`);
                continue;
              }
              if (pr.draft) {
                console.log(`Skipping draft PR #${pr.number}`);
                continue;
              }

              try {
                // Check if PR is behind main (base...head comparison)
                const { data: comparison } = await github.rest.repos.compareCommits({
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  base: pr.base.ref,  // main branch
                  head: `${pr.head.repo.owner.login}:${pr.head.ref}`  // Fully qualified ref for forks
                });

                if (comparison.behind_by > 0) {
                  console.log(`PR #${pr.number} is ${comparison.behind_by} commits behind ${pr.base.ref}`);

                  // Check if the PR allows maintainer edits
                  if (pr.maintainer_can_modify) {
                    // Try to update the branch
                    try {
                      await github.rest.pulls.updateBranch({
                        owner: context.repo.owner,
                        repo: context.repo.repo,
                        pull_number: pr.number
                      });

                      console.log(`✅ Updated PR #${pr.number}`);

                      // Add a comment
                      await github.rest.issues.createComment({
                        owner: context.repo.owner,
                        repo: context.repo.repo,
                        issue_number: pr.number,
                        body: `🔄 **Branch Updated**\n\nYour branch was ${comparison.behind_by} commits behind \`${pr.base.ref}\` and has been automatically updated. CI checks will re-run shortly.`
                      });
                    } catch (updateError) {
                      console.log(`Could not auto-update PR #${pr.number}: ${updateError.message}`);

                      // Determine the reason for failure
                      let failureReason = '';
                      if (updateError.status === 409 || updateError.message.includes('merge conflict')) {
                        failureReason = '\n\n**Note:** Automatic update failed due to merge conflicts. Please resolve them manually.';
                      } else if (updateError.status === 422) {
                        failureReason = '\n\n**Note:** Cannot push to fork. Please update manually.';
                      }

                      // Notify the contributor to update manually
                      await github.rest.issues.createComment({
                        owner: context.repo.owner,
                        repo: context.repo.repo,
                        issue_number: pr.number,
                        body: `⚠️ **Branch Update Required**\n\nYour branch is ${comparison.behind_by} commits behind \`${pr.base.ref}\`.${failureReason}\n\nPlease update your branch:\n\n\`\`\`bash\ngit fetch origin ${pr.base.ref}\ngit merge origin/${pr.base.ref}\ngit push\n\`\`\`\n\nOr use GitHub's "Update branch" button if available.`
                      });
                    }
                  } else {
                    // Can't modify, just notify
                    console.log(`PR #${pr.number} doesn't allow maintainer edits`);

                    // Check if we already commented recently (within last 7 days)
                    const { data: comments } = await github.rest.issues.listComments({
                      owner: context.repo.owner,
                      repo: context.repo.repo,
                      issue_number: pr.number,
                      since: new Date(Date.now() - COOLDOWN_MS).toISOString()
                    });

                    const hasRecentUpdateComment = comments.some(c =>
                      c.body?.includes('Branch Update Required') &&
                      c.user?.login === 'github-actions[bot]'
                    );

                    if (!hasRecentUpdateComment) {
                      await github.rest.issues.createComment({
                        owner: context.repo.owner,
                        repo: context.repo.repo,
                        issue_number: pr.number,
                        body: `⚠️ **Branch Update Required**\n\nYour branch is ${comparison.behind_by} commits behind \`${pr.base.ref}\`. Please update your branch to ensure CI checks run with the latest code:\n\n\`\`\`bash\ngit fetch origin ${pr.base.ref}\ngit merge origin/${pr.base.ref}\ngit push\n\`\`\`\n\nNote: Enable "Allow edits by maintainers" to allow automatic updates.`
                      });
                    }
                  }
                } else {
                  console.log(`PR #${pr.number} is up to date`);
                }
              } catch (error) {
                console.error(`Error processing PR #${pr.number}:`, error.message);
              }
            }

            // Log rate limit status
            const { data: rateLimit } = await github.rest.rateLimit.get();
            console.log(`API rate limit remaining: ${rateLimit.rate.remaining}/${rateLimit.rate.limit}`);
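The comment flood control in the workflow comes down to a timestamp comparison against a 7-day window. The sketch below illustrates that arithmetic in bash, assuming a hypothetical function `within_cooldown` that is not part of the workflow. It works in seconds (7 × 24 × 60 × 60 = 604800), whereas the workflow computes the same window in milliseconds and passes it to the comments API as a `since` filter.

```shell
#!/bin/bash
# Hypothetical sketch (not part of the workflow): decide whether a reminder
# comment posted at comment_epoch still falls inside the cooldown window.
COOLDOWN_DAYS=7
COOLDOWN_SECONDS=$((COOLDOWN_DAYS * 24 * 60 * 60))  # 604800 seconds

# Returns 0 (true) if the comment is recent enough to suppress a new one.
within_cooldown() {
  local comment_epoch=$1 now_epoch=$2
  (( now_epoch - comment_epoch <= COOLDOWN_SECONDS ))
}

now=$(date +%s)
three_days_ago=$((now - 3 * 86400))
if within_cooldown "$three_days_ago" "$now"; then
  echo "skip: reminder posted within the last $COOLDOWN_DAYS days"
else
  echo "post a new reminder"
fi
```

In the workflow this check is applied only on the path where the PR does not allow maintainer edits.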