fix(feature-extraction): support multilingual / XLM-R sentence embedders by s-zx · Pull Request #33 · s-zx/edgeFlow.js

s-zx · 2026-05-12T11:39:12Z

Two changes that unblock loading non-English sentence embedders such as paraphrase-multilingual-MiniLM-L12-v2:

1. PipelineConfig.tokenizerUrl: make the tokenizer URL overridable. The constructor previously hard-coded all-MiniLM-L6-v2's English-only tokenizer.json regardless of which model URL the user passed in.
2. Conditional token_type_ids: only emit this input when the loaded ONNX model declares it (looked up from session.inputNames via ModelMetadata). XLM-R, RoBERTa and multilingual MiniLM omit token_type_ids; unconditionally feeding it tripped onnxruntime-web with "invalid input" errors.
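
The two changes can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `PipelineConfig` shape beyond `tokenizerUrl`, the helper names `resolveTokenizerUrl` and `buildFeeds`, the `Encoded` type, and the placeholder URL are all assumptions made for the example. The input names `input_ids` and `attention_mask` are the usual ones for BERT-style ONNX exports and are assumed here as well. Plain number arrays stand in for the int64 tensors a real onnxruntime-web call would need.

```typescript
// Hypothetical config shape; only `tokenizerUrl` is named in the PR.
interface PipelineConfig {
  modelUrl: string;
  tokenizerUrl?: string; // optional override for the tokenizer.json location
}

// Placeholder default, standing in for the previously hard-coded
// all-MiniLM-L6-v2 tokenizer.json URL.
const DEFAULT_TOKENIZER_URL =
  "https://example.invalid/all-MiniLM-L6-v2/tokenizer.json";

// Use the caller's tokenizer URL when given, else fall back to the default.
function resolveTokenizerUrl(config: PipelineConfig): string {
  return config.tokenizerUrl ?? DEFAULT_TOKENIZER_URL;
}

// Hypothetical tokenizer output for one sequence.
interface Encoded {
  inputIds: number[];
  attentionMask: number[];
  tokenTypeIds: number[];
}

// Build the model feeds, adding token_type_ids only when the model
// declares that input (for example, as listed in session.inputNames).
function buildFeeds(
  inputNames: readonly string[],
  enc: Encoded
): Record<string, number[]> {
  const feeds: Record<string, number[]> = {
    input_ids: enc.inputIds,
    attention_mask: enc.attentionMask,
  };
  if (inputNames.includes("token_type_ids")) {
    feeds.token_type_ids = enc.tokenTypeIds;
  }
  return feeds;
}

// Usage: a model exported without token_type_ids gets two inputs only.
const encoded: Encoded = {
  inputIds: [0, 1234, 2],
  attentionMask: [1, 1, 1],
  tokenTypeIds: [0, 0, 0],
};
const xlmrStyleFeeds = buildFeeds(["input_ids", "attention_mask"], encoded);
const bertStyleFeeds = buildFeeds(
  ["input_ids", "attention_mask", "token_type_ids"],
  encoded
);
```

Deriving the feed set from the model's declared inputs, rather than from the model's name or family, means the same pipeline code should work for both kinds of export without a per-model switch.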

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-05-12T11:39:17Z

Project	Deployment	Actions	Updated (UTC)
edge-flow-js	Error		May 12, 2026 11:39am

Bumps to 0.2.0 (minor: the new PipelineConfig.tokenizerUrl field shipped in #33 is additive, backwards-compatible API surface).

Also clears two pre-existing build/test failures that were blocking the release pipeline (unrelated to #33):

- src/pipelines/question-answering.ts: remove dead private method tokenOffsetToCharOffset; TS6133 under noUnusedLocals.
- tests/unit/runtime.test.ts: registerAllBackends() is sync void, so expect(...).resolves on its return value crashed vitest. Switch to expect(() => registerAllBackends()).not.toThrow().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
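
The runtime test fix described above comes down to the difference between asserting on a promise and asserting on a call. The sketch below shows that difference without depending on vitest: `registerAllBackends` here is a hypothetical stand-in with placeholder backend names, and `doesNotThrow` is an illustrative helper that mimics the intent of `expect(() => fn()).not.toThrow()`, not vitest's implementation.

```typescript
// Hypothetical stand-in: synchronous, returns void (backend names are placeholders).
const registered = new Set<string>();
function registerAllBackends(): void {
  registered.add("backend-a");
  registered.add("backend-b");
}

// A sync void function returns undefined, not a Promise, so there is
// nothing for a promise-based assertion such as `.resolves` to unwrap.
const returned: unknown = registerAllBackends();
const isPromise = returned instanceof Promise;

// Illustrative helper: run the callback and report whether it threw.
function doesNotThrow(fn: () => void): boolean {
  try {
    fn();
    return true;
  } catch {
    return false;
  }
}

const registrationIsSafe = doesNotThrow(() => registerAllBackends());
```

Wrapping the call in a function lets the assertion observe a thrown error directly, which matches what a synchronous API can actually do.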

s-zx merged commit ea82d43 into main May 12, 2026
1 of 3 checks passed

s-zx merged 1 commit into main from fix/multilingual-sentence-embedder