Conversation

kavorite

  • Add an `n_seq_max` parameter to the `Llama` class to enable batch embeddings (defaults to 1 for backward compatibility)
  • Add `return_numpy` support to convert between NumPy arrays and lists with zero copies
  • Update `normalize_embedding()` to keep NumPy arrays as NumPy arrays for zero-copy efficiency
  • Update `test_embed_numpy` to use `n_seq_max=16` for batch embedding tests

Enables batch embedding support, which was previously failing with `llama_decode` errors due to the `n_seq_max=1` limitation. This also fixes a bug in a repo I was working on that consumes this functionality to mass-index GitHub repos for semantic multivector search on the machine under my desk (luh mao).
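The `normalize_embedding()` change above can be sketched roughly as follows. This is a minimal illustration of the zero-copy idea under stated assumptions, not the PR's actual implementation: the point is that a NumPy input stays a NumPy array instead of round-tripping through a Python list.

```python
import numpy as np
from typing import List, Union


def normalize_embedding(
    embedding: Union[List[float], np.ndarray]
) -> Union[List[float], np.ndarray]:
    """L2-normalize an embedding, preserving the input's container type.

    Sketch only: keeping ndarray inputs as ndarrays avoids the
    ndarray -> list -> ndarray conversions (and copies) that callers
    wanting arrays would otherwise pay for.
    """
    if isinstance(embedding, np.ndarray):
        norm = float(np.linalg.norm(embedding))
        # Return an ndarray rather than converting to a Python list.
        return embedding / norm if norm > 0 else embedding
    norm = float(np.sqrt(sum(x * x for x in embedding)))
    return [x / norm for x in embedding] if norm > 0 else embedding
```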

Replace `Any` with proper `Union` types and add `@overload` signatures
to provide precise type hints based on input type (`str` vs `List[str]`),
the `return_numpy` flag, and the `return_count` flag.

This enables better IDE autocomplete and type checking for callers.
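A minimal sketch of how such overloads might look. The class name, toy implementation, and exact return shapes here are illustrative assumptions, not the PR's code; `Literal` bool parameters are one standard way to key a return type on a flag like `return_numpy`.

```python
from typing import List, Literal, overload

import numpy as np


class EmbedderSketch:
    """Toy stand-in for the embed API, only to illustrate @overload typing."""

    @overload
    def embed(self, input: str, return_numpy: Literal[False] = ...) -> List[float]: ...
    @overload
    def embed(self, input: str, return_numpy: Literal[True]) -> np.ndarray: ...
    @overload
    def embed(self, input: List[str], return_numpy: Literal[False] = ...) -> List[List[float]]: ...
    @overload
    def embed(self, input: List[str], return_numpy: Literal[True]) -> List[np.ndarray]: ...

    def embed(self, input, return_numpy=False):
        # Dummy embedding [len(text), 1.0]; real code would run the model.
        def one(text):
            vec = [float(len(text)), 1.0]
            return np.asarray(vec) if return_numpy else vec

        return one(input) if isinstance(input, str) else [one(t) for t in input]
```

With these overloads, a type checker narrows `embed("x")` to `List[float]` and `embed("x", return_numpy=True)` to `np.ndarray`, which is what drives the improved autocomplete.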
@kavorite (Author)

This is addressed in #2058, which is cleaner. I think I will keep this locally until that is merged; if this PR is reopened, it will address the numpy piece only, and hopefully with less bloat from spuriously altered formatting.

@kavorite kavorite closed this Oct 12, 2025