huggingface / text-generation-inference Public

Issues: huggingface/text-generation-inference

218 Open 1,254 Closed

Issues list

Tool use does not work on Neuron backend

#3114 opened Mar 14, 2025 by LouisHernandez17

3 of 4 tasks

NotImplementedError: Vlm do not work with prefix caching yet

#3110 opened Mar 13, 2025 by AndriiBihun

2 of 4 tasks

google/gemma-3-27b-it context length issue

#3105 opened Mar 13, 2025 by nskpro-cmd

Sharding Error with max_total_tokens and max_input_tokens options in Gemma3-27B-it model

#3104 opened Mar 13, 2025 by calycekr

Moondream2 | TGI Model support | Intel GPU

#3102 opened Mar 12, 2025 by rskasturi

Multi-node inference

#3097 opened Mar 11, 2025 by hrbigelow

[Upstream dependence changes] The behavior about env var in hf-hub has changed.

#3088 opened Mar 8, 2025 by HairlessVillager

Running container rootless does not work anymore

#3082 opened Mar 7, 2025 by scriptator

2 of 4 tasks

Add support for phi-4-mini and phi-4-multimodal

#3071 opened Mar 5, 2025 by farzanehnakhaee70

Do not force port 9000 for prometheus

#3062 opened Feb 26, 2025 by AndreasMadsen

Support for RISC-V?

#3059 opened Feb 26, 2025 by JocelynPanPan

Adapt the response_format closer to OpenAI's format

#3058 opened Feb 25, 2025 by jorado

Inexplicable 'incomplete generation' error

#3050 opened Feb 21, 2025 by mwm5945

2 of 4 tasks

Llama 3.3 70B Weird, gibberish outputs in production setup

#3043 opened Feb 20, 2025 by andresC98

2 of 4 tasks

VRAM usage increases in version 3.1.0

#3038 opened Feb 19, 2025 by aW3st

2 of 4 tasks

TGI metrics don't have model_name label to indicate which model the metrics belong to (label: wontfix, "This will not be worked on")

#3026 opened Feb 17, 2025 by yashaswipiplani

Unsupported model type xlm-roberta

#3020 opened Feb 13, 2025 by elvizlai

2 of 4 tasks

Warmup fails with Google Flan T5 models

#3019 opened Feb 13, 2025 by TomerG711

2 of 4 tasks

WARN text_generation_launcher: Unkown compute for card nvidia-geforce-rtx-3090

#3014 opened Feb 11, 2025 by bmilesp

Resource underutilization, thread thrashing: CPU affinity ignores allowed CPUs and cannot be switched off

#3011 opened Feb 11, 2025 by askervin

3 of 4 tasks

Warmup fails for Qwen 2 VL on AMD

#3009 opened Feb 11, 2025 by almersawi

1 of 4 tasks

Quantized BNB-4bit models are not working.

#3005 opened Feb 10, 2025 by v3ss0n

2 of 4 tasks

Nonsense responses with n-gram speculative decoding

#2997 opened Feb 6, 2025 by olliestanley

1 of 4 tasks

Request failed during generation: Server error: Value out of range: -29146814772

#2994 opened Feb 5, 2025 by AlperYildirim1

2 of 4 tasks

Mistral Small 3 : chat template with python functions causes error

#2987 opened Feb 3, 2025 by v3ss0n

2 tasks done
