[Feature] Prompt compression to reduce token usage #8819

@Archelunch

What feature would you like to see?

Add a PromptCompressor module to DSPy that compresses prompts before they are sent to an LLM, while preserving output quality. The idea is inspired by the LLMLingua project.

Goal: create a module that can use LLMLingua's models or models compatible with them. Initially it would run locally, but in the future it could be extended to call a remote compression model hosted elsewhere.
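To make the "local now, maybe remote later" goal a bit more concrete, here is a rough sketch of a pluggable backend interface. Everything in it is hypothetical: `CompressionBackend`, `TruncatingBackend`, `RemoteBackend`, and `compress_with` are names invented for illustration, and the truncating backend is only a toy stand-in for a real LLMLingua-style model.

```python
from typing import Callable, Protocol


class CompressionBackend(Protocol):
    """Hypothetical interface: anything that can shorten a prompt."""

    def compress(self, text: str, target_ratio: float) -> str: ...


class TruncatingBackend:
    """Toy local backend: keeps the first N whitespace-separated tokens.

    A real implementation would call a compression model instead.
    """

    def compress(self, text: str, target_ratio: float) -> str:
        tokens = text.split()
        if not tokens:
            return ""
        keep = max(1, round(len(tokens) * target_ratio))
        return " ".join(tokens[:keep])


class RemoteBackend:
    """Placeholder for a hosted compressor; the transport is injected."""

    def __init__(self, send: Callable[[str, float], str]):
        self._send = send  # e.g. a function that performs an HTTP request

    def compress(self, text: str, target_ratio: float) -> str:
        return self._send(text, target_ratio)


def compress_with(
    backend: CompressionBackend, text: str, target_ratio: float = 0.55
) -> str:
    """Validate the ratio, then delegate to whichever backend is configured."""
    if not 0.0 < target_ratio <= 1.0:
        raise ValueError("target_ratio must be in (0, 1]")
    return backend.compress(text, target_ratio)
```

With an interface like this, switching from a local model to a remote service would only mean passing a different backend object; the rest of the module would not need to change.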

Example usage:

import dspy

# create compressor 
compressor = dspy.PromptCompressor(
    model_name="microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank",
    target_ratio=0.55,           # desired compressed size as fraction of original tokens
    preserve_fields=["username", "history"],      # list of regexes/keys to preserve fully (e.g., user names)
    max_output_tokens=None,      # optional hard cap on the compressed result, in tokens
    device="cpu",                # or "cuda"
    cache=True                   # cache results for identical inputs
)

dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"), compressor=compressor)
qa = dspy.Predict("question -> answer")
result = qa(question="What is the capital of Belgium?")

# result carries compression metadata
print(result.compression_metadata)
# {
#   "original_tokens": 128,
#   "reduced_tokens": 56,
#   "ratio": 0.4375,
#   "model_used": "microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank",
# }
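As a rough illustration of how the metadata above could be derived, here is a minimal sketch. It assumes whitespace splitting as the token counter (a real implementation would use the compression model's tokenizer), and `count_tokens`, `apply_cap`, and `compression_metadata` are hypothetical helper names. I am also assuming that `ratio` reports the achieved ratio, which may differ from the requested `target_ratio`.

```python
from typing import Optional


def count_tokens(text: str) -> int:
    """Assumption: whitespace tokens stand in for real model tokens."""
    return len(text.split())


def apply_cap(compressed: str, max_output_tokens: Optional[int]) -> str:
    """Enforce the optional hard cap by dropping trailing tokens."""
    if max_output_tokens is None:
        return compressed
    return " ".join(compressed.split()[:max_output_tokens])


def compression_metadata(original: str, compressed: str, model_used: str) -> dict:
    """Build the metadata dict shown in the example above."""
    original_tokens = count_tokens(original)
    reduced_tokens = count_tokens(compressed)
    # Guard against empty input; treat it as "nothing was removed".
    ratio = reduced_tokens / original_tokens if original_tokens else 1.0
    return {
        "original_tokens": original_tokens,
        "reduced_tokens": reduced_tokens,
        "ratio": ratio,
        "model_used": model_used,
    }


# 56 tokens kept out of 128 gives ratio 56 / 128 = 0.4375.
example = compression_metadata(
    " ".join(["tok"] * 128), " ".join(["tok"] * 56), "example-model"
)
```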

Alternatively, we could mark in the Signature which fields should be compressed:

class CheckCitationFaithfulness(dspy.Signature):
    """Verify that the text is based on the provided context."""

    context: str = dspy.InputField(desc="facts here are assumed to be true", compression=True)
    text: str = dspy.InputField(compression=False)
    faithfulness: bool = dspy.OutputField()
    evidence: dict[str, list[str]] = dspy.OutputField(desc="Supporting evidence for claims")
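To show what the per-field flag could mean at call time, here is a stdlib-only analogue. It uses `dataclasses` field metadata instead of `dspy.InputField`, so it does not reflect how DSPy actually stores field options; `CitationInputs` and `compress_flagged` are hypothetical names, and the compressor is passed in as a plain function.

```python
from dataclasses import dataclass, field, fields
from typing import Callable


@dataclass
class CitationInputs:
    """Stand-in for the input side of the Signature above."""

    context: str = field(metadata={"compression": True})
    text: str = field(metadata={"compression": False})


def compress_flagged(inputs, compress: Callable[[str], str]) -> dict:
    """Compress only the fields flagged with compression=True.

    Fields without the flag are passed through unchanged.
    """
    out = {}
    for f in fields(inputs):
        value = getattr(inputs, f.name)
        if f.metadata.get("compression", False):
            out[f.name] = compress(value)
        else:
            out[f.name] = value
    return out
```

Defaulting to "do not compress" when the flag is absent seems like the safer choice, since existing Signatures would keep their current behavior.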

Would you like to contribute?

  • Yes, I'd like to help implement this.
  • No, I just want to request it.

Additional Context

No response

Labels: enhancement (New feature or request)
