Conversation
| ModelMeta(name="gpt-4o-2024-05-13", context=128_000, output=4_096), | ||
| ModelMeta(name="gpt-4o-2024-08-06", context=128_000, output=16_384), | ||
| ModelMeta(name="gpt-4o-2024-11-20", context=128_000, output=16_384), | ||
| ModelMeta(name="o3-mini-2025-01-31", context=200_000, output=100_000), |
I know that the idea of output token limits can be a little fuzzy for reasoning models (since some of the tokens may be eaten up by the reasoning phase) — do we account for that somehow?
No, we don't, but we didn't before either. It's a good point, but it's not something I'm trying to solve or look into here.
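For illustration only, here is a minimal sketch of how a reasoning reserve could be subtracted from the output limit, assuming reasoning tokens count against the same limit. The `ModelMeta` fields mirror those visible in the diff; the `visible_output_budget` helper and the 25,000-token reserve are hypothetical and not part of this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelMeta:
    """Stand-in with the three fields shown in the diff; the real class may differ."""

    name: str
    context: int
    output: int


def visible_output_budget(meta: ModelMeta, reasoning_reserve: int = 0) -> int:
    """Hypothetical helper: tokens left for visible output after setting
    aside part of the output limit for a reasoning phase."""
    if reasoning_reserve < 0:
        raise ValueError("reasoning_reserve must be non-negative")
    # Clamp at zero so an oversized reserve never yields a negative budget.
    return max(meta.output - reasoning_reserve, 0)


o3_mini = ModelMeta(name="o3-mini-2025-01-31", context=200_000, output=100_000)
# The reserve is an arbitrary illustrative number, not a measured value.
budget = visible_output_budget(o3_mini, reasoning_reserve=25_000)
```

Non-reasoning models would simply use the default reserve of zero, so the existing limits would be unaffected.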
bc2/core/control/chunk.py
Outdated
| ), | ||
| # 2. The original existing text should be *complete*, while the addition is not! | ||
| existing_t.original, | ||
| # 3. The initial delimiters are empty, so fill them in from the addition. |
Very small nit, but I think this comment should say "If the initial delimiters are empty, fill them in from the addition", if I understand correctly?
I can see why you think that, and it's true, but it's not what I'm emphasizing. The very first delimiters property we see will be empty because we don't know its value yet, so it must be filled in from the first addition. In subsequent runs, the property won't be empty and won't be filled in from the addition.
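The behavior described in this thread can be sketched as follows. This is a minimal illustration, not the code under review: the `Chunk` type and `merge` function are hypothetical names, and the only details taken from the diff are that the existing `original` is kept and that empty delimiters are filled in from the addition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for the accumulated text state."""

    original: str
    delimiters: tuple[str, ...] = ()


def merge(existing: Chunk, addition: Chunk) -> Chunk:
    """Combine the accumulated state with a new addition."""
    # The existing original text is complete, so it wins over the addition's.
    # Delimiters start out empty (unknown) and are filled in from the first
    # addition; once set, later additions do not overwrite them.
    delimiters = existing.delimiters or addition.delimiters
    return Chunk(original=existing.original, delimiters=delimiters)


first = merge(Chunk("full text"), Chunk("full", ("<", ">")))
second = merge(first, Chunk("full te", ("[", "]")))
```

After the first merge the delimiters are `("<", ">")`, and the second merge leaves them unchanged, which is the "filled in once" behavior the reply describes.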
| new_text = cast(RedactedText, new).redacted | ||
| else: | ||
| raise ValueError(f"Unsupported return type: {self.return_type}") | ||
| return residual(old_text, new_text, needle_size=window_size) |
I'm curious from a design perspective, is there any reason to keep this function in common/align.py? This seems like it might be the only place we use it now.
bc2/data/prompts/redact.txt
Outdated
| For people in the following list, replace their name or any associated nickname with the pre-specified placeholder. Do not change this placeholder in any way, use it exactly as it was provided. | ||
| {preset_aliases} | ||
| Replace all names specified by the `RealName` element in the following XML with the pre-specified placeholder given by the `Placeholder` element. Do not change this placeholder in any way, use it exactly as it was provided. Consider variants of the `RealName` if they refer to the same person. |
Will this handle placeholders for non-human entities (e.g., for a given race; or a restaurant or address) between documents?
Sorry, I should've emphasized not to look at anything outside of chunk.py because it's still a WIP as I sort out other issues! That includes the prompt; it is not ready.
- `NameMap` which was used for different purposes in different places. Break these into purpose-specific types with explicit names.
- `control:chunk` which will manage chopping up an input text and feeding it to a redactor. The redactor can itself be a constrained version of a pipe (via a new `control:compose` module) which lets us run the existing `inspect` infrastructure as designed, instead of relying on another custom implementation that's intertwined with the redact module.

To do:

- `compose` module with the `Pipeline` to improve both of them, especially w/r/t type checking
- `inspect` are properly accumulated inside the `Context` object
- `Context` `inspect` results are properly fed back into the chunk processor

Fixes #84
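As a rough illustration of the chunk-then-redact idea in the description above, here is a minimal sketch. The function names, the fixed-size splitting, and the toy redactor are all assumptions made for the example; they are not the `control:chunk` implementation.

```python
from typing import Callable, Iterator


def chunk_text(text: str, size: int) -> Iterator[str]:
    """Yield consecutive, non-overlapping pieces of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(text), size):
        yield text[start:start + size]


def redact_in_chunks(text: str, redactor: Callable[[str], str], size: int) -> str:
    """Feed each chunk to the redactor and stitch the results back together."""
    return "".join(redactor(piece) for piece in chunk_text(text, size))


def toy_redactor(piece: str) -> str:
    # Hypothetical redactor: swaps one known name for a fixed placeholder.
    return piece.replace("Alice", "[Person 1]")


result = redact_in_chunks("Alice met Bob. Alice left.", toy_redactor, size=14)
```

Because the redactor is just a callable, any pipe-like object with the same shape could be passed in instead. Note that a name straddling a chunk boundary would be missed by this naive split; the sketch does not attempt the alignment handling that a real chunk processor would need.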