Fix remaining OSINT signal text truncation #68

Open
schergr wants to merge 5 commits into calesthio:master from schergr:fix/osint-signal-truncation

schergr commented Mar 21, 2026

Summary

  • Remove 120-char truncation in delta engine when building OSINT signals
  • Remove 80-char truncation in memory snapshots for urgent Telegram posts
  • Remove 120-char truncation in ideas/LLM context for OSINT posts
  • Improve signal formatting in Telegram alerts (bulleted list instead of inline)

The prior fix (753c676) removed truncation at source ingestion and alert formatting, but signals were still arriving at the alerter pre-truncated from upstream. The sendMessage chunker already handles Telegram's 4096-char API limit.
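For readers unfamiliar with the chunking step the summary relies on, here is a minimal sketch of how a message could be split to fit the 4096-char limit. The function name `chunkMessage` and the newline-preferring strategy are assumptions for illustration; the project's actual sendMessage chunker is not shown in this PR and may work differently. Length is measured with JavaScript's `String.length`, which is only an approximation of how Telegram counts characters.

```javascript
// Hypothetical helper, not the project's real chunker.
const TELEGRAM_MAX_CHARS = 4096; // message text limit cited in the PR summary

function chunkMessage(text, limit = TELEGRAM_MAX_CHARS) {
  const chunks = [];
  let rest = text;
  while (rest.length > limit) {
    // Prefer to break at the last newline inside the first `limit` chars;
    // fall back to a hard cut when there is no usable newline.
    const head = rest.slice(0, limit);
    const breakAt = head.lastIndexOf('\n');
    const cut = breakAt > 0 ? breakAt : limit;
    chunks.push(rest.slice(0, cut));
    // When we split on a newline, drop it so the next chunk does not start with it.
    rest = rest.slice(breakAt > 0 ? cut + 1 : cut);
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Each returned chunk would then be sent as its own message, in order.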

Test plan

  • Trigger a sweep with urgent OSINT posts and verify full text appears in Telegram alert
  • Confirm alert messages are properly chunked if they exceed 4096 chars
  • Verify delta engine correctly deduplicates signals with full-length text

🤖 Generated with Claude Code

Greg Scher and others added 2 commits March 20, 2026 16:49
Posts were being cut to 300 chars (source ingestion) and 150 chars
(alert evaluation), losing valuable OSINT context. The sendMessage
chunker already handles the 4096-char Telegram API limit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The prior fix (753c676) only removed truncation at source ingestion and
alert formatting. Signals were still being cut to 120 chars in the delta
engine, 80 chars in memory snapshots, and 120 chars in the ideas LLM
context — so OSINT posts arrived at the alerter already truncated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@schergr schergr requested a review from calesthio as a code owner March 21, 2026 17:01
Copilot AI review requested due to automatic review settings March 21, 2026 17:01

Copilot AI left a comment

Pull request overview

This PR removes remaining upstream truncation of urgent Telegram/OSINT signal text so full post content can flow through delta computation, memory snapshots, LLM context, and Telegram alert rendering (with improved “Signals” formatting).

Changes:

  • Removed substring/slice truncation in Telegram source ingestion, delta engine signal construction, and memory snapshot compaction.
  • Updated LLM “ideas” sweep compaction to include full urgent OSINT post text.
  • Improved Telegram alert formatting for signals (more items + bulleted list output).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| lib/llm/ideas.mjs | Stops truncating urgent OSINT post text included in LLM ideas context. |
| lib/delta/memory.mjs | Stores full urgent post text in compacted memory snapshots. |
| lib/delta/engine.mjs | Emits full urgent post text in newly-detected OSINT signals. |
| lib/alerts/telegram.mjs | Expands/reshapes OSINT signal text shown in alerts and formats signals as bullets. |
| apis/sources/telegram.mjs | Stops truncating Telegram message text extracted via Bot API and web preview parsing. |

@calesthio
Owner

Thanks for opening this. The direction makes sense, but there are two issues I think should be fixed before this is merged:

  1. Telegram alert formatting now sends full raw OSINT post text through parse_mode: Markdown without escaping. In the rule-based OSINT surge path, evaluation.signals can now contain full Telegram post bodies, and _formatTieredAlert() renders them as bullet lines. Real post text commonly contains _, brackets, parentheses, and similar Markdown-significant characters. That means alerts can render incorrectly or be rejected by the Bot API altogether. Please either escape Markdown-sensitive characters before formatting or send this section without Markdown parsing.

  2. The ideas LLM context no longer has a length bound for urgent OSINT posts. Keeping full text in storage/delta/memory is reasonable, but compactSweepForLLM() is supposed to stay compact and now it can be dominated by a handful of long Telegram posts. That creates regression risk for latency, cost, and provider-side input-limit failures. Please keep full text upstream, but add an overall size/token cap when building the ideas prompt.

Once those two are addressed, this looks much closer to mergeable.
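To make the first point concrete, here is a rough sketch of escaping signal text before it is rendered as bullets under the legacy `Markdown` parse mode. The helper names `escapeLegacyMarkdown` and `formatSignalBullets` are hypothetical and are not taken from `_formatTieredAlert()`. The character set assumes Telegram's legacy mode, where `_`, `*`, a backtick, and `[` are escaped with a preceding backslash; MarkdownV2 would need a longer list.

```javascript
// Characters that start an entity in Telegram's legacy Markdown parse mode.
const LEGACY_MARKDOWN_SPECIALS = /[_*`\[]/g;

// Hypothetical helper: prefix each special character with a backslash.
function escapeLegacyMarkdown(text) {
  return String(text).replace(LEGACY_MARKDOWN_SPECIALS, '\\$&');
}

// Hypothetical helper: render raw signal strings as one bullet per line.
function formatSignalBullets(signals) {
  return signals.map((s) => `• ${escapeLegacyMarkdown(s)}`).join('\n');
}
```

The alternative mentioned above, sending the signals section without any `parse_mode`, avoids escaping altogether at the cost of losing formatting in that part of the alert.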

Greg Scher and others added 2 commits March 23, 2026 12:57
Addresses PR review: escape Markdown-sensitive characters in
_formatTieredAlert signal bullets to prevent Telegram Bot API
rejections, and add a 1500-char budget for URGENT_OSINT in
compactSweepForLLM to bound prompt size while keeping full text upstream.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
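As an illustration of the budget described in this commit, the sketch below caps the combined length of urgent posts before they are placed in the prompt. The 1500-char figure comes from the commit message. The function name `capUrgentPosts`, the `text` field, and the choice to treat the budget as a shared total (rather than a per-post cap) are assumptions, since the actual change to `compactSweepForLLM` is not shown here.

```javascript
const URGENT_OSINT_BUDGET = 1500; // chars; figure taken from the commit message

// Hypothetical helper: keep posts in order until the shared budget is used up.
function capUrgentPosts(posts, budget = URGENT_OSINT_BUDGET) {
  const kept = [];
  let used = 0;
  for (const post of posts) {
    const remaining = budget - used;
    if (remaining <= 0) break; // budget exhausted, drop the rest
    const text = String(post.text ?? '');
    // Whole posts are kept while they fit; the first one that overflows is
    // trimmed to the remaining budget and marked with an ellipsis.
    const clipped =
      text.length <= remaining ? text : text.slice(0, remaining - 1) + '…';
    kept.push({ ...post, text: clipped });
    used += clipped.length;
  }
  return kept;
}
```

Only the copy passed to the LLM is shortened; the input array and its posts are left unmodified, which matches the goal of keeping full text upstream.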
- Replace single `&#39;` handler with generic numeric/hex entity decoder
  so `&#39;` and other unpadded entities are properly converted
- Dedup urgent OSINT posts against all hot memory runs (last 3 sweeps)
  instead of only the previous sweep, preventing posts that drop out
  of one sweep from reappearing as "new" in the next

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
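A generic numeric/hex entity decoder of the kind this commit describes could look roughly like the following. This is a sketch under assumptions, not the code from the commit: the name `decodeNumericEntities` is hypothetical, and named entities such as `&amp;` are deliberately left alone.

```javascript
// Hypothetical helper: decode &#NNN; and &#xHHH; references, padded or not.
function decodeNumericEntities(html) {
  return html.replace(/&#(x[0-9a-f]+|\d+);/gi, (match, body) => {
    const isHex = body[0] === 'x' || body[0] === 'X';
    const codePoint = parseInt(isHex ? body.slice(1) : body, isHex ? 16 : 10);
    // Values beyond the Unicode range are left untouched rather than throwing.
    if (codePoint > 0x10ffff) return match;
    return String.fromCodePoint(codePoint);
  });
}
```

Because decimal parsing ignores leading zeros, a padded and an unpadded reference to the same code point decode identically.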
@schergr
Author

schergr commented Mar 24, 2026

anything else you need?

@calesthio
Owner

calesthio commented Mar 25, 2026

Added a follow-up commit on top of this branch to close the remaining review issues:

  • switched broad OSINT dedup in lib/delta/engine.mjs to prefer stable post identity (postId, or channel/chat + date + text) instead of only the lossy semantic hash
  • preserved channel/post identity in lib/delta/memory.mjs so cross-run dedup has enough information to suppress exact reposts without hiding genuinely new updates
  • aligned signal escaping in lib/alerts/telegram.mjs with the bot's existing legacy Markdown parse mode instead of MarkdownV2-style escaping

Rechecked the branch after the patch: sweep still completes, dashboard inject still runs, the new-post dedup false negative is fixed, and this stays scoped to Telegram/delta/ideas paths without touching jarvis core UI code.
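The identity-based dedup described in the bullets above might be sketched as follows. The field names (`postId`, `channel`, `chat`, `date`, `text`) follow the wording of this comment, but the function names, the shape of the stored runs (an array of post arrays, oldest first), and the three-run lookback taken from the earlier commit message are assumptions rather than the actual code in lib/delta/engine.mjs.

```javascript
// Hypothetical helper: prefer the stable post ID, else fall back to
// source + date + text so exact reposts still collide.
function postKey(post) {
  if (post.postId != null) return `id:${post.postId}`;
  const source = post.channel ?? post.chat ?? 'unknown';
  return `fallback:${source}|${post.date ?? ''}|${post.text ?? ''}`;
}

// Hypothetical helper: drop posts already seen in the last `lookback` runs.
function filterNewPosts(currentPosts, previousRuns, lookback = 3) {
  const seen = new Set();
  for (const run of previousRuns.slice(-lookback)) {
    for (const post of run) seen.add(postKey(post));
  }
  return currentPosts.filter((post) => {
    const key = postKey(post);
    if (seen.has(key)) return false;
    seen.add(key); // also suppresses duplicates within the current sweep
    return true;
  });
}
```

Keying on identity rather than a lossy hash means an edited or follow-up post with a different ID or date is still reported as new.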

Owner

calesthio left a comment

Reviewed the updated branch including the follow-up fix commit. The truncation removal adds real value, and with the dedup identity + Markdown escaping fixes in place I don’t see a remaining blocker.

3 participants