Skip to content

Conversation

@csfet9
Copy link
Contributor

@csfet9 csfet9 commented Jan 2, 2026

Summary

  • Reasoning model support: Strip thinking tags from local LLM responses, enabling Qwen3, DeepSeek, and other reasoning models
  • Docker retry-start script: Wait for dependencies (LLM Studio, database) before starting Hindsight

Changes

Reasoning Model Support (llm_wrapper.py)

Strips thinking tags from local LLM responses:

  • <think>...</think>
  • <thinking>...</thinking>
  • <reasoning>...</reasoning>
  • |startthink|...|endthink|

This enables reasoning models like Qwen3 to work with Hindsight's JSON extraction pipeline. Non-breaking change - only affects responses that contain these tags.

Docker Retry Start Script (retry-start.sh)

New startup script that waits for dependencies:

  • Checks LLM Studio at /v1/models endpoint
  • Checks database connectivity (skipped for embedded pg0)
  • Configurable retries via HINDSIGHT_RETRY_MAX (default: infinite)
  • Configurable interval via HINDSIGHT_RETRY_INTERVAL (default: 10s)

Prevents startup failures when LLM Studio takes time to load models.

Test plan

  • Tested reasoning model support with Qwen3 8B on LM Studio
  • Verified thinking tags are stripped correctly
  • Tested retry-start with LLM Studio dependency
  • Verified embedded pg0 detection works
  • Health endpoint returns healthy after startup

## Reasoning Model Support
- Strip thinking tags from local LLM responses (<think>, <thinking>, <reasoning>, |startthink|/|endthink|)
- Enables Qwen3, DeepSeek, and other reasoning models to work with JSON extraction
- Non-breaking: only affects responses that contain thinking tags

## Docker Retry Start Script
- New retry-start.sh waits for dependencies before starting Hindsight
- Checks LLM Studio availability at /v1/models endpoint
- Checks database connectivity (skipped for embedded pg0)
- Configurable via HINDSIGHT_RETRY_MAX and HINDSIGHT_RETRY_INTERVAL env vars
- Prevents startup failures when LLM Studio isn't ready yet

Tested on Apple Silicon M4 Max with Qwen3 8B via LM Studio.
@@ -0,0 +1,78 @@
#!/bin/bash
# Retry wrapper - waits for dependencies before starting hindsight
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you modify the existing file start-all (instead of creating a new one) and indicate what is the problem you are trying to solve?

# Strip reasoning model thinking tags (various formats)
# Supports: <think>, <thinking>, <reasoning>, |startthink|/|endthink|
if content:
original_len = len(content)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use the flag in the llm call to not include thinking tokens in the output? this way the user will not pay for those output tokens and we don't have to leverage on this heuristic and problematic algorithm

csfet9 and others added 4 commits January 3, 2026 01:45
- Remove stale pg0 instance data after pre-caching binaries to avoid
  port conflicts (was using hardcoded port 5555 from build time)
- Remove unused cache copy logic from start-all.sh
- Add database backup instructions to CLAUDE.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@nicoloboschi nicoloboschi merged commit eea0f27 into vectorize-io:main Jan 5, 2026
16 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants