[ARCHITECTURE] Replace Ollama with llama.cpp native C++ integration #425

@mikejmorgan-ai

Problem

The current architecture uses Ollama (Go-based, 500MB+) as a separately installed application. This makes Cortex feel like a bolted-on tool rather than a native part of the OS.

Solution (from Ed's feedback)

Integrate llama.cpp directly as a C++ library:

  • Native binary integration (no separate process)
  • Links directly into system components
  • Smaller footprint, faster startup
  • Feels like part of the OS, not an app
  • Build custom binaries for natural language processing
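A minimal sketch of what "no separate process" could look like from the daemon's side, assuming the daemon talks to the inference engine through a small C++ interface. All names here (`InferenceBackend`, `EchoBackend`, `CortexDaemon`) are hypothetical and do not come from Cortex or llama.cpp. The stand-in backend only echoes the prompt so the sketch compiles on its own; a real backend would call into the linked llama.cpp library at the marked spots.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface: the daemon depends on this, not on a specific engine.
class InferenceBackend {
public:
    virtual ~InferenceBackend() = default;
    virtual bool load_model(const std::string& model_path) = 0;
    virtual std::string complete(const std::string& prompt) = 0;
};

// Stand-in backend so the sketch builds without llama.cpp.
// Assumption: a real backend would call the linked llama.cpp library here.
class EchoBackend : public InferenceBackend {
public:
    bool load_model(const std::string& model_path) override {
        // Placeholder for loading model weights; only checks the path is non-empty.
        loaded_ = !model_path.empty();
        return loaded_;
    }

    std::string complete(const std::string& prompt) override {
        assert(loaded_ && "load_model must succeed before complete");
        // Placeholder for token generation.
        return "echo: " + prompt;
    }

private:
    bool loaded_ = false;
};

// Hypothetical daemon: owns the backend and calls it in-process,
// with no socket, HTTP request, or child process in between.
class CortexDaemon {
public:
    explicit CortexDaemon(std::unique_ptr<InferenceBackend> backend)
        : backend_(std::move(backend)) {}

    bool start(const std::string& model_path) {
        return backend_->load_model(model_path);
    }

    std::string handle_request(const std::string& text) {
        return backend_->complete(text);  // direct function call
    }

private:
    std::unique_ptr<InferenceBackend> backend_;
};
```

Keeping the engine behind an interface like this is one possible design choice: the daemon can be built and tested against the stand-in, and the llama.cpp-backed implementation can be swapped in without touching request handling.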

Why llama.cpp over Ollama

| Ollama | llama.cpp |
| --- | --- |
| Go binary, 500MB+ | C++ library, <50MB |
| Separate process | Links into daemon |
| Feels like installed app | Feels like OS component |
| Network API overhead | Direct function calls |

Technical Notes

Acceptance Criteria

  • llama.cpp integrated as a shared library or statically linked
  • Cortex daemon can call inference without spawning external process
  • Works offline with no cloud dependency
  • Startup time < 100ms for inference readiness
  • Memory footprint < 100MB with model loaded
  • Documentation for building from source
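The startup criterion above could be checked with a small timing helper along these lines. This is only a sketch: `measure_readiness` and `ReadinessResult` are hypothetical names, and the `init` callback stands in for whatever the daemon does to become ready for inference (for example, loading the model). The 100ms budget is taken from the acceptance criteria and passed in by the caller.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Outcome of one readiness measurement (hypothetical type).
struct ReadinessResult {
    bool ready;                         // did initialization succeed?
    std::chrono::milliseconds elapsed;  // wall-clock time spent in init
    bool within_budget;                 // ready AND elapsed <= budget
};

// Runs `init` once and times it with a monotonic clock.
// Assumption: `init` returns true when the backend is ready for inference.
inline ReadinessResult measure_readiness(const std::function<bool()>& init,
                                         std::chrono::milliseconds budget) {
    assert(budget.count() >= 0);
    const auto t0 = std::chrono::steady_clock::now();
    const bool ok = init();
    const auto t1 = std::chrono::steady_clock::now();
    const auto elapsed =
        std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0);
    return {ok, elapsed, ok && elapsed <= budget};
}
```

A call such as `measure_readiness([&] { return daemon.start(path); }, std::chrono::milliseconds(100))` would then report whether a given build meets the target, where `daemon` and `path` are placeholders for the real startup code. `steady_clock` is used so the measurement is not affected by system clock adjustments.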

Related Issues

Bounty: $150 (+ $150 bonus after funding)

Paid on merge to main.
