Data Model

Bruce D'Ambrosio edited this page Nov 23, 2025 · 5 revisions

Reasoning needs a working memory. Cognitive scientists have many elaborate models of working memory that have influenced the design of LLM cognitive agents, but I get confused trying to understand how they all fit together. So.

Cognitive Workbench currently implements four layers at which it can work with memory.

Base layer: pieces of text, called Notes, and sets of these, called Collections.

These come with basic CRUD and set-theoretic operations: create-note, create-collection, add (a Note or Collection to a Collection), remove (a Note from a Collection), union, set-difference, intersection, is-empty, size, etc. These primitives operate structurally on Notes and Collections; they do not inspect Note content (well, trivially, is-empty and size do). Except for a few primitives (e.g. is-empty), every primitive and tool outputs either a Note or a Collection.
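The base layer can be sketched as follows. This is a minimal illustration, not the actual Cognitive Workbench implementation: the class names mirror the document, but the constructor and method signatures are assumptions.

```python
# Sketch of the base layer: Notes, Collections, and structural primitives.
# Names and signatures are illustrative, not the real API.
from dataclasses import dataclass, field
import itertools

_ids = itertools.count(1)  # toy stand-in for real Note IDs

@dataclass(frozen=True)
class Note:
    text: str
    id: int = field(default_factory=lambda: next(_ids))

class Collection:
    def __init__(self, items=()):
        self.items = set(items)            # Notes (or nested Collections)

    def add(self, item):                   # add a Note or Collection
        self.items.add(item)
        return self

    def remove(self, note):                # remove a Note
        self.items.discard(note)
        return self

    def union(self, other):
        return Collection(self.items | other.items)

    def intersection(self, other):
        return Collection(self.items & other.items)

    def difference(self, other):
        return Collection(self.items - other.items)

    def is_empty(self):                    # one of the few content-adjacent peeks
        return len(self.items) == 0

    def size(self):
        return len(self.items)
```

Note that every operation here returns a Note or a Collection (or, for is-empty and size, a scalar), matching the closure property described above.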

The next layer up is operations on content that assume no format other than raw text.

These include index and search. All Notes are indexed (embedded) and are searchable using search-notes. Similarly, all Collections can be searched via search-collections. A particular Collection can be indexed using index and then searched using search-within-collection with that Collection as the target. Large Notes are segmented and indexed by segment.

Search primitives:

  • search-notes: Global discovery across all Notes (no target needed).
  • search-collections: Global discovery across all Collections (no target needed).
  • search-within-collection: Search within a specific indexed Collection (requires a target Collection, which must be indexed first).

All search primitives return a Collection of structured Notes matching the query-web/semantic-scholar format:

  • text: First paragraph preview (200 chars max)
  • format: "text" or "json"
  • metadata.source_id: Original Note/Collection ID
  • metadata.uri: URI field (Note/Collection ID or extracted URI from source)
  • metadata.score: Search relevance score (0.0-1.0)
  • metadata.type: "Note" or "Collection"
  • char_count: Length of text preview
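To make the result shape concrete, here is a toy sketch of a search hit in the format above. The helper names and the naive term-overlap scoring are assumptions for illustration; the real system scores with embeddings.

```python
# Sketch of the structured Note a search primitive returns.
# make_search_hit and search_notes are illustrative helpers, not the real API.
def make_search_hit(note_text, source_id, uri, score, hit_type):
    preview = note_text.split("\n\n")[0][:200]   # first paragraph, 200 chars max
    return {
        "text": preview,
        "format": "text",
        "metadata": {
            "source_id": source_id,              # original Note/Collection ID
            "uri": uri,
            "score": round(score, 3),            # relevance in 0.0-1.0
            "type": hit_type,                    # "Note" or "Collection"
        },
        "char_count": len(preview),
    }

def search_notes(query, notes):
    """Toy stand-in for search-notes: ranks by naive term overlap."""
    hits = []
    for note_id, text in notes.items():
        terms = query.lower().split()
        score = sum(t in text.lower() for t in terms) / max(len(terms), 1)
        if score > 0:
            hits.append(make_search_hit(text, note_id, note_id, score, "Note"))
    return sorted(hits, key=lambda h: h["metadata"]["score"], reverse=True)
```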

Above that come operations that assume JSON Note content and provide minimal SQL-like content operations.

SQL-like Collection Operations (require dict/JSON Notes):

  • project: Extract specific fields (SELECT columns) → new Collection with subset of fields
  • pluck: Extract single field as simple values → Collection of values
  • filter-structured: Filter by field conditions (WHERE clauses) → filtered Collection
  • sort: Sort by field value (ORDER BY) → sorted Collection
  • join: Combine two Collections on matching field (INNER JOIN) → merged Collection
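The five operations above can be sketched over plain dicts. Function names mirror the primitives; the signatures (e.g. taking a predicate for filter-structured) are assumptions, and the real tools operate on Collections of JSON Notes rather than Python lists.

```python
# Sketch of the SQL-like operations, modeling JSON Notes as dicts in a list.
def project(coll, fields):                       # SELECT columns
    return [{k: row[k] for k in fields if k in row} for row in coll]

def pluck(coll, field):                          # single field -> simple values
    return [row[field] for row in coll if field in row]

def filter_structured(coll, pred):               # WHERE clause as a predicate
    return [row for row in coll if pred(row)]

def sort_by(coll, field, descending=False):      # ORDER BY
    return sorted(coll, key=lambda r: r[field], reverse=descending)

def join(left, right, on):                       # INNER JOIN on matching field
    index = {}
    for row in right:
        index.setdefault(row[on], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[on], [])]
```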

Use cases:

  • project: Extract URLs from search results, get title+year from papers
  • pluck: Get just titles as simple list, extract scores for analysis
  • filter-structured: Papers after 2020, results with score>0.5, venue contains "NeurIPS"
  • sort: Rank by score (descending), chronological by year, alphabetical by title
  • join: Merge papers with citation data, combine user info with profiles
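These use cases chain naturally. The following stand-alone sketch combines three of them on toy data (filter papers after 2020, sort by score descending, pluck titles), written with plain list operations so it runs on its own; the real primitives would take and return Collections.

```python
# Illustrative pipeline: filter-structured, then sort, then pluck.
papers = [
    {"title": "Alpha", "year": 2019, "score": 0.9},
    {"title": "Beta",  "year": 2021, "score": 0.4},
    {"title": "Gamma", "year": 2022, "score": 0.8},
]

recent = [p for p in papers if p["year"] > 2020]                  # filter-structured
ranked = sorted(recent, key=lambda p: p["score"], reverse=True)   # sort (descending)
titles = [p["title"] for p in ranked]                             # pluck
# titles == ["Gamma", "Beta"]
```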
