Merged
2 changes: 2 additions & 0 deletions README.md
@@ -138,6 +138,7 @@ Current defaults:
- markdown thematic breaks act as section-boundary hints, carrying a short lead-in into the next chunk instead of becoming their own retrieval chunks
- markdown images keep alt text primary in chunk text while recording image references as chunk metadata, and whitelisted HTML blocks currently cover `img` plus `details` / `summary`
- markdown fallback is selective: ordinary supported prose still chunks normally, but policy-rejected markdown like unsupported raw-HTML-only or reference-definition-only content does not fall back through the plain paragraph chunker
- conventional search now uses modest field-aware ranking, prefers title hits over body-only hits when both are relevant, and builds query-aware snippets with multi-term highlights instead of a single fixed-width first-term window
- `makeContext(...)` suppresses redundant same-document chunk text, groups annotated output by document, and skips annotated sections that only have room for labels

Supported today:
@@ -148,6 +149,7 @@ Supported today:
- use `FetchKitLibrary()` with a default in-memory backend or inject custom `FetchDocumentStore` and `FetchIndex` implementations explicitly
- use a real Core Data-backed `FetchDocumentStore` in `FetchKit` with the first thin macOS SearchKit index backend
- persist and retry pending index-sync work through `FetchKitLibrary.pendingIndexSyncs()` and `retryPendingIndexSyncs(...)`
- return conventional-search results with query-aware snippets and field-aware ranking across title and body matches
- narrow retrieval with typed metadata filters
- preserve meaningful markdown structure for retrieval, including heading paths, list semantics, quote-heavy documents, code-heavy documents, short section breaks, images, and a narrow raw-HTML whitelist
- turn ranked search results into plain or annotated context text for downstream UI or model consumers
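
The query-aware, multi-term snippet behavior described in these README entries can be illustrated with a minimal sketch. This PR only shows the call site `FetchSearchSupport.buildSnippet(from:query:)`, not its body, so the `Snippet` type, the `buildSnippet(from:terms:padding:maxLength:)` signature, and the window sizes below are hypothetical stand-ins, not FetchKit's implementation.

```swift
// Hypothetical snippet type: `matchRanges` are character offsets into `text`.
struct Snippet {
    let text: String
    let matchRanges: [Range<Int>]
}

// Assumed behavior: highlight every query term inside one window that starts
// shortly before the earliest hit, instead of a fixed window around the first term.
func buildSnippet(
    from text: String,
    terms: [String],
    padding: Int = 24,
    maxLength: Int = 120
) -> Snippet? {
    let chars = Array(text)
    var hits: [Range<Int>] = []

    // Collect case-insensitive occurrences of every term.
    for term in terms.map({ $0.lowercased() }) where !term.isEmpty {
        let length = term.count
        guard length <= chars.count else { continue }
        for start in 0...(chars.count - length) {
            if String(chars[start..<(start + length)]).lowercased() == term {
                hits.append(start..<(start + length))
            }
        }
    }

    // No term matched: there is nothing to build a snippet around.
    guard let firstHit = hits.map(\.lowerBound).min() else { return nil }

    let windowStart = max(0, firstHit - padding)
    let windowEnd = min(chars.count, windowStart + maxLength)

    // Keep only hits that fit entirely inside the window, re-based to it.
    // Ranges from different terms may overlap; this sketch does not merge them.
    let ranges = hits
        .filter { $0.lowerBound >= windowStart && $0.upperBound <= windowEnd }
        .map { ($0.lowerBound - windowStart)..<($0.upperBound - windowStart) }
        .sorted { $0.lowerBound < $1.lowerBound }

    return Snippet(text: String(chars[windowStart..<windowEnd]), matchRanges: ranges)
}
```

With the text `"The bright red apple fell from the bright tree"`, the terms `["bright", "apple"]`, `padding: 4`, and `maxLength: 30`, this sketch yields the snippet text `"The bright red apple fell from"` with highlight ranges `4..<10` and `15..<20`. The second `bright` falls outside the window and is dropped.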
8 changes: 5 additions & 3 deletions ROADMAP.md
@@ -173,18 +173,19 @@ Planned
### Scope

- [ ] Refine conventional-search ranking and snippet behavior now that the first SearchKit backend works end to end.
- [x] Refine conventional-search ranking and snippet behavior now that the first SearchKit backend works end to end.
- [ ] Decide whether the SearchKit-backed path needs a dedicated CI lane beyond the current local-only verification helper.
- [ ] Keep the public `FetchKitLibrary` surface polished as the conventional-search side moves from foundation into quality work.

### Tickets

- [ ] Refine ranking behavior for conventional search so the first SearchKit backend feels less like a raw index adapter and more like a library product.
- [ ] Improve snippet behavior and result presentation without bloating `FetchCore` into a larger query or rendering DSL.
- [x] Refine ranking behavior for conventional search so the first SearchKit backend feels less like a raw index adapter and more like a library product.
- [x] Improve snippet behavior and result presentation without bloating `FetchCore` into a larger query or rendering DSL.
- [ ] Revisit the local-only SearchKit verification decision once the opt-in macOS lane has stayed stable long enough to justify a dedicated CI experiment.

### Exit Criteria

- [ ] Conventional-search results feel intentionally ranked and include useful snippet behavior for ordinary app callers.
- [x] Conventional-search results feel intentionally ranked and include useful snippet behavior for ordinary app callers.
- [ ] The team has a clear answer on whether the SearchKit lane should stay local-only, move to a dedicated CI path, or remain intentionally manual.
- [ ] `FetchKitLibrary` still reads like a small Swift-native facade instead of exposing backend detail drift.

@@ -233,3 +234,4 @@ Planned
- Tightened the persistent `FetchKitLibrary` surface around one resolved storage location, with Application Support defaults plus a direct directory override for local callers.
- Recorded that the GitHub-hosted `macos-15` Natural Language verification attempt timed out, so Apple-asset coverage stays local-only for now.
- Audited the Core Data-backed `FetchKit` store after a GitHub-hosted Swift Testing crash, recorded the executor-assumption findings, moved Core Data verification onto XCTest, and switched the durable store over to a private-queue Core Data context with the framework's async `perform` path.
- Refined conventional-search result quality with modest field-aware ranking plus query-aware multi-term snippets across the in-memory and SearchKit-backed `FetchKit` paths.
101 changes: 51 additions & 50 deletions Sources/FetchKit/InMemoryFetchIndex.swift
@@ -56,18 +56,19 @@ actor InMemoryFetchIndex: FetchIndex {
)
}

guard let bestMatch = matches.max(by: { $0.score < $1.score }) else {
guard !matches.isEmpty else {
return nil
}

let score = matches.reduce(0) { $0 + $1.score }
let snippetMatch = preferredSnippetMatch(from: matches)

return FetchSearchResult(
document: document.searchDocument,
score: bestMatch.score,
snippet: makeSnippet(
text: bestMatch.text,
lowerBound: bestMatch.lowerBound,
upperBound: bestMatch.upperBound
)
score: score,
snippet: snippetMatch.flatMap { match in
FetchSearchSupport.buildSnippet(from: match.text, query: query)
}
)
}

@@ -99,20 +100,26 @@ actor InMemoryFetchIndex: FetchIndex {

switch query.kind {
case .exactPhrase:
guard let range = lowercaseText.range(of: lowercaseQuery) else {
let phrase = FetchSearchSupport.exactPhraseText(from: query)
guard !phrase.isEmpty, lowercaseText.range(of: phrase) != nil else {
Comment on lines +103 to +104

P2: Preserve exact-phrase behavior across index backends

The in-memory path now strips quotes before exact-phrase matching, so a query like `"bright apple"` can match there. The SearchKit backend, however, still wraps the raw query text in quotes (`SearchKitFetchIndex.searchString(for:kind:)`), which turns quoted input into a literal-quote phrase query. The same `FetchSearchQuery` input therefore behaves differently depending on the backend, and results can be silently dropped when callers include quotes in `.exactPhrase` searches.
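
One possible way to address this is to route both backends through the same phrase normalization before they build their queries. The sketch below is only an illustration: `normalizedExactPhrase(_:)` and `quotedPhraseQuery(for:)` are hypothetical names, and the actual bodies of `FetchSearchSupport.exactPhraseText(from:)` and `SearchKitFetchIndex.searchString(for:kind:)` are not shown in this diff.

```swift
import Foundation

// Hypothetical shared helper: trims whitespace and strips one pair of
// surrounding double quotes, so `"bright apple"` and `bright apple`
// normalize to the same phrase.
func normalizedExactPhrase(_ raw: String) -> String {
    var phrase = raw.trimmingCharacters(in: .whitespacesAndNewlines)
    if phrase.count >= 2, phrase.hasPrefix("\""), phrase.hasSuffix("\"") {
        phrase = String(phrase.dropFirst().dropLast())
    }
    return phrase.trimmingCharacters(in: .whitespacesAndNewlines)
}

// Hypothetical stand-in for the SearchKit-side query builder: it quotes the
// normalized phrase rather than the raw caller input.
func quotedPhraseQuery(for raw: String) -> String {
    "\"\(normalizedExactPhrase(raw))\""
}
```

Under this assumption, the in-memory index would match against `normalizedExactPhrase(raw).lowercased()` while the SearchKit path would send `quotedPhraseQuery(for: raw)`, so quoted and unquoted caller input resolve to the same phrase on both backends.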

return nil
}

return SearchMatch(
field: field,
text: text,
lowerBound: lowercaseText.distance(from: lowercaseText.startIndex, to: range.lowerBound),
upperBound: lowercaseText.distance(from: lowercaseText.startIndex, to: range.upperBound),
score: boostedScore(base: 1.0, field: field)
score: boostedScore(base: 1.0, field: field, kind: query.kind)
)
case .prefix:
return prefixMatch(field: field, text: text, lowercaseText: lowercaseText, lowercaseQuery: lowercaseQuery)
case .allTerms, .naturalLanguage:
return allTermsMatch(field: field, text: text, lowercaseText: lowercaseText, lowercaseQuery: lowercaseQuery)
return allTermsMatch(
field: field,
text: text,
lowercaseText: lowercaseText,
lowercaseQuery: lowercaseQuery,
kind: query.kind
)
}
}

@@ -127,23 +134,23 @@ actor InMemoryFetchIndex: FetchIndex {
return nil
}

guard let range = lowercaseText.range(of: lowercaseQuery) else {
guard lowercaseText.range(of: lowercaseQuery) != nil else {
return nil
}

return SearchMatch(
field: field,
text: text,
lowerBound: lowercaseText.distance(from: lowercaseText.startIndex, to: range.lowerBound),
upperBound: lowercaseText.distance(from: lowercaseText.startIndex, to: range.upperBound),
score: boostedScore(base: 0.9, field: field)
score: boostedScore(base: 0.9, field: field, kind: .prefix)
)
}

private func allTermsMatch(
field: FetchSearchField,
text: String,
lowercaseText: String,
lowercaseQuery: String
lowercaseQuery: String,
kind: FetchSearchKind
) -> SearchMatch? {
let terms = lowercaseQuery
.split(whereSeparator: \.isWhitespace)
@@ -156,54 +163,48 @@ actor InMemoryFetchIndex: FetchIndex {
return nil
}

guard let firstTerm = terms.first, let range = lowercaseText.range(of: firstTerm) else {
guard let firstTerm = terms.first, lowercaseText.range(of: firstTerm) != nil else {
return nil
}

return SearchMatch(
field: field,
text: text,
lowerBound: lowercaseText.distance(from: lowercaseText.startIndex, to: range.lowerBound),
upperBound: lowercaseText.distance(from: lowercaseText.startIndex, to: range.upperBound),
score: boostedScore(base: 0.8 + (0.02 * Double(terms.count)), field: field)
score: boostedScore(base: 0.8 + (0.02 * Double(terms.count)), field: field, kind: kind)
)
}

private func boostedScore(base: Double, field: FetchSearchField) -> Double {
switch field {
case .title:
base + 0.1
case .body:
base
}
private func boostedScore(
base: Double,
field: FetchSearchField,
kind: FetchSearchKind
) -> Double {
base
* FetchSearchSupport.fieldWeight(for: field)
* FetchSearchSupport.queryKindWeight(for: kind)
}

private func makeSnippet(text: String, lowerBound: Int, upperBound: Int) -> FetchSnippet {
let snippetRange = snippetBounds(for: text, lowerBound: lowerBound, upperBound: upperBound)
let snippetText = String(text[snippetRange])
let snippetLowerBound = text.distance(from: snippetRange.lowerBound, to: text.index(text.startIndex, offsetBy: lowerBound))
let snippetUpperBound = text.distance(from: snippetRange.lowerBound, to: text.index(text.startIndex, offsetBy: upperBound))

return FetchSnippet(
text: snippetText,
matchRanges: [
FetchMatchRange(
lowerBound: snippetLowerBound,
upperBound: snippetUpperBound
),
]
)
}
private func preferredSnippetMatch(from matches: [SearchMatch]) -> SearchMatch? {
matches.max {
if $0.field == $1.field {
return $0.score < $1.score
}

private func snippetBounds(for text: String, lowerBound: Int, upperBound: Int) -> Range<String.Index> {
let startIndex = text.index(text.startIndex, offsetBy: max(0, lowerBound - 24))
let endIndex = text.index(text.startIndex, offsetBy: min(text.count, upperBound + 24))
return startIndex..<endIndex
if $0.field == .body {
return false
}

if $1.field == .body {
return true
}

return $0.score < $1.score
}
}
}

private struct SearchMatch {
let field: FetchSearchField
let text: String
let lowerBound: Int
let upperBound: Int
let score: Double
}
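
To make the new ranking flow in this file easier to follow, here is a minimal standalone sketch of the three pieces the diff introduces: multiplicative score boosting, a document score summed over all field matches, and a snippet source that prefers the body match. The `Field`, `Kind`, and `Match` types are simplified stand-ins, and the numeric weights are assumptions for illustration. The real values returned by `FetchSearchSupport.fieldWeight(for:)` and `FetchSearchSupport.queryKindWeight(for:)` are not visible in this diff.

```swift
enum Field { case title, body }
enum Kind { case exactPhrase, prefix, allTerms }

struct Match {
    let field: Field
    let score: Double
}

// Assumed weights, chosen only so title hits outrank body-only hits.
func fieldWeight(_ field: Field) -> Double {
    switch field {
    case .title: return 1.5
    case .body: return 1.0
    }
}

// Assumed weights for the query kinds.
func kindWeight(_ kind: Kind) -> Double {
    switch kind {
    case .exactPhrase: return 1.0
    case .prefix: return 0.9
    case .allTerms: return 0.8
    }
}

// Mirrors the shape of the new `boostedScore`: a product instead of `base + 0.1`.
func boostedScore(base: Double, field: Field, kind: Kind) -> Double {
    base * fieldWeight(field) * kindWeight(kind)
}

// The document score is the sum over all matching fields, not the best single match.
func documentScore(_ matches: [Match]) -> Double {
    matches.reduce(0) { $0 + $1.score }
}

// Same ordering as the diff's comparator: any body match beats any title match,
// and ties within one field are broken by score.
func preferredSnippetMatch(_ matches: [Match]) -> Match? {
    matches.max { lhs, rhs in
        if lhs.field == rhs.field { return lhs.score < rhs.score }
        return rhs.field == .body
    }
}
```

One consequence worth noting: a document that matches in both title and body ranks higher because both scores are summed, yet its snippet is still drawn from the body, which usually gives the caller more surrounding context than a short title.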