diff --git a/README.md b/README.md
index 16884b9..5da3c9c 100644
--- a/README.md
+++ b/README.md
@@ -138,6 +138,7 @@ Current defaults:
 - markdown thematic breaks act as section-boundary hints, carrying a short lead-in into the next chunk instead of becoming their own retrieval chunks
 - markdown images keep alt text primary in chunk text while recording image references as chunk metadata, and whitelisted HTML blocks currently cover `img` plus `details` / `summary`
 - markdown fallback is selective: ordinary supported prose still chunks normally, but policy-rejected markdown like unsupported raw-HTML-only or reference-definition-only content does not fall back through the plain paragraph chunker
+- conventional search now uses modest field-aware ranking, prefers title hits over body-only hits when both are relevant, and builds query-aware snippets with multi-term highlights instead of a single fixed-width first-term window
 - `makeContext(...)` suppresses redundant same-document chunk text, groups annotated output by document, and skips annotated sections that only have room for labels
 
 Supported today:
@@ -148,6 +149,7 @@ Supported today:
 - use `FetchKitLibrary()` with a default in-memory backend or inject custom `FetchDocumentStore` and `FetchIndex` implementations explicitly
 - use a real Core Data-backed `FetchDocumentStore` in `FetchKit` with the first thin macOS SearchKit index backend
 - persist and retry pending index-sync work through `FetchKitLibrary.pendingIndexSyncs()` and `retryPendingIndexSyncs(...)`
+- return conventional-search results with query-aware snippets and field-aware ranking across title and body matches
 - narrow retrieval with typed metadata filters
 - preserve meaningful markdown structure for retrieval, including heading paths, list semantics, quote-heavy documents, code-heavy documents, short section breaks, images, and a narrow raw-HTML whitelist
 - turn ranked search results into plain or annotated context text for downstream UI or model consumers
diff --git a/ROADMAP.md b/ROADMAP.md
index d4edb2b..39b35f6 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -173,18 +173,19 @@ Planned
 ### Scope
 
 - [ ] Refine conventional-search ranking and snippet behavior now that the first SearchKit backend works end to end.
+- [x] Refine conventional-search ranking and snippet behavior now that the first SearchKit backend works end to end.
 - [ ] Decide whether the SearchKit-backed path needs a dedicated CI lane beyond the current local-only verification helper.
 - [ ] Keep the public `FetchKitLibrary` surface polished as the conventional-search side moves from foundation into quality work.
 
 ### Tickets
 
-- [ ] Refine ranking behavior for conventional search so the first SearchKit backend feels less like a raw index adapter and more like a library product.
-- [ ] Improve snippet behavior and result presentation without bloating `FetchCore` into a larger query or rendering DSL.
+- [x] Refine ranking behavior for conventional search so the first SearchKit backend feels less like a raw index adapter and more like a library product.
+- [x] Improve snippet behavior and result presentation without bloating `FetchCore` into a larger query or rendering DSL.
 - [ ] Revisit the local-only SearchKit verification decision once the opt-in macOS lane has stayed stable long enough to justify a dedicated CI experiment.
 
 ### Exit Criteria
 
-- [ ] Conventional-search results feel intentionally ranked and include useful snippet behavior for ordinary app callers.
+- [x] Conventional-search results feel intentionally ranked and include useful snippet behavior for ordinary app callers.
 - [ ] The team has a clear answer on whether the SearchKit lane should stay local-only, move to a dedicated CI path, or remain intentionally manual.
 - [ ] `FetchKitLibrary` still reads like a small Swift-native facade instead of exposing backend detail drift.
 
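The field-aware ranking described in the README can be reduced to a few lines for illustration. This is a standalone sketch, not FetchKit API: `Field`, `Kind`, and `documentScore` are hypothetical stand-ins. The weights are copied from `FetchSearchSupport` later in this diff, and the summing mirrors the in-memory index, which now adds up its weighted per-field matches instead of keeping only the best one. The base score of 0.82 assumes the in-memory `0.8 + 0.02 * terms.count` rule for a one-term all-terms query.

```swift
// Hypothetical stand-ins for FetchSearchField and FetchSearchKind.
enum Field { case title, body }
enum Kind { case exactPhrase, prefix, allTerms, naturalLanguage }

// Weights copied from FetchSearchSupport in this diff.
func fieldWeight(_ field: Field) -> Double {
    switch field {
    case .title: return 1.2
    case .body: return 1.0
    }
}

func kindWeight(_ kind: Kind) -> Double {
    switch kind {
    case .exactPhrase: return 1.3
    case .prefix: return 1.1
    case .allTerms: return 1.0
    case .naturalLanguage: return 0.95
    }
}

// Sum the weighted per-field scores, as the in-memory index now does,
// so a hit in both fields beats a hit in either field alone.
func documentScore(baseScores: [Field: Double], kind: Kind) -> Double {
    baseScores.reduce(0) { total, entry in
        total + entry.value * fieldWeight(entry.key) * kindWeight(kind)
    }
}

// 0.82 = 0.8 + 0.02 * 1, the in-memory base score for a one-term all-terms match.
let titleOnly = documentScore(baseScores: [.title: 0.82], kind: .allTerms)
let bodyOnly = documentScore(baseScores: [.body: 0.82], kind: .allTerms)
let both = documentScore(baseScores: [.title: 0.82, .body: 0.82], kind: .allTerms)
print(titleOnly > bodyOnly, both > titleOnly) // true true
```

Because the weights are multiplicative and modest, in this sketch the title bump only settles close calls: when base scores differ between fields (as SearchKit's normalized scores can), a strong body match may still outrank a weak title match.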
@@ -233,3 +234,4 @@ Planned - Tightened the persistent `FetchKitLibrary` surface around one resolved storage location, with Application Support defaults plus a direct directory override for local callers. - Recorded that the GitHub-hosted `macos-15` Natural Language verification attempt timed out, so Apple-asset coverage stays local-only for now. - Audited the Core Data-backed `FetchKit` store after a GitHub-hosted Swift Testing crash, recorded the executor-assumption findings, moved Core Data verification onto XCTest, and switched the durable store over to a private-queue Core Data context with the framework's async `perform` path. +- Refined conventional-search result quality with modest field-aware ranking plus query-aware multi-term snippets across the in-memory and SearchKit-backed `FetchKit` paths. diff --git a/Sources/FetchKit/InMemoryFetchIndex.swift b/Sources/FetchKit/InMemoryFetchIndex.swift index 4082fce..944a2f4 100644 --- a/Sources/FetchKit/InMemoryFetchIndex.swift +++ b/Sources/FetchKit/InMemoryFetchIndex.swift @@ -56,18 +56,19 @@ actor InMemoryFetchIndex: FetchIndex { ) } - guard let bestMatch = matches.max(by: { $0.score < $1.score }) else { + guard !matches.isEmpty else { return nil } + let score = matches.reduce(0) { $0 + $1.score } + let snippetMatch = preferredSnippetMatch(from: matches) + return FetchSearchResult( document: document.searchDocument, - score: bestMatch.score, - snippet: makeSnippet( - text: bestMatch.text, - lowerBound: bestMatch.lowerBound, - upperBound: bestMatch.upperBound - ) + score: score, + snippet: snippetMatch.flatMap { match in + FetchSearchSupport.buildSnippet(from: match.text, query: query) + } ) } @@ -99,20 +100,26 @@ actor InMemoryFetchIndex: FetchIndex { switch query.kind { case .exactPhrase: - guard let range = lowercaseText.range(of: lowercaseQuery) else { + let phrase = FetchSearchSupport.exactPhraseText(from: query) + guard !phrase.isEmpty, lowercaseText.range(of: phrase) != nil else { return nil } return 
SearchMatch( + field: field, text: text, - lowerBound: lowercaseText.distance(from: lowercaseText.startIndex, to: range.lowerBound), - upperBound: lowercaseText.distance(from: lowercaseText.startIndex, to: range.upperBound), - score: boostedScore(base: 1.0, field: field) + score: boostedScore(base: 1.0, field: field, kind: query.kind) ) case .prefix: return prefixMatch(field: field, text: text, lowercaseText: lowercaseText, lowercaseQuery: lowercaseQuery) case .allTerms, .naturalLanguage: - return allTermsMatch(field: field, text: text, lowercaseText: lowercaseText, lowercaseQuery: lowercaseQuery) + return allTermsMatch( + field: field, + text: text, + lowercaseText: lowercaseText, + lowercaseQuery: lowercaseQuery, + kind: query.kind + ) } } @@ -127,15 +134,14 @@ actor InMemoryFetchIndex: FetchIndex { return nil } - guard let range = lowercaseText.range(of: lowercaseQuery) else { + guard lowercaseText.range(of: lowercaseQuery) != nil else { return nil } return SearchMatch( + field: field, text: text, - lowerBound: lowercaseText.distance(from: lowercaseText.startIndex, to: range.lowerBound), - upperBound: lowercaseText.distance(from: lowercaseText.startIndex, to: range.upperBound), - score: boostedScore(base: 0.9, field: field) + score: boostedScore(base: 0.9, field: field, kind: .prefix) ) } @@ -143,7 +149,8 @@ actor InMemoryFetchIndex: FetchIndex { field: FetchSearchField, text: String, lowercaseText: String, - lowercaseQuery: String + lowercaseQuery: String, + kind: FetchSearchKind ) -> SearchMatch? 
{ let terms = lowercaseQuery .split(whereSeparator: \.isWhitespace) @@ -156,54 +163,48 @@ actor InMemoryFetchIndex: FetchIndex { return nil } - guard let firstTerm = terms.first, let range = lowercaseText.range(of: firstTerm) else { + guard let firstTerm = terms.first, lowercaseText.range(of: firstTerm) != nil else { return nil } return SearchMatch( + field: field, text: text, - lowerBound: lowercaseText.distance(from: lowercaseText.startIndex, to: range.lowerBound), - upperBound: lowercaseText.distance(from: lowercaseText.startIndex, to: range.upperBound), - score: boostedScore(base: 0.8 + (0.02 * Double(terms.count)), field: field) + score: boostedScore(base: 0.8 + (0.02 * Double(terms.count)), field: field, kind: kind) ) } - private func boostedScore(base: Double, field: FetchSearchField) -> Double { - switch field { - case .title: - base + 0.1 - case .body: - base - } + private func boostedScore( + base: Double, + field: FetchSearchField, + kind: FetchSearchKind + ) -> Double { + base + * FetchSearchSupport.fieldWeight(for: field) + * FetchSearchSupport.queryKindWeight(for: kind) } - private func makeSnippet(text: String, lowerBound: Int, upperBound: Int) -> FetchSnippet { - let snippetRange = snippetBounds(for: text, lowerBound: lowerBound, upperBound: upperBound) - let snippetText = String(text[snippetRange]) - let snippetLowerBound = text.distance(from: snippetRange.lowerBound, to: text.index(text.startIndex, offsetBy: lowerBound)) - let snippetUpperBound = text.distance(from: snippetRange.lowerBound, to: text.index(text.startIndex, offsetBy: upperBound)) - - return FetchSnippet( - text: snippetText, - matchRanges: [ - FetchMatchRange( - lowerBound: snippetLowerBound, - upperBound: snippetUpperBound - ), - ] - ) - } + private func preferredSnippetMatch(from matches: [SearchMatch]) -> SearchMatch? 
{ + matches.max { + if $0.field == $1.field { + return $0.score < $1.score + } - private func snippetBounds(for text: String, lowerBound: Int, upperBound: Int) -> Range { - let startIndex = text.index(text.startIndex, offsetBy: max(0, lowerBound - 24)) - let endIndex = text.index(text.startIndex, offsetBy: min(text.count, upperBound + 24)) - return startIndex.. [FetchSearchResult] { + query: FetchSearchQuery + ) throws -> [FieldSearchMatch] { guard let search = SKSearchCreate( managedIndex.index, searchString as CFString, @@ -232,10 +232,10 @@ public actor SearchKitFetchIndex: FetchIndex { return [] } - var results: [FetchSearchResult] = [] - let fetchCount = max(limit, 16) + var results: [FieldSearchMatch] = [] + let fetchCount = max(query.limit, 16) - while results.count < limit { + while results.count < query.limit { var documentIDs = Array(repeating: 0, count: fetchCount) var scores = Array(repeating: 0, count: fetchCount) var foundCount: CFIndex = 0 @@ -254,18 +254,18 @@ public actor SearchKitFetchIndex: FetchIndex { } for offset in 0.. FetchSearchResult? { + query: FetchSearchQuery + ) throws -> FieldSearchMatch? { guard let document = SKIndexCopyDocumentForDocumentID( managedIndex.index, documentID @@ -300,75 +300,24 @@ public actor SearchKitFetchIndex: FetchIndex { } let snippetSource = field == .title ? (fetchDocument.title ?? fetchDocument.body) : fetchDocument.body - return FetchSearchResult( + return FieldSearchMatch( document: fetchDocument, score: score, - snippet: makeSnippet(from: snippetSource, query: query) - ) - } - - private func makeSnippet(from text: String, query: String) -> FetchSnippet? 
{ - let terms = query - .replacingOccurrences(of: "\"", with: "") - .replacingOccurrences(of: "&", with: " ") - .replacingOccurrences(of: "*", with: "") - .split(whereSeparator: \.isWhitespace) - .map(String.init) - .filter { !$0.isEmpty } - - guard let term = terms.first else { - return nil - } - - let lowercaseText = text.lowercased() - guard let range = lowercaseText.range(of: term.lowercased()) else { - return FetchSnippet(text: String(text.prefix(80))) - } - - let lowerBound = lowercaseText.distance(from: lowercaseText.startIndex, to: range.lowerBound) - let upperBound = lowercaseText.distance(from: lowercaseText.startIndex, to: range.upperBound) - let snippetRange = snippetBounds(for: text, lowerBound: lowerBound, upperBound: upperBound) - let snippetText = String(text[snippetRange]) - let highlightLowerBound = text.distance( - from: snippetRange.lowerBound, - to: text.index(text.startIndex, offsetBy: lowerBound) - ) - let highlightUpperBound = text.distance( - from: snippetRange.lowerBound, - to: text.index(text.startIndex, offsetBy: upperBound) - ) - - return FetchSnippet( - text: snippetText, - matchRanges: [ - FetchMatchRange( - lowerBound: highlightLowerBound, - upperBound: highlightUpperBound - ), - ] + snippet: FetchSearchSupport.buildSnippet(from: snippetSource, query: query), + field: field ) } - private func snippetBounds( - for text: String, - lowerBound: Int, - upperBound: Int - ) -> Range { - let start = text.index(text.startIndex, offsetBy: max(0, lowerBound - 24)) - let end = text.index(text.startIndex, offsetBy: min(text.count, upperBound + 24)) - return start.. FetchSearchResult { guard let existing else { - return new + return new.result } - let score = max(existing.score, new.score) - let snippet = existing.snippet ?? 
new.snippet + let score = existing.score + new.score + let snippet = preferredSnippet(existing: existing, new: new) return FetchSearchResult( document: existing.document, score: score, @@ -376,6 +325,37 @@ public actor SearchKitFetchIndex: FetchIndex { ) } + private func preferredSnippet( + existing: FetchSearchResult, + new: FieldSearchMatch + ) -> FetchSnippet? { + if new.field == .body, new.snippet != nil { + return new.snippet + } + + return existing.snippet ?? new.snippet + } + + private func normalize( + matches: [FieldSearchMatch], + for field: FetchSearchField, + kind: FetchSearchKind + ) -> [FieldSearchMatch] { + guard let maxScore = matches.map(\.score).max(), maxScore > 0 else { + return matches + } + + let weight = FetchSearchSupport.fieldWeight(for: field) * FetchSearchSupport.queryKindWeight(for: kind) + return matches.map { match in + FieldSearchMatch( + document: match.document, + score: (match.score / maxScore) * weight, + snippet: match.snippet, + field: match.field + ) + } + } + private func makeSearchKitDocument(id: FetchDocumentID) -> SKDocument { SKDocumentCreate( "swiftlyfetch" as CFString, @@ -423,6 +403,21 @@ public actor SearchKitFetchIndex: FetchIndex { } } +private struct FieldSearchMatch { + let document: FetchDocument + let score: Double + let snippet: FetchSnippet? + let field: FetchSearchField + + var result: FetchSearchResult { + FetchSearchResult( + document: document, + score: score, + snippet: snippet + ) + } +} + private final class ManagedIndex { let index: SKIndex private let mutableData: CFMutableData? 
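The SearchKit path needs an extra step because SearchKit's raw relevance scores are not normalized to a fixed range, so title-index and body-index scores cannot simply be added together. The sketch below shows roughly what `normalize(matches:for:kind:)` and the summing merge above appear to do: scale each field's hits by that field's best score, apply the field and query-kind weight, then add the scores per document. The diff does not show where `normalize` is called, so this sketch assumes it runs once per field before merging. `RawHit`, `normalize(_:weight:)`, and `merge(_:)` are hypothetical names, and the raw scores are invented.

```swift
// Hypothetical raw hit: `score` stands in for an unnormalized SearchKit relevance score.
struct RawHit {
    let documentID: String
    let score: Double
}

// Scale one field's hits into 0...1 by that field's best score, then apply the
// combined field x query-kind weight.
func normalize(_ hits: [RawHit], weight: Double) -> [RawHit] {
    guard let maxScore = hits.map(\.score).max(), maxScore > 0 else {
        return hits
    }
    return hits.map { RawHit(documentID: $0.documentID, score: ($0.score / maxScore) * weight) }
}

// Accumulate scores per document across fields, best first (ties broken by ID).
func merge(_ fieldResults: [[RawHit]]) -> [(id: String, score: Double)] {
    var totals: [String: Double] = [:]
    for hits in fieldResults {
        for hit in hits {
            totals[hit.documentID, default: 0] += hit.score
        }
    }
    let pairs: [(id: String, score: Double)] = totals.map { (id: $0.key, score: $0.value) }
    return pairs.sorted { $0.score == $1.score ? $0.id < $1.id : $0.score > $1.score }
}

// Made-up raw scores on very different scales for the two fields.
let titleHits = normalize([RawHit(documentID: "doc-title", score: 3.0)], weight: 1.2) // title x allTerms
let bodyHits = normalize([
    RawHit(documentID: "doc-body", score: 40.0),
    RawHit(documentID: "doc-title", score: 10.0),
], weight: 1.0) // body x allTerms

let ranked = merge([titleHits, bodyHits])
print(ranked.map { $0.id }) // ["doc-title", "doc-body"]
```

Normalizing per field keeps one index's larger raw numbers from swamping the other. The trade-off is that the resulting scores are only meaningful relative to the current result set, not across different queries.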
diff --git a/Sources/FetchKit/SearchSupport.swift b/Sources/FetchKit/SearchSupport.swift new file mode 100644 index 0000000..35427da --- /dev/null +++ b/Sources/FetchKit/SearchSupport.swift @@ -0,0 +1,293 @@ +import FetchCore +import Foundation + +enum FetchSearchSupport { + static func normalizedTerms(from query: FetchSearchQuery) -> [String] { + let normalizedText = query.text.trimmingCharacters(in: .whitespacesAndNewlines) + guard !normalizedText.isEmpty else { + return [] + } + + let cleaned = normalizedText + .replacingOccurrences(of: "\"", with: " ") + .replacingOccurrences(of: "&", with: " ") + .replacingOccurrences(of: "*", with: " ") + + var seen = Set() + return cleaned + .split(whereSeparator: \.isWhitespace) + .map(String.init) + .map { $0.lowercased() } + .filter { !$0.isEmpty && seen.insert($0).inserted } + } + + static func exactPhraseText(from query: FetchSearchQuery) -> String { + query.text + .trimmingCharacters(in: .whitespacesAndNewlines) + .replacingOccurrences(of: "\"", with: "") + .lowercased() + } + + static func fieldWeight(for field: FetchSearchField) -> Double { + switch field { + case .title: + 1.2 + case .body: + 1.0 + } + } + + static func queryKindWeight(for kind: FetchSearchKind) -> Double { + switch kind { + case .exactPhrase: + 1.3 + case .prefix: + 1.1 + case .allTerms: + 1.0 + case .naturalLanguage: + 0.95 + } + } + + static func buildSnippet( + from text: String, + query: FetchSearchQuery, + preferredLength: Int = 120, + leadingContext: Int = 36 + ) -> FetchSnippet? 
{ + let terms = normalizedTerms(from: query) + guard !terms.isEmpty else { + return nil + } + + let matches = allMatches(in: text, terms: terms) + guard !matches.isEmpty else { + return FetchSnippet(text: fallbackSnippetText(from: text, limit: preferredLength)) + } + + let snippetBounds = bestSnippetBounds( + in: text, + matches: matches, + preferredLength: preferredLength, + leadingContext: leadingContext + ) + let snippetText = truncatedSnippetText( + from: text, + bounds: snippetBounds + ) + let matchRanges = matches.compactMap { snippetRange(for: $0, within: snippetBounds, in: text) } + + return FetchSnippet( + text: snippetText, + matchRanges: matchRanges + ) + } + + private static func allMatches(in text: String, terms: [String]) -> [TermMatch] { + let lowercaseText = text.lowercased() + var matches: [TermMatch] = [] + + for term in terms { + var searchStart = lowercaseText.startIndex + + while searchStart < lowercaseText.endIndex, + let range = lowercaseText.range(of: term, range: searchStart.. Range { + let textCount = text.count + var bestWindow: SnippetWindow? + + for anchor in matches { + let proposedStart = max(0, anchor.lowerBound - leadingContext) + let proposedEnd = min(textCount, max(anchor.upperBound, proposedStart + preferredLength)) + let candidate = evaluateWindow( + in: text, + matches: matches, + startOffset: proposedStart, + endOffset: proposedEnd + ) + + if bestWindow.map({ candidate.isBetter(than: $0) }) ?? true { + bestWindow = candidate + } + } + + let selected = bestWindow ?? 
evaluateWindow( + in: text, + matches: matches, + startOffset: 0, + endOffset: min(textCount, preferredLength) + ) + + return selected.range + } + + private static func evaluateWindow( + in text: String, + matches: [TermMatch], + startOffset: Int, + endOffset: Int + ) -> SnippetWindow { + let adjustedStart = adjustSnippetStart(in: text, offset: startOffset) + let adjustedEnd = adjustSnippetEnd(in: text, offset: endOffset) + let range = text.index(text.startIndex, offsetBy: adjustedStart).. adjustedStart } + let distinctTerms = Set(includedMatches.map(\.term)).count + + return SnippetWindow( + range: range, + startOffset: adjustedStart, + endOffset: adjustedEnd, + distinctTermCount: distinctTerms, + totalMatchCount: includedMatches.count + ) + } + + private static func adjustSnippetStart(in text: String, offset: Int) -> Int { + guard offset > 0 else { + return 0 + } + + var current = offset + while current > 0 { + let index = text.index(text.startIndex, offsetBy: current) + let character = text[text.index(before: index)] + if character.isWhitespace || character.isSentenceBoundaryPunctuation { + break + } + current -= 1 + } + + return current + } + + private static func adjustSnippetEnd(in text: String, offset: Int) -> Int { + guard offset < text.count else { + return text.count + } + + var current = offset + while current < text.count { + let index = text.index(text.startIndex, offsetBy: current) + let character = text[index] + if character.isWhitespace || character.isSentenceBoundaryPunctuation { + break + } + current += 1 + } + + return current + } + + private static func snippetRange( + for match: TermMatch, + within snippetBounds: Range, + in text: String + ) -> FetchMatchRange? 
{ + let snippetStartOffset = text.distance(from: text.startIndex, to: snippetBounds.lowerBound) + let snippetEndOffset = text.distance(from: text.startIndex, to: snippetBounds.upperBound) + + guard match.lowerBound < snippetEndOffset, match.upperBound > snippetStartOffset else { + return nil + } + + let clampedLower = max(match.lowerBound, snippetStartOffset) + let clampedUpper = min(match.upperBound, snippetEndOffset) + + return FetchMatchRange( + lowerBound: clampedLower - snippetStartOffset, + upperBound: clampedUpper - snippetStartOffset + ) + } + + private static func fallbackSnippetText(from text: String, limit: Int) -> String { + let prefix = String(text.prefix(limit)).trimmingCharacters(in: .whitespacesAndNewlines) + guard text.count > limit else { + return prefix + } + + return prefix + "…" + } + + private static func truncatedSnippetText( + from text: String, + bounds: Range + ) -> String { + let hasLeadingOmission = bounds.lowerBound > text.startIndex + let hasTrailingOmission = bounds.upperBound < text.endIndex + let base = String(text[bounds]).trimmingCharacters(in: .whitespacesAndNewlines) + + var snippet = base + if hasLeadingOmission { + snippet = "…" + snippet + } + if hasTrailingOmission { + snippet += "…" + } + + return snippet + } +} + +private struct TermMatch { + let term: String + let lowerBound: Int + let upperBound: Int +} + +private struct SnippetWindow { + let range: Range + let startOffset: Int + let endOffset: Int + let distinctTermCount: Int + let totalMatchCount: Int + + func isBetter(than other: SnippetWindow) -> Bool { + if distinctTermCount != other.distinctTermCount { + return distinctTermCount > other.distinctTermCount + } + + if totalMatchCount != other.totalMatchCount { + return totalMatchCount > other.totalMatchCount + } + + if startOffset != other.startOffset { + return startOffset < other.startOffset + } + + return (endOffset - startOffset) < (other.endOffset - other.startOffset) + } +} + +private extension Character { + var 
isSentenceBoundaryPunctuation: Bool { + self == "." || self == "," || self == ":" || self == ";" || self == "!" || self == "?" + } +} diff --git a/Tests/FetchKitTests/FetchKitLibraryTests.swift b/Tests/FetchKitTests/FetchKitLibraryTests.swift index a85630c..b5344a2 100644 --- a/Tests/FetchKitTests/FetchKitLibraryTests.swift +++ b/Tests/FetchKitTests/FetchKitLibraryTests.swift @@ -118,6 +118,90 @@ struct FetchKitLibraryTests { #expect(results[0].snippet?.text.contains("bright") == true) } + @Test("FetchKitLibrary prefers title matches over body-only matches") + func fetchKitLibraryPrefersTitleMatches() async throws { + let library = FetchKitLibrary() + + try await library.addDocuments([ + FetchDocumentRecord( + id: "doc-title", + title: "Apple Guide", + body: "General orchard notes." + ), + FetchDocumentRecord( + id: "doc-body", + title: "Orchard Notes", + body: "This document talks about apple harvest timing." + ), + ]) + + let results = try await library.search("apple", fields: [.title, .body], limit: 5) + + #expect(results.count == 2) + #expect(results.map(\.document.id) == ["doc-title", "doc-body"]) + } + + @Test("FetchKitLibrary snippets highlight multiple query terms") + func fetchKitLibraryHighlightsMultipleQueryTerms() async throws { + let library = FetchKitLibrary() + + try await library.addDocument( + FetchDocumentRecord( + id: "doc-apple", + title: "Apple Guide", + body: "Apples stay bright and crisp through the fall harvest season." 
+ ) + ) + + let results = try await library.search("bright crisp", fields: [.body], limit: 1) + let snippet = try #require(results.first?.snippet) + + #expect(snippet.text.localizedCaseInsensitiveContains("bright")) + #expect(snippet.text.localizedCaseInsensitiveContains("crisp")) + #expect(snippet.matchRanges.count >= 2) + } + + @Test("FetchKitLibrary snippets show truncation markers when context is cropped") + func fetchKitLibrarySnippetShowsTruncationMarkers() async throws { + let library = FetchKitLibrary() + + try await library.addDocument( + FetchDocumentRecord( + id: "doc-apple", + title: "Apple Guide", + body: "Introductory orchard notes cover storage, pruning, rootstock selection, irrigation strategy, and pollination planning before the bright apple section becomes especially relevant for fall harvest planning and storage." + ) + ) + + let results = try await library.search("bright apple section", fields: [.body], limit: 1) + let snippet = try #require(results.first?.snippet) + + #expect(snippet.text.hasPrefix("…")) + #expect(snippet.text.hasSuffix("…")) + } + + @Test("FetchKitLibrary exact phrase queries outrank prefix-style body matches") + func fetchKitLibraryExactPhraseOutranksPrefixMatches() async throws { + let library = FetchKitLibrary() + + try await library.addDocuments([ + FetchDocumentRecord( + id: "doc-phrase", + title: "Harvest Guide", + body: "The exact bright apple phrase appears together here." + ), + FetchDocumentRecord( + id: "doc-prefix", + title: "Harvest Guide", + body: "Bright fruit notes mention apples nearby but not as an exact phrase." 
+ ), + ]) + + let results = try await library.search("\"bright apple\"", kind: .exactPhrase, fields: [.body], limit: 5) + + #expect(results.map(\.document.id) == ["doc-phrase"]) + } + @Test("FetchKitLibrary surfaces pending indexing changes when the index apply step fails") func fetchKitLibrarySurfacesPendingIndexingChanges() async throws { let store = RecordingFetchDocumentStore() diff --git a/Tests/FetchKitTests/SearchKitFetchIndexTests.swift b/Tests/FetchKitTests/SearchKitFetchIndexTests.swift index 57f0432..a246f0e 100644 --- a/Tests/FetchKitTests/SearchKitFetchIndexTests.swift +++ b/Tests/FetchKitTests/SearchKitFetchIndexTests.swift @@ -52,6 +52,133 @@ final class SearchKitFetchIndexTests: XCTestCase { XCTAssertEqual(bodyResults.first?.snippet?.text.contains("juicy"), true) } + func testSearchKitFetchIndexPrefersTitleMatchesOverBodyOnlyMatches() async throws { + let index = try SearchKitFetchIndex( + configuration: .init( + storage: .inMemory, + indexNamePrefix: "SearchKitFetchIndexTests-\(UUID().uuidString)" + ) + ) + + try await index.apply( + FetchIndexingChangeset([ + .upsert( + FetchIndexDocument( + id: "doc-title", + title: "Apple Guide", + body: "General orchard notes." + ) + ), + .upsert( + FetchIndexDocument( + id: "doc-body", + title: "Orchard Notes", + body: "This document covers apple harvest timing." 
+ ) + ), + ]) + ) + + let results = try await index.search( + FetchSearchQuery("apple", kind: .naturalLanguage, fields: [.title, .body], limit: 5) + ) + + XCTAssertEqual(results.map(\.document.id), ["doc-title", "doc-body"]) + } + + func testSearchKitFetchIndexHighlightsMultipleQueryTermsInSnippets() async throws { + let index = try SearchKitFetchIndex( + configuration: .init( + storage: .inMemory, + indexNamePrefix: "SearchKitFetchIndexTests-\(UUID().uuidString)" + ) + ) + + try await index.apply( + FetchIndexingChangeset([ + .upsert( + FetchIndexDocument( + id: "doc-apple", + title: "Apple Guide", + body: "Apples stay bright and crisp through the fall harvest season." + ) + ), + ]) + ) + + let results = try await index.search( + FetchSearchQuery("bright crisp", kind: .naturalLanguage, fields: [.body], limit: 1) + ) + + XCTAssertEqual(results.count, 1) + XCTAssertEqual(results.first?.snippet?.text.localizedCaseInsensitiveContains("bright"), true) + XCTAssertEqual(results.first?.snippet?.text.localizedCaseInsensitiveContains("crisp"), true) + XCTAssertGreaterThanOrEqual(results.first?.snippet?.matchRanges.count ?? 0, 2) + } + + func testSearchKitFetchIndexShowsSnippetTruncationMarkers() async throws { + let index = try SearchKitFetchIndex( + configuration: .init( + storage: .inMemory, + indexNamePrefix: "SearchKitFetchIndexTests-\(UUID().uuidString)" + ) + ) + + try await index.apply( + FetchIndexingChangeset([ + .upsert( + FetchIndexDocument( + id: "doc-apple", + title: "Apple Guide", + body: "Introductory orchard notes cover storage, pruning, rootstock selection, irrigation strategy, and pollination planning before the bright apple section becomes especially relevant for fall harvest planning and storage." 
+                    )
+                ),
+            ])
+        )
+
+        let results = try await index.search(
+            FetchSearchQuery("bright apple section", kind: .naturalLanguage, fields: [.body], limit: 1)
+        )
+
+        XCTAssertEqual(results.count, 1)
+        XCTAssertEqual(results.first?.snippet?.text.hasPrefix("…"), true)
+        XCTAssertEqual(results.first?.snippet?.text.hasSuffix("…"), true)
+    }
+
+    func testSearchKitFetchIndexExactPhraseQueriesPreferExactPhraseDocuments() async throws {
+        let index = try SearchKitFetchIndex(
+            configuration: .init(
+                storage: .inMemory,
+                indexNamePrefix: "SearchKitFetchIndexTests-\(UUID().uuidString)"
+            )
+        )
+
+        try await index.apply(
+            FetchIndexingChangeset([
+                .upsert(
+                    FetchIndexDocument(
+                        id: "doc-phrase",
+                        title: "Harvest Guide",
+                        body: "The exact bright apple phrase appears together here."
+                    )
+                ),
+                .upsert(
+                    FetchIndexDocument(
+                        id: "doc-prefix",
+                        title: "Harvest Guide",
+                        body: "Bright fruit notes mention apples nearby but not as an exact phrase."
+                    )
+                ),
+            ])
+        )
+
+        let results = try await index.search(
+            FetchSearchQuery("bright apple", kind: .exactPhrase, fields: [.body], limit: 5)
+        )
+
+        XCTAssertEqual(results.map(\.document.id), ["doc-phrase"])
+    }
+
     func testSearchKitFetchIndexRemovesDocumentsFromSearchResults() async throws {
         let index = try SearchKitFetchIndex(
             configuration: .init(
diff --git a/docs/maintainers/fetchkit-product-plan.md b/docs/maintainers/fetchkit-product-plan.md
index d751f8e..3749cb3 100644
--- a/docs/maintainers/fetchkit-product-plan.md
+++ b/docs/maintainers/fetchkit-product-plan.md
@@ -82,6 +82,7 @@ Current status:
 - the Search Kit crash isolation pass found that `SKIndex` teardown needed unretained adoption on create/open, and the direct opt-in Search Kit verification lane is green again under both `swift test` and `xcodebuild test`
 - that Search Kit verification lane is still local-only for now, while the repo defers any dedicated CI story for it
 - the persistent `FetchKitLibrary` construction path is now intentionally caller-shaped around one storage location, with an Application Support default plus a direct directory override, instead of asking app code to assemble separate Core Data and Search Kit URLs itself
+- the first refinement pass on conventional-search result quality is now in place: SearchKit scores are normalized per field, title hits get a modest weight bump, cross-field matches accumulate instead of collapsing to the single best field, and snippets now highlight multiple query terms instead of showing only the first term in a fixed-width window
 - the CI investigation on GitHub-hosted macOS found that the Core Data-backed store path could abort under Swift Testing with `Incorrect actor executor assumption`, even after global test parallelism was disabled
 - that investigation surfaced two store-shape fixes worth keeping regardless of the runner: the durable Core Data store should use a private-queue background context instead of `viewContext`, and it should use Core Data's async `perform` API directly instead of manually bridging context work through checked continuations
 - the Core Data-backed store coverage now lives on XCTest rather than Swift Testing so the package keeps the newer test surface where it is stable while reserving the older runner for framework-heavy Core Data verification
@@ -157,10 +158,9 @@ That pass landed:
 
 The next work is refinement, not first architecture:
 
-- improve conventional-search ranking behavior
-- improve snippet behavior and result presentation
 - keep the persistent `FetchKitLibrary` surface polished as real callers exercise it
 - decide later whether the local-only SearchKit verification lane deserves dedicated CI
+- decide whether the current ranking and snippet heuristics are already enough for ordinary callers or whether real corpora show a need for another refinement pass
 
 ## First Core Data Entity Shape
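The multi-term snippet behavior noted in the product plan can be sketched in a self-contained form. This is a simplified re-implementation of the windowing in `SearchSupport.swift`, not the shipped code. It tries one candidate window per term hit, snaps each window to word boundaries, and ranks windows by distinct terms, then total hits, then earliest start, then shortest span. Unlike the diff, it works on `Character` offsets, lowercases per character, and returns `nil` instead of a plain-prefix fallback when nothing matches. `TermHit`, `Window`, `Snippet`, and `buildSnippet(text:terms:)` are hypothetical names. As a deliberate choice in this sketch, highlight offsets are shifted by the width of any leading `…` so that every range indexes into the returned text.

```swift
// Hypothetical, simplified types: offsets are Character offsets, not String.Index.
struct TermHit {
    let term: String
    let lower: Int
    let upper: Int
}

struct Snippet {
    let text: String
    let highlights: [Range<Int>] // offsets into `text`, counting any leading "…"
}

struct Window {
    let start: Int
    let end: Int
    let distinctTerms: Int
    let totalHits: Int
    // Same tie-break order as SnippetWindow.isBetter(than:) in the diff.
    func isBetter(than other: Window) -> Bool {
        if distinctTerms != other.distinctTerms { return distinctTerms > other.distinctTerms }
        if totalHits != other.totalHits { return totalHits > other.totalHits }
        if start != other.start { return start < other.start }
        return (end - start) < (other.end - other.start)
    }
}

func isBoundary(_ character: Character) -> Bool {
    character.isWhitespace || ".,:;!?".contains(character)
}

// Every case-insensitive occurrence of every term, sorted by position.
func findHits(in chars: [Character], terms: [String]) -> [TermHit] {
    let lowered = chars.map { String($0).lowercased() }
    var hits: [TermHit] = []
    for term in terms {
        let needle = term.map { String($0).lowercased() }
        guard !needle.isEmpty, needle.count <= lowered.count else { continue }
        for start in 0...(lowered.count - needle.count) {
            if (0..<needle.count).allSatisfy({ lowered[start + $0] == needle[$0] }) {
                hits.append(TermHit(term: term, lower: start, upper: start + needle.count))
            }
        }
    }
    return hits.sorted { $0.lower < $1.lower }
}

// Walk back to a word start and forward to a word end (cf. adjustSnippetStart/End).
func snapStart(_ chars: [Character], _ offset: Int) -> Int {
    var current = max(0, offset)
    while current > 0, !isBoundary(chars[current - 1]) { current -= 1 }
    return current
}

func snapEnd(_ chars: [Character], _ offset: Int) -> Int {
    var current = min(chars.count, offset)
    while current < chars.count, !isBoundary(chars[current]) { current += 1 }
    return current
}

func buildSnippet(text: String, terms: [String], preferredLength: Int = 120, leadingContext: Int = 36) -> Snippet? {
    let chars = Array(text)
    let hits = findHits(in: chars, terms: terms)
    var best: Window?

    // One candidate window per hit; keep the best-covering one.
    for anchor in hits {
        let proposedStart = max(0, anchor.lower - leadingContext)
        let proposedEnd = min(chars.count, max(anchor.upper, proposedStart + preferredLength))
        let start = snapStart(chars, proposedStart)
        let end = snapEnd(chars, proposedEnd)
        let inside = hits.filter { $0.lower < end && $0.upper > start }
        let candidate = Window(start: start, end: end,
                               distinctTerms: Set(inside.map { $0.term }).count,
                               totalHits: inside.count)
        if best.map({ candidate.isBetter(than: $0) }) ?? true { best = candidate }
    }

    // No hits: this sketch returns nil (the diff falls back to a plain text prefix).
    guard var start = best?.start, var end = best?.end else { return nil }
    while start < end, chars[start].isWhitespace { start += 1 }
    while end > start, chars[end - 1].isWhitespace { end -= 1 }

    // Mark cropped context; shift highlights so they index into the final text.
    let leading = start > 0 ? "…" : ""
    let trailing = end < chars.count ? "…" : ""
    let shift = leading.count - start
    let highlights: [Range<Int>] = hits
        .filter { $0.lower < end && $0.upper > start }
        .map { (max($0.lower, start) + shift)..<(min($0.upper, end) + shift) }
    return Snippet(text: leading + String(chars[start..<end]) + trailing, highlights: highlights)
}

let body = "Introductory orchard notes cover storage, pruning, rootstock selection, irrigation strategy, and pollination planning before the bright apple section becomes especially relevant for fall harvest planning and storage."
if let snippet = buildSnippet(text: body, terms: ["bright", "apple", "section"]) {
    // …and pollination planning before the bright apple section becomes especially relevant for fall harvest planning and storage…
    print(snippet.text)
}
```

Ranking windows by distinct terms first is what lets a query like "bright apple section" land on the passage where all three words cluster, rather than on the first occurrence of "bright".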