Skip to content

[clang] SourceManager: Cache offsets for LastFileIDLookup to speed up getFileID #146782

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 4, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 3 additions & 2 deletions clang/include/clang/Basic/SourceManager.h
Original file line number Diff line number Diff line change
Expand Up @@ -767,6 +767,8 @@ class SourceManager : public RefCountedBase<SourceManager> {
/// LastFileIDLookup records the last FileID looked up or created, because it
/// is very common to look up many tokens from the same file.
mutable FileID LastFileIDLookup;
mutable SourceLocation::UIntTy LastLookupStartOffset;
mutable SourceLocation::UIntTy LastLookupEndOffset; // exclude

/// Holds information for \#line directives.
///
Expand Down Expand Up @@ -1901,9 +1903,8 @@ class SourceManager : public RefCountedBase<SourceManager> {

FileID getFileID(SourceLocation::UIntTy SLocOffset) const {
// If our one-entry cache covers this offset, just return it.
if (isOffsetInFileID(LastFileIDLookup, SLocOffset))
if (SLocOffset >= LastLookupStartOffset && SLocOffset < LastLookupEndOffset)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd somewhat prefer that isOffsetInFileID be taught this trick instead of here. WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve reverted the changes to isOffsetInFileID in this patch to maximize the overall performance gain.

I have some ideas for improving isOffsetInFileID and might explore them later -- though it’s not clear whether isOffsetInFileID will still be a hot method after this change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ow! That is sad. I guess 1927 says that isOffsetInFileID is a very hot method, so that tells me that this check succeeds comparatively rarely in THAT function. Makes me wonder how valuable it is HERE, though I guess with all the work slow is doing, it matters...

return LastFileIDLookup;

return getFileIDSlow(SLocOffset);
}

Expand Down
27 changes: 18 additions & 9 deletions clang/lib/Basic/SourceManager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -334,6 +334,7 @@ void SourceManager::clearIDTables() {
LastLineNoFileIDQuery = FileID();
LastLineNoContentCache = nullptr;
LastFileIDLookup = FileID();
LastLookupStartOffset = LastLookupEndOffset = 0;

IncludedLocMap.clear();
if (LineTable)
Expand Down Expand Up @@ -639,9 +640,11 @@ FileID SourceManager::createFileIDImpl(ContentCache &File, StringRef Filename,
LocalSLocEntryTable.push_back(
SLocEntry::get(NextLocalOffset,
FileInfo::get(IncludePos, File, FileCharacter, Filename)));
LastLookupStartOffset = NextLocalOffset;
// We do a +1 here because we want a SourceLocation that means "the end of the
// file", e.g. for the "no newline at the end of the file" diagnostic.
NextLocalOffset += FileSize + 1;
LastLookupEndOffset = NextLocalOffset;
updateSlocUsageStats();

// Set LastFileIDLookup to the newly created file. The next getFileID call is
Expand Down Expand Up @@ -811,8 +814,9 @@ FileID SourceManager::getFileIDSlow(SourceLocation::UIntTy SLocOffset) const {
/// loaded one.
FileID SourceManager::getFileIDLocal(SourceLocation::UIntTy SLocOffset) const {
assert(SLocOffset < NextLocalOffset && "Bad function choice");
assert(SLocOffset >= LocalSLocEntryTable[0].getOffset() &&
assert(SLocOffset >= LocalSLocEntryTable[0].getOffset() && SLocOffset > 0 &&
"Invalid SLocOffset");
assert(LastFileIDLookup.ID >= 0 && "Only cache local file sloc entry");

// After the first and second level caches, I see two common sorts of
// behavior: 1) a lot of searched FileID's are "near" the cached file
Expand All @@ -832,13 +836,11 @@ FileID SourceManager::getFileIDLocal(SourceLocation::UIntTy SLocOffset) const {
unsigned LessIndex = 0;
// upper bound of the search range.
unsigned GreaterIndex = LocalSLocEntryTable.size();
if (LastFileIDLookup.ID >= 0) {
// Use the LastFileIDLookup to prune the search space.
if (LocalSLocEntryTable[LastFileIDLookup.ID].getOffset() < SLocOffset)
LessIndex = LastFileIDLookup.ID;
else
GreaterIndex = LastFileIDLookup.ID;
}
// Use the LastFileIDLookup to prune the search space.
if (LastLookupStartOffset < SLocOffset)
LessIndex = LastFileIDLookup.ID;
else
GreaterIndex = LastFileIDLookup.ID;

// Find the FileID that contains this.
unsigned NumProbes = 0;
Expand All @@ -849,7 +851,12 @@ FileID SourceManager::getFileIDLocal(SourceLocation::UIntTy SLocOffset) const {
FileID Res = FileID::get(int(GreaterIndex));
// Remember it. We have good locality across FileID lookups.
LastFileIDLookup = Res;
NumLinearScans += NumProbes+1;
LastLookupStartOffset = LocalSLocEntryTable[GreaterIndex].getOffset();
LastLookupEndOffset =
GreaterIndex + 1 >= LocalSLocEntryTable.size()
? NextLocalOffset
: LocalSLocEntryTable[GreaterIndex + 1].getOffset();
NumLinearScans += NumProbes + 1;
return Res;
}
if (++NumProbes == 8)
Expand All @@ -873,6 +880,8 @@ FileID SourceManager::getFileIDLocal(SourceLocation::UIntTy SLocOffset) const {
// At this point, LessIndex is the index of the *first element greater than*
// SLocOffset. The element we are actually looking for is the one immediately
// before it.
LastLookupStartOffset = LocalSLocEntryTable[LessIndex - 1].getOffset();
LastLookupEndOffset = LocalSLocEntryTable[LessIndex].getOffset();
return LastFileIDLookup = FileID::get(LessIndex - 1);
}

Expand Down