Skip to content

[clang] SourceManager: Cache offsets for LastFileIDLookup to speed up getFileID #146782

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 4, 2025

Conversation

hokein
Copy link
Collaborator

@hokein hokein commented Jul 2, 2025

getFileID is a hot method. By caching the offset range in LastFileIDLookup, we can more quickly check whether a given offset falls within it, avoiding calling isOffsetInFileID.

https://llvm-compile-time-tracker.com/compare.php?from=0588e8188c647460b641b09467fe6b13a8d510d5&to=64843a500f0191b79a8109da9acd7e80d961c7a3&stat=instructions:u

@hokein hokein requested review from AaronBallman and erichkeane July 2, 2025 21:12
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Jul 2, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 2, 2025

@llvm/pr-subscribers-clang

Author: Haojian Wu (hokein)

Changes

getFileID is a hot method. By caching the offset range in LastFileIDLookup, we can more quickly check whether a given offset falls within it, avoiding calling isOffsetInFileID.

https://llvm-compile-time-tracker.com/compare.php?from=0588e8188c647460b641b09467fe6b13a8d510d5&to=64843a500f0191b79a8109da9acd7e80d961c7a3&stat=instructions:u


Full diff: https://github.com/llvm/llvm-project/pull/146782.diff

2 Files Affected:

  • (modified) clang/include/clang/Basic/SourceManager.h (+3-2)
  • (modified) clang/lib/Basic/SourceManager.cpp (+16-8)
diff --git a/clang/include/clang/Basic/SourceManager.h b/clang/include/clang/Basic/SourceManager.h
index eefd4885534c8..a90cc70825ffd 100644
--- a/clang/include/clang/Basic/SourceManager.h
+++ b/clang/include/clang/Basic/SourceManager.h
@@ -767,6 +767,8 @@ class SourceManager : public RefCountedBase<SourceManager> {
   /// LastFileIDLookup records the last FileID looked up or created, because it
   /// is very common to look up many tokens from the same file.
   mutable FileID LastFileIDLookup;
+  mutable SourceLocation::UIntTy LastLookupStartOffset;
+  mutable SourceLocation::UIntTy LastLookupEndOffset; // exclude
 
   /// Holds information for \#line directives.
   ///
@@ -1901,9 +1903,8 @@ class SourceManager : public RefCountedBase<SourceManager> {
 
   FileID getFileID(SourceLocation::UIntTy SLocOffset) const {
     // If our one-entry cache covers this offset, just return it.
-    if (isOffsetInFileID(LastFileIDLookup, SLocOffset))
+    if (SLocOffset >= LastLookupStartOffset && SLocOffset < LastLookupEndOffset)
       return LastFileIDLookup;
-
     return getFileIDSlow(SLocOffset);
   }
 
diff --git a/clang/lib/Basic/SourceManager.cpp b/clang/lib/Basic/SourceManager.cpp
index 82096421f8579..8c1d9662d4579 100644
--- a/clang/lib/Basic/SourceManager.cpp
+++ b/clang/lib/Basic/SourceManager.cpp
@@ -334,6 +334,7 @@ void SourceManager::clearIDTables() {
   LastLineNoFileIDQuery = FileID();
   LastLineNoContentCache = nullptr;
   LastFileIDLookup = FileID();
+  LastLookupStartOffset = LastLookupEndOffset = 0;
 
   IncludedLocMap.clear();
   if (LineTable)
@@ -639,9 +640,11 @@ FileID SourceManager::createFileIDImpl(ContentCache &File, StringRef Filename,
   LocalSLocEntryTable.push_back(
       SLocEntry::get(NextLocalOffset,
                      FileInfo::get(IncludePos, File, FileCharacter, Filename)));
+  LastLookupStartOffset = NextLocalOffset;
   // We do a +1 here because we want a SourceLocation that means "the end of the
   // file", e.g. for the "no newline at the end of the file" diagnostic.
   NextLocalOffset += FileSize + 1;
+  LastLookupEndOffset = NextLocalOffset;
   updateSlocUsageStats();
 
   // Set LastFileIDLookup to the newly created file.  The next getFileID call is
@@ -832,13 +835,11 @@ FileID SourceManager::getFileIDLocal(SourceLocation::UIntTy SLocOffset) const {
   unsigned LessIndex = 0;
   // upper bound of the search range.
   unsigned GreaterIndex = LocalSLocEntryTable.size();
-  if (LastFileIDLookup.ID >= 0) {
-    // Use the LastFileIDLookup to prune the search space.
-    if (LocalSLocEntryTable[LastFileIDLookup.ID].getOffset() < SLocOffset)
-      LessIndex = LastFileIDLookup.ID;
-    else
-      GreaterIndex = LastFileIDLookup.ID;
-  }
+  // Use the LastFileIDLookup to prune the search space.
+  if (LastLookupStartOffset < SLocOffset)
+    LessIndex = LastFileIDLookup.ID;
+  else
+    GreaterIndex = LastFileIDLookup.ID;
 
   // Find the FileID that contains this.
   unsigned NumProbes = 0;
@@ -849,7 +850,12 @@ FileID SourceManager::getFileIDLocal(SourceLocation::UIntTy SLocOffset) const {
       FileID Res = FileID::get(int(GreaterIndex));
       // Remember it.  We have good locality across FileID lookups.
       LastFileIDLookup = Res;
-      NumLinearScans += NumProbes+1;
+      LastLookupStartOffset = LocalSLocEntryTable[GreaterIndex].getOffset();
+      LastLookupEndOffset =
+          GreaterIndex + 1 >= LocalSLocEntryTable.size()
+              ? NextLocalOffset
+              : LocalSLocEntryTable[GreaterIndex + 1].getOffset();
+      NumLinearScans += NumProbes + 1;
       return Res;
     }
     if (++NumProbes == 8)
@@ -873,6 +879,8 @@ FileID SourceManager::getFileIDLocal(SourceLocation::UIntTy SLocOffset) const {
   // At this point, LessIndex is the index of the *first element greater than*
   // SLocOffset. The element we are actually looking for is the one immediately
   // before it.
+  LastLookupStartOffset = LocalSLocEntryTable[LessIndex - 1].getOffset();
+  LastLookupEndOffset = LocalSLocEntryTable[LessIndex].getOffset();
   return LastFileIDLookup = FileID::get(LessIndex - 1);
 }
 

@@ -1901,9 +1903,8 @@ class SourceManager : public RefCountedBase<SourceManager> {

FileID getFileID(SourceLocation::UIntTy SLocOffset) const {
// If our one-entry cache covers this offset, just return it.
if (isOffsetInFileID(LastFileIDLookup, SLocOffset))
if (SLocOffset >= LastLookupStartOffset && SLocOffset < LastLookupEndOffset)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd somewhat prefer that isOffsetInFileID be taught this trick instead of here. WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve reverted the changes to isOffsetInFileID in this patch to maximize the overall performance gain.

I have some ideas for improving isOffsetInFileID and might explore them later -- though it’s not clear whether isOffsetInFileID will still be a hot method after this change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ow! That is sad. I guess 1927 says that isOffsetInFileID is a very hot method, so that tells me that this check succeeds comparatively rarely in THAT function. Makes me wonder how valuable it is HERE, though I guess with all the work slow is doing, it matters...

@hokein hokein merged commit e2510b1 into llvm:main Jul 4, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants