Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add option to persist rule index #1880

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jbedard
Copy link
Contributor

@jbedard jbedard commented Aug 20, 2024

What type of PR is this?
Feature

What package or component does this PR mostly affect?
cmd/gazelle

What does this PR do? Why is it needed?

Add the ability to persist the RuleIndex across runs to allow updating only the specified directories when indexing is required.

Which issues(s) does this PR fix?

Ref: #1181

walk/walk.go Outdated Show resolved Hide resolved
@jbedard jbedard marked this pull request as ready for review August 20, 2024 21:39
fmeum pushed a commit that referenced this pull request Aug 21, 2024
**What type of PR is this?**
Other

**What package or component does this PR mostly affect?**
all

**What does this PR do? Why is it needed?**

Prefactor for #1880
@jbedard jbedard force-pushed the index-persist branch 3 times, most recently from 10667ba to 936cb51 Compare August 22, 2024 08:34
}
defer indexDbFile.Close()

indexDbContent, err := io.ReadAll(indexDbFile)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How large does this file get? I'm wondering whether json.NewDecoder could be more appropriate here. Doesn't matter for the first iteration though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this could get fairly large and I've been wondering if we'll want something more compact then json. But I figured if this is EXPERIMENTAL and we can break it whenever we want then plain json is a good place to start?

resolve/index.go Outdated
return err
}

indexDbFile, err := os.OpenFile(indexDbPath, os.O_RDWR|os.O_CREATE, 0666)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably use os.Create instead so that the file is truncated if it exists.

@@ -96,6 +99,7 @@ func (ucr *updateConfigurer) RegisterFlags(fs *flag.FlagSet, cmd string, c *conf
fs.StringVar(&ucr.memProfile, "memprofile", "", "write memory profile to `file`")
fs.Var(&gzflag.MultiFlag{Values: &ucr.knownImports}, "known_import", "import path for which external resolution is skipped (can specify multiple times)")
fs.StringVar(&ucr.repoConfigPath, "repo_config", "", "file where Gazelle should load repository configuration. Defaults to WORKSPACE.")
fs.StringVar(&uc.ruleIndexFile, "indexdb", "", "EXPERIMENTAL: file to cache the rule index")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestions for the naming of the cli arg? I used "indexdb" because this is caching the RuleIndex data 🤷
Maybe we should use a --experimental prefix?

@jbedard
Copy link
Contributor Author

jbedard commented Aug 22, 2024

Just to re-iterate a few points and limitations of this:

This is EXPERIMENTAL with no api guarantees. 99% chance there will be breaking changes. You should never commit this file to source-control.

When this EXPERIMENTAL feature is used:

  • the resolver.Imports/Embeds() calls for the entire workspace (the "rule index") is written to disk
  • when specifying dirs (default: .) on the command line only those directories are updated in the "rule index"
  • only the dirs specified on the command line will be traversed, therefore:
    • only those dirs will have Imports/Embeds() invoked
    • only those dirs will have Fix(), GenerateRules() invoked
    • only those dirs will have BUILDs updated, warnings generated etc.
    • only those dirs will have things like resolve warnings outputted

If resolver.Imports/Embeds() have side-effects then this won't work very well when they are cached and not invoked. Don't do that.

There is no cache invalidation here. It is entirely up to the user to pass the correct list of dirs that require reprocessing.

FUTURE: if those updated dirs changed what rules they generate and the Imports/Embeds() of those rules change it is possible that other rules outside dirs are now referencing something incorrect. I think if Imports/Embeds() of those dirs change we might want to re-run Fix + GenerateRules for all directories.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants