
Script to repartition index files #5


Closed
zooba opened this issue Apr 28, 2025 · 0 comments · Fixed by #73

Comments

zooba (Member) commented Apr 28, 2025

Index files (hosted on python.org) can be split up to minimise the initial download when clients access them. The files form a simple chain - the 'next' element of each file contains a URL (relative to the location of the containing file, so typically this will just be a filename), and if no matching version is available in the first file, the next one is loaded.
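The chain walk described above could be sketched roughly as follows. This is only an illustration: `iter_index_chain` and the `fetch` callable are hypothetical names, and each index is assumed to be a JSON object whose optional 'next' value is a URL relative to the file that contains it.

```python
from urllib.parse import urljoin


def iter_index_chain(url, fetch):
    """Yield (url, index) for every index file in the chain.

    `fetch` is a hypothetical callable that takes a URL and returns the
    parsed index as a dict. The 'next' value is resolved relative to the
    URL of the file it was found in.
    """
    seen = set()
    while url and url not in seen:  # guard against accidental loops
        seen.add(url)
        index = fetch(url)
        yield url, index
        nxt = index.get("next")
        url = urljoin(url, nxt) if nxt else None


# Usage with in-memory data standing in for real downloads.
pages = {
    "https://example.invalid/py/index.json": {"versions": [], "next": "older.json"},
    "https://example.invalid/py/older.json": {"versions": []},
}
chain_urls = [u for u, _ in iter_index_chain(
    "https://example.invalid/py/index.json", pages.__getitem__)]
```

Because 'next' is resolved against the containing file, a plain filename is enough when the whole chain lives in one directory.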

The client sorts each index file (by descending 'sort-version') before checking for matches (such that py install 3 will prefer 3.13 over 3.12 regardless of which appears first), but it does not sort across index files (such that if 3.13 appears only in the next index, 3.12 would be selected).
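A minimal sketch of that lookup behaviour, under stated assumptions: each index is a dict with a 'versions' list, 'sort-version' is a plain dotted string, and a specifier matches by version prefix. `version_key`, `matches` and `find_install` are hypothetical helpers, not the client's real matching logic, and pre-release suffixes are ignored.

```python
import re


def version_key(text):
    # Simplified assumption: take the leading integer of each dotted part,
    # so "3.13.1" -> (3, 13, 1). Pre-release suffixes are ignored.
    found = (re.match(r"\d+", part) for part in text.split("."))
    return tuple(int(m.group()) for m in found if m)


def matches(entry, spec):
    # Hypothetical prefix match: "3" matches 3.x.y, "3.12" matches 3.12.z.
    want = version_key(spec)
    return version_key(entry["sort-version"])[:len(want)] == want


def find_install(indexes, spec):
    """Return the first match, sorting within each index but not across them."""
    for index in indexes:
        ordered = sorted(index["versions"],
                         key=lambda e: version_key(e["sort-version"]),
                         reverse=True)
        for entry in ordered:
            if matches(entry, spec):
                return entry
    return None


one_file = [{"versions": [{"sort-version": "3.12.10"}, {"sort-version": "3.13.3"}]}]
split = [{"versions": [{"sort-version": "3.12.10"}]},
         {"versions": [{"sort-version": "3.13.3"}]}]

best_single = find_install(one_file, "3")   # 3.13.3, despite being listed second
best_split = find_install(split, "3")       # 3.12.10, the later file is never consulted
```

The second call shows why the split matters: once an earlier file contains any match, later files are not loaded at all.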

So the ideal is to have the current versions for each likely specifier in the first file, and to leave specifiers that are less likely, or specific enough to exclude every option in earlier files, to later files. Certainly py install 3 and py install 3.x (for non-EOL x) should always find the correct match in the first index.

Rather than making a complex database for handling this [1], we chain static files. But periodically we should merge and re-split these files for optimal performance. We should have a script in this repo to do that, ideally one that:

  • takes a URL/path as input
  • downloads the full chain of indexes and combines them all
  • re-sorts all available installs by sort-version and then some other sensible key [2]
  • divides all installs into three new files based on their sort-version
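The merge and re-sort steps in the list above might look something like this. It is a sketch only: `merge_indexes` and `version_key` are hypothetical names, the 'versions' key is an assumed layout, and 'company' and 'tag' are used as the secondary key purely because the footnote suggests them.

```python
import re


def version_key(text):
    # Simplified assumption: "3.13.1" -> (3, 13, 1); suffixes are ignored.
    found = (re.match(r"\d+", part) for part in text.split("."))
    return tuple(int(m.group()) for m in found if m)


def merge_indexes(indexes):
    """Combine all installs from the chain and re-sort them.

    Sorted by descending 'sort-version', then ascending 'company' and 'tag'.
    Two stable sorts are used because the keys run in opposite directions.
    """
    installs = [e for index in indexes for e in index.get("versions", [])]
    installs.sort(key=lambda e: (e.get("company", ""), e.get("tag", "")))
    installs.sort(key=lambda e: version_key(e["sort-version"]), reverse=True)
    return installs


chain = [
    {"versions": [{"sort-version": "3.12.10", "company": "PythonCore", "tag": "3.12-64"}]},
    {"versions": [{"sort-version": "3.13.3", "company": "PythonCore", "tag": "3.13-64"},
                  {"sort-version": "3.13.3", "company": "PythonCore", "tag": "3.13-32"}]},
]
merged_tags = [e["tag"] for e in merge_indexes(chain)]
```

Downloading the chain and splitting the result into new files are left out here; this only covers the combine and sort steps.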

A reasonable division would be:

  • all latest x.y.z versions
  • all non-latest x.y versions since a reasonable point (currently 3.10)
  • all the rest ("legacy")

(Worth noting that this isn't the breakdown we have right now - we only have the "since 3.10" and "all the rest" indexes. Provided the URL of the first index doesn't change, we can always insert more later.)
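One possible reading of the three-way division, as a hedged sketch. It assumes "latest x.y.z" means every entry carrying the highest sort-version within its x.y series (including series older than 3.10), and that output files are linked through 'next' by filename. `partition`, `build_chain` and the file names are all hypothetical.

```python
import re


def version_key(text):
    # Simplified assumption: "3.13.1" -> (3, 13, 1); suffixes are ignored.
    found = (re.match(r"\d+", part) for part in text.split("."))
    return tuple(int(m.group()) for m in found if m)


def partition(installs, since=(3, 10)):
    """Split installs into (latest, recent, legacy) lists, preserving order."""
    latest = {}
    for e in installs:
        v = version_key(e["sort-version"])
        latest[v[:2]] = max(latest.get(v[:2], v), v)
    current, recent, legacy = [], [], []
    for e in installs:
        v = version_key(e["sort-version"])
        if v == latest[v[:2]]:
            current.append(e)   # newest release of its x.y series
        elif v[:2] >= since:
            recent.append(e)    # older release of a series since the cutoff
        else:
            legacy.append(e)    # everything else
    return current, recent, legacy


def build_chain(parts, names):
    """Map each file name to an index dict, linking files through 'next'."""
    files = {}
    for i, (name, part) in enumerate(zip(names, parts)):
        index = {"versions": part}
        if i + 1 < len(names):
            index["next"] = names[i + 1]  # relative URL: just the filename
        files[name] = index
    return files


installs = [{"sort-version": v} for v in
            ["3.13.3", "3.13.2", "3.12.10", "3.9.13", "3.9.12"]]
parts = partition(installs)
# The file names below are placeholders, not the real index names.
files = build_chain(parts, ["index.json", "index-recent.json", "index-legacy.json"])
```

Keeping the first file name fixed matches the note above that more files can be inserted into the chain later without changing the entry-point URL.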

Footnotes

  1. In other words, rather than standing up expensive infrastructure...

  2. Probably 'company' and 'tag', or 'id'.

zooba added the enhancement (New feature or request) label Apr 28, 2025
zooba self-assigned this May 6, 2025
zooba added a commit to zooba/pymanager-python that referenced this issue May 6, 2025
zooba closed this as completed in #73 May 7, 2025
zooba added a commit that referenced this issue May 7, 2025
Fixes #5
This script allows ingesting one or more indexes, sorting them, and writing out the entries into one or more new index files according to a set of rules.