perf: reduce import time - draft #8831
Draft
+139
−51
Motivation
Motivated by recent issues (#8649, #8650, and #8704), I investigated the import times of Haystack.
I used some raw scripts (attached in this draft PR) and commands like
python -X importtime -c "import haystack"
to analyze the cumulative import times of packages.
Findings
The LazyImport context manager behaves like a DeferImportError context manager. If a package you import is available, it gets imported immediately. If not, the import error is caught and only raised later when calling .check(). Maybe this is something known, but it was surprising to me.
torch and transformers.
An exploratory solution
I explored low-effort ways to improve this:
__init__.py of packages actually lazy
utils package, that triggered the torch import
Results
Tested on Ubuntu 22.04, Python 3.10.12, averaged over 30 runs.
I focus on User time and System time, as they capture actual CPU time, irrespective of parallelism and system load (reference).
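As a rough illustration of the measurement described above (not the script attached to this PR), the sketch below averages user and system CPU time of a child interpreter over several runs. The helper name `cpu_time_of_import` is hypothetical, and `json` is only a stand-in module so the example runs without Haystack installed. It relies on `os.times()` reporting child CPU time, which works on Unix-like systems such as the Ubuntu setup mentioned here.

```python
import os
import subprocess
import sys


def cpu_time_of_import(module: str, runs: int = 5) -> tuple[float, float]:
    """Return mean (user, system) CPU seconds of `python -c "import <module>"`.

    Hypothetical helper for illustration; it is not part of Haystack.
    """
    user_total = 0.0
    sys_total = 0.0
    for _ in range(runs):
        before = os.times()
        # Each run uses a fresh interpreter, so nothing is cached in sys.modules.
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        after = os.times()
        # children_* fields accumulate CPU time of waited-for child processes.
        user_total += after.children_user - before.children_user
        sys_total += after.children_system - before.children_system
    return user_total / runs, sys_total / runs


if __name__ == "__main__":
    # "json" is a stand-in; swap in "haystack" if it is installed.
    user, system = cpu_time_of_import("json", runs=5)
    print(f"user={user:.3f}s sys={system:.3f}s")
```

Summing the two values gives the CPU time spent on the import, independent of wall-clock effects such as system load.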
Dependencies
pip install haystack-ai
Next Steps
I believe that these results can encourage discussion.
LazyImport or transform it into a truly lazy importer (difficult if we want to keep its API unchanged)?
__init__.py of packages?
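To make the `__init__.py` question more concrete, here is a minimal sketch of one way a package `__init__.py` can defer submodule imports, using a module-level `__getattr__` (PEP 562). This is not necessarily what this PR does. The package `demo_pkg`, its `heavy` submodule, and the `_LAZY` mapping are hypothetical names created in a temporary directory purely for the demo.

```python
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical package layout, written to a temp dir only for this demo.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "heavy.py").write_text("VALUE = 42\n")
(pkg / "__init__.py").write_text(textwrap.dedent('''
    import importlib

    _LAZY = {"heavy": "demo_pkg.heavy"}  # public name -> submodule path

    def __getattr__(name):
        # PEP 562: called only when normal attribute lookup fails.
        if name in _LAZY:
            module = importlib.import_module(_LAZY[name])
            globals()[name] = module  # cache so later lookups skip __getattr__
            return module
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''))

sys.path.insert(0, str(root))
import demo_pkg  # cheap: the submodule is not imported yet

loaded_before = "demo_pkg.heavy" in sys.modules
value = demo_pkg.heavy.VALUE  # first access triggers the real import
loaded_after = "demo_pkg.heavy" in sys.modules
print(loaded_before, value, loaded_after)  # False 42 True
```

With this pattern, the cost of a heavy submodule is paid on first attribute access instead of at `import demo_pkg` time. The trade-off is that import errors also surface later, at the point of first use.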