Thanks a lot for designing, implementing and making ISPAQ available here on GitHub. It has proven very useful for computing and managing data quality metrics for a database that I'm working on!
I would like to report one point regarding ISPAQ's resource usage. Because many of ISPAQ's operations run on a single core, I currently start several ISPAQ runs in parallel from the terminal. However, while computing some of the basic metrics, ISPAQ uses all available cores at once, probably inside some Python library (NumPy?). This slows down the system for all users (and then I get an understandable complaint from the system administrator...).
I still don't fully understand which function calls are parallelized, since I didn't see any explicit parallelization in ISPAQ. For now, though, I was able to limit each run to one thread by setting the following environment variables at the very start of ispaq.py:
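```python
import os

# These must be set before numpy (or anything that imports it) is loaded,
# because the threading runtimes read them at load time.
os.environ["OMP_NUM_THREADS"] = "1"         # OpenMP
os.environ["OPENBLAS_NUM_THREADS"] = "1"    # OpenBLAS
os.environ["MKL_NUM_THREADS"] = "1"         # Intel MKL
os.environ["NUMEXPR_NUM_THREADS"] = "1"     # numexpr
os.environ["VECLIB_MAXIMUM_THREADS"] = "1"  # Apple Accelerate/vecLib
```

I hope this helps others who run ISPAQ on a system shared with other users. Maybe there could be a command-line option to limit the number of cores ISPAQ uses?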
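Such an option could work by reading the thread count before anything imports NumPy, then exporting the same environment variables as in the workaround. The following is a minimal sketch under that assumption. The `--threads` flag and the helpers `limit_threads` and `parse_thread_option` are hypothetical and are not part of ISPAQ. ISPAQ's real argument parser is not shown here, so `parse_known_args` is used to pass every other argument through unchanged.

```python
import argparse
import os
import sys

# Environment variables read by common threaded numeric back ends
# (OpenMP, OpenBLAS, Intel MKL, numexpr, Apple Accelerate/vecLib).
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
    "VECLIB_MAXIMUM_THREADS",
)


def limit_threads(n_threads, environ=None):
    """Set every thread-count variable to n_threads (hypothetical helper).

    Only effective if called before numpy/scipy are imported, because the
    BLAS/OpenMP runtimes read these variables when they are loaded.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    env = os.environ if environ is None else environ
    for name in THREAD_ENV_VARS:
        env[name] = str(n_threads)
    return env


def parse_thread_option(argv):
    """Extract a hypothetical --threads option, returning the rest untouched."""
    pre_parser = argparse.ArgumentParser(add_help=False)
    pre_parser.add_argument("--threads", type=int, default=None)
    args, remaining = pre_parser.parse_known_args(argv)
    return args.threads, remaining


if __name__ == "__main__":
    n_threads, remaining_args = parse_thread_option(sys.argv[1:])
    if n_threads is not None:
        limit_threads(n_threads)
    # Only after this point import numpy/scipy and the rest of the program,
    # then hand remaining_args to the program's main argument parser.
```

With this design, a run like `python ispaq.py --threads 1 ...` would behave like the manual workaround, while omitting `--threads` leaves the libraries' default threading unchanged.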