Hello,
I am trying to run the LLM quantization example from:
https://github.com/intel/neural-compressor/tree/master/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm
I use the following command:
OMP_NUM_THREADS=32 python run_clm_no_trainer.py --model facebook/opt-1.3b --quantize --sq --alpha 0.5 --ipex --output_dir "saved_results" --int8_bf16_mixed
However, in htop I see that only a single thread is being used, even if I set torch.set_num_threads(32). It is extremely slow, making SmoothQuant unusable in my case.
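For reference, this is roughly what I have been trying at the top of the script to force multi-threading; the MKL_NUM_THREADS variable and the inter-op thread count are just values I am experimenting with, not something taken from the example:

```python
import os

# Hint the OpenMP / MKL runtimes before torch is imported
# (MKL_NUM_THREADS is an extra experiment on top of OMP_NUM_THREADS).
os.environ.setdefault("OMP_NUM_THREADS", "32")
os.environ.setdefault("MKL_NUM_THREADS", "32")

import torch

# Intra-op parallelism -- this is what I expected to show up in htop.
torch.set_num_threads(32)
# Inter-op parallelism; 8 is an arbitrary value I am trying out.
torch.set_num_interop_threads(8)

print("intra-op threads:", torch.get_num_threads())
print("inter-op threads:", torch.get_num_interop_threads())
```

Both print statements report the expected counts, yet the quantization run still appears single-threaded.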
I have a system with an Intel® Xeon® Gold 5218 processor.
Am I missing something? Thanks!