This repository was archived by the owner on Jun 3, 2025. It is now read-only.

Commit d82e3bd

Authored by markurtz, eldarkurtic, bfineran, anmarques, and KSGulin
[cherry-pick] Release 0.12.1 fixes (#744)
* Avoid numerically unstable log (#694)
* Fix QAT->Quant conversion of repeated Gemm layers with no activation QDQ (#698)
* Revert rn residual quant (#691)
  * Revert ResNet definition to not quantize input to add op in residual branches.
  * Correct typo.
  Co-authored-by: Mark Kurtz <[email protected]>
* Fix: add linebreak before 'Supplied' for better readability (#701)
* Bump notebook in /research/information_retrieval/doc2query (#679)
  Bumps [notebook](http://jupyter.org) from 6.4.1 to 6.4.10.
  updated-dependencies:
  - dependency-name: notebook
    dependency-type: direct:production
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  Co-authored-by: Mark Kurtz <[email protected]>
  Co-authored-by: Michael Goin <[email protected]>
* Added integration to masked_language_modeling training command (#707)
* Switch off fp16 on QAT start (#703)
  * Switch off fp16 on QAT start
  * Address review comments
  * Disable fp16 when torch version is lower than `1.9`
* Fix transformer prediction step (#716)
  * Fix for prediction step when teacher model has more inputs than student.
  * Updated signature of prediction_step method.
  * Style and quality fixes.
* Bump main to 0.13 (#696)
  Co-authored-by: dhuang <[email protected]>
* Fix: default Python log calls to debug level (#719)
* Feature/integrations (#688)
  * Added tutorials to root readme split by domain
  * Readme update
  * Edited text/structure
  * Grammar edits
* Fix QATWrapper not properly overwriting qconfig properties for symmetric activations (#724)
* Re-add fix for symmetric zero points for uint8 quantization (#604) (#725)
* Fix 'self' and 'disable' not working for transformers distillation (#731)
* Click refactor for SparseML-PyTorch integration with Image Classification models (#711)
  * Click refactor for SparseML-PyTorch integration
* Click refactor for `Pruning Sensitivity` analysis (#714)
  * Click refactor for SparseML-PyTorch pr_sensitivity analysis integration
  * Review comments from @KSGulin
* Click refactor for SparseML-PyTorch `lr-analysis` integration (#713)
  * Click refactor for SparseML-PyTorch lr-analysis integration
  * Review comments from @KSGulin
* Click refactor for SparseML PyTorch `export` integration (#712)
  * Click refactor for SparseML-PyTorch export integration
  * Review comments from @KSGulin
  * Addressed all review comments from @bfineran, @dbogunowicz and @KSGulin
  * Regenerate and update the train-cli docstring due to changes in a few cli-args
  * `nm_argparser.py` not needed anymore
  * Removed `nm_argparser.py` from init
  * Removed all CLI arg aliases and updated docstrings accordingly
* [Fix] Follow-up fix for #731 (Fix 'self' and 'disable' not working for transformers distillation) (#737)
  * Initial commit
  * Added more files and fixed quality
  * Update trainer.py
* Added flag to exclude quantization of embedding activations. (#738)
  * Added flag to exclude quantization of embedding activations.
  * Updated testing to contemplate quantize_embedding_activations flag.
  * Updated debugging
  * Revert "Updated debugging" (reverts commit 449703d)
  * Corrected order of arguments to pass assertion.
* Update src/sparseml/version.py

Co-authored-by: Eldar Kurtic <[email protected]>
Co-authored-by: Benjamin Fineran <[email protected]>
Co-authored-by: Alexandre Marques <[email protected]>
Co-authored-by: Konstantin Gulin <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: dhuang <[email protected]>
Co-authored-by: Ricky Costa <[email protected]>
Co-authored-by: dbogunowicz <[email protected]>
1 parent 81f5f33 commit d82e3bd

File tree

25 files changed: +1906 −2211 lines

integrations/huggingface-transformers/README.md

Lines changed: 138 additions & 68 deletions
@@ -16,15 +16,11 @@ limitations under the License.
 
 # SparseML Hugging Face Transformers Integration
 
-This directory combines the SparseML recipe-driven approach with the
-[huggingface/transformers](https://github.com/huggingface/transformers) repository.
-By integrating the robust training flows in the `transformers` repository with the SparseML code base,
-we enable model sparsification techniques on popular NLP models such as [BERT](https://arxiv.org/abs/1810.04805)
-creating smaller and faster deployable versions.
-The techniques include, but are not limted to:
+This directory combines the SparseML recipe-driven approach with the [huggingface/transformers](https://github.com/huggingface/transformers) repository. By integrating the robust training flows in the `transformers` repository with the SparseML code base, we enable model sparsification techniques on popular NLP models such as [BERT](https://arxiv.org/abs/1810.04805), creating smaller and faster deployable versions. The techniques include, but are not limited to:
+
 - Pruning
 - Quantization
-- Pruning and Quantization
+- Knowledge Distillation
 - Sparse Transfer Learning
 
 ## Highlights
@@ -34,87 +30,161 @@ Coming soon!
 ## Tutorials
 
 - [Sparsifying BERT Models Using Recipes](https://github.com/neuralmagic/sparseml/blob/main/integrations/huggingface-transformers/tutorials/sparsifying_bert_using_recipes.md)
+- [Sparse Transfer Learning With BERT](https://github.com/neuralmagic/sparseml/blob/main/integrations/huggingface-transformers/tutorials/bert_sparse_transfer_learning.md)
 
 ## Installation
 
-To begin, run the following command in the root directory of this integration (`cd integrations/huggingface-transformers`):
 ```bash
-bash setup_integration.sh
+pip install sparseml[torch]
 ```
 
-The `setup_integration.sh` file will clone the transformers repository with the SparseML integration as a subfolder.
-After the repo has successfully cloned, transformers and datasets will be installed along with any necessary dependencies.
-
 It is recommended to run Python 3.8, as some of the scripts within the transformers repository require it.
 
-## Quick Tour
+**Note**: Transformers will not immediately install with this command. Instead, a sparsification-compatible version of Transformers will install on the first invocation of the Transformers code in SparseML.
+
+## SparseML CLI
+
+The SparseML installation provides a CLI for sparsifying your models for a specific task; appending the `--help` argument will provide a full list of options for training in SparseML:
+
+```bash
+sparseml.transformers.[task] --help
+```
+
+e.g. `sparseml.transformers.question_answering --help`
+
+output:
+
+```bash
+--output_dir: The directory in which to store the outputs from the training runs such as results, the trained model, and supporting files.
+--model_name_or_path: The path or SparseZoo stub for the model to load for training.
+--recipe: The path or SparseZoo stub for the recipe to use to apply sparsification algorithms or sparse transfer learning to the model.
+--distill_teacher: The path or SparseZoo stub for the teacher to load for distillation.
+--dataset_name or --task_name: The dataset or task to load for training.
+```
+
+## Sparse Transfer Learning | Question Answering Example
 
-Recipes encode the instructions and hyperparameters for sparsifying a model using modifiers to the training process.
-The modifiers can range from pruning and quantization to learning rate and weight decay.
-When appropriately combined, it becomes possible to create highly sparse and accurate models.
+### Dense Teacher Creation
 
-This integration adds a `--recipe` argument to the [`run_qa.py`](https://github.com/neuralmagic/transformers/blob/master/examples/pytorch/question-answering/run_qa.py) script among others.
-The argument loads an appropriate recipe while preserving the rest of the training pipeline.
-Popular recipes used with this argument are found in the [`recipes` folder](./recipes).
-Distillation arguments to support student-teacher distillation are additionally added to the scripts as they help improve the recovery while sparsifying.
-Otherwise, all other arguments and functionality remain the same as the original repository.
+To enable distillation, you will first create a dense teacher model that the sparse model will learn from while transferring. **If you already have a Transformers-compatible model, you can use it as the dense teacher in place of training one from scratch.** The following command will use the dense BERT base model from the SparseZoo and fine-tune it on the SQuAD dataset, resulting in a model that achieves 88.5% F1 on the validation set:
 
-For example, pruning and quantizing a model on the SQuAD dataset can be done by running the following command from within the root of this integration's folder:
 ```bash
-python transformers/examples/pytorch/question-answering/run_qa.py \
-  --model_name_or_path bert-base-uncased \
-  --dataset_name squad \
-  --do_train \
-  --do_eval \
-  --evaluation_strategy epoch \
-  --per_device_train_batch_size 16 \
-  --learning_rate 5e-5 \
-  --max_seq_length 384 \
-  --doc_stride 128 \
-  --output_dir MODELS_DIR/bert-base-12layers_prune80 \
-  --cache_dir cache \
-  --preprocessing_num_workers 6 \
-  --fp16 \
-  --num_train_epochs 30 \
-  --recipe recipes/bert-base-12layers_prune80.md \
-  --onnx_export_path MODELS_DIR/bert-base-12layers_prune80/onnx \
-  --save_strategy epoch \
-  --save_total_limit 2
+sparseml.transformers.question_answering \
+  --output_dir models/teacher \
+  --model_name_or_path zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none \
+  --recipe zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none?recipe_type=transfer-question_answering \
+  --dataset_name squad \
+  --per_device_train_batch_size 16 \
+  --per_device_eval_batch_size 24 \
+  --preprocessing_num_workers 6 \
+  --do_train \
+  --do_eval \
+  --evaluation_strategy epoch \
+  --fp16 \
+  --seed 42 \
+  --save_strategy epoch \
+  --save_total_limit 1
 ```
 
-### Structure
+With the dense teacher trained to convergence, you can begin sparse transfer learning with distillation using a recipe. The dense teacher will distill knowledge into the sparse architecture, increasing its performance while ideally converging to the dense solution's accuracy.
 
-The following table lays out the root-level files and folders along with a description for each.
+💡**PRO TIP**💡: Recipes encode the instructions and hyperparameters for sparsifying a model using modifiers to the training process. The modifiers can range from pruning and quantization to learning rate and weight decay. When appropriately combined, it becomes possible to create highly sparse and accurate models.
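
To make the recipe idea concrete, here is a minimal editorial sketch (not part of the committed README) of applying a recipe inside a custom PyTorch loop instead of through the CLI. It assumes SparseML's `ScheduledModifierManager` API; the `recipe.yaml` path and the toy model are illustrative:

```python
import torch
from sparseml.pytorch.optim import ScheduledModifierManager

# A stand-in model and optimizer for illustration
model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)

# Parse the recipe and wrap the optimizer so the recipe's modifiers
# (pruning, quantization, learning-rate changes) fire on schedule
manager = ScheduledModifierManager.from_yaml("recipe.yaml")  # illustrative local path
optimizer = manager.modify(model, optimizer, steps_per_epoch=100)

# ... run a standard PyTorch training loop here ...

manager.finalize(model)  # remove modifier hooks once training completes
```

The `--recipe` argument in the CLI commands above does this wiring for you.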
 
-| Folder/File Name     | Description |
-|----------------------|-------------|
-| recipes              | Typical recipes for sparsifying NLP models along with any downloaded recipes from the SparseZoo. |
-| tutorials            | Tutorial walkthroughs for how to sparsify NLP models using recipes. |
-| transformers         | Integration repository folder used to train and sparsify NLP models (`setup_integration.sh` must run first). |
-| README.md            | Readme file. |
-| setup_integration.sh | Setup file for the integration run from the command line. |
+Once the transfer learning command below has completed, you will have a sparse checkpoint located in `models/sparse_quantized`.
 
-### Exporting for Inference
+### Transfer Learn the Model
 
-After sparsifying a model, the `run_qa.py` script can be run with the `--onnx_export_path` argument to convert the model into an [ONNX](https://onnx.ai/) deployment format.
-The export process is modified such that the quantized and pruned models are corrected and folded properly.
+The following command will use the 80% sparse-quantized BERT model from the SparseZoo and fine-tune it on the SQuAD dataset, resulting in a model that achieves an F1 of 88.5% on the validation set. Keep in mind that the `--distill_teacher` argument is set to pull a dense SQuAD model from the SparseZoo so that this step can run independently of the dense teacher step. If you trained a dense teacher, change this out for the path to your model folder:
 
-For example, the following command can be run from within the integration's folder to export a trained/sparsified model's checkpoint:
 ```bash
-python transformers/examples/pytorch/question-answering/run_qa.py \
-  --model_name_or_path MODELS_DIR/bert-base-12layers_prune80 \
-  --dataset_name squad \
-  --do_eval \
-  --per_device_eval_batch_size 64 \
-  --max_seq_length 384 \
-  --doc_stride 128 \
-  --output_dir MODELS_DIR/bert-base-12layers_prune80/eval \
-  --cache_dir cache \
-  --preprocessing_num_workers 6 \
-  --onnx_export_path MODELS_DIR/bert-base-12layers_prune80/onnx
+sparseml.transformers.question_answering \
+  --output_dir models/sparse_quantized \
+  --model_name_or_path zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/12layer_pruned80_quant-none-vnni \
+  --recipe zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/12layer_pruned80_quant-none-vnni?recipe_type=transfer-question_answering \
+  --distill_teacher zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none \
+  --dataset_name squad \
+  --per_device_train_batch_size 12 \
+  --per_device_eval_batch_size 24 \
+  --preprocessing_num_workers 6 \
+  --do_train \
+  --do_eval \
+  --evaluation_strategy epoch \
+  --fp16 \
+  --seed 21636 \
+  --save_strategy epoch \
+  --save_total_limit 1
 ```
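
As an optional editorial check (not in the committed README), you can verify that the transferred checkpoint is actually sparse by counting zeroed weights in the saved state dict; this assumes the standard Hugging Face `pytorch_model.bin` layout in the output folder:

```python
import torch

# Load the trained checkpoint on CPU and measure overall weight sparsity
state = torch.load("models/sparse_quantized/pytorch_model.bin", map_location="cpu")
total = zeros = 0
for name, tensor in state.items():
    if name.endswith(".weight") and tensor.is_floating_point():
        total += tensor.numel()
        zeros += (tensor == 0).sum().item()
print(f"overall weight sparsity: {zeros / max(total, 1):.2%}")
```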
 
-The DeepSparse Engine [accepts ONNX formats](https://docs.neuralmagic.com/sparseml/source/onnx_export.html) and is engineered to significantly speed up inference on CPUs for the sparsified models from this integration.
-Examples for loading, benchmarking, and deploying can be found in the [DeepSparse repository here](https://github.com/neuralmagic/deepsparse).
+### Exporting to ONNX
+
+The DeepSparse Engine uses the ONNX format to load neural networks and then delivers breakthrough performance for CPUs by leveraging the sparsity and quantization within a network.
+
+The SparseML installation provides a `sparseml.transformers.export_onnx` command that you can use to load the training model folder and create a new `model.onnx` file within it. Be sure the `--model_path` argument points to your trained model. By default, it is set to the result from transfer learning a sparse-quantized BERT model:
+
+```bash
+sparseml.transformers.export_onnx \
+  --model_path models/sparse_quantized \
+  --task 'question-answering' \
+  --sequence_length 384
+```
+
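As a quick editorial sanity check (again, not part of the committed README), the exported file can be validated with the standard `onnx` package; the path assumes the export command above:

```python
import onnx

# Load the exported graph and run ONNX's structural checker
model = onnx.load("models/sparse_quantized/model.onnx")
onnx.checker.check_model(model)

# Inspect the graph's inputs and outputs as a quick smoke test
print([inp.name for inp in model.graph.input])
print([out.name for out in model.graph.output])
```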
+### DeepSparse Engine Deployment
+
+Now that the model is in an ONNX format, it is ready for deployment with the DeepSparse Engine.
+
+Run the following command to install it:
+
+```bash
+pip install deepsparse
+```
+
+Once DeepSparse is installed on your deployment environment, two options are supported for deployment:
+- A Python API that will fit into your current deployment pipelines.
+- The DeepSparse Server, which enables a no-code CLI solution to run your model via FastAPI's HTTP server.
+
+### 🐍 Python API
+
+The Python code below gives an example of using the DeepSparse Python pipeline API with different tasks. Be sure to change out the `model_path` argument for the model folder of your trained model.
+
+Python Pipeline:
+
+```python
+from deepsparse.transformers import pipeline
+
+model_path = "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni"
+
+qa_pipeline = pipeline(
+    task="question-answering",
+    model_path=model_path,
+)
+
+inference = qa_pipeline(question="What's my name?", context="My name is Snorlax")
+print(inference)
+```
+
+printout:
+{'score': 0.9947717785835266, 'start': 11, 'end': 18, 'answer': 'Snorlax'}
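
As an editorial variant of the snippet above, the same pipeline can point at the local output folder from the transfer learning and export steps rather than a SparseZoo stub:

```python
from deepsparse.transformers import pipeline

# Point the pipeline at the local training output instead of a SparseZoo stub
qa_pipeline = pipeline(
    task="question-answering",
    model_path="models/sparse_quantized",
)

print(qa_pipeline(question="What dataset was used?", context="The model was fine-tuned on SQuAD."))
```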
+
+### 🔌 DeepSparse Server
+
+To use the DeepSparse Server, first install the required dependencies using pip:
+
+```bash
+pip install deepsparse[server]
+```
+
+Once installed, the CLI command given below for serving a BERT model is available. The command is set up to run independently of the prior stages. Once launched, you can view information about the server and the available APIs at `http://0.0.0.0:5543` on the deployment machine.
+
+```bash
+deepsparse.server \
+  --task question_answering \
+  --batch_size 1 \
+  --model_path "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni"
+```
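
As an editorial sketch of a client call (not part of the committed README), the running server can be queried over HTTP; the `/predict` route and JSON schema here are assumptions, so verify them against the API docs the server exposes at `http://0.0.0.0:5543`:

```python
import requests

# Hypothetical client request against the locally running DeepSparse Server
response = requests.post(
    "http://0.0.0.0:5543/predict",  # assumed route; check the server's docs page
    json={"question": "What's my name?", "context": "My name is Snorlax"},
)
print(response.json())
```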
 
-**Note: there is currently a known issue where conversion of the BERT models from PyTorch into ONNX is not preserving the accuracy of the model for some tasks and datasets. If you encounter this issue, try rolling back to the 0.9.0 release. As a resolution is being actively investigated, this note will be removed when the issue has been remediated.**
+For more details, check out [Getting Started with the DeepSparse Server](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/server).

research/information_retrieval/doc2query/requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@ nbformat==5.1.3
 nest-asyncio==1.5.1
 networkx==2.5.1
 nltk==3.6.6
-notebook==6.4.1
+notebook==6.4.10
 numpy==1.21.0
 onnx==1.7.0
 onnxruntime==1.8.0
