This directory combines the SparseML recipe-driven approach with the [huggingface/transformers](https://github.com/huggingface/transformers) repository. By integrating the robust training flows in the `transformers` repository with the SparseML code base, we enable model sparsification techniques on popular NLP models such as [BERT](https://arxiv.org/abs/1810.04805), creating smaller and faster deployable versions. The techniques include, but are not limited to:

- Pruning
- Quantization
- Knowledge Distillation
- Sparse Transfer Learning
## Highlights
Coming soon!

## Tutorials
- [Sparsifying BERT Models Using Recipes](https://github.com/neuralmagic/sparseml/blob/main/integrations/huggingface-transformers/tutorials/sparsifying_bert_using_recipes.md)
- [Sparse Transfer Learning With BERT](https://github.com/neuralmagic/sparseml/blob/main/integrations/huggingface-transformers/tutorials/bert_sparse_transfer_learning.md)
## Installation
```bash
pip install sparseml[torch]
```
It is recommended to run Python 3.8 as some of the scripts within the transformers repository require it.
**Note**: Transformers will not immediately install with this command. Instead, a sparsification-compatible version of Transformers will install on the first invocation of the Transformers code in SparseML.
## SparseML CLI
The SparseML installation provides a CLI for sparsifying your models for a specific task; appending the `--help` argument will provide a full list of options for training in SparseML:
```bash
sparseml.transformers.[task] --help
```
e.g. `sparseml.transformers.question_answering --help`
output:
```bash
--output_dir: The directory in which to store the outputs from the training runs such as results, the trained model, and supporting files.
--model_name_or_path: The path or SparseZoo stub for the model to load for training.
--recipe: The path or SparseZoo stub for the recipe to use to apply sparsification algorithms or sparse transfer learning to the model.
--distill_teacher: The path or SparseZoo stub for the teacher to load for distillation.
--dataset_name or --task_name: The dataset or task to load for training.
```
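
For example, a minimal training invocation can be assembled from the arguments above. This is a sketch only: the `--do_train` and `--do_eval` flags come from the underlying Hugging Face training script, and the output path and model name are placeholder assumptions:

```bash
sparseml.transformers.question_answering \
    --output_dir models/example \
    --model_name_or_path bert-base-uncased \
    --dataset_name squad \
    --do_train \
    --do_eval
```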
## Sparse Transfer Learning | Question Answering Example
### Dense Teacher Creation
To enable distillation, you will first create a dense teacher model that the sparse model will learn from while transferring. **If you already have a Transformers-compatible model, you can use this as the dense teacher in place of training one from scratch.** The following command will use the dense BERT base model from the SparseZoo and fine-tune it on the SQuAD dataset, resulting in a model that achieves 88.5% F1 on the validation set:
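
A command along these lines fits that description, built from the CLI arguments documented above; the SparseZoo stub for the upstream dense BERT model, the output path, and the Hugging Face script flags are assumptions sketched for illustration:

```bash
sparseml.transformers.question_answering \
    --output_dir models/teacher \
    --model_name_or_path "zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none" \
    --dataset_name squad \
    --do_train \
    --do_eval
```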
With the dense teacher trained to convergence, you can begin the sparse transfer learning with distillation with a recipe. The dense teacher will distill knowledge into the sparse architecture, therefore increasing its performance while ideally converging to the dense solution’s accuracy.
💡**PRO TIP**💡: Recipes encode the instructions and hyperparameters for sparsifying a model using modifiers to the training process. The modifiers can range from pruning and quantization to learning rate and weight decay. When appropriately combined, it becomes possible to create highly sparse and accurate models.
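
As an illustration of the format, a minimal recipe could pair an epoch-range modifier with gradual magnitude pruning; the modifier values below are illustrative assumptions rather than a tuned recipe:

```yaml
# Sketch of a SparseML recipe; values are examples only.
modifiers:
    # Train for 30 epochs end to end.
    - !EpochRangeModifier
        start_epoch: 0.0
        end_epoch: 30.0

    # Gradually prune all prunable layers from 5% to 80% sparsity.
    - !GMPruningModifier
        start_epoch: 1.0
        end_epoch: 20.0
        init_sparsity: 0.05
        final_sparsity: 0.8
        params: __ALL_PRUNABLE__
        update_frequency: 0.5
```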
### Transfer Learn the Model
The following command will use the 80% sparse-quantized BERT model from the SparseZoo and fine-tune it on the SQuAD dataset, resulting in a model that achieves an F1 of 88.5% on the validation set. Keep in mind that the `--distill_teacher` argument is set to pull a dense SQuAD model from the SparseZoo to enable it to run independent of the dense teacher step. If you trained a dense teacher, change this out for the path to your model folder:
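
A command in this form matches that description; the SparseZoo stubs for the sparse-quantized model, the transfer recipe, and the dense teacher are assumptions shown for illustration (swap `--distill_teacher` for your own folder if you trained the teacher above):

```bash
sparseml.transformers.question_answering \
    --output_dir models/sparse_quantized \
    --model_name_or_path "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned80_quant-none-vnni" \
    --recipe "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned80_quant-none-vnni?recipe_type=transfer-question_answering" \
    --distill_teacher "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none" \
    --dataset_name squad \
    --do_train \
    --do_eval
```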

Once the command has completed, you will have a sparse checkpoint located in `models/sparse_quantized`.

### Exporting to ONNX
The DeepSparse Engine uses the ONNX format to load neural networks and then deliver breakthrough performance for CPUs by leveraging the sparsity and quantization within a network.
The SparseML installation provides a `sparseml.transformers.export_onnx` command that you can use to load the training model folder and create a new `model.onnx` file within. Be sure the `--model_path` argument points to your trained model. By default, it is set to the result from transfer learning a sparse-quantized BERT model:

```bash
sparseml.transformers.export_onnx \
    --model_path models/sparse_quantized \
    --task 'question-answering' \
    --sequence_length 384
```
### DeepSparse Engine Deployment
Now that the model is in an ONNX format, it is ready for deployment with the DeepSparse Engine.
Run the following command to install it:
```bash
pip install deepsparse
```
Once DeepSparse is installed on your deployment environment, two options are supported for deployment:
- A Python API that will fit into your current deployment pipelines.
- The DeepSparse Server that enables a no-code CLI solution to run your model via FastAPI's HTTP server.
### 🐍 Python API
The Python code below gives an example for using the DeepSparse Python pipeline API with different tasks. Be sure to change out the `model_path` argument for the model folder of your trained model:
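
A minimal sketch for the question-answering task, assuming the transfer-learned checkpoint from the steps above sits in `models/sparse_quantized`:

```python
from deepsparse import Pipeline

# Create a question-answering pipeline backed by the DeepSparse Engine;
# model_path points at a trained model folder or a SparseZoo stub.
qa_pipeline = Pipeline.create(
    task="question-answering",
    model_path="models/sparse_quantized",
)

# The pipeline handles tokenization and post-processing around the engine.
inference = qa_pipeline(
    question="What does SparseML integrate with?",
    context="SparseML integrates with the huggingface/transformers repository.",
)
print(inference)
```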
### 🔌 DeepSparse Server

To use the DeepSparse Server, first install the required dependencies using pip:
```bash
pip install deepsparse[server]
```
Once installed, the CLI command given below for serving a BERT model is available. The commands are set up to run independently of the prior stages. Once launched, you can view information about the server and the available APIs at `http://0.0.0.0:5543` on the deployment machine.
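
A launch command in this form matches the CLI described above; the SparseZoo stub is an assumption, and a local model folder such as `models/sparse_quantized` works in its place:

```bash
deepsparse.server \
    --task question_answering \
    --model_path "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned80_quant-none-vnni"
```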
**Note: there is currently a known issue where conversion of the BERT models from PyTorch into ONNX is not preserving the accuracy of the model for some tasks and datasets. If you encounter this issue, try rolling back to the 0.9.0 release. As a resolution is being actively investigated, this note will be removed when the issue has been remediated.**
For more details, check out the [Getting Started with the DeepSparse Server](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/server).