Conversation

@quic-dhirajku (Contributor):
Added a test script to perform end-to-end finetuning tests for the SFT dataset. Changes for the sequence-completion task still need to be added to the repo. The current run uses the CPU to perform finetuning.

quic-meetkuma and others added 23 commits November 28, 2025 17:09
- Added a logger that logs to both console and file. The code is
similar to the existing QEff finetuning logger code.
- Also added dist_utils, which serves as utility code for
distributed training.
- Added logger test cases for sanity checks.

---------

Signed-off-by: meetkuma <[email protected]>
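A minimal sketch of the kind of console-plus-file logger described above; the logger name and file path are illustrative, not the actual QEff identifiers.

```python
import logging

def build_logger(name: str = "qeff_finetune", log_file: str = "finetune.log") -> logging.Logger:
    """Create a logger that writes to both the console and a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        fmt = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        console = logging.StreamHandler()
        console.setFormatter(fmt)
        file_handler = logging.FileHandler(log_file)
        file_handler.setFormatter(fmt)
        logger.addHandler(console)
        logger.addHandler(file_handler)
    return logger

logger = build_logger()
logger.info("Logger initialized for finetuning run.")
```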
…quic#645)

- Added functionality to register dataset, model, optimizer, and trainer
objects in a registry and to fetch the class of a given object based on
the configuration provided.
- Also added simple test cases to verify the functionality.

---------

Signed-off-by: meetkuma <[email protected]>
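A minimal sketch of the registry pattern the commit above describes; the decorator and lookup names are illustrative, not the actual QEff API.

```python
# Hypothetical registry: maps (component_kind, name) -> class.
_REGISTRY: dict[tuple[str, str], type] = {}

def register(kind: str, name: str):
    """Decorator that records a class under a component kind and name."""
    def wrap(cls: type) -> type:
        _REGISTRY[(kind, name)] = cls
        return cls
    return wrap

def get_class(kind: str, name: str) -> type:
    """Fetch the registered class for the kind/name given in the configuration."""
    try:
        return _REGISTRY[(kind, name)]
    except KeyError as exc:
        raise ValueError(f"No {kind} registered under '{name}'") from exc

@register("dataset", "sft")
class SFTDataset:  # placeholder for the real dataset class
    pass

dataset_cls = get_class("dataset", "sft")  # -> SFTDataset
```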
)

Adding a script for registering and retrieving optimizer classes.
The script includes:

- get_optimizer(): returns the optimizer class and its kwargs.

Additionally, there is a test_optimizer.py script that validates the
optimizer registration and retrieval process.

---------

Signed-off-by: Tanisha Chawada <[email protected]>
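A minimal sketch of what a get_optimizer()-style lookup could look like, assuming names map to torch.optim classes; the mapping and default kwargs are illustrative, not the actual QEff defaults.

```python
import torch

# Hypothetical name -> (optimizer class, default kwargs) mapping.
_OPTIMIZERS = {
    "adamw": (torch.optim.AdamW, {"lr": 1e-4, "weight_decay": 0.01}),
    "sgd": (torch.optim.SGD, {"lr": 1e-3, "momentum": 0.9}),
}

def get_optimizer(name: str):
    """Return the optimizer class and its default kwargs for a config name."""
    if name not in _OPTIMIZERS:
        raise ValueError(f"Unknown optimizer: {name}")
    return _OPTIMIZERS[name]

# Usage: instantiate against model parameters.
# opt_cls, opt_kwargs = get_optimizer("adamw")
# optimizer = opt_cls(model.parameters(), **opt_kwargs)
```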
…ong with its test cases. (quic#647)

Edited the SFTDataset class to enable custom dataset loading.
Updated the dataset.py file to support only the SFTDataset type.
Created a test file to check the functionality.

---------

Signed-off-by: Dhiraj Kumar Sah <[email protected]>
Adding a script for registering and retrieving callback classes.
It has a create_callback() function which creates an instance of a callback.
Additionally, there is a test_callbacks.py script that validates the
registration and retrieval process.

---------

Signed-off-by: Tanisha Chawada <[email protected]>
)

Added Config_manager to parse the training-, model-, and dataset-related
arguments.

---------

Signed-off-by: Tanisha Chawada <[email protected]>
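A rough sketch of how training-, model-, and dataset-related arguments could be grouped; MasterConfig appears in the tests later in this PR, but the field names and defaults here are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ModelConfig:
    model_name: str = "meta-llama/Llama-3.2-1B"  # illustrative default
    use_peft: bool = False

@dataclass
class DatasetConfig:
    dataset_name: str = "alpaca"  # illustrative default
    data_path: Optional[str] = None

@dataclass
class TrainingConfig:
    max_train_step: int = 10
    max_eval_step: int = 5
    output_dir: str = "./results"

@dataclass
class MasterConfig:
    model: ModelConfig = field(default_factory=ModelConfig)
    dataset: DatasetConfig = field(default_factory=DatasetConfig)
    training: TrainingConfig = field(default_factory=TrainingConfig)
```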
Split the test case into functional and loss-assertion parts, and enabled them on CI.
Reference metrics data has been updated to the latest.

---------

Signed-off-by: Ann Kuruvilla <[email protected]>
Signed-off-by: Tanisha <[email protected]>
Co-authored-by: Tanisha <[email protected]>
…LI for CB (quic#646)

InputHandler has changes to create position_ids based on the CB batch size.

Signed-off-by: Dhiraj Kumar Sah <[email protected]>
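A rough, generic sketch of building position_ids for a continuous-batching (CB) batch from the attention mask; this is not the actual InputHandler code, and the padding convention is an assumption.

```python
import torch

def make_position_ids(attention_mask: torch.Tensor) -> torch.Tensor:
    """Build position_ids of shape (batch_size, seq_len): a cumulative count
    of non-padded tokens per sequence, with padded positions set to 0."""
    position_ids = attention_mask.long().cumsum(-1) - 1
    position_ids.masked_fill_(attention_mask == 0, 0)
    return position_ids

# Example with a CB batch of 3 left-padded sequences (0 = padding).
mask = torch.tensor([[1, 1, 1, 1],
                     [0, 0, 1, 1],
                     [0, 1, 1, 1]])
print(make_position_ids(mask))
# tensor([[0, 1, 2, 3],
#         [0, 0, 0, 1],
#         [0, 0, 1, 2]])
```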
Added step-by-step instructions for adding a custom op in QEff.

---------

Signed-off-by: Rishin Raj <[email protected]>
Co-authored-by: Hem Agnihotri <[email protected]>
Added the torchvision 0.22.0 CPU version to the environment.

Signed-off-by: Rishin Raj <[email protected]>
Co-authored-by: Hem Agnihotri <[email protected]>
This PR updates QEff to support QPC generation on systems without the
Platform SDK by refactoring the module loading behavior. Users can now
compile models and generate QPCs using QEff with only the Apps SDK
installed.

Background: Previously, both the Apps SDK and the Platform SDK were required to
compile and generate QPCs using QEff. The goal is to allow QPC
generation with only the Apps SDK installed, for systems without Ultra
cards.
Changes:
Refactored `__init__.py` and generation/cloud_infer.py to use lazy loading via
importlib for qaicrt and aicapi.
This ensures that Platform SDK-dependent modules are only loaded when
explicitly needed, avoiding import errors during initialization and QPC
generation.

Signed-off-by: Sharvari Medhe <[email protected]>
Co-authored-by: Hem Agnihotri <[email protected]>
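A minimal sketch of the lazy-loading idea, using the module names from the description; the helper function is illustrative, not the actual QEff code.

```python
import importlib

def load_platform_module(name: str):
    """Import a Platform SDK module (e.g. 'qaicrt' or 'aicapi') only when it
    is actually needed, so systems with just the Apps SDK can still compile."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"Module '{name}' requires the Platform SDK; "
            "install it to run inference on device."
        ) from exc

# The import error is only raised if/when device execution is requested:
# qaicrt = load_platform_module("qaicrt")
```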
### Memory Optimization

Added periodic memory cleanup to FP16ClipTransform and
SplitTensorsTransform to reduce memory usage during large tensor
processing. Also avoids redundant external data loading when already
present.

### Time Optimized ONNX Transform via Class Merging and Thread Pooling

It merges the FP16 and Split ONNX transform classes into a single
implementation to eliminate redundant tensor loading and iteration.
Additionally, the transform logic has been refactored to use a **thread
pool**, replacing the previous sequential loop to parallelize tensor
operations.

#### Performance Benchmarks

| Model           | Original Duration (s) | Optimized Duration (s) |
|----------------|------------------------|-------------------------|
| LLaMA 3.1 8B    | 88.35                  | 58.55                   |
| LLaMA 3.1 70B   | 1029.82                | 727.37                  |

> **Note:** Thread count is set to `os.cpu_count() * 4` to better handle
I/O-bound workloads. Performance may vary depending on system hardware
and threading capabilities.

---------

Signed-off-by: abhishek-singh591 <[email protected]>
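A simplified sketch of the thread-pool refactor described above, using the same os.cpu_count() * 4 worker heuristic from the note; the per-tensor function is a stand-in for the merged FP16-clip/split logic.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def _transform_tensor(tensor):
    """Placeholder for the merged FP16 clip + split work done per tensor."""
    # ... clip out-of-range FP16 values, record tensors to split, etc.
    return tensor

def transform_all(tensors):
    # I/O-bound workload, hence more threads than cores (as noted in the PR).
    max_workers = (os.cpu_count() or 1) * 4
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(_transform_tensor, tensors))
```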
### Objective:

This PR introduces the KV blocking technique for CausalLM models, where
the K/V cache is read and processed block by block in the attention
computation. The number of desired KV blocks is defined at model
initialization in the "from_pretrained" call to export the ONNX with the
required number of KV blocks. As a result, the following changes are
introduced:

### Changes:
1. SoftMax is changed from regular SoftMax to online SoftMax, where a
running maximum and a cumulative denominator are tracked and updated as
each block is processed, retaining mathematical accuracy relative to
regular SoftMax (see the sketch after this commit message).
2. Changes to the CTXGather and CTXGatherCB custom ops to read only one
block's worth of data in each cache gather/read.
3. Changes to the read_only function in QEffDynamicCache to allow reading
the cache block by block rather than the full K/V cache.
4. Generation of the attention mask per block.
5. Changes to the eager_attention_forward implementation in the Llama model
to allow BlockedKV attention and the online SoftMax implementation.
6. Wrapping the num_kv_blocks variable inside qaic_config to keep the
calling style consistent.
7. A new PyTorch transform to pass the num_kv_blocks variable to the
QEffLlamaAttention block.
8. A new constant added for num_kv_blocks.
9. Added tests to switch the BlockedKV feature on and off.

Please review and feel free to suggest changes and tests.

---------

Signed-off-by: Vaibhav Verma <[email protected]>
Co-authored-by: Hem Agnihotri <[email protected]>
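A small numerical sketch of the online SoftMax update described in item 1, independent of the actual attention code: the running maximum, running denominator, and accumulator are rescaled as each K/V block is processed, which reproduces the result of a regular SoftMax over the full context.

```python
import torch

def blocked_attention(q, k, v, block_size):
    """Single-head attention computed block by block with an online softmax.
    q: (1, d), k/v: (ctx, d). Matches regular softmax attention numerically."""
    scale = q.shape[-1] ** -0.5
    m = torch.tensor(float("-inf"))   # running maximum of the scores
    l = torch.tensor(0.0)             # running softmax denominator
    acc = torch.zeros_like(q)         # running weighted sum of V
    for start in range(0, k.shape[0], block_size):
        kb, vb = k[start:start + block_size], v[start:start + block_size]
        scores = (q @ kb.T) * scale                # (1, block)
        m_new = torch.maximum(m, scores.max())
        correction = torch.exp(m - m_new)          # rescale previous stats
        p = torch.exp(scores - m_new)
        l = l * correction + p.sum()
        acc = acc * correction + p @ vb
        m = m_new
    return acc / l

q, k, v = torch.randn(1, 8), torch.randn(32, 8), torch.randn(32, 8)
ref = torch.softmax((q @ k.T) * (8 ** -0.5), dim=-1) @ v
assert torch.allclose(blocked_attention(q, k, v, block_size=8), ref, atol=1e-5)
```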
Adding CB support for VLMs:
1. Llava
2. Llava_Next
3. Gemma3
4. Mistral3
5. InternVL2_5
6. InternVL3_5
7. Molmo

---------

Signed-off-by: Asmita Goswami <[email protected]>
Co-authored-by: Mamta Singh <[email protected]>
Co-authored-by: Hem Agnihotri <[email protected]>
…ring compilation process (quic#623)

In these changes, instead of passing CCL lists during model loading, a
flag called ccl_enabled specifies whether the CCL feature is enabled, and
passing the CCL lists has been moved to the compilation process.

---------

Signed-off-by: Vahid Janfaza <[email protected]>
Co-authored-by: Hem Agnihotri <[email protected]>
# Support for Diffusers Architecture in Efficient Transformers

## Overview
This pull request introduces **Diffusers architecture support** to the
**Efficient Transformers** framework, enabling seamless integration of
diffusion models.

## Key Highlights
1. **Support for the model
[black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell)**
2. **Flexible Configuration**
- Supports JSON-based configuration files for easy compilation and
execution.
3. **Performance Benchmarking**
- Implements a performance matrix for Diffusers models to enable
benchmarking for each module.
4. **Testing Framework**
   - Includes initial test scripts for Diffusers (in progress).
5. **Support for ONNX subfunction graphs via the `use_onnx_function` flag**
6. **Support for parallel compilation of modules via the
`parallel_compile` flag**

---------

Signed-off-by: Amit Raj <[email protected]>
Signed-off-by: Amit Raj <[email protected]>
Signed-off-by: tv-karthikeya <[email protected]>
Signed-off-by: vtirumal <[email protected]>
Co-authored-by: tv-karthikeya <[email protected]>
Co-authored-by: Amit Raj <[email protected]>
Co-authored-by: Karthikeya <[email protected]>
# We should use disaggregated serving for the GPT-OSS model for best performance
- GPT-OSS has total_experts/experts_per_tok ratios of 128/4 (120B) and 32/4
- We use a "read all experts exactly once" strategy in the prefill-only
model
- We treat weights as activations, i.e. read only the chosen experts, in
the decode-only model

# Prefill-only model
## Blocking: default behaviour when `prefill_only=True` in the compile API
- NUM_Q_BLOCKS=<int> sets the number of Q blocks in attention
- NUM_FFN_BLOCKS=<int> sets the number of blocks in the FFN
- ENABLE_OPT_SWA=0 or 1 enables/disables optimized SWA. When enabled, only
the valid KVs for a given block are used in attention, reducing MACs
- prefix_caching is not supported in this mode

## Chunking: pass `enable_chunking=True` and `prefill_only=True` in the
compile API
- Optimized SWA, i.e. reading only the valid KV as per the diagonal
attention mask, is enabled by default for this version
- This model can be used for prefix_caching by passing
`kv_cache_batch_size=<int>` in the compile API

# Decode-only model
## Retain sliding-window length of KV for sliding-window layers: default
behaviour when `prefill_seq_len=1` in the compile API
- This reduces the amount of DDR used by the model
- CB is enabled for this version: pass `continous_batching=True` in the
`from_pretrained` call and strictly pass `full_batch_size=<int>`, and
optionally `kv_cache_batch_size=<int>` if needed
## Full KV for sliding-window layers: pass `retain_full_kv=True` along
with `prefill_seq_len=1` in the compile API
- This uses more DDR, as we retain ctx_len KV even for sliding-window
layers, but only sliding-window-length KV is read in attention
- CB is enabled for this version: pass `continous_batching=True` in the
`from_pretrained` call and strictly pass `full_batch_size=<int>`, and
optionally `kv_cache_batch_size=<int>` if needed
- This is enabled for the multi-turn chat use case, where we run
prefill -> decode and then use the combined prefill and decode cache to
run prefill again, so we want to retain the full KV for sliding-window
layers


NOTE:
* The decode-only model currently fails compilation with
`use_onnx_subfunctions=True`, so avoid using it there
* The 120B model needs an NPI; there are two versions of the NPI, one with
and one without subfunctions, and both are uploaded here; pass it as
`node_precision_info=<path to file>`
* It is advised to use `use_onnx_subfunctions=True` with the prefill-only
model, otherwise the compilation times are too high. With this, the model
is expected to export and then fail during compile, as it needs the assert
SDK, so the user is expected to run that compilation manually by pasting
the command printed in the error (see the hedged usage sketch below)
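A hedged usage sketch based only on the flags named in the notes above; the class name and model card are assumptions, and the exact parameter spellings should be checked against the actual from_pretrained/compile signatures.

```python
from QEfficient import QEFFAutoModelForCausalLM  # assumed entry point

model_card = "openai/gpt-oss-120b"  # illustrative

# Prefill-only model with chunking (supports prefix caching).
prefill_model = QEFFAutoModelForCausalLM.from_pretrained(model_card)
prefill_model.compile(
    prefill_only=True,
    enable_chunking=True,
    kv_cache_batch_size=4,
    use_onnx_subfunctions=True,      # advised above to keep compile times down
    node_precision_info="<path to NPI file>",
)

# Decode-only model: continuous batching, full KV retained for SWA layers.
decode_model = QEFFAutoModelForCausalLM.from_pretrained(
    model_card,
    continuous_batching=True,        # written as "continous_batching" in the notes above
)
decode_model.compile(
    prefill_seq_len=1,
    retain_full_kv=True,
    full_batch_size=4,
    kv_cache_batch_size=4,
)
```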

---------

Signed-off-by: vbaddi <[email protected]>
Signed-off-by: Onkar Chougule <[email protected]>
Signed-off-by: Mamta Singh <[email protected]>
Signed-off-by: Onkar Chougule <[email protected]>
Co-authored-by: Vinayak Baddi <[email protected]>
Co-authored-by: Vinayak Baddi <[email protected]>
Co-authored-by: Mamta Singh <[email protected]>
Co-authored-by: Mamta Singh <[email protected]>
Added a test script to perform end-to-end finetuning tests for the SFT dataset.
Changes for the sequence-completion task still need to be added to the repo.
The current run uses the CPU to perform finetuning.

Signed-off-by: Dhiraj Kumar Sah <[email protected]>
…e directly for loading the LORA adapters instead of manually doing it.

The SFTTrainer class init supports PEFT adapter loading, so that part was removed from the tests.

Signed-off-by: Dhiraj Kumar Sah <[email protected]>
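A minimal sketch of letting SFTTrainer handle the LoRA adapter itself via peft_config, matching the argument names that appear in the quoted test code further down in this review; the model and dataset names are illustrative placeholders.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model_name = "HuggingFaceTB/SmolLM-135M"  # illustrative small model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
train_dataset = load_dataset("trl-lib/Capybara", split="train[:100]")  # illustrative

peft_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="./sft-out", max_steps=5, use_cpu=True),
    train_dataset=train_dataset,
    processing_class=tokenizer,
    peft_config=peft_config,   # SFTTrainer wraps the model with LoRA itself
)
trainer.train()
```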
"""Parametrized tests for different model and dataset configurations."""

@pytest.fixture(autouse=True)
def setup_and_cleanup(self):
Contributor:
Should we use the predefined def setup and def teardown methods? They are executed before and after each test.
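For reference, a quick sketch of those predefined hooks (pytest's setup_method/teardown_method), with a hypothetical per-test temporary output directory:

```python
import shutil
import tempfile

class TestParametrizedConfigurations:
    def setup_method(self, method):
        # Runs before each test method.
        self.output_dir = tempfile.mkdtemp(prefix="ft_test_")

    def teardown_method(self, method):
        # Runs after each test method.
        shutil.rmtree(self.output_dir, ignore_errors=True)
```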

pytest.fail(f"Unknown task type: {task_type}")

# Create configuration
master_config = MasterConfig(
Contributor:
I have a suggestion: the config_manager should have the capability to either
dump the default config to disk at a given location or return a default
config object, which you can use here and in other test cases as well. Let
me know your thoughts.

If we go that way, then we need to extend the config's tests as well. See
whether this can be done in this PR or the next one.

@pytest.mark.parametrize(
"model_name,task_type,max_eval_step,max_train_step,dataset_name,data_path_fixture,use_peft,config_name",
[
pytest.param(
Contributor:
Can we convert these configs and the function's input arguments into dataclass structures, define those dataclasses as constants at the start of the file, and use them here?

"""
from trl import SFTConfig

# # Get data path if fixture is specified
Contributor:
If these are no longer needed, then please remove them.

# data_path = request.getfixturevalue(data_path_fixture)

# Determine auto_class_name based on task type
if task_type == "CAUSAL_LM":
Contributor:
I think there should already be some kind of enum in our code base for this.

Contributor:
By the way, there is already a "type" key in the training section of the config, which is "sft" in most cases. See if we can use the same here.

logger.warning("Trainer instantiated")
# Run Training
logger.warning(f"Starting training for {config_name}...")
train_result = trainer.train()
Contributor:
Add a try/except around this.

logger.warning(f"Training loss: {train_result.training_loss:.4f}")

# Test Inference
if task_type == "CAUSAL_LM":
Contributor:
Instead of having an if/else condition for different task_type values, should we split the tests per task? The code duplication could then be handled by small reusable functions. What do you think?

args=sft_config,
train_dataset=dummy_dataset,
processing_class=tokenizer,
peft_config=peft_config,
Contributor:
Here we expect the SFTTrainer class to convert the model into a PEFT model. So after this, check whether the model has actually been converted into a PEFT model, e.g. trainable parameter count, presence of LoRA weights, etc.
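A possible assertion along these lines, assuming peft's standard "lora_" parameter naming and that trainer is the SFTTrainer instance:

```python
# After SFTTrainer has wrapped the model, verify the PEFT conversion:
trainable = sum(p.numel() for p in trainer.model.parameters() if p.requires_grad)
total = sum(p.numel() for p in trainer.model.parameters())
assert trainable < total, "Expected only a subset of parameters to be trainable"
assert any("lora_" in name for name, _ in trainer.model.named_parameters()), (
    "Expected LoRA weights to be present after PEFT wrapping"
)
```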

hf_model = HFModel(**model_config)
model = hf_model.load_model()
# Load PEFT Config
peft_config = LoraConfig(peft_model_config)
Contributor:
Try to move this to some utility file.

from QEfficient.utils.logging_utils import logger


class TestParametrizedConfigurations:
Contributor:
These tests are integration tests. In the future we may need to write tests that also check per-step loss values. How would we do that, and how can we reuse this code when we write those comparative tests?

Created a constants.py file for the values as well as the enums mentioned in the comments.
Created certain util functions for modularity; the final changes to utils will be added later on.
TODO: using Registry and ComponentFactory to load every module in the integrated tests is still pending.
Accidentally added test_trainer in the previous commit, so removing it now.

Signed-off-by: Dhiraj Kumar Sah <[email protected]>
@quic-meetkuma quic-meetkuma changed the title Ft integrated tests [QEff. Finetune]: Ft integrated tests Jan 6, 2026