[RFC] Ease-of-use Task Register for Autotune API #1540

xin3he · 2024-01-15T06:11:22Z · Collaborator

Ease-of-use Task Register for Autotune API

Principles

Users only need to set the target task name to enable autotune evaluation.
Tasks are easy to extend with customized evaluations, and contributions to INC are welcome.

Requirement

Provide a task name when registering.
The evaluation function takes the model as input and returns a float.

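@register_task(name="text-generation")
def eval_func(model, model_name=None):
    # init_dataset and Accuracy are placeholders for a user-provided dataset and metric.
    eval_dataset = init_dataset("xxx")
    accuracy = Accuracy()
    for data, label in eval_dataset:
        output = model(data)
        accuracy.update(output, label)
    # Return a float, as required above (assumes the metric exposes result()).
    return accuracy.result()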
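
For illustration only, here is a minimal sketch of how a decorator like register_task could work. The RFC does not specify an implementation, so TASK_REGISTRY, get_task, and the toy task below are hypothetical names, not part of INC.

```python
from typing import Callable, Dict

# Hypothetical registry mapping task names to evaluation functions.
# TASK_REGISTRY and get_task are illustrative names, not INC APIs.
TASK_REGISTRY: Dict[str, Callable[..., float]] = {}


def register_task(name: str):
    """Return a decorator that stores the evaluation function under `name`."""
    def decorator(func):
        if name in TASK_REGISTRY:
            raise ValueError(f"task {name!r} is already registered")
        TASK_REGISTRY[name] = func
        return func  # the function stays usable on its own
    return decorator


def get_task(name: str):
    """Look up a registered evaluation function by task name."""
    try:
        return TASK_REGISTRY[name]
    except KeyError:
        known = sorted(TASK_REGISTRY)
        raise KeyError(f"unknown task {name!r}; registered tasks: {known}") from None


@register_task(name="toy-accuracy")
def toy_eval(model, model_name=None):
    # Tiny in-memory dataset of (input, label) pairs standing in for a real one.
    samples = [(1, 2), (2, 4), (3, 7)]
    correct = sum(1 for data, label in samples if model(data) == label)
    return correct / len(samples)  # float, as the requirement asks
```

Calling get_task("toy-accuracy") returns the decorated function, so a tuner would only need the task name to find the evaluation to run.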

Repo Architecture

torch
- tasks # make sure all algos here have user interface.
  - text_generation # lm_eval, lm_code_eval
  - image_classification # cifar, imagenet
  - ... # welcome for contribution

Examples

lm_eval

@register_task(name="lm_eval")
def eval_func(model, model_name, tasks=None):
    from intel_extension_for_transformers.llm.evaluation.lm_eval import evaluate
    # Avoid a mutable default argument; fall back to lambada_openai.
    tasks = tasks or ["lambada_openai"]
    results = evaluate(
        model="hf-causal",
        model_args="pretrained=" + model_name + ",tokenizer=" + model_name + ",dtype=float32",
        user_model=model,
        batch_size=32,
        tasks=tasks,
    )
    return results["accuracy"]

lm_code_eval

@register_task(name="lm_code_eval")
def eval_func(model, model_name, tasks, batch_size=32):
    # tasks is a required list of task names. batch_size replaces the undefined
    # args.batch_size; the default of 32 is an assumption mirroring the lm_eval example.
    from intel_extension_for_transformers.llm.evaluation.lm_code_eval import evaluate
    from transformers import AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    results = evaluate(
        model=model,  # was the undefined name user_model
        tokenizer=tokenizer,
        tasks=",".join(tasks),
        batch_size=batch_size,
    )
    return results["accuracy"]

Usage

model = autotune(
    model,
    conf,
    example_inputs, # example_inputs for jit.trace
    run_fn, # calibration function
    task="lm_eval", # registered evaluation task
    task_args={
        "model_name": "facebook/opt-125m",
        "tasks": ["lambada_openai", "hellaswag", "winogrande", "piqa", "wikitext"],
    }
)
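
As a rough sketch of how the task and task_args arguments might be combined internally, the snippet below binds the extra arguments to a registered function so that the tuner only has to pass the model. The RFC does not describe this step; _TASKS, resolve_eval_fn, and dummy_eval are hypothetical names used for illustration.

```python
from functools import partial


def dummy_eval(model, model_name, tasks=("task_a",)):
    # Placeholder metric: average of the per-task scores the toy model reports.
    scores = [model(task) for task in tasks]
    return sum(scores) / len(scores)


# Hypothetical stand-in for the task registry; not an INC API.
_TASKS = {"dummy": dummy_eval}


def resolve_eval_fn(task, task_args=None):
    """Bind task_args to the registered function, leaving `model` as the only argument."""
    if task not in _TASKS:
        raise KeyError(f"unknown task {task!r}")
    return partial(_TASKS[task], **(task_args or {}))


# The tuner could then call eval_fn(candidate_model) after each tuning trial.
eval_fn = resolve_eval_fn("dummy", {"model_name": "m", "tasks": ["x", "y"]})
```

Using functools.partial keeps the evaluation signature the tuner sees down to a single model argument, which matches the "model as input, float as output" requirement.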

thuang6 · 2024-01-15T08:14:09Z · Maintainer

One general comment: given the current Pile calibration dataset, Chinese evaluation tasks such as CEval/CMMLU will very likely perform poorly. If we plan to support different types of evaluation (Chinese, math, code), we will need a new, better (most likely mixed) calibration dataset.
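
One possible way to build such a mixed calibration set is to interleave samples from several sources round-robin. This is only a sketch under assumptions: the source names and the interleave_sources helper are hypothetical, and a real mix would likely need weighting per domain.

```python
from itertools import islice, zip_longest

_MISSING = object()  # sentinel marking exhausted sources


def interleave_sources(sources, num_samples):
    """Take samples round-robin from each source until num_samples are collected."""
    rounds = zip_longest(*sources.values(), fillvalue=_MISSING)
    mixed = (item for group in rounds for item in group if item is not _MISSING)
    return list(islice(mixed, num_samples))


# Toy stand-ins for English, Chinese, and code calibration samples.
sources = {
    "english": ["e1", "e2", "e3"],
    "chinese": ["c1", "c2"],
    "code": ["k1"],
}
calibration_samples = interleave_sources(sources, num_samples=5)
```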

1 reply

xin3he Jan 15, 2024
Collaborator Author

I see. I'm also considering whether to let run_fn (calibration) reuse the task-name registration.

xin3he · 2024-01-15T08:24:03Z · Collaborator Author

I'm also considering adding a task name such as lm_eval_small that limits the number of samples, or an argument such as limit=100, for a quick autotune.
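
A minimal sketch of the limit idea, assuming the evaluation iterates over (data, label) pairs; evaluate_accuracy and its parameters are hypothetical and not an existing INC or lm-eval function.

```python
from itertools import islice


def evaluate_accuracy(model, dataset, limit=None):
    """Compute accuracy over at most `limit` samples (all samples if limit is None)."""
    correct = 0
    total = 0
    # islice stops after `limit` items without materializing the whole dataset.
    for data, label in islice(dataset, limit):
        correct += int(model(data) == label)
        total += 1
    return correct / total if total else 0.0
```

For example, evaluate_accuracy(model, dataset, limit=100) would score only the first 100 samples, trading metric stability for a faster tuning loop.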


changwangss · 2024-01-15T08:35:34Z · Collaborator

May I know the purpose of autotune? If you want to fall back op1, op2, and so on to fp32 one by one, I don't suggest using lm-eval as the evaluation function. lm-eval is usually used for LLMs, where both calibration and evaluation take a long time, and if you tune many times it will run out of memory (OOM).

3 replies

xin3he Jan 15, 2024
Collaborator Author

Do you mean the memory will accumulate when tuning each time?

changwangss Jan 15, 2024
Collaborator

Yes. In TorchScript mode, the JIT model for IPEX int8 is not released until the program finishes running.

xin3he Jan 16, 2024
Collaborator Author

Yes, it's a known issue with IPEX int8, and we can look into resolving it by only converting the model (without jit.trace) for accuracy validation.
From my perspective, this RFC focuses on the ease of use of autotune, not on its shortcomings. The issue you mentioned can be discussed later. Thanks for raising it.

xin3he · 2024-01-16T07:23:10Z · Collaborator Author

Decision:

Not to implement

Reasons:

For CV tasks, the ImageNet dataset cannot be downloaded automatically, so a registered task would not be easy to use. (from Yi)
For LLM tasks, customized eval_func implementations should be defined in ITREX; INC keeps its current API. (from Feng)

