The description of uploading custom local datasets in utu/practice/README.md is incorrect #230

@itlantu

Error description

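Under Step 1, Option 2: Upload Custom Local Datasets, the README lists dataset, source, question, and answer as required fields. However, using the official data template and then running scripts/data/upload_dataset.py raises an error in practice.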

Single data record

{"dataset": "YourDataset", "source": "training_free_grpo", "question": "What is 2+2?", "answer": "4"}

Error

Traceback (most recent call last):
  File "C:\Users\IT_la\Desktop\mbpp_youtu\youtu-agent\scripts\data\upload_dataset.py", line 78, in <module>
    main()
    ~~~~^^
  File "C:\Users\IT_la\Desktop\mbpp_youtu\youtu-agent\scripts\data\upload_dataset.py", line 74, in main
    upload_dataset(args.file_path, args.dataset_name, data_format=args.data_format)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\IT_la\Desktop\mbpp_youtu\youtu-agent\scripts\data\upload_dataset.py", line 47, in upload_dataset
    dataset_sample = convert_format_llamafactory(data)
  File "C:\Users\IT_la\Desktop\mbpp_youtu\youtu-agent\scripts\data\upload_dataset.py", line 25, in convert_format_llamafactory
    assert len(question) > 0, "Either 'instruction' or 'input' must be provided."
           ^^^^^^^^^^^^^^^^^
AssertionError: Either 'instruction' or 'input' must be provided.

Error analysis

I traced the error to scripts/data/upload_dataset.py.

The code in the function convert_format_llamafactory that raises the AssertionError is:

 question = [data.get("instruction", None), data.get("input", None)]
 question = [s for s in question if s is not None and s.strip() != ""]
 assert len(question) > 0, "Either 'instruction' or 'input' must be provided."

Renaming the question field in the data to instruction or input lets the data upload normally.
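
To make the mismatch concrete, here is a minimal standalone sketch that mirrors only the three lines quoted above. The helper names has_question_text and rename_question are hypothetical and not part of upload_dataset.py, and the rest of convert_format_llamafactory is not shown in this issue, so how other fields such as answer are handled is not covered here.

```python
def has_question_text(data: dict) -> bool:
    """Mirror the quoted check: True if 'instruction' or 'input' is non-empty.

    Hypothetical helper for illustration; it copies only the filtering
    logic quoted in this issue, not the full conversion function.
    """
    question = [data.get("instruction", None), data.get("input", None)]
    question = [s for s in question if s is not None and s.strip() != ""]
    return len(question) > 0


def rename_question(data: dict, target: str = "instruction") -> dict:
    """Return a copy of the record with 'question' renamed to `target`.

    Hypothetical workaround helper; `target` should be 'instruction' or 'input'.
    """
    return {(target if key == "question" else key): value for key, value in data.items()}


# The record from the README template, as given in this issue.
template_record = {
    "dataset": "YourDataset",
    "source": "training_free_grpo",
    "question": "What is 2+2?",
    "answer": "4",
}

renamed_record = rename_question(template_record)

print(has_question_text(template_record))  # False: the quoted assertion would fail
print(has_question_text(renamed_record))   # True: the quoted assertion would pass
```

The template record has neither an instruction nor an input key, so the filtered list is empty and the assertion fires, which matches the traceback above.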
