Multi-label classification always returns empty results in run_classification.py example script #43116

@ziorufus

System Info

Dell workstation with NVIDIA Titan Xp (12 GB RAM), driver version 535.261.03, CUDA 12.2.
Ubuntu Linux 24.04, Python 3.12.

Who can help?

I'm using the run_classification.py example script (in the examples/pytorch/text-classification folder), but when I run it on multi-label data, the prediction for every example is an empty list.

The file predict_results.txt contains:

index	prediction
0	[]
1	[]
2	[]
3	[]
4	[]
5	[]
...
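
One thing worth ruling out when reading this output: if the script decodes multi-label predictions by keeping every label whose raw logit is above 0 (equivalent to a sigmoid probability above 0.5), then an empty list simply means no logit crossed that cutoff. That decoding rule is an assumption about the script, not something confirmed in this report, and `decode_multi_label`, the logits, and the label list below are hypothetical. A minimal sketch of the effect:

```python
import numpy as np

def decode_multi_label(logits, label_list, threshold=0.0):
    """Keep the labels whose raw logit exceeds `threshold` (hypothetical helper)."""
    # Multi-hot encoding: 1 where the logit is above the cutoff, 0 otherwise.
    multi_hot = (np.asarray(logits) > threshold).astype(int)
    return [
        [label_list[i] for i, flag in enumerate(row) if flag == 1]
        for row in multi_hot
    ]

# Made-up values: with many rare labels, an undertrained model can push
# every logit below 0, which decodes to an empty list.
label_list = ["1484", "4359", "5181", "5541", "889"]
logits = np.array([
    [-3.1, -2.7, -2.9, -3.4, -3.0],  # all negative
    [-2.8, 0.4, -2.5, -3.2, -3.1],   # one positive logit
])

predictions = decode_multi_label(logits, label_list)
print(predictions)  # [[], ['4359']]
```

If the raw logits for the test set turn out to be uniformly negative, the empty predictions would come from the model's scores rather than from the writing step.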

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

This is the command:

WANDB_DISABLED=true python examples/pytorch/text-classification/run_classification.py \
	--text_column_name text \
	--train_file train.json \
	--validation_file dev.json \
	--model_name_or_path google-bert/bert-base-uncased \
	--shuffle_train_dataset \
	--do_train \
	--output_dir out \
	--num_train_epochs 3 \
	--per_device_train_batch_size 96 \
	--per_device_eval_batch_size 96 \
	--do_predict \
	--test_file test.json \
	--overwrite_output_dir \
	--do_eval \
	--label_column_name labels

This is the format of the data:

[
  {
    "text": "case c-116/15: action brought on 6 march 2015 \u2014 european parliament ...",
    "labels": [
      "4359",
      "5181"
    ]
  },
  {
    "text": "case c-20/15 p: appeal brought on 19 january 2015 ...",
    "labels": [
      "1484",
      "5541",
      "889"
    ]
  },
...
]

Full data can be found here: https://dh-server.fbk.eu/test-lex/
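
For anyone triaging, a quick look at the label distribution can help: with a large label set and only a few labels per example, each individual label is rare. The sketch below parses records in the format shown above and reports the number of distinct labels and the average number of labels per example. The `label_stats` helper is hypothetical, and the inline sample is a shortened copy of the two records above, not the full dataset.

```python
import json
from collections import Counter

# Shortened copy of the two example records from this issue.
sample = """
[
  {"text": "case c-116/15: action brought on 6 march 2015 ...",
   "labels": ["4359", "5181"]},
  {"text": "case c-20/15 p: appeal brought on 19 january 2015 ...",
   "labels": ["1484", "5541", "889"]}
]
"""

def label_stats(records):
    """Return (label frequency Counter, mean labels per example). Hypothetical helper."""
    counts = Counter(label for rec in records for label in rec["labels"])
    mean_labels = sum(len(rec["labels"]) for rec in records) / len(records)
    return counts, mean_labels

records = json.loads(sample)
counts, mean_labels = label_stats(records)
print(f"{len(counts)} distinct labels, {mean_labels:.1f} labels per example")
```

To run this against the real data, replace `json.loads(sample)` with `json.load(open("train.json"))`, assuming the file is a single JSON array as shown.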

Expected behavior

The file predict_results.txt should contain the predicted labels for each test example instead of empty lists.
