Hi,
Sorry for asking so many questions, but I also ran into problems when running inference, as shown below. Could you help me fix this?
I ran this command:
sbatch bashscript/run_infer_demo.sh
and it produced the following errors:
usage: model_vqa_med_CoT.py [-h] [--model-name MODEL_NAME]
[--mm_dense_connector_type MM_DENSE_CONNECTOR_TYPE]
[--num_l NUM_L] [--image-folder IMAGE_FOLDER]
[--question-file QUESTION_FILE]
[--answers-file ANSWERS_FILE]
[--mm-projector MM_PROJECTOR] [--contrastive]
[--vision-tower VISION_TOWER]
[--conv-mode CONV_MODE] [--use_rag USE_RAG]
[--num-chunks NUM_CHUNKS]
[--step_given STEP_GIVEN] [--chunk-idx CHUNK_IDX]
[--answer-prompter]
usage: model_vqa_med_CoT.py [-h] [--model-name MODEL_NAME]
[--mm_dense_connector_type MM_DENSE_CONNECTOR_TYPE]
[--num_l NUM_L] [--image-folder IMAGE_FOLDER]
[--question-file QUESTION_FILE]
[--answers-file ANSWERS_FILE]
[--mm-projector MM_PROJECTOR] [--contrastive]
[--vision-tower VISION_TOWER]
[--conv-mode CONV_MODE] [--use_rag USE_RAG]
[--num-chunks NUM_CHUNKS]
[--step_given STEP_GIVEN] [--chunk-idx CHUNK_IDX]
[--answer-prompter]
model_vqa_med_CoT.py: error: argument --step_given: invalid int value: 'None'
model_vqa_med_CoT.py: error: argument --step_given: invalid int value: 'None'
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/sw/rev/25.04/rome_mofed_cuda80_rocky8/linux-rocky8-zen2/gcc-13.2.0/python-3.11.9-kukywutwnl5yzx55y6qol3awkq2g7vw6/lib/python3.11/concurrent/futures/process.py", line 261, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sw/rev/25.04/rome_mofed_cuda80_rocky8/linux-rocky8-zen2/gcc-13.2.0/python-3.11.9-kukywutwnl5yzx55y6qol3awkq2g7vw6/lib/python3.11/concurrent/futures/process.py", line 210, in _process_chunk
return [fn(*args) for args in chunk]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sw/rev/25.04/rome_mofed_cuda80_rocky8/linux-rocky8-zen2/gcc-13.2.0/python-3.11.9-kukywutwnl5yzx55y6qol3awkq2g7vw6/lib/python3.11/concurrent/futures/process.py", line 210, in <listcomp>
return [fn(*args) for args in chunk]
^^^^^^^^^
File "/mnt/vast-nhr/projects/nim00014/Verified_SCOT_RAG/S-Chain/architectures/Exgra-Med/llava/eval/run_med_datasets_eval_batch_CoT.py", line 56, in run_job
subprocess.run(cmd, shell=True, check=True)
File "/sw/rev/25.04/rome_mofed_cuda80_rocky8/linux-rocky8-zen2/gcc-13.2.0/python-3.11.9-kukywutwnl5yzx55y6qol3awkq2g7vw6/lib/python3.11/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'CUDA_VISIBLE_DEVICES=0 python llava/eval/model_vqa_med_CoT.py --model-name /weights_finetuned/CoT-100 --mm_dense_connector_type 1 --num_l 6 --question-file /user/nguyen50/u12045/.project/dir.project/Verified_SCOT_RAG/S-Chain/data/s_chain_en/test_29_12_25.json --image-folder /user/nguyen50/u12045/.project/dir.project/Verified_SCOT_RAG/S-Chain/data/s_chain_en/images/ --answers-file /test_answer/CoT-100-chunk0.jsonl --num-chunks 2 --step_given None --conv-mode cot --use_rag True --chunk-idx 0 ' returned non-zero exit status 2.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/vast-nhr/projects/nim00014/Verified_SCOT_RAG/S-Chain/architectures/Exgra-Med/llava/eval/run_med_datasets_eval_batch_CoT.py", line 78, in <module>
main()
File "/mnt/vast-nhr/projects/nim00014/Verified_SCOT_RAG/S-Chain/architectures/Exgra-Med/llava/eval/run_med_datasets_eval_batch_CoT.py", line 68, in main
list(executor.map(run_job_with_args, range(args.num_chunks))) # Use run_job_with_args instead of lambda
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sw/rev/25.04/rome_mofed_cuda80_rocky8/linux-rocky8-zen2/gcc-13.2.0/python-3.11.9-kukywutwnl5yzx55y6qol3awkq2g7vw6/lib/python3.11/concurrent/futures/process.py", line 620, in _chain_from_iterable_of_lists
for element in iterable:
File "/sw/rev/25.04/rome_mofed_cuda80_rocky8/linux-rocky8-zen2/gcc-13.2.0/python-3.11.9-kukywutwnl5yzx55y6qol3awkq2g7vw6/lib/python3.11/concurrent/futures/_base.py", line 619, in result_iterator
yield _result_or_cancel(fs.pop())
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sw/rev/25.04/rome_mofed_cuda80_rocky8/linux-rocky8-zen2/gcc-13.2.0/python-3.11.9-kukywutwnl5yzx55y6qol3awkq2g7vw6/lib/python3.11/concurrent/futures/_base.py", line 317, in _result_or_cancel
return fut.result(timeout)
^^^^^^^^^^^^^^^^^^^
File "/sw/rev/25.04/rome_mofed_cuda80_rocky8/linux-rocky8-zen2/gcc-13.2.0/python-3.11.9-kukywutwnl5yzx55y6qol3awkq2g7vw6/lib/python3.11/concurrent/futures/_base.py", line 456, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/sw/rev/25.04/rome_mofed_cuda80_rocky8/linux-rocky8-zen2/gcc-13.2.0/python-3.11.9-kukywutwnl5yzx55y6qol3awkq2g7vw6/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
subprocess.CalledProcessError: Command 'CUDA_VISIBLE_DEVICES=0 python llava/eval/model_vqa_med_CoT.py --model-name /weights_finetuned/CoT-100 --mm_dense_connector_type 1 --num_l 6 --question-file /user/nguyen50/u12045/.project/dir.project/Verified_SCOT_RAG/S-Chain/data/s_chain_en/test_29_12_25.json --image-folder /user/nguyen50/u12045/.project/dir.project/Verified_SCOT_RAG/S-Chain/data/s_chain_en/images/ --answers-file /test_answer/CoT-100-chunk0.jsonl --num-chunks 2 --step_given None --conv-mode cot --use_rag True --chunk-idx 0 ' returned non-zero exit status 2.
Traceback (most recent call last):
File "/mnt/vast-nhr/projects/nim00014/Verified_SCOT_RAG/S-Chain/architectures/Exgra-Med/llava/eval/run_eval_CoT.py", line 191, in <module>
candidate = json.load(open(args.candidate, 'r'))
^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'candidate.json'
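My reading of the log, which may be wrong: the launcher passes the literal string `None` to `--step_given`, and argparse rejects it with "invalid int value", so I assume that argument is declared with `type=int` in model_vqa_med_CoT.py (I have not confirmed this in the source). The later FileNotFoundError for candidate.json may just be a knock-on effect of no answers being written. Below is a minimal sketch of one possible workaround; `optional_int` and `build_parser` are hypothetical names of mine, not functions from the repository.

```python
import argparse
from typing import Optional


def optional_int(value: str) -> Optional[int]:
    """Parse an int, but map the literal string 'None' (or an empty string) to None."""
    if value.strip().lower() in {"none", ""}:
        return None
    # Any other non-integer text still raises ValueError,
    # which argparse turns into a normal usage error.
    return int(value)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical stand-in for the real --step_given definition (assumed type=int).
    parser.add_argument("--step_given", type=optional_int, default=None)
    return parser


if __name__ == "__main__":
    parser = build_parser()
    print(parser.parse_args(["--step_given", "None"]).step_given)
    print(parser.parse_args(["--step_given", "2"]).step_given)
```

Another option might be for the launcher to leave out `--step_given` entirely when no step is given, so that the default applies, but I do not know which behaviour is intended.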
In summary, I am now facing two problems: one during training and one during inference.
I hope you can help me. Thanks.