Cannot upload files with Chinese characters #1676

LiuPalin · 2024-03-03T08:04:44Z

When I try to upload files with Chinese characters, privateGPT fails and responds with the traceback below. The error is 'UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 12: illegal multibyte sequence'.

I think the file should be loaded with the 'utf-8' character set, but where should I configure the character set?

Traceback (most recent call last):
  File "D:\Anaconda3\envs\GPT_Python3\Lib\site-packages\gradio\queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Anaconda3\envs\GPT_Python3\Lib\site-packages\gradio\route_utils.py", line 231, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Anaconda3\envs\GPT_Python3\Lib\site-packages\gradio\blocks.py", line 1594, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Anaconda3\envs\GPT_Python3\Lib\site-packages\gradio\blocks.py", line 1176, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Anaconda3\envs\GPT_Python3\Lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Anaconda3\envs\GPT_Python3\Lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "D:\Anaconda3\envs\GPT_Python3\Lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Anaconda3\envs\GPT_Python3\Lib\site-packages\gradio\utils.py", line 689, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "D:\AI-local\privateGPT\privateGPT\private_gpt\ui\ui.py", line 243, in _upload_file
    self._ingest_service.bulk_ingest([(str(path.name), path) for path in paths])
  File "D:\AI-local\privateGPT\privateGPT\private_gpt\server\ingest\ingest_service.py", line 92, in bulk_ingest
    documents = self.ingest_component.bulk_ingest(files)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI-local\privateGPT\privateGPT\private_gpt\components\ingest\ingest_component.py", line 127, in bulk_ingest
    documents = IngestionHelper.transform_file_into_documents(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI-local\privateGPT\privateGPT\private_gpt\components\ingest\ingest_helper.py", line 48, in transform_file_into_documents
    documents = IngestionHelper._load_file_to_documents(file_name, file_data)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI-local\privateGPT\privateGPT\private_gpt\components\ingest\ingest_helper.py", line 48, in _load_file_to_documents
    return string_reader.load_data([file_data.read_text()])
                                    ^^^^^^^^^^^^^^^^^^^^^
  File "D:\Anaconda3\envs\GPT_Python3\Lib\pathlib.py", line 1059, in read_text
    return f.read()
           ^^^^^^^^
UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 12: illegal multibyte sequence
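
For context, the last privateGPT frame in the traceback calls file_data.read_text() without an encoding argument, so pathlib falls back to the locale's default encoding, which is 'gbk' on this machine. The traceback does not show any configuration setting being consulted at that point. The sketch below is only an illustration of the behaviour, not privateGPT code: it simulates the failure by decoding UTF-8 bytes as 'gbk', then shows that passing encoding="utf-8" explicitly succeeds. The helper read_text_with_fallback and the file name sample.txt are hypothetical names made up for this example.

```python
import tempfile
from pathlib import Path


def read_text_with_fallback(path: Path, encodings=("utf-8", "gbk")) -> str:
    """Hypothetical helper: try each encoding in order, return the first clean decode."""
    last_error = None
    for encoding in encodings:
        try:
            return path.read_text(encoding=encoding)
        except UnicodeDecodeError as exc:
            last_error = exc
    if last_error is None:
        raise ValueError("no encodings were given")
    raise last_error


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    # UTF-8 bytes E4 B8 80 0A: the 0x80 byte cannot start a character in 'gbk'.
    sample.write_bytes("一\n".encode("utf-8"))

    # Simulate the locale default seen in the traceback by asking for 'gbk' explicitly.
    try:
        sample.read_text(encoding="gbk")
        gbk_failed = False
    except UnicodeDecodeError:
        gbk_failed = True

    # Reading the same file with an explicit UTF-8 encoding succeeds.
    utf8_text = sample.read_text(encoding="utf-8")
    fallback_text = read_text_with_fallback(sample)
```

Assuming a local patch is acceptable, the equivalent change would be to pass encoding="utf-8" to the read_text() call shown at ingest_helper.py line 48. Without editing any code, starting Python in UTF-8 mode (setting the environment variable PYTHONUTF8=1, or running with -X utf8) makes UTF-8 the default for text files and may be worth trying first.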

Replies: 0 comments

Select a reply

Can not upload files with chinese characters #1676

LiuPalin Mar 3, 2024

Replies: 0 comments

LiuPalin
Mar 3, 2024