When I try to upload files containing Chinese characters, privateGPT fails with the traceback below. The error is: UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 12: illegal multibyte sequence
I think the files should be read as UTF-8 when they are loaded, but where do I configure the character set?
Traceback (most recent call last):
File "D:\Anaconda3\envs\GPT_Python3\Lib\site-packages\gradio\queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Anaconda3\envs\GPT_Python3\Lib\site-packages\gradio\route_utils.py", line 231, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Anaconda3\envs\GPT_Python3\Lib\site-packages\gradio\blocks.py", line 1594, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Anaconda3\envs\GPT_Python3\Lib\site-packages\gradio\blocks.py", line 1176, in call_function
prediction = await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Anaconda3\envs\GPT_Python3\Lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Anaconda3\envs\GPT_Python3\Lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "D:\Anaconda3\envs\GPT_Python3\Lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Anaconda3\envs\GPT_Python3\Lib\site-packages\gradio\utils.py", line 689, in wrapper
response = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "D:\AI-local\privateGPT\privateGPT\private_gpt\ui\ui.py", line 243, in _upload_file
self._ingest_service.bulk_ingest([(str(path.name), path) for path in paths])
File "D:\AI-local\privateGPT\privateGPT\private_gpt\server\ingest\ingest_service.py", line 92, in bulk_ingest
documents = self.ingest_component.bulk_ingest(files)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AI-local\privateGPT\privateGPT\private_gpt\components\ingest\ingest_component.py", line 127, in bulk_ingest
documents = IngestionHelper.transform_file_into_documents(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AI-local\privateGPT\privateGPT\private_gpt\components\ingest\ingest_helper.py", line 30, in transform_file_into_documents
documents = IngestionHelper._load_file_to_documents(file_name, file_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AI-local\privateGPT\privateGPT\private_gpt\components\ingest\ingest_helper.py", line 48, in _load_file_to_documents
return string_reader.load_data([file_data.read_text()])
^^^^^^^^^^^^^^^^^^^^^
File "D:\Anaconda3\envs\GPT_Python3\Lib\pathlib.py", line 1059, in read_text
return f.read()
^^^^^^^^
UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 12: illegal multibyte sequence
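From the last frames of the traceback, ingest_helper.py calls file_data.read_text() with no encoding argument, so pathlib falls back to the locale's default encoding, which appears to be gbk on my Windows machine. Below is a minimal sketch of what I believe is happening, assuming the uploaded file is actually saved as UTF-8. The helper names (read_text_utf8, fails_under_gbk, demo) and the sample text are made up for illustration and are not part of privateGPT.

```python
import tempfile
from pathlib import Path

# Hypothetical sample content; any UTF-8 Chinese text shows the same effect.
SAMPLE_TEXT = "一二 privateGPT"


def read_text_utf8(path: Path) -> str:
    """Read a text file as UTF-8 instead of relying on the locale default."""
    return path.read_text(encoding="utf-8")


def fails_under_gbk(path: Path) -> bool:
    """Return True if decoding the file as gbk raises UnicodeDecodeError."""
    try:
        path.read_text(encoding="gbk")
    except UnicodeDecodeError:
        return True
    return False


def demo() -> tuple[str, bool]:
    """Write a UTF-8 file, then read it back both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.txt"
        sample.write_text(SAMPLE_TEXT, encoding="utf-8")
        return read_text_utf8(sample), fails_under_gbk(sample)


if __name__ == "__main__":
    text, gbk_failed = demo()
    print("utf-8 round trip ok:", text == SAMPLE_TEXT)
    print("gbk decode failed:", gbk_failed)
```

If this is the cause, one possible workaround is to pass encoding="utf-8" to the read_text() call shown at ingest_helper.py line 48, though that is my own guess at a fix and not something the maintainers have confirmed. Another option that needs no code change is to start Python in UTF-8 mode by setting the environment variable PYTHONUTF8=1 before launching privateGPT, which makes UTF-8 the default for calls like read_text().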