.add_faiss_index and .add_elasticsearch_index returns ImportError at Google Colab #7456

Open
MapleBloom opened this issue Mar 16, 2025 · 6 comments

Comments

@MapleBloom

MapleBloom commented Mar 16, 2025

Describe the bug

In Google Colab,
!pip install faiss-cpu
succeeds and
import faiss
raises no error, but
embeddings_dataset.add_faiss_index(column='embeddings')
raises:

/usr/local/lib/python3.11/dist-packages/datasets/search.py in __init__(self, device, string_factory, metric_type, custom_index)
    247         self.faiss_index = custom_index
    248         if not _has_faiss:
--> 249             raise ImportError(
    250                 "You must install Faiss to use FaissIndex. To do so you can run conda install -c pytorch faiss-cpu or conda install -c pytorch faiss-gpu. "
    251                 "A community supported package is also available on pypi: pip install faiss-cpu or pip install faiss-gpu. "

This happens because
_has_faiss = importlib.util.find_spec("faiss") is not None
at the top of datasets/search.py evaluates to False, whereas running
importlib.util.find_spec("faiss")
directly in the Colab notebook returns
ModuleSpec(name='faiss', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7b7851449f50>, origin='/usr/local/lib/python3.11/dist-packages/faiss/__init__.py', submodule_search_locations=['/usr/local/lib/python3.11/dist-packages/faiss'])

Yet

import datasets
datasets.search._has_faiss

run in the same Colab notebook also returns False.

The same happens with _has_elasticsearch.
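
The behaviour described above is what one would expect from any flag that is computed once at import time. The following is a minimal, self-contained sketch of that mechanism, not the actual datasets code: flagmod_demo and fake_dep_demo are hypothetical stand-ins for datasets.search and faiss, created in a temporary directory so the example runs without either library installed.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def demo_stale_flag():
    """Show that a module-level find_spec() flag is not re-evaluated.

    flagmod_demo and fake_dep_demo are hypothetical names that stand in
    for datasets.search and faiss respectively.
    """
    tmp = Path(tempfile.mkdtemp())
    sys.path.insert(0, str(tmp))
    try:
        # A module that checks for an optional dependency at import time.
        (tmp / "flagmod_demo.py").write_text(
            "import importlib.util\n"
            "_has_dep = importlib.util.find_spec('fake_dep_demo') is not None\n"
        )
        importlib.invalidate_caches()
        flagmod = importlib.import_module("flagmod_demo")
        before = flagmod._has_dep  # dependency does not exist yet

        # "Install" the dependency after the flag has been computed.
        (tmp / "fake_dep_demo.py").write_text("VALUE = 1\n")
        importlib.invalidate_caches()
        live = importlib.util.find_spec("fake_dep_demo") is not None
        stale = flagmod._has_dep  # the already-imported module keeps its old value

        # Re-executing the module recomputes the flag.
        flagmod = importlib.reload(flagmod)
        after_reload = flagmod._has_dep
        return before, live, stale, after_reload
    finally:
        # Clean up so the demo does not leak into the interpreter state.
        sys.path.remove(str(tmp))
        sys.modules.pop("flagmod_demo", None)
        sys.modules.pop("fake_dep_demo", None)


print(demo_stale_flag())  # (False, True, False, True)
```

If the same thing is happening here, a fresh find_spec("faiss") call in the notebook would succeed while datasets.search._has_faiss stays False until the module is executed again, for example after a runtime restart.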

Steps to reproduce the bug

  1. Follow https://huggingface.co/learn/nlp-course/chapter5/6?fw=pt in Google Colab
  2. Continue up to the call embeddings_dataset.add_faiss_index(column='embeddings')
  3. Then call embeddings_dataset.add_elasticsearch_index(column='embeddings')
  4. Notebook: https://colab.research.google.com/drive/1h2cjuiClblqzbNQgrcoLYOC8zBqTLLcv#scrollTo=3ddzRp72auOF

Expected behavior

I've only just started the tutorial, so I'm not certain, but I would expect embeddings_dataset.add_faiss_index(column='embeddings')
to work without an ImportError.

Environment info

Google Colab notebook with default config

@Akshay-Sisodia

I can fix this.
It's mainly because faiss-gpu requires python<=3.10, but the default Python version in Colab is 3.11. We just have to downgrade CPython to 3.10 and it should work fine.

@MapleBloom
Author

I don't think I ever got as far as actually using faiss-cpu.
Could it be an import problem?
_has_faiss gets its value once, at the top of datasets/search.py.
I tried to call the object before importing faiss, so _has_faiss was set to False and was never updated afterwards.

@Akshay-Sisodia

Akshay-Sisodia commented Mar 17, 2025 via email

@MapleBloom
Author

> you can't meet the requirements

That isn't the case (or I never reached that point), because the same check run in the notebook,
importlib.util.find_spec("faiss")
does find faiss, as I mentioned above.
I think the problem is the moment at which _has_faiss gets its value: it is set once and never re-evaluated.
(Or, at that moment, it couldn't find the path that was easily found when run from my own code.)
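
One way to tell these two hypotheses apart is to compare the cached flag with a fresh lookup. The helper below is a hypothetical sketch (diagnose_flag is not part of datasets); it only assumes that the module exposes a boolean attribute such as _has_faiss, as shown in the traceback above.

```python
import importlib.util
import types


def diagnose_flag(module, flag_name, dependency):
    """Compare a module's cached availability flag with a live lookup.

    Hypothetical helper: `module` is any imported module holding a boolean
    attribute `flag_name`; `dependency` is a top-level package name.
    """
    cached = bool(getattr(module, flag_name))
    live = importlib.util.find_spec(dependency) is not None
    if cached == live:
        return "consistent"
    if live and not cached:
        # The package is importable now, but the flag was computed earlier.
        return "stale"
    return "missing"  # flag says available, but the package cannot be found now


# Stand-in for datasets.search so the sketch runs without datasets installed.
fake_search = types.ModuleType("fake_search")
fake_search._has_faiss = False
print(diagnose_flag(fake_search, "_has_faiss", "json"))  # stale
```

In the Colab notebook the equivalent call would be diagnose_flag(datasets.search, "_has_faiss", "faiss"). A result of "stale" would support the import-timing explanation, in which case restarting the runtime so that datasets is imported only after faiss-cpu is installed may be worth trying.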

@Akshay-Sisodia

Akshay-Sisodia commented Mar 17, 2025 via email

@MapleBloom
Author

> When you run the first cell containing pip install faiss-cpu, does it install it?

Yes, it was installed successfully.
But the methods of the datasets library that depend on the _has_faiss constant still didn't work.


2 participants