End2end tests and some examples have strong environment and data assumptions #958

Open
thvasilo opened this issue Aug 9, 2024 · 0 comments
thvasilo commented Aug 9, 2024

For example, if we try to run the example at https://github.com/awslabs/graphstorm/tree/main/training_scripts/gsgnn_mt on the GraphStorm image, we run into the following error:

python3 tests/end2end-tests/data_gen/process_movielens.py
Traceback (most recent call last):
  File "/root/graphstorm/tests/end2end-tests/data_gen/process_movielens.py", line 29, in <module>
    user = pandas.read_csv('/data/ml-100k/u.user', delimiter='|', header=None,
  File "/opt/gs-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/opt/gs-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 620, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/opt/gs-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1620, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/opt/gs-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1880, in _make_engine
    self.handles = get_handle(
  File "/opt/gs-venv/lib/python3.9/site-packages/pandas/io/common.py", line 873, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: '/data/ml-100k/u.user'

Similarly, to be able to run end2end tests, we'd start by trying to run https://github.com/awslabs/graphstorm/blob/main/tests/end2end-tests/create_data.sh

However, that script starts with the following commands:

mkdir -p /data
cd /data
cp -R /storage/ml-100k /data

These commands 1) assume root permissions by calling mkdir -p /data, which works on the GraphStorm image at least but should be avoided, and 2) assume that a directory /storage/ml-100k exists.
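As a rough illustration of how the mkdir/cp step could drop both assumptions, here is a minimal Python sketch. The function name stage_dataset and the GS_TEST_DATA_ROOT environment variable are hypothetical, not part of GraphStorm. The sketch only shows the idea of taking the source and target locations as inputs and defaulting to a user-writable directory.

```python
import os
import shutil
import tempfile
from pathlib import Path


def stage_dataset(source_dir, data_root=None):
    """Copy a dataset directory into a writable data root and return the copy's path.

    Hypothetical helper: neither this function nor GS_TEST_DATA_ROOT exists
    in GraphStorm today.
    """
    source = Path(source_dir)
    if not source.is_dir():
        # Fail early instead of assuming that e.g. /storage/ml-100k exists.
        raise FileNotFoundError(f"Dataset source directory not found: {source}")

    # Prefer an explicit argument, then the (assumed) environment variable,
    # then a per-machine temp directory, so no root-owned path is required.
    root = Path(
        data_root
        or os.environ.get("GS_TEST_DATA_ROOT")
        or Path(tempfile.gettempdir()) / "graphstorm-test-data"
    )
    root.mkdir(parents=True, exist_ok=True)

    target = root / source.name
    # dirs_exist_ok (Python 3.8+) makes repeated runs safe.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

With something like this, a call such as stage_dataset("/storage/ml-100k", "/data") would reproduce the current behaviour on the GraphStorm image, while a local run could point both arguments at paths inside the clone.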

Together, these assumptions currently make it impossible for someone to run the end2end tests after cloning the repo in their local environment. We should make our scripts agnostic of such paths and files, allow the end2end tests to run on fresh clones of the repo, and fix any examples that use scripts with such assumptions.
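For the Python side, one possible direction is to let the data location be passed in rather than hard-coded. The sketch below is an assumption about how that could look, not the current interface of process_movielens.py. The --input-dir flag and the ML100K_DIR environment variable are hypothetical names, and the default keeps today's /data/ml-100k path so existing setups would keep working.

```python
import argparse
import os
from pathlib import Path


def parse_args(argv=None):
    """Parse command-line options (the flag name is a hypothetical choice)."""
    parser = argparse.ArgumentParser(description="Process MovieLens 100k data")
    parser.add_argument(
        "--input-dir",
        # ML100K_DIR is an assumed variable name; the fallback is the path
        # that the script currently hard-codes.
        default=os.environ.get("ML100K_DIR", "/data/ml-100k"),
        help="Directory containing the raw ml-100k files such as u.user",
    )
    return parser.parse_args(argv)


def resolve_input(input_dir, filename):
    """Return the path to `filename`, failing early with an actionable message."""
    path = Path(input_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; pass --input-dir or set ML100K_DIR "
            "to the directory that contains the ml-100k files"
        )
    return path
```

The script could then call resolve_input(args.input_dir, "u.user") and hand the result to pandas.read_csv, so a missing dataset produces a message that says how to fix it instead of a bare traceback from inside pandas.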

@thvasilo thvasilo self-assigned this Sep 5, 2024