GSF tries to download the model into /opt/ml/gsgnn_model, as seen here: https://github.com/thvasilo/graphstorm/blob/8e7c4c2e10accb114f2beccaa36ec3094d01241c/python/graphstorm/sagemaker/sagemaker_infer.py#L173
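For context, what the linked code does boils down to copying an S3 prefix into a local directory on the root partition. A minimal sketch of that pattern (not the actual GraphStorm implementation; the bucket, prefix, and helper name below are placeholders):

```python
import os
import boto3

# Illustrative sketch only (not the GraphStorm code linked above): copy every
# object under an S3 prefix into a local directory such as /opt/ml/gsgnn_model.
def download_s3_prefix(bucket, prefix, local_dir="/opt/ml/gsgnn_model"):
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):  # skip "directory" placeholder keys
                continue
            dest = os.path.join(local_dir, os.path.relpath(key, prefix))
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            s3.download_file(bucket, key, dest)
```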
On a job with a large model (learnable embeddings included) we run into disk space issues, as seen in the logs:
The partition mounted under /, which I think includes /opt, will only have 90GB available. To be able to download larger datasets/models we need to use the partition mounted under /tmp.
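To see how much room each mount actually has inside the container, a quick check like the following can be run from the entry point (a sketch; the exact partition layout may differ per instance type and volume configuration):

```python
import shutil

# Sketch: report free space on the relevant mounts inside the SageMaker container.
for path in ("/", "/opt/ml", "/tmp"):
    usage = shutil.disk_usage(path)
    print(f"{path}: {usage.free / 1e9:.1f} GB free of {usage.total / 1e9:.1f} GB total")
```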
Also, in our inference launch script we define
https://github.com/thvasilo/graphstorm/blob/8e7c4c2e10accb114f2beccaa36ec3094d01241c/sagemaker/launch/launch_infer.py#L120
That will download the model data from the provided S3 path into /opt/ml/input/data/<channel_name>, which by default for models will be /opt/ml/input/data/model (see the Estimator docs).
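For reference, this is the standard SageMaker SDK behavior when a model URI/channel is passed to an Estimator. A minimal sketch (the entry point, role ARN, and S3 paths are placeholders, not our actual launch arguments):

```python
# Sketch of the Estimator model channel behavior, not the actual launch_infer.py code.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="infer_entry.py",                          # placeholder entry point
    role="arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder role
    instance_count=1,
    instance_type="ml.m5.4xlarge",
    framework_version="1.13",
    py_version="py39",
    model_uri="s3://my-bucket/gsgnn-model/",               # placeholder model S3 path
    model_channel_name="model",                            # the default channel name
)
# When the job starts, SageMaker downloads the model_uri contents into
# /opt/ml/input/data/<model_channel_name>, i.e. /opt/ml/input/data/model.
```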
But then here, we try to download the model again, this time into /opt/ml/gsgnn_model.
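Since SageMaker has already materialized the model under the channel path, one possible way to avoid the second copy would be to read it from there directly. A sketch using the standard SM_CHANNEL_<NAME> environment variable (not what the code currently does):

```python
import os

# Sketch: the "model" channel SageMaker already downloaded is exposed via the
# SM_CHANNEL_MODEL environment variable, which points at /opt/ml/input/data/model.
model_path = os.environ.get("SM_CHANNEL_MODEL", "/opt/ml/input/data/model")
print(f"Restoring model from {model_path}")  # instead of re-downloading to /opt/ml/gsgnn_model
```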