Non-portable scratch directory #5

Open
weaverba137 opened this issue Feb 4, 2025 · 2 comments

@weaverba137
Member

dlairflow.util.user_scratch is specific not just to Data Lab, but to one particular Data Lab server. It can be made more portable.

@weaverba137 weaverba137 self-assigned this Feb 4, 2025
@weaverba137
Member Author

@rnikutta, I suggest that rather than making any assumptions about the directory path at all, we define an environment variable, SCRATCH or DLAIRFLOW_SCRATCH, that must be set in a DAG (could be inside or outside of a task). I think it is better to raise an error if the environment variable is not set, although the fallback could be to use /tmp instead.

The problem with /tmp though is that some systems wipe /tmp on reboot, and I think we want something more persistent than that.

Also note, I don't want to use AIRFLOW_SCRATCH since Airflow already defines many environment variables, and I want to preclude any conflict.
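
To make the proposal concrete, here is a minimal sketch of the raise-an-error variant. The function name `scratch_from_env` is hypothetical and is not the existing `dlairflow.util.user_scratch`. The choice of `RuntimeError` and the decision to create the directory if it is missing are also assumptions, not anything settled in this thread.

```python
import os
from pathlib import Path


def scratch_from_env(env_var="DLAIRFLOW_SCRATCH"):
    """Return the scratch directory named by an environment variable.

    Hypothetical helper: raises RuntimeError if the variable is unset
    or empty, rather than silently falling back to /tmp.
    """
    scratch = os.environ.get(env_var, "")
    if not scratch:
        raise RuntimeError(
            f"Environment variable {env_var} must be set to a scratch directory."
        )
    path = Path(scratch)
    # Assumption: create the directory on first use if it does not exist.
    path.mkdir(parents=True, exist_ok=True)
    return str(path)
```

A DAG would then set `DLAIRFLOW_SCRATCH` before calling the helper, and a missing value would fail loudly instead of writing to a directory that may be wiped on reboot.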

@weaverba137
Member Author

Another thing I think is needed here is a way for multiple users to have separate, non-conflicting scratch directories. That can't be based on os.environ['USER'], though, because its value is always airflow. Perhaps there is a way to programmatically obtain the user as defined in the Airflow web interface.
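
As a rough sketch of the directory side of this, assuming a user name can be obtained somehow (how to get it from Airflow is still the open question above), a per-user subdirectory could be built as follows. The name `per_user_scratch` and the sanitizing rule are hypothetical.

```python
import re
from pathlib import Path


def per_user_scratch(base, username):
    """Return a per-user subdirectory of base, creating it if needed.

    Hypothetical helper: username must be supplied by the caller; it is
    not read from os.environ['USER'].
    """
    # Replace anything that is not safe in a single path component,
    # then drop leading/trailing dots so '..' cannot escape base.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", username).strip(".")
    if not safe:
        raise ValueError("username does not yield a usable directory name")
    path = Path(base) / safe
    path.mkdir(parents=True, exist_ok=True)
    return str(path)
```

Note that two different user names could map to the same directory after sanitizing (for example `a b` and `a_b`), so a real implementation might need a stricter rule.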

1 participant