Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC-0025 Improving incremental builds #39

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

peterbell10
Copy link

@peterbell10 peterbell10 commented Feb 1, 2022

This RFC proposes changes to ATen that will allow more granular header dependencies and tools to enforce their usage across the codebase which should greatly improve incremental and cached build performance.

Note that a lot of this has already been implemented and merged but this RFC should provide a complete picture of the motivation and how these PRs fit together.

Rendered version: https://github.com/pytorch/rfcs/blob/rfc-improve-build-times/RFC-0025-improving-incremental-builds.md

write code in compliance with the two enforcement macros and linked in
the corresponding error messages.

## **Unresolved questions**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that most of the above is done today. Do we want to extend this rfc to try and answer such questions?

## **Unresolved questions**
- Is it possible/desirable to enforce `TORCH_ASSERT_NO_OPERATORS` automatically?
e.g. for all `.cu` files, or all files over a certain compile time.
- Can `include-what-you-use` completely automate operator includes?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could imagine a daily/weekly automatic CI that automatically runs/fix this similar to how we do for not important python lint with pytorch/pytorch@1edf6f5

Do you think that could work? Any danger of BC-breaking change for c++ users?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iwyu does have a "safe mode" that never remove includes from header files so I think it's possible without BC-breaking changes. The main worries are that it sometimes requires manual input like telling it which headers are public or private (e.g. List-inl.h is private and List.h is public) and also additional in-source pragmas (e.g. to allow List.h to include List-inl.h).

I think the ATen/ops headers are fairly safe though so it may be possible to filter the iwyu output to only change operator headers.

- Time to rebuild after adding editing a method operator
- `sccache` miss rate in open source CI

## **Drawbacks**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of our users include internal file and rely on the fact that most of ATen is available at that point.
This series of change (by cleaning core) will break them. Is there anything we can do to reduce such breakage while still fixing our build?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we could do something like

#ifndef CAFFE2_BUILD_MAIN_LIB
#include <ATen/ATen.h>
#endif

However, I think that user code should really just be including ATen.h itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the fix is really on the other side there.

- Can `include-what-you-use` completely automate operator includes?
Tools exist for strictly managing _all_ includes, but that would be
a significant change from existing include style.
- Is it worth adopting `TensorBase` more widely than just kernels?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could answer this by seeing how much benefit we will get from doing that? Is there an easy way to see how this will change the metrics discussed above?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know of an easy way but some numbers to put it into perspective: 1184 files include Tensor.h today which is ~70% of the compile time for ATen and torch. ~40% of operators have some method variant so keeping it this way means very roughly speaking we can expect ~60% of operator changes to have fast incremental builds.

However, I suspect that methods change less often and that more development effort will be going into new operators which are mostly functional-only. So in practice it may be much more the 60% that are fast.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great!

@subramen subramen added the commenting RFC is open for discussions label Feb 9, 2022
@albanD
Copy link
Contributor

albanD commented Feb 9, 2022

The CUDA builds are not mentioned in this doc. Does it need any special casing?

@peterbell10
Copy link
Author

CUDA builds don't require any additional work for the main changes here. .cu files can use the per-operator headers as they would any other header and TensorBase was used first and foremost for cuda files.

One consideration is that include-what-you-use doesn't support cuda files, so using per-operator headers takes a little more manual work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla signed commenting RFC is open for discussions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants