
[sharktank] Add ops.clone and implement for DefaultPrimitiveTensor #1222


Open

wants to merge 2 commits into base: main
Conversation

Contributor

@sogartar sogartar commented Apr 4, 2025

Change sharded tensors to clone their data but not their name, to avoid name aliasing. Cloning should be thought of as equivalent to `b = a + 0`.
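
A minimal usage sketch of the described semantics (illustration only, not code from this PR; the import paths, the data=/name= constructor arguments, and as_torch() are assumptions about the sharktank API):

    import torch

    from sharktank import ops
    from sharktank.types import DefaultPrimitiveTensor

    a = DefaultPrimitiveTensor(data=torch.arange(4, dtype=torch.float32), name="weight")
    b = ops.clone(a)

    assert torch.equal(a.as_torch(), b.as_torch())  # same values, copied data
    assert b.name != a.name                         # no name aliasing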

@sogartar sogartar commented Apr 7, 2025

This PR depends on #1227.

@sogartar sogartar marked this pull request as ready for review April 7, 2025 16:46
@sogartar sogartar requested a review from Alex-Vasile April 7, 2025 16:46
@@ -1075,7 +1086,7 @@ def __init__(
        )

    def clone(self, **kwargs) -> "SplitPrimitiveTensor":
-       kwargs["name"] = kwargs.get("name", self.name)
+       # We don't copy the name to not introduce name aliasing.
Contributor:
Is this a problem? I purposefully copied the name in order to keep richer metadata.

Contributor Author:
I assume that names should be unique. If the data is not the same, it does not make sense for the name to be the same; we are in the same compute graph.

Contributor Author:
Ops that insert computation in the graph should not create tensors with the same name.

    def clone(self) -> "InferenceTensor":
        from .. import ops

        # We don't clone due the name to not introduce name aliasing.
Contributor:
The "due" looks out of place

Contributor Author:
Removed it.

Comment on lines +43 to +47
# Clone introduces a tensor copy operation in the graph. It does not make sense to
# also clone the ExternalTensorTrait. The expectation is that when a user clones a
# tensor and modifies the result in-place, this should not affect the original
# tensor. If we were to also clone the ExternalTensorTrait, we would have
# external_name aliasing, but the expectation is that there is no data aliasing.
Contributor:
For pipeline parallelism I need the external tensor trait behaviour to copy. We don't want to bake the pipeline parallelism decisions into the weights, so they are applied to the tensors when they are loaded from file inside an export script, e.g. export_paged_llama_v1.py (FYI, the changes aren't up yet).

.clone was added for the following behaviour (see the sketch after the list):

  1. Load tensor t.
  2. Change .devices or move .shards to different devices with ops.transfer_to_logical_device.
  3. t_new = t.clone(ts=moved_shards, devices=new_devices).
  4. Store back into theta.
  5. Continue exporting the model.
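
A rough sketch of that flow (hypothetical helper, not the actual export script; the kwarg names ts/devices and the transfer_to_logical_device call are taken from the steps above rather than verified against the API):

    from sharktank import ops

    def move_to_new_devices(t, new_devices):
        # Step 2: move each shard to its new logical device.
        moved_shards = [
            ops.transfer_to_logical_device(shard, device)
            for shard, device in zip(t.shards, new_devices)
        ]
        # Step 3: clone with overridden placement metadata but the same data.
        return t.clone(ts=moved_shards, devices=new_devices)

    # Steps 4 and 5: store the result back into theta and continue exporting, e.g.
    #   theta_dict[t.name] = move_to_new_devices(t, new_devices)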

Contributor Author:
If the pipeline stages are not going to be part of the same compute graph, then you could do a deepcopy.

Contributor Author:
I think that there is a discrepancy in expectations.

  1. Usually a clone method does a deep copy.
  2. In torch this term carries more meaning, as it creates a tensor that is part of the compute graph (a small torch example follows this list). I think we should avoid deviating from that meaning, as it will introduce confusion down the line. If operations under sharktank.ops or methods of tensors have the same name as in torch, then they should do the same.
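
For reference, the plain torch behaviour referred to in point 2 (standard torch.clone facts, not code from this PR):

    import torch

    a = torch.ones(3, requires_grad=True)
    b = a.clone()                         # differentiable copy, like a + 0
    assert b.grad_fn is not None          # the clone is part of the compute graph
    assert b.data_ptr() != a.data_ptr()   # but it does not alias the original data
    b.sum().backward()
    assert torch.equal(a.grad, torch.ones(3))  # gradients still flow back to a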

Collaborator:
We should avoid cloning the underlying torch tensor. It is mildly incorrect, but it could drastically increase memory use when exporting. We can consider renaming clone to something else, but it seemed good enough for the modelling library.

The primary motivation behind clone was to provide a mechanism to change non-numeric tensor information, e.g. device placement. It is mainly so that we can better handle different tensor types (Replicated, Sharded, Quantized) without fighting with the underlying tensor data.
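
To illustrate the memory concern in plain torch terms (not sharktank-specific code): a metadata-only wrapper keeps the same storage, while a torch-style clone duplicates it:

    import torch

    weight = torch.randn(4096, 4096)        # a large exported weight

    relabeled = weight.view(weight.shape)   # new tensor object, same underlying storage
    assert relabeled.data_ptr() == weight.data_ptr()

    copied = weight.clone()                 # real copy: memory for this tensor doubles
    assert copied.data_ptr() != weight.data_ptr()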

Contributor:
Perhaps the name is the problem then. I did not introduce this to be equivalent to torch.clone.

Contributor Author:
Yes, another name would then be desirable, as we may want something like torch.clone at some point.

        cloned_tensor = sharded_tensor.clone()
-       assert sharded_tensor.is_deep_equal(cloned_tensor)
+       assert cloned_tensor.name != sharded_tensor.name
+       assert sharded_tensor.is_deep_equal(cloned_tensor, compare_name=False)
        assert iterables_equal(sharded_tensor.devices, cloned_tensor.devices)

    def testCloneTensorTraits(self):
Contributor:
How is this test passing if ExternalTensorTrait is not being copied?

Contributor Author:
is_deep_equal does not compare this. It probably should.

@sogartar sogartar requested a review from Alex-Vasile April 8, 2025 15:07
@rsuderman (Collaborator):

@sogartar could I get a run-through of the motivation behind this PR? Most of the intent for clone was to be able to create a new meta tensor and override non-numeric information, so cloning a torch tensor does not really fulfill that purpose.


@sogartar sogartar commented Apr 8, 2025

@rsuderman, the main motivation behind this PR is that clone in torch already has established semantics, and if we name something the same way, it should have the same semantics.
If we want something else, which seems to be the case from the discussion here, we should name it something else.
