Replies: 7 comments 28 replies
-
I did not realize that the versioning system in [...] For context, the primary reason I made the switch was to eliminate the egregious performance penalties from calling the AWS API in large pipelines. In [...] I solved this problem by caching the results of a paginated LIST request, which reduced AWS HTTPS calls by 1000x and the time spent by more than 60x. But unfortunately, ListObjectsV2 is not version-aware. The only alternative that might work is ListObjectVersions, which would return a listing of every single version of every single object in the pipeline, an unmanageably large payload. So as a side effect, [...]
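To illustrate the kind of caching involved, here is a rough sketch (not the actual targets internals): one paginated ListObjectsV2 call per bucket/prefix, with per-target existence checks answered from the in-memory result. It assumes paws.storage as the S3 client; `list_bucket_cached()` and `target_exists()` are made-up helper names.

```r
# Rough sketch of the caching idea (not the actual targets internals).
library(paws.storage)

list_bucket_cached <- local({
  cache <- new.env(parent = emptyenv())
  function(bucket, prefix) {
    id <- paste(bucket, prefix, sep = "/")
    if (exists(id, envir = cache, inherits = FALSE)) {
      return(get(id, envir = cache, inherits = FALSE))
    }
    client <- s3()
    objects <- list()
    token <- NULL
    repeat {
      # One paginated ListObjectsV2 request per bucket/prefix.
      page <- client$list_objects_v2(
        Bucket = bucket,
        Prefix = prefix,
        ContinuationToken = token
      )
      objects <- c(objects, page$Contents)
      if (!isTRUE(page$IsTruncated)) break
      token <- page$NextContinuationToken
    }
    assign(id, objects, envir = cache)
    objects
  }
})

# Thousands of per-target existence checks then become in-memory lookups:
target_exists <- function(key, bucket, prefix) {
  keys <- vapply(list_bucket_cached(bucket, prefix), `[[`, "", "Key")
  key %in% keys
}
```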
-
Hmm, this is definitely going to require us to consider different strategies. I wonder if it is possible for us to implement a version where targets are named by hash in the S3 store, rather than by target name, and drop the versioning feature in the S3 bucket entirely. (This would have the added benefit of allowing the store to move between S3 providers, which is not possible for S3 object version history.)
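In other words, the object's content hash would become its name in the bucket. A toy illustration of the idea (not targets code; digest is just a convenient hashing package here):

```r
# Derive the S3 key from the content hash, so identical content always maps
# to the same key and bucket versioning becomes unnecessary.
library(digest)

cas_key <- function(object, prefix = "cas") {
  file.path(prefix, digest(object, algo = "xxhash64"))
}

cas_key(mtcars)           # same content -> same key, every time
cas_key(head(mtcars, 5))  # different content -> different key
```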
-
I'm not going to argue for you to change direction, but I must say this change surprised me because of how well suited the S3 versioning feature was to this workflow. I really thought it was one of the primary design goals for it. I don't know how widespread the user base of this feature is, but I would be curious what their use cases are and whether this will be an issue for others.
-
I was about to ask whether, in the long term, you would be open to custom repository back ends other than "aws" and "gcp" that might allow for custom behaviors. However, it occurs to me that this might be entirely possible to implement using custom formats, where the read and write functions access a local or cloud key-value store and just save the keys to disk. I'll toy with this a bit, but I'm sure it is also an "off-label" use case, so I'll check before trying anything serious with it.
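A rough sketch of that idea, assuming the `tar_format()` callbacks take roughly this shape (write receives the object and a path, read receives the path). `kv_put()` and `kv_get()` are hypothetical helpers for whatever key-value store is in play; they are not part of targets.

```r
# Hedged sketch only: a custom format whose write function pushes the value to
# a key-value store and writes just the key to disk, and whose read function
# fetches the value back by key.
library(targets)

format_kv <- tar_format(
  write = function(object, path) {
    key <- digest::digest(object, algo = "xxhash64")
    kv_put(key, object)     # hypothetical: store the serialized value under key
    writeLines(key, path)   # targets then tracks only the small key file
  },
  read = function(path) {
    kv_get(readLines(path)) # hypothetical: retrieve the value by key
  }
)

# Usage: tar_target(big_result, expensive_step(), format = format_kv)
```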
-
We're also using S3 versioning with a shared bucket for multiple users on a project, so this is an interesting discussion to me. Our server is running locally (MinIO) and we're also doing [...] As I understand it, the change was made to speed up working with AWS. But is the side effect that targets now determines whether a pipeline is up to date differently? Before it was comparing the local metadata against the AWS store, and now it ignores the local metadata?
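For reference, a setup like the one described here would look roughly like the following in _targets.R. The bucket, prefix, and endpoint are placeholders, and the argument names reflect my reading of the tar_resources_aws() documentation rather than a tested configuration.

```r
# Sketch of a shared-bucket setup against a local MinIO server.
library(targets)

tar_option_set(
  repository = "aws",
  resources = tar_resources(
    aws = tar_resources_aws(
      bucket   = "shared-project-bucket",  # placeholder bucket name
      prefix   = "targets-store",
      endpoint = "http://localhost:9000"   # local MinIO endpoint
    )
  )
)

list(
  tar_target(shared_model, fit_model())    # fit_model() is hypothetical
)
```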
-
Looking back on this now: the approach you describe is apparently called "content-addressable storage" (CAS). Like I mentioned, it would be better to be able to opt into CAS regardless of the other storage settings (e.g. AWS vs local). I am still not sure this is feasible for [...]
-
An update: as of #1322, content-addressable storage (CAS) is now part of [...] I think the only tricky problems left to the user are (1) garbage collection / retention policy, and (2) an efficient [...]

targets/R/tar_repository_cas.R, lines 1 to 238 at 4fe9da7
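For anyone landing here later, here is a minimal sketch of what a plug-in CAS back end can look like, using a plain directory as the store. The callback names (upload, download, exists) follow my reading of the #1322 interface, so double-check the tar_repository_cas() documentation before relying on this; if I recall correctly, targets also ships a built-in local variant (tar_repository_cas_local(), with tar_repository_cas_local_gc() for garbage collection), which is probably the easier starting point.

```r
# Minimal sketch: a directory-backed CAS plug-in. Callback names follow my
# reading of the tar_repository_cas() interface; verify against the manual.
library(targets)

repository_dir_cas <- tar_repository_cas(
  upload = function(key, path) {
    dir.create("cas_store", showWarnings = FALSE, recursive = TRUE)
    file.copy(path, file.path("cas_store", key), overwrite = TRUE)
  },
  download = function(key, path) {
    file.copy(file.path("cas_store", key), path, overwrite = TRUE)
  },
  exists = function(key) {
    file.exists(file.path("cas_store", key))
  }
)

# In _targets.R: tar_option_set(repository = repository_dir_cas)
```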
-
Description
My team recently realized that this change in targets 1.4 may have significant consequences for our cloud-based workflows:
We use S3 versioning with a shared bucket for multiple users on a project. It has been excellent for letting us share compute-intensive targets and avoid downloading large targets that can be skipped, even as each user works on different branches of a project where some targets may diverge. As long as a version of the target in the bucket exists that matches the local metafile, it could be skipped, and overwriting the latest version wouldn't interfere with others' state. This also means CI builds of targets can take advantage of already-built targets without interfering with development.

Under targets 1.3, this requires setting `repository_meta = "local"` (a sketch of that configuration follows below), which is fine. However, does this mean that in 1.4+ this approach to collaboration will no longer work? Is there a way to recover the old behavior? Being able to collaborate this way has been the primary benefit of cloud-based versioning for us.
@emmamendelsohn
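A sketch of the targets 1.3-era configuration referred to above, as I understand it; the bucket and prefix are placeholders.

```r
# Sketch of the 1.3-era setup: targets live in the shared versioned bucket,
# while repository_meta = "local" keeps the metadata in _targets/meta/ local
# rather than uploading it to the bucket.
library(targets)

tar_option_set(
  repository = "aws",
  repository_meta = "local",
  resources = tar_resources(
    aws = tar_resources_aws(
      bucket = "shared-project-bucket",  # placeholder
      prefix = "targets-store"
    )
  )
)
```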