msdemlei
Collaborator

This PR suggests limit groups with which operators can override the default limit specifications (e.g., async has a lot more time). This came out of astropy/pyvo#685. Actually, it would be simple to add an expression of a preferred mode for a given service on top of this. The discussion in the bug seems to indicate that there is little use for that, but I am not too convinced. With a bit of encouragement, I'd try this.

Limits on the time between job creation and
destruction time.
Limits for how long a service will retain async jobs
for the service's default access

I am not a big fan of the idea of a "default access mode", but in order to stay backward compatible, there is no other choice.

However, you speak about "retaining async jobs", while the example of a "typical default access mode" is "anonymous-sync". That does not quite work here, does it?

Then, where do you specify what is the default access mode?

Maybe the best thing to do is not to give an example of a default mode...

@msdemlei
Collaborator Author

msdemlei commented Sep 16, 2025 via email

@mbtaylor
Member

There's rather a lot going on in this PR that is unrelated to the limits business, but I think I found the relevant parts.

Repeating the limit elements in a PerModeLimits looks a bit clunky, though I may not have any better ideas. One other possibility would be adding an optional mode attribute to the four limit elements allowing them to indicate what mode they apply to, and allowing multiple instances of each of those elements, with no PerModeLimits element. Older clients might get confused by seeing multiple limit elements, but would probably just pick the value for one mode, and since it's not specified at present what mode that applies to they won't be grossly misinformed. However, I'm not necessarily saying that would be better than the proposed structure.

As far as wording goes, rather than talking about a default access mode and suggesting a "typical" assignment for what that is, I'd be inclined just to call the limit "default", e.g. "Default limits on the size of uploaded data" rather than "Limits on the size of uploaded data for the service's default access mode (typically anonymous-sync)". Services that don't want to distinguish can then just put the one limit in as now, but those which have per-mode limits can add mode-specific instances to cover the non-default case(s). From a client's point of view, it would just pick up the default limit, then override that for the current access mode if a suitable mode-specific one is present.
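A minimal sketch of that client-side lookup, assuming the optional `mode` attribute floated above (it is not part of TAPRegExt 1.0). The XML fragment, its values, and the helper names `effective_limit` and `_values` are all made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical capabilities fragment. The "mode" attribute is only the idea
# discussed in this thread; the numbers are invented.
CAPABILITIES = """
<capability>
  <outputLimit>
    <default unit="row">10000</default>
    <hard unit="row">100000</hard>
  </outputLimit>
  <outputLimit mode="async">
    <default unit="row">1000000</default>
    <hard unit="row">20000000</hard>
  </outputLimit>
  <executionDuration>
    <default>60</default>
    <hard>60</hard>
  </executionDuration>
  <executionDuration mode="async">
    <default>3600</default>
    <hard>86400</hard>
  </executionDuration>
</capability>
"""


def _values(el):
    """Extract the (default, hard) pair from one limit element."""
    return int(el.findtext("default")), int(el.findtext("hard"))


def effective_limit(root, name, mode):
    """Return (default, hard) for limit `name` in access mode `mode`.

    A mode-specific element wins; otherwise the element without a mode
    attribute (the "default" limit) is used. Returns None if neither exists.
    """
    fallback = None
    for el in root.findall(name):
        el_mode = el.get("mode")
        if el_mode == mode:
            return _values(el)
        if el_mode is None and fallback is None:
            fallback = _values(el)
    return fallback


root = ET.fromstring(CAPABILITIES)
print("sync outputLimit: ", effective_limit(root, "outputLimit", "sync"))
print("async outputLimit:", effective_limit(root, "outputLimit", "async"))
```

A service that does not distinguish modes would publish only the unattributed elements, and the same lookup would return them for every mode.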

Limits for "auth-sync" and "auth-async" are likely to be too blunt an instrument for some services, since the limits in force may depend on your authenticated identity. The existing TAPRegExt 1.0 text (modified slightly by some earlier post-1.0 commit, and then deleted by this PR) already tackles this:

If a service supports authentication and has different limits depending on what user is authenticated, it should return the limits applying to the logged user.

I would therefore suggest reinstating the above text and restricting the modes to "sync" and "async". I admit that doesn't cover registry content, but it does take care of giving clients reading the capabilities endpoint an accurate idea of limits.

@msdemlei
Collaborator Author

msdemlei commented Sep 24, 2025 via email

</xs:simpleContent>
</xs:complexType>

<xs:complexType name="PerModeLimits">

There's one per-mode use case that I'm thinking about that doesn't logically fit within PerModeLimits: output formats with mode-specific technical constraints. For example, we're considering VOParquet support exclusively for async queries, since partitioning large result sets across multiple files isn't feasible within sync queries.

What would be the optimal design if we want to consider this? Should we rename PerModeLimits to PerModeCapabilities to encompass both limits and format capabilities, or would per-mode formats warrant a separate grouping?

@msdemlei
Collaborator Author

msdemlei commented Sep 25, 2025 via email

@pdowler
Contributor

pdowler commented Sep 25, 2025

I feel like we don't really know which of the features we describe might be mode-specific, so I think I also favour a mode attribute.

I think I would lean toward the simplest: an optional attribute on some (all?) child elements. That way, no attribute present means the same thing it does now and elements would not need to be repeated for both modes. For some mode-specific option, e.g.:

    <outputFormat mode="async">
      <mime>application/vnd.apache.parquet</mime>
      <alias>parquet</alias>
    </outputFormat>

a client that did not grok the mode attribute would interpret that as applying to both modes. If that's a problem we need to solve, then this approach is probably not sufficient....
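To make the difference concrete, here is a rough sketch contrasting a client that ignores the attribute with one that honours it. The `mode` attribute is the proposal under discussion, not existing TAPRegExt, and the function names `formats_legacy` and `formats_for_mode` are invented for this example:

```python
import xml.etree.ElementTree as ET

# Hypothetical capabilities fragment using the proposed "mode" attribute.
CAPABILITIES = """
<capability>
  <outputFormat>
    <mime>application/x-votable+xml</mime>
    <alias>votable</alias>
  </outputFormat>
  <outputFormat mode="async">
    <mime>application/vnd.apache.parquet</mime>
    <alias>parquet</alias>
  </outputFormat>
</capability>
"""


def formats_legacy(root):
    """Older client: unaware of "mode", so every format looks usable everywhere."""
    return [el.findtext("alias") for el in root.findall("outputFormat")]


def formats_for_mode(root, mode):
    """Mode-aware client: keep unattributed formats plus those matching `mode`."""
    return [
        el.findtext("alias")
        for el in root.findall("outputFormat")
        if el.get("mode") in (None, mode)
    ]


root = ET.fromstring(CAPABILITIES)
print("legacy client:", formats_legacy(root))
print("sync formats: ", formats_for_mode(root, "sync"))
print("async formats:", formats_for_mode(root, "async"))
```

Under these assumptions the older client would offer parquet for sync queries as well, which is exactly the over-interpretation described above.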

I would like to see a realistic example of where a repeatable mode element would really be better than a mode attribute... it looks to me like the attribute is sufficient (assuming we are ok with allowing multiple instances of all child elements).

@pdowler
Contributor

pdowler commented Sep 25, 2025

Would we (on principle) add the mode attribute to <language>? I do not have a use case in mind... just thinking about future-proofing.

What about its children: <languageFeatures>? individual <feature>s?? I feel like this would be going too far...

@pdowler
Contributor

pdowler commented Sep 25, 2025

In general, I feel like auth vs anon should be orthogonal to all features, but we do have one scenario where that is not the case: we support a custom DEST parameter so the user can specify that the output be sent to some external location (either an http URL that accepts PUT or a vos URI where we send the output to the specified location in a VOSpace service).

I do not know whether this is a viable optional feature (param) that could be added to TAP, but if it were, it would be an async-specific feature (because sync inherently returns the output, while async stores it), and for practical reasons it requires auth to be able to perform the PUT (as the user).

To be clear: I do not think auth should be treated as a mode and responding with a "permission denied" (and auth challenges) if the job cannot write the output to DEST is probably fine, but a client can't really figure out how to make such a feature work without trial and error. Over in DataLink we added some predictive options so clients could tell what they might need to use an access_url, which seems related to this...

I am happy to consider this out-of-scope for now, just FYI.

This was referenced Sep 29, 2025
I am removing the in-document example; it seems a waste of paper, and
with auxiliaryurl we now have a better alternative.
Also, adding tests and updating the CI workflow.
@gmantele

gmantele commented Oct 2, 2025

> Would we (on principle) add the mode attribute to <language>? I do not have a use case in mind... just thinking about future-proofing.
>
> What about its children: <languageFeatures>? individual <feature>s?? I feel like this would be going too far...

I am not sure about setting this forMode attribute on language. Fundamentally, there should be no difference between sync and async (except maybe if we reserve a very resource-intensive ADQL feature for the async mode... but I clearly don't want to encourage this).

I could, however, see a possible use in distinguishing between the auth and anon modes; you could have some features reserved for an authenticated community (e.g. a special catalogue accessible in auth mode only, which would require some special UDFs). But I think that this is off topic.
