-
Notifications
You must be signed in to change notification settings - Fork 4
Adding a facility to express per-mode limits #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
TAPRegExt-v1.1.xsd
Outdated
Limits on the time between job creation and | ||
destruction time. | ||
Limits for how long a service will retain async jobs | ||
for the service's default access |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a big fan of the idea of "default access mode", but in order to stay backward compatible, there is not other choice.
However, you speak about "retaining async jobs", but the example of "typical default access mode" is "anonymous-sync". It does not quite work here, isn't it?
Then, where do you specify what is the default access mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the best thing to do is not to give example of default mode...
On Mon, Sep 15, 2025 at 01:34:31AM -0700, Grégory Mantelet wrote:
@gmantele commented on this pull request.
> @@ -82,8 +82,9 @@ xsi:schemaLocation="http://www.w3.org/2001/XMLSchema http://vo.ari.uni-heidelber
minOccurs="0" maxOccurs="1">
<xs:annotation>
<xs:documentation>
- Limits on the time between job creation and
- destruction time.
+ Limits for how long a service will retain async jobs
+ for the service's default access
I am not a big fan of the idea of "default access mode", but in
order to stay backward compatible, there is not other choice.
Well... we could define that these are always for anonymous access at
least, but that still won't let us off the hook on this:
However, you speak about "retaining async jobs", but the example of
"typical default access mode" is "anonymous-sync". It does not
quite work here, isn't it?
Good point.
Does anyone have a good idea for language to convey "globally define
`typical' limits"?
Then, where do you specify what is the default access mode?
We don't, at this point. That's one of the open questions I mention
in the PR description.
Perhaps we can say "anon sync if applicable, authenticated sync for
TAP services requiring authentication"?
> @@ -82,8 +82,9 @@ xsi:schemaLocation="http://www.w3.org/2001/XMLSchema http://vo.ari.uni-heidelber
minOccurs="0" maxOccurs="1">
<xs:annotation>
<xs:documentation>
- Limits on the time between job creation and
- destruction time.
+ Limits for how long a service will retain async jobs
+ for the service's default access
Maybe the best thing to do is not to give example of default mode...
Perhaps "will retain anonymous async jobs"? But what do
required-auth services do then?
All this makes me suspect that we are missing a much better idea.
Here's what my gospel has to say about this:
$ python -c "import this" | grep explain
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Of course, sometimes a bad idea is better than nothing at all...
|
There's rather a lot going on in this PR that is unrelated to the limits business, but I think I found the relevant parts. Repeating the limit elements in a PerModeLimits looks a bit clunky, though I may not have any better ideas. One other possibility would be adding an optional mode attribute to the four limit elements allowing them to indicate what mode they apply to, and allowing multiple instances of each of those elements, with no PerModeLimits element. Older clients might get confused by seeing multiple limit elements, but would probably just pick the value for one mode, and since it's not specified at present what mode that applies to they won't be grossly misinformed. However, I'm not necessarily saying that would be better than the proposed structure. As far as wording goes, rather than talking about a default access mode and suggesting a "typical" assignment for what that is, I'd be inclined just to call the limit "default", e.g. "Default limits on the size of uploaded data" rather than "Limits on the size of uploaded data for the service's default access mode (typically anonymous-sync)". Services that don't want to distinguish can then just put the one limit in as now, but those which have per-mode limits can add mode-specific instances to cover the non-default case(s). From a client's point of view, it would just pick up the default limit, then override that for the current access mode if a suitable mode-specific one is present. Limits for "auth-sync" and "auth-async" are likely to be too blunt an instrument for some services, since the limits in force may depend on your authenticated identity. The existing TAPRegExt 1.0 text (modified slightly by some earlier post-1.0 commit, and then deleted by this PR) already tackles this:
I would therefore suggest reinstating the above text and restricting the modes to "sync" and "async". I admit that doesn't cover registry content, but it does take care of giving clients reading the capabilities endpoint an accurate idea of limits. |
e8cbc38
to
d832e48
Compare
On Wed, Sep 17, 2025 at 02:27:07AM -0700, Mark Taylor wrote:
There's rather a lot going on in this PR that is unrelated to the
limits business, but I think I found the relevant parts.
Uh, sorry about that; there was some housekeeping to do as I did the
per-mode thing and I got lazy.
though I may not have any better ideas. One other possibility
would be adding an optional mode attribute to the four limit
elements allowing them to indicate what mode they apply to, and
allowing multiple instances of each of those elements, with no
PerModeLimits element. Older clients *might* get confused by
This sounds like a reasonable thing in particular *if* we only go for
sync and async as modes. It looks less reasonable if we keep four
modes and half expect that more may come.
As far as wording goes, rather than talking about a default access
mode and suggesting a "typical" assignment for what that is, I'd be
inclined just to call the limit "default", e.g. "Default limits on
Ok; that's what is in in commit 672701b.
Limits for "auth-sync" and "auth-async" are likely to be too blunt
an instrument for some services, since the limits in force may
depend on your authenticated identity. The existing TAPRegExt 1.0
text (modified slightly by some earlier post-1.0 commit, and then
deleted by this PR) already tackles this:
> If a service supports authentication and has different limits
> depending on what user is authenticated, it should return the
> limits applying to the logged user.
I would therefore suggest reinstating the above text and
restricting the modes to "sync" and "async". I admit that doesn't
cover registry content, but it does take care of giving clients
reading the capabilities endpoint an accurate idea of limits.
Hm. I see your point.
On the other hand, I kind of like the idea that people can figure out
what they'd win if the logged in, and for that you would need the
four modes.
I think the way to decide where to go is: Do people actually enforce
different limits according to who is logged in? If that is true,
we would by lying in auth-sync and auth-async at least for these
services. Then there is no (or negative) benefit in having them, and
we should drop the auth-X modes.
Soooo: Are you aware of services that do per-user limits or plan to
do that? Rubin folks: Do you think about that?
|
</xs:simpleContent> | ||
</xs:complexType> | ||
|
||
<xs:complexType name="PerModeLimits"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's one per-mode use case that I'm thinking about that doesn't logically fit within PerModeLimits
: output formats with mode-specific technical constraints. For example, we're considering VOParquet support exclusively for async queries, since partitioning large result sets across multiple files isn't feasible within sync queries.
What would be the optimal design if we wan to consider this? Should we rename PerModeLimits
to PerModeCapabilities
to encompass both limits and format capabilities, or would per-mode formats warrant a separate grouping?
On Wed, Sep 24, 2025 at 11:38:44AM -0700, stvoutsin wrote:
What would be the optimal design if we wan to consider this? Should
we rename `PerModeLimits` to `PerModeCapabilities` to encompass
both limits and format capabilities, or would per-mode formats
warrant a separate grouping?
Hm. I think that's another argument for Mark's design with a mode
attribute on selected (repeatable) elements.
<outputFormat mode="async"...>?
Hm. Or, to be future-proof, make this an element so you can give
multiple modes in case we ever grow more of them?
<outputFormat>
<mode>async</mode>
<mode>async2</mode>?
Hm. What would be the list of elements for which we need mode, then?
Is this supportable in clients?
Mood report: I'm increasingly leaning towards suppering PerModeLimits
and have the mode attribute or element.
Other mood reports, anyone?
|
I feel like we don't really know which features we describe might be mode-specific so I think I also favour a mode attribute. I think I would lean toward the simplest: an optional attribute on some (all?) child elements. That way, no attribute present means the same thing it does now and elements would not need to be repeated for both modes. For some mode-specific option, eg:
a client that did not grok the mode attribute would interpret that as applying to both modes. If that's a problem we need to solve, then this approach is probably not sufficient.... I would like to see a realistic example of where a repeatable mode element would really be better than a mode attribute... it looks to me like attribute is sufficient (assuming we are ok with allowing multiple of al child elements). |
Would we (on principle) add the mode attribute to What about it's children: |
In general, I feel like auth vs anon should be orthogonal to all features, but we do have one scenario where that is not the case: we support a custom DEST parameter so the user can specify that the output be sent to some external location (either an http URL that accepts PUT or a vos URI where we send the output to the specified location in a VOSpace service). I do not know if this is a viable optional feature (param) that could be added to TAP or not, but if it did it would be an async specific feature (because sync inherently returns the output, while async stores it) and for practical reasons it requires auth to be able to perform the PUT (as the user). To be clear: I do not think auth should be treated as a mode and responding with a "permission denied" (and auth challenges) if the job cannot write the output to DEST is probably fine, but a client can't really figure out how to make such a feature work without trial and error. Over in DataLink we added some predictive options so clients could tell what they might need to use an access_url, which seems related to this... I am happy to consider this out-of-scope for now, just FYI. |
I am removing the in-document example; it seems a waste of paper, and with auxiliaryurl we now have a better alternative.
Also, adding tests and updating the CI workflow.
d832e48
to
845d03e
Compare
I am not sure about setting this I would however see a possible interest between the auth and anon mode ; you could have some features only reserved for an authenticated community (e.g. a special catalogue accessible in auth mode only and which would require some special UDFs). But I think that this is out of topic. |
This PR suggests limit groups with which operators can override the default limit specifications (e.g., async has a lot more time). This came out of astropy/pyvo#685. Actually, it would be simple to add an expression of a preferred mode for a given service on top of this. The discussion in the bug seems to indicate that there is little use for that, but I am not too convinced. With a bit of encouragement, I'd try this.