Feature request: Add a duration/timedelta type #514
Comments
A prefix is a good idea, especially if #427 gets accepted. Microseconds could be represented with |
@NighttimeDriver50000, thanks for the input. The |
The date-time type is derived from RFC3339, which is a subset of ISO8601. It would be great to define any other time types using similar standards. ISO8601 has a duration representation and RFC5545 defines a subset. This would make your example:
The benefit is the use of a recognised standard, and that the P prefix makes parsing simpler and keeps more space for other types that may one day be added. The downsides are that sub-second units are only supported as decimals (while ISO8601 supports decimals for any time unit, RFC5545 doesn't allow them at all), and the decimals do not support underscores. |
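As a rough illustration of the ISO 8601 / RFC 5545 style discussed in this comment, the sketch below parses the day-and-time subset (forms such as P1D, PT2H30M and -PT1S) into a Python timedelta. The name parse_iso_duration and the decision to truncate fractions finer than a microsecond are assumptions made for this example; neither the thread nor the standards prescribe them.

```python
import re
from datetime import timedelta

# Day-and-time subset only: [-]P[nD][T[nH][nM][n[.f]S]]
# Each unit may appear at most once, so "PT2S3S" is rejected.
_ISO_DURATION = re.compile(
    r"(?P<sign>-)?P(?:(?P<days>\d+)D)?"
    r"(?P<time>T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def parse_iso_duration(text):
    """Parse a restricted ISO 8601 duration string into a timedelta."""
    m = _ISO_DURATION.fullmatch(text)
    if m is None:
        raise ValueError(f"not a supported duration: {text!r}")
    parts = [m["days"], m["hours"], m["minutes"], m["seconds"]]
    # Reject bare "P", bare "PT", and a dangling "T" such as "P1DT".
    if all(p is None for p in parts) or m["time"] == "T":
        raise ValueError(f"duration has no components: {text!r}")
    # Convert the fractional seconds with integer arithmetic.
    # Assumption: digits beyond microseconds are truncated.
    whole, _, frac = (m["seconds"] or "0").partition(".")
    micros = int((frac + "000000")[:6])
    delta = timedelta(
        days=int(m["days"] or 0),
        hours=int(m["hours"] or 0),
        minutes=int(m["minutes"] or 0),
        seconds=int(whole),
        microseconds=micros,
    )
    return -delta if m["sign"] else delta
```

Because timedelta only resolves microseconds, a value like PT0.000000001S would collapse to zero here. A parser that needs nanoseconds would have to return an integer count instead.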
I would agree that using the previously baked standard would be a very good
idea. Java 8 supports parsing the ISO into Durations, which is a Very Nice
Thing, since I don't have to write the parser.
I'm sure other languages & libraries would also tend to the ISO direction
to some degree.
And TBH, I also agree this would be a good data type to have. Not sure it's
minimal, but certainly useful.
…On Sun, May 20, 2018, 12:00 AM jongiddy ***@***.***> wrote:
The date-time type is derived from RFC3339, which is a subset of ISO8601.
It would be great to define any other time types using similar standards. ISO8601
has a duration representation
<https://en.wikipedia.org/wiki/ISO_8601#Durations> and RFC5545
<https://tools.ietf.org/html/rfc5545#section-3.3.6> defines a subset.
This would make your example:
day = P1D
hour = PT1H
minute = PT1M
second = PT1S
milli = PT0.001S
micro = PT0.000001S
nano = PT0.000000001S
# allows floats
micro3 = PT0.001S
# allows combining
two_and_a_half_hours = PT2H30M
# not supported
five_seconds = PT2S3S
# can be negative
minus_one_seconds = -PT1S
# allowing underscores would be a non-standard extension
hundred_thousand_hours = PT100_000H
The benefit is the use of a recognised standard, and that the P prefix
makes parsing simpler and keeps more space for other types that may one day
be added.
The downsides are that sub-second units are only supported as decimals
(while ISO8601 supports decimals for any time unit, RFC5545 doesn't allow
them at all), and the decimals do not support underscores.
|
Good to see some activity on this issue again. Usually I agree that using standards is preferable. However, I don't agree that ISO8601 or RFC5545 would be a better fit for this than something similar to the Go duration parsing, for the following reasons:
Fixing some of these is of course possible, but that would result in a custom standard. So it would lose the benefit of using the standard. PS. I'm fine with having the P prefix (or any other one to avoid confusion with the standards). So don't take that as a reason to prefer the standard. |
A suggestion: a simple
But that is exactly why you can't specify them in days. If something is due every 3 months, how do you express that? Or that something has been owned for 5 years? And time is not the only measurement, they all have units. Whether it be disk space/RAM, distance, volume, temperature, currency, etc. A more general I like the idea of the combining - you could express something like But generally, this sort of thing belongs more in the domain of the consuming application, not the data file format. And all measurements could be expressed as strings and handled by the application as appropriate for the application. What is the benefit of making it more complicated? |
@Falkon1313 I agree that a general quantity unit should be part of the consuming application. However, there's a big difference between durations and the other quantities you mention: there's a standard library type for durations in almost every programming language (at least the ones that also have a datetime type, which is already part of the TOML spec). And like I said before, the main advantage would be to directly generate that type.
Yes I messed up there. I meant to say they are quite easily approximated in days. (fixed up that comment now) |
Maybe we should just add a type which is a pair (number, string), the unit of the number being represented in the string. It would be very useful for any physical quantity.
But the question is how it would be loaded in the program so that we can not add 1 apple with 2 watermelons... |
I personally think generic quantities are best parsed and considered by language and/or domain specific libraries. Any number with a unit may well get coerced to a native type differently depending on that language's native type, and the application's specific understanding of the units of a given domain. For instance a language's Decimal type might be most appropriate when handling quantities (especially currencies). An example of a quantity library which I think handles them well is Python's Pint, which keeps original strings around as long as possible (extremely useful for any kind of user input, including configs): https://pint.readthedocs.io/en/0.9/ But because durations are so often representable in standard libraries, and so ubiquitous when configuring services, I think it makes a lot of sense to have an intuitive, obvious format in Toml. For the record, that standard mentioned above is far from obvious to me. I'm not sure number, string tuples will offer much benefit beyond strings, if the application has to decide how those strings transform the number anyway - parsing numbers is easy, handling units is a fiddle. |
The USA standard for characters to use in lieu of μ is Of course, ISO8601 favors the .000001 second style of notation as of the previous issuance... it was recently revised in 2019 and I am not sure how, exactly, yet. |
Need this feature! |
Strictly separating time and calendar periods tends to be a good thing. Using a calendar unit ("day" and everything above) in the context of time is almost always a bug. Though I don't have any numbers on how common actual usecases of using calendar units for time are, work experience shows that whenever people used Given that ISO8601 makes a clear distinction between periods and time in the form of |
The more I look at ISO 8601's duration standards, the more I like them. The Here are some proposals. I tried to cover all the bases touched upon so far, and I hope I didn't stretch things out too far. What do you think?
|
@eksortso In my view, if we follow the ISO 8601 standard, we should stay close to it. One or two small changes may be fine, but if we're to deviate as far as you suggest, we can as well start from scratch and roll our own solution – maybe as @JelteF suggested or something close to it. Or we look for another fine standard/convention that fits our needs better without requiring as many changes as you suggest. |
Fair enough. No need for additional units. Not now, anyway. But, and I was trying to address some of @JelteF's concerns: I still recommend allowing the underscores, being case-insensitive, and prepending P if T starts a time duration. TOML would gain readability and brevity, and these somewhat intermediate forms can be converted to ISO 8601-compliant durations with trivial string munging. |
@Felk Thinking about this again, you are absolutely right. However, what I'm trying to say though is that this separation is not very useful in practice, if it means you cannot convert the string into the built in duration type of the language. This is the case for |
@eksortso I agree that if we add a pass that removes all underscores and capitalizes all letters, we can still use parsing libraries for the standard. |
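The normalization pass mentioned here could look roughly like the following. This is only a sketch: normalize_duration is a hypothetical helper name, and it covers just the three relaxations raised in the discussion (underscores, letter case, and a missing P before a leading T).

```python
def normalize_duration(text):
    """Turn a relaxed duration string into a strict ISO 8601 one."""
    # Drop digit-group underscores and make unit letters upper case.
    s = text.replace("_", "").upper()
    # Keep a leading minus sign out of the way of the prefix check.
    sign = ""
    if s.startswith("-"):
        sign, s = "-", s[1:]
    # Allow a time-only duration to start with T by prepending the P.
    if s.startswith("T"):
        s = "P" + s
    return sign + s
```

The result can then be handed to any existing ISO 8601 duration parser, which is the point being made: the relaxed forms stay a thin layer on top of the standard.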
Thanks, @JelteF. The notion to allow |
So I know we're trying to get v1.0 out the door, but since there's little along those lines that I can help with, I'd like to move this along, in anticipation of v1.1. I've sat on a PR for this for awhile. Would it cause any problems (e.g. distract from v1.0 release efforts) if it were submitted for future consideration? |
I needed a way to represent durations sooner than later. I also did not want to fork the implementations I use as I rely on both cpptoml and python toml. So for the time being I am using an inline table like so: delta={count=-15, unit="secs"} I have a simple C++ utility to convert this into the std::chrono::duration types. But I'd love to see first class support for this. |
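For readers on the Python side, the same inline-table workaround could be converted along these lines. This is a sketch under assumptions: the comment only shows the unit "secs", so the other unit spellings in the mapping, and the helper name table_to_timedelta, are invented for illustration.

```python
from datetime import timedelta

# Maps the table's unit strings to timedelta keyword names.
# Only "secs" appears in the comment above; the rest are hypothetical.
_UNIT_KEYWORDS = {
    "msecs": "milliseconds",
    "secs": "seconds",
    "mins": "minutes",
    "hours": "hours",
    "days": "days",
}


def table_to_timedelta(table):
    """Convert a parsed {count=..., unit="..."} table into a timedelta."""
    unit = table["unit"]
    if unit not in _UNIT_KEYWORDS:
        raise ValueError(f"unknown duration unit: {unit!r}")
    return timedelta(**{_UNIT_KEYWORDS[unit]: table["count"]})


# delta = {count = -15, unit = "secs"} as it would arrive from a TOML parser
delta = table_to_timedelta({"count": -15, "unit": "secs"})
```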
Copied from #717:
While 8601 allows it on the last part, whichever part that is, I propose to only allow it on seconds. This is also the approach that the standards body of W3.org adopted. It's non-trivial what it means to have fractional minutes, hours or even days, months or years. You're better off keeping it simple, and users will quickly come to understand that only the seconds part can be decimal (or double). Btw, 8601 allows weeks, and an abbreviated format (without the letters). I wouldn't use either of those (but I think there was already consensus on that in the main thread).
They don't disallow them, which in standards parlance usually means that they allow them. My suggestion would be, again, to keep it simple: either the whole duration is negative, or the whole duration is positive. Subtracting parts is complicated, and without timezone information not even reliably possible. What's more, you'll get a different number of days depending on the time of year (daylight saving time) if you allow subtractions of parts. Ending up with a duration that's either years and months, or days and time means you have ordered types. These types are exact. Once you mix these, they mean something else depending on time of year. That's OK, and ultimately up to implementers, but doing all that for positive or negative durations is already quite some work. If independent parts can be positive and negative it's that much harder. And likewise, that much harder to explain to end users and in spec prose. |
Note that my point of limiting scope of individual members is not about date, time, duration calculations in toml, but that it can be reasonably expected to be the main use case where these types will be applied. (though I can sympathize with an opposing argument that we should be inclusive and allow each duration segment to be negative, many existing implementations of such types don't support such flexibility, but also, those that do either chose to support that the whole duration can be negative, or support that individual segments can be negative, but not both) |
The significance of W3.org standards only carries so far. Web technologies operate in second-based time intervals anyway. But not everything does. So I have no problems with using ISO 8601's approach to fractional units, which seems reasonable enough to me.
I can understand the value of simplicity, but I also want to create a standard that's eminently usable. I wouldn't exclude half-hours for general use when over 8750 hours each year would interpret 0.5 hours the exact same way. Allowing such niceties creates challenges to devise simple, precise definitions. This is partly done; I already have ABNF code that takes fractions into account. And once I submit a PR (with language that's not on the computer I'm currently typing on), you can assess that for yourself.
I agree with you here. Plus or minus the whole duration. That way, we can safely look past the fine points of duration arithmetic. |
Perhaps true for http (but that doesn't support durations, iirc), my work was in the xml, xsd, XPath, xslt area, and those transcend the area of just "web technologies". (and also, the W3 mention was merely as an illustrating example how "some other standards body" did it, I'm fully aware that their approach has been, and often still is, with its own flaws) But i understand your points. Besides, most discussion in the W3 groups was wrt date, time, tz, era, calendar and duration arithmetic, which can fill a bookshelf by itself ;). It's dauntingly complex... I now realize that data manipulation is not something toml concerns itself with. I understand you need to be able to support applications that would want to express fractional time units, while other applications might want to prohibit that. Which is kind of in the same league of an application expecting a numeric value that is in the range 1-10, while toml will allow any 64 bit integer. In other words, @eksortso, I see now why you'd generally prefer a broader definition over a more limiting one, allowing a wider range of potential scenarios. |
Btw, would we want to differentiate between time span and duration? The first is defined by a start- and end-datetime, the second by a period without reference to, or bearing on, a given datetime. They are semantically equivalent, but serve different scenarios, and are expressed and interpreted differently. (apologies if this has already been decided). |
While we're at it, how are we going to interpret mixed hour/days/months/year durations? In most specifications and implementations I've seen, it's either time+days or months+years, but not both (and if both are allowed, they are not allowed to be normalized). Consider Edit, just missed this suggestion:
Probably a good idea indeed to start small. Though I don't think we should disallow durations longer than 24h, just disallow durations with We should also be explicit in allowing |
You can use 70minutes; I think that's "good enough", and it's a good trade-off with keeping both the implementation and syntax simpler. Some small possible inaccuracy with floats is also fine I think; we're not concerned with precision time-keeping, and if guaranteed precision is really needed you can use milliseconds or nanoseconds similar to 70minutes instead of 1.166..hours
I think we shouldn't include "month" at all, or if we do, simply define it as "30 days". There is no good way to deal with this in a context-less "duration" unless you force implementations to parse it as an object which keeps track of this (e.g. instead of merely storing it as an int64 you need some class/struct with a hour, day, month, etc. field), but many stdlib "duration" types don't (at least, Python and Go doesn't). |
Well, there is (keep month and year separate, basically), and there isn't (some might not call this a "good way"). But I agree, as I mentioned in my other comment, which I may have just edited while you were typing.
You may have misunderstood why I made the suggestion. It is precisely to keep the implementation simpler, as there's no way of knowing what happens if we try to force a decimal time-duration system on people. It's just not what time is. I think it'll be much harder to formalize decimal minutes (which you'll need to do if you were to allow it) than it is to allow only integer hours, minutes and decimal seconds. I don't think it'll be hard to create, parse and interpret |
Just want to point out that excluding months (or days or years) could be a WTF for users if you have other units. Arbitrarily assigning a non-standard value for them (like 30 days per month) would be even worse, since it would falsely appear to be able to do the right thing, but actually only sometimes would and other times you'd have apparently random bugs. Lots of things in both business and tech operate in terms of months. Whether it be things like quarterly reports (3 months) or monthly billing (1st of every month) or checking when something is due or if something is more than a month overdue, etc. You might have monthly log rotations, quarterly batch processes, semi-annual things (6 months), etc. I don't know how often people would reach for a duration (aside from the next due/overdue case, which is actually very common), but if it's there they'd expect to be able to use it. If this type is specifically going to exclude things like that or handle them in non-standard ways, then it needs to at least be very clearly documented that people who need standard durations should not use it but instead use a string and their standard libraries to handle it. And that it's not meant for things like scheduling, etc. That it's really only meant to measure durations in contiguous real seconds regardless of timezones and DST? In which case you only really need the seconds unit, right? Well, maybe microseconds too. Because I'd also second abelbraaksma in saying that decimal durations would be a bad idea. Which brings me to a suggestion. If it's not considering DST or month durations etc., then anything above 1 hour is ambiguous. If not accounting for leap seconds, then even 1 minute is ambiguous. So if the intent is to specify a duration in raw seconds, or less, then those are the only units that should be available. Whether it is seconds, milliseconds, nanoseconds, whatever unit precision makes the most sense; as an integer. 
And documentation should make clear that it's raw time, not clock time or calendar time, so people don't use 86400s to intend a day, etc. I'd suggest calling it something like 'raw duration' instead of just 'duration' to make it clear. I think that would simplify and clarify it. Maybe something like |
Re: @abelbraaksma; it's indeed not hard to parse At any rate, I just looked at what seems to work well for Varnish; that was the only config file format I could think of with native duration types (Might be worth looking what other formats are out there, can't recall any from the top of my head). I'm not opposed to the Re: @Falkon1313: I'm not sure how common those scenarios really are for TOML; what a duration useful for is mostly things like timeouts, cache durations, how often to run some background jobs, things like that. Things like "send report every quarter" or "send invoice 1st of every month" can't easily be expressed in a time duration; the first issue is that many standard libraries use an integer or some variant thereof so the only way this can work is if TOML implementations provide a custom "duration" type which keeps records what the TOML file actually has, and which won't integrate all that well in most stdlibs. Personally, I'd really like to avoid that: TOML should be easily parsed to the native types of most common languages. The second issue is what does "3 months" really mean? 3 months from when the application starts? 3 months from now? 3 months from Jan 1st? For something like A small ambiguation also exists due to leap seconds and leap days, but for many (not all) use cases these can essentially be ignored. |
Usually the relevant time scale is pretty well known and does not differ by more than an order of magnitude, which is why I do not think this feature is too critical. In most cases a well-chosen field name is sufficient. Generally this seems like a very niche feature while resulting in more complex implementations. |
@arp242 I agree, that's why I suggested to use integers for With respect to the side-discussion on allowing months and years, if (big if?) we go that route, just do what NodaTime and other libraries do and don't mix year-month durations with day-time durations. Durations are irrespective of a timezone or a starting date/time. Hence a minute is 60 seconds, an hour is 60 minutes. But a month has undefined length (it must be irrespective of starting date/time), so a year is 12 months, but what a month is, we don't define. If you have any date or date-time value, you can add a year-month duration to it and a day-time duration. You can also add a year-month-day-time duration, but only by adding year and month first and then adding day and time. That way it is an unambiguous definition. |
@abelbraaksma I'd recommend using the same precision that we define for time types. From v1.0.0:
|
@eksortso, you're absolutely right, my main point was to have integers for hours and mins; secs should be the same as for time of course. |
@abelbraaksma Well, taking a hint from the current spec, we could use a similar approach for hours, minutes, and seconds. Values falling within well-defined boundaries will be accepted as is. And if the time values fall out of bounds, are fractional float values, or are specified out of order, then the parsing behavior would be implementation-specific. Someone will want to use But, any potential reliance on implementation-specific behaviors does beg the question posed by @pradyunsg of whether durations ought to be standardized in TOML at all. Do we want to bear the burden of defining time delta standards that all parsers must adhere to? We got away with that for dates and times. But we'd have to impose TOML-specific duration standards that are not as clear-cut as what exists for datetimes. |
@eksortso, that might be a viable approach. Also, I totally understand the reluctance of implementing this in the first place. I don't really have a strong opinion on that. I do like strong, useful types in TOML, but at the same time, where do you draw the line? Whether this is feature-creep or not is probably anybody's guess. Yet at the same time, it's useful and a relatively small addition. And people are not required to use it (heck, I know many people using TOML without using tables...). |
Agree with it, whichever way it goes! When I use datetime.timedelta in Python, I have to write it like this: |
@abcdehc I get where you're coming from. In Python, without a duration type, you end up writing something like:
mv_units = self.config['param']
m = mv_units['moving_validity_m']
s = mv_units['moving_validity_s']
self.moving_validity = datetime.timedelta(minutes=m, seconds=s)
And even then, Python will normalize all that to days, seconds, and microseconds anyway. Which points to the fact that TOML durations' fundamental nature has not yet been agreed upon, if it ever will be. The timedelta documentation in Python explicitly says that seconds are stored internally, but not minutes. A TOML parser could naively lean on Python's own implementation, or it could introduce a standardized duration object that would be at odds with how Python handles it.
But no proposal so far about durations in TOML has defined what units are preserved in implementation. We haven't even discussed normalization. Too much is left to the parser or the language to scrape together. It's not like how we had RFC 3339 and implementations of it to rely on for dates and times.
This is how deep this subject goes. I haven't even looked into how C++ or Golang represent their typical time duration data types or how they interoperate with time types. Is there any sort of agreed-upon standard? Is there an RFC that we could point to, to smooth this whole thing out? What is so minimal about any of these efforts? Complexity underlies the simplest implementations. So I regret to admit, short of a well-accepted standard (sorry, ISO 8601) or implementation, that we ought to abandon time durations as being not obvious enough for the TOML standard to embrace. |
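The normalization mentioned in this comment is easy to observe directly. The short example below uses arbitrary values (70 minutes and 5 seconds, then 25 hours) to show that timedelta keeps only days, seconds and microseconds, so the units originally written down are not recoverable afterwards.

```python
from datetime import timedelta

# Built from minutes and seconds, but stored as days/seconds/microseconds.
td = timedelta(minutes=70, seconds=5)
fields = (td.days, td.seconds, td.microseconds)

# Anything of 24 hours or more rolls over into the days field.
long_td = timedelta(hours=25)
long_fields = (long_td.days, long_td.seconds, long_td.microseconds)

# There is no attribute that remembers the minutes that were passed in.
has_minutes = hasattr(td, "minutes")
```

Here fields is (0, 4205, 0), long_fields is (1, 3600, 0), and has_minutes is False, which is why a round trip from a TOML value through timedelta and back cannot preserve the original spelling.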
In Go Personally I think that's not really a show-stopper though, as leap-seconds can be ignored for many purposes (it is a show-stopper for supporting at least months though), and in practice many applications (including those written in Go, but probably also Python) already ignore leap seconds with durations since they don't contain a database of when leap seconds occurred. Even time-specific applications don't always fully implement leap seconds "the right way"; for example Google's NTP doesn't apply leap seconds, OpenBSD just pretends they don't exist, etc. What I'm saying is that defining a "minute" to be "60 seconds" will be fine for practically all use cases, and we don't need to worry about leap seconds at all. |
@arp242 A little more comforting! But still no common standard. Seconds are the common standard, and we need millisecond precision guaranteed. If in TOML we fixed minutes to 60 seconds and hours to 3600 seconds, could we confidently assert that common time durations in all languages can handle a sufficiently large number of seconds, positive or negative? And what would that limit be in order to ensure compatibility across platforms? |
For numbers TOML already specifies that "Arbitrary 64-bit signed integers should be accepted and handled losslessly"; for int64 we'd be talking about 2.9 million years, or 292 years if we allow nanoseconds. Using int64 nanoseconds probably makes sense. |
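To make the range question concrete, the arithmetic can be checked with a few lines of Python. This is a back-of-the-envelope sketch: it assumes a year of 365.25 days, and max_years is just a helper name for this example.

```python
INT64_MAX = 2**63 - 1
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumption: Julian year


def max_years(ticks_per_second):
    """Largest span, in years, of a signed 64-bit tick counter."""
    return INT64_MAX / ticks_per_second / SECONDS_PER_YEAR


for name, ticks in [
    ("nanoseconds", 10**9),
    ("microseconds", 10**6),
    ("milliseconds", 10**3),
]:
    print(f"{name}: about {max_years(ticks):,.0f} years either way")
```

With nanosecond ticks this gives roughly 292 years in each direction, matching the figure quoted for nanoseconds. Each coarser step of a factor of 1000 extends the range by the same factor.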
Using chrono::duration<>. Bunch of C++ template soup that ultimately distills down to a single integer or float, depending on what you want it to represent and what precision you need. Typically you'd use
In one of the newer versions of the standard there's new date/time types, with duration interop, but I have absolutely no idea how it works and it seems confusing as hell, tbh. All I can say is that there is some interop. |
Indeed. But just to emphasize, durations should be agnostic to leap seconds, minutes, years or even Era or calendar. That's why it's important to separate months + year and day + time. The only moment leap seconds or leap years come into play is when a duration is added to a date-time, which itself already has all the information (i.e., adding 1 month to Feb 1 2004 is 1 March 2004; adding 28 days to Feb 1 2004 is 29 Feb 2004, adding it to Feb 1 2005 is 1 March 2005). Luckily, TOML doesn't do calculations, so we don't have to worry about that. By making durations (which is not the same as timedelta!) agnostic of the current time, we bypass any of these potential issues and only need a very simple datatype. |
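The date examples in this comment can be reproduced in Python to show why a month is not a fixed number of days. The standard library has no month arithmetic, so add_months below is a hypothetical helper written for this sketch; clamping to the last day of the target month is an assumption, not something the thread specifies.

```python
import calendar
from datetime import date, timedelta


def add_months(start, months):
    """Calendar-aware month addition (hypothetical helper)."""
    index = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(index, 12)
    # Assumption: clamp to the last valid day, e.g. Jan 31 + 1 month.
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(start.day, last_day))


# One month after 1 Feb 2004 is 1 March 2004 ...
plus_month = add_months(date(2004, 2, 1), 1)
# ... but 28 days after 1 Feb lands on different dates in leap
# and non-leap years.
plus_28_leap = date(2004, 2, 1) + timedelta(days=28)
plus_28_common = date(2005, 2, 1) + timedelta(days=28)
```

The month only acquires a length once it is applied to a concrete date, which is the argument for keeping year-month durations separate from day-time durations.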
I'm used to the |
I think he just means that if you say "1 month" that it cannot be translated to a fixed duration (say 30.43 days or something), but is only applicable in the context of a calendar (say "February 3rd" + "1 month" = "March 3rd"). And that such calculations are not TOML's responsibility. |
@eksortso, I may be wrong. What I meant is that To me, a What different programming languages use for duration or timedelta or other (i.e. it could be
Indeed @Felk, that's what I meant ;). |
@arp242 That specifies an expected range for integers. It's not an implementation detail necessarily. That's important to remember because smaller integer ranges might be permitted, against advice, for things like embedded systems. I think we need to keep the specification logic separate from the implementation details that a parser may use.
Again, we're not dictating the implementation details. But your calculations suggest a good expected range for durations. In whichever way a parser may implement a TOML duration, it would guarantee millisecond precision, even though most implementations we've seen allow for an integral number of nanoseconds. How would we state this? 290,000 years in either direction? I think, though, that we ought to require this one thing of compliant parsers. It should be readily apparent, if not downright obvious, how a duration's value can be added to or subtracted from a timestamp's value, once each value is parsed. For instance, Python's |
While I do think that this proposed by arp242 is quite elegant:
But I think it is a bad idea to go anywhere above days. Leap years are already not nice, and there are many ways to represent years; see for example https://altalang.com/beyond-words/6-calendars-around-the-world/. Time in general is just hard. Yes, it is nice to have something like this in Python:
release_date = datetime.date.today() + parsed_toml["time-till-release"]
But is that really that much better than:
release_date = datetime.date.today() + datetime.timedelta(days=parsed_toml["days-till-release"])
I don't think it warrants the extra complexity. Similar to the file sizes proposal, you can just put the unit in the name. We should really try to keep prettier/prettier#40 in mind as well; I know it talks about formatting, but still. |
One major issue any implementation will run in to with durations (but also sizes, or any other suffix) is compatibility. Consider an existing file with:
You upgrade to a new TOML version with durations, and you want to support:
Great, but ... you don't want everyone to update their config files, so Turns out this is a bit tricky; in e.g. Python I guess you'll end up with:
But in other more statically typed languages parsers will have to end up creating your own struct or class or whatever the language has, so you can do:
And/or maybe:
But it all pushes some amount of complexity to both the parser and application, at least if you want Although for new keys it's okay to only support the suffixed variant, you still want to make sure ONLY that variant is allowed. I can foresee subtle confusion with things where people do:
And then the application just does:
And this "duck types" out alright and it "works", except it does something unexpected, which isn't even immediately obvious (low timeout which works fine on your local machine but times out in production ... sounds like a fun time). I suppose type hints and the like will prevent that, and things have been moving in that direction over the last few years, but still...
Long story short, I started prototyping this in my TOML library and writing a concrete proposal, but after encountering these issues I'm less sure if we really want this.
That said, it is commonly implemented in many config files. I did a survey of some common software, based on "what I could think of" and looking at the top 500 packages in https://popcon.debian.org – this is perhaps a bit biased, and some software supports neither format (e.g. ALSA configuration has no use for either durations or sizes). Overall, I think it's more widely supported than datetimes, which TOML already supports:
n/a: Not applicable; there are no settings that could use this unit. In many cases where a unit isn't supported, it would be better if it was. For example (default values):
Some are also inconsistent; e.g. Redis's Of course, this isn't TOML, but "how many TOML files actually need this?" is a bit harder to answer as it's harder to find projects which support TOML. I suppose I could check package list contents, but I haven't bothered (because that's a bit of work). |
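The backwards-compatibility concern raised in this comment (old config files with a bare number, new ones with a real duration) might be handled in a Python application roughly as follows. This is a sketch under assumptions: coerce_timeout is a hypothetical helper, bare numbers are taken to mean seconds, and the parser is assumed to hand back a timedelta for duration values, which no TOML library is confirmed to do.

```python
from datetime import timedelta


def coerce_timeout(value):
    """Accept either a legacy bare number or a parsed duration."""
    if isinstance(value, timedelta):
        # New-style value: the parser already produced a duration.
        return value
    # bool is a subclass of int, so rule it out explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"expected a number or a duration, got {value!r}")
    # Legacy value: assumption that bare numbers were always seconds.
    return timedelta(seconds=value)
```

Rejecting everything else up front is what avoids the duck-typing trap described above, where a value of the wrong kind silently "works" with a different meaning.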
Let's put this feature request on hold until after v1.1.0 is released. |
I think it would be very useful to have a duration type natively in TOML. It's a thing I use a lot in my web service configs, for cache TTL or timeouts. Right now I resort to using integers and making the key include the resolution (e.g. timeout_ms, ttl_hours). This has a couple of disadvantages: for example, if you have ttl_hours and want 9 days you have to enter 216, which makes it (at least to me) not obvious when quickly looking at the config.
I would propose the following basic and IMHO natural syntax (inspired by Go duration parsing/formatting):
This notably doesn't include months and years because they can differ in duration and are quite easily approximated in days. I'm also fine with the following changes:
Dropping µ for microseconds. I think it's fine to use 0.1ms in most cases, so it's not strictly needed. I mainly put it in because Go duration parsing and formatting allows/uses it as well.
Adding a prefix such as D, which would result in D2h30m.
Disallowing repeated units such as 2s3s. Again I mainly put this in because the Go duration parsing allows it.
I really hope this is considered for inclusion as it would be really useful to me and my colleagues. (Much more so than the already supported datetime type, which I've never had an actual use for in a config).
PS. I created a modified fork https://github.com/pelletier/go-toml that supports this: https://github.com/JelteF/go-toml (see the last couple of commits)
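For readers unfamiliar with the Go-style notation this proposal is modelled on, here is a simplified Python sketch of how such strings can be parsed. It is not the code from the fork mentioned above. The unit set is limited to the ones Go's time.ParseDuration accepts (ns, us, µs, ms, s, m, h), the result is an integer number of nanoseconds as in Go's time.Duration, and the name parse_go_duration is made up for this example.

```python
import re
from decimal import Decimal

# Nanoseconds per unit.
_UNIT_NS = {
    "ns": 1,
    "us": 10**3,
    "µs": 10**3,
    "ms": 10**6,
    "s": 10**9,
    "m": 60 * 10**9,
    "h": 3600 * 10**9,
}
# "ms" is listed before "m" and "s" so the longer unit wins.
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")


def parse_go_duration(text):
    """Parse strings like '2h30m' or '0.1ms' into integer nanoseconds."""
    negative = text.startswith("-")
    body = text[1:] if negative else text
    if not body:
        raise ValueError(f"empty duration: {text!r}")
    total = Decimal(0)
    pos = 0
    while pos < len(body):
        m = _TOKEN.match(body, pos)
        if m is None:
            raise ValueError(f"invalid duration: {text!r}")
        # Decimal keeps fractions such as 0.1 exact before scaling.
        total += Decimal(m.group(1)) * _UNIT_NS[m.group(2)]
        pos = m.end()
    nanos = int(total)  # sub-nanosecond remainders are truncated
    return -nanos if negative else nanos
```

Like Go, this accepts repeated units, so 2s3s parses as five seconds. Rejecting that form would only take a check that each unit appears once and in descending order.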