Variable time window for hourOfWeekAverages #2

Open
laurentg opened this issue Apr 14, 2015 · 4 comments

Comments

@laurentg

Do we assume a constant 1h slot size for the speed average profiles? Even if the producer assumes a 1h slot internally, one idea for more flexible output would be to emit an array of (duration in minutes, speed) pairs. The special case of 1h slots would just be a list of (60 min, speed) pairs. The overhead should be minimal, as protobuf uses variable-length encoding for int32, so a minute value should be encoded in only 1 or 2 bytes. This would also allow compacting large NULL ranges into a single slot.
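To make the proposal concrete, here is a minimal Python sketch of the compaction step and of the varint size estimate. The names `compact_hourly` and `varint_size` are hypothetical and not part of any existing format; the sketch assumes the producer starts from one value per hour, with `None` standing in for NULL.

```python
from typing import List, Optional, Tuple

# (duration in minutes, speed or None for "no data"); hypothetical type
Slot = Tuple[int, Optional[float]]


def compact_hourly(speeds: List[Optional[float]]) -> List[Slot]:
    """Turn fixed 1h values into (minutes, speed) pairs.

    Adjacent None values are merged into one longer slot;
    known speeds each keep their own 60-minute slot.
    """
    slots: List[Slot] = []
    for speed in speeds:
        if speed is None and slots and slots[-1][1] is None:
            minutes, _ = slots[-1]
            slots[-1] = (minutes + 60, None)
        else:
            slots.append((60, speed))
    return slots


def varint_size(value: int) -> int:
    """Bytes needed to encode a non-negative integer as a protobuf varint."""
    size = 1
    while value >= 0x80:  # 7 payload bits per byte
        value >>= 7
        size += 1
    return size


hourly = [42.0, None, None, None, 38.5, 40.0]
slots = compact_hourly(hourly)
print(slots)
print(varint_size(60), varint_size(180), varint_size(10080))
```

Durations under 128 minutes fit in one byte, and even a full week (10080 minutes) fits in two.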

That way the producer could generate more precise fallback values when not enough data is available for some slots (for example, if we have data for 1AM and 5AM but not 2-4AM, the average speed probably lies somewhere between the values for 1AM and 5AM, not at the overall average).
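One way a producer might compute such a fallback is linear interpolation between the nearest known slots. This is only a sketch of that idea, not something the format prescribes: `fill_gaps` is a hypothetical helper, it does not wrap around the end of the week, and it falls back to a caller-supplied overall average when a gap has no known neighbour on one side.

```python
from typing import List, Optional


def fill_gaps(speeds: List[Optional[float]], overall_avg: float) -> List[float]:
    """Replace None entries by interpolating between the nearest known values.

    Gaps at the start or end of the list (no neighbour on one side)
    get overall_avg instead. No wrap-around is attempted.
    """
    known = [i for i, s in enumerate(speeds) if s is not None]
    filled: List[float] = []
    for i, s in enumerate(speeds):
        if s is not None:
            filled.append(s)
            continue
        before = [k for k in known if k < i]
        after = [k for k in known if k > i]
        if before and after:
            lo, hi = before[-1], after[0]
            t = (i - lo) / (hi - lo)  # position of the gap between neighbours
            filled.append(speeds[lo] + t * (speeds[hi] - speeds[lo]))
        else:
            filled.append(overall_avg)
    return filled


# Hourly slots for 1AM..5AM, with data only at 1AM and 5AM.
profile = [50.0, None, None, None, 30.0]
print(fill_gaps(profile, overall_avg=40.0))
```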

@laurentg
Author

This would also allow producing variable slot sizes depending on the amount of data a segment has: for example, slots of 10 or 20 minutes for segments with lots of data, and slots of 2 or 4h for segments with less data.

@kpwebb
Contributor

kpwebb commented Apr 17, 2015

This is a really interesting proposal and I can see the value particularly with the 1-5am example. I'll take another pass at both the baseline and current conditions format to include something like this.

I think your approach of using (duration, speed) pairs works well.

I'm not sure exactly what the implications are if the windows change over time. For example, does the baseline data become more precise as more data is collected? If so, we could just make future updates more specific, right? Just trying to think if there's a downside to the windows changing.

Also, I've already implemented something like this for dynamically sizing the current condition windows. For example, if the data for a given segment allows 4:00-4:15, that's the size of the current condition bin. If not, the bin grows to 4:00-4:30.
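The actual implementation is not shown in this thread, so the following is only a guess at what such bin growth could look like. The function name `grow_bin`, the doubling rule, the `min_count` threshold and the `max_width` cap are all assumptions made for illustration.

```python
from typing import List, Tuple


def grow_bin(
    obs_minutes: List[int],
    start: int,
    min_count: int,
    step: int = 15,
    max_width: int = 120,
) -> Tuple[int, int]:
    """Return (start, end) of the smallest bin holding at least min_count observations.

    The bin starts at `start` with width `step` and doubles until it holds
    enough observations or reaches max_width (assumed growth rule).
    Observations are minute-of-day timestamps; the end bound is exclusive.
    """
    width = step
    while True:
        count = sum(1 for m in obs_minutes if start <= m < start + width)
        if count >= min_count or width >= max_width:
            return start, start + width
        width *= 2


# Observations at 4:02, 4:10, 4:22 and 4:30, in minutes since midnight.
observations = [242, 250, 262, 270]
print(grow_bin(observations, start=240, min_count=2))  # stays at 4:00-4:15
print(grow_bin(observations, start=240, min_count=3))  # grows to 4:00-4:30
```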

@laurentg
Author

IMHO, from the client's point of view, having a slot size that varies over time is no more difficult to handle than having a slot size that varies per segment. In both cases, the client has to handle it. The only (small) difficulty, for route planners, is quickly getting the correct slot data for a given minute in the week; with fixed slots it is probably easier. I guess the difficulty would be more on the producer side.
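On the client side, the lookup can stay cheap by precomputing cumulative slot end times once per segment and then using a binary search. A minimal sketch, assuming slots are (duration in minutes, speed) pairs starting at minute 0 of the week; `build_ends` and `speed_at` are hypothetical names.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Optional, Tuple

Slot = Tuple[int, Optional[float]]  # (duration in minutes, speed or None)


def build_ends(slots: List[Slot]) -> List[int]:
    """Cumulative end minute (exclusive) of each slot, computed once per segment."""
    return list(accumulate(duration for duration, _ in slots))


def speed_at(slots: List[Slot], ends: List[int], minute: int) -> Optional[float]:
    """Speed of the slot covering `minute`, found by binary search in O(log n)."""
    if not ends or not 0 <= minute < ends[-1]:
        raise ValueError("minute is outside the range covered by the slots")
    return slots[bisect_right(ends, minute)][1]


slots = [(60, 42.0), (180, None), (60, 38.5)]
ends = build_ends(slots)  # [60, 240, 300]
print(speed_at(slots, ends, 59))   # still in the first slot
print(speed_at(slots, ends, 60))   # first minute of the NULL range
print(speed_at(slots, ends, 240))  # first minute of the last slot
```

With fixed 1h slots the same lookup is a single integer division, which is the "probably easier" case mentioned above.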

@abyrd

abyrd commented May 6, 2015

Would this same format be used both for storing internal data and for exchanging it with outside clients? If so, then I can see the use for more precision and variable width in defining bins. However, outside consumers of the data will probably always prefer completely uniform gridded data that has already been interpolated/smoothed.
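If both needs have to be served, a uniform grid can be derived from the variable-width slots on export. The sketch below is one possible way to do that, not an agreed design: `to_uniform_grid` is a hypothetical helper that takes a time-weighted average per grid cell and performs no interpolation or smoothing of empty cells.

```python
from typing import List, Optional, Tuple

Slot = Tuple[int, Optional[float]]  # (duration in minutes, speed or None)


def to_uniform_grid(slots: List[Slot], grid_minutes: int = 60) -> List[Optional[float]]:
    """Resample variable-width slots onto fixed cells of grid_minutes each.

    Each cell gets the average over the minutes that have data,
    or None when no minute in the cell has data.
    """
    # Expand to one value per minute; a full week is only 10080 entries.
    per_minute: List[Optional[float]] = []
    for duration, speed in slots:
        per_minute.extend([speed] * duration)

    grid: List[Optional[float]] = []
    for start in range(0, len(per_minute), grid_minutes):
        cell = [s for s in per_minute[start:start + grid_minutes] if s is not None]
        grid.append(sum(cell) / len(cell) if cell else None)
    return grid


variable = [(30, 40.0), (30, 50.0), (120, None), (60, 30.0)]
print(to_uniform_grid(variable, grid_minutes=60))
```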


3 participants