Deciding what gets included in the archived traffic statistics #7
Comments
Additionally, we probably want to store statistics by time of day and day of week. Even if we were just storing an average, we need to store the number of observations internally in order to continue updating that average. We also want old observations to become less and less relevant over time. This might be a good place to apply a Bayesian method, using the previous estimate as a prior and the observation as the likelihood. Otherwise, we might need to store every observation, anonymized in some form.
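To make the "keep a count so the average can be updated, but let old data fade" idea concrete, here is a minimal sketch. It is an exponentially decayed running mean rather than a full Bayesian treatment: the stored estimate and its effective observation count act like a prior, and each new observation is folded in against it. The class name `DecayedMean` and the `decay` value are assumptions for illustration, not something proposed in this thread.

```python
from dataclasses import dataclass


@dataclass
class DecayedMean:
    """Running mean in which older observations count for less."""
    decay: float = 0.9   # assumed retention factor per update, 0 < decay <= 1
    weight: float = 0.0  # effective number of observations behind the estimate
    mean: float = 0.0    # current estimate (e.g. travel time in seconds)

    def update(self, value: float) -> float:
        # Discount the evidence behind the old estimate, then add one new observation.
        self.weight = self.weight * self.decay + 1.0
        # Move the estimate toward the new value in proportion to its share of the evidence.
        self.mean += (value - self.mean) / self.weight
        return self.mean


# One accumulator would be kept per (segment, day of week, hour) cell.
cell = DecayedMean(decay=0.5)
for travel_time in (10.0, 40.0):
    cell.update(travel_time)
```

With `decay=1.0` this reduces to an ordinary running mean, so only two numbers per cell need to be stored instead of every observation.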
@kpwebb points out they used OLAP cubes before. This is a sensible approach.
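For readers unfamiliar with the cube idea, a rough sketch of what it could look like here: each cell is keyed by segment, day of week and hour, and stores a sum and a count so that averages can be rolled up along any dimension. The names `cube`, `add_observation` and `roll_up` are hypothetical, and this is an in-memory toy rather than how any existing system stores its data.

```python
from collections import defaultdict
from datetime import datetime

# Cell key: (segment_id, weekday, hour) -> [sum of speeds, number of observations]
cube = defaultdict(lambda: [0.0, 0])


def add_observation(segment_id, timestamp, speed):
    """Fold one speed observation into the cell for its segment and time slot."""
    cell = cube[(segment_id, timestamp.weekday(), timestamp.hour)]
    cell[0] += speed
    cell[1] += 1


def roll_up(segment_id=None, weekday=None, hour=None):
    """Average over all cells matching the given dimensions; None means 'all'."""
    total, count = 0.0, 0
    for (seg, day, hr), (speed_sum, n) in cube.items():
        if segment_id is not None and seg != segment_id:
            continue
        if weekday is not None and day != weekday:
            continue
        if hour is not None and hr != hour:
            continue
        total += speed_sum
        count += n
    return (total / count, count) if count else (None, 0)


add_observation("seg-1", datetime(2015, 3, 2, 8, 15), 30.0)
add_observation("seg-1", datetime(2015, 3, 2, 8, 45), 50.0)
average_speed, observations = roll_up(segment_id="seg-1", hour=8)
```

Storing sums and counts rather than averages is what lets cells be combined without going back to the raw observations.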
To help anonymization, there is probably no need to store the contextual "path" information for each segment, i.e. the preceding / following speed profile. This may not be enough, but it should make rebuilding whole paths from the data more difficult. Thinking about it, this contextual path information could theoretically be helpful for getting more precise data, for example by helping to compute intersection turns. In fact this last point may need a bit more discussion; I may open a new issue to discuss that.
I'm seeing talk about a lot of situations in which we want to be able to provide data products that are anonymized, but are made from non-anonymized datasets. For example: rolling-average statistics, turn restrictions, recasting histograms into different axes, context-dependent speed calculations, and so on. Considering that we're proposing to create a formal organization, I propose that one of the roles of the organization is to maintain the security of slightly-less-than-anonymized data in order to retain the flexibility to innovate in the creation of fully anonymized data products at a later time.
In addition to storing the average travel time by road segment and by time period, it would be very useful if we could find a way to also include the number of observations associated with the average travel times -- both for the purposes of establishing the reliability of the results, as well as for use in other applications that may rely on such data. Including observations makes the pool more valuable. Of course, if there is only one data contributor in a given region, this may impinge on their commercial data security concerns.
Thus, a technical challenge may be posed. Would it be possible to make the # of observations accessible only in cases where there are at least two operators covering roughly the same geographic area?
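One possible shape for that rule, as a sketch only: the average is always published, but the observation count is attached only when enough distinct operators contributed to the cell. The function `publish_cell`, the `min_operators` parameter and the per-operator count dictionary are all assumptions for illustration; "covering roughly the same geographic area" is simplified here to "contributed at least one observation to this cell".

```python
def publish_cell(mean_travel_time, observations_by_operator, min_operators=2):
    """Build the public record for one (segment, time period) cell.

    observations_by_operator maps an operator id to the number of
    observations that operator contributed to this cell.
    """
    contributing = [op for op, n in observations_by_operator.items() if n > 0]
    record = {"mean_travel_time": mean_travel_time}
    # Only expose the count when it cannot be attributed to a single operator.
    if len(contributing) >= min_operators:
        record["observations"] = sum(observations_by_operator.values())
    return record


single = publish_cell(42.0, {"operator_a": 17})
shared = publish_cell(42.0, {"operator_a": 17, "operator_b": 5})
```

Note that with exactly two operators, each one can subtract its own contribution from the published total and recover the other's count, so a threshold higher than two may be worth discussing.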