Deciding what gets included in the archived traffic statistics #7

Holly-Transport · 2015-02-13T19:40:47Z

In addition to storing the average travel time by road segment and by time period, it would be very useful if we could find a way to also include the number of observations associated with the average travel times -- both for the purposes of establishing the reliability of the results, as well as for use in other applications that may rely on such data. Including observation counts makes the pool more valuable. Of course, if there is only one data contributor in a given region, publishing the counts may conflict with that contributor's commercial data security concerns.

This poses a technical challenge. Would it be possible to make the number of observations accessible only in cases where at least two operators cover roughly the same geographic area?
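One way the gating rule could look, as a minimal sketch only. The names `SegmentStats`, `MIN_OPERATORS`, and `public_view` are hypothetical, and the sketch assumes that "roughly the same geographic area" can be approximated by a single road segment, which may be too fine or too coarse in practice.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Assumed threshold, taken from the "at least two operators" suggestion above.
MIN_OPERATORS = 2


@dataclass
class SegmentStats:
    """Running statistics for one road segment and one time period."""
    mean_travel_time_s: float = 0.0
    n_observations: int = 0
    operators: Set[str] = field(default_factory=set)

    def add(self, travel_time_s: float, operator_id: str) -> None:
        # Incremental mean: needs only the previous mean and the count.
        self.n_observations += 1
        self.mean_travel_time_s += (
            travel_time_s - self.mean_travel_time_s
        ) / self.n_observations
        self.operators.add(operator_id)

    def public_view(self) -> dict:
        # Expose the count only when enough distinct operators contributed.
        count: Optional[int] = (
            self.n_observations
            if len(self.operators) >= MIN_OPERATORS
            else None
        )
        return {
            "mean_travel_time_s": self.mean_travel_time_s,
            "n_observations": count,
        }
```

With a single contributor, `public_view()` returns the average but reports `n_observations` as `None`; once a second operator contributes to the same segment, the count becomes visible.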

mattwigway · 2015-02-20T18:11:13Z

Additionally, we probably want to store statistics by time of day and day of week. Even if we were just storing an average, we need to store the number of observations internally in order to continue updating that average. Additionally, we want old observations to become less and less relevant over time. This might be a good place to apply a Bayesian method, using the previous estimate as a prior and the observation as the likelihood. Otherwise, we might need to store every observation, anonymized in some form.
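To make the Bayesian idea concrete, here is a hedged sketch using a Gaussian prior and a Gaussian likelihood. This is one possible reading of the suggestion, not an agreed design: `SpeedEstimate`, `bayes_update`, and the `forgetting_variance` parameter are all hypothetical names, and inflating the prior variance before each update is an assumed mechanism for letting old observations fade.

```python
from dataclasses import dataclass


@dataclass
class SpeedEstimate:
    mean: float      # current best estimate, e.g. travel time in seconds
    variance: float  # uncertainty of that estimate


def bayes_update(
    prior: SpeedEstimate,
    observation: float,
    obs_variance: float,
    forgetting_variance: float = 0.0,
) -> SpeedEstimate:
    """Combine the previous estimate (prior) with one new observation.

    forgetting_variance is added to the prior variance first, so that
    older information counts for less (an assumption of this sketch).
    """
    prior_var = prior.variance + forgetting_variance
    # Precisions (inverse variances) add under the Gaussian model.
    precision = 1.0 / prior_var + 1.0 / obs_variance
    post_mean = (
        prior.mean / prior_var + observation / obs_variance
    ) / precision
    return SpeedEstimate(mean=post_mean, variance=1.0 / precision)
```

Only the mean and variance need to be stored per segment and time bin, so individual observations can be discarded after each update. A larger `forgetting_variance` pulls the estimate more strongly toward recent observations.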

mattwigway · 2015-02-20T18:32:29Z

@kpwebb points out they used OLAP cubes before. This is a sensible approach.
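For readers unfamiliar with the idea, a toy in-memory sketch of a cube over the dimensions mentioned earlier in the thread (segment, day of week, hour of day). The layout and the names `cube`, `record`, and `rollup` are assumptions for illustration; a real deployment would more likely use a database or a dedicated OLAP engine.

```python
from collections import defaultdict
from datetime import datetime

# Cell key: (segment_id, day_of_week, hour) -> [sum_travel_time_s, count]
cube = defaultdict(lambda: [0.0, 0])


def record(segment_id: str, when: datetime, travel_time_s: float) -> None:
    """Add one observation to its cell; the raw observation is not kept."""
    cell = cube[(segment_id, when.weekday(), when.hour)]
    cell[0] += travel_time_s
    cell[1] += 1


def rollup(segment_id=None, day_of_week=None, hour=None):
    """Aggregate over every dimension left as None.

    Returns (mean_travel_time_s, count); the mean is None for empty slices.
    """
    total, count = 0.0, 0
    for (seg, dow, hr), (cell_sum, cell_n) in cube.items():
        if segment_id is not None and seg != segment_id:
            continue
        if day_of_week is not None and dow != day_of_week:
            continue
        if hour is not None and hr != hour:
            continue
        total += cell_sum
        count += cell_n
    return (total / count if count else None, count)
```

Storing sums and counts per cell, rather than averages, is what allows cells to be merged correctly along any dimension.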

laurentg · 2015-02-23T15:22:20Z

To help with anonymization, there is probably no need to store the contextual "path" information for each segment, i.e. the preceding and following speed profile. This may not be enough on its own, but it should make it more difficult to rebuild whole paths from the data.

Thinking about it, this contextual path information could theoretically be helpful for getting more precise data, for example by helping to compute intersection turns. This last point may need a bit more discussion; I may open a new issue for it.

bmander · 2015-03-02T23:14:34Z

I'm seeing talk about a lot of situations in which we want to be able to provide data products that are anonymized, but are made from non-anonymized datasets. For example: rolling-average statistics, turn restrictions, recasting histograms into different axes, context-dependent speed calculations, and so on. Considering that we're proposing to create a formal organization, I propose that one of the roles of the organization is to maintain the security of slightly-less-than-anonymized data in order to retain the flexibility to innovate in the creation of fully anonymized data products at a later time.
