Anonymization method #33

Open
abyrd opened this issue May 14, 2015 · 0 comments

abyrd commented May 14, 2015

As stated in the architecture README:

The Traffic Engine (TE) translates vehicle location to OSM-linked speed estimates. By design the TE can be run inside a fleet operator allowing internal conversion from GPS location data to traffic statistics. This ensures that the only data to leave the data provider’s network are fully anonymized traffic statistics.

That is to say, the series of GPS positions identified with individual vehicles is fed into the traffic engine, but the only information pushed out of it into the shared database is identified with map features. It is of course possible to reconstruct paths in places where the number of observations is very low (one or two taxis moving through a sparse residential area), but the traffic engine has a threshold number of observations below which it will not report any data. We may even assume that a place with so few observations has negligible congestion or is rarely traveled through.

This reporting threshold is configurable by the organization running a particular instance. So in sum, each contributor runs their own traffic engine, that traffic engine never shares vehicle identifiers with the outside world, and it only exports speed/congestion data under conditions set freely by the contributor.
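
To make the thresholding idea concrete, here is a minimal sketch of how per-segment suppression could work. It is not the Traffic Engine's actual code: the function name `aggregate_speeds`, the `min_observations` parameter, and the `(vehicle_id, segment_id, speed_kph)` tuple layout are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate_speeds(observations, min_observations=5):
    """Aggregate per-vehicle speed samples into per-segment statistics.

    `observations` is an iterable of (vehicle_id, segment_id, speed_kph)
    tuples. This layout is hypothetical, not the Traffic Engine's format.
    Segments with fewer than `min_observations` samples are dropped.
    """
    speeds_by_segment = defaultdict(list)
    for _vehicle_id, segment_id, speed_kph in observations:
        # The vehicle identifier is discarded here and never exported.
        speeds_by_segment[segment_id].append(speed_kph)

    return {
        segment_id: {"count": len(speeds), "mean_speed_kph": mean(speeds)}
        for segment_id, speeds in speeds_by_segment.items()
        if len(speeds) >= min_observations  # contributor-configured threshold
    }


if __name__ == "__main__":
    sample = [
        ("taxi-1", "way/100", 30.0),
        ("taxi-2", "way/100", 40.0),
        ("taxi-3", "way/100", 50.0),
        ("taxi-1", "way/200", 20.0),  # lone observation, suppressed below
    ]
    print(aggregate_speeds(sample, min_observations=3))
```

Note that this counts raw observations, as described above. Whether the count should instead be over distinct vehicles is part of the open question about how to set the threshold.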

This basic architecture should go a long way toward eliminating the risk of tracking any one probe vehicle, but some questions still remain: how high should the observation threshold be set, and might there be other subtle details that would allow a sophisticated consumer of this data to reconstruct trajectories?

The French Open Taxi Data project, which is interested in contributing to our congestion/speed database, is undergoing review by the CNIL and has statisticians available for counsel on user privacy issues. Like all of us they are interested in anonymization, but they have some specific strict guidelines to adhere to. I would welcome any comments here from @l-vincent-l @odtvince or their anonymization advisors.
