
Add option for polygon/linestring in results #823

Open: wants to merge 10 commits into master

@red-fenix commented Jul 11, 2024

At the moment, Photon only returns a single point for each location, not its polygon (see #259). This PR adds an option to store the polygon (i.e. the full geometry) in the Elasticsearch index, and an API parameter, polygon, to return it. If no polygon exists for a result, the point is returned instead.

WARNING: This will increase the Elasticsearch index size! (~575 GB for a planet import).

To enable it, add the command-line argument -use-geometry-column when importing, and add &polygon=true to the API call.
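For illustration, a client could build such a request as sketched below. This assumes a Photon instance reachable at http://localhost:2322 (the host and port are assumptions about a local setup), and PhotonUrl.search is a hypothetical helper, not part of Photon's code base.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a Photon search URL carrying the polygon parameter proposed in this PR.
// The base URL is supplied by the caller; nothing here is Photon's own client code.
final class PhotonUrl {
    private PhotonUrl() {}

    static String search(String baseUrl, String query, boolean polygon) {
        // Encode the free-text query so spaces and non-ASCII characters survive in the URL.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/api/?q=" + q + "&polygon=" + polygon;
    }
}
```

The resulting URL can then be fetched with any HTTP client, for example java.net.http.HttpClient.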

@red-fenix changed the title from "Add option for polygons in results" to "Add option for polygon/linestring in results" on Jul 11, 2024
@lonvia (Collaborator) commented Jul 14, 2024

I haven't done a full review yet but I do have some general thoughts on the implementation:

  • This really needs to be implemented for the OpenSearch version because the ES version of Photon is on its way out. Note that the OpenSearch variant does not use mappings.json. It defines its mapping in https://github.com/komoot/photon/blob/master/app/opensearch/src/main/java/de/komoot/photon/opensearch/IndexMapping.java.
  • As long as the geometry isn't used for lookup, indexes should be disabled on the new field to save a bit of disk space.
  • We already have a field extent which contains the bounding box. The geometry should replace this field, i.e. once full geometries are enabled, do not save extent; derive the extent from the geometry field when returning a result. We need to keep the centroid because it is not necessarily the geometric centroid of the geometry. (Note that extent has been missing from the mapping specification so far, so it was most likely saved as a text field. That is an oversight, but it can't be fixed right now without creating an incompatible database version.)
  • This would be the second optional database feature after add support for structured queries (opensearch only) #815. Before we become too prolific with command-line arguments, I'd lean towards adding a single parameter -extra-db-features which takes a list of features (right now: structured, geometries).
  • I agree on the introduction of the polygon parameter but would prefer to disable it when the extended geometries are not available in the database. Otherwise there will be an endless stream of bug reports on photon.komoot.io, why the results return a point instead of polygon. You can save the state of the feature in the property table and load it from there on start, see add support for structured queries (opensearch only) #815 for an example on how to do it. This property also comes in handy during updates of the database. It would be useful to have exactly the same behaviour as on import then.
  • This needs some tests for the import.
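Deriving the extent from the stored geometry at response time, as suggested in the third point above, could look roughly like this. It is a sketch under assumptions: the geometry has already been flattened into a list of [lon, lat] vertices, the output keeps Photon's current extent order of [minLon, maxLat, maxLon, minLat], and ExtentFromGeometry is a hypothetical name. Geometries crossing the antimeridian are not handled.

```java
import java.util.List;

// Sketch: compute a bounding box from all vertices of a geometry.
// Assumes [lon, lat] vertex order (as in GeoJSON) and returns [minLon, maxLat, maxLon, minLat].
final class ExtentFromGeometry {
    private ExtentFromGeometry() {}

    static double[] extent(List<double[]> vertices) {
        if (vertices.isEmpty()) {
            throw new IllegalArgumentException("geometry has no vertices");
        }
        double minLon = Double.POSITIVE_INFINITY, maxLon = Double.NEGATIVE_INFINITY;
        double minLat = Double.POSITIVE_INFINITY, maxLat = Double.NEGATIVE_INFINITY;
        for (double[] v : vertices) {
            minLon = Math.min(minLon, v[0]);
            maxLon = Math.max(maxLon, v[0]);
            minLat = Math.min(minLat, v[1]);
            maxLat = Math.max(maxLat, v[1]);
        }
        return new double[] {minLon, maxLat, maxLon, minLat};
    }
}
```

The centroid stays a separately stored value; it is not computed from this box.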

Two other considerations come to mind, but they are easily deferred to follow-up PRs:

  • Once we have full geometries, we'd want to use them for reverse lookup. See Discussion for PR to improve accuracy of reverse geocoding #357.
  • It might be worth slightly simplifying the geometries before importing them, or at least making that an option. Nominatim always keeps the original OSM geometries, which sometimes have a lot more support points than necessary. Simplification might help to further reduce the database size.
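One common technique for such a simplification is the Douglas-Peucker algorithm, which PostGIS also uses in ST_Simplify; whether Photon should simplify in Java or already in the SQL query is left open here. The sketch below treats coordinates as planar [lon, lat] pairs with a tolerance in degrees, an assumption that ignores distortion at high latitudes, and Simplify.douglasPeucker is a hypothetical helper. For polygon rings, a real implementation would also have to keep the rings valid.

```java
import java.util.ArrayList;
import java.util.List;

// Douglas-Peucker line simplification (sketch). Coordinates are planar [x, y] pairs and the
// tolerance uses the same units; the end points of the line are always kept.
final class Simplify {
    private Simplify() {}

    static List<double[]> douglasPeucker(List<double[]> points, double tolerance) {
        if (points.size() < 3) {
            return new ArrayList<>(points);
        }
        boolean[] keep = new boolean[points.size()];
        keep[0] = true;
        keep[points.size() - 1] = true;
        mark(points, 0, points.size() - 1, tolerance, keep);
        List<double[]> result = new ArrayList<>();
        for (int i = 0; i < points.size(); i++) {
            if (keep[i]) result.add(points.get(i));
        }
        return result;
    }

    // Keeps the point farthest from segment first-last if it exceeds the tolerance, then recurses.
    private static void mark(List<double[]> pts, int first, int last, double tol, boolean[] keep) {
        if (last <= first + 1) return;
        double maxDist = -1;
        int index = -1;
        for (int i = first + 1; i < last; i++) {
            double d = segmentDistance(pts.get(i), pts.get(first), pts.get(last));
            if (d > maxDist) {
                maxDist = d;
                index = i;
            }
        }
        if (maxDist > tol) {
            keep[index] = true;
            mark(pts, first, index, tol, keep);
            mark(pts, index, last, tol, keep);
        }
    }

    // Distance from p to the segment a-b; falls back to point distance when a equals b (closed rings).
    private static double segmentDistance(double[] p, double[] a, double[] b) {
        double dx = b[0] - a[0], dy = b[1] - a[1];
        double lenSq = dx * dx + dy * dy;
        if (lenSq == 0) return Math.hypot(p[0] - a[0], p[1] - a[1]);
        double t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / lenSq;
        t = Math.max(0, Math.min(1, t));
        return Math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy));
    }
}
```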

@red-fenix (Author) commented Jul 17, 2024

I haven't done a full review yet but I do have some general thoughts on the implementation:

Thanks. I will update the PR with these changes soon.

One question though about "Elasticsearch is on its way out": I've been planning to update the Elasticsearch client to the Java API client, so that you can use an existing Elasticsearch cluster instead of the internal one (newer versions of Elasticsearch don't support the Transport client). Is that planned PR still a good idea?

  • As long as the geometry isn't used for lookup, indexes should be disabled on the new field to save a bit of disk space.

I will include this as well.

  • This would be the second optional database feature after add support for structured queries (opensearch only) #815. Before we become too prolific with command-line arguments, I'd lean towards adding a single parameter -extra-db-features which takes a list of features (right now: structured, geometries).

Agreed
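A possible shape for parsing the proposed option is sketched below. The option name -extra-db-features comes from the suggestion above; DbFeature and ExtraDbFeatures are hypothetical names, and rejecting unknown feature names instead of ignoring them is a design assumption. The resulting set is what would then be saved in the property table.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical feature list for a value such as "structured,geometries".
enum DbFeature { STRUCTURED, GEOMETRIES }

final class ExtraDbFeatures {
    private ExtraDbFeatures() {}

    static Set<DbFeature> parse(String value) {
        EnumSet<DbFeature> features = EnumSet.noneOf(DbFeature.class);
        if (value == null || value.isBlank()) {
            return features; // no extra features requested
        }
        for (String raw : value.split(",")) {
            String name = raw.trim().toUpperCase(Locale.ROOT);
            if (name.isEmpty()) continue; // tolerate "a,,b" and trailing commas
            try {
                features.add(DbFeature.valueOf(name));
            } catch (IllegalArgumentException e) {
                // Fail the import early rather than silently skipping a typo.
                throw new IllegalArgumentException("Unknown database feature: " + raw.trim(), e);
            }
        }
        return features;
    }
}
```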

  • I agree on the introduction of the polygon parameter but would prefer to disable it when the extended geometries are not available in the database. Otherwise there will be an endless stream of bug reports on photon.komoot.io, why the results return a point instead of polygon. You can save the state of the feature in the property table and load it from there on start, see add support for structured queries (opensearch only) #815 for an example on how to do it. This property also comes in handy during updates of the database. It would be useful to have exactly the same behavior as on import then.

OK. By default, the polygon will be returned when it is available in the index; the option polygon=false returns the centroid instead.
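The default described here can be written as a small decision function. This is a sketch with hypothetical names; it assumes that a request with polygon=true against a database without geometries has already been rejected, as requested in the review.

```java
// Hypothetical names throughout; this only models the default behaviour described above.
enum OutputGeometry { FULL_GEOMETRY, CENTROID }

final class GeometryChoice {
    private GeometryChoice() {}

    // geometriesImported: feature flag, e.g. loaded from the property table at startup
    // polygonParam: the parsed "polygon" parameter, or null when the request omits it
    // hasGeometry: whether this particular result has a stored geometry
    static OutputGeometry choose(boolean geometriesImported, Boolean polygonParam, boolean hasGeometry) {
        boolean wantFull = polygonParam == null || polygonParam; // default: full geometry
        if (geometriesImported && wantFull && hasGeometry) {
            return OutputGeometry.FULL_GEOMETRY;
        }
        return OutputGeometry.CENTROID; // polygon=false, feature disabled, or no geometry stored
    }
}
```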

  • This needs some tests for the import.

Will do.

Two other considerations come to mind, but they are easily deferred to follow-up PRs:

I have to look into this issue.

  • It might be worth slightly simplifying the geometries before importing them, or at least making that an option. Nominatim always keeps the original OSM geometries, which sometimes have a lot more support points than necessary. Simplification might help to further reduce the database size.

I will look into this.
Another related issue: when someone searches for a street, Nominatim (and thus Photon) returns the street in parts, because the parts are separate OSM ways with their own IDs. I'm still looking for a method to merge multiple ways (i.e. linestrings) into one linestring, so that the whole street is displayed instead of just one part (example).
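One possible approach is a greedy merge of ways that share end points, sketched below as an illustration rather than a settled design. It assumes coordinates are [lon, lat] pairs and that connected ways have bit-identical end points (which should hold when they share an OSM node); LineMerge is a hypothetical name. On the database side, PostGIS offers ST_LineMerge for the same purpose.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

// Sketch: greedily joins ways whose end points are exactly equal, reversing a way when its
// direction does not match. Disconnected groups are returned as separate lines.
final class LineMerge {
    private LineMerge() {}

    static List<List<double[]>> merge(List<List<double[]>> ways) {
        List<List<double[]>> remaining = new ArrayList<>();
        for (List<double[]> w : ways) {
            if (!w.isEmpty()) remaining.add(w); // ignore empty geometries
        }
        List<List<double[]>> merged = new ArrayList<>();
        while (!remaining.isEmpty()) {
            LinkedList<double[]> line = new LinkedList<>(remaining.remove(0));
            boolean extended = true;
            while (extended) {
                extended = false;
                for (int i = 0; i < remaining.size() && !extended; i++) {
                    List<double[]> w = remaining.get(i);
                    double[] start = w.get(0);
                    double[] end = w.get(w.size() - 1);
                    if (same(line.getLast(), start)) {
                        line.addAll(w.subList(1, w.size())); // append as-is
                    } else if (same(line.getLast(), end)) {
                        List<double[]> rev = new ArrayList<>(w);
                        Collections.reverse(rev);
                        line.addAll(rev.subList(1, rev.size())); // append reversed
                    } else if (same(line.getFirst(), end)) {
                        for (int j = w.size() - 2; j >= 0; j--) line.addFirst(w.get(j)); // prepend as-is
                    } else if (same(line.getFirst(), start)) {
                        for (int j = 1; j < w.size(); j++) line.addFirst(w.get(j)); // prepend reversed
                    } else {
                        continue; // not connected to the current line
                    }
                    remaining.remove(i);
                    extended = true;
                }
            }
            merged.add(new ArrayList<>(line));
        }
        return merged;
    }

    private static boolean same(double[] a, double[] b) {
        return Arrays.equals(a, b);
    }
}
```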

@lonvia (Collaborator) commented Jul 21, 2024

One question though about "Elasticsearch is on its way out": I've been planning to update the Elasticsearch client to the Java API client, so that you can use an existing Elasticsearch cluster instead of the internal one (newer versions of Elasticsearch don't support the Transport client). Is that planned PR still a good idea?

We'll drop ES support completely and go with OpenSearch. Note that the OS version already supports an external OpenSearch cluster. The support is just somewhat rudimentary and HTTP-only.

@karussell (Collaborator) commented:

Would this PR return a MultiPolygon for queries like "hamburg"? See the Nominatim response.

Currently Photon returns a slightly confusing but correct extent, because only a single extent is allowed: https://photon.komoot.io/api/?q=hamburg. Or maybe we should add a new field, extents, to the response?

@red-fenix (Author) commented:

Would this PR return a MultiPolygon for queries like "hamburg"? See the Nominatim response.

Currently Photon returns a slightly confusing but correct extent, because only a single extent is allowed: https://photon.komoot.io/api/?q=hamburg. Or maybe we should add a new field, extents, to the response?

Yes, it would return a Polygon like Nominatim does.

@red-fenix (Author) commented Sep 5, 2024

I haven't done a full review yet but I do have some general thoughts on the implementation:

Done

  • I agree on the introduction of the polygon parameter but would prefer to disable it when the extended geometries are not available in the database. Otherwise there will be an endless stream of bug reports on photon.komoot.io, why the results return a point instead of polygon. You can save the state of the feature in the property table and load it from there on start, see add support for structured queries (opensearch only) #815 for an example on how to do it. This property also comes in handy during updates of the database. It would be useful to have exactly the same behaviour as on import then.

Done

  • This needs some tests for the import.

Done

I'm looking forward to your response :)

@lonvia (Collaborator) commented Oct 29, 2024

Sorry for the long delay in responding.

I've decided to get a release out first before looking into this again. Because of this, there is now a minor merge conflict regarding the dependencies. Also, the OpenSearch variant doesn't compile because setUpESWithPolygons() is only defined for the Elasticsearch variant. Could you quickly fix both issues?

@lonvia (Collaborator) left a comment

A couple more comments after trying this out. The OpenSearch version currently doesn't work at all; it already fails during import.

The output of the Elasticsearch version includes a wrong projection:

...
"geometry": {
  "coordinates":[9.5227103,47.1395576],
  "type":"Point",
  "crs":{"type":"name","properties":{"name":"EPSG:0"}}
}
...

Photon doesn't supply a CRS at all with the geometry. I'd leave it at that.
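For comparison, a point serialized without any crs member is sketched below. RFC 7946 removed crs from GeoJSON and fixes coordinates to WGS 84 longitude/latitude, so leaving it out is standard-conformant. PointJson is a hypothetical helper, not the formatter Photon actually uses.

```java
// Hypothetical helper: a GeoJSON Point with only "type" and "coordinates", no "crs" member.
final class PointJson {
    private PointJson() {}

    static String point(double lon, double lat) {
        // Double.toString is locale-independent, so the decimal separator is always '.'.
        return "{\"type\":\"Point\",\"coordinates\":[" + Double.toString(lon) + "," + Double.toString(lat) + "]}";
    }
}
```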

}

if (!supportPolygons && (photonRequest.isPolygonRequest() && photonRequest.getReturnPolygon())) {
throw new BadRequestException(400, "You're requesting a polygon, but polygons are not imported!");
Collaborator

You need to call halt at this point and make sure the content is JSON-formatted; see line 43 above.
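A JSON-formatted error body could be produced with a small helper like the one below. The {"message": ...} shape and the name JsonError are assumptions; assuming the handler uses Spark, the string would then be passed to halt(400, body) together with a JSON content type.

```java
// Hypothetical helper: wraps an error message in a minimal JSON object with proper escaping.
final class JsonError {
    private JsonError() {}

    static String body(String message) {
        StringBuilder sb = new StringBuilder("{\"message\":\"");
        for (char c : message.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Other control characters must be escaped as \\uXXXX in JSON strings.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append("\"}").toString();
    }
}
```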

returnPolygon = photonRequest.getReturnPolygon();
}

if (!supportPolygons && (photonRequest.isPolygonRequest() && photonRequest.getReturnPolygon())) {
Collaborator

This check comes too late. Parameters need to be tested and rejected before the actual search is done (around line 45).
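The requested ordering can be sketched by passing the search in as a Supplier, so it only runs once validation has succeeded. PolygonRequestCheck and its parameters are hypothetical; the error message is the one from the diff.

```java
import java.util.function.Supplier;

// Sketch: validate first, search second. The search backend is never queried for a rejected request.
final class PolygonRequestCheck {
    private PolygonRequestCheck() {}

    static <T> T handle(boolean supportPolygons, Boolean polygonParam, Supplier<T> search) {
        if (!supportPolygons && Boolean.TRUE.equals(polygonParam)) {
            throw new IllegalArgumentException("You're requesting a polygon, but polygons are not imported!");
        }
        return search.get(); // only reached for valid requests
    }
}
```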

@@ -51,6 +51,6 @@ public String handle(Request request, Response response) {
debugInfo = requestHandler.dumpQuery(photonRequest);
}
*/
return new GeocodeJsonFormatter(photonRequest.getDebug(), photonRequest.getLanguage()).convert(results, debugInfo);
Collaborator

I'd expect the same code here as in SearchRequestHandler (checking whether polygons are available, etc.).

@@ -53,6 +53,6 @@ public String handle(Request request, Response response) {
debugInfo = requestHandler.dumpQuery(photonRequest);
}

return new GeocodeJsonFormatter(false, photonRequest.getLanguage()).convert(results, debugInfo);
return new GeocodeJsonFormatter(false, photonRequest.getLanguage(), photonRequest.getPolygon()).convert(results, debugInfo);
Collaborator

I'd expect the same code here as in SearchRequestHandler (checking whether polygons are available, etc.).

@@ -24,7 +24,7 @@
public class NominatimConnector {
private static final Logger LOGGER = org.slf4j.LoggerFactory.getLogger(NominatimConnector.class);

private static final String SELECT_COLS_PLACEX = "SELECT place_id, osm_type, osm_id, class, type, name, postcode, address, extratags, ST_Envelope(geometry) AS bbox, parent_place_id, linked_place_id, rank_address, rank_search, importance, country_code, centroid";
private static final String SELECT_COLS_PLACEX = "SELECT place_id, osm_type, osm_id, class, type, name, postcode, address, extratags, ST_Envelope(geometry) AS bbox, parent_place_id, linked_place_id, rank_address, rank_search, importance, country_code, centroid, geometry";
Collaborator

Geometries can become very large, so the geometry column should only be added to the selected columns when it is really needed.
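One way to follow this is to append the column conditionally. The column list below is copied from the diff; PlacexColumns and the importGeometries flag are hypothetical names.

```java
// Sketch: only select the (potentially very large) geometry column when full geometries are imported.
final class PlacexColumns {
    private PlacexColumns() {}

    private static final String BASE_COLS = "SELECT place_id, osm_type, osm_id, class, type, name, postcode, address, extratags, ST_Envelope(geometry) AS bbox, parent_place_id, linked_place_id, rank_address, rank_search, importance, country_code, centroid";

    static String selectCols(boolean importGeometries) {
        return importGeometries ? BASE_COLS + ", geometry" : BASE_COLS;
    }
}
```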


if (webRequest.queryParams("polygon") != null) {
request.setPolygonRequest(true);
request.setReturnPolygon(Boolean.parseBoolean(webRequest.queryParams("polygon")));
Collaborator

We'd need a full parsing function here that throws on unknown values. The question is whether we should also allow 0/1 and yes/no besides true/false.
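A strict parser along these lines is sketched below. Accepting 1/0 and yes/no is exactly the open question, so including them here is an assumption; BooleanParam is a hypothetical name. Unlike Boolean.parseBoolean, which maps every value other than "true" (ignoring case) to false, it rejects unknown values.

```java
import java.util.Locale;

// Sketch of a strict boolean parser for query parameters such as "polygon".
final class BooleanParam {
    private BooleanParam() {}

    static boolean parse(String name, String value) {
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "true": case "1": case "yes":
                return true;
            case "false": case "0": case "no":
                return false;
            default:
                // Reject instead of silently treating the value as false.
                throw new IllegalArgumentException("Invalid value for parameter '" + name + "': " + value);
        }
    }
}
```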
