Description
Currently, "Tags" are embedded as sub documents of Datasets, like:

    {
        "uid": "ee600210432b8f81ad229c33",
        "type": "file",
        "uri": "/a/b/c.tiff",
        "tags": [
            {
                "name": "label",
                "locator": { "min": 1, "max": 2 },
                "value": "rods",
                "confidence": 0.900,
                "event_id": "wwewere6002104rwerwe81ad229c33"
            },
            {
                "name": "label",
                "value": "peaks",
                "locator": { "min": 1, "max": 2 },
                "confidence": 0.001,
                "event_id": "wwewere6002104rwerwe81ad229c33"
            },
            {
                "name": "geometry",
                "value": "GISAXS",
                "locator": { "min": 1, "max": 2 },
                "confidence": 1.0,
                "event_id": "wwewere6002104rwerwe81ad229c33"
            }
        ]
    }

In this design, the Dataset is the primary document.
There has been a lot of discussion about potentially switching to a model where the primary collection is Tag, with each tag holding a key for the dataset that it was applied to, more like:

    {
        "name": "geometry",
        "value": "GISAXS",
        "locator": { "min": 1, "max": 2 },
        "confidence": 1.0,
        "event_id": "wwewere6002104rwerwe81ad229c33",
        "dataset": {
            "type": "file",
            "uri": "/a/b/c.tiff"
        }
    }

With Dataset as the primary structure, searches for all datasets and all of their tags should be faster.
With Tags as the primary structure, searches for individual or multiple tags should be faster, as they would not have to return all of the payload of tags that were not queried.
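To make the trade-off concrete, here is a minimal in-memory sketch in plain Python, with no database involved. The helper names `flatten_tags`, `find_tags_embedded`, and `find_tags_flat` are hypothetical, and the sample document is a trimmed version of the one above. It only illustrates the shape of what each model returns, not actual query performance.

```python
from typing import Any

Doc = dict[str, Any]

# Dataset-primary model: tags embedded in the dataset (trimmed sample data).
datasets: list[Doc] = [
    {
        "uid": "ee600210432b8f81ad229c33",
        "type": "file",
        "uri": "/a/b/c.tiff",
        "tags": [
            {"name": "label", "value": "rods", "confidence": 0.9},
            {"name": "label", "value": "peaks", "confidence": 0.001},
            {"name": "geometry", "value": "GISAXS", "confidence": 1.0},
        ],
    }
]


def flatten_tags(docs: list[Doc]) -> list[Doc]:
    """Convert dataset-primary documents into tag-primary documents."""
    return [
        {**tag, "dataset": {"type": d["type"], "uri": d["uri"]}}
        for d in docs
        for tag in d.get("tags", [])
    ]


def find_tags_embedded(docs: list[Doc], name: str) -> list[Doc]:
    """Dataset-primary lookup: each matching dataset comes back whole,
    including the tags that were not asked for."""
    return [d for d in docs if any(t["name"] == name for t in d.get("tags", []))]


def find_tags_flat(tag_docs: list[Doc], name: str) -> list[Doc]:
    """Tag-primary lookup: only the matching tags come back."""
    return [t for t in tag_docs if t["name"] == name]


tag_docs = flatten_tags(datasets)
embedded_hits = find_tags_embedded(datasets, "geometry")
flat_hits = find_tags_flat(tag_docs, "geometry")
```

In this sketch, the dataset-primary lookup for `"geometry"` returns one dataset that still carries all three tags, while the tag-primary lookup returns just the single matching tag with its dataset reference.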
I was pretty convinced that we want to switch to the Tags collection method, but after talking with @taxe10, it seems the LabelMaker could make better use of the existing design. When the LabelMaker loads, it queries for all tags for multiple datasets, which matches the current structure pretty well.
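For comparison, here is a rough sketch of what a LabelMaker-style load might need if Tags were the primary collection: the tags would have to be regrouped by dataset on the client or in the query. The function `tags_by_dataset` and the sample data are hypothetical; how the LabelMaker actually issues its queries is not shown in this issue.

```python
from collections import defaultdict
from typing import Any

Doc = dict[str, Any]

# Tag-primary sample documents (hypothetical, trimmed).
sample_tags: list[Doc] = [
    {"name": "label", "value": "rods", "dataset": {"type": "file", "uri": "/a/b/c.tiff"}},
    {"name": "geometry", "value": "GISAXS", "dataset": {"type": "file", "uri": "/a/b/c.tiff"}},
    {"name": "label", "value": "peaks", "dataset": {"type": "file", "uri": "/a/b/d.tiff"}},
]


def tags_by_dataset(tag_docs: list[Doc], uris: list[str]) -> dict[str, list[Doc]]:
    """Group tag-primary documents by dataset URI for the requested datasets."""
    wanted = set(uris)
    grouped: dict[str, list[Doc]] = defaultdict(list)
    for tag in tag_docs:
        uri = tag["dataset"]["uri"]
        if uri in wanted:
            # Drop the back-reference; the grouping key already carries it.
            grouped[uri].append({k: v for k, v in tag.items() if k != "dataset"})
    return dict(grouped)


loaded = tags_by_dataset(sample_tags, ["/a/b/c.tiff"])
```

With the current embedded design, this regrouping step is unnecessary because each dataset document already holds its tags.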
So the question for me now is: can we think of compelling use cases where we want to search on a subset of known tags and receive all of their instances, along with the datasets that they relate to? If not, let's leave the collections as they are.