Splash-ML collection redesign

Currently, `Tags` are embedded as subdocuments of `Datasets`, like:

```json
{
    "uid": "ee600210432b8f81ad229c33",
    "type": "file",
    "uri": "/a/b/c.tiff",
    "tags": [
        {
            "name": "label",
            "locator": { "min": 1, "max": 2 },
            "value": "rods",
            "confidence": 0.900,
            "event_id": "wwewere6002104rwerwe81ad229c33"
        },
        {
            "name": "label",
            "value": "peaks",
            "locator": { "min": 1, "max": 2 },
            "confidence": 0.001,
            "event_id": "wwewere6002104rwerwe81ad229c33"
        },
        {
            "name": "geometry",
            "value": "GISAXS",
            "locator": { "min": 1, "max": 2 },
            "confidence": 1.0,
            "event_id": "wwewere6002104rwerwe81ad229c33"
        }
    ]
}
```

In this layout the dataset is the primary document, and tags exist only inside it.

There has been a lot of discussion about potentially switching to a model where the primary collection is `Tag`, with each tag holding a key for the dataset that it was applied to, more like:

```json
{
  "name": "geometry",
  "value": "GISAXS",
  "locator": { "min": 1, "max": 2 },
  "confidence": 1.0,
  "event_id": "wwewere6002104rwerwe81ad229c33",
  "dataset": {
    "type": "file",
    "uri": "/a/b/c.tiff"
  }
}
```
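To make the relationship between the two shapes concrete, here is a rough sketch of how one embedded dataset document could be flattened into tag-primary documents. `flatten_dataset` is a hypothetical helper, not part of Splash-ML, and copying the whole dataset reference (including `uid`) into each tag is an assumption; the example above only carries `type` and `uri`.

```python
from copy import deepcopy


def flatten_dataset(dataset: dict) -> list[dict]:
    """Turn one dataset document with embedded tags into a list of tag documents.

    Hypothetical helper for illustration only.
    """
    # Everything except the embedded tag list describes the dataset itself.
    dataset_ref = {key: value for key, value in dataset.items() if key != "tags"}
    flat = []
    for tag in dataset.get("tags", []):
        tag_doc = deepcopy(tag)  # leave the original document untouched
        # Assumption: each tag carries its own copy of the dataset reference.
        tag_doc["dataset"] = dict(dataset_ref)
        flat.append(tag_doc)
    return flat


dataset = {
    "uid": "ee600210432b8f81ad229c33",
    "type": "file",
    "uri": "/a/b/c.tiff",
    "tags": [
        {"name": "label", "value": "rods",
         "locator": {"min": 1, "max": 2}, "confidence": 0.9},
        {"name": "geometry", "value": "GISAXS",
         "locator": {"min": 1, "max": 2}, "confidence": 1.0},
    ],
}

tag_docs = flatten_dataset(dataset)
```

One thing this makes visible: the dataset fields get repeated once per tag, so the tag-primary layout trades some duplication for simpler per-tag queries.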

With `Dataset` as the primary structure, searches that return datasets together with all of their tags should be faster, since each dataset and its tags come back as a single document.

With `Tags` as the primary structure, searches for individual or multiple tags should be faster, as they would not have to return all of the payload of tags that were not queried.
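As a sketch of what that difference could look like in practice, the two functions below build the query for "all `geometry` = `GISAXS` tags" under each layout. This assumes the collections live in MongoDB (the issue does not say so explicitly), and the function names are made up for illustration. The operators used (`$match`, `$elemMatch`, `$project`, `$filter`) are standard MongoDB ones.

```python
def embedded_tag_pipeline(name: str, value: str) -> list[dict]:
    """Aggregation pipeline for the current layout (tags embedded in datasets)."""
    return [
        # Keep only datasets that carry at least one matching tag.
        {"$match": {"tags": {"$elemMatch": {"name": name, "value": value}}}},
        # Trim the embedded array so non-matching tags are not returned.
        {"$project": {
            "type": 1,
            "uri": 1,
            "tags": {"$filter": {
                "input": "$tags",
                "as": "tag",
                "cond": {"$and": [
                    {"$eq": ["$$tag.name", name]},
                    {"$eq": ["$$tag.value", value]},
                ]},
            }},
        }},
    ]


def tag_collection_filter(name: str, value: str) -> dict:
    """Plain find() filter for the proposed layout (one document per tag)."""
    return {"name": name, "value": value}


# Hypothetical usage with pymongo; the collection names are assumptions:
#   db.datasets.aggregate(embedded_tag_pipeline("geometry", "GISAXS"))
#   db.tags.find(tag_collection_filter("geometry", "GISAXS"))
pipeline = embedded_tag_pipeline("geometry", "GISAXS")
tag_filter = tag_collection_filter("geometry", "GISAXS")
```

So the embedded layout can avoid returning unqueried tags, but only by paying for an extra projection stage; with a `Tag` collection the same request is a flat filter.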

I was pretty convinced that we want to switch to the `Tags` collection method, but after talking with @taxe10, it looks like the `LabelMaker` is better served by the existing design. When the `LabelMaker` loads, it queries for all tags for multiple datasets, which matches the current structure pretty well.
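For comparison, here is roughly what the `LabelMaker` load would have to do if tags lived in their own collection: fetch the flat tag documents and regroup them per dataset. This is only a sketch; `group_tags_by_dataset` is a hypothetical helper, grouping on `dataset.uri` is an assumption, and `/a/b/d.tiff` is an invented second dataset.

```python
from collections import defaultdict


def group_tags_by_dataset(tag_docs: list[dict]) -> dict[str, list[dict]]:
    """Regroup flat tag documents into {dataset uri: [tags]}."""
    grouped = defaultdict(list)
    for tag_doc in tag_docs:
        uri = tag_doc["dataset"]["uri"]
        # Drop the dataset reference; the uri is already the grouping key.
        tag = {key: value for key, value in tag_doc.items() if key != "dataset"}
        grouped[uri].append(tag)
    return dict(grouped)


tag_docs = [
    {"name": "label", "value": "rods", "confidence": 0.9,
     "dataset": {"type": "file", "uri": "/a/b/c.tiff"}},
    {"name": "geometry", "value": "GISAXS", "confidence": 1.0,
     "dataset": {"type": "file", "uri": "/a/b/c.tiff"}},
    {"name": "label", "value": "peaks", "confidence": 0.001,
     "dataset": {"type": "file", "uri": "/a/b/d.tiff"}},  # invented dataset
]

by_dataset = group_tags_by_dataset(tag_docs)
```

With the current embedded layout this regrouping step is unnecessary, because each dataset document already arrives with its tags attached.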

So the question for me now is: can we think of compelling use cases where we want to search on a subset of known tags and receive all instances of them, along with the datasets they relate to? If not, let's leave the collections as they are.


