Best way to model four different kinds of metrics in biolink ML

There are four different kinds of metrics we need to represent here:

1. Simple (`metric: "value"`); see `axiom_count` in the example below.
2. Simple list (`metric: ["val1", "val2"]`); see `axiom_types` in the example below.
3. Constrained map (`metric: {"feature1": "value", "feature2": "value"}`); see `axiom_type_count` in the example below.
4. Open map (`metric: {"string1": "value", "string2": "value"}`); see `namespace_axiom_count` in the example below.

The goal of this biolink ML modelling exercise is to generate a JSON Schema that can check a metrics document against schema constraints (datatypes etc.), and also to provide readable documentation of what the metrics mean. The JSON-LD context could potentially be used more widely to communicate metrics between groups.

```json
{
  "metrics": {
    "axiom_count": 5504,
    "axiom_types": [
      "AnnotationAssertion",
      "EquivalentClasses",
      "TransitiveObjectProperty",
      "SubObjectPropertyOf",
      "SymmetricObjectProperty",
      "SubPropertyChainOf",
      "Declaration",
      "SubClassOf",
      "InverseObjectProperties"
    ],
    "axiom_type_count": {
      "AnnotationAssertion": 4356,
      "EquivalentClasses": 106,
      "TransitiveObjectProperty": 12,
      "SubObjectPropertyOf": 25,
      "SymmetricObjectProperty": 1,
      "SubPropertyChainOf": 11,
      "Declaration": 320,
      "SubClassOf": 666,
      "InverseObjectProperties": 7
    },
    "namespace_axiom_count": {
      "oboInOwl": 4819,
      "IAO": 306,
      "UBERON": 1903,
      "rdfs": 308,
      "BFO": 283,
      "obo": 235,
      "RO": 307,
      "foaf": 56,
      "BSPO": 29
    }
  }
}
```
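To make the target concrete, here is a rough sketch of what a JSON Schema covering the four shapes could look like, written as a Python dict together with a tiny checker. This is an assumption about the desired output, not something biolink ML generated: the names `METRICS_SCHEMA` and `errors` are hypothetical, and the checker only understands the handful of JSON Schema keywords used here (`type`, `items`, `properties`, `additionalProperties`). The point is that the constrained map and the open map differ only in how `additionalProperties` is set.

```python
# Hypothetical hand-written target schema for the "metrics" object above.
# It is NOT output of biolink ML; it only illustrates the four shapes.
COUNT = {"type": "integer"}

AXIOM_TYPES = [
    "AnnotationAssertion", "EquivalentClasses", "TransitiveObjectProperty",
    "SubObjectPropertyOf", "SymmetricObjectProperty", "SubPropertyChainOf",
    "Declaration", "SubClassOf", "InverseObjectProperties",
]

METRICS_SCHEMA = {
    "type": "object",
    "properties": {
        # 1. Simple value
        "axiom_count": COUNT,
        # 2. Simple list
        "axiom_types": {"type": "array", "items": {"type": "string"}},
        # 3. Constrained map: fixed keys, nothing else allowed
        "axiom_type_count": {
            "type": "object",
            "properties": {name: COUNT for name in AXIOM_TYPES},
            "additionalProperties": False,
        },
        # 4. Open map: any key, but every value must be a count
        "namespace_axiom_count": {
            "type": "object",
            "additionalProperties": COUNT,
        },
    },
    "additionalProperties": False,
}

_PY_TYPES = {"integer": int, "string": str, "array": list, "object": dict}


def errors(value, schema, path="metrics"):
    """Check value against the small JSON Schema subset used above."""
    expected = _PY_TYPES[schema["type"]]
    # bool is a subclass of int in Python, so reject it explicitly for integers.
    if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
        return [f"{path}: expected {schema['type']}"]
    found = []
    if expected is list and "items" in schema:
        for i, item in enumerate(value):
            found += errors(item, schema["items"], f"{path}[{i}]")
    if expected is dict:
        props = schema.get("properties", {})
        extra = schema.get("additionalProperties", True)
        for key, item in value.items():
            if key in props:
                found += errors(item, props[key], f"{path}.{key}")
            elif extra is False:
                found.append(f"{path}.{key}: unexpected key")
            elif isinstance(extra, dict):
                found += errors(item, extra, f"{path}.{key}")
    return found
```

Calling `errors(doc["metrics"], METRICS_SCHEMA)` on the example document should return an empty list. For real use, the dict could be serialised with `json.dumps` and handed to a full JSON Schema validator instead of this minimal checker.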

My first attempt at a biolink ML model for these metrics looks something like this:

```yaml
id: http://www.obofoundry.org/registry/metrics.yml
name: metrics

types:
  mean:
    base: float
    uri: xsd:float
  count:
    base: int
    uri: xsd:int
  string:
    base: str
    uri: xsd:string
  boolean:
    base: boolean
    uri: xsd:boolean

classes:

  metrics:
    slots:
        - axiom_count
        - axiom_types
        
  axiom_type_count:
    description: Counting the various axiom types used in the ontology.
    slots:
        - AnnotationAssertion
        - EquivalentClasses
        - TransitiveObjectProperty
        - SubObjectPropertyOf
        - SymmetricObjectProperty
        - SubPropertyChainOf
        - Declaration
        - SubClassOf
        - InverseObjectProperties
  
  namespace_axiom_count:
    description: The number of axioms used by this ontology, broken down by which namespaces they reference (according to the OBO curiemap). For example, 19 axioms reference at least 1 entity in the BFO namespace.

slots:
  axiom_count:
    description: The number of axioms in the ontology.
    range: count
  axiom_types:
    description: A list of axiom types used in the ontology.
    multivalued: true
    
  AnnotationAssertion:
    range: count
  EquivalentClasses:
    range: count
  TransitiveObjectProperty:
    range: count
  SubObjectPropertyOf:
    range: count
  SymmetricObjectProperty:
    range: count
  SubPropertyChainOf:
    range: count
  Declaration:
    range: count
  SubClassOf:
    range: count
  InverseObjectProperties:
    range: count
```

@cmungall
@deepakunni3 has already given me some advice on how to approach this use case, which is admittedly a bit non-standard. First, I find it unsatisfying that some metrics are slots while others are classes. Second, I don't know exactly how to model the `namespace_axiom_count` case, because its set of keys is open. Deepak recommended key/value modelling, but it seems unsatisfactory to bend a perfectly fine JSON structure just to fit a modelling framework. What are your thoughts on this?
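For reference, this is roughly what I understand the key/value reshaping to mean for the open map. It is only a sketch under my own assumptions: the helper names `to_key_value` and `from_key_value` and the field names `namespace` and `axiom_count` are hypothetical, not something biolink ML prescribes. It shows that the conversion is mechanical and lossless, but also that the JSON document itself would have to change shape.

```python
def to_key_value(open_map, key_name="namespace", value_name="axiom_count"):
    """Turn {"BFO": 283, ...} into [{"namespace": "BFO", "axiom_count": 283}, ...]."""
    return [{key_name: key, value_name: value} for key, value in open_map.items()]


def from_key_value(entries, key_name="namespace", value_name="axiom_count"):
    """Inverse of to_key_value: rebuild the original open map."""
    return {entry[key_name]: entry[value_name] for entry in entries}


namespace_axiom_count = {"oboInOwl": 4819, "IAO": 306, "UBERON": 1903}
as_entries = to_key_value(namespace_axiom_count)
```

Each element of `as_entries` has a fixed pair of keys, so it can be described as an ordinary class with two slots, at the cost of a more verbose document.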
