Generic labels in metrics, usage in events and q/tx#816
amimart
left a comment
Hey, I've left some remarks.
My main concern is about metrics: labels that can carry a lot of different values can be dangerous for the node's memory. I even think it is a bad idea to have topic ids in metrics.
If we really need such labels, I think we should consider implementing a metrics exporter that would expose them by watching the chain's state.
    labels := map[string]string{
        "topic_id": strconv.FormatUint(msg.TopicId, 10),
        "address": msg.Sender,
We should avoid labels with high cardinality. I think setting the address as a label would create too many metric entries, which can impact node performance.
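To make the cardinality concern concrete: in a Prometheus-style metrics system, every distinct combination of label values becomes its own time series held in memory. The sketch below only counts distinct combinations for a made-up workload (3 topics, 1000 addresses); `seriesCount` and the sample data are hypothetical and not part of the PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// seriesCount returns how many distinct label-value combinations appear in
// samples when only the given label keys are attached to the metric.
// Hypothetical helper for illustration; it is not part of the PR.
func seriesCount(samples []map[string]string, keys ...string) int {
	seen := make(map[string]struct{})
	for _, s := range samples {
		parts := make([]string, 0, len(keys))
		for _, k := range keys {
			parts = append(parts, k+"="+s[k])
		}
		seen[strings.Join(parts, ",")] = struct{}{}
	}
	return len(seen)
}

func main() {
	// Made-up workload: 3 topics, each seeing messages from 1000 addresses.
	var samples []map[string]string
	for topic := 0; topic < 3; topic++ {
		for addr := 0; addr < 1000; addr++ {
			samples = append(samples, map[string]string{
				"topic_id": strconv.Itoa(topic),
				"address":  "addr" + strconv.Itoa(addr),
			})
		}
	}
	fmt.Println("topic_id only:", seriesCount(samples, "topic_id"))
	fmt.Println("topic_id + address:", seriesCount(samples, "topic_id", "address"))
}
```

With this workload the count goes from 3 series to 3000 once the address is added, and it keeps growing with every new address that ever sends a message.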
    labels := map[string]string{
        "address": msg.ReputerValueBundle.ValueBundle.Reputer,
        "topic_id": strconv.FormatUint(msg.ReputerValueBundle.ValueBundle.TopicId, 10),
        "nonce": strconv.FormatUint(uint64(msg.ReputerValueBundle.ValueBundle.ReputerRequestNonce.ReputerNonce.BlockHeight), 10),
        "blockHeight": strconv.FormatInt(blockHeight, 10),
    }
    defer metrics.RecordMetrics("InsertReputerPayload", time.Now(), &err, labels)
I have multiple remarks here:
- The time.Now() being executed here would provide a wrong latency measurement;
- The address, nonce and blockHeight labels introduce very high cardinality, which can be dangerous.
I think we should have only one defer statement at the beginning, and if we need to evaluate variables at execution time we can use defer func() {} instead.
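For context, Go evaluates the arguments of a deferred call when the defer statement is reached, so a time.Now() written inside a defer placed mid-function misses the work done before it. A minimal sketch of the single-defer pattern suggested above follows; `recordMetrics`, `observation`, `recorded` and `insertPayload` are hypothetical stand-ins, since the real metrics.RecordMetrics implementation is not shown in this thread.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// observation is a hypothetical record of one metrics call.
type observation struct {
	name    string
	elapsed time.Duration
	failed  bool
	labels  map[string]string
}

// recorded is a hypothetical in-memory sink standing in for the metrics backend.
var recorded []observation

// recordMetrics mirrors the argument shape of the metrics.RecordMetrics calls
// in the diff; its behavior here is an assumption made for the example.
func recordMetrics(name string, start time.Time, err *error, labels map[string]string) {
	recorded = append(recorded, observation{
		name:    name,
		elapsed: time.Since(start),
		failed:  *err != nil,
		labels:  labels,
	})
}

// insertPayload registers a single defer before any work happens.
func insertPayload(topicID uint64) (err error) {
	start := time.Now() // taken at function entry, so all work below is measured
	labels := map[string]string{"outcome": "rejected"}
	defer func() {
		// The closure body runs at return time, so it sees the final err and labels.
		recordMetrics("InsertReputerPayload", start, &err, labels)
	}()

	time.Sleep(5 * time.Millisecond) // stand-in for GetParams and validation
	if topicID == 0 {
		return errors.New("invalid topic")
	}
	labels["outcome"] = "accepted"
	return nil
}

func main() {
	_ = insertPayload(1)
	_ = insertPayload(0)
	for _, o := range recorded {
		fmt.Println(o.name, o.failed, o.labels["outcome"], o.elapsed > 0)
	}
}
```

The `outcome` label is deliberately limited to two values here, in line with the cardinality remarks in this review.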
    moduleParams, err := ms.k.GetParams(ctx)
    if err != nil {
        defer metrics.RecordMetrics("InsertReputerPayload", time.Now(), &err, map[string]string{"error": err.Error()})
Having error messages as labels can create a lot of metric entries, and they are hard to use (e.g. error messages can contain values, producing multiple label values for the same underlying error). It's preferable to use error codes instead.
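One way to read this suggestion is to map each error to a small, fixed set of codes before it is used as a label. The sketch below assumes sentinel errors matched with errors.Is; `ErrTopicNotFound`, `ErrInvalidNonce` and `errorLabel` are hypothetical names, and the real module would rely on its own registered errors.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors used only for this example.
var (
	ErrTopicNotFound = errors.New("topic not found")
	ErrInvalidNonce  = errors.New("invalid nonce")
)

// errorLabel maps an error to one of a fixed set of label values, so the
// number of metric series does not depend on the text of the error message.
func errorLabel(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrTopicNotFound):
		return "topic_not_found"
	case errors.Is(err, ErrInvalidNonce):
		return "invalid_nonce"
	default:
		return "unknown"
	}
}

func main() {
	// Two different messages that wrap the same sentinel share one label value.
	a := fmt.Errorf("topic 12: %w", ErrTopicNotFound)
	b := fmt.Errorf("topic 9001: %w", ErrTopicNotFound)
	fmt.Println(errorLabel(a), errorLabel(b))
}
```

Using err.Error() as the label in the same situation would yield two distinct series, one per topic id embedded in the message.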
    labels := map[string]string{
        "worker_address": req.WorkerAddress,
        "topic_id": strconv.FormatUint(req.TopicId, 10),
    }
    defer metrics.RecordMetrics("GetWorkerLatestInferenceByTopicId", time.Now(), &err, labels)
Do we need labels on queries? I understand the need for labels when they relate to a state mutation, but I'm not sure they're relevant for queries.
    if labels == nil {
        labels = make(map[string]string)
    }
We can read from a nil map, so we can let it be nil.
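As a quick illustration of the Go behavior behind this remark: indexing, len and range all work on a nil map, while assigning to one panics. The nil check can therefore be dropped only if the function never writes to labels. `lookup` and `withLabel` below are hypothetical helpers, not code from the PR.

```go
package main

import "fmt"

// lookup reads from labels without checking for nil first; this is safe in Go.
func lookup(labels map[string]string, key string) (string, bool) {
	v, ok := labels[key]
	return v, ok
}

// withLabel writes to the map, so it has to allocate when labels is nil;
// assigning to a nil map would panic.
func withLabel(labels map[string]string, key, value string) map[string]string {
	if labels == nil {
		labels = make(map[string]string, 1)
	}
	labels[key] = value
	return labels
}

func main() {
	var labels map[string]string // nil map

	v, ok := lookup(labels, "topic_id")
	fmt.Printf("value=%q found=%t len=%d\n", v, ok, len(labels))

	for k := range labels { // ranging over a nil map runs zero iterations
		fmt.Println("unexpected key", k)
	}

	labels = withLabel(labels, "topic_id", "1")
	fmt.Println(labels["topic_id"])
}
```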
        return
    }
    metrics.IncrProducerEventCount(metrics.INFERER_SCORE_EVENT)
    metrics.IncrProducerEventCountWithLabels(metrics.INFERER_SCORE_EVENT, map[string]string{"topic_id": fmt.Sprintf("%d", topicId)})
It may be better to use strconv.FormatUint here, as is done in other places.
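For reference, both calls produce the same string for a uint64; strconv.FormatUint simply avoids the format-string parsing that fmt.Sprintf performs and matches the other call sites in the diff. The sketch assumes topicId is a uint64, as the other snippets suggest, and `topicLabel` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strconv"
)

// topicLabel formats a topic id as a base-10 string for use as a label value.
// Hypothetical helper; the PR inlines the conversion at each call site.
func topicLabel(topicID uint64) string {
	return strconv.FormatUint(topicID, 10)
}

func main() {
	var topicID uint64 = 42
	// Both conversions yield the same text.
	fmt.Println(topicLabel(topicID) == fmt.Sprintf("%d", topicID))
}
```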
Thanks for the review. Yeah, you're totally right about the high cardinality. I'm reviewing this. TopicId would be very useful, but the concern remains, and we may just want to get that from offchain data only, in which case this PR may simply be discarded.

It was decided to move away from adding cardinality to the node and to use offchain data for topicId-based alerts instead. Closing PR.