
Update InterRowMSAS to compare the overall distributions between row n and n+1 (not the average) #800

@npatki


Problem Description

The InterRowMSAS metric is designed to compare real and synthetic sequences in terms of their overall noisiness.

It does this by taking the difference between the values in rows $n$ and $n-1$ within each sequence. However, it then averages these differences within each sequence. The problem is that when we take this average, all the intermediate values cancel out, leaving only the first and last.

$(x_1 - x_0) + (x_2 - x_1) + (x_3 - x_2) + ... + (x_n - x_{n-1}) = x_n - x_0$, so the per-sequence average is $\frac{x_n - x_0}{n}$.
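To illustrate the cancellation, here is a minimal Python sketch. It assumes each sequence is a plain list of numbers with at least two rows; `row_diffs` and `mean_row_diff` are hypothetical helpers written for this illustration, not functions from the SDMetrics codebase.

```python
from statistics import mean


def row_diffs(sequence):
    """Differences between each row n and row n-1 within one sequence."""
    return [curr - prev for prev, curr in zip(sequence, sequence[1:])]


def mean_row_diff(sequence):
    """Average of the row-to-row differences (the per-sequence summary described above).

    Assumes the sequence has at least two rows, otherwise there are no differences.
    """
    return mean(row_diffs(sequence))


smooth = [0.0, 1.0, 2.0, 3.0, 4.0]
noisy = [0.0, 10.0, -6.0, 12.0, 4.0]

# Both sequences start at 0 and end at 4, so the differences telescope to (4 - 0) / 4.
print(mean_row_diff(smooth))  # 1.0
print(mean_row_diff(noisy))   # 1.0
```

The noisy sequence jumps by 10, -16, 18 and -8 between rows, yet its average difference is identical to that of the smooth sequence, so the noise is invisible after averaging.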

Expected behavior

One simple fix to make this metric more informative is to stop averaging within each sequence. Instead, we can pool the differences (between rows $n$ and $n-1$) from all the sequences into a single distribution.

This will yield one overall distribution for the real data ($D_r$) and one for the synthetic data ($D_s$). We can continue to use the KSComplement to compare these overall distributions (rather than the distributions of per-sequence means).
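A rough sketch of the proposed change, assuming each sequence is a plain list of numbers with at least two rows. All helper names are hypothetical, and the KS statistic is computed by hand in pure Python (rather than with SciPy) so the example is self-contained; KSComplement is taken here as 1 minus the two-sample KS statistic.

```python
from bisect import bisect_right
from statistics import mean


def row_diffs(sequence):
    """Differences between each row n and row n-1 within one sequence."""
    return [curr - prev for prev, curr in zip(sequence, sequence[1:])]


def per_sequence_means(sequences):
    """Current approach: one averaged difference per sequence (needs >= 2 rows each)."""
    return [mean(row_diffs(seq)) for seq in sequences]


def pooled_diffs(sequences):
    """Proposed approach: pool the row-to-row differences of every sequence into one list."""
    return [d for seq in sequences for d in row_diffs(seq)]


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    # The empirical CDFs only change at observed values, so checking those points is enough.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def ks_complement(a, b):
    """1 - KS statistic, so 1.0 means the two samples are indistinguishable."""
    return 1.0 - ks_statistic(a, b)


real = [[0.0, 1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
synthetic = [[0.0, 10.0, -6.0, 12.0, 4.0], [5.0, 6.0, 7.0]]

# Distribution of means: the noisy synthetic sequence looks identical to the real one.
print(ks_complement(per_sequence_means(real), per_sequence_means(synthetic)))  # 1.0

# Pooled distributions D_r and D_s: the extra noise lowers the score.
print(round(ks_complement(pooled_diffs(real), pooled_diffs(synthetic)), 3))  # 0.667
```

With per-sequence means, both datasets reduce to `[1.0, 1.0]` and receive a perfect score; pooling the differences into $D_r$ and $D_s$ keeps the large row-to-row swings in the synthetic data visible to the KS test.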
