Limitations/Weaknesses of the Matrix Profile #793
-
@statistactics Thanks for starting the discussion, and welcome to the STUMPY community! Note that I am not affiliated with the original authors' work, but here are some thoughts for everyone to consider:
At the end of the day, there is no silver bullet, and every analysis approach has limitations and tradeoffs. In general, I've found that as long as your window size is at least the duration of your repeating event but no more than 2x that duration, you should be okay. Of course, your time series may contain motifs of varying lengths, and this is an active area of research. It is valuable to try multiple window sizes; before matrix profiles, this was simply infeasible beyond toy examples because the computational cost was too high.
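To make the window-size rule of thumb concrete, here is a minimal sketch that sweeps `m` from the event length up to twice the event length on a synthetic series. It uses a brute-force NumPy matrix profile; the helper `naive_matrix_profile`, the planted sine "event", and the `m/4` exclusion zone are illustrative assumptions, not STUMPY internals. For real data you would call `stumpy.stump(T, m)` instead of the naive helper.

```python
import numpy as np

def naive_matrix_profile(T, m):
    """Brute-force z-normalized matrix profile: O(n^2 * m) time, for illustration only.

    Assumes T contains no constant (zero-variance) windows.
    """
    T = np.asarray(T, dtype=float)
    W = np.lib.stride_tricks.sliding_window_view(T, m)          # (n - m + 1, m) windows
    Z = (W - W.mean(axis=1, keepdims=True)) / W.std(axis=1, keepdims=True)
    excl = int(np.ceil(m / 4))                                  # trivial-match exclusion zone
    P = np.full(len(Z), np.inf)
    I = np.full(len(Z), -1, dtype=int)
    for i in range(len(Z)):
        d = np.sqrt(((Z - Z[i]) ** 2).sum(axis=1))              # distance profile of window i
        d[max(0, i - excl): i + excl + 1] = np.inf              # ignore self/trivial matches
        I[i] = int(np.argmin(d))
        P[i] = d[I[i]]
    return P, I

# Synthetic series: low-amplitude noise plus one 40-point "event" that occurs twice
rng = np.random.default_rng(0)
T = rng.normal(0, 0.05, 400)
event_len = 40
event = 2 * np.sin(np.linspace(0, 2 * np.pi, event_len))
T[50:50 + event_len] += event
T[250:250 + event_len] += event

# Sweep window sizes from the event length up to twice the event length
for m in (event_len, int(1.5 * event_len), 2 * event_len):
    P, I = naive_matrix_profile(T, m)
    motif = int(np.argmin(P))
    print(f"m={m}: motif at {motif}, nearest neighbor at {I[motif]}, distance {P[motif]:.2f}")
```

Within the 1x to 2x range, each window still contains the whole event, so the best-matching pair should keep pointing at the two planted occurrences (200 points apart) even as extra noise enters the window.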
I'm not sure I fully understand your point, so please feel free to elaborate. From my vantage point, "simple" is good: if you don't know what you are looking for within your time series, then "simple patterns" can be quite useful and can help orient where you should look first. Are you able to share a more concrete example?
Indeed, beyond the O(n) space complexity, efficiently computing all of the pairwise z-normalized Euclidean distances is the fundamental breakthrough. The significant speedups would not be possible without this research. Note that the work has been extended to non-normalized Euclidean distance and generalized to p-norm distances, which covers a large percentage of use cases. Ultimately, the matrix profile is another tool in the toolbelt; the difference is that it is now much more usable for time series well beyond 1,000 data points.
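A sketch of why the pairwise z-normalized distances can be computed efficiently: each distance depends only on the dot product of the two windows plus their means and standard deviations, d = sqrt(2m(1 - ρ)) with ρ = (QT - m·μ_Q·μ_T) / (m·σ_Q·σ_T). The helper names below are hypothetical and this is not STUMPY's implementation; `np.correlate` computes the sliding dot products directly, whereas MASS-style algorithms use an FFT to get them in O(n log n). The last helper shows the non-normalized p-norm variant.

```python
import numpy as np

def znorm(x):
    """Z-normalize a subsequence (population standard deviation)."""
    return (x - x.mean()) / x.std()

def znorm_dist_direct(q, t):
    """Reference: z-normalize both subsequences, then take the Euclidean distance."""
    return float(np.linalg.norm(znorm(q) - znorm(t)))

def distance_profile(q, T):
    """Z-normalized distances from q to every length-m window of T, from sliding dot products."""
    m = len(q)
    QT = np.correlate(T, q, mode="valid")                         # QT[i] = dot(q, T[i:i + m])
    W = np.lib.stride_tricks.sliding_window_view(T, m)
    mu_T, sigma_T = W.mean(axis=1), W.std(axis=1)
    rho = (QT - m * q.mean() * mu_T) / (m * q.std() * sigma_T)   # Pearson correlation
    return np.sqrt(np.clip(2 * m * (1 - rho), 0, None))          # clip round-off below 0

def pnorm_dist(q, t, p=2.0):
    """Non-normalized Minkowski (p-norm) distance between raw subsequences."""
    return float(np.sum(np.abs(q - t) ** p) ** (1.0 / p))

rng = np.random.default_rng(1)
T = rng.normal(size=200).cumsum()        # random-walk test series
q = T[30:50]
dp = distance_profile(q, T)
direct = np.array([znorm_dist_direct(q, T[i:i + len(q)]) for i in range(len(dp))])
print(np.allclose(dp, direct, atol=1e-4))
print(pnorm_dist(np.array([0.0, 0.0]), np.array([3.0, 4.0])))        # 5.0
print(pnorm_dist(np.array([0.0, 0.0]), np.array([3.0, 4.0]), p=1))   # 7.0
```

Because only running means, standard deviations, and dot products are needed, distances for neighboring windows can reuse earlier work instead of re-normalizing every subsequence from scratch.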
-
I also observed this effect. Euclidean distance is sensitive to pattern complexity (aka "jiggliness"). Furthermore, there is a paper that compensates for the effect of random noise by estimating the noise:
Another recurring source of errors is constant subsequences, since z-normalization divides by the standard deviation, which is zero for a flat subsequence.
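A minimal sketch of both issues raised above, assuming plain NumPy. The first function is a z-normalized distance with an explicit rule for constant subsequences; the 0 / sqrt(m) convention is an assumption, chosen because a flat subsequence can be treated as the all-zeros vector while any z-normalized length-m vector has squared norm m. The second is the complexity-invariant distance (CID) of Batista et al., one known way to penalize differences in "jiggliness". The function names are hypothetical, and this is not the noise-compensation method from the paper mentioned above.

```python
import numpy as np

def znorm(x):
    return (x - x.mean()) / x.std()

def znorm_dist_safe(q, t, eps=1e-8):
    """Z-normalized Euclidean distance with an explicit rule for flat (constant) subsequences.

    Assumed convention: two flat subsequences match perfectly (distance 0); a flat one is
    treated as the all-zeros z-normalized vector, giving sqrt(m) against any non-flat one.
    """
    q, t = np.asarray(q, dtype=float), np.asarray(t, dtype=float)
    m = len(q)
    q_flat, t_flat = q.std() < eps, t.std() < eps
    if q_flat and t_flat:
        return 0.0
    if q_flat or t_flat:
        return float(np.sqrt(m))
    return float(np.linalg.norm(znorm(q) - znorm(t)))

def complexity_estimate(x):
    """CE(x) = sqrt(sum((x[i+1] - x[i])^2)): grows as the subsequence gets more jagged."""
    return float(np.sqrt(np.sum(np.diff(x) ** 2)))

def cid_dist(q, t):
    """CID on z-normalized subsequences: ED * max(CE) / min(CE). Assumes neither is flat."""
    zq, zt = znorm(np.asarray(q, dtype=float)), znorm(np.asarray(t, dtype=float))
    cq, ct = complexity_estimate(zq), complexity_estimate(zt)
    return float(np.linalg.norm(zq - zt)) * max(cq, ct) / min(cq, ct)

x = np.linspace(0, 2 * np.pi, 50)
smooth = np.sin(x)
jiggly = np.sin(x) + 0.5 * np.sin(12 * x)
flat = np.full(50, 3.0)
print(znorm_dist_safe(flat, flat))      # 0.0
print(znorm_dist_safe(flat, smooth))    # sqrt(50)
print(znorm_dist_safe(smooth, jiggly), cid_dist(smooth, jiggly))
```

The CID correction factor is 1 when both subsequences are equally complex and grows with the mismatch, so a smooth pattern is pushed further away from a jagged one than plain z-normalized Euclidean distance would place it.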
-
Hey everyone, since the matrix profile is a relatively new method in time series mining, there has been nothing but praise for it online.
Don't get me wrong, I love the matrix profile's versatility and efficiency. However, I think it's helpful to play devil's advocate and discuss some limitations to find possible directions for improvement. After playing around with STUMPY and doing some reading, I concluded that the main limitations are:
Disclaimer: I have not read all 20+ research papers, but I have skimmed enough to have a decent grasp of the concept. I look forward to hearing everybody's thoughts!