
High changepoint number prediction and penalty value #338

Closed
ClaretJeanLoup opened this issue Feb 27, 2025 · 0 comments
Hi,

I'm running ruptures on biological data: I'm trying to detect genomic duplication breakpoints from depth-of-coverage variation in genome mapping data. I tried to use:

```python
import ruptures as rpt

algo = rpt.Pelt(model="rbf").fit(d_subset['norm'].values)
refined_result = algo.predict(pen=X)
```
I varied the penalty value from 1 up to values in the thousands, but no matter the penalty I still end up with more than 2,000 change points on a dataset of 12,000 values.
Is my data too noisy to reduce the number of change points, or am I misunderstanding how the penalty should be set? I expected higher penalty values to result in longer computing time and a smaller number of predicted change points.
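To illustrate the expected behaviour, here is a minimal pure-Python sketch of the penalized objective that PELT optimises (optimal partitioning with an L2 cost, O(n²) without PELT's pruning — this is not the ruptures implementation, and the signal is synthetic). On a clean signal, raising the penalty does shrink the number of predicted change points:

```python
# Penalized segmentation (optimal partitioning, L2 cost) in pure Python.
# Minimises sum of per-segment squared errors + pen * (number of segments).

def l2_cost(prefix, prefix_sq, s, e):
    """Sum of squared errors of segment [s, e) fitted with its mean."""
    n = e - s
    seg_sum = prefix[e] - prefix[s]
    seg_sq = prefix_sq[e] - prefix_sq[s]
    return seg_sq - seg_sum * seg_sum / n

def segment(signal, pen):
    """Change-point indices minimising total cost + pen per segment."""
    n = len(signal)
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, x in enumerate(signal):
        prefix[i + 1] = prefix[i] + x
        prefix_sq[i + 1] = prefix_sq[i] + x * x
    best = [0.0] * (n + 1)   # best[t]: optimal cost of signal[:t]
    last = [0] * (n + 1)     # last[t]: start of the final segment
    for t in range(1, n + 1):
        best[t], last[t] = min(
            (best[s] + l2_cost(prefix, prefix_sq, s, t) + pen, s)
            for s in range(t)
        )
    # Backtrack to recover the change points (end index included,
    # as in ruptures' convention).
    bkps, t = [], n
    while t > 0:
        bkps.append(t)
        t = last[t]
    return sorted(bkps)

# Piecewise-constant signal with two true change points at 50 and 100.
sig = [0.0] * 50 + [5.0] * 50 + [1.0] * 50
for pen in (0.1, 10.0, 1000.0):
    print(pen, segment(sig, pen))
# Low penalties recover [50, 100, 150]; a very high penalty
# merges everything into a single segment, [150].
```

If the change-point count does not drop at all as the penalty grows, that usually points at the data (scale, noise, or outliers inflating the cost) rather than the penalty mechanism itself.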
I tried other algorithms like BottomUp and Window and got some nice predictions on smoothed (mean over sliding windows of a thousand points) and normalised data, but I would like to see if I can get more precise ones with the Pelt algorithm.
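For reference, the sliding-window mean pre-processing described above can be sketched with a running sum in O(n); the window size here is illustrative, not the exact pipeline used:

```python
# Sliding-window mean smoothing with a running sum.

def sliding_mean(values, window):
    """Mean of each length-`window` sliding window over `values`."""
    if window > len(values):
        raise ValueError("window larger than signal")
    run = sum(values[:window])
    out = [run / window]
    for i in range(window, len(values)):
        # Slide the window one step: add the new point, drop the oldest.
        run += values[i] - values[i - window]
        out.append(run / window)
    return out

print(sliding_mean([1.0, 2.0, 3.0, 4.0, 5.0], 2))
# [1.5, 2.5, 3.5, 4.5]
```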

Thanks for the amazing package you designed!
