Added support for random weighted sampling for unbalanced datasets by hummuscience · Pull Request #284 · paninski-lab/lightning-pose

hummuscience · 2025-04-22T11:21:24Z

This is a draft pull request for adding the weighted random sampling option as suggested here: #158 (comment)
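
The core idea behind weighted random sampling fits in a few lines. The following is a minimal sketch, not the code in this PR. It assumes each training frame carries a label naming its source dataset, and `compute_sample_weights` is a hypothetical helper. Each frame gets weight 1 / (number of frames in its dataset), so every dataset contributes the same total weight and is drawn equally often regardless of its size. In PyTorch, the resulting list could be passed as the `weights` argument of `torch.utils.data.WeightedRandomSampler`. The sketch uses the standard library's `random.choices` so it runs without PyTorch.

```python
import random
from collections import Counter


def compute_sample_weights(dataset_ids):
    """Return one weight per frame, inversely proportional to its dataset's size.

    `dataset_ids` is a hypothetical per-frame list naming the source dataset.
    Every dataset ends up with the same total weight (1.0).
    """
    counts = Counter(dataset_ids)
    return [1.0 / counts[d] for d in dataset_ids]


# Hypothetical unbalanced example: 3 frames from "mouse_a", 1 from "mouse_b".
ids = ["mouse_a", "mouse_a", "mouse_a", "mouse_b"]
weights = compute_sample_weights(ids)
print(weights)  # each mouse_a frame gets 1/3, the single mouse_b frame gets 1.0

# Draw frame indices with replacement according to the weights.
rng = random.Random(0)
batch = rng.choices(range(len(ids)), weights=weights, k=8)
```

On average, half of the drawn indices point at the lone `mouse_b` frame, which is the intended rebalancing.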

I am not very experienced with coding and good practices (typical biologist background :D), so I put this together with the help of an LLM.

For some reason, I was unable to pass the config setting through to the function. Am I missing something? That is why it is currently enabled by default. Technically, even when it's turned on, it should still work normally for data that is not unbalanced, right?
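
Whether "enabled by default" is harmless can be checked with a small sketch. Under inverse-frequency weighting (an assumption about how the weights are computed here), balanced data yields identical weights for every frame, so each frame is equally likely to be drawn. One difference remains: `torch.utils.data.WeightedRandomSampler` samples with replacement by default (`replacement=True`). Drawing N times with replacement from N frames does not visit every frame exactly once. In expectation it visits a fraction of 1 - (1 - 1/N)^N, about 63%, of distinct frames per pass. The helper names below are hypothetical.

```python
from collections import Counter


def compute_sample_weights(dataset_ids):
    # Inverse-frequency weights: 1 / (number of frames in the frame's dataset).
    counts = Counter(dataset_ids)
    return [1.0 / counts[d] for d in dataset_ids]


def expected_unique_fraction(n):
    """Expected fraction of distinct frames seen when drawing n times,
    uniformly and with replacement, from n frames."""
    return 1.0 - (1.0 - 1.0 / n) ** n


# Balanced case: two datasets with the same number of frames.
balanced_ids = ["a"] * 500 + ["b"] * 500
weights = compute_sample_weights(balanced_ids)
print(len(set(weights)))  # 1: every frame has the same weight

print(round(expected_unique_fraction(len(balanced_ids)), 3))  # 0.632
```

Training on balanced data should therefore behave like ordinary uniform sampling, except that each pass repeats some frames and skips others rather than acting as a plain shuffle.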

I also added a test to check that the functionality works. However, it would make more sense to add a test that takes an unbalanced dataset as input and checks that the outputs are correct, right?
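
A test on an unbalanced input could look roughly like the sketch below. It is written for pytest but also runs as a plain script. The 9:1 split, the seed, and the tolerance are illustrative choices, and `compute_sample_weights` is a hypothetical stand-in for whatever the PR uses to build the weights. The test checks the property that matters: after weighting, the small dataset is drawn about half the time instead of about 10% of the time.

```python
import random
from collections import Counter


def compute_sample_weights(dataset_ids):
    # Inverse-frequency weights: each dataset gets the same total weight.
    counts = Counter(dataset_ids)
    return [1.0 / counts[d] for d in dataset_ids]


def test_weighted_sampling_balances_unbalanced_datasets():
    # Hypothetical 9:1 imbalance between two source datasets.
    ids = ["big"] * 900 + ["small"] * 100
    weights = compute_sample_weights(ids)

    rng = random.Random(42)  # fixed seed keeps the test deterministic
    draws = rng.choices(range(len(ids)), weights=weights, k=20_000)
    frac_small = sum(ids[i] == "small" for i in draws) / len(draws)

    # Without weighting, ~10% of draws would come from "small";
    # with weighting, each dataset should be drawn ~50% of the time.
    assert abs(frac_small - 0.5) < 0.02


test_weighted_sampling_balances_unbalanced_datasets()
```

The tolerance of 0.02 is several standard deviations wide for 20,000 draws, so the test should not be flaky even if the seed changes.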

Also, I am not sure yet how well this works with the suggestion to use COCO input for heterogeneous datasets: #263

As I said, I don't have much experience here and would love some input on how to do this right.

I am also open to a meeting (as @themattinthehatt suggested) to discuss this and also about adding the top_view_mouse model to LP.

themattinthehatt · 2025-04-24T16:02:44Z

Thanks for the PR @hummuscience! Happy to take a closer look soon. I'm a bit swamped until mid-May with end of semester/deadlines, but after that let's definitely plan to meet and discuss further (both this PR and the top_view_mouse model). This work will actually dovetail quite nicely with the COCO input for heterogeneous datasets issue.

themattinthehatt · 2026-06-15T23:32:56Z

@hummuscience you were so far ahead of me on this one! Sorry I left this lingering for so long, I got very in-the-weeds with the Lightning Pose 3D project. I'm finally circling back to this problem of training a "super animal" model across multiple datasets. I have a couple PRs that are building out the infrastructure better:

add visibility column to labels csv #440: more formally separating the "occluded" condition from "not labeled"
horizontal flip augmentation #446: properly implementing the horizontal flip augmentation that caused you so many headaches
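
For readers unfamiliar with why horizontal flips are tricky for pose data, here is an independent sketch, not the implementation in #446. A keypoint-aware flip has to do two things: mirror the x coordinates and swap the labels of left/right body parts, because after mirroring, the animal's left ear looks like a right ear. The function name, body-part names, and the `image_width - x` convention (continuous pixel coordinates) are assumptions. NaN is the only missing-value marker here; the sketch does not model the separate visibility flag from #440.

```python
import math


def hflip_keypoints(keypoints, image_width, flip_pairs):
    """Mirror keypoints left-right and swap left/right body-part labels.

    keypoints: dict mapping body-part name -> (x, y); NaN marks an unlabeled point.
    flip_pairs: list of (left_name, right_name) tuples (hypothetical names);
        both names must be present in `keypoints`.
    Assumes continuous pixel coordinates, so the mirror of x is image_width - x.
    """
    # Mirror the x coordinate; NaN stays NaN, so unlabeled points stay unlabeled.
    flipped = {name: (image_width - x, y) for name, (x, y) in keypoints.items()}
    # After mirroring, the old right-side point looks like a left-side point
    # (and vice versa), so swap the labels of each symmetric pair.
    for left, right in flip_pairs:
        flipped[left], flipped[right] = flipped[right], flipped[left]
    return flipped


kps = {"nose": (10.0, 20.0), "left_ear": (5.0, 15.0), "right_ear": (15.0, float("nan"))}
out = hflip_keypoints(kps, image_width=100, flip_pairs=[("left_ear", "right_ear")])
print(out["nose"])       # (90.0, 20.0)
print(out["right_ear"])  # (95.0, 15.0): the mirrored old left ear
print(math.isnan(out["left_ear"][1]))  # True: the unlabeled old right ear stays unlabeled
```

Forgetting the label swap is a common source of silent errors: the images and coordinates look right, but the network is taught that left and right are interchangeable.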

Right now I'm playing around with some datasets where I've subsampled each to have the same number of frames, but once I have some proof-of-concept results with that I'll want to come back to this PR.
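
The artificially balanced setup described above (subsampling each dataset to the same number of frames) can be sketched as follows, under the assumption that frames are tagged with a source-dataset label. The function name is hypothetical. Each dataset is randomly subsampled without replacement down to the size of the smallest one.

```python
import random
from collections import defaultdict


def subsample_to_equal_size(dataset_ids, seed=0):
    """Return frame indices such that every dataset keeps the same number of frames.

    Each dataset is randomly subsampled, without replacement, down to the size
    of the smallest dataset; the remaining frames are discarded.
    """
    by_dataset = defaultdict(list)
    for idx, d in enumerate(dataset_ids):
        by_dataset[d].append(idx)
    n_min = min(len(v) for v in by_dataset.values())
    rng = random.Random(seed)
    keep = []
    for indices in by_dataset.values():
        keep.extend(rng.sample(indices, n_min))
    return sorted(keep)


ids = ["a"] * 5 + ["b"] * 2 + ["c"] * 3
kept = subsample_to_equal_size(ids)
print(len(kept))  # 6: two frames from each of a, b, c
```

The trade-off with weighted sampling is that subsampling throws labeled frames away (here, three of the five "a" frames), whereas weighting keeps every frame and instead revisits frames from smaller datasets more often.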

hummuscience · 2026-06-16T12:15:28Z

Funnily enough, I ended up cleaning this up with a slightly different approach to make it work :D I could take a look at how much it diverged from what I did here.

themattinthehatt · 2026-06-16T18:23:17Z

Please let me know, I'm curious! I'll still be playing around with artificially balanced datasets for the next couple of weeks but will then look to move beyond that and fit models on the full, imbalanced datasets.

Added support for random weighted sampling (commit 4128c70)

themattinthehatt self-requested a review April 24, 2025 16:03


hummuscience wants to merge 1 commit into paninski-lab:main from hummuscience:random_weighted_sampler
