Skip to content

Imbalanced Dataset vs Cost Function #21

@S-C-H

Description

@S-C-H

Hi,

@albahnsen Do you have any thoughts on the relationship between the cost matrix and re-balancing the data? I notice that you do not rebalance in your final logistic regression model in your wonderful paper.

If I have a highly imbalanced dataset where:
<1% are Positive
99% are Negative

But the theoretical cost is:
30 if all are labelled positive
and 1 if all are labelled negative

What should I be adjusting to stop it predicting all Positive? The imbalance? The cost? THe iterations?

Thanks!

Edit: I've done some Cross Validation to check different C aand max_iter but it seems like the best savings score I can get it 0 (with the worst being -12).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions