Hi,
@albahnsen Do you have any thoughts on the relationship between the cost matrix and re-balancing the data? I notice that you do not rebalance in your final logistic regression model in your wonderful paper.
If I have a highly imbalanced dataset where:
<1% are Positive
99% are Negative
But the theoretical cost is:
30 if all are labelled positive
and 1 if all are labelled negative
What should I be adjusting to stop the model predicting all Positive? The imbalance? The cost? The iterations?
Thanks!
Edit: I've done some cross-validation to check different C and max_iter values, but it seems the best savings score I can get is 0 (with the worst being -12).
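To make the savings numbers concrete, here is a minimal sketch in plain NumPy. It assumes savings is defined as 1 minus the model's total cost divided by the cost of the cheaper trivial classifier (all Negative or all Positive), and that correct predictions cost nothing. The helper names `total_cost` and `savings`, the data, and the costs (1 per false positive, 30 per false negative) are hypothetical and not taken from the dataset above.

```python
import numpy as np

def total_cost(y_true, y_pred, cost_fp, cost_fn):
    """Sum of misclassification costs; correct predictions cost 0 (assumption)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n_fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    n_fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    return float(cost_fp * n_fp + cost_fn * n_fn)

def savings(y_true, y_pred, cost_fp, cost_fn):
    """1 - cost(model) / cost(best trivial classifier)."""
    y_true = np.asarray(y_true)
    base = min(
        total_cost(y_true, np.zeros_like(y_true), cost_fp, cost_fn),  # all Negative
        total_cost(y_true, np.ones_like(y_true), cost_fp, cost_fn),   # all Positive
    )
    return 1.0 - total_cost(y_true, y_pred, cost_fp, cost_fn) / base

# Hypothetical data: 1000 examples, 1% Positive.
y_true = np.zeros(1000, dtype=int)
y_true[:10] = 1
cost_fp, cost_fn = 1.0, 30.0  # illustrative values only

all_neg = np.zeros_like(y_true)
all_pos = np.ones_like(y_true)

print(savings(y_true, all_neg, cost_fp, cost_fn))  # cheaper trivial baseline here
print(savings(y_true, all_pos, cost_fp, cost_fn))  # more expensive trivial baseline
print(savings(y_true, y_true, cost_fp, cost_fn))   # perfect predictions
```

Under this definition, a savings score of 0 means the model costs exactly as much as the cheaper trivial classifier, and a negative score means it costs more.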