# [Feature] Add RandomForestClassifier to linfa-trees #390
## Conversation
**Codecov Report**

Attention: Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master     #390      +/- ##
==========================================
+ Coverage   36.09%   36.21%   +0.12%
==========================================
  Files          99      100       +1
  Lines        6502     6566      +64
==========================================
+ Hits         2347     2378      +31
- Misses       4155     4188      +33
```

☔ View full report in Codecov by Sentry.
Thanks for your contribution, but at the moment I have no time to review it properly. I still think the #229 implementation is more general and was close to being merged. Did you take a look? Why not start from it?
I did check it, but I felt my implementation is more specifically targeted at random forests, hence I proceeded with my own PR. I have tested it across various datasets and it works well.
Please add serde support: https://github.com/joelchen/linfa/blob/master/algorithms/linfa-trees/src/decision_trees/random_forest.rs
@maxprogrammer007, I've just merged #392, which introduces `EnsembleLearner`. So if you agree, I suggest you reuse part of your code to implement a random forest along the lines of:

```rust
struct RandomForest<F: Float, L: Label> {
    ensemble_learner: EnsembleLearner<DecisionTree<F, L>>,
    bootstrap_features_ratio: f64,
    feature_indices: Vec<Vec<usize>>,
}
```

A step further would be to manage feature subsampling directly in `EnsembleLearner`.
@relf Sure, I will proceed with a new PR.
---

This PR extends the `linfa-trees` crate by introducing a new Random Forest classifier. It builds on the existing decision tree implementation to provide an ensemble method that typically outperforms a single tree: it trains many trees on bootstrapped subsets of both rows and columns (features), then aggregates their predictions via majority voting.
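To make that sampling scheme concrete, here is a minimal, hypothetical sketch of the per-tree row/column draws using only the `rand = "0.8"` API this PR adds; `bootstrap_indices` and its signature are illustrative, not the PR's actual code:

```rust
// Hypothetical helper (not the PR's actual code): draw one tree's training
// slice. Rows are sampled WITH replacement (bootstrap); feature columns are
// sampled WITHOUT replacement.
use rand::rngs::StdRng;
use rand::seq::index::sample;
use rand::{Rng, SeedableRng};

fn bootstrap_indices(
    n_rows: usize,
    n_features: usize,
    feature_ratio: f64,
    rng: &mut impl Rng,
) -> (Vec<usize>, Vec<usize>) {
    // n_rows draws with replacement: the classic bootstrap sample.
    let rows: Vec<usize> = (0..n_rows).map(|_| rng.gen_range(0..n_rows)).collect();
    // A `feature_ratio` fraction of the columns, without replacement.
    let n_sub = ((n_features as f64 * feature_ratio).ceil() as usize).clamp(1, n_features);
    let cols = sample(rng, n_features, n_sub).into_vec();
    (rows, cols)
}

fn main() {
    // Seeding the RNG is what makes the `seed` hyperparameter reproducible.
    let mut rng = StdRng::seed_from_u64(42);
    let (rows, cols) = bootstrap_indices(150, 4, 0.8, &mut rng);
    println!("first row draws: {:?}, feature subset: {:?}", &rows[..5], cols);
}
```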
## 🚀 What's Added

### `src/decision_trees/random_forest.rs`
- `RandomForestParams` / `RandomForestValidParams`: hyperparameters (`n_trees`, `max_depth`, `feature_subsample`, `seed`) with validation via `ParamGuard`.
- `RandomForestClassifier`: stores a `Vec<DecisionTree>` and the per-tree feature indices used for training.
- `Fit` implementation: for each tree, draws a bootstrap sample of rows and a random subset of feature columns, then trains a `DecisionTree` on its slice.
- `Predict` implementation: slices each tree's feature columns out of the input, calls `tree.predict(&sub_x)` (returns `Array1<usize>`), and majority-votes the results; see the sketch after this list.
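Below is a hedged sketch of the feature-slicing and majority-voting steps just described, using `ndarray` (which `linfa-trees` already depends on); `feature_slice` and `majority_vote` are illustrative helper names, not the PR's API:

```rust
use ndarray::{Array1, Array2, Axis};
use std::collections::HashMap;

// `select` copies only the listed columns: this is the `sub_x` each tree sees,
// built from the per-tree feature indices stored by the model.
fn feature_slice(x: &Array2<f64>, cols: &[usize]) -> Array2<f64> {
    x.select(Axis(1), cols)
}

// Majority vote across the per-tree label arrays, one winner per sample.
// Ties are broken arbitrarily in this sketch.
fn majority_vote(per_tree: &[Array1<usize>]) -> Array1<usize> {
    let n = per_tree[0].len();
    Array1::from_iter((0..n).map(|i| {
        let mut counts: HashMap<usize, usize> = HashMap::new();
        for preds in per_tree {
            *counts.entry(preds[i]).or_insert(0) += 1;
        }
        counts.into_iter().max_by_key(|&(_, c)| c).map(|(label, _)| label).unwrap()
    }))
}

fn main() {
    let x = Array2::from_shape_vec((2, 3), vec![1., 2., 3., 4., 5., 6.]).unwrap();
    let sub_x = feature_slice(&x, &[0, 2]); // this tree was trained on columns 0 and 2
    assert_eq!(sub_x.shape(), &[2, 2]);

    let votes = vec![
        Array1::from(vec![0usize, 1, 2]),
        Array1::from(vec![0, 1, 1]),
        Array1::from(vec![0, 2, 1]),
    ];
    assert_eq!(majority_vote(&votes), Array1::from(vec![0, 1, 1]));
}
```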
### Exports
- Updated `src/decision_trees/mod.rs` and `src/lib.rs` to re-export `RandomForestParams` and `RandomForestClassifier` (wiring sketched below).
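For concreteness, the `mod.rs` side of that wiring could look roughly like this; these exact lines are assumed, not copied from the diff:

```rust
// src/decision_trees/mod.rs (assumed wiring, per the bullet above)
pub mod random_forest;
pub use random_forest::{RandomForestClassifier, RandomForestParams};
```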
### Example
- `examples/iris_random_forest.rs`: demonstrates loading the Iris dataset, training a Random Forest, and printing the confusion matrix and accuracy (see the sketch under 📦 Example below).
### Unit Test
- `tests/random_forest.rs`: an integration test asserting ≥ 90% accuracy on Iris with a fixed RNG seed for reproducibility; see the test sketch after this list.
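The test below is a sketch of that assertion under assumed builder names (`params`, `n_trees`, `feature_subsample`, and `seed` as methods); only `linfa_datasets::iris()`, `split_with_ratio`, `confusion_matrix`, and `accuracy` are standard linfa APIs:

```rust
use linfa::prelude::*;
use linfa_trees::RandomForestClassifier; // added by this PR

#[test]
fn iris_accuracy_at_least_90_percent() {
    // 80/20 train/validation split of the bundled Iris dataset.
    let (train, valid) = linfa_datasets::iris().split_with_ratio(0.8);

    // Builder methods assumed from the hyperparameter list above.
    let model = RandomForestClassifier::params()
        .n_trees(100)
        .feature_subsample(0.8)
        .seed(42) // fixed RNG seed => reproducible bootstraps
        .fit(&train)
        .expect("training should succeed");

    let cm = model
        .predict(&valid)
        .confusion_matrix(&valid)
        .expect("confusion matrix");
    assert!(cm.accuracy() >= 0.90);
}
```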
rand = "0.8"
tolinfa-trees/Cargo.toml
for RNG and sampling utilities.README
README.md
with a “Random Forest Classifier” section, usage example, and run instructions.🧐 Motivation
## 🧐 Motivation
- A random forest typically generalizes better than a single decision tree, and this one builds directly on the crate's existing `DecisionTree`.
- The implementation follows linfa's `Fit`/`Predict`/`ParamGuard` conventions and integrates cleanly with `Dataset`.
## 🔍 Files Changed

## 📦 Example
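A hedged sketch of what `examples/iris_random_forest.rs` might look like, based on the description above; the `RandomForestClassifier` builder calls are assumed, while the dataset loading and confusion-matrix calls are standard linfa:

```rust
use linfa::prelude::*;
use linfa_trees::RandomForestClassifier; // added by this PR

fn main() {
    // Load Iris and hold out 20% for validation.
    let (train, valid) = linfa_datasets::iris().split_with_ratio(0.8);

    // Hyperparameters from the PR description; builder methods assumed.
    let model = RandomForestClassifier::params()
        .n_trees(100)
        .max_depth(Some(10))
        .feature_subsample(0.8)
        .seed(42)
        .fit(&train)
        .expect("random forest training failed");

    let pred = model.predict(&valid);
    let cm = pred.confusion_matrix(&valid).expect("confusion matrix");
    println!("{:?}", cm);
    println!("accuracy: {:.3}", cm.accuracy());
}
```

Run it with `cargo run --example iris_random_forest`, as noted in the checklist below.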
## ✅ Checklist
- `ParamGuard` implemented for hyperparameter validation
- `Fit<Array2<F>, Array1<usize>>` implemented
- `Predict<Array2<F>, Array1<usize>>` implemented with correct feature-slice logic
- Example runs (`cargo run --example iris_random_forest`)
- Tests pass (`cargo test`)
- `rand` dependency added

Thank you for reviewing! I'm happy to address any feedback or suggestions.