California housing model

A machine learning model trained on the California Housing dataset

This project is a machine learning pipeline for predicting California housing prices using the California Housing dataset. It covers data preprocessing, model training, evaluation with cross-validation, and inference, all in a single script (main_new.py).

Features

Stratified train/test split using income categories
Data preprocessing with 'Pipeline' and 'ColumnTransformer'
One-hot encoding for categorical data
'SimpleImputer' to handle missing values
'RandomForestRegressor' model
Cross-validated RMSE evaluation
Model and pipeline persistence with 'joblib'
Inference support on unseen data ('input.csv' → 'output.csv')
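
The split and preprocessing features above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in main_new.py: the income bin edges, median imputation, `handle_unknown="ignore"`, and the helper names `income_category`, `stratified_split` and `build_preprocessor` are hypothetical, and the data is a random stand-in that only shares housing.csv's column names.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

NUM_ATTRIBS = ["longitude", "latitude", "housing_median_age", "total_rooms",
               "total_bedrooms", "population", "households", "median_income"]
CAT_ATTRIBS = ["ocean_proximity"]


def income_category(df):
    """Bucket median_income into 5 strata (these bin edges are an assumption)."""
    return pd.cut(df["median_income"], bins=[0.0, 1.5, 3.0, 4.5, 6.0, np.inf],
                  labels=[1, 2, 3, 4, 5])


def stratified_split(df, test_size=0.2, seed=42):
    """One shuffled train/test split that preserves income-category proportions."""
    splitter = StratifiedShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(df, income_category(df)))
    return df.iloc[train_idx], df.iloc[test_idx]


def build_preprocessor():
    """Median-impute numeric columns and one-hot encode ocean_proximity."""
    num_pipeline = Pipeline([("imputer", SimpleImputer(strategy="median"))])
    return ColumnTransformer([
        ("num", num_pipeline, NUM_ATTRIBS),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CAT_ATTRIBS),
    ])


# Random stand-in for housing.csv: same column names, meaningless values.
rng = np.random.default_rng(0)
n = 1000
housing = pd.DataFrame(rng.uniform(1, 1000, size=(n, len(NUM_ATTRIBS))), columns=NUM_ATTRIBS)
housing["median_income"] = rng.uniform(0.5, 10.0, n)
housing["ocean_proximity"] = rng.choice(["<1H OCEAN", "INLAND", "NEAR OCEAN", "NEAR BAY"], n)
housing["median_house_value"] = rng.uniform(15_000, 500_000, n)
housing.loc[rng.choice(n, size=20, replace=False), "total_bedrooms"] = np.nan  # gaps to impute

train_set, test_set = stratified_split(housing)
X_prepared = build_preprocessor().fit_transform(train_set.drop(columns="median_house_value"))

# Both halves should show almost the same share of each income bucket.
print(income_category(train_set).value_counts(normalize=True).sort_index())
print(income_category(test_set).value_counts(normalize=True).sort_index())
print(X_prepared.shape)  # 8 numeric columns + 4 one-hot columns
```

Comparing the two printed distributions shows the point of stratifying: a purely random split could over- or under-represent some income buckets in the test set, while this one keeps them nearly identical.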

Download the dataset

Download "housing.csv" from the official GitHub repository of the Hands-On ML book and place it in the same directory as main_new.py.
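
A small loader can fail early with a readable message when the file is missing or incomplete. This is a hypothetical helper (`load_housing` is not taken from main_new.py); it assumes the script is run from the directory that holds housing.csv, so the relative path resolves there, and it checks for the ten column names used by the Hands-On ML housing.csv.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Column names of the Hands-On ML housing.csv.
REQUIRED_COLUMNS = {
    "longitude", "latitude", "housing_median_age", "total_rooms", "total_bedrooms",
    "population", "households", "median_income", "median_house_value", "ocean_proximity",
}


def load_housing(path="housing.csv"):
    """Read housing.csv, failing early if it is absent or lacks expected columns."""
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"{csv_path} not found: download housing.csv and place it next to main_new.py"
        )
    housing = pd.read_csv(csv_path)
    missing = REQUIRED_COLUMNS - set(housing.columns)
    if missing:
        raise ValueError(f"{csv_path} is missing columns: {sorted(missing)}")
    return housing


# Demo: write a two-row file with the expected header, then load it.
demo_path = Path(tempfile.mkdtemp()) / "housing.csv"
pd.DataFrame([{c: 1.0 for c in sorted(REQUIRED_COLUMNS)}] * 2).to_csv(demo_path, index=False)
print(load_housing(demo_path).shape)  # (2, 10)
```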

Training or inference

The script checks whether a trained model already exists:

If no model exists, it will:

Load data
Preprocess
Train the model
Save the model & pipeline
Export test set to input.csv
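
The training steps above can be sketched as one function. All specifics here are assumptions rather than details of main_new.py: the artifact names model.pkl and pipeline.pkl, the income bin edges, `n_estimators=50`, 5 folds, and dropping the label column from input.csv. It uses `train_test_split(..., stratify=...)`, which performs a single stratified shuffle split, and trains on random stand-in data so it runs without housing.csv.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import OneHotEncoder

TARGET = "median_house_value"
CAT_ATTRIBS = ["ocean_proximity"]


def train_and_save(housing, out_dir):
    """Training branch: split, preprocess, cross-validate, fit, save, export."""
    # Stratify on income buckets (bin edges are an assumption).
    income_cat = pd.cut(housing["median_income"],
                        bins=[0.0, 1.5, 3.0, 4.5, 6.0, np.inf], labels=[1, 2, 3, 4, 5])
    train_set, test_set = train_test_split(
        housing, test_size=0.2, stratify=income_cat, random_state=42)

    # Preprocess: median-impute numeric columns, one-hot encode the categorical one.
    X, y = train_set.drop(columns=TARGET), train_set[TARGET]
    num_attribs = [c for c in X.columns if c not in CAT_ATTRIBS]
    pipeline = ColumnTransformer([
        ("num", SimpleImputer(strategy="median"), num_attribs),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CAT_ATTRIBS),
    ])
    X_prepared = pipeline.fit_transform(X)

    # 5-fold cross-validated RMSE (scikit-learn reports it negated), then the final fit.
    model = RandomForestRegressor(n_estimators=50, random_state=42)
    rmse = -cross_val_score(model, X_prepared, y,
                            scoring="neg_root_mean_squared_error", cv=5)
    model.fit(X_prepared, y)

    # Persist both artifacts and export the held-out features for later inference.
    joblib.dump(model, out_dir / "model.pkl")        # hypothetical file name
    joblib.dump(pipeline, out_dir / "pipeline.pkl")  # hypothetical file name
    test_set.drop(columns=TARGET).to_csv(out_dir / "input.csv", index=False)
    return rmse


# Random stand-in for housing.csv: same column names, meaningless values.
rng = np.random.default_rng(0)
n = 1000
num_cols = ["longitude", "latitude", "housing_median_age", "total_rooms",
            "total_bedrooms", "population", "households", "median_income"]
housing = pd.DataFrame(rng.uniform(1, 1000, size=(n, len(num_cols))), columns=num_cols)
housing["median_income"] = rng.uniform(0.5, 10.0, n)
housing["ocean_proximity"] = rng.choice(["<1H OCEAN", "INLAND", "NEAR OCEAN", "NEAR BAY"], n)
housing[TARGET] = rng.uniform(15_000, 500_000, n)
housing.loc[rng.choice(n, size=20, replace=False), "total_bedrooms"] = np.nan

out_dir = Path(tempfile.mkdtemp())
rmse = train_and_save(housing, out_dir)
print(f"CV RMSE: {rmse.mean():.0f} (std {rmse.std():.0f})")
```

Cross-validating on the already-preprocessed matrix keeps the sketch short, but it means the imputer has seen every fold; wrapping the preprocessor and model in a single `Pipeline` before calling `cross_val_score` would avoid that small leakage.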

If a model exists, it will:

Load input.csv
Predict housing prices
Save results to output.csv
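
The train-or-infer switch and the inference steps can be sketched as below, assuming that the presence of a saved model file is what marks the model as "already trained". The file names, the `predicted_median_house_value` output column, and `train_stand_in` (a tiny trainer on random data so the example runs end to end) are all hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

TARGET = "median_house_value"


def train_stand_in(work_dir):
    """Hypothetical minimal trainer: fits on random data and writes input.csv."""
    rng = np.random.default_rng(0)
    n = 200
    data = pd.DataFrame({
        "median_income": rng.uniform(0.5, 10.0, n),
        "housing_median_age": rng.integers(1, 53, n).astype(float),
        "ocean_proximity": rng.choice(["INLAND", "NEAR BAY", "<1H OCEAN"], n),
        TARGET: rng.uniform(15_000, 500_000, n),
    })
    X, y = data.drop(columns=TARGET), data[TARGET]
    pipeline = ColumnTransformer([
        ("num", SimpleImputer(strategy="median"), ["median_income", "housing_median_age"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["ocean_proximity"]),
    ])
    model = RandomForestRegressor(n_estimators=20, random_state=0)
    model.fit(pipeline.fit_transform(X), y)
    joblib.dump(model, os.path.join(work_dir, "model.pkl"))
    joblib.dump(pipeline, os.path.join(work_dir, "pipeline.pkl"))
    # For brevity, reuse the first 40 feature rows as the "unseen" input.
    X.iloc[:40].to_csv(os.path.join(work_dir, "input.csv"), index=False)


def run_inference(work_dir):
    """Load the saved artifacts, predict on input.csv, and write output.csv."""
    model = joblib.load(os.path.join(work_dir, "model.pkl"))
    pipeline = joblib.load(os.path.join(work_dir, "pipeline.pkl"))
    input_data = pd.read_csv(os.path.join(work_dir, "input.csv"))
    features = input_data.drop(columns=TARGET, errors="ignore")  # ignore labels if present
    input_data["predicted_" + TARGET] = model.predict(pipeline.transform(features))
    input_data.to_csv(os.path.join(work_dir, "output.csv"), index=False)
    return input_data


def main(work_dir):
    # The saved model file acts as the "already trained?" flag.
    if not os.path.exists(os.path.join(work_dir, "model.pkl")):
        train_stand_in(work_dir)
        return "trained"
    run_inference(work_dir)
    return "predicted"


work_dir = tempfile.mkdtemp()
first_run = main(work_dir)   # no model yet, so this trains
second_run = main(work_dir)  # model found, so this predicts
print(first_run, second_run)  # trained predicted
```

Because the pipeline is saved alongside the model, inference reuses exactly the imputation medians and one-hot categories learned during training instead of refitting them on input.csv.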

Dependencies

scikit-learn
pandas
NumPy
joblib

Repository: Kritank07/Data-Science-ML-model
