
midterm-peer review dh734 #7

@pangolion


This project analyzes a historical demographic dataset to build a regression model that predicts swing counties in presidential elections. The model will be trained on data from the 2016 presidential election, and the group ultimately plans to apply it to the 2020 presidential election data. The current report describes the regression model they are building to make this prediction.

I think this group is doing a wonderful job.
First, their feature transformation is very clever. I think the way they divided the raw count data preserves all of the information it contains.
Second, they perform a correlation analysis on the dataset with a significance level of 25%. This will help them remove noise from the model and also save computational resources.
Finally, they use five-fold cross-validation, which will help them detect and avoid underfitting or overfitting.
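To make the screening step concrete, here is a minimal sketch of correlation-based feature filtering at a 25% significance level. It assumes a NumPy feature matrix `X` and a response `y`, and it computes each p-value with a permutation test rather than the usual t-test so that it needs only NumPy. The helper name `correlation_screen` and its parameters are hypothetical, not the group's code.

```python
import numpy as np

def correlation_screen(X, y, alpha=0.25, n_perm=2000, seed=0):
    """Return indices of columns of X whose correlation with y has a
    permutation p-value below alpha (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    keep = []
    for j in range(X.shape[1]):
        x = X[:, j].astype(float)
        r_obs = abs(np.corrcoef(x, y)[0, 1])
        # Null distribution: shuffling y destroys any real association with x.
        r_null = np.array([abs(np.corrcoef(x, rng.permutation(y))[0, 1])
                           for _ in range(n_perm)])
        # Add-one correction so the p-value is never exactly zero.
        p = (np.sum(r_null >= r_obs) + 1) / (n_perm + 1)
        if p < alpha:
            keep.append(j)
    return keep
```

A loose threshold like 25% keeps any feature with even weak evidence of association, so it trims obvious noise without being aggressive.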
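For the cross-validation step, a minimal NumPy sketch of five-fold cross-validation follows. It assumes an ordinary least-squares model as a stand-in for whatever regression the group actually fits; `k_fold_indices` and `cross_validate_ols` are hypothetical names. Reporting both the average training error and the average validation error is what lets cross-validation flag problems: both high suggests underfitting, while a low training error with a much higher validation error suggests overfitting.

```python
import numpy as np

def k_fold_indices(n, k=5, seed=0):
    """Shuffle row indices 0..n-1 and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def _design(X):
    """Prepend an intercept column to the feature matrix."""
    return np.column_stack([np.ones(len(X)), X])

def cross_validate_ols(X, y, k=5, seed=0):
    """Return (mean training MSE, mean validation MSE) over k folds."""
    folds = k_fold_indices(len(y), k, seed)
    train_err, val_err = [], []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit OLS on the training folds only, then score on both splits.
        coef, *_ = np.linalg.lstsq(_design(X[train]), y[train], rcond=None)
        train_err.append(np.mean((_design(X[train]) @ coef - y[train]) ** 2))
        val_err.append(np.mean((_design(X[val]) @ coef - y[val]) ** 2))
    return float(np.mean(train_err)), float(np.mean(val_err))
```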

However, there is still some room for improvement.
First, the current model is a regression model, but the response is boolean (whether or not a county is a swing county). Therefore, I suggest using a classification model such as logistic regression.
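Since the response is boolean, logistic regression replaces the squared-error loss with the log-loss on a predicted probability. The NumPy sketch below fits one by plain batch gradient descent; it assumes a 0/1 label (for example, 1 for a swing county), and the helper names are hypothetical. In practice a library implementation such as scikit-learn's `LogisticRegression` would be the more usual choice.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping real scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit an intercept plus weights by gradient descent on the mean log-loss."""
    A = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = sigmoid(A @ w)
        # Gradient of the mean log-loss with respect to w.
        w -= lr * A.T @ (p - y) / len(y)
    return w

def predict_label(w, X, threshold=0.5):
    """Return 1 where the predicted probability exceeds the threshold, else 0."""
    p = sigmoid(np.column_stack([np.ones(len(X)), X]) @ w)
    return (p > threshold).astype(int)
```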

Second, they discard a large amount of data with missing values. Instead, they could try to make use of this data by imputing the missing entries, for example with matrix completion.
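As one concrete form of matrix completion, the sketch below imputes missing entries with an iterative low-rank SVD (a "hard-impute" style scheme): start from the column means, then repeatedly overwrite only the missing cells with a rank-`rank` reconstruction while keeping the observed cells fixed. The rank, the iteration count, and the function name `iterative_svd_impute` are illustrative assumptions. Features on very different scales may need to be standardized first.

```python
import numpy as np

def iterative_svd_impute(X, rank=2, n_iter=100):
    """Fill NaNs in X using a repeated rank-`rank` SVD approximation."""
    X = np.array(X, dtype=float)
    mask = np.isnan(X)
    # Initial guess: replace each missing cell with its column mean.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(mask, col_means, X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Update only the missing cells; observed values stay untouched.
        filled[mask] = low_rank[mask]
    return filled
```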

Finally, I am a little concerned that the current results indicate underfitting. They could also estimate the bias and variance of the current model to check this.
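To check the underfitting concern, bias and variance can be estimated by refitting the model on bootstrap resamples: the variance is the spread of the refitted predictions, and the squared bias is the gap between their average and the true signal. The true signal is only known in a simulation, so the sketch below is a toy example on synthetic data, with a polynomial fit standing in for the group's regression; the function name and parameters are hypothetical.

```python
import numpy as np

def bootstrap_bias_variance(x, y, x_test, f_test, degree=1, n_boot=200, seed=0):
    """Estimate squared bias and variance of a degree-`degree` polynomial fit.

    f_test holds the true noise-free signal at x_test, so this only works
    on simulated data where that signal is known.
    """
    rng = np.random.default_rng(seed)
    preds = np.empty((n_boot, len(x_test)))
    for b in range(n_boot):
        # Resample the training set with replacement and refit.
        idx = rng.integers(0, len(x), size=len(x))
        coef = np.polyfit(x[idx], y[idx], degree)
        preds[b] = np.polyval(coef, x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - f_test) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance
```

With a straight-line fit to quadratic data, the squared bias dominates the variance, which is the signature of underfitting; on real data, where the true signal is unknown, consistently high training and validation errors are the practical symptom.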
