midterm-peer review dh734 #7

This project analyzes a historical demographic dataset to build a regression model that predicts swing counties in presidential elections. The model will be trained on data from the 2016 presidential election, and the group ultimately plans to apply it to the 2020 presidential election data. The current report shows the regression model still under construction.

I think this group is doing a wonderful job.
Firstly, their feature transformation is well designed. I think the way they divided the raw count data represents the information it contains well.
Secondly, they performed a correlation analysis on the dataset, setting the significance level at 25%. Screening features this way will help remove noise from the model and also save computational resources.
Finally, they use five-fold cross-validation, which will help them detect underfitting or overfitting.
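
The screening and cross-validation steps above can be sketched as follows. This is a minimal illustration under assumptions, not the group's actual pipeline: the helper names `screen_features` and `five_fold_cv_mse` are hypothetical, ordinary least squares stands in for their regression model, and the data are synthetic. It keeps a feature when its Pearson correlation with the response has a p-value below 0.25, and it redoes that screening inside each training fold so the held-out fold never influences which features are kept.

```python
import numpy as np
from scipy.stats import pearsonr


def screen_features(X, y, alpha=0.25):
    """Keep columns whose Pearson correlation with y is significant at level alpha."""
    keep = []
    for j in range(X.shape[1]):
        _, p_value = pearsonr(X[:, j], y)
        if p_value < alpha:
            keep.append(j)
    return keep


def five_fold_cv_mse(X, y, alpha=0.25, k=5, seed=0):
    """Mean held-out MSE of least squares, redoing the screening inside each fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Screen on the training folds only, so the held-out fold stays unseen.
        keep = screen_features(X[train], y[train], alpha)
        A_train = np.column_stack([np.ones(len(train)), X[train][:, keep]])
        A_test = np.column_stack([np.ones(len(test)), X[test][:, keep]])
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        errors.append(np.mean((A_test @ beta - y[test]) ** 2))
    return float(np.mean(errors))


# Synthetic stand-in for the county features: only columns 0 and 1 carry signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
print(screen_features(X, y))
print(five_fold_cv_mse(X, y))
```

Screening on the full dataset before cross-validating would let the held-out folds leak into feature selection and make the cross-validated error look optimistic.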

However, there is still some room for improvement.
Firstly, the current model is a regression model, but the response is boolean (a county either is or is not a swing county). Therefore, I suggest using a classification model such as logistic regression.
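
A minimal sketch of what the logistic regression suggestion looks like, assuming a binary label such as a hypothetical `is_swing_county` column coded 0/1. The data below are synthetic and the helpers `fit_logistic` and `predict_proba` are illustrative names; in practice `sklearn.linear_model.LogisticRegression` would be the usual choice, and plain gradient descent is used here only to keep the example dependency-free.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    A = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = sigmoid(A @ w)
        w -= lr * A.T @ (p - y) / len(y)  # gradient of the mean negative log-likelihood
    return w


def predict_proba(w, X):
    """Predicted probability that each row belongs to class 1 (e.g. a swing county)."""
    A = np.column_stack([np.ones(len(X)), X])
    return sigmoid(A @ w)


# Synthetic boolean response standing in for the hypothetical "is_swing_county" label.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(float)
w = fit_logistic(X, y)
accuracy = np.mean((predict_proba(w, X) >= 0.5) == y)
print(w, accuracy)
```

Unlike a linear regression fit to 0/1 targets, the outputs stay in [0, 1] and can be read as probabilities, so the 0.5 threshold can be tuned if swing counties are rare.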

Secondly, they discard a large number of missing values. Instead of dropping them, they could try imputing them with a method such as matrix completion.
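
One way to try matrix completion is a soft-impute style iteration, sketched below under the assumption that the county-by-feature matrix is approximately low rank and that no column is entirely missing. The helper `soft_impute`, the `shrink` value, and the synthetic rank-2 matrix are all illustrative choices, not part of the group's project; the example compares the result against simple column-mean filling on the entries that were removed.

```python
import numpy as np


def soft_impute(M, shrink=1.0, n_iter=100):
    """Fill NaN entries of M with a low-rank estimate (soft-impute style).

    Each iteration takes an SVD of the current filled matrix, soft-thresholds the
    singular values by `shrink`, and overwrites only the missing entries.
    """
    observed = ~np.isnan(M)
    # Start from column means of the observed values (assumes no column is fully missing).
    filled = np.where(observed, M, np.nanmean(M, axis=0))
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U * np.maximum(s - shrink, 0.0)) @ Vt
        filled = np.where(observed, M, low_rank)
    return filled


# Synthetic rank-2 "county x feature" matrix with about 20% of entries removed.
rng = np.random.default_rng(1)
truth = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 8))
M = truth.copy()
missing = rng.random(M.shape) < 0.2
M[missing] = np.nan

filled = soft_impute(M)
mean_fill = np.where(missing, np.nanmean(M, axis=0), M)
rmse_soft = np.sqrt(np.mean((filled[missing] - truth[missing]) ** 2))
rmse_mean = np.sqrt(np.mean((mean_fill[missing] - truth[missing]) ** 2))
print(rmse_soft, rmse_mean)
```

Observed entries are never changed; only the gaps are filled. Whether this helps on the real data depends on how close the demographic features are to low rank and on whether values are missing at random.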

Finally, based on the current results, I am a little concerned that the model underfits. They could estimate the bias and variance of the current model to check this: high bias combined with low variance would point to underfitting.
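
A rough sketch of estimating bias and variance with bootstrap refits. It uses a simulated problem where the true function (`sin(x)`) is known, which is what makes the squared-bias term computable; on the real election data the true function is unknown, so bias could only be approximated (for example from held-out error, which also contains irreducible noise). The helper `bias_variance` and the polynomial models are hypothetical stand-ins for the group's model.

```python
import numpy as np


def bias_variance(degree, n_boot=200, seed=0):
    """Estimate squared bias and variance of a polynomial fit on simulated data.

    The true function sin(x) is known here, which is what makes the bias term computable.
    """
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(0, 2 * np.pi, 100)
    y_train = np.sin(x_train) + rng.normal(scale=0.3, size=100)
    x_test = np.linspace(0, 2 * np.pi, 50)
    preds = np.empty((n_boot, len(x_test)))
    for b in range(n_boot):
        idx = rng.integers(0, len(x_train), len(x_train))  # bootstrap resample
        coefs = np.polyfit(x_train[idx], y_train[idx], degree)
        preds[b] = np.polyval(coefs, x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - np.sin(x_test)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance


for degree in (1, 3):
    print(degree, bias_variance(degree))
```

The straight-line fit is too simple for a sine curve, so its squared bias dominates; that pattern (high bias, little gain from refitting on resampled data) is the signature of underfitting the review is concerned about.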