- Jake Barnabe
- Simon Frew
- Merete Lutz
- Waleed Mahmood
We attempt to construct a classification model using an RBF SVM classifier that uses FIFA22 player attribute ratings to classify players' potential into the target classes "Low", "Medium", "Good", and "Great". The classes are split on the quartiles of the distribution of the FIFA22 potential ratings. Our model performed reasonably well on the test data, with an accuracy score of 0.84 using the hyperparameters C = 100 and gamma = 0.01. However, we believe there is still significant room for improvement before the model is ready to be used by soccer clubs and coaching staffs to predict the potential of players on the field instead of on the screen.
The final report can be found here
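The approach summarized above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the data is synthetic, the column names (`attr_0`, ...) are made up, and only the quartile split, the RBF kernel, and the reported hyperparameters (C = 100, gamma = 0.01) come from the summary.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): the real project uses FIFA22 attribute ratings.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(400, 5)), columns=[f"attr_{i}" for i in range(5)])
potential = 70 + 5 * X["attr_0"] + 3 * X["attr_1"] + rng.normal(scale=2, size=400)

# Split the continuous rating into four classes at its quartiles.
labels = ["Low", "Medium", "Good", "Great"]
y = pd.qcut(potential, q=4, labels=labels).astype(str)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the features, then fit an RBF SVM with the hyperparameters reported above.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=100, gamma=0.01))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here reflects the synthetic data only and says nothing about the 0.84 reported for the real dataset.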
- Docker is a containerization tool used to manage the software dependencies for this project. The Docker image used for this project is based on the quay.io/jupyter/minimal-notebook image. Additional dependencies are specified in the Dockerfile.
- Python dependencies are described in environment.yaml.
- Install and launch Docker on your computer.
- Clone this GitHub repository.
- Navigate to the root of this project before beginning the project analysis.
- Navigate to the root of this project on your computer using the command line and enter the following command:
docker compose up
- In the terminal, look for a URL that starts with http://127.0.0.1:8888/lab?token= and copy and paste that URL into your browser.
- To replicate the analysis, open a terminal window within Jupyter Lab and run:
conda activate fifa-potential
bash run.sh
- To view the analysis, run the following in the root directory to rebuild the report and copy it to the docs/ directory:
jupyter-book build --all report
yes | cp -r -f report/_build/html/* docs
- To shut down the container and clean up the resources, press Ctrl+C in the terminal where you launched the container, and then type:
docker compose rm
- Run the following command from the root directory of this project to reset the repository to its original clean state:
docker-compose run --rm fifa-potential make clean
- Run the following command from the root directory of this project to replicate the analysis:
docker-compose run --rm fifa-potential make all
- Install local dependencies:
conda env create --file environment.yaml
- To replicate the analysis, navigate to the root of this project on your computer using the command line and enter the following commands:
conda activate fifa-potential
bash run.sh
- To view the analysis, run the following in the root directory to rebuild the report and copy it to the docs/ directory:
jupyter-book build --all report
yes | cp -r -f report/_build/html/* docs
- To revert all local changes to tracked files, run the following at the root of the repository:
git restore .
- Add the dependency to the Dockerfile file on a new branch.
- Re-build the Docker image locally to ensure it builds and runs properly.
- Push the changes to GitHub. A new Docker image will be built and pushed to Docker Hub automatically. It will be tagged with the SHA for the commit that changed the file.
- Update the docker-compose.yml file on your branch to use the new container image (make sure to update the tag specifically).
- Send a pull request to merge the changes into the main branch.
Tests are run using the pytest command in the root of the project. More details about the test suite can be found in the tests directory.
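As an illustration of the kind of check a pytest suite can contain, here is a minimal sketch. The helper `label_potential` and its cutoff values are hypothetical and are not taken from this project's tests directory or source code.

```python
# Hypothetical helper (assumption): not part of the project's source code.
def label_potential(rating, cutoffs):
    """Map a numeric rating to a class using three ascending cutoffs."""
    labels = ["Low", "Medium", "Good", "Great"]
    for label, cutoff in zip(labels, cutoffs):
        if rating <= cutoff:
            return label
    # Ratings above the highest cutoff fall into the top class.
    return labels[-1]


def test_label_potential_boundaries():
    cutoffs = (67, 71, 75)  # illustrative values only, not the real quartiles
    assert label_potential(60, cutoffs) == "Low"
    assert label_potential(67, cutoffs) == "Low"
    assert label_potential(68, cutoffs) == "Medium"
    assert label_potential(75, cutoffs) == "Good"
    assert label_potential(90, cutoffs) == "Great"
```

pytest collects functions whose names start with `test_` from files named `test_*.py`, so a file like this placed under the tests directory would be picked up automatically.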
Run the following in the root directory to build the report and copy it to the docs/ directory:
jupyter-book build --all report
yes | cp -r -f report/_build/html/* docs
Note that this will not rerun the analysis itself; it only updates the rendered report.
Refer to run.sh for execution order. Commands and recommended parameters are listed below as required for Milestone 3:
# Load, clean, and tidy data
python src/01_load_clean_tidy.py \
--url=https://sports-statistics.com/database/fifa/fifa_2022_datasets.zip \
--filename=players_22.csv
# Generate EDA figures
python src/02_eda_figures.py \
--dataset=data/processed/fifa_train.csv \
--target=potential
# Preprocess data
python src/03_preprocessing.py \
--train=data/processed/fifa_train.csv \
--test=data/processed/fifa_test.csv
# Complete model selection
python src/04_model_selection.py \
--scaled_train=data/processed/scaled_fifa_train.csv
# Complete hyperparameter tuning
python src/05_hyperparameter_scoring.py \
--scaled_train=data/processed/scaled_fifa_train.csv \
--scaled_test=data/processed/scaled_fifa_test.csv
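For readers unfamiliar with how such options reach a script, the following is a minimal sketch of parsing the `--url` and `--filename` options shown for the first step. It uses the standard library's `argparse` as an assumption; the project's actual scripts may use a different argument parser, and the function name `parse_args` is made up for this example.

```python
import argparse


def parse_args(argv=None):
    """Parse the two options used by the load/clean/tidy step (illustrative only)."""
    parser = argparse.ArgumentParser(description="Load, clean, and tidy data")
    parser.add_argument("--url", required=True, help="URL of the zipped dataset")
    parser.add_argument("--filename", required=True, help="CSV file to extract from the archive")
    return parser.parse_args(argv)


# Passing a list mimics the command line without touching sys.argv.
args = parse_args([
    "--url=https://sports-statistics.com/database/fifa/fifa_2022_datasets.zip",
    "--filename=players_22.csv",
])
print(args.url)
print(args.filename)
```

Calling `parse_args()` with no argument reads from the real command line instead, which is how a script would normally use it.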
Third parties wishing to do any of the following should refer to CONTRIBUTING.md:
- Contribute to the software.
- Report issues or problems with the software.
- Seek support.
This report is licensed under an Attribution-NonCommercial-NoDerivs 4.0 International (CC BY-NC-ND 4.0 Deed) License, with the repository itself under an MIT License. The underlying dataset is licensed under a CC0 1.0 Universal (Public Domain) license.
This README file references https://github.com/ttimbers/breast_cancer_predictor_py/tree/main.
US National Soccer Players. (2023). (rep.). How to evaluate soccer players. Retrieved from https://ussoccerplayers.com/soccer-training-tips/evaluating-players.
Harris, C.R. et al., 2020. Array programming with NumPy. Nature, 585, pp.357–362.
McKinney, Wes. 2010. “Data Structures for Statistical Computing in Python.” In Proceedings of the 9th Python in Science Conference, edited by Stéfan van der Walt and Jarrod Millman, 51–56.
Pauli Virtanen, et al., and SciPy 1.0 Contributors. (2020) SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17(3), 261-272.
Pedregosa, F. et al., 2011. Scikit-learn: Machine learning in Python. Journal of machine learning research, 12(Oct), pp.2825–2830.
VanderPlas et al., (2018). Altair: Interactive Statistical Visualizations for Python. Journal of Open Source Software, 3(32), 1057, https://doi.org/10.21105/joss.01057
Van Rossum, Guido, and Fred L. Drake. 2009. Python 3 Reference Manual. Scotts Valley, CA: CreateSpace.