Plagiarism detector project

This repository contains code and associated files for deploying a plagiarism detector using AWS SageMaker. This project was submitted in partial fulfillment of the requirements for the Udacity Machine Learning Nanodegree.

Project Overview

This project builds a plagiarism detector that examines a text file and performs binary classification, labeling the file as either plagiarised or not depending on how similar it is to a provided source text.

This project is broken down into three main notebooks:

Notebook 1: Data Exploration

Load in the corpus of plagiarism text data.
Explore the existing data features and the data distribution.

Notebook 2: Feature Engineering

Clean and pre-process the text data.
Define features for comparing the similarity of an answer text and a source text, and extract similarity features.
Select features by analyzing the correlations between different features.
Create train/test .csv files that hold the relevant features and class labels for train/test data points.
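
As an illustration of the kind of similarity feature described above, the sketch below computes n-gram containment: the fraction of an answer's word n-grams that also appear in the source text. This README does not say which features the notebook actually uses, so the choice of containment, the helper names `ngrams` and `containment`, and the whitespace tokenization are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(text, n):
    """Count word n-grams in text (lowercased, split on whitespace)."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def containment(answer_text, source_text, n=1):
    """Fraction of the answer's n-grams that also occur in the source.

    Hypothetical helper, not taken from the project notebooks.
    Returns a value between 0.0 and 1.0; an answer with no n-grams scores 0.0.
    """
    answer_counts = ngrams(answer_text, n)
    source_counts = ngrams(source_text, n)
    total = sum(answer_counts.values())
    if total == 0:
        return 0.0
    # Clip each n-gram's count by how often it appears in the source.
    shared = sum(min(count, source_counts[gram])
                 for gram, count in answer_counts.items())
    return shared / total
```

A value near 1.0 suggests the answer reuses most of the source's phrasing, while a value near 0.0 suggests little overlap. Computing this for several values of `n` would give several candidate features to compare during feature selection.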

Notebook 3: Train and deploy a neural network in SageMaker

Upload train/test feature data to S3.
Define a binary classification PyTorch model and a training script.
Train the PyTorch model and deploy it using SageMaker.
Evaluate the deployed classifier.
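
A binary classification model of the kind listed above might look like the following minimal sketch. The class name `BinaryClassifier`, the single hidden layer, and the dropout rate are assumptions for illustration; the README does not describe the actual architecture or the contents of the training script.

```python
import torch
import torch.nn as nn


class BinaryClassifier(nn.Module):
    """Small feed-forward network that outputs a probability per input row.

    Hypothetical architecture, not taken from the project's source.
    """

    def __init__(self, input_features, hidden_dim, output_dim=1):
        super().__init__()
        self.fc1 = nn.Linear(input_features, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)
        self.drop = nn.Dropout(0.3)  # assumed rate, chosen for illustration

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.drop(x)
        # Sigmoid maps the output to (0, 1), read as P(plagiarised).
        return torch.sigmoid(self.fc2(x))
```

With a sigmoid output like this, `nn.BCELoss` is a natural training loss, and rounding the output at 0.5 yields the predicted class label.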
