Data-Quality-Analysis-Using-BERT

This project demonstrates data quality analysis using BERT.

BERT (Bidirectional Encoder Representations from Transformers) is a natural language processing (NLP) model developed by Google. It is a neural network that analyzes text by considering the context on both sides of each word in a sentence, rather than looking at words one by one. BERT is used to improve the accuracy of search engines, language translation, and conversational AI.

This project uses BERT tokens to detect the anomalies present in the data.
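As a rough illustration of how embeddings can be turned into an anomaly score, here is a minimal sketch. It is not taken from `bert_anomaly_detection.py`: the scoring rule (cosine distance from the centroid), the function names, and the tiny 3-dimensional vectors are all assumptions made for the example. Real BERT-base embeddings have 768 dimensions and would come from the model itself.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def anomaly_scores(embeddings):
    """Score each embedding by its cosine distance from the centroid.

    Hypothetical scoring rule for illustration only; a higher score
    means the row looks less like the rest of the data.
    """
    center = centroid(embeddings)
    return [cosine_distance(vector, center) for vector in embeddings]


# Toy stand-ins for BERT embeddings (assumed values, not real model output).
toy_embeddings = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.0],
    [1.0, 1.0, 0.1],
    [-0.8, 0.1, 1.0],  # deliberately unlike the other rows
]

scores = anomaly_scores(toy_embeddings)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
print("scores:", [round(s, 3) for s in scores])
print("most anomalous row index:", most_anomalous)
```

In this toy data the last row points in a different direction from the others, so it receives the highest score. A threshold on the score, or a classifier trained on labeled rows, could then decide which rows to flag.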

Note: this project uses a simple dataset.

The project also includes several visualizations that help in understanding the data quality analysis.

The visualizations are:

- Class distribution before undersampling
- Class distribution after undersampling
- Anomaly score detection
- Word count distribution
- Character count distribution
- Confusion matrix
- PCA visualization of BERT embeddings
- Feature importance (random forest)
- ROC curve for anomaly detection
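The two class distribution plots compare label counts before and after undersampling. The sketch below shows one common way to do this, random undersampling of every class down to the size of the smallest one. It is an illustration under assumptions rather than the project's actual code: the `undersample` helper, the `(text, label)` row layout, and the 0/1 labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def undersample(rows, label_of, seed=0):
    """Randomly drop rows so every class keeps as many rows as the smallest class."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    smallest = min(len(group) for group in by_label.values())

    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)  # avoid leaving the rows grouped by class
    return balanced


# Hypothetical toy data: 8 normal rows (label 0) and 2 anomalous rows (label 1).
rows = [("clean text %d" % i, 0) for i in range(8)]
rows += [("b@d t3xt %d" % i, 1) for i in range(2)]

before = Counter(label for _, label in rows)
balanced_rows = undersample(rows, label_of=lambda row: row[1])
after = Counter(label for _, label in balanced_rows)

print("before undersampling:", dict(before))
print("after undersampling:", dict(after))
```

Before undersampling the toy data has 8 rows of class 0 and 2 of class 1. Afterwards each class has 2 rows, which is the kind of change the two distribution plots are meant to show.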

How to run the project

1. Download the code.
2. Create a folder with any name you like, and copy the Python file and the dataset file from the downloaded code into it.
3. Open VS Code, open the folder you created, and open the terminal.
4. Paste the text from the Requirements.txt file in the downloaded code into the terminal.
5. Wait until the requirements finish installing.

After the installation, run the following command:

        python bert_anomaly_detection.py
