
Chandrashekharreddy6/Data-Quality-Analysis-Using-BERT


Data-Quality-Analysis-Using-BERT

This project demonstrates data quality analysis using BERT.

BERT (Bidirectional Encoder Representations from Transformers) is a natural language processing (NLP) model developed by Google. It is a neural network that analyzes text by considering the context of each word in a sentence, rather than looking at words one by one. BERT is used to improve the accuracy of search engines, language translation, and conversational AI.

This project uses BERT embeddings to detect the anomalies present in the data.
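As a rough illustration of embedding-based anomaly detection, the sketch below fits scikit-learn's `IsolationForest` (the project description names Isolation Forest) on a matrix of vectors. This is not the project's own code: the random vectors stand in for real BERT embeddings, and the dimensions, contamination rate, and variable names are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# ASSUMPTION: random vectors stand in for BERT sentence embeddings.
# 200 "normal" rows plus 5 rows drawn from a much wider distribution.
normal = rng.normal(0.0, 1.0, size=(200, 16))
outliers = rng.normal(0.0, 8.0, size=(5, 16))
embeddings = np.vstack([normal, outliers])

# contamination is the expected fraction of anomalies (assumed known here).
forest = IsolationForest(n_estimators=100, contamination=5 / 205, random_state=0)
forest.fit(embeddings)

scores = forest.score_samples(embeddings)  # lower score = more anomalous
labels = forest.predict(embeddings)        # -1 = anomaly, 1 = normal
flagged = np.flatnonzero(labels == -1)     # row indices flagged as anomalies
print(flagged)
```

In a real run, `embeddings` would be replaced by vectors produced by a BERT model, one row per text record.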

NOTE: This project uses a simple dataset.

The project also includes several visualizations that help in understanding the data quality analysis using BERT.

The visualizations are:

 CLASS DISTRIBUTION BEFORE UNDERSAMPLING :- 

Figure_1

 CLASS DISTRIBUTION AFTER UNDERSAMPLING :-

Figure_2
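The two class-distribution plots compare label counts before and after undersampling. A minimal sketch of random undersampling follows; the README does not say which undersampling method the project uses, so the random strategy, the `undersample` helper, and the toy rows are assumptions for illustration.

```python
import random
from collections import Counter

def undersample(rows, label_of, seed=0):
    """Randomly drop rows from larger classes until every class matches the smallest."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced

# Hypothetical toy data: 90 normal rows (label 0) and 10 anomalous rows (label 1).
rows = [("ok text %d" % i, 0) for i in range(90)] + [("odd text %d" % i, 1) for i in range(10)]

before = Counter(label for _, label in rows)
balanced = undersample(rows, label_of=lambda r: r[1])
after = Counter(label for _, label in balanced)
print("before:", dict(before))
print("after:", dict(after))
```

Undersampling discards majority-class rows, so it trades data volume for balance; that is usually acceptable only when the majority class is large.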

 ANOMALY SCORE DETECTION :-

Anomaly

 WORD COUNT DISTRIBUTION :-

WORD

 CHARACTER COUNT DISTRIBUTION :-

CHAR
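Word and character counts are simple per-record length features. The sketch below shows one way to compute them; splitting on whitespace and the sample strings are assumptions, not taken from the project's code.

```python
from collections import Counter

def length_features(text):
    """Return the whitespace-delimited word count and raw character count of a text."""
    return {"word_count": len(text.split()), "char_count": len(text)}

# Hypothetical sample records.
samples = [
    "valid customer record",
    "??",
    "a much longer free-text entry that stands out from the rest",
]

features = [length_features(text) for text in samples]

# Histogram of word counts: how many records have each length.
word_hist = Counter(f["word_count"] for f in features)
for text, f in zip(samples, features):
    print(f["word_count"], f["char_count"], repr(text))
```

Records whose lengths sit far from the bulk of the distribution, such as the two-character entry here, are natural candidates for a closer quality check.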

 CONFUSION MATRIX :-

confusion matrix
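A confusion matrix tabulates true labels against predicted labels. The following sketch builds one for binary labels in plain Python; the label convention (1 = anomaly) and the example label lists are assumptions.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels where 1 marks an anomaly."""
    matrix = [[0, 0], [0, 0]]
    for truth, pred in zip(y_true, y_pred):
        matrix[truth][pred] += 1
    return matrix

# Hypothetical labels for eight records.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0]

matrix = confusion_matrix(y_true, y_pred)
(tn, fp), (fn, tp) = matrix
precision = tp / (tp + fp)  # share of flagged records that are true anomalies
recall = tp / (tp + fn)     # share of true anomalies that were flagged
print(matrix, precision, recall)
```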

 PCA VISUALIZATIONS OF BERT EMBEDDINGS :-

PCA

 FEATURE IMPORTANCE (RANDOM FOREST) :-

feature

  ROC CURVE FOR ANOMALY DETECTION :-

ROC
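An ROC curve plots the true positive rate against the false positive rate as the decision threshold on the anomaly score is swept. The sketch below computes the curve points and the area under it in plain Python. It assumes that a higher score means "more anomalous" and uses made-up labels and scores; some libraries use the opposite sign convention, in which case the scores would need to be negated first.

```python
def roc_points(y_true, scores):
    """Sweep the threshold from high to low and return a list of (FPR, TPR) points."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical labels (1 = anomaly) and anomaly scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
area = auc(points)
print(points, area)
```

An area of 1.0 means the scores rank every anomaly above every normal record, while 0.5 is what random scores would give.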

HOW TO RUN THE PROJECT

First, download the code. Create a folder with any name you like and copy the Python file and the dataset file from the downloaded code into it. Open that folder in VS Code and open the terminal. Paste the contents of the requirements text file from the downloaded code into the terminal and wait until the requirements finish installing.

After the installation, run the following command:

        python bert_anomaly_detection.py

About

🚀 BERT-Based Anomaly Detection System: this project implements anomaly detection using BERT embeddings and machine learning techniques. It processes text-based data, extracts meaningful representations using BERT, and applies Isolation Forest and Random Forest models to detect anomalies.
