EMPom-NLP

EMPom-NLP is a Python module for preprocessing, clustering, topic modelling and sentiment analysis of raw message data extracted from the EMPomoschnik chatbot.

Features

Data transformation from raw .xlsx file with chat history
Data preprocessing:
- message separation, redundant data removal
- order/shop/incident codes detection and preservation
- stemming/lemmatization
Vectorization (TF-IDF or One-Hot)
Clustering using KMeans
Topic definition using Latent Dirichlet Allocation (and TF-IDF top features, if chosen as vectorizer)
Interactive cluster and topic distribution visualization using a t-SNE embedding (tweaked pyLDAvis)
Sentiment analysis (to run sentiment analysis, install the pretrained LaBSE model from here)
Saving results in an easily interpretable format (.xlsx) for further actions

Demonstration

To see the module in action, open the demo.ipynb notebook.

File structure

For better usability, the module is organized into several files.

EMPom-NLP
│   README.md                       # This introductory text file
│   demo.ipynb                      # Notebook for workflow demonstration
│
├───classes
│   │   Preprocessing.py            # Data extraction and preprocessing classes
│   │   UniVectorizer.py            # Universal vectorizer class
│   
└───auxiliary
    │   Visualization.py            # Visualization
    │   Sentiment.py                # Sentiment analysis tool
    │   Insight.py                  # Functions for better result interpretation
    └───kmeans_to_pyLDAvis          # Module to simplify clustering visualization
        | ...

Credits

Code written by Anna Pastukhova and Maxim Plotnikov. Curated and supervised by Egor Terikov, Vitaliy Makarenko and Vladislav Smirnov.
