Mc-Seem/Empom-NLP

EMPom-NLP

EMPom-NLP is a Python module for preprocessing, clustering, topic modelling, and sentiment analysis of raw message data extracted from the EMPomoschnik chatbot.

Features

  • Data transformation from a raw .xlsx file with chat history
  • Data preprocessing:
    • message separation, redundant data removal
    • order/shop/incident codes detection and preservation
    • stemming/lemmatization
  • Vectorization (TF-IDF or One-Hot)
  • Clustering using KMeans
  • Topic definition using Latent Dirichlet Allocation (and TF-IDF top features, if chosen as vectorizer)
  • Interactive visualization of cluster and topic distributions using a t-SNE embedding (tweaked pyLDAvis)
  • Sentiment analysis (to run it, first install the pretrained LaBSE model from here)
  • Saving results in an easily interpretable format for further processing (.xlsx)
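
To make the "TF-IDF top features" idea above concrete, here is a minimal, standard-library-only sketch. It does not use or reproduce the module's own classes (such as `UniVectorizer`); the function names `tfidf` and `top_features`, the sample messages, and the cluster labels are all hypothetical and chosen only for illustration. It assumes the messages are already tokenized and that cluster labels have already been assigned (for example by KMeans).

```python
import math
from collections import Counter, defaultdict


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses plain term frequency times log(N / document frequency);
    real vectorizers often add smoothing and normalization.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_features(weights, labels, n_terms=2):
    """Sum TF-IDF weights per cluster label and keep the heaviest terms."""
    totals = defaultdict(Counter)
    for doc_weights, label in zip(weights, labels):
        totals[label].update(doc_weights)
    return {
        label: [term for term, _ in totals[label].most_common(n_terms)]
        for label in totals
    }


# Hypothetical preprocessed messages and hypothetical cluster labels.
docs = [
    ["order", "courier", "late", "courier"],
    ["order", "courier", "delayed"],
    ["order", "password", "reset", "password"],
    ["order", "login", "password"],
]
labels = [0, 0, 1, 1]

weights = tfidf(docs)
print(top_features(weights, labels, n_terms=1))
```

Because "order" occurs in every message, its inverse document frequency is zero and it never appears among the top features, which is why TF-IDF weights are useful for describing what distinguishes one cluster from another. In practice a library vectorizer such as scikit-learn's `TfidfVectorizer` would typically replace the hand-written `tfidf` function.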

Demonstration

To see the module in action, take a look at the demo notebook (demo.ipynb).

File structure

For better usability, the module is organized in several files.

EMPom-NLP
│   README.md                       # This introductory text file
│   demo.ipynb                      # Notebook for workflow demonstration
│
├───classes
│   │   Preprocessing.py            # Data extraction and preprocessing classes
│   │   UniVectorizer.py            # Universal vectorizer class
│   
└───auxiliary
    │   Visualization.py            # Visualization
    │   Sentiment.py                # Sentiment analysis tool
    │   Insight.py                  # Functions for better result interpretation
    └───kmeans_to_pyLDAvis          # Module to simplify clustering visualization
        │   ...

Credits

Code written by Anna Pastukhova and Maxim Plotnikov. Curated and supervised by Egor Terikov, Vitaliy Makarenko and Vladislav Smirnov.
