
Utilities for conducting qualitative analysis, such as topic modelling of free-text survey responses


JiscDACT/qualkit


DACT Qualitative Analysis Toolkit (qualkit)

This project is a collection of utilities for conducting qualitative analysis.

It currently consists of the following modules:

  • clean: a utility for cleaning up text prior to use with other tools
  • sentiment: a wrapper around VADER's SentimentIntensityAnalyzer
  • anchored_topic_model: creates topic models using the CorEx algorithm (Gallagher et al., 2017) with user-supplied anchors to 'steer' the model using domain knowledge
  • stopwords: a standard set of stopwords
  • topics: a wrapper around scikit-learn's LatentDirichletAllocation
  • keywords: a wrapper around the NLTK-based RAKE (Rapid Automatic Keyword Extraction) algorithm for finding keywords in text
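
To give a feel for what the keywords module computes, here is a minimal, from-scratch sketch of RAKE scoring in plain Python. It is not qualkit's API or the underlying library's implementation: the function names candidate_phrases and rake, and the tiny STOPWORDS set, are hypothetical and exist only for illustration. Candidate phrases are split at stopwords and punctuation, each word is scored as degree divided by frequency, and each phrase scores the sum of its word scores.

```python
import re
from collections import defaultdict

# Hypothetical stopword list for illustration only; qualkit's own
# stopwords module is not reproduced here.
STOPWORDS = {
    "a", "an", "and", "the", "of", "to", "is", "was", "were",
    "in", "for", "it", "with", "on", "too", "but", "very",
}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9']+", fragment):
            if word in stopwords:
                # A stopword ends the current phrase.
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text, stopwords=STOPWORDS, top_n=5):
    """Return the top_n (phrase, score) pairs using RAKE's degree/frequency score."""
    phrases = candidate_phrases(text, stopwords)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree: words co-occurring in the phrase, counting the word itself.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    phrase_scores = {}
    for phrase in phrases:
        phrase_scores[" ".join(phrase)] = sum(word_score[w] for w in phrase)
    return sorted(phrase_scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


text = ("The library website was slow. Opening hours were too short, "
        "and the library website crashed.")
print(rake(text, top_n=3))
# → [('library website crashed', 8.0), ('library website', 5.0), ('opening hours', 4.0)]
```

Longer phrases built from words that recur and co-occur rank highest, which is why RAKE tends to favour multi-word keywords over single words.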

For more details on each module, see the 'docs' folder.

Installing the toolkit and its requirements

Install using:

pip install qualkit

Or add 'qualkit' to your requirements.txt file, or add it as a dependency in your project settings in PyCharm.

User Control

Users can control the following aspects of the toolkit, each of which influences the outputs:

  • Anchoring strategies
  • Anchor strength
  • Number of topics
  • Labelling True/False for each topic instead of dichotomising
  • How data is preprocessed before topic modelling (e.g. redaction and TF-IDF vectoriser settings)
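
The preprocessing choices above can be illustrated with a small, standard-library-only sketch. It is an illustration under stated assumptions, not qualkit's clean module: redact, tokenize and tfidf are hypothetical names, and the two redaction patterns (email addresses and runs of five or more digits) are examples only. The tfidf function applies the weighting that scikit-learn's TfidfVectorizer uses by default (raw term counts, smoothed IDF of ln((1 + n) / (1 + df)) + 1, then L2 normalisation), so changing any of these steps changes the vectors a topic model receives.

```python
import math
import re
from collections import Counter

# Hypothetical redaction rules (not qualkit's): mask email addresses and
# long digit runs such as student or staff ID numbers.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
LONG_NUMBER_RE = re.compile(r"\b\d{5,}\b")


def redact(text: str) -> str:
    """Replace personal identifiers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return LONG_NUMBER_RE.sub("[NUMBER]", text)


def tokenize(text: str, stopwords: set[str]) -> list[str]:
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF with smoothed IDF and L2 normalisation, matching the default
    weighting of scikit-learn's TfidfVectorizer."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        weights = {t: c * idf[t] for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors


responses = [
    "Contact me at jo@example.ac.uk about the library",
    "The library was closed",
    "Wifi was slow in the library",
]
stopwords = {"the", "was", "in", "at", "me", "about"}
docs = [tokenize(redact(r), stopwords) for r in responses]
vectors = tfidf(docs)
print(docs[0])  # → ['contact', 'email', 'library']
```

Terms that occur in every response, such as 'library' here, get the lowest possible IDF weight of 1, while rarer terms such as 'closed' are weighted up.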

References

Gallagher, R. J., Reing, K., Kale, D., and Ver Steeg, G. "Anchored Correlation Explanation: Topic Modeling with Minimal Domain Knowledge." Transactions of the Association for Computational Linguistics (TACL), 2017.


Contributors 3
