This project implements the Apriori algorithm for frequent itemset mining on Twitter data related to flu shots. The implementation analyzes keyword co-occurrence patterns in tweets to discover meaningful associations between terms.

Features:
- Implementation of the Apriori algorithm for frequent pattern mining
- Support for processing large-scale Twitter datasets
- Configurable minimum support threshold
- Output ranking system for discovered patterns
- Performance optimizations for handling large datasets

Components:
- Pattern Mining Algorithm: Core implementation of the Apriori algorithm
- Data Processing: Handles text data with keyword separators
- Performance Monitoring: Ensures efficient processing within specified time constraints
- Results Generation: Creates formatted output of discovered patterns with support counts
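
The level-wise search that Apriori performs, followed by ranking the results by support count, can be sketched as below. This is a minimal illustration and not the project's actual code: the function names `apriori` and `rank_patterns`, the in-memory list of transactions, and the tie-breaking rule (alphabetical order) are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every frequent itemset.

    `transactions` is an iterable of keyword collections (one per tweet);
    `min_support` is an absolute count, not a fraction.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single keywords and keep those meeting the threshold.
    item_counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): c for i, c in item_counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(s) in current for s in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: one scan over the transactions per level.
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1

    return frequent


def rank_patterns(frequent):
    """Sort patterns by descending support; ties broken alphabetically (assumed)."""
    rows = [(sorted(items), count) for items, count in frequent.items()]
    return sorted(rows, key=lambda row: (-row[1], row[0]))
```

Calling `apriori` on a handful of keyword lists with a small threshold and passing the result to `rank_patterns` yields a list of `(keywords, support_count)` pairs, most frequent first. The nested counting loop is the simplest correct form; a real run over a large tweet collection would likely need a faster counting strategy.
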
The program accepts three command-line parameters:
- Input dataset filename
- Minimum support count threshold
- Output filename
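
One way to read these three parameters, assuming a Python entry point, is with the standard `argparse` module. The argument names `input_file`, `min_support`, and `output_file` are hypothetical labels chosen for this sketch; the README does not state what the program calls them or which language it is written in.

```python
import argparse


def parse_args(argv=None):
    """Parse the three positional parameters in the order listed above."""
    parser = argparse.ArgumentParser(
        description="Mine frequent keyword sets from tweets with Apriori."
    )
    parser.add_argument("input_file", help="input dataset filename")
    parser.add_argument(
        "min_support",
        type=int,  # the threshold is an absolute count, so parse it as an integer
        help="minimum support count threshold",
    )
    parser.add_argument("output_file", help="output filename")
    # With argv=None, argparse falls back to sys.argv[1:].
    return parser.parse_args(argv)
```

Passing an explicit list, for example `parse_args(["tweets.txt", "500", "patterns.txt"])` with placeholder filenames, makes the parser easy to exercise without touching the real command line. A non-integer threshold causes `argparse` to print a usage message and exit.
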

Author:
- Name: Jinghan (Summer) Sun
- Email: jinghan.sun@emory.edu
