jaympatel1893/HiddenMarkovModelPOSTagger

Hidden Markov Model for Parts of Speech Tagging

Data

The corpus is in Catalan.
Training Data: "catalan_corpus_train_tagged.txt"
Raw Test Data: "catalan_corpus_dev_raw.txt"
Tagged Test Data: "catalan_corpus_dev_tagged.txt"

Hidden Markov Model

Given a sequence of words as observations, we need to find the most likely tag for each word. "Finding the most likely explanation for an observation sequence can be solved efficiently using the Viterbi algorithm."

The implementation follows the Viterbi algorithm as described on Wikipedia: https://en.wikipedia.org/wiki/Viterbi_algorithm
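To make the decoding step concrete, here is a minimal sketch of Viterbi decoding in log space. It is not the code from hmmdecode.py. The function name `viterbi`, the dictionary layout of `start_p`, `trans_p` and `emit_p`, the `TINY` floor used in place of real smoothing, and the toy tags and probabilities are all assumptions made for illustration.

```python
import math

TINY = 1e-12  # assumed floor for unseen events; a stand-in for real smoothing


def log_p(table, key):
    """Log-probability of `key`, falling back to the TINY floor if unseen."""
    return math.log(table.get(key, TINY))


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most likely tag sequence for `words`.

    start_p: tag -> P(tag at sentence start)
    trans_p: (prev_tag, tag) -> P(tag | prev_tag)
    emit_p:  (tag, word) -> P(word | tag)
    """
    if not words:
        return []
    # best[i][t]: log-probability of the best path that ends in tag t at position i
    best = [{t: log_p(start_p, t) + log_p(emit_p, (t, words[0])) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + log_p(trans_p, (p, t))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log_p(emit_p, (t, words[i]))
            back[i][t] = prev
    # Start from the best final tag and follow the back-pointers.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


# Toy model with made-up tags and probabilities (not taken from the corpus).
tags = ["DET", "NOUN"]
start_p = {"DET": 0.8, "NOUN": 0.2}
trans_p = {("DET", "NOUN"): 0.9, ("DET", "DET"): 0.1,
           ("NOUN", "DET"): 0.4, ("NOUN", "NOUN"): 0.6}
emit_p = {("DET", "el"): 0.7, ("NOUN", "gat"): 0.5}
print(viterbi(["el", "gat"], tags, start_p, trans_p, emit_p))
```

Working in log space avoids numeric underflow on long sentences, since the product of many small probabilities becomes a sum of logs.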

Programs

hmmlearn.py will learn a hidden Markov model from the training data, and hmmdecode.py will use the model to tag new data.

python hmmlearn.py /path/to/input

This will generate hmmmodel.txt. The format is as follows:
countTransition countEmission countTags possibleTagsForEachWord
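The four fields above suggest that the model stores raw counts. Here is a minimal sketch of how such counts could be gathered. It assumes each training line holds space-separated `word/TAG` tokens, which this README does not confirm. The function name `count_events` and the `<s>` start marker are hypothetical, and the sketch does not reproduce how hmmlearn.py serializes hmmmodel.txt.

```python
from collections import defaultdict


def count_events(lines):
    """Collect HMM counts from tagged lines (assumed format: word/TAG tokens)."""
    transitions = defaultdict(int)    # (prev_tag, tag) -> count
    emissions = defaultdict(int)      # (tag, word) -> count
    tag_counts = defaultdict(int)     # tag -> count
    tags_for_word = defaultdict(set)  # word -> set of tags seen with it
    for line in lines:
        prev = "<s>"  # hypothetical start-of-sentence marker
        for token in line.split():
            # Split on the last slash so words that contain "/" stay intact.
            word, sep, tag = token.rpartition("/")
            if not sep:
                continue  # skip tokens that carry no tag
            transitions[(prev, tag)] += 1
            emissions[(tag, word)] += 1
            tag_counts[tag] += 1
            tags_for_word[word].add(tag)
            prev = tag
    return transitions, emissions, tag_counts, tags_for_word


# Toy input with made-up tags, for illustration only.
transitions, emissions, tag_counts, tags_for_word = count_events(
    ["el/DET gat/NOUN", "el/DET"]
)
print(dict(tag_counts))
```

Keeping the set of tags seen for each word lets a decoder limit its search to those tags for known words, instead of trying every tag at every position.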

The tagging program will be invoked in the following way:

python hmmdecode.py /path/to/input

The program will read the parameters of a hidden Markov model from the file hmmmodel.txt, tag each word in the test data, and write the results to a text file called hmmoutput.txt.
