The following four Go tokenizers were tested on the MSR text file.
All four tokenizers ran in their default mode with HMM set to true.
| Metric | gojieba | gse | jiebago | sego |
|---|---|---|---|---|
| Precision (P) | 81.67% | 83.79% | 81.93% | 79.64% |
| Recall (R) | 81.37% | 79.45% | 81.63% | 84.20% |
| F1 | 81.52% | 81.56% | 81.78% | 81.86% |
| Time | 1.19s | 4.41s | 6.82s | 1.56s |
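
The source does not say how the P, R, and F1 figures were computed. The sketch below shows one common way to score word segmentation, under the assumption that a predicted word counts as correct only when its start and end offsets match a word in the reference segmentation. The names `span`, `toSpans`, and `score` are hypothetical helpers for illustration and are not part of any of the four libraries.

```go
package main

import "fmt"

// span identifies a word by its rune offsets [start, end) in the unsegmented text.
type span struct{ start, end int }

// toSpans converts a word sequence into the set of spans it covers.
// Empty strings are skipped so they cannot produce zero-width spans.
func toSpans(words []string) map[span]bool {
	spans := make(map[span]bool, len(words))
	pos := 0
	for _, w := range words {
		n := len([]rune(w))
		if n == 0 {
			continue
		}
		spans[span{pos, pos + n}] = true
		pos += n
	}
	return spans
}

// score returns precision, recall, and F1 (each in [0, 1]) of a predicted
// segmentation against a gold segmentation of the same text.
// Assumption: a predicted word is correct only if both boundaries match.
func score(gold, pred []string) (p, r, f1 float64) {
	goldSpans := toSpans(gold)
	predSpans := toSpans(pred)

	correct := 0
	for s := range predSpans {
		if goldSpans[s] {
			correct++
		}
	}
	if len(predSpans) > 0 {
		p = float64(correct) / float64(len(predSpans))
	}
	if len(goldSpans) > 0 {
		r = float64(correct) / float64(len(goldSpans))
	}
	if p+r > 0 {
		f1 = 2 * p * r / (p + r)
	}
	return p, r, f1
}

func main() {
	gold := []string{"我", "爱", "北京", "天安门"}
	pred := []string{"我", "爱", "北京", "天安", "门"}

	p, r, f1 := score(gold, pred)
	fmt.Printf("P=%.2f%% R=%.2f%% F1=%.2f%%\n", p*100, r*100, f1*100)
	// prints "P=60.00% R=75.00% F1=66.67%"
}
```

In this toy example the prediction splits "天安门" into two words, so three of its five words are correct (P = 3/5) and three of the four gold words are recovered (R = 3/4). A tokenizer that over-segments tends to lose precision, while one that under-segments tends to lose recall, which is why F1 is reported alongside both.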
gojieba
- Best overall trade-off between accuracy and speed: the fastest of the four (1.19s), with an F1 within 0.4 points of the best.
- Implemented in C++, giving higher speed and lower resource usage.
- Does not support cross-platform compilation.
gse
- Best precision
- Implementation draws on sego and jiebago.