Commit 342a252

Tyler Kniess authored and committed
increase pronominal coverage
1 parent b5bb1cd commit 342a252

7 files changed, +258 -40322 lines changed

Makefile

+3 -2
@@ -1,13 +1,14 @@
 all:
-	cat os.lexc > os-copy.lexc
+	cat os.lexc > backups/os.lexc.copy
+	cat os.twol > backups/os.twol.copy
 	hfst-lexc -v os.lexc -o os.lexc.hfst
 	hfst-twolc os.twol -o os.twol.hfst
 	hfst-compose-intersect -v -o os.gen.hfst os.lexc.hfst os.twol.hfst
 	hfst-invert -v os.gen.hfst -o os.mor.hfst
 	hfst-fst2strings os.mor.hfst --xfst obey-flags | sort | gsed s/':'/'\t'/g|gsed s/'<'/' <'/g | gsed s/'>'/'> '/g > forms.txt
 	python3 gi-prefix.py
 	cat os.lexc | sh visualize_transducer.sh
-	cat forms-prefixed.txt>forms.txt
+	cat forms-prefixed.txt > forms.txt
 	python3 coverage.py
 	cat heliand/clean.txt|while read line; do hfst-lookup os.mor.hfst; done > lookup.txt
 	echo 'The top 20 words yet to be included are:'
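
The `hfst-fst2strings ... | sort | gsed ... > forms.txt` step in the Makefile above post-processes each generated string pair with three GNU sed substitutions. The following is a minimal Python sketch of the same text transformation, for illustration only. It is not part of the commit; the function names `format_pair` and `format_forms` and the sample tags are hypothetical, and Python's `sorted` orders by code point, which may differ from the locale-dependent order of `sort`.

```python
def format_pair(line: str) -> str:
    """Apply the three substitutions from the Makefile, in the same order."""
    line = line.replace(":", "\t")  # s/':'/'\t'/g : colon separator becomes a tab
    line = line.replace("<", " <")  # s/'<'/' <'/g : space before each opening bracket
    line = line.replace(">", "> ")  # s/'>'/'> '/g : space after each closing bracket
    return line


def format_forms(lines):
    """Sort the raw lines first (as the pipeline does), then format each one."""
    return [format_pair(line.rstrip("\n")) for line in sorted(lines)]


# Hypothetical input; the real tag inventory is defined in os.lexc.
for row in format_forms(["word<N><Nom>:word"]):
    print(repr(row))
```

Because the second and third substitutions both add a space, adjacent tags end up separated by two spaces, and a tag that directly precedes the tab keeps a trailing space.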

OldSaxonLexc.png

14.4 KB

coverage.py

+1 -1
@@ -8,7 +8,7 @@
 covered_total_words = []
 words_not_covered = []
 
-heliand = 'heliand/hel-unique-tokens-clean.txt'
+heliand = 'heliand/hel-unique-tokens.txt'
 heliand_total = 'heliand/clean.txt'
 muspilli = 'forms-prefixed.txt'

coverage.txt

+4 -4
@@ -1,8 +1,8 @@
-The Hêliand contains a total of 46433 forms, 7965 of which are unique.
+The Hêliand contains a total of 46433 forms, 7506 of which are unique.
 
-Muspilli currently contains 429540 synthetically-generated forms.
+Muspilli currently contains 429545 synthetically-generated forms.
 
-Of the 7965 unique tokens in the Hêliand, Muspilli contains 3182 forms, or 39.95%
+Of the 7506 unique tokens in the Hêliand, Muspilli contains 3324 forms, or 44.285%
 
-Of the 46433 total tokens in the Hêliand, Muspilli contains 37157 forms, or 80.023%
+Of the 46433 total tokens in the Hêliand, Muspilli contains 37359 forms, or 80.458%
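
coverage.txt reports two ratios: coverage over unique tokens and coverage over all running tokens. The sketch below shows one way such figures could be computed, assuming a list of corpus tokens and a set of generated forms. It is not the repository's coverage.py (only three of that file's lines appear in this diff); the names `coverage` and `percent` are hypothetical.

```python
from collections import Counter


def coverage(tokens, generated):
    """Return (type_hits, type_total, token_hits, token_total, missing).

    tokens    : iterable of corpus tokens, repeats included
    generated : set of forms produced by the generator
    missing   : Counter of corpus tokens absent from `generated`
    """
    counts = Counter(tokens)
    type_hits = sum(1 for t in counts if t in generated)
    token_hits = sum(n for t, n in counts.items() if t in generated)
    missing = Counter({t: n for t, n in counts.items() if t not in generated})
    return type_hits, len(counts), token_hits, sum(counts.values()), missing


def percent(part, whole):
    """Percentage rounded to three decimals, the precision used in coverage.txt."""
    return round(100 * part / whole, 3)


# The new figures in coverage.txt agree with this rounding:
print(percent(3324, 7506))    # 44.285
print(percent(37359, 46433))  # 80.458
```

The `missing` counter also supports a ranked list of uncovered words via `missing.most_common(20)`, in the spirit of the Makefile's "top 20 words yet to be included" message.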

forms.txt

+17 -40,086
Large diffs are not rendered by default.
