@@ -2,10 +2,9 @@ Introduction
  -----------------

  XWRT (XML-WRT) is an efficient XML/HTML compressor (actually it works well with all textual files).
- It transforms XML to more
- compressible form and uses zlib (default), LZMA, PPMd, or lpaq6 as
- back-end compressor. This idea is based on well-known XML compressor - XMill.
- Moreover, XML-WRT creates a semi-dynamic dictionary and replaces frequently
+ It transforms XML to more compressible form and uses zlib (default), LZMA, PPMd, or lpaq6 as
+ back-end compressor. This idea is based on well-known XML compressor: XMill.
+ Moreover XML-WRT creates a semi-dynamic dictionary and replaces frequently
  used words with shorter codes. There are additional techniques to improve
  compression ratio:
  - word alphabet can consist of start tags (like '<tag >'), urls, e-mails
@@ -19,6 +18,25 @@ compression ratio:
  - quotes modeling ('="' and '">' replaced with a single char)


+ Comparison to other XML compressors
+ -------------------------
+
+ All files used for comparison can be downloaded from [Wratislavia XML Corpus]. Results are given in bpc (bits per character). Tested with XWRT 3.1:
+
+ file       |gzip |XMill 0.9 zip|XWRT -l2 (gzip)|LZMA -a1|XWRT -l6 (LZMA)|PPMdJ -o8 -m64|XMill 0.9 PPMd|XMLPPM -l 9|SCMPPM -l 9|XWRT -l9 (PPM)|FastPAQ8 74 MB|XWRT -l11 (FastPAQ8)
+ ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ dblp       |1.463|1.250|0.865|0.943|0.747|0.724|0.940|0.802|0.693|0.690|0.659|0.597
+ enwikibooks|2.339|2.295|1.742|1.686|1.504|1.565|1.838|1.621|1.621|1.481|1.357|1.269
+ enwikinews |2.248|2.198|1.597|1.462|1.301|1.291|1.746|1.379|1.398|1.202|1.172|1.090
+ lineitem   |0.721|0.380|0.276|0.421|0.243|0.359|0.270|0.261|0.242|0.243|0.236|0.226
+ Shakespeare|2.182|2.044|1.481|1.646|1.349|1.245|1.584|1.295|1.293|1.204|1.220|1.185
+ SwissProt  |0.985|0.619|0.475|0.478|0.388|0.490|0.477|0.416|0.417|0.363|0.395|0.313
+ uwm        |0.553|0.382|0.315|0.368|0.278|0.426|0.310|0.259|0.274|0.240|0.254|0.228
+ average    |1.499|1.310|0.964|1.001|0.830|0.871|1.024|0.862|0.848|0.775|0.756|0.701
+
+ [Wratislavia XML Corpus]: http://pskibinski.pl/research/Wratislavia/
+
+

  Usage
  -----------------
@@ -67,6 +85,18 @@ ADDITIONAL OPTIONS:
  ```


+ Compilation
+ -------------------------
+ For Linux/Unix:
+ ```
+ make BUILD_SYSTEM=linux
+ ```
+
+ For Windows (MinGW)
+ ```
+ make
+ ```
+

  Used libraries
  ---------------------
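
A note on the comparison table added in this change: bpc (bits per character) is the compressed size in bits divided by the number of characters in the input, so lower is better. The following is a minimal Python sketch of that calculation, assuming one byte per character and using `zlib` from the Python standard library as a stand-in back end. The helper name `bits_per_character` and the sample data are hypothetical and are not part of XWRT; this does not reproduce how the figures in the table were measured.

```python
import zlib


def bits_per_character(original: bytes, compressed: bytes) -> float:
    """Compressed size in bits divided by the input length in characters.

    Assumption of this sketch: every character occupies exactly one byte.
    """
    if not original:
        raise ValueError("original input must not be empty")
    return 8 * len(compressed) / len(original)


# Hypothetical sample: a small, highly repetitive XML-like input.
sample = b"<item><name>example</name></item>\n" * 1000

# zlib at its highest compression level, used here only for illustration.
packed = zlib.compress(sample, 9)

bpc = bits_per_character(sample, packed)
print(f"{bpc:.3f} bpc")
```

An uncompressed single-byte text sits at 8 bpc by this definition, which gives a baseline for reading the table.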