Commit dbcf070

added HISTORY.md
1 parent 127f44c commit dbcf070

2 files changed

+65
-4
lines changed

HISTORY.md

+31
@@ -0,0 +1,31 @@
1+
XWRT 3.4 (5.11.2007)
2+
-PPMVC replaced with PPMd var J1
3+
-added support for LZMA and PPMd back-end compression on Linux
4+
5+
XWRT 3.2 (25.10.2007)
6+
-FastPAQ8 replaced with lpaq6 (compression level 10-14)
7+
8+
XWRT 3.1 (05.06.2007)
9+
-improved support for XML files encoded in UTF-8
10+
-dictionary is compressed using front compression
11+
-added little-endian/big-endian Unicode (UCS-2) support
12+
-non-textual files are compressed/stored without using a filter
13+
-64-bit compiler support
14+
15+
XML-WRT 3.0 (14.09.2006)
16+
-internal PPMVC and FastPAQ8 compression
17+
18+
XML-WRT 2.0 (14.06.2006)
19+
-internal zlib and LZMA compression
20+
-input XML file is split into containers depending on start-tags and end-tags; content under the same tag is sent to the same container
21+
-container for dates in format 1980-02-31 and 01-MAR-1920
22+
-container for times in format 11:30pm
23+
-container for numbers from 1900 to 2155 (years)
24+
-container for pages in format "x-y", where y-x<256, eg. "120-148", "1480-1600"
25+
-container for numbers in format "x-y", eg. "1234-0", "87-623"
26+
-container for two digits after period, eg. "102.00", "12.01"
27+
-container for numbers from 0.0 to 24.9 (one digit after period), eg. "12.0", "9.9"
28+
-urls (starting with "http:"), e-mails ([email protected]), "&uuml;" added to dynamic dictionary
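
The page-range container listed above ("x-y", where y-x<256) can be illustrated with a small sketch. This is only one plausible reading of the changelog entry, not XWRT's actual code: the function names `split_page_range` and `join_page_range` are hypothetical, and the lower bound `0 <= y-x` and the rejection of tokens that do not round-trip exactly (for example, ones with leading zeros) are assumptions. The idea is that the second number fits in a single byte as a difference from the first.

```python
import re

# Matches tokens made of two runs of digits joined by a hyphen, e.g. "120-148".
_RANGE = re.compile(r"(\d+)-(\d+)")


def join_page_range(start, delta):
    """Rebuild the textual range from its (start, delta) form."""
    return f"{start}-{start + delta}"


def split_page_range(token):
    """Return (start, delta) if token is "x-y" with 0 <= y - x < 256, else None.

    Assumption: tokens that would not be reproduced byte-for-byte
    (such as "007-010") are rejected so that decoding stays lossless.
    """
    match = _RANGE.fullmatch(token)
    if match is None:
        return None
    x, y = int(match.group(1)), int(match.group(2))
    delta = y - x
    if not 0 <= delta < 256:
        return None
    if join_page_range(x, delta) != token:
        return None
    return x, delta


for token in ("120-148", "1480-1600", "1234-0", "87-623"):
    print(token, split_page_range(token))
```

Under this reading, tokens such as "1234-0" and "87-623" are not page ranges, which matches the separate, more general "x-y" number container in the list.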
29+
30+
XML-WRT 1.0 (27.03.2006)
31+
-first public release

README.md

+34-4
@@ -2,10 +2,9 @@ Introduction
22
-----------------
33

44
XWRT (XML-WRT) is an efficient XML/HTML compressor (actually it works well with all textual files).
5-
It transforms XML to more
6-
compressible form and uses zlib (default), LZMA, PPMd, or lpaq6 as
7-
back-end compressor. This idea is based on well-known XML compressor - XMill.
8-
Moreover, XML-WRT creates a semi-dynamic dictionary and replaces frequently
5+
It transforms XML into a more compressible form and uses zlib (default), LZMA, PPMd, or lpaq6 as the
6+
back-end compressor. This idea is based on the well-known XML compressor XMill.
7+
Moreover, XML-WRT creates a semi-dynamic dictionary and replaces frequently
98
used words with shorter codes. There are additional techniques to improve
109
compression ratio:
1110
- word alphabet can consist of start tags (like '<tag>'), urls, e-mails
@@ -19,6 +18,25 @@ compression ratio:
1918
- quotes modeling ('="' and '">' replaced with a single char)
2019

2120

21+
Comparison with other XML compressors
22+
-------------------------
23+
24+
All files used for the comparison can be downloaded from [Wratislavia XML Corpus]. Results are given in bpc (bits per character). Tested with XWRT 3.1:
25+
26+
file |gzip |XMill 0.9 zip|XWRT -l2 (gzip)|LZMA -a1|XWRT -l6 (LZMA)|PPMdJ -o8 -m64|XMill 0.9 PPMd|XMLPPM -l 9|SCMPPM -l 9|XWRT -l9 (PPM)|FastPAQ8 74 MB|XWRT -l11 (FastPAQ8)
27+
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
28+
dblp |1.463|1.250|0.865|0.943|0.747|0.724|0.940|0.802|0.693|0.690|0.659|0.597
29+
enwikibooks|2.339|2.295|1.742|1.686|1.504|1.565|1.838|1.621|1.621|1.481|1.357|1.269
30+
enwikinews |2.248|2.198|1.597|1.462|1.301|1.291|1.746|1.379|1.398|1.202|1.172|1.090
31+
lineitem |0.721|0.380|0.276|0.421|0.243|0.359|0.270|0.261|0.242|0.243|0.236|0.226
32+
Shakespeare|2.182|2.044|1.481|1.646|1.349|1.245|1.584|1.295|1.293|1.204|1.220|1.185
33+
SwissProt |0.985|0.619|0.475|0.478|0.388|0.490|0.477|0.416|0.417|0.363|0.395|0.313
34+
uwm |0.553|0.382|0.315|0.368|0.278|0.426|0.310|0.259|0.274|0.240|0.254|0.228
35+
average |1.499|1.310|0.964|1.001|0.830|0.871|1.024|0.862|0.848|0.775|0.756|0.701
36+
37+
[Wratislavia XML Corpus]: http://pskibinski.pl/research/Wratislavia/
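
For readers unfamiliar with the unit used in the table above, the following sketch shows how a bpc figure is usually derived: eight times the compressed size divided by the original size. It assumes one character equals one byte, which is the common convention in compression benchmarks but is not stated in this README. The helper name `bits_per_character` and the sample input are hypothetical, and Python's `zlib` module stands in for a real compressor run.

```python
import zlib


def bits_per_character(original: bytes, compressed: bytes) -> float:
    """Compressed bits spent per input character (assumed: 1 character = 1 byte)."""
    if not original:
        raise ValueError("original input is empty")
    return 8 * len(compressed) / len(original)


# Hypothetical, highly repetitive XML-like sample; real corpus files will score differently.
sample = b"<item><name>example</name></item>" * 1000
packed = zlib.compress(sample, 9)
print(f"{bits_per_character(sample, packed):.3f} bpc")
```

Uncompressed 8-bit text scores 8.000 bpc on this scale, so lower values in the table mean better compression.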
38+
39+
2240

2341
Usage
2442
-----------------
@@ -67,6 +85,18 @@ ADDITIONAL OPTIONS:
6785
```
6886

6987

88+
Compilation
89+
-------------------------
90+
For Linux/Unix:
91+
```
92+
make BUILD_SYSTEM=linux
93+
```
94+
95+
For Windows (MinGW):
96+
```
97+
make
98+
```
99+
70100

71101
Used libraries
72102
---------------------
