Commit beea149

deploy: e1c68cc
1 parent 52f1766 commit beea149

33 files changed

+644 -66 lines changed

docs/main/_sources/quantiles/index.rst.txt

Lines changed: 9 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -10,17 +10,21 @@ in the stream.
1010
These sketches may be used to compute approximate histograms, Probability Mass Functions (PMFs), or
1111
Cumulative Distribution Functions (CDFs).
1212

13-
The library provides three types of quantiles sketches, each of which has generic items as well as versions
14-
specific to a given numeric type (e.g. integer or floating point values). All three types provide error
15-
bounds on rank estimation with proven probabilistic error distributions.
13+
The library provides four types of quantiles sketches, three of which have generic items as well as versions
14+
specific to a given numeric type (e.g. integer or floating point values). Those three types provide error
15+
bounds on rank estimation with proven probabilistic error distributions. t-digest is a heuristic-based sketch
16+
that works only on numeric data, and while the error properties are not guaranteed, the sketch typically
17+
achieves good accuracy with a small storage footprint.
1618

17-
* KLL: Provides uniform rank estimation error over the entire range
19+
* KLL: Provides uniform rank estimation error over the entire range.
1820
* REQ: Provides relative rank error, which decreases when approaching either the high or low end of the value range.
21+
* t-digest: Provides relative rank error estimates; heuristic-based and without guarantees, but quite compact with generally very good error properties.
1922
* Classic quantiles: Largely deprecated in favor of KLL, also provides uniform rank estimation error. Included largely for backwards compatibility with historic data.
2023

2124
.. toctree::
2225
:maxdepth: 1
23-
26+
2427
kll
2528
req
29+
tdigest
2630
quantiles_depr

docs/main/_sources/quantiles/kll.rst.txt

Lines changed: 0 additions & 5 deletions
@@ -14,10 +14,6 @@ The analysis is obtained using `get_quantile()` function or the
1414
inverse functions `get_rank()`, `get_pmf()` (Probability Mass Function), and `get_cdf()`
1515
(Cumulative Distribution Function).
1616

17-
As of May 2020, this implementation produces serialized sketches which are binary-compatible
18-
with the equivalent Java implementation only when template parameter `T = float`
19-
(32-bit single precision values).
20-
2117
Given an input stream of `N` items, the `natural rank` of any specific
2218
item is defined as its index `(1 to N)` in inclusive mode
2319
or `(0 to N-1)` in exclusive mode
@@ -168,4 +164,3 @@ Additionally, the interval may be quite large for certain distributions.
168164
.. rubric:: Non-static Methods:
169165

170166
.. automethod:: __init__
171-
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
1+
t-digest
2+
--------
3+
4+
.. currentmodule:: datasketches
5+
6+
The implementation in this library is based on the MergingDigest described in
7+
`Computing Extremely Accurate Quantiles Using t-Digests <https://arxiv.org/abs/1902.04023>`_ by Ted Dunning and Otmar Ertl.
8+
9+
The implementation in this library has a few differences from the reference implementation associated with that paper:
10+
11+
* Merge does not modify the input
12+
* Serialization is similar to other sketches in this library, although reading the reference implementation format is supported
13+
14+
Unlike all other algorithms in the library, t-digest is empirical and has no mathematical basis for estimating its error,
15+
and its results depend on the input data. However, for many common data distributions, it can produce excellent results.
16+
t-digest also operates only on numeric data. Unlike the quantiles family algorithms in the library, which return quantile
17+
approximations from the input domain, t-digest interpolates values and may hold and return data points not seen in the input.
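
The difference between returning a retained input item and interpolating can be illustrated with a small, self-contained sketch. This is a toy model only, not the library's algorithm or API: the helper names ``nearest_rank_quantile``, ``to_centroids`` and ``interpolated_quantile`` are hypothetical, and the interpolation rule (linear between the centers of neighboring centroids) is an assumed simplification of what a t-digest does.

```python
def nearest_rank_quantile(sorted_items, rank):
    """Pick a retained item, so the result is always a value from the input."""
    n = len(sorted_items)
    idx = min(n - 1, int(rank * n))
    return sorted_items[idx]


def to_centroids(sorted_items, group_size):
    """Toy clustering (assumption): average consecutive groups of equal size."""
    centroids = []
    for start in range(0, len(sorted_items), group_size):
        chunk = sorted_items[start:start + group_size]
        centroids.append((sum(chunk) / len(chunk), len(chunk)))
    return centroids


def interpolated_quantile(centroids, rank):
    """Interpolate linearly between the centers of neighboring centroids.

    centroids is a list of (mean, weight) pairs sorted by mean.
    """
    total = sum(weight for _, weight in centroids)
    target = rank * total
    cumulative = 0.0
    prev_center = prev_mean = None
    for mean, weight in centroids:
        # Place each centroid at the midpoint of the weight it covers.
        center = cumulative + weight / 2.0
        if target <= center:
            if prev_center is None:
                return mean  # below the first center: no left neighbor
            frac = (target - prev_center) / (center - prev_center)
            return prev_mean + frac * (mean - prev_mean)
        prev_center, prev_mean = center, mean
        cumulative += weight
    return prev_mean  # above the last center


data = [0.0, 2.0, 9.0, 11.0]
centroids = to_centroids(data, 2)  # [(1.0, 2), (10.0, 2)]

print(nearest_rank_quantile(data, 0.5))       # 9.0, a value present in the input
print(interpolated_quantile(centroids, 0.5))  # 5.5, a value never seen in the input
```

In this toy model the centroid means (1.0 and 10.0) are themselves values that never appeared in the input, which mirrors the remark that a t-digest may both hold and return such points.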
18+
19+
The closest alternative to t-digest in this library is the REQ sketch. It prioritizes one chosen side of the rank domain:
20+
either low rank accuracy or high rank accuracy. t-digest (in this implementation) prioritizes both ends of the rank domain
21+
and has lower accuracy towards the middle of the rank domain (median).
22+
23+
Measurements show that t-digest is slightly biased (tends to underestimate low ranks and overestimate high ranks), while still
24+
doing very well close to the extremes. The effect seems to be more pronounced with more input values.
25+
26+
For more information on the performance characteristics, see `the DataSketches page on t-digest <https://datasketches.apache.org/docs/tdigest/tdigest.html>`_.
27+
28+
.. autoclass:: tdigest_float
29+
:members:
30+
:undoc-members:
31+
:exclude-members: deserialize
32+
33+
.. rubric:: Static Methods:
34+
35+
.. automethod:: deserialize
36+
37+
.. rubric:: Non-static Methods:
38+
39+
.. automethod:: __init__
40+
41+
.. autoclass:: tdigest_double
42+
:members:
43+
:undoc-members:
44+
:exclude-members: deserialize
45+
46+
.. rubric:: Static Methods:
47+
48+
.. automethod:: deserialize
49+
50+
.. rubric:: Non-static Methods:
51+
52+
.. automethod:: __init__

docs/main/_static/pygments.css

Lines changed: 18 additions & 18 deletions
@@ -6,9 +6,9 @@ span.linenos.special { color: #000000; background-color: #ffffc0; padding-left:
66
.highlight .hll { background-color: #ffffcc }
77
.highlight { background: #f8f8f8; }
88
.highlight .c { color: #3D7B7B; font-style: italic } /* Comment */
9-
.highlight .err { border: 1px solid #FF0000 } /* Error */
9+
.highlight .err { border: 1px solid #F00 } /* Error */
1010
.highlight .k { color: #008000; font-weight: bold } /* Keyword */
11-
.highlight .o { color: #666666 } /* Operator */
11+
.highlight .o { color: #666 } /* Operator */
1212
.highlight .ch { color: #3D7B7B; font-style: italic } /* Comment.Hashbang */
1313
.highlight .cm { color: #3D7B7B; font-style: italic } /* Comment.Multiline */
1414
.highlight .cp { color: #9C6500 } /* Comment.Preproc */
@@ -25,34 +25,34 @@ span.linenos.special { color: #000000; background-color: #ffffc0; padding-left:
2525
.highlight .gp { color: #000080; font-weight: bold } /* Generic.Prompt */
2626
.highlight .gs { font-weight: bold } /* Generic.Strong */
2727
.highlight .gu { color: #800080; font-weight: bold } /* Generic.Subheading */
28-
.highlight .gt { color: #0044DD } /* Generic.Traceback */
28+
.highlight .gt { color: #04D } /* Generic.Traceback */
2929
.highlight .kc { color: #008000; font-weight: bold } /* Keyword.Constant */
3030
.highlight .kd { color: #008000; font-weight: bold } /* Keyword.Declaration */
3131
.highlight .kn { color: #008000; font-weight: bold } /* Keyword.Namespace */
3232
.highlight .kp { color: #008000 } /* Keyword.Pseudo */
3333
.highlight .kr { color: #008000; font-weight: bold } /* Keyword.Reserved */
3434
.highlight .kt { color: #B00040 } /* Keyword.Type */
35-
.highlight .m { color: #666666 } /* Literal.Number */
35+
.highlight .m { color: #666 } /* Literal.Number */
3636
.highlight .s { color: #BA2121 } /* Literal.String */
3737
.highlight .na { color: #687822 } /* Name.Attribute */
3838
.highlight .nb { color: #008000 } /* Name.Builtin */
39-
.highlight .nc { color: #0000FF; font-weight: bold } /* Name.Class */
40-
.highlight .no { color: #880000 } /* Name.Constant */
41-
.highlight .nd { color: #AA22FF } /* Name.Decorator */
39+
.highlight .nc { color: #00F; font-weight: bold } /* Name.Class */
40+
.highlight .no { color: #800 } /* Name.Constant */
41+
.highlight .nd { color: #A2F } /* Name.Decorator */
4242
.highlight .ni { color: #717171; font-weight: bold } /* Name.Entity */
4343
.highlight .ne { color: #CB3F38; font-weight: bold } /* Name.Exception */
44-
.highlight .nf { color: #0000FF } /* Name.Function */
44+
.highlight .nf { color: #00F } /* Name.Function */
4545
.highlight .nl { color: #767600 } /* Name.Label */
46-
.highlight .nn { color: #0000FF; font-weight: bold } /* Name.Namespace */
46+
.highlight .nn { color: #00F; font-weight: bold } /* Name.Namespace */
4747
.highlight .nt { color: #008000; font-weight: bold } /* Name.Tag */
4848
.highlight .nv { color: #19177C } /* Name.Variable */
49-
.highlight .ow { color: #AA22FF; font-weight: bold } /* Operator.Word */
50-
.highlight .w { color: #bbbbbb } /* Text.Whitespace */
51-
.highlight .mb { color: #666666 } /* Literal.Number.Bin */
52-
.highlight .mf { color: #666666 } /* Literal.Number.Float */
53-
.highlight .mh { color: #666666 } /* Literal.Number.Hex */
54-
.highlight .mi { color: #666666 } /* Literal.Number.Integer */
55-
.highlight .mo { color: #666666 } /* Literal.Number.Oct */
49+
.highlight .ow { color: #A2F; font-weight: bold } /* Operator.Word */
50+
.highlight .w { color: #BBB } /* Text.Whitespace */
51+
.highlight .mb { color: #666 } /* Literal.Number.Bin */
52+
.highlight .mf { color: #666 } /* Literal.Number.Float */
53+
.highlight .mh { color: #666 } /* Literal.Number.Hex */
54+
.highlight .mi { color: #666 } /* Literal.Number.Integer */
55+
.highlight .mo { color: #666 } /* Literal.Number.Oct */
5656
.highlight .sa { color: #BA2121 } /* Literal.String.Affix */
5757
.highlight .sb { color: #BA2121 } /* Literal.String.Backtick */
5858
.highlight .sc { color: #BA2121 } /* Literal.String.Char */
@@ -67,9 +67,9 @@ span.linenos.special { color: #000000; background-color: #ffffc0; padding-left:
6767
.highlight .s1 { color: #BA2121 } /* Literal.String.Single */
6868
.highlight .ss { color: #19177C } /* Literal.String.Symbol */
6969
.highlight .bp { color: #008000 } /* Name.Builtin.Pseudo */
70-
.highlight .fm { color: #0000FF } /* Name.Function.Magic */
70+
.highlight .fm { color: #00F } /* Name.Function.Magic */
7171
.highlight .vc { color: #19177C } /* Name.Variable.Class */
7272
.highlight .vg { color: #19177C } /* Name.Variable.Global */
7373
.highlight .vi { color: #19177C } /* Name.Variable.Instance */
7474
.highlight .vm { color: #19177C } /* Name.Variable.Magic */
75-
.highlight .il { color: #666666 } /* Literal.Number.Integer.Long */
75+
.highlight .il { color: #666 } /* Literal.Number.Integer.Long */

docs/main/distinct_counting/cpc.html

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
77

88
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
99
<title>Compressed Probabilistic Counting (CPC) &mdash; datasketches 0.1 documentation</title>
10-
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=80d5e7a1" />
10+
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=b86133f3" />
1111
<link rel="stylesheet" type="text/css" href="../_static/css/theme.css?v=e59714d7" />
1212

1313

docs/main/distinct_counting/hyper_log_log.html

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
77

88
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
99
<title>HyperLogLog (HLL) &mdash; datasketches 0.1 documentation</title>
10-
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=80d5e7a1" />
10+
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=b86133f3" />
1111
<link rel="stylesheet" type="text/css" href="../_static/css/theme.css?v=e59714d7" />
1212

1313

docs/main/distinct_counting/index.html

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
77

88
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
99
<title>Distinct Counting &mdash; datasketches 0.1 documentation</title>
10-
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=80d5e7a1" />
10+
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=b86133f3" />
1111
<link rel="stylesheet" type="text/css" href="../_static/css/theme.css?v=e59714d7" />
1212

1313

docs/main/distinct_counting/theta.html

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
77

88
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
99
<title>Theta Sketch &mdash; datasketches 0.1 documentation</title>
10-
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=80d5e7a1" />
10+
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=b86133f3" />
1111
<link rel="stylesheet" type="text/css" href="../_static/css/theme.css?v=e59714d7" />
1212

1313

docs/main/distinct_counting/tuple.html

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
77

88
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
99
<title>Tuple Sketch &mdash; datasketches 0.1 documentation</title>
10-
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=80d5e7a1" />
10+
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=b86133f3" />
1111
<link rel="stylesheet" type="text/css" href="../_static/css/theme.css?v=e59714d7" />
1212

1313

docs/main/frequency/count_min_sketch.html

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
77

88
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
99
<title>CountMin Sketch &mdash; datasketches 0.1 documentation</title>
10-
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=80d5e7a1" />
10+
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=b86133f3" />
1111
<link rel="stylesheet" type="text/css" href="../_static/css/theme.css?v=e59714d7" />
1212

1313
