Releases: openzipkin/zipkin

Zipkin 1.10

12 Sep 12:24

Zipkin 1.10 addresses a couple of long-standing problems related to span timestamp and duration.

First, we no longer attempt to support duration queries on the "cassandra" storage type. Cassandra 2.2 doesn't support SASI indexing, and trying to work around that resulted in a feature most couldn't use. @michaelsembwever from The Last Pickle has a more sustainable solution in mind that uses Cassandra 3.8+. Please look for announcements on the experimental cassandra3 storage type.

Next is something that applies to all storage types. When trace instrumentation doesn't record Span.timestamp and duration, the Zipkin server tries to guess them by looking at annotations. Previously, when we guessed wrong, the trace would render strangely. We now guess much more conservatively to avoid this.

Here's the impact:

  • Span duration is no longer derived by collectors, as it is often wrong. Duration queries won't work unless traces reported to Zipkin include duration.
  • Span timestamp is derived only when needed, usually to support indexing.
  • Span timestamp and duration are still backfilled at query time, as otherwise the UI wouldn't work.

Note: The Span.timestamp and duration fields were added a year ago, but many tracers still don't record them. We hope our documentation on how to record timestamp and duration will ease the task of updating them. If you use a tracer that doesn't yet record Span.timestamp and duration, please raise an issue or PR against the corresponding repository so that it is eventually fixed.
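
To make the expectation concrete, here's a minimal sketch of recording both fields with the zipkin Java library's Span builder (the IDs, service name, and timings are made up for the example):

import zipkin.Annotation;
import zipkin.Constants;
import zipkin.Endpoint;
import zipkin.Span;

public class RecordedSpan {
  public static void main(String[] args) {
    Endpoint frontend = Endpoint.create("frontend", 127 << 24 | 1); // 127.0.0.1

    long beginMicros = System.currentTimeMillis() * 1000L; // epoch microseconds
    long endMicros = beginMicros + 207000L;                // say the call took 207ms

    Span span = Span.builder()
        .traceId(1L).id(2L).name("get")
        .timestamp(beginMicros)            // Span.timestamp: when the operation began
        .duration(endMicros - beginMicros) // Span.duration: elapsed microseconds
        .addAnnotation(Annotation.create(beginMicros, Constants.CLIENT_SEND, frontend))
        .addAnnotation(Annotation.create(endMicros, Constants.CLIENT_RECV, frontend))
        .build();

    System.out.println(span);
  }
}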

Zipkin 1.8

31 Aug 02:06

Zipkin 1.8 is a library change focused on encoding performance. If you are instrumenting apps and use Zipkin's Codec, you'll want to upgrade.

Span encoding has been completely rewritten to bring common-case overhead down to a microsecond or less.

Zipkin 1.7 Codec.writeSpan() vs libthrift (pace car)

CodecBenchmarks.writeClientSpan_json_zipkin       avgt   15  17.131 ± 0.446  us/op
CodecBenchmarks.writeClientSpan_thrift_libthrift  avgt   15   1.952 ± 0.043  us/op
CodecBenchmarks.writeClientSpan_thrift_zipkin     avgt   15   0.996 ± 0.021  us/op
CodecBenchmarks.writeLocalSpan_json_zipkin        avgt   15  10.124 ± 0.177  us/op
CodecBenchmarks.writeLocalSpan_thrift_libthrift   avgt   15   1.168 ± 0.016  us/op
CodecBenchmarks.writeLocalSpan_thrift_zipkin      avgt   15   0.593 ± 0.010  us/op
CodecBenchmarks.writeRpcSpan_json_zipkin          avgt   15  43.495 ± 1.086  us/op
CodecBenchmarks.writeRpcSpan_thrift_libthrift     avgt   15   4.878 ± 0.046  us/op
CodecBenchmarks.writeRpcSpan_thrift_zipkin        avgt   15   2.666 ± 0.018  us/op
CodecBenchmarks.writeRpcV6Span_json_zipkin        avgt   15  49.759 ± 0.867  us/op
CodecBenchmarks.writeRpcV6Span_thrift_libthrift   avgt   15   5.390 ± 0.073  us/op
CodecBenchmarks.writeRpcV6Span_thrift_zipkin      avgt   15   3.147 ± 0.026  us/op

Zipkin 1.8 Codec.writeSpan() vs libthrift (pace car)

CodecBenchmarks.writeClientSpan_json_zipkin       avgt   15   1.445 ± 0.036  us/op
CodecBenchmarks.writeClientSpan_thrift_libthrift  avgt   15   1.951 ± 0.014  us/op
CodecBenchmarks.writeClientSpan_thrift_zipkin     avgt   15   0.433 ± 0.011  us/op
CodecBenchmarks.writeLocalSpan_json_zipkin        avgt   15   0.813 ± 0.010  us/op
CodecBenchmarks.writeLocalSpan_thrift_libthrift   avgt   15   1.191 ± 0.016  us/op
CodecBenchmarks.writeLocalSpan_thrift_zipkin      avgt   15   0.268 ± 0.004  us/op
CodecBenchmarks.writeRpcSpan_json_zipkin          avgt   15   3.606 ± 0.068  us/op
CodecBenchmarks.writeRpcSpan_thrift_libthrift     avgt   15   5.134 ± 0.081  us/op
CodecBenchmarks.writeRpcSpan_thrift_zipkin        avgt   15   1.384 ± 0.078  us/op
CodecBenchmarks.writeRpcV6Span_json_zipkin        avgt   15   3.912 ± 0.115  us/op
CodecBenchmarks.writeRpcV6Span_thrift_libthrift   avgt   15   5.488 ± 0.098  us/op
CodecBenchmarks.writeRpcV6Span_thrift_zipkin      avgt   15   1.323 ± 0.014  us/op
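
If you call the codec directly, the entry points benchmarked above are the Codec constants. Here's a minimal sketch against the zipkin Java library (the span contents are illustrative):

import zipkin.Codec;
import zipkin.Span;

public class EncodeExample {
  public static void main(String[] args) {
    Span span = Span.builder()
        .traceId(1L).id(2L).name("get")
        .timestamp(System.currentTimeMillis() * 1000L)
        .duration(207000L)
        .build();

    byte[] json = Codec.JSON.writeSpan(span);     // json encoding, rewritten in 1.8
    byte[] thrift = Codec.THRIFT.writeSpan(span); // thrift encoding, rewritten in 1.8
    System.out.println(json.length + " JSON bytes, " + thrift.length + " thrift bytes");
  }
}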

Why encoding speed matters

Applications that report to Zipkin typically record timing information and metadata on the calling thread. After the operation completes, this data is encoded into a Span and scheduled to go out of process, usually via HTTP or Kafka. When the encoding overhead is measurable, it can confuse timing information, particularly when operations take single-digit milliseconds or less.

For example, if a local operation takes 400us and your encoding overhead is 40us, there will be a 10% gap between the end of one span and the start of the next. This notably skews the duration of the parent, particularly if there are a lot of spans like this. When encoding overhead is in the single-digit microseconds or less, the problem is far less noticeable.

Zipkin 1.7

18 Aug 13:32

Zipkin 1.7 has a lot to offer, thanks to users telling us what they'd like.

@dragontree101 wanted a way to know which version of Zipkin his server was running. @shakuzen landed the /info endpoint, which prints out something like this:

{
  "zipkin": {
    "version": "1.7.0"
  }
}

@mikewrighton wanted to run zipkin-ui from a different host than zipkin-server. @hyleung spiked a new variable you can use to control the cross-origin policy. For example, you can export ZIPKIN_QUERY_ALLOWED_ORIGINS=http://foo.bar.com, if you are the lucky owner of foo.bar.com!

@dan-tr uses Zipkin with Elasticsearch, but found our microsecond timestamps didn't work out of the box with Kibana. He suggested we add a timestamp_millis field, and we did, because it was a smart idea!

@ivansenic works on an APM called inspectIT. He rightly noted there are still a ton of Java 6 VMs out there that need to be traceable by Java agents. Now, zipkin.jar is an agent-friendly, 152k jar full of Java 6 bytecode (still with no dependencies!).

We're occasionally asked where javadocs are published. Thanks to @abesto's automation expertise, historical javadocs can now be found at http://zipkin.io/zipkin/.

Finally, we're looking for incremental and compatible ways to improve Zipkin's model, particularly for asynchronous activity (like tracing Kafka). If you are interested in steering us, please comment on..

Thanks for sticking with us,
OpenZipkin

Zipkin 1.6

31 Jul 21:07

Zipkin 1.6 server has been updated to use Spring Boot 1.4.

We've also corrected default values in the UI, which should lead to better search performance. Most notably, startTs now defaults to 1 hour back instead of 7 days. See #1212.

  • Note: You can reset the lookback value to whatever you like. For example, you might set JAVA_OPTS="-Dzipkin.ui.default-lookback=86400000" for 1 day. Settings like this are documented in the README.

Zipkin 1.5

23 Jul 03:21

Zipkin 1.5 is all about the dependency view in the UI.

Many of you may have seen the dependency tab, and never any data in it. This would be the case if you were running Cassandra or Elasticsearch.

[screenshot: the dependencies tab with no data]

What you should have seen is a diagram showing the relative volume of calls between services, something like this (except with your services present!):

[screenshot: a service dependency diagram]

Zipkin 1.5 includes support for populating the data behind this screen for all storage options (MySQL, Cassandra, and Elasticsearch).

The job that produces this data is called zipkin-dependencies. It aggregates links between services into daily buckets. This means you should run it daily, like a batch job (even though it's Spark underneath). In fact, our Docker image includes a cron setup that does this for you!

For example, here's a run against a small cassandra DB using spark standalone (default):

$ STORAGE_TYPE=cassandra CASSANDRA_CONTACT_POINTS=192.168.99.100 java -jar zipkin-dependencies.jar
Running Dependencies job for 2016-07-23: 1469232000000000 ≤ Span.timestamp ≤ 1469318399999999
11:05:09.653 [main] WARN  o.a.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
11:05:09.706 [main] WARN  org.apache.spark.util.Utils - Your hostname, acole resolves to a loopback address: 127.0.0.1; using 192.168.1.10 instead (on interface en0)
11:05:09.706 [main] WARN  org.apache.spark.util.Utils - Set SPARK_LOCAL_IP if you need to bind to another address
11:05:11.078 [main] WARN  com.datastax.driver.core.NettyUtil - Found Netty's native epoll transport, but not running on linux-based operating system. Using NIO instead.
Saved with day=2016-07-23
Dependencies: [{"parent":"brave-resteasy-example","child":"brave-resteasy-example","callCount":1}, {"parent":"zipkin-server","child":"cassandra","callCount":14}]

Upgrading

If you are using Cassandra or Elasticsearch, you should upgrade to Zipkin 1.5; no schema change is required.

If you are using MySQL, you'll need to add a new table for this to work. Here's a copy/paste of the DDL for your convenience.

CREATE TABLE IF NOT EXISTS zipkin_dependencies (
  `day` DATE NOT NULL,
  `parent` VARCHAR(255) NOT NULL,
  `child` VARCHAR(255) NOT NULL,
  `call_count` BIGINT
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED;

ALTER TABLE zipkin_dependencies ADD UNIQUE KEY(`day`, `parent`, `child`);

Credits

The Spark job was originally written by @yurishkuro, based on a Hadoop job written by @eirslett years ago. In other words, the job itself isn't new; what's new is its accessibility. Before, it only worked with Cassandra and wasn't published to Maven Central or integrated with Docker. Now, it should be easy for anyone to include this functionality in their deployment.

Zipkin 1.4

13 Jul 07:17

Zipkin 1.4 most notably includes the ability to store and show IPv6 addresses associated with services.

Endpoint.ipv6

Zipkin span data can now include an IPv6 address on an Endpoint, binary-encoded in Thrift or text-encoded in JSON. If you are using MySQL, you need to add a column to store this. No action is needed for Cassandra or Elasticsearch. See #1178.
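
For example, here's a sketch of attaching an IPv6 address, assuming the zipkin Java library's Endpoint builder (the address and service name are illustrative):

import java.net.InetAddress;
import zipkin.Endpoint;

public class Ipv6Endpoint {
  public static void main(String[] args) throws Exception {
    // 2001:db8::c001 is a documentation-range address, used purely for illustration
    byte[] ipv6 = InetAddress.getByName("2001:db8::c001").getAddress(); // 16 bytes

    Endpoint endpoint = Endpoint.builder()
        .serviceName("frontend")
        .ipv4(127 << 24 | 1) // packed IPv4 (127.0.0.1), as before
        .ipv6(ipv6)          // new in Zipkin 1.4
        .build();

    System.out.println(endpoint);
  }
}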

Operational Improvements

  • Adds SCRIBE_ENABLED: set to false to disable scribe
  • Adds SELF_TRACING_SAMPLE_RATE: set to a low value like 0.001 to safely self-trace production

Zipkin 1.3

10 Jul 08:11

Zipkin 1.3 includes highlighting of spans in error state and improvements to the Cassandra storage component.

Error annotations

Inspired by recent work in OpenTracing, we've added a new annotation: "error". As an annotation value, it marks when a potentially transient error occurred. As a binary annotation key, its value is a human-readable message associated with an error that failed the span. See #1140 for details.
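
Here's a sketch of both forms, assuming the zipkin Java library's Constants.ERROR ("error") key (the message and timings are illustrative):

import zipkin.Annotation;
import zipkin.BinaryAnnotation;
import zipkin.Constants;
import zipkin.Endpoint;
import zipkin.Span;

public class ErroredSpan {
  public static void main(String[] args) {
    Endpoint service = Endpoint.create("frontend", 127 << 24 | 1);
    long nowMicros = System.currentTimeMillis() * 1000L;

    Span failed = Span.builder()
        .traceId(1L).id(2L).name("get")
        // annotation value "error": a potentially transient error happened at this instant
        .addAnnotation(Annotation.create(nowMicros, Constants.ERROR, service))
        // binary annotation key "error": a human-readable message for a failed span
        .addBinaryAnnotation(BinaryAnnotation.create(Constants.ERROR, "connection timed out", service))
        .build();

    System.out.println(failed);
  }
}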

Thanks to @virtuald, the UI acts on these rules, highlighting degraded spans in yellow and failed ones in red.

[screenshot: a trace with error highlighting]

Instrumentation (like Brave, zipkin-tracer, etc.) needs to change to support this. Please help if you have time!

Span.timestamp and duration of 0 coerce to null

We've noticed some instrumentation logs a timestamp or duration of 0 when it meant to log null. A timestamp or duration of 0 microseconds is invalid and doesn't explain latency, so we now coerce these zeros to null. For cases where a sub-microsecond span duration occurred, round up to 1. See #1155 and #1176.
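
For tracer authors, here's a minimal sketch of the reporting-side rule (the method name is made up, not part of Zipkin's API):

/** Converts a measured elapsed time in nanoseconds into a duration safe to report. */
static long reportableDurationMicros(long elapsedNanos) {
  long micros = elapsedNanos / 1000;
  // Never report 0: it is invalid, and the server now coerces it to null.
  // A sub-microsecond operation rounds up to 1.
  return micros == 0 ? 1 : micros;
}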

Elasticsearch daily bucket fix

We found and fixed a concurrency bug that could put spans into the wrong daily buckets. See #1175

Cassandra

Schema bug fix

We found a bug where traces against the same service in the same millisecond weren't indexed. This affects indexes only (trace data itself wasn't lost). For example, a trace might exist in Cassandra, yet not be returned by the query API.

Specifically, the following indexes now have trace_id added to their PRIMARY KEY definitions.

  • service_span_name_index
  • service_name_index
  • annotations_index

There's no automatic data migration available. The most straightforward way to address this in an existing cluster is to drop the indexes above and restart a Zipkin server, which will recreate them as long as CASSANDRA_ENSURE_SCHEMA=true. You can also update the indexes manually based on the schema.

Tuning

We've done a lot of work tuning the amount of data written to indexes on a per-span basis. Those using Cassandra should see a significant drop in index size, for reasons documented in the tuning section of the README.

Query logging

Those supporting Zipkin may need to debug query latency. We now use the Datastax driver's QueryLogger, which is enabled when the log category "com.datastax.driver.core.QueryLogger" is set to debug or trace level; trace level includes bound values. See #1156.

Zipkin 1.2

29 Jun 05:52

Zipkin 1.2.1 includes Prometheus metrics and Elasticsearch bug fixes.

Prometheus metrics are enabled by default, under the /prometheus endpoint.

Many thanks to Kristian from Iterate for developing this feature!

Zipkin 1.1.5

21 Jun 14:35

This is a patch release that fixes a bug where JSON received with optional fields set to null failed to parse. You should update to this patch, particularly if your apps use the Zipkin Ruby gem.

See #1136 for details.
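
For illustration, JSON like the following, with optional fields explicitly null, previously failed to parse. Here's a sketch using the Java codec (the field values are made up):

import java.nio.charset.Charset;
import zipkin.Codec;
import zipkin.Span;

public class NullFieldParse {
  public static void main(String[] args) {
    String json = "{\"traceId\":\"0000000000000001\",\"id\":\"0000000000000002\","
        + "\"name\":\"get\",\"parentId\":null,\"timestamp\":null,\"duration\":null}";

    // before this patch, the explicit nulls above threw a parse exception
    Span span = Codec.JSON.readSpan(json.getBytes(Charset.forName("UTF-8")));
    System.out.println(span);
  }
}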

Zipkin 1.1.4

04 Jun 03:21

This is a patch release that fixes a bug where CASSANDRA_ENSURE_SCHEMA didn't work when the keyspace was absent. See #1128 for details.