Releases: openzipkin/zipkin

Zipkin 2.2

11 Oct 08:27

Zipkin 2.2 focuses on operations: it allows proxy-mounting the UI under a custom path and bundles a Prometheus Grafana dashboard

@stepanv modified the zipkin UI so that it can work behind reverse proxies that choose a path prefix other than '/zipkin'. If you'd like to try zipkin under a different path, Stepan wrote docs showing how to set up Apache HTTP Server.

Previously, zipkin had both Spring and Prometheus metrics exporters. Through hard work from @abesto and @kristofa, we now have a comprehensive example setup, including a Zipkin+Prometheus Grafana dashboard. To try it out, use our docker-compose example, which starts everything for you. Once that's done, you can start viewing the health of your tracing system, including how many messages are dropped.

Here's an example, which you'd see at http://192.168.99.100:3000/dashboard/db/zipkin-prometheus?refresh=5s&orgId=1&from=now-5m&to=now if using docker-machine:

(Screenshot: the Zipkin+Prometheus Grafana dashboard)

Other notes

  • our Docker JVM has been upgraded to 1.8.0_144 from 1.8.0_131
  • zipkin-server no longer logs dropped messages at warning level, as doing so can fill up disk. Enable debug logging to see the cause of drops.
  • Elasticsearch storage will now drop on backlog as opposed to backing up, as the latter led to out-of-memory crashes under load surges.

Finally, please join us on gitter if you have any questions or feedback about Zipkin 2.2

Zipkin 2.1

03 Oct 00:48

Thanks to @shakuzen, zipkin 2.1 adds RabbitMQ to the available span transports.

RabbitMQ has been requested many times, though we only started formally tracking it this year. A lot of interest grew from spring-cloud-sleuth, which supported a custom RabbitMQ transport. Starting with Zipkin 2.1, RabbitMQ support is built into zipkin-server (though custom deployments can remove it).

Using this is easy: just set RABBIT_ADDRESSES to a comma-separated list of RabbitMQ hosts. If you're just playing around, you can use localhost:

$ RABBIT_ADDRESSES=localhost java -jar zipkin.jar

More documentation is available here.

Once a server is running, applications send spans to RabbitMQ, specifically to the queue/routing key associated with zipkin (defaults to "zipkin"). You can post a test trace using the normal CLI while you wait for tracers to support the RabbitMQ transport.

$ echo '[{"traceId":"9032b04972e475c5","id":"9032b04972e475c5","kind":"SERVER","name":"get","timestamp":1505990621526000,"duration":612898,"localEndpoint":{"serviceName":"brave-webmvc-example","ipv4":"192.168.1.113"},"remoteEndpoint":{"serviceName":"","ipv4":"127.0.0.1","port":60149},"tags":{"error":"500 Internal Server Error","http.path":"/a"}}]' > sample-spans.json
$ rabbitmqadmin publish exchange=amq.default routing_key=zipkin < sample-spans.json

Many thanks to @shakuzen for driving this feature. There's a lot more work than just coding when we add a new default feature. Evenings and weekend time from Tommy are gratefully received.

Zipkin 2

12 Sep 16:58

In version 1.31, we introduced our v2 http api, offering dramatically simplified data types. Zipkin 2 is the effort to move all infrastructure towards that model while remaining backwards compatible.

What's new?

The core java library (under the package zipkin2) has model, codec and storage types. This includes a bounded in-memory storage component used in test environments.

The following artifacts are new and can coexist with previous ones.

  • io.zipkin.zipkin2:zipkin:2.0.0 (the core library)
  • io.zipkin.zipkin2:zipkin-storage-elasticsearch:2.0.0 (the first v2 native storage driver)

Note: If you are using io.zipkin.java:zipkin and io.zipkin.zipkin2:zipkin, use version 2.0.0 (or later) for both as we still maintain the old libraries.
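
To give a feel for the new library, here's a minimal sketch, assuming io.zipkin.zipkin2:zipkin is on your classpath (the service name, IDs and timestamps are made up for illustration):

import java.util.Collections;
import zipkin2.Endpoint;
import zipkin2.Span;
import zipkin2.storage.InMemoryStorage;

public class V2Sketch {
  public static void main(String[] args) throws Exception {
    // the v2 model: validated strings for IDs and IPs, builders instead of public fields
    Span span = Span.newBuilder()
        .traceId("86154a4ba6e91387").id("86154a4ba6e91387")
        .kind(Span.Kind.SERVER)
        .name("get")
        .timestamp(1505990621526000L).duration(612898L)
        .localEndpoint(Endpoint.newBuilder().serviceName("my-service").ip("192.168.1.113").build())
        .build();

    // the bounded in-memory storage component, handy for test environments
    InMemoryStorage storage = InMemoryStorage.newBuilder().build();
    storage.spanConsumer().accept(Collections.singletonList(span)).execute();
  }
}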

What's next?

There are a few storage implementations in-flight and some may port to the new libraries. Next, we will add a v2 native transport library and work on a Spring Boot 2 based server. Expect incremental progress along the way. Please join us on gitter if you have ideas!

The server itself is still the same

Note: if you are only using or configuring Zipkin, there's little impact. The Zipkin server hasn't changed; you just upgrade it. If you have Java tracing set up, read the next section. Otherwise, you are done unless you want extra details.

Changing java applications to use Zipkin v2 format

Java applications often use the zipkin-reporter project directly or indirectly to send data to Zipkin collectors. Our version 2 json format is smaller and measurably more efficient.

Once you've upgraded your Zipkin servers, opt into the version 2 format like this:

   /** Configuration for how to send spans to Zipkin */
   @Bean Sender sender() {
-    return OkHttpSender.create("http://your_host:9411/api/v1/spans");
+    return OkHttpSender.json("http://your_host:9411/api/v2/spans");
   }
 
   /** Configuration for how to buffer spans into messages for Zipkin */
-  @Bean Reporter<Span> reporter() {
-    return AsyncReporter.builder(sender()).build();
+  @Bean Reporter<Span> spanReporter() {
+    return AsyncReporter.v2(sender()).build();
   }

If you are using Brave directly, you can stick the v2 reporter here:

     return Tracing.newBuilder()
-        .reporter(reporter()).build();
+        .spanReporter(spanReporter()).build();

If you are using Spring XML, the related change looks like this:

-  <bean id="sender" class="zipkin.reporter.okhttp3.OkHttpSender" factory-method="create"
+  <bean id="sender" class="zipkin.reporter.okhttp3.OkHttpSender" factory-method="json"
       destroy-method="close">
-    <constructor-arg type="String" value="http://localhost:9411/api/v1/spans"/>
+    <constructor-arg type="String" value="http://localhost:9411/api/v2/spans"/>
   </bean>
 
   <bean id="tracing" class="brave.spring.beans.TracingFactoryBean">
     <property name="reporter">
       <bean class="brave.spring.beans.AsyncReporterFactoryBean">
+        <property name="encoder" value="JSON_V2"/>

What's new in the Zipkin v2 library

Zipkin v2 libraries are under the zipkin2 java package and the io.zipkin.zipkin2 maven group ID. The core library has a few changes, which mostly clean up or pare down features we had before. Here are some highlights:

Span now uses validated strings as opposed to parsed objects

Our new json encoder is twice as fast as the prior one, due to factors including its validation approach. For example, we previously used the java long type to represent a 64-bit ID and a 32-bit integer to represent an IPv4 address. Most of the time, IDs and IPs are transmitted and stored as strings, so this resulted in needless, expensive conversions. Switching to validated strings also makes it easier to use other serialization libraries, as you don't need custom type converters.

Ex.

-  Endpoint.builder().serviceName("tweetie").ipv4(192 << 24 | 168 << 16 | 1).build();
+  Endpoint.newBuilder().serviceName("tweetie").ip("192.168.0.1").build();

protip: if you have an old endpoint, you can do endpoint.toV2() on it!

Span now uses auto-value instead of public final fields

We originally had public final fields for our model types (borrowing from Square Wire's style). This had a slight glitch: data transformations can't use method references (as fields aren't methods!). This is cleaned up now.

-    assertThat(spans).extracting(s -> s.duration)
+    assertThat(spans).extracting(Span::duration)

Asynchronous operations are now cancelable

Most people will not make custom Zipkin servers, but those making storage or transport plugins now have a cleaner api.

Borrowing heavily from Square's Retrofit and OkHttp, Zipkin storage interfaces return a Call object, which represents a single unit of work, such as storing spans. This provides the means to synchronously invoke the command, pass a callback, or compose with your favorite library. Unlike before, calls are cancelable.

For example, before, if you wanted to write integration tests that synchronously invoke storage, you'd need to play callback games. These are gone.

-    CallbackCaptor<Void> callback = new CallbackCaptor<>();
-    storage().asyncSpanConsumer().accept(spans, callback);
-    callback.get();
+    storage.spanConsumer().accept(spans).execute();
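
For callers, here's a rough sketch of the three ways to run a Call, using the v2 getTrace signature shown below (the trace ID is illustrative):

Call<List<Span>> call = storage.spanStore().getTrace("86154a4ba6e91387");

// 1. synchronously invoke, e.g. in an integration test
// each Call is a single unit of work, so clone() gives a fresh one to run
List<Span> trace = call.clone().execute();

// 2. or pass a callback
call.clone().enqueue(new Callback<List<Span>>() {
  @Override public void onSuccess(List<Span> value) { /* use the trace */ }
  @Override public void onError(Throwable t) { /* handle the failure */ }
});

// 3. or cancel outstanding work, which wasn't possible before
call.cancel();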

As an implementor, the whole thing is simpler, especially combined with validated string IDs:

-  @Override public void getTrace(long traceIdHigh, long traceIdLow, Callback<List<Span>> callback) {
-    String traceIdHex = Util.toLowerHex(traceIdHigh, traceIdLow);
+  @Override public Call<List<Span>> getTrace(String traceId) {

(json) Codec libraries are cleaned up

We've introduced SpanBytesEncoder and SpanBytesDecoder instead of the catch-all Codec type from v1. When writing zipkin-reporter, we noticed that almost all applications do not need decode logic, as they simply serialize and send out of process. For those writing data to Zipkin, we can serialize either the old format or the new one with SpanBytesEncoder.JSON_V1 or SpanBytesEncoder.JSON_V2 accordingly. Notably, writing the v1 format does not require a version 1.x jar in your classpath.
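
For example, here's a minimal sketch of round-tripping a span list with the new codec types (the spans variable is assumed to be a List<Span> from earlier):

import zipkin2.codec.SpanBytesDecoder;
import zipkin2.codec.SpanBytesEncoder;

// encode the same spans in either format; JSON_V1 needs no 1.x jar
byte[] v2Json = SpanBytesEncoder.JSON_V2.encodeList(spans);
byte[] v1Json = SpanBytesEncoder.JSON_V1.encodeList(spans);

// decoding is a separate type, as most applications never decode
List<Span> decoded = SpanBytesDecoder.JSON_V2.decodeList(v2Json);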

Zipkin 1.30

27 Mar 06:05

Zipkin 1.30 accepts a new simplified json format on all major transports including http, Kafka, SQS, Kinesis, Azure Event Hub and Google Stackdriver.

The primary goal of this format is making Zipkin data easier to understand and simpler for folks to write. A dozen folks in Zipkin have vetted ideas on this format for over a year. We took it seriously because we don't want to bother you with a format unless it will last years. Thanks especially to @bplotnick @basvanbeek and @mansu for donating time recently towards vetting final details.

Here's an example curl command that uploads json representing a server operation:

# make epoch seconds epoch microseconds, because.. microservices!
$ date +%s123456
1502677917123456
$ curl -s localhost:9411/api/v2/spans -H'Content-Type: application/json' -d'[{
  "traceId": "86154a4ba6e91387",
  "id": "86154a4ba6e91387",
  "kind": "SERVER",
  "name": "get",
  "timestamp": 1502677917123456,
  "duration": 207000,
  "localEndpoint": {
    "serviceName": "hamster-wheel",
    "ipv4": "113.210.108.10"
  },
  "remoteEndpoint": {
    "ipv4": "77.12.22.11"
  },
  "tags": {
    "http.path": "/api/hamsters",
    "http.status_code": "302"
  }
}]'

The above says a lot with a little: the server's identifier in discovery (hamster-wheel), the http route and the client IP (likely from X-Forwarded-For or similar). This request took 207ms in the server and resulted in a redirect.

We released the collector side ahead of the client/reporter side, so that folks can roll out version upgrades ahead of demand. That said, there is already work in progress using this, like Census and @flier's C/C++ tracer, so update to the most recent patch release as soon as you can!

If you are more interested in this format, check out the newly polished OpenApi spec, or a Go client example compiled from it (thx @devinsba). If you have further questions, hop on https://gitter.im/openzipkin/zipkin

Future releases will formalize more, including "zipkin2" java types for those who need them. That said, one nice thing about the new format is that it is easy enough for normal json tools to manage. Regardless, keep your eyes open for more, and thanks for the interest.

Zipkin 1.29

07 Aug 09:32

Zipkin 1.29 models messaging spans, shows errors in the service graph and supports Elasticsearch 6

Message tracing

Producing and consuming messages from a broker, such as RabbitMQ or Kafka, is similar to, but different from, one-way RPC. For example, one message can have multiple consumers, and many times the producer of the message can't know if this will be the case. Also, particularly in Kafka, consuming a message is often completely decoupled from processing it, and consumption may happen in bulk.

Through community discussion, notably advice from @bogdandrutu from Census, we reached the following conclusions for message tracing with Zipkin (a code sketch follows the list):

  • Messaging consumers should always be a child span of the producing span (and not a linked trace)
    • If using B3, this means X-B3-SpanId is the parent of the consumer span
  • "ms" and "mr" annotate message send and receive events
    • span2 format replaces these with Span.Kind.PRODUCER, CONSUMER
  • If producer and consumer spans include duration, it should only reflect local batching delay
    • time spent processing a message should be in a different child span
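
To make that concrete, here's a hedged sketch using the span2 java model described above (all IDs are made up for illustration):

// the producer span; if it reports a duration, that only reflects local batching delay
Span producer = Span.newBuilder()
    .traceId("86154a4ba6e91387").id("4d1e00c0db9010db")
    .kind(Span.Kind.PRODUCER)
    .name("send")
    .build();

// the consumer span is a child of the producing span, not a linked trace
Span consumer = Span.newBuilder()
    .traceId("86154a4ba6e91387").parentId("4d1e00c0db9010db").id("7a3f5c2e9b1d4f60")
    .kind(Span.Kind.CONSUMER)
    .name("receive")
    .build();

// time spent processing the message goes in a different child span
Span processing = Span.newBuilder()
    .traceId("86154a4ba6e91387").parentId("7a3f5c2e9b1d4f60").id("3c9f1a7b5e2d8c41")
    .name("process")
    .build();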

There are diagrams of how instrumentation works with this model on the website. You can also look at @ImFlog's Kafka 0.11 tracing work in progress. If you have more questions or want to share your work, contact us on gitter.

Visualizing error count between services

Thanks to @hfgbarrigas' initial work, and lots of review support by @shakuzen, we now have errorCount on dependency links, indicating how many of the callCount between services were in error.

MySQL users who want this need to add the error_count column:

alter table zipkin_dependencies add `error_count` BIGINT

The UI is relatively simple, coloring the line yellow when 50% or more of calls are in error, and red when 75% or more are. These rates can be overridden or disabled with configuration.

(Screenshot: example link detail screen)

(Screenshot: example of when >50% of calls are in error)

(Screenshot: example of when >75% of calls are in error)

Trace instrumentation's contract is easy: add the "error" tag, for example on an http 500. When aggregating links, the value of the "error" tag isn't important. Please update to the latest versions of instrumentation if you don't see errors yet. For example, zipkin-ruby recently added support for this thanks to @jcarres-mdsol.
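
As a hedged sketch with Brave (the response object and its accessors are hypothetical):

// tag the span on failure; the tag's value doesn't affect link aggregation
if (response.statusCode() >= 500) {
  span.tag("error", response.statusCode() + " " + response.reasonPhrase());
}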

Elasticsearch 6

Previously, our Elasticsearch storage used one index for all types: spans and dependencies (plus a special service name index). Elasticsearch 6 no longer supports multiple types per index, so we now write separate indexes for spans and dependency links when Elasticsearch 6 is detected. Incidentally, we also use the new span2 json format, which is simplified and more efficient.

The next version will support the same single-type indexing with Elasticsearch 2.4+. If you can't wait that long, look at #1674 for the experimental flag you can use today.

Thanks to @anuraaga @ImFlog @xeraa and @jcarres-mdsol for advice and support leading to this feature. The next release will thank those who test it!

Zipkin 1.28

06 Jul 06:15

Zipkin 1.28 bounds the in-memory storage component

Since the rewrite, we've always had a way to start zipkin without any storage service dependency. This is great for running examples, unit tests, or ad-hoc tests. It wasn't good for tests in more persistent environments like Kubernetes, as eventually the memory would blow up and we'd recommend people use something else. It also wasn't good for short tests that take a lot of traffic, for the same reason.

Initially, we were hesitant to add features that might end up with people accidentally going to production on our in-memory storage. However, many people asked about this, usually after something blew up in test, and we realized bounding the memory provider was indeed worthwhile. Thanks to hard work and tuning by @joel-airspring, the default server now starts bounded and won't likely blow up if you send a lot of traffic to it.

So, now you can play around and zipkin will just drop old traces to make room for new ones.

# run with self-tracing enabled, so each api hit is traced, and max-spans set to 500 (far below the 500000 default)
$ SELF_TRACING_ENABLED=true java -Dzipkin.storage.mem.max-spans=500 -jar ./zipkin-server/target/zipkin-server-*exec.jar
# in another window, do this for a while
$ while true; do curl -s localhost:9411/api/v1/services; done
# then, check that the span count is less than or equal to what you set: <=500
$ curl -s localhost:9411/api/v1/traces?limit=1000000 | jq '.[]|.[]|.id' | wc -l

Please note this option may still break under certain types of load, so please don't consider the in-memory provider production-grade, or on a path to be the latest data grid! If you are interested in an in-memory storage option for production, you might consider upvoting Hazelcast, noting you want it to work embedded.

Zipkin 1.27

30 Jun 08:41

Zipkin 1.27 moves the UI under the path /zipkin, allows listening on multiple Kafka topics and improves Cassandra 3 support.

The Zipkin UI was formerly served at the base path of the server. We've had folks ask in various ways for a year to have it under a subpath instead. We decided to move the UI under /zipkin, as it matched most users' requirements and was easiest for our single-page app to route. Thanks to @eirslett, @danielkwinsor and @neilstevenson for help with implementation and testing.

We recently added Kafka 0.10 support. This version includes the ability to listen on multiple topics, something you might do if you have environments where spans come from different sources. Thanks to @danielkwinsor for implementation and @dgrabows for review, we now support this by simply comma-delimiting the topic names. Note: there are some gotchas if you are considering migrating from Kafka 0.8 to 0.10. Thanks to @fedj for noting something you might run into.

Some of you may be using the experimental "cassandra3" storage type. We had a serious glitch, found by @llinder, where blocking could occur on a query depending on the count of results returned. Not only did Lance fix the glitch, he also added testcontainers to ensure clean, docker-based integration tests run on every PR.

Finally, Zipkin 1.27 fixes a number of broken windows. Thanks to @NithinMadhavanpillai for adding a test to help us fix a bad-data bug parsing dependencies, @fgcui1204 for finding out why service names were sometimes cut off in the UI, @ImFlog for backfilling docs about how ports can be specified in cassandra, and @joel-airspring for fixing a few distracting glitches in our build.

Zipkin 1.26

18 May 05:27

Thanks to @dgrabows, Zipkin 1.26 now supports Kafka 0.10. Notably, this allows you to run without a ZooKeeper dependency. (Recent versions of Kafka no longer require consumers to connect to ZooKeeper)

Our docker image will automatically use this, if the variable KAFKA_BOOTSTRAP_SERVERS is set instead of KAFKA_ZOOKEEPER. An example docker setup is available here.

While you do not need to upgrade your instrumented apps, you can choose to opt in by using libraries such as our kafka10 sender.

Thanks again for the comprehensive work by @dgrabows and review feedback by @StephenWithPH.

Zipkin 1.25

16 May 12:56

Zipkin 1.25 lets you disable the query api when deploying collector-only services. It also lets you log http requests sent to Elasticsearch. Finally, it fixes a bug where a non-default MySQL schema would fail health checks.

Disabling the UI and Query api for collector-only servers

@SirTyro's security team wants collectors deployed separately, in a way that reduces exposure if compromised. You can now disable the api and UI by setting QUERY_ENABLED=false. Thanks to @shakuzen for help implementing this.

Understanding Zipkin's requests to Elasticsearch

Reflecting on a troubleshooting session with @ezraroi, we could have used more data to understand why an Elasticsearch index template was missing. This would have saved us time. You can now set ES_HTTP_LOGGING=BASIC to see what traffic is sent from zipkin to Elasticsearch. Other options include HEADER and BODY. Thanks to OkHttp for the underlying interceptor that does this.

Fixed health check when you have a non-default MySQL schema

@zhanglc stumbled upon a bug where the health check misreported a service unhealthy if it had a non-default schema. This is now fixed.

Zipkin 1.24

06 May 03:26

Zipkin 1.24 enables search by binary annotation (aka tag) key. It also adds a helper for those parsing IP addresses. Finally, it fixes a bug where server-side service names weren't indexed.

Search by binary annotation (aka tag) key

Before, you could only search by exact key=value match on binary annotations (aka tags). Thanks to @kellabyte for noticing we can gain a lot of value by allowing search on tag key alone. For example, an "error" search now returns any traces that include an error, regardless of the message.

This change is now in, and here's the impact:

  • Cassandra will index a bit more: once per unique service/tag key
  • Elasticsearch now does two nested queries when looking for a key
  • MySQL now considers all annotation rows when looking for a key

Helping tracers be safer about IP addresses

Before, tracers including Brave and Finagle blindly assumed addresses and strings were IPv4. While that's usually the case, it can lead to very late problems, such as runtime exceptions.

Zipkin 1.24 adds a utility to encourage safe parsing of potentially null inputs. It re-uses code from guava (without adding a dependency) and avoids troublesome name service lookups.

Ex. if your input is an HttpServletRequest, the following is safe:

// builder here is an endpoint builder; parseIp returns false when the input is null or unparseable
if (!builder.parseIp(input.getHeader("X-Forwarded-For"))) {
  builder.parseIp(input.getRemoteAddr());
}

Fixed mid-tier service name indexing

@garyd203 stumbled upon a bug where we weren't indexing mid-tier service names. Basically, you couldn't search for a service that wasn't itself a client of something else. Surprisingly, this affected all data stores. Lots of thanks to Gary for writing the test, which made implementation a breeze.