Xin Xu edited this page Nov 6, 2015 · 8 revisions

This guide will help you get started with Pulsar Reporting. We assume you already have a working Pulsar Pipeline and that data is flowing to Kafka. If not, refer to Pulsar Realtime Analytics to get the Pulsar Pipeline up and running.

Pulsar Reporting components

Pulsar Reporting is composed of three major components:

  • Pulsar Reporting API: Provides an abstract data access layer on top of metric stores. Supports both SQL and structured JSON queries.
  • Pulsar Reporting UI: Gets data from the Pulsar Reporting API, builds charts, and displays them in the browser.
  • Druid Kafka Extension: An optional Druid Kafka extension that replaces the Druid Kafka firehose and avoids Kafka rebalance issues.

Deployment

Pulsar Reporting is a lightweight framework. The Pulsar Reporting API can be deployed to any Java web container. The Pulsar Reporting UI is a pure AngularJS application and can be served from Node, Nginx, or any Java web container.

Dependencies

Pulsar Reporting needs to be configured to connect to a metric store. Currently the only supported metric store is Druid; support for other metric stores will be added later. To run the demo application, you need a running Druid cluster that ingests data from the Pulsar Pipeline through Kafka.

We have dockerized all the Druid dependencies, plus a Tomcat server that hosts the Pulsar Reporting API. We recommend at least two servers to run all the docker images: one for Hadoop and the other for Druid and the remaining components.

  • Druid: Metric store and ROLAP engine (more than 4 vCPUs and 12 GB RAM recommended).
  • Memcached: Query cache for Druid and the Pulsar Reporting API.
  • Hadoop: Deep storage for Druid (more than 4 vCPUs, 12 GB RAM, and 60 GB disk recommended; run it on a separate host from the others).
  • Tomcat: Pulsar Reporting API server.

Quick start

We assume you already have the Pulsar Pipeline up and running, so a Kafka cluster is ready for data ingestion into Druid. Druid shares the Zookeeper used by the Pulsar Pipeline and Kafka to manage its cluster. If you prefer not to share it, you can run separate Zookeeper instances to manage Druid.

All the docker components:

Component            | Docker Image Name     | Docker Container Name
Druid                | pulsario-druid        | pulsarioDruid
Hadoop               | pulsario-hadoop       | pulsarioHadoop
Memcached            | pulsario-memcached    | pulsarioMemcached
Pulsar Reporting API | pulsario-reportingapi | pulsarioReportingapi
Pulsar Reporting UI  | pulsario-reportingui  | pulsarioReportingui

Note: The Druid docker runs in single-node mode and needs more than 2 hours of warm-up time before ingested data can be queried. This is a Druid limitation: the Druid metadata only becomes available from the Druid broker nodes after the first successful hand-off.

Usage

Start Sequence

1. Memcached
2. Hadoop
3. Druid
4. Pulsar Reporting API
5. Pulsar Reporting UI

Start Steps

  1. Download the docker scripts from the repo (../../docker-files).

  2. Enter the workspace of the component:

cd docker-files/pulsarReporting/<componentName>
  3. Build the image:
./build
  4. Run the container.

<pulsarioZookeeper IP> is the IP of the host running the Pulsar Pipeline Zookeeper docker container.

<pulsarioKafka IP> is the IP of the host running the Pulsar Pipeline Kafka broker docker container.

The remaining placeholders (<pulsarioDruid IP>, <pulsarioMemcached IP>, and so on) are the IPs of the hosts running the corresponding containers listed in the table above.

    1. Memcached
./shell
    2. Hadoop
./shell
    3. Druid
./shell <pulsarioDruid IP> <pulsarioMemcached IP> <pulsarioHadoop IP> <pulsarioKafka IP> <pulsarioZookeeper IP>
    4. Pulsar Reporting API
./shell <pulsarioDruid IP> <pulsarioMemcached IP>
    5. Pulsar Reporting UI
./shell <pulsarioReportingapi IP>
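
The run commands above can be collected into a small dry-run helper that prints each ./shell invocation in the documented start order, so you can check the IP arguments before starting anything. This is a minimal sketch: the IP values are hypothetical placeholders read from environment variables, and because the per-component directory names under docker-files/pulsarReporting/ are not listed in this guide, it only prints the commands instead of running them.

```shell
#!/bin/sh
# Dry-run sketch: print the ./shell command for each component in the
# documented start order. All IP defaults are hypothetical placeholders;
# override them with environment variables for your own hosts.
DRUID_IP="${DRUID_IP:-10.0.0.1}"          # host running pulsarioDruid
MEMCACHED_IP="${MEMCACHED_IP:-10.0.0.2}"  # host running pulsarioMemcached
HADOOP_IP="${HADOOP_IP:-10.0.0.3}"        # host running pulsarioHadoop
KAFKA_IP="${KAFKA_IP:-10.0.0.4}"          # Pulsar Pipeline Kafka broker host
ZK_IP="${ZK_IP:-10.0.0.5}"                # Pulsar Pipeline Zookeeper host
API_IP="${API_IP:-10.0.0.6}"              # host running pulsarioReportingapi

start_plan() {
  # Order follows the arguments: Druid takes the Memcached and Hadoop IPs,
  # the API takes the Druid IP, and the UI takes the API IP.
  echo "1 Memcached: ./shell"
  echo "2 Hadoop: ./shell"
  echo "3 Druid: ./shell $DRUID_IP $MEMCACHED_IP $HADOOP_IP $KAFKA_IP $ZK_IP"
  echo "4 Pulsar Reporting API: ./shell $DRUID_IP $MEMCACHED_IP"
  echo "5 Pulsar Reporting UI: ./shell $API_IP"
}

start_plan
```

Printing rather than executing keeps the helper safe to run anywhere; once the output looks right, run each command from its component directory after ./build.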

Check if things work

Druid Docker

Check the web console:

http://<pulsarioDruid IP>:8081

Issue a query against the Druid realtime node:

curl -XPOST -H'Content-type: application/json' "http://<pulsarioDruid IP>:8084/druid/v2/?pretty" -d'{"queryType":"timeBoundary","dataSource":"pulsar_event"}'
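
Because of the warm-up period mentioned in the note on the Druid docker, this timeBoundary query may not return a time range for a while. The sketch below repeats the same query until the response contains a minTime field, which Druid includes in a timeBoundary result once the datasource has data. The polling interval and attempt count are arbitrary assumptions, and DRUID_IP is a placeholder.

```shell
#!/bin/sh
DRUID_IP="${DRUID_IP:-127.0.0.1}"   # placeholder: host running pulsarioDruid

# Succeeds when a timeBoundary response body contains a "minTime" field.
has_data() {
  case "$1" in
    *'"minTime"'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll the realtime node; defaults are 24 attempts, 300 s apart (about 2 hours).
wait_for_druid() {
  max_tries="${1:-24}"
  interval="${2:-300}"
  i=0
  while [ "$i" -lt "$max_tries" ]; do
    body="$(curl -s -XPOST -H'Content-type: application/json' \
      "http://$DRUID_IP:8084/druid/v2/?pretty" \
      -d'{"queryType":"timeBoundary","dataSource":"pulsar_event"}')"
    if has_data "$body"; then
      echo "pulsar_event has data"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "no data after $max_tries attempts" >&2
  return 1
}

# Usage: wait_for_druid 30 300
```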

Hadoop Docker

Check the web console:

http://<pulsarioHadoop IP>:50070/explorer.html#/druid

API Docker

Register a new datasource, for example:

curl --user admin:test -XPOST -H'Content-type: application/json' "http://<pulsarioReportingapi IP>:8080/prapi/v2/datasources" -d'{"displayName":"datasource1", "type":"druid","endpoint":"http://<pulsarioDruid IP>:8082/druid/v2/","comment":"testComment","properties":"{}"}'
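
In this request, properties is sent as a JSON-encoded string ("{}") rather than a nested object. When scripting datasource registration, a small helper that builds the body keeps that detail in one place. This is a sketch: the helper name is hypothetical, and it performs no JSON escaping, so the values must not contain double quotes or backslashes.

```shell
#!/bin/sh
# Hypothetical helper: build the body for POST /prapi/v2/datasources.
# Values are inserted verbatim (no JSON escaping), so they must not contain
# double quotes or backslashes.
datasource_payload() {
  name="$1"       # displayName
  endpoint="$2"   # Druid query endpoint, e.g. http://<host>:8082/druid/v2/
  comment="$3"
  # "properties" is a JSON-encoded string ("{}"), matching the example request.
  printf '{"displayName":"%s","type":"druid","endpoint":"%s","comment":"%s","properties":"{}"}' \
    "$name" "$endpoint" "$comment"
}

# Usage (API_IP and DRUID_IP are placeholders for your hosts):
# curl --user admin:test -XPOST -H'Content-type: application/json' \
#   "http://$API_IP:8080/prapi/v2/datasources" \
#   -d "$(datasource_payload datasource1 "http://$DRUID_IP:8082/druid/v2/" testComment)"
```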

Send a query through the API, for example:

curl --user admin:test -XPOST -H'Content-type: application/json' "http://<pulsarioReportingapi IP>:8080/prapi/v2/sql/trackingdruid" -d'{"sql":"select count(count) as row, browserfamily from pulsar_event group by _dd_d","intervals":"2015-XX-XX 00:00:00/2015-XX-XX 00:00:00","granularity":"all"}'
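
The intervals field takes a start and an end timestamp separated by a slash, in the YYYY-MM-DD HH:MM:SS form used in the request above. A sketch that builds the request body from whole days follows; the helper name is hypothetical, the dates are illustrative, and it does no JSON escaping, so the SQL must not contain double quotes.

```shell
#!/bin/sh
# Hypothetical helper: build the body for POST /prapi/v2/sql/<datasource>.
# $1 = SQL text, $2 = start day (YYYY-MM-DD), $3 = end day (YYYY-MM-DD),
# $4 = granularity (defaults to "all", as in the example request).
sql_payload() {
  sql="$1"
  start_day="$2"
  end_day="$3"
  granularity="${4:-all}"
  # intervals is "<start> 00:00:00/<end> 00:00:00", i.e. midnight to midnight.
  printf '{"sql":"%s","intervals":"%s 00:00:00/%s 00:00:00","granularity":"%s"}' \
    "$sql" "$start_day" "$end_day" "$granularity"
}

# Usage (API_IP is a placeholder):
# curl --user admin:test -XPOST -H'Content-type: application/json' \
#   "http://$API_IP:8080/prapi/v2/sql/trackingdruid" \
#   -d "$(sql_payload 'select count(count) as row from pulsar_event' 2015-11-05 2015-11-06)"
```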

UI Docker

Go to http://<pulsarioReportingui IP>:9090
Log in with the default account (username: admin, password: test).
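
To check the web consoles from this section in one pass, the sketch below requests each URL with curl and prints the HTTP status code next to it (curl reports 000 when it cannot connect). The host variables are placeholders, and a successful status only shows that a console is answering, not that data is flowing.

```shell
#!/bin/sh
# Placeholders: point these at your own hosts.
DRUID_IP="${DRUID_IP:-127.0.0.1}"    # pulsarioDruid
HADOOP_IP="${HADOOP_IP:-127.0.0.1}"  # pulsarioHadoop
UI_IP="${UI_IP:-127.0.0.1}"          # pulsarioReportingui

# Web consoles from this section. URL fragments such as #/druid are
# client-side only, so they are omitted.
console_urls() {
  echo "http://$DRUID_IP:8081"
  echo "http://$HADOOP_IP:50070/explorer.html"
  echo "http://$UI_IP:9090"
}

# Print "<status> <url>" for each console, waiting at most 10 s per request.
check_consoles() {
  console_urls | while read -r url; do
    code="$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$url")"
    echo "$code $url"
  done
}

# Usage: set DRUID_IP, HADOOP_IP and UI_IP, then run check_consoles
```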

How to Run Pulsar Reporting UI

There are several ways to get the UI up and running. The supported browsers include Chrome and Firefox.

Run Pulsar Reporting UI locally

You can download the prebuilt binary, pulsar-reporting-ui-bundle, from the Pulsar.IO site.

The downloaded zip file contains the latest Pulsar Reporting UI build and all its dependencies. Extract the zip to the folder you want to run the Reporting UI from. Before starting the application, configure the Pulsar Reporting API endpoint:

cd .../pulsar-reporting-ui-bundle/

Set the apiUrl value in the config-bundle.json file to your Pulsar Reporting API endpoint (JSON does not allow comments, so change only the value):

...
  "apiUrl": "http://<pulsarioReportingapi>/prapi/v2",
...
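
If you prefer to script this step, the sketch below rewrites the apiUrl value with sed and prints the result, leaving the original file untouched until you move the output into place. The helper name and the example URL are hypothetical (port 8080 mirrors the API examples in this guide). Because the sed expression uses "#" as its delimiter, the new URL must not contain "#", "&", or backslashes.

```shell
#!/bin/sh
# Hypothetical helper: print a config file with its "apiUrl" value replaced.
# $1 = path to config-bundle.json, $2 = new API endpoint URL.
# "#" is the sed delimiter because URLs contain "/"; the URL therefore must
# not contain "#", "&" or "\" (they are special in this expression).
set_api_url() {
  file="$1"
  url="$2"
  sed 's#"apiUrl"[[:space:]]*:[[:space:]]*"[^"]*"#"apiUrl": "'"$url"'"#' "$file"
}

# Usage (host and port are placeholders):
# set_api_url config-bundle.json "http://10.0.0.6:8080/prapi/v2" > config.tmp \
#   && mv config.tmp config-bundle.json
```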

Start the server:

cd .../pulsar-reporting-ui-bundle
node server.dist.js

Go to http://localhost:9090

Run Pulsar Reporting UI on Docker

Follow the Start Steps (Getting-Started#start-steps) to run the Pulsar Reporting UI docker container.

Build from source and run locally

Make sure you have downloaded and installed the latest Node.js (which includes npm), Bower, and Grunt.

Download or clone the source code from GitHub.

  • Install the npm plugins and download the Bower dependencies:
cd .../pulsar-reporting-ui
npm install
bower install
  • Configure the endpoint:
cd .../pulsar-reporting-ui/app/

Set the apiUrl value in the config-bundle.json file to your Pulsar Reporting API endpoint (JSON does not allow comments, so change only the value):

...
  "apiUrl": "http://<pulsarioReportingapi>/prapi/v2",
...
  • Start the server:
cd .../pulsar-reporting-ui/server
node server.js

Go to http://localhost:9191