Getting Started
This guide helps you get started with Pulsar Reporting. We assume you already have a working Pulsar Pipeline and that data is flowing to Kafka. If not, please refer to Pulsar Realtime Analytics to get the Pulsar Pipeline up and running.
Pulsar Reporting is composed of three major components:
- Pulsar Reporting API: Provides an abstract data access layer on top of metric stores. Supports both SQL and structured JSON queries.
- Pulsar Reporting UI: Gets the data from the Pulsar Reporting API, builds different charts and displays them in the browser.
- [Druid Kafka Extension]: Optional Druid Kafka extension that replaces the Druid Kafka firehose and avoids Kafka rebalance issues.
Pulsar Reporting is a lightweight framework. The Pulsar Reporting API can be deployed to any Java web container. The Pulsar Reporting UI is a pure AngularJS application and can be deployed to Node, Nginx, or any Java web container.
Pulsar Reporting needs to be configured to connect to a metric store. The only metric store currently supported is Druid; we will add support for other metric stores later. To run the demo application, you need a running Druid cluster that ingests data from the Pulsar Pipeline through Kafka.
We dockerized all the Druid dependencies and a Tomcat server to host the Pulsar Reporting API. We recommend at least two servers to run all the Docker images: one for Hadoop and the other for Druid and the remaining components.
- Druid: Metric store, ROLAP engine (> 4 vCPU and 12 GB RAM recommended)
- Memcached: Query cache for Druid and Pulsar Reporting API.
- Hadoop: Deep storage for Druid (> 4 vCPU, 12 GB RAM and 60 GB disk recommended; run it on a separate host from the other components)
- Tomcat: Pulsar Reporting API Server
We assume you already have the Pulsar Pipeline up and running, so a Kafka cluster is ready for data ingestion into Druid. The Druid cluster is managed through the same Zookeeper that the Pulsar Pipeline and Kafka use. If you prefer not to share it, you can run a separate Zookeeper instance to manage Druid.
All the docker components:
| Component | Docker Image Name | Docker Container Name |
|---|---|---|
| Druid | pulsario-druid | pulsarioDruid |
| Hadoop | pulsario-hadoop | pulsarioHadoop |
| Memcached | pulsario-memcached | pulsarioMemcached |
| Pulsar Reporting API | pulsario-reportingapi | pulsarioReportingapi |
| Pulsar Reporting UI | pulsario-reportingui | pulsarioReportingui |
Note: The Druid Docker image runs in single-node mode and needs more than 2 hours of warm-up time for data ingestion. This is a Druid limitation: you need to wait for the first successful hand-off before Druid metadata is available from the Druid broker nodes.
1. Memcached
2. Hadoop
3. Druid
4. Pulsar Reporting API
5. Pulsar Reporting UI
- Download the repo [docker scripts](../../docker-files)
- Enter the workspace of the component
cd docker-files/pulsarReporting/<componentName>
- Build Image
./build
- Run Container
"pulsarioZookeeper IP" is the IP of the host that runs the Pulsar Pipeline Zookeeper container.
"pulsarioKafka IP" is the IP of the host that runs the Pulsar Pipeline Kafka broker container.
- Memcached
./shell
- Hadoop
./shell
- Druid
./shell <pulsarioDruid IP> <pulsarioMemcached IP> <pulsarioHadoop IP> <pulsarioKafka IP> <pulsarioZookeeper IP>
- Pulsar Reporting API
./shell <pulsarioDruid IP> <pulsarioMemcached IP>
- Pulsar Reporting UI
./shell <pulsarioReportingapi IP>
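The build and run steps above repeat for every component, so they can be wrapped in a small helper. The following is a minimal sketch, not part of the shipped scripts: `start_component`, `DRY_RUN` and `DOCKER_ROOT` are hypothetical names, and the component directory names are left as arguments because this guide does not list them. By default it only prints the commands it would run.

```shell
# Dry-run by default: print the commands instead of executing them.
# Set DRY_RUN=0 to actually build the image and start the container.
DRY_RUN="${DRY_RUN:-1}"

# Root of the downloaded docker scripts (path taken from the steps above).
DOCKER_ROOT="${DOCKER_ROOT:-docker-files/pulsarReporting}"

# start_component <componentName> [IP arguments for ./shell ...]
# Enters the component workspace, builds the image, then runs the container.
start_component() {
    component="$1"
    shift
    if [ "$DRY_RUN" = "1" ]; then
        # Join the remaining arguments with spaces, for display only.
        cmd="./shell"
        if [ "$#" -gt 0 ]; then
            cmd="$cmd $*"
        fi
        echo "(cd $DOCKER_ROOT/$component && ./build && $cmd)"
    else
        # Subshell keeps the caller's working directory unchanged.
        (cd "$DOCKER_ROOT/$component" && ./build && ./shell "$@")
    fi
}
```

Call it once per component, in the start order listed earlier (Memcached, Hadoop, Druid, Pulsar Reporting API, Pulsar Reporting UI), passing the same IP arguments that the corresponding `./shell` line expects.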
Check the Druid web console:
http://<pulsarioDruid IP>:8081
Issue a query against the Druid realtime node:
curl -XPOST -H'Content-type: application/json' "http://<pulsarioDruid IP>:8084/druid/v2/?pretty" -d'{"queryType":"timeBoundary","dataSource":"pulsar_event"}'
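Because the Druid container needs a long warm-up, it can be convenient to poll the realtime node with the same timeBoundary query until it returns data. This is a rough sketch under assumptions: the function names, the default retry count and the delay are made up for illustration, and the check simply looks for the `maxTime` field that a non-empty timeBoundary result contains.

```shell
# time_boundary_payload <dataSource>
# Prints the JSON body of a Druid timeBoundary query.
time_boundary_payload() {
    printf '{"queryType":"timeBoundary","dataSource":"%s"}' "$1"
}

# query_time_boundary <druid host> [dataSource]
# Sends the query to the realtime node on port 8084, as in the curl call above.
query_time_boundary() {
    curl -s -XPOST -H 'Content-type: application/json' \
        "http://$1:8084/druid/v2/?pretty" \
        -d "$(time_boundary_payload "${2:-pulsar_event}")"
}

# wait_for_druid <druid host> [attempts] [seconds between attempts]
# Returns 0 as soon as a response contains "maxTime", 1 if all attempts fail.
wait_for_druid() {
    host="$1"
    attempts="${2:-30}"
    delay="${3:-60}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if query_time_boundary "$host" | grep -q '"maxTime"'; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, `wait_for_druid <pulsarioDruid IP> 180 60` would retry once a minute for up to three hours.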
Check the Hadoop web console:
http://<pulsarioHadoop IP>:50070/explorer.html#/druid
Add a new datasource through the API, for example:
curl --user admin:test -XPOST -H'Content-type: application/json' "http://<pulsarioTomcat IP>:8080/prapi/v2/datasources" -d'{"displayName":"datasource1", "type":"druid","endpoint":"http://<Druid Host IP>:8082/druid/v2/","comment":"testComment","properties":"{}"}'
Send a query through the API, for example:
curl --user admin:test -XPOST -H'Content-type: application/json' "http://<pulsarioTomcat IP>:8080/prapi/v2/sql/trackingdruid" -d'{"sql":"select count(count) as row, browserfamily from pulsar_event group by _dd_d","intervals":"2015-XX-XX 00:00:00/2015-XX-XX 00:00:00","granularity":"all"}'
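When you send several SQL queries, the request body is easier to manage if it is built from parameters. The sketch below is illustrative only: `sql_payload` and `run_sql` are hypothetical helpers, and it assumes that the last path segment of the `/sql/` URL is the datasource name, as the example above suggests. It does no JSON escaping, so the SQL text must not contain double quotes or backslashes.

```shell
# sql_payload <sql> <intervals> [granularity]
# Prints the JSON body for the SQL endpoint; granularity defaults to "all".
# No escaping is done, so avoid double quotes and backslashes in <sql>.
sql_payload() {
    printf '{"sql":"%s","intervals":"%s","granularity":"%s"}' "$1" "$2" "${3:-all}"
}

# run_sql <api host> <datasource name> <sql> <intervals> [granularity]
# Posts the query with the default demo account (admin:test) on port 8080.
run_sql() {
    curl -s --user admin:test -XPOST -H 'Content-type: application/json' \
        "http://$1:8080/prapi/v2/sql/$2" \
        -d "$(sql_payload "$3" "$4" "$5")"
}
```

The intervals argument takes the same `start/end` form as in the curl example, with real dates in place of the `2015-XX-XX` placeholders.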
Go to http://<pulsarioReportingui IP>:9090
Log in with the default account (username: admin, password: test).
There are several ways to get the UI framework up and running. The supported browsers include Chrome and Firefox.
You can download a prebuilt bundle from the Pulsar.IO site: pulsar-reporting-ui-bundle
The downloaded zip file contains the latest Pulsar Reporting UI build and all its dependencies. Extract the zip to any folder you want to run the Reporting UI from. You need to configure the Pulsar Reporting API endpoint before you start the application:
cd .../pulsar-reporting-ui-bundle/
Update the endpoint in the config-bundle.json file.
...
"apiUrl": "http://<pulsarioReportingapi>/prapi/v2", // Update your endpoint
...
Start server
cd .../pulsar-reporting-ui-bundle
node server.dist.js
Go to http://localhost:9090
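The endpoint update in config-bundle.json can also be scripted, which helps when the bundle is redeployed often. This is a minimal sketch, assuming the file contains a single `"apiUrl": "..."` entry like the excerpt above; `set_api_url` is a hypothetical helper, not something shipped with the bundle.

```shell
# set_api_url <config file> <new apiUrl>
# Rewrites the value of the "apiUrl" key and leaves all other text untouched.
# The URL must not contain '|' or '&', which are special in the sed expression.
set_api_url() {
    file="$1"
    url="$2"
    tmp="$(mktemp)" || return 1
    # '|' is the sed delimiter because URLs contain '/'.
    if sed "s|\"apiUrl\"[[:space:]]*:[[:space:]]*\"[^\"]*\"|\"apiUrl\": \"$url\"|" "$file" > "$tmp"; then
        # Write back through the original file to keep its permissions.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

For example, run `set_api_url config-bundle.json "http://<pulsarioReportingapi>/prapi/v2"` from the bundle folder with your actual host in place of the placeholder, then start the server as described above.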
Please follow the [Docker setup steps](Getting-Started#start-steps) to run the Pulsar Reporting UI Docker container.
Please ensure you have already downloaded and installed the latest Node.js (which includes npm), Bower and Grunt.
You can download the source code manually or clone it from GitHub.
- Install npm packages and download Bower dependencies
cd .../pulsar-reporting-ui
npm install
bower install
- Configure the endpoint
cd .../pulsar-reporting-ui/app/
Update the endpoint in the config-bundle.json file.
...
"apiUrl": "http://<pulsarioReportingapi>/prapi/v2", // Update your endpoint
...
- Start server
cd .../pulsar-reporting-ui/server
node server.js
Go to http://localhost:9191
- Web Site: http://gopulsar.io
- Google Group: Pulsar Google Group
- Developer Mail: [email protected]
- White Paper: Pulsar White Paper