Add Pulsar transport #2297
A few non-func points from me:
Is 2) a problem? I think in this case we care mostly about the publisher API (agent/middleware -> pulsar), which can also be batched/delayed locally (at the risk of loss).
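The batch/delay-locally idea from the comment above could be sketched roughly like this. This is a pure-Python sketch with made-up names (`BufferedSender`, `max_batch`, `max_delay_s` are not Zipkin's or Pulsar's actual API); in a real reporter the `transport` callable would hand batches to the Pulsar producer:

```python
import time

class BufferedSender:
    """Buffers encoded spans locally and flushes them in batches.

    Spans held in memory are lost if the process dies before a flush --
    the "at risk of loss" trade-off mentioned in the comment.
    """

    def __init__(self, transport, max_batch=100, max_delay_s=1.0):
        self.transport = transport      # callable that accepts a list of spans
        self.max_batch = max_batch
        self.max_delay_s = max_delay_s
        self._buffer = []
        self._last_flush = time.monotonic()

    def send(self, span_bytes):
        self._buffer.append(span_bytes)
        # Flush when the batch is full or it has been held too long.
        if (len(self._buffer) >= self.max_batch
                or time.monotonic() - self._last_flush >= self.max_delay_s):
            self.flush()

    def flush(self):
        if self._buffer:
            self.transport(self._buffer)  # e.g. publish one batch to Pulsar
            self._buffer = []
        self._last_flush = time.monotonic()

sent = []
sender = BufferedSender(sent.append, max_batch=3)
for span in [b"a", b"b", b"c", b"d"]:
    sender.send(span)
sender.flush()
print(sent)  # [[b'a', b'b', b'c'], [b'd']]
```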
No2 is an advantage of Pulsar, not a problem :) Sorry for the confusion :) Also, Kafka's publisher can block on publish even when the async API is used, while AFAIK there is no such issue in Pulsar.
No problem. I think Kafka did mitigate the blocking client a bit in recent releases (just for fairness/completeness of the argument): https://cwiki.apache.org/confluence/display/KAFKA/KIP-266%3A+Fix+consumer+indefinite+blocking+behavior
Only in the consumer part; there are still blocking calls in Kafka's async producer :(
if anyone does a spike, do link back for the fans |
I see there was a spike of interest, but I'm unsure of any outcome. Did anyone do anything?
Feature: add Apache Pulsar as an additional transport to Zipkin.
Rationale:
Pulsar has several features which I think make it interesting for us:
- **Integrated SQL query capability.** Recent Pulsar releases include SQL query support based on the Facebook Presto engine, which makes it possible to query data directly from the topic, reducing the need for a separate DB tier.
  - Benefit: potentially simplified deployments.
  - Unknown: not quite clear how capable the query engine is, or whether it would suit the kinds of queries Zipkin makes.
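For a feel of what such a query might look like: Pulsar SQL addresses a topic as `pulsar."tenant/namespace"."topic"`. The topic name and columns below are assumptions for illustration, not an existing Zipkin schema:

```sql
-- Hypothetical trace lookup over a span topic via Pulsar SQL (Presto).
-- Topic and column names are illustrative assumptions.
SELECT "traceId", "name", "duration"
FROM pulsar."public/default"."zipkin-spans"
WHERE "traceId" = 'abc123'
ORDER BY "timestamp" DESC
LIMIT 10;
```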
- **Tiered storage.** Pulsar can offload topics onto long-term/cheap storage (e.g. S3) without having to estimate size- or time-based expiration policies ahead of time.
  - Benefit: reduces DB management overhead and provides a simple way to enable cheap, durable persistence.
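Offloading to S3 is configured on the broker; a sketch of the relevant `broker.conf` settings is below. The bucket name is an assumption, and the property names should be checked against the Pulsar version in use, since they have shifted between releases:

```properties
# broker.conf -- offload closed ledger segments to S3 (illustrative)
managedLedgerOffloadDriver=aws-s3
s3ManagedLedgerOffloadBucket=zipkin-span-archive
s3ManagedLedgerOffloadRegion=eu-west-1
```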
- **Storage decoupled from brokers.** By decoupling brokers from storage, Pulsar (at least on paper) should be easier to scale out.
  - Benefit: smaller operational overhead, fewer chances of error, and simplified operations.
- **Scales to more topics.** Multitenancy and geo-replication capabilities.
  - Benefit: potentially diverse business uses.
- **Pulsar Functions.** It's possible to implement very low-latency triggers that act on messages directly in the middleware.
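As a sketch of that trigger idea: Pulsar's Functions runtime can run a plain Python function once per message on an input topic, publishing any return value to an output topic. The span shape and the 500 ms threshold below are illustrative assumptions, not part of Zipkin or Pulsar:

```python
import json

SLOW_SPAN_THRESHOLD_US = 500_000  # 500 ms, in microseconds (assumed cutoff)

def process(input):
    """Hypothetical slow-span alerter in Pulsar's native-function style.

    The runtime would invoke this per message; returning None drops the
    message, returning a string publishes it to the output topic.
    """
    span = json.loads(input)
    if span.get("duration", 0) > SLOW_SPAN_THRESHOLD_US:
        return json.dumps({"traceId": span["traceId"], "slow": True})
    return None  # fast spans produce no alert

print(process(json.dumps({"traceId": "abc", "duration": 750_000})))
# {"traceId": "abc", "slow": true}
```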
- **Kafka compatibility mode.** It might be possible to use Pulsar in Kafka compatibility mode to simplify development (though at that point I'm not sure the other features can still be leveraged).
- **WebSocket capability.** The data stream can be exposed over WebSockets, which might be interesting if we want to do fancy stuff in the UI (like real-time tracing on a giant ring; I think this view was lost from the old days at Twitter?).
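For context on what a UI client would actually receive: Pulsar's WebSocket endpoint wraps each message in a JSON envelope whose `payload` field is base64-encoded. The sketch below decodes such a frame; the frame contents are simulated (the `messageId` value and span payload are made up for illustration):

```python
import base64
import json

def decode_ws_frame(frame_text):
    """Decode one message frame from Pulsar's WebSocket consumer endpoint.

    The envelope carries the message id plus a base64-encoded body.
    """
    envelope = json.loads(frame_text)
    payload = base64.b64decode(envelope["payload"])
    return envelope["messageId"], payload

# Simulated frame, as a browser client might receive it:
frame = json.dumps({
    "messageId": "CAAQAw==",
    "payload": base64.b64encode(b'{"traceId":"abc"}').decode("ascii"),
})
msg_id, body = decode_ws_frame(frame)
print(msg_id, body)  # CAAQAw== b'{"traceId":"abc"}'
```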
- **Lower latency than Kafka (on paper).**
- **Schema registry.** If the data is persisted in one of the formats that support a schema, it might be possible to evolve the format using the schema tools and the registry. I think using a schema is a requirement for querying the data with Presto. JSON, Protobuf, and Avro are supported.
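The core move that schema evolution formalizes (in Avro, JSON, or Protobuf alike) is adding a field with a default so old records stay readable. A minimal illustration in plain JSON, with field names that are illustrative rather than Zipkin's actual model:

```python
import json

# A record written by an "old" producer, without the new field:
OLD_SPAN = json.dumps({"traceId": "abc", "name": "get /api"})

def read_span_v2(raw):
    """Reader for the 'evolved' schema: new field gets a default value."""
    span = json.loads(raw)
    span.setdefault("kind", "SERVER")  # hypothetical new field + default
    return span

print(read_span_v2(OLD_SPAN))
# {'traceId': 'abc', 'name': 'get /api', 'kind': 'SERVER'}
```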