Feature request: `KafkaReporter` in addition to `KafkaSender` #158

jeqo · 2019-08-20T00:08:22Z

Feature:

Add a KafkaReporter to process spans individually and have the ability to use Kafka producer native batching and been able to partition spans based on traceId.

Rationale

Current Sender API is aimed to transfer encoded batch of spans to different back-ends; already too late to do partitioning by key, and in the case of Kafka, overlaps with producer's native batching mechanisms (batch.size and linger.ms in https://kafka.apache.org/documentation/#producerconfigs).

Instead of reusing BoundedAsyncReporter with KafkaSender, a potential KafkaReporter will handle the report spans itself, allowing better message definition (e.g. using traceId as key) and using it's own batching mechanisms.

@Override public void report(Span span) {
    metrics.incrementSpans(1);
    int spanSizeInBytes = encoder.sizeInBytes(span);
    metrics.incrementSpanBytes(spanSizeInBytes);
    ProducerRecord<byte[], byte[]> record =
      new ProducerRecord<>(topic, span.traceId().getBytes(), encoder.encode(span));
    get().send(record, (metadata, exception) -> {
       if (exception != null) {
         metrics.incrementMessagesDropped(exception);
         metrics.incrementSpansDropped(1);
       }
    });
  }

Instead of looking for changes on the KafkaSender to support this improvements, Reporter API seems like the right abstraction level to place Kafka Producer.

Example Scenario:

While developing a Kafka-based storage and a streaming version of Zipkin Dependencies there is a need to have spans partitioned by traceId in order to group them and apply aggregations that can be used to apply sampling, dependency counts, etc.

In the Kafka Storage, we had to add a partitioning Span Consumer to compensate this which involves an additional overhead, compared with already keyed-spans; in which case we could focus only on processing.

The text was updated successfully, but these errors were encountered:

codefromthecrypt · 2019-08-22T01:01:32Z

first thought is to say "in addition to" as we can host both and would need to for the foreseeable future. This is also an alternative to add partitioning to normal async reporter I think.

Previous versions of kafka had some nasty blocking behavior which could crash callers, so we need to be careful of any danger added to the call-site of the reporter including encoding overhead and potential exceptions.

Other things to watch is if we don't want to keep this compatible. Ex using a list instead of single span opens options including sending the whole local root as a unit, and also remaining compatible with normal zipkin receivers.

jeqo · 2019-08-22T06:04:24Z

Other things to watch is if we don't want to keep this compatible. Ex using a list instead of single span opens options including sending the whole local root as a unit, and also remaining compatible with normal zipkin receivers.

Agree, I have seen this could be an issue on the kafka-storage as well. Messages will be TraceId:SerializedListOfSpans

Previous versions of kafka had some nasty blocking behavior which could crash callers, so we need to be careful of any danger added to the call-site of the reporter including encoding overhead and potential exceptions.

I remember something similar was mentioned here openzipkin/zipkin#2297 but haven't managed to find an issue with this.

This is also an alternative to add partitioning to normal async reporter I think.

Forgot to mention about this once I found the Reporter API fits here. If we do this on the AsyncReporter the main issue is that Sender API expects a serialized batch of spans, forcing KafkaSender to deserialize batch first to then build TraceId:SerializedListOfSpans messages.

If we could redesign the current APIs, one potential change on the sender API could be to receive a Map[TraceId,SerializedListOfSpans] and let the sender to build the batch depending on the back-end. This will add additional task to the AsyncReporter to build this Map instead of a batch, and to the Senders on building a request based on their backend e.g. Http Sender would build 1 payload from all map entries (before it just receive a batch and send a batch), Kafka Sender would build a producer record per map entry.

codefromthecrypt · 2024-04-16T18:09:07Z

closing due to lack of interest and stalled impl. someone can try again later if either change!

shakuzen added the enhancement label Aug 20, 2019

This was referenced Aug 21, 2019

Refactoring streams and storage openzipkin-contrib/zipkin-storage-kafka#25

Merged

Kafka reporter #159

Closed

jeqo changed the title ~~Feature request: KafkaReporter instead of KafkaSender~~ Feature request: KafkaReporter in addition to KafkaSender Aug 22, 2019

codefromthecrypt closed this as completed Apr 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature request: `KafkaReporter` in addition to `KafkaSender` #158

Feature request: `KafkaReporter` in addition to `KafkaSender` #158

jeqo commented Aug 20, 2019

codefromthecrypt commented Aug 22, 2019

jeqo commented Aug 22, 2019

codefromthecrypt commented Apr 16, 2024

Feature request: KafkaReporter in addition to KafkaSender #158

Feature request: KafkaReporter in addition to KafkaSender #158

Comments

jeqo commented Aug 20, 2019

codefromthecrypt commented Aug 22, 2019

jeqo commented Aug 22, 2019

codefromthecrypt commented Apr 16, 2024

Feature request: `KafkaReporter` in addition to `KafkaSender` #158

Feature request: `KafkaReporter` in addition to `KafkaSender` #158