Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: KafkaReporter in addition to KafkaSender #158

Closed
jeqo opened this issue Aug 20, 2019 · 3 comments
Closed

Feature request: KafkaReporter in addition to KafkaSender #158

jeqo opened this issue Aug 20, 2019 · 3 comments

Comments

@jeqo
Copy link
Member

jeqo commented Aug 20, 2019

Feature:

Add a KafkaReporter to process spans individually and have the ability to use Kafka producer native batching and been able to partition spans based on traceId.

Rationale

Current Sender API is aimed to transfer encoded batch of spans to different back-ends; already too late to do partitioning by key, and in the case of Kafka, overlaps with producer's native batching mechanisms (batch.size and linger.ms in https://kafka.apache.org/documentation/#producerconfigs).

Instead of reusing BoundedAsyncReporter with KafkaSender, a potential KafkaReporter will handle the report spans itself, allowing better message definition (e.g. using traceId as key) and using it's own batching mechanisms.

@Override public void report(Span span) {
    metrics.incrementSpans(1);
    int spanSizeInBytes = encoder.sizeInBytes(span);
    metrics.incrementSpanBytes(spanSizeInBytes);
    ProducerRecord<byte[], byte[]> record =
      new ProducerRecord<>(topic, span.traceId().getBytes(), encoder.encode(span));
    get().send(record, (metadata, exception) -> {
       if (exception != null) {
         metrics.incrementMessagesDropped(exception);
         metrics.incrementSpansDropped(1);
       }
    });
  }

Instead of looking for changes on the KafkaSender to support this improvements, Reporter API seems like the right abstraction level to place Kafka Producer.

Example Scenario:

While developing a Kafka-based storage and a streaming version of Zipkin Dependencies there is a need to have spans partitioned by traceId in order to group them and apply aggregations that can be used to apply sampling, dependency counts, etc.

In the Kafka Storage, we had to add a partitioning Span Consumer to compensate this which involves an additional overhead, compared with already keyed-spans; in which case we could focus only on processing.

@codefromthecrypt
Copy link
Member

first thought is to say "in addition to" as we can host both and would need to for the foreseeable future. This is also an alternative to add partitioning to normal async reporter I think.

Previous versions of kafka had some nasty blocking behavior which could crash callers, so we need to be careful of any danger added to the call-site of the reporter including encoding overhead and potential exceptions.

Other things to watch is if we don't want to keep this compatible. Ex using a list instead of single span opens options including sending the whole local root as a unit, and also remaining compatible with normal zipkin receivers.

@jeqo jeqo changed the title Feature request: KafkaReporter instead of KafkaSender Feature request: KafkaReporter in addition to KafkaSender Aug 22, 2019
@jeqo
Copy link
Member Author

jeqo commented Aug 22, 2019

Other things to watch is if we don't want to keep this compatible. Ex using a list instead of single span opens options including sending the whole local root as a unit, and also remaining compatible with normal zipkin receivers.

Agree, I have seen this could be an issue on the kafka-storage as well. Messages will be TraceId:SerializedListOfSpans

Previous versions of kafka had some nasty blocking behavior which could crash callers, so we need to be careful of any danger added to the call-site of the reporter including encoding overhead and potential exceptions.

I remember something similar was mentioned here openzipkin/zipkin#2297 but haven't managed to find an issue with this.

This is also an alternative to add partitioning to normal async reporter I think.

Forgot to mention about this once I found the Reporter API fits here. If we do this on the AsyncReporter the main issue is that Sender API expects a serialized batch of spans, forcing KafkaSender to deserialize batch first to then build TraceId:SerializedListOfSpans messages.

If we could redesign the current APIs, one potential change on the sender API could be to receive a Map[TraceId,SerializedListOfSpans] and let the sender to build the batch depending on the back-end. This will add additional task to the AsyncReporter to build this Map instead of a batch, and to the Senders on building a request based on their backend e.g. Http Sender would build 1 payload from all map entries (before it just receive a batch and send a batch), Kafka Sender would build a producer record per map entry.

@codefromthecrypt
Copy link
Member

closing due to lack of interest and stalled impl. someone can try again later if either change!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants