How do you create reliable consumers? #1

Open
clarkbreyman-yammer opened this issue Jan 17, 2014 · 3 comments

@clarkbreyman-yammer

Joe, this looks really interesting, but the HTTP protocol seems to leave you open to dropping messages if the consumer fails after receiving a message but before processing it. We were looking to build something similar but use an outbound POST to ensure that the consumer was able to complete processing of the message before ACKing it.
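A minimal sketch of the outbound-POST idea, assuming the server pushes each message to a consumer callback and treats a 2xx response as the ACK. The `post` callable and `push_until_failure` are hypothetical stand-ins (no real HTTP client is used). The point is that the committed offset only advances after the consumer reports that processing finished, so a crash mid-processing leads to redelivery rather than loss (at-least-once delivery).

```python
from typing import Callable, List

# Hypothetical: post(offset, payload) stands in for an HTTP POST to the
# consumer's callback URL and returns the HTTP status code it answered with.
PostFn = Callable[[int, bytes], int]


def push_until_failure(messages: List[bytes], committed: int, post: PostFn) -> int:
    """Push messages starting at `committed`; return the new committed offset.

    The offset only advances after the consumer answers 2xx, i.e. after it
    has finished processing. A failure leaves the offset where it was, so
    the same message is pushed again on the next attempt.
    """
    offset = committed
    while offset < len(messages):
        try:
            status = post(offset, messages[offset])
        except ConnectionError:
            break  # consumer unreachable: keep the offset, retry later
        if not 200 <= status < 300:
            break  # consumer did not finish processing: no ACK
        offset += 1  # the successful response itself acts as the ACK
    return offset


def flaky_consumer(offset: int, payload: bytes) -> int:
    # Hypothetical consumer that fails while processing the third message.
    return 500 if offset == 2 else 200


print(push_until_failure([b"a", b"b", b"c", b"d"], 0, flaky_consumer))  # 2
```

Because the ACK rides on the response to the push, each message costs one round trip rather than a fetch plus a separate ACK call.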

@joestein
Contributor

Hi Clark, we have started to think about that, yup. There are also low-level consumer API scenarios that have to be implemented. For the consumer push scenario we have been looking at a few options, including a way to plug in your own, so you could use https://github.com/Atmosphere/atmosphere or http://pusher.com/ or whatever. Right now more of our HTTP-based use cases are on the producer side, but if you had an implementation we could do that.

So from a reliability perspective, we need a low-level consumer so that the offsets are committed by the caller, and only after the caller has finished processing. We also have to add a REST interface for committing the offsets to ZooKeeper for that caller's consumer within its consumer group. This will allow the consumer to use ZooKeeper for managing the offsets while controlling all the business logic around that through a REST interface. This should be a /consumer/ implementation, with a GET returning the message at the latest offset for that topic, group, and partition, and a POST committing the offset (which the caller will be responsible for doing).
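The proposed GET/POST split can be sketched with an in-memory stand-in. Everything here is hypothetical: `ConsumerEndpoint`, `get`, `commit`, and `consume_one` are illustrative names, not an existing API, and the real service would keep committed offsets in ZooKeeper rather than a dict. It also assumes "latest offset" means the group's last committed offset, so a GET returns the next unprocessed message.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Key = Tuple[str, str, int]  # (topic, group, partition)


@dataclass
class ConsumerEndpoint:
    """Hypothetical in-memory model of the proposed /consumer/ interface."""

    log: Dict[Tuple[str, int], List[str]]  # (topic, partition) -> messages
    committed: Dict[Key, int] = field(default_factory=dict)

    def get(self, topic: str, group: str, partition: int) -> Optional[Tuple[int, str]]:
        # Models the GET: the message at the group's committed offset.
        offset = self.committed.get((topic, group, partition), 0)
        messages = self.log[(topic, partition)]
        if offset >= len(messages):
            return None  # nothing new to consume
        return offset, messages[offset]

    def commit(self, topic: str, group: str, partition: int, offset: int) -> None:
        # Models the POST: committing N means "everything before N is processed".
        self.committed[(topic, group, partition)] = offset


def consume_one(api: ConsumerEndpoint, topic: str, group: str, partition: int,
                process: Callable[[str], None]) -> bool:
    """GET a message, process it, and only then POST the commit."""
    item = api.get(topic, group, partition)
    if item is None:
        return False
    offset, message = item
    process(message)  # if this raises, nothing is committed and the message is redelivered
    api.commit(topic, group, partition, offset + 1)
    return True


api = ConsumerEndpoint(log={("events", 0): ["m0", "m1"]})
processed: List[str] = []
while consume_one(api, "events", "my-group", 0, processed.append):
    pass
print(processed)  # ['m0', 'm1']
```

This gives at-least-once delivery, at the cost of a second request per message for the commit.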

If we did the above would that address your needs instead of a push? Or if you need/want a push mechanism can you elaborate more on specifics?

@clarkbreyman-yammer
Author

Having the REST client call back to ACK is going to double the number of round-trip requests, increasing latency and likely reducing throughput. I'm still in the process of wrapping my head around Kafka, but it seems like the parallelism of a consumer group is bounded by the number of partitions, meaning that throughput is limited by the number of partitions and the consumption latency.
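A back-of-the-envelope version of that argument, assuming one consumer per partition, one message per request, and strictly sequential handling within each partition. The function name and the numbers are illustrative, not measurements.

```python
def max_throughput(partitions: int, rtt_ms: float, processing_ms: float,
                   round_trips: int) -> float:
    """Upper bound on messages/second for one consumer group.

    Each partition handles one message at a time, and each message costs
    `round_trips` network round trips plus its processing time.
    """
    per_message_ms = round_trips * rtt_ms + processing_ms
    return partitions * 1000 / per_message_ms


# Illustrative inputs: 8 partitions, 10 ms round trip, negligible processing.
fetch_only = max_throughput(partitions=8, rtt_ms=10, processing_ms=0, round_trips=1)
fetch_then_ack = max_throughput(partitions=8, rtt_ms=10, processing_ms=0, round_trips=2)
print(fetch_only, fetch_then_ack)  # 800.0 400.0
```

The halving only holds when network time dominates: with 10 ms of processing per message the same inputs give 400 versus roughly 267 messages per second. Batching several messages per request would amortize the extra ACK round trip further.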

Having the REST client talk with ZK seems to defeat the purpose of protocol encapsulation.

@joestein
Contributor

Parallelism of the consumers within a group is bounded by the number of partitions, yes.
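To illustrate why, here is a simplified range-style assignment, similar in spirit to how Kafka spreads partitions across a group's consumers but not its actual code: each partition goes to exactly one consumer, so any consumers beyond the partition count sit idle. `assign_range` is a hypothetical helper.

```python
from typing import Dict, List


def assign_range(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Give each partition to exactly one consumer, in contiguous ranges."""
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, consumer in enumerate(consumers):
        # The first `extra` consumers each take one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


print(assign_range([0, 1, 2], ["c1", "c2", "c3", "c4"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
```

With three partitions, the fourth consumer receives nothing, so adding consumers past the partition count does not add parallelism.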

Here is more info on the low-level (simple) consumer: https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example

We can hook in Atmosphere and use WebSockets, falling back to long polling. Would that work better?
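For reference, the long-polling fallback works by holding each request open until a message arrives or a timeout expires. The sketch below models only that server-side behavior with `asyncio`; `long_poll` and `demo` are hypothetical names, and this is not Atmosphere's API.

```python
import asyncio
from typing import List, Optional


async def long_poll(queue: "asyncio.Queue[str]", timeout_s: float) -> Optional[str]:
    """Hold the request open until a message arrives or the timeout expires.

    Returning None models an empty response; the client would then
    immediately issue its next poll.
    """
    try:
        return await asyncio.wait_for(queue.get(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None


async def demo() -> List[Optional[str]]:
    queue: "asyncio.Queue[str]" = asyncio.Queue()
    results = [await long_poll(queue, 0.05)]  # nothing queued: times out
    # Simulate a producer publishing shortly after the next poll starts.
    asyncio.get_running_loop().call_later(0.01, queue.put_nowait, "hello")
    results.append(await long_poll(queue, 1.0))  # answered once the message lands
    return results


print(asyncio.run(demo()))  # [None, 'hello']
```

Note that a long-poll response by itself is still fire-and-forget; the reliability concern raised above would still need an ACK or commit step on top of it.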

We can support multiple different types of consumers.

2 participants