In a group supervision tree, workers behave as expected: If their handler triggers a crash, they are restarted and the group can continue handling messages.
In a non-group supervision tree, crashed workers are simply ignored and never restarted. This is unintuitive and a pain to deal with, since the outer supervisor (the process returned by Elsa.Supervisor.start_link/1) keeps running as though everything were fine. I would expect the DynamicProcessManager to take care of worker restarts.
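For context, here is a minimal sketch of the two setups being compared. The :consumer options mirror the failing test below; the :group_consumer option names and the broker endpoints are assumptions based on my reading of Elsa's README and may not match your version exactly.

```elixir
# Group supervision tree: workers live under a consumer group and are
# restarted after a crash (this is the case that behaves as expected).
# NOTE: option names under :group_consumer are assumed, not verified here.
Elsa.Supervisor.start_link(
  connection: :name1,
  endpoints: [localhost: 9092],
  group_consumer: [
    group: "example-group",
    topics: ["consumer-test3"],
    handler: Testing.ExampleMessageHandlerWithState,
    handler_init_args: %{pid: self()},
    config: [begin_offset: :earliest]
  ]
)

# Non-group supervision tree: the same handler wired up via :consumer,
# exactly as in the failing test below. A killed worker is never restarted.
Elsa.Supervisor.start_link(
  connection: :name1,
  endpoints: [localhost: 9092],
  consumer: [
    topic: "consumer-test3",
    handler: Testing.ExampleMessageHandlerWithState,
    handler_init_args: %{pid: self()},
    begin_offset: :earliest
  ]
)
```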
Here is an example integration test that fails and demonstrates the problem:
test"restarts a crashed worker that isn't in a group"dotopic="consumer-test3"Elsa.create_topic(@brokers,topic)start_supervised!({Elsa.Supervisor,connection: :name1,endpoints: @brokers,consumer: [topic: topic,handler: Testing.ExampleMessageHandlerWithState,handler_init_args: %{pid: self()},begin_offset: :earliest]})send_messages(topic,["message1"])send_messages(topic,["message2"])assert_receive{:message,%{topic: ^topic,value: "message1"}},5_000assert_receive{:message,%{topic: ^topic,value: "message2"}},5_000kill_worker(topic)send_messages(topic,["message3"])send_messages(topic,["message4"])# These assertions fail, because the worker wasn't brought back up.assert_receive{:message,%{topic: ^topic,value: "message3"}},5_000assert_receive{:message,%{topic: ^topic,value: "message4"}},5_000enddefmoduleTesting.ExampleMessageHandlerWithStatedouseElsa.Consumer.MessageHandlerdefinit(args)do{:ok,args}enddefhandle_messages(messages,state)doEnum.each(messages,&send(state.pid,{:message,&1})){:ack,state}endenddefpsend_messages(topic,messages)do:brod.start_link_client(@brokers,:test_client):brod.start_producer(:test_client,topic,[])messages|>Enum.with_index()|>Enum.each(fn{msg,index}->partition=rem(index,2):brod.produce_sync(:test_client,topic,partition,"",msg)end)enddefpkill_worker(topic)dopartition=0worker_pid=Elsa.Registry.whereis_name({:elsa_registry_name1,:"worker_#{topic}_#{partition}"})Process.exit(worker_pid,:kill)assertfalse==Process.alive?(worker_pid)end
jtrees added a commit to jtrees/elsa that referenced this issue on Apr 25, 2023.