Add documentation about workflow and task lifecycle events #1054

Open
wants to merge 15 commits into
base: main

Member

@cdavernas cdavernas commented Jan 9, 2025

Please specify parts of this PR update:

  • Specification
  • Schema
  • Examples
  • Extensions
  • Use Cases
  • Community
  • CTK
  • Other

Discussion or Issue link:
#1024
#1030

What this PR does:

  • Adds documentation about workflow and task lifecycle events

@cdavernas cdavernas added change: documentation Improvements or additions to documentation. It won't impact a version change. change: feature New feature or request. Impacts in a minor version change area: spec Changes in the Specification labels Jan 9, 2025
@cdavernas cdavernas added this to the v1.0.0 milestone Jan 9, 2025
@cdavernas cdavernas self-assigned this Jan 9, 2025
@cdavernas cdavernas linked an issue Jan 9, 2025 that may be closed by this pull request
@cdavernas cdavernas linked an issue Jan 9, 2025 that may be closed by this pull request
- [Task Lifecycle Events](#task-lifecycle-events)
+ [Task Created](#task-created-event)
+ [Task Started](#task-started-event)
+ [Task Suspended](#task-suspended-event)
Collaborator

@fjtirado fjtirado Jan 9, 2025

Now that I took a closer look, and after my comment yesterday about workflow suspension, I'm not 100% sure a task can really be suspended.
Suspending a workflow can be understood as "let the current task finish and hold execution of the next one" rather than "interrupt the current task and leave it in limbo".

Collaborator

If a task is a composite one (for, do, try), it can of course be stopped, but in the end it will be executing another, non-composite task (basically a call), which is more or less atomic.
I think we can avoid that conundrum by remaining silent about the whole task lifecycle and just defining events at the workflow level (plus task started and completed).

Member Author

> Now that I took a closer look, and after my comment yesterday about workflow cancelation, I'm not 100% sure a task can really be suspended.

Why not? Let's say you run a wait task that sleeps for 10 seconds. You suspend it after 5 seconds; when resumed, it still sleeps for another 5 seconds. Couldn't the same apply to consuming events, foreach enumerations, etc.?
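
For illustration, the wait example above could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not something defined by the specification or taken from any runtime: the `SuspendableWait` class and its method names are hypothetical, and the clock is passed in explicitly so the arithmetic is easy to follow.

```python
from typing import Optional


class SuspendableWait:
    """Hypothetical helper: tracks how much of a wait remains across suspend/resume."""

    def __init__(self, duration_s: float) -> None:
        self.remaining_s = duration_s
        self._started_at: Optional[float] = None  # None while suspended or not started

    def start(self, now: float) -> None:
        """Begin (or resume) counting down from the remaining time."""
        self._started_at = now

    def suspend(self, now: float) -> None:
        """Stop the countdown and remember how much time is still owed."""
        if self._started_at is None:
            return  # already suspended: nothing to do
        elapsed = now - self._started_at
        self.remaining_s = max(0.0, self.remaining_s - elapsed)
        self._started_at = None

    def finishes_at(self) -> Optional[float]:
        """Time at which the wait completes, or None while suspended."""
        if self._started_at is None:
            return None
        return self._started_at + self.remaining_s


wait = SuspendableWait(duration_s=10.0)
wait.start(now=0.0)
wait.suspend(now=5.0)    # suspended halfway through the 10 seconds
wait.start(now=60.0)     # resumed later: only the remaining time is owed
print(wait.remaining_s)  # 5.0
```

The only state that has to survive the suspension here is the remaining duration, which is why a passive task like wait is comparatively easy to resume.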

Member Author

> I think we can avoid that conundrum by remaining silent about the whole task lifecycle and just defining events at the workflow level (plus task started and completed)

That's an option, but I think it's our loss. Workflow-level updates are not enough in most cases, such as long-running flows. Another option, though I'd prefer to leave it as is, is to enforce only workflow lifecycle events while leaving task lifecycle events optional.

Collaborator

About the wait example: if we interrupt the wait (or stop listening for an event), what should we do when resuming: execute the next task after the wait, or execute the wait again? I think we can avoid that question by not allowing suspension of a workflow that is already waiting.
In the case of a For, what I suggested is to let the call that the loop is currently executing finish. (The definition of "finish" may vary by task: for a call, let the call finish; for a set, let the assignment finish; for a do, hold once the task currently being executed finishes; and so on.)

Collaborator

@fjtirado fjtirado Jan 9, 2025

Note that I edited my first comment: I'm not talking about cancellation, but about suspension. Sorry for the confusion. Cancellation is clear: you cancel the workflow execution, interrupting if you can and not allowing resumption. But suspending and resuming is trickier.

Member Author

> About the wait example: if we interrupt the wait (or stop listening for an event), what should we do when resuming

I think you missed my point there:

> Let's say you run a wait task that sleeps for 10 seconds. You suspend it after 5 seconds; when resumed, it still sleeps for another 5 seconds.

> In the case of a For, what I suggested is to let the call that the loop is currently executing finish

If what you mean is that the iteration should be completed before suspending, I think it's a possibility, but not my personal preference, as the iteration can be a succession of perfectly suspendable tasks.

Collaborator

I think we should not allow suspending a workflow that is waiting (so implementers do not need that resuming logic for event/wait, which might be pretty tricky in some cases).
And for other tasks, I think suspending should let the operation finish (the call or the jq expression) and then freeze, so you resume at the next task.

Member Author

> I think we should not allow suspending a workflow that is waiting

Your opinion makes sense, it can be tricky indeed! Maybe we should leave that up to the implementers, and add a couple of lines about those specifics?

Member

Resuming tasks can be challenging. Ideally, as @cdavernas suggested, allowing the runtime to directly resume a suspended task would be the best approach. However, this is often not feasible. For a passive task, such as a wait operation, it might work. But for active tasks, like an HTTP call request, the situation becomes more complex. Even if the runtime cancels the request, the server may already be processing it, and retrying the request upon resuming could lead to unintended behaviors.

As a result, there isn't a universal solution. The suspension mechanism must depend on the nature of the task. Passive tasks, such as wait or listen, can typically be suspended and resumed. Active tasks, like call or run, would need to complete before the suspension can occur, deferring it to the subsequent task.
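
The distinction drawn above between passive and active tasks could be sketched as a simple lookup. This is only an illustration in Python: the `SuspendMode` enum, the `suspend_mode` function, and the two sets are hypothetical names, the grouping just mirrors the examples given in this comment, and treating every unlisted task type as deferred is an assumption chosen to be conservative.

```python
from enum import Enum


class SuspendMode(Enum):
    IMMEDIATE = "immediate"  # pause the task itself and resume it later
    DEFERRED = "deferred"    # let the task complete, then hold before the next task


# Assumption: these groupings come from the examples in the comment above,
# not from the specification.
PASSIVE_TASKS = {"wait", "listen"}
ACTIVE_TASKS = {"call", "run"}


def suspend_mode(task_type: str) -> SuspendMode:
    """Decide how a suspension request is honored for a given task type."""
    if task_type in PASSIVE_TASKS:
        return SuspendMode.IMMEDIATE
    # Active tasks, and (by assumption) anything not listed, finish first.
    return SuspendMode.DEFERRED


print(suspend_mode("wait").value)  # immediate
print(suspend_mode("call").value)  # deferred
```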

+ [Task Cancelled](#task-cancelled-event)
+ [Task Faulted](#task-faulted-event)
+ [Task Completed](#task-completed-event)
+ [Task Status Changed](#task-status-changed-event)
Collaborator

@fjtirado fjtirado Jan 9, 2025

Status, as I understood it, only applies to the workflow as a whole:
https://github.com/serverlessworkflow/specification/blob/main/dsl.md#status-phases

Member Author

Not in my opinion. Task status is extremely important in most, if not all, workflow execution scenarios. Think of a long-running flow that you represent in a UI. Having only flow events will let your users know the flow has started or ended, but whatever happens in the middle will be unknown to them, potentially for hours or even days at a time.

Collaborator

@fjtirado fjtirado Jan 9, 2025

OK, then we also need to edit that section to make it clear that status phases apply to both workflows and tasks (I implemented it just for workflows because there was no hint in that section).
Note that I also implemented task created and task completed, because I agree they are useful for users. What is unclear to me is whether they need more than task created and task completed ;), especially because we have composite tasks: in theory you can have a do within a do within a for within a try, and all four tasks will be running, so when you suspend them you will have to send 4 events.
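
To make the nesting concern concrete, here is a minimal Python sketch. The `suspended_events` function and the event strings are hypothetical, and emitting events from the innermost task outward is an assumption of this sketch rather than anything stated in the discussion.

```python
from typing import List


def suspended_events(running_tasks: List[str]) -> List[str]:
    """Return one 'suspended' event per running task, innermost task first.

    `running_tasks` lists the currently running tasks from outermost to innermost.
    The innermost-first ordering is an assumption made for this sketch.
    """
    return [f"{name} suspended" for name in reversed(running_tasks)]


# A do within a do within a for within a try: four tasks are running at once.
running = ["try", "for", "outer do", "inner do"]
events = suspended_events(running)
print(len(events))  # 4
```

Each level of nesting adds one event per suspension, so the number of events grows with the depth of composite tasks.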

Member Author

> OK, then we also need to edit that section to make it clear that status phases apply to both workflows and tasks

I edited it in the PR, because as you said it only applied to workflows beforehand. Have you checked the updated section in the PR? I think it's clear, but I'd be happy to update it as you see fit!

Member Author

@cdavernas cdavernas Jan 9, 2025

> What is unclear to me is whether they need more than task created and task completed

Well, I see what you mean, but having at least canceled and faulted seemed reasonable. Let's discuss it in today's daily!

+ [Workflow Completed](#workflow-completed-event)
+ [Workflow Status Changed](#workflow-status-changed-event)
- [Task Lifecycle Events](#task-lifecycle-events)
+ [Task Created](#task-created-event)
Collaborator

I do not see how a task can be created and not started

Member Author

Well, a task can be instantiated, meaning initialized and provisioned with its input, and not yet have started. That's what we do in Synapse. Unlike workflows, tasks do not exist beforehand; you therefore need to tell the user about them being created: you cannot start something that has not been created.

Collaborator

Let's use an example:
do

  • firstTask
  • secondTask
  • thirdTask

I guess the sequence of events will be: firstTask created, firstTask started, ..., firstTask completed, secondTask created, secondTask started, ..., secondTask completed.

So immediately after the firstTask instance has been created with some input, it will start execution. (Obviously you have to create something before executing it; what I'm questioning is whether a task can be created without being started immediately.)
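
The sequence described above can be sketched as follows. This is a minimal Python illustration, assuming tasks run strictly one after another; the `lifecycle_events` function, its `emit_created` flag, and the event strings are hypothetical and not taken from the specification.

```python
from typing import List


def lifecycle_events(task_names: List[str], emit_created: bool = True) -> List[str]:
    """List the events emitted for tasks that run sequentially, in order.

    With emit_created=False, the 'created' event is dropped and each task
    only reports 'started' and 'completed'.
    """
    events: List[str] = []
    for name in task_names:
        if emit_created:
            events.append(f"{name} created")
        events.append(f"{name} started")
        events.append(f"{name} completed")
    return events


for event in lifecycle_events(["firstTask", "secondTask", "thirdTask"]):
    print(event)
```

In this sequential case, every "created" event is immediately followed by the matching "started" event, which is the redundancy being questioned here.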

Collaborator

What I'm trying to say is that, just as we do not have an event called "setting input to task instance before actually starting it", we do not need an event "task created before starting it".

Collaborator

@fjtirado fjtirado Jan 9, 2025

I'm trying to save a redundant event, that's it. But if you feel the distinction between task created and task started is relevant, then as long as it is not forbidden for them to be simultaneous in a particular implementation, I'm fine.

Member Author

You make a point, those are redundant in a way! I guess I'm a bit biased because of my EDA tunnel vision 👅
What do you guys think @JBBianchi @ricardozanini?

Member

I'd rather have a simplified way and go with only started.

Collaborator

@fjtirado fjtirado left a comment

I have several doubts about the concept of a task lifecycle.
It's doable, but pretty tricky, and I'm not sure we have to force implementers to keep track of every task's status.
Also, for some tasks, for example wait, what should the status be? Suspended?

Member Author

cdavernas commented Jan 9, 2025

> Also, for some tasks, for example wait, what should the status be? Suspended?

@fjtirado That's specified in the dsl.md file:

| waiting | The workflow/task execution is temporarily paused, awaiting either inbound event(s) or a specified time interval as defined by a wait task. |

cdavernas and others added 9 commits January 9, 2025 16:26
Co-authored-by: Ricardo Zanini <[email protected]>
Member Author

> I have several doubts about the concept of a task lifecycle.
> It's doable, but pretty tricky

I must be missing something, because I do not see the problem with it. As a matter of fact, we are already doing it in Synapse, as you can see here, for example.


Successfully merging this pull request may close these issues.

Cloud events to be publlished when workflow status change
Add a new suspended workflow status phase
4 participants