Batch tasks using Dask in Argo #120
This looks great! I have a minor observation to share: in my exploration of distributed Dask (using SSHCluster) I learned that [...] Anyway, this looks great! I am going to emulate this in my Argo workflows and seek your advice on any issues that I run into.
Looks good!
Is the gist of your comment that I ought to modify the template as follows?

    import logging
    import os

    import dask_gateway

    logger = logging.getLogger("DaskWorkflow")

    def main():
        gw = dask_gateway.Gateway(auth="jupyterhub")
        try:
            # Cluster sizing comes from environment variables set at job submission
            opts = gw.cluster_options()
            opts.worker_memory = int(os.environ['DASK_OPTS__WORKER_MEMORY'])
            opts.worker_cores = int(os.environ['DASK_OPTS__WORKER_CORES'])
            opts.scheduler_memory = int(os.environ['DASK_OPTS__SCHEDULER_MEMORY'])
            opts.scheduler_cores = int(os.environ['DASK_OPTS__SCHEDULER_CORES'])
            cluster = gw.new_cluster(opts)
            cluster.scale(int(os.environ['DASK_OPTS__N_WORKERS']))
            client = cluster.get_client()
            logger.warning(f"Client dashboard: {client.dashboard_link}")
            # Client code goes here
        finally:
            gw.stop_cluster(client.cluster.name)

    if __name__ == "__main'":
        main()

It's worth noting that Dask Distributed works differently from Dask Gateway, and it should not be relying on threads/processes in the same way. I've not encountered any difficulty starting a [...]
Indeed, that is the gist of my comment. Additionally, I think that modifying it this way makes the script work correctly when we are experimenting in a non-Argo environment. By the way, there's a minor typo: it ought to be [...] Thanks for considering my point!
Oops! Typo. Thanks. I adjusted the template in the README and the base flow example.
As an additional point, I think that without more complex logic, such an example template won't be interchangeable between the cloud environment and a local Dask distributed environment, since they have different imports and setup.
Yes, point taken!
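To illustrate the kind of extra logic that interchangeability would need, here is a minimal sketch, not something taken from this PR. It assumes a hypothetical CLUSTER_BACKEND environment variable and hypothetical helpers choose_backend and open_cluster. The third-party imports are deferred so that each environment only needs its own library installed.

```python
import os


def choose_backend(env=None):
    """Pick the cluster backend from a (hypothetical) CLUSTER_BACKEND variable."""
    env = os.environ if env is None else env
    backend = env.get("CLUSTER_BACKEND", "gateway").lower()
    if backend not in ("gateway", "local"):
        raise ValueError(f"unknown cluster backend: {backend!r}")
    return backend


def open_cluster(backend, n_workers):
    """Create a cluster for the chosen backend; imports are deferred on purpose."""
    if backend == "gateway":
        # Cloud environment: same Gateway setup as in the template
        import dask_gateway

        gateway = dask_gateway.Gateway(auth="jupyterhub")
        cluster = gateway.new_cluster(gateway.cluster_options())
        cluster.scale(n_workers)
        return cluster
    # Local experimentation: an in-process Dask distributed cluster
    from distributed import LocalCluster

    return LocalCluster(n_workers=n_workers)
```

A script could then call open_cluster(choose_backend(), n) and build a distributed.Client from the result in either environment. Teardown still differs between the two backends and would need the same kind of branching.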
This could be made a little more explicit like this:

    import logging
    import os

    import dask_gateway

    logger = logging.getLogger("DaskWorkflow")

    def run_on_cluster(fn):
        gw = dask_gateway.Gateway(auth="jupyterhub")
        try:
            # Cluster sizing comes from environment variables set at job submission
            opts = gw.cluster_options()
            opts.worker_memory = int(os.environ['DASK_OPTS__WORKER_MEMORY'])
            opts.worker_cores = int(os.environ['DASK_OPTS__WORKER_CORES'])
            opts.scheduler_memory = int(os.environ['DASK_OPTS__SCHEDULER_MEMORY'])
            opts.scheduler_cores = int(os.environ['DASK_OPTS__SCHEDULER_CORES'])
            cluster = gw.new_cluster(opts)
            cluster.scale(int(os.environ['DASK_OPTS__N_WORKERS']))
            client = cluster.get_client()
            logger.warning(f"Client dashboard: {client.dashboard_link}")
            fn()
        finally:
            gw.stop_cluster(client.cluster.name)

    def client_code():
        # Client code goes here
        pass

    def main():
        run_on_cluster(client_code)

    if __name__ == "__main__":
        main()

Going to try to run the example on the cluster now.
@rajadain I took your advice (a bit) and modularized the template a bit more. Your version added two levels of indirection, which I simplified a bit. Check the modified README. Was there a particular reason that you wanted to pass the client code in through a higher-order function?
Just for clarity, so the client code is free of distraction. Your solution works well!
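Another way to keep the client code free of distraction is a context manager. This is a sketch only, and not necessarily what the README ended up with. The name running_cluster is hypothetical; it relies on the same new_cluster, scale, get_client and stop_cluster calls as the templates above. The cluster is created before the try block, so the cleanup in finally never refers to a name that was not yet assigned.

```python
from contextlib import contextmanager


@contextmanager
def running_cluster(gateway, options, n_workers):
    """Yield a client for a fresh cluster and always stop the cluster afterwards."""
    # Created outside the try block: if this fails there is nothing to stop
    cluster = gateway.new_cluster(options)
    try:
        cluster.scale(n_workers)
        yield cluster.get_client()
    finally:
        # Runs even if scaling or the client code raised
        gateway.stop_cluster(cluster.name)
```

Client code would then sit inside a block such as "with running_cluster(gw, opts, n) as client:", and the cluster is stopped whether the block finishes normally or raises.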
Overview
This PR provides some infrastructure for running Dask jobs independent of Jupyter. This enables long-running jobs for which staying logged into Jupyter would be cumbersome. It also opens up a workflow based on standard Python scripts, rather than Jupyter notebooks. I've been using a standard form for my scripts so that the cluster can be configured via the Argo job submission interface.
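As a minimal sketch of how a script in that form might read its configuration, assuming the DASK_OPTS__* environment variables from the template discussed in the conversation are set at job submission: the names ClusterSettings and load_settings are hypothetical and are not part of this PR.

```python
import os
from dataclasses import dataclass

# Environment variable names as used in the template; the mapping itself is illustrative
_FIELDS = {
    "worker_memory": "DASK_OPTS__WORKER_MEMORY",
    "worker_cores": "DASK_OPTS__WORKER_CORES",
    "scheduler_memory": "DASK_OPTS__SCHEDULER_MEMORY",
    "scheduler_cores": "DASK_OPTS__SCHEDULER_CORES",
    "n_workers": "DASK_OPTS__N_WORKERS",
}


@dataclass(frozen=True)
class ClusterSettings:
    worker_memory: int
    worker_cores: int
    scheduler_memory: int
    scheduler_cores: int
    n_workers: int


def load_settings(env=None):
    """Read all cluster settings at once and report every missing variable together."""
    env = os.environ if env is None else env
    missing = [name for name in _FIELDS.values() if name not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return ClusterSettings(**{field: int(env[name]) for field, name in _FIELDS.items()})
```

Reading everything up front means a misconfigured submission fails before any cluster is requested, rather than partway through setup.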
Closes #112
Checklist
Ran nbautoexport export . in /opt/src/notebooks and committed the generated scripts. This is to make reviewing notebooks easier. (Note the export will happen automatically after saving notebooks from the Jupyter web app.)
Notes
This workflow will eventually be added to the cluster configs as a ClusterWorkflowTemplate, but that will be handled by azavea/kubernetes-deployment#34.
Testing Instructions
[...] run-dask-job.yaml into the manual editor
https://jupyter.noaa.azavea.com