From 835cd68b9dc59818e1d688edf66ecbe4234a07c6 Mon Sep 17 00:00:00 2001 From: vsoch Date: Fri, 26 Jul 2024 19:37:20 -0600 Subject: [PATCH] feedback: jacob review items Signed-off-by: vsoch --- .../tutorial/01_flux_tutorial.ipynb | 394 +++++++++++------- .../tutorial/02_flux_framework.ipynb | 10 +- .../03_flux_tutorial_conclusions.ipynb | 4 +- .../flux-workflow-examples/bulksubmit/0.sh | 3 + .../flux-workflow-examples/bulksubmit/1.sh | 3 + .../flux-workflow-examples/bulksubmit/2.sh | 3 + .../flux-workflow-examples/bulksubmit/3.sh | 3 + .../flux-workflow-examples/bulksubmit/4.sh | 3 + .../flux-workflow-examples/bulksubmit/5.sh | 3 + .../flux-workflow-examples/bulksubmit/6.sh | 3 + 10 files changed, 268 insertions(+), 161 deletions(-) create mode 100755 2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/0.sh create mode 100755 2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/1.sh create mode 100755 2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/2.sh create mode 100755 2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/3.sh create mode 100755 2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/4.sh create mode 100755 2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/5.sh create mode 100755 2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/6.sh diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/01_flux_tutorial.ipynb b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/01_flux_tutorial.ipynb index 3a0a267..90fd9da 100644 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/01_flux_tutorial.ipynb +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/01_flux_tutorial.ipynb @@ -25,10 +25,6 @@ " \n", "Flux is a flexible framework for resource management, built for your site. 
The framework consists of a suite of projects, tools, and libraries that may be used to build site-custom resource managers for High Performance Computing centers and cloud environments. Flux is a next-generation resource manager and scheduler with many transformative capabilities like hierarchical scheduling and resource management (you can think of it as \"fractal scheduling\") and directed-graph based resource representations.\n", "\n", - "> I'm ready! How do I do this tutorial? 😁️\n", - "\n", - "To step through examples in this notebook you need to execute cells. To run a cell, press Shift+Enter on your keyboard. If you prefer, you can also paste the shell commands in the and execute them there. This notebook provides the main Flux tutorial, and we have several other modules available:\n", - "\n", "## I'm ready! How do I do this tutorial? 😁️\n", "\n", "This tutorial is split into 3 chapters, each of which has a notebook:\n", @@ -45,7 +41,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 22, "id": "d71ecd22-8552-4b4d-9bc4-61d86f8d33fe", "metadata": { "tags": [] @@ -86,9 +82,23 @@ "tags": [] }, "source": [ + "
\n", + "\n", "# Getting started with Flux\n", "\n", - "The code and examples that this tutorial is based on can be found at [flux-framework/Tutorials](https://github.com/flux-framework/Tutorials/tree/master/2024-RADIUSS-AWS). You can also find python examples in the `flux-workflow-examples` directory from the sidebar navigation in this JupyterLab instance. To read the Flux manpages and get help, run `flux help`. To get documentation on a subcommand, run, e.g. `flux help config`. Here is an example of running `flux help` right from the notebook. Yes, did you know we are running in a Flux Instance right now?" + "The code and examples that this tutorial is based on can be found at [flux-framework/Tutorials](https://github.com/flux-framework/Tutorials/tree/master/2024-RADIUSS-AWS). You can also find python examples in the `flux-workflow-examples` directory from the sidebar navigation in this JupyterLab instance. " + ] + }, + { + "cell_type": "markdown", + "id": "ae33fef6-278c-4996-8534-fd15e548b338", + "metadata": { + "tags": [] + }, + "source": [ + "
\n", + "Tip: Did you know you can get help for flux or a flux command? For example, try \"flux help\" and \"flux help jobs\"\n", + "
" ] }, { @@ -156,18 +166,6 @@ "!flux help" ] }, - { - "cell_type": "markdown", - "id": "ae33fef6-278c-4996-8534-fd15e548b338", - "metadata": { - "tags": [] - }, - "source": [ - "
\n", - "Tip: Did you know you can also get help for a specific command? For example, run, `flux help jobs` to get information on a sub-command.\n", - "
" - ] - }, { "cell_type": "code", "execution_count": 3, @@ -756,53 +754,6 @@ "For cases when you need a terminal, we will ! However, you can also select `File -> New -> Terminal` to open one on the fly. Let's next talk about flux instances." ] }, - { - "cell_type": "markdown", - "id": "70e3df1d-32c9-4996-b6f7-2fa85f4c02ad", - "metadata": { - "tags": [] - }, - "source": [ - "# Creating Flux Instances\n", - "\n", - "A Flux instance is a fully functional set of services which manage compute resources under its domain with the capability to launch jobs on those resources. A Flux instance may be running as the default resource manager on a cluster, a job in a resource manager such as Slurm, LSF, or Flux itself, or as a test instance launched locally.\n", - "\n", - "When run as a job in another resource manager, Flux is started like an MPI program, e.g., under Slurm we might run `srun [OPTIONS] flux start [SCRIPT]`. Flux is unique in that a test instance that mimics a multi-node instance can be started locally with simply:\n", - "\n", - "```bash\n", - "flux start --test-size=4\n", - "```\n", - "\n", - "This offers users to a way to learn and test interfaces and commands without access to an HPC cluster.\n", - "To start a Flux session with 4 brokers in your notebook container here, run:" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "d568de50-f9e0-452f-8364-e52853013d83", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "4\n" - ] - } - ], - "source": [ - "!flux start --test-size=4 flux getattr size" - ] - }, - { - "cell_type": "markdown", - "id": "e693f2d9-651f-4f58-bf53-62528caa83d9", - "metadata": {}, - "source": [ - "When you run `flux start` without a command, it will give you an interactive shell to the instance. When you provide a command (as we do above) it will run it and exit. This is what happens for the command above! The output indicates the number of brokers started successfully. 
As soon as we get and print the size, we exit." - ] - }, { "cell_type": "markdown", "id": "ec052119", @@ -841,7 +792,7 @@ "id": "0086e47e", "metadata": {}, "source": [ - "Flux can also bootstrap its resource graph based on static input files, like in the case of a multi-user system instance setup by site administrators. [More information on Flux's static resource configuration files](https://flux-framework.readthedocs.io/en/latest/adminguide.html#resource-configuration). Flux provides a more standard interface to listing available resources that works regardless of the resource input source: `flux resource`." + "Flux can also bootstrap its resource graph based on static input files, like in the case of a multi-user system instance setup by site administrators. [More information on Flux's static resource configuration files](https://flux-framework.readthedocs.io/projects/flux-core/en/latest/guide/admin.html#configuration). Flux provides a more standard interface to listing available resources that works regardless of the resource input source: `flux resource`." ] }, { @@ -898,6 +849,8 @@ "tags": [] }, "source": [ + "
\n", + "\n", "# Flux Commands \n", "\n", "Here are how Flux commands map to a scheduler you are likely familiar with, Slurm. A larger table with similar mappings for LSF, Moab, and Slurm can be [viewed here](https://hpc.llnl.gov/banks-jobs/running-jobs/batch-system-cross-reference-guides). For submitting jobs, you can use the `flux` `submit`, `run`, `bulksubmit`, `batch`, and `alloc` commands.\n", @@ -964,7 +917,7 @@ "## flux run\n", "\n", "
\n", - "Description: One-off run of a single job (blocking)\n", + "Description: Running a single job (blocking)\n", "
\n", "\n", "The `flux run` command submits a job to Flux (similar to `flux submit`) but then attaches to the job with `flux job attach`, printing the job's stdout/stderr to the terminal and exiting with the same exit code as the job. It's basically doing an interactive submit, because you will be able to watch the output in your terminal, and it will block your terminal until the job completes." @@ -972,7 +925,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 19, "id": "52d26496-dd1f-44f7-bb10-8a9b4b8c9c80", "metadata": {}, "outputs": [ @@ -980,7 +933,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "8660c254a8e5\n" + "749a39b51885\n" ] } ], @@ -998,7 +951,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 20, "id": "fa40cb98-a138-4771-a7ef-f1860dddf7db", "metadata": {}, "outputs": [ @@ -1070,16 +1023,16 @@ "## flux submit\n", "\n", "
\n", - "Description: One-off run of a single job (not blocking)\n", + "Description: Running a single job (not blocking)\n", "
\n", "\n", "\n", - "The `flux submit` command submits a job to Flux and prints out the jobid. " + "The `flux submit` command submits a job to Flux and prints out the jobid." ] }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 1, "id": "cc2bddee-f454-4674-80d4-4a39c5f1bee2", "metadata": {}, "outputs": [ @@ -1112,7 +1065,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 2, "id": "8a5e7d41-1d8d-426c-8198-0ad4a57e7d04", "metadata": {}, "outputs": [ @@ -1120,7 +1073,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "ƒ3VqNqo3Qs\n" + "ƒckWM1ZXM\n" ] } ], @@ -1128,6 +1081,66 @@ "!flux submit hostname" ] }, + { + "cell_type": "markdown", + "id": "809292e5-3f24-4528-916f-8733d065de47", + "metadata": {}, + "source": [ + "But how does one get output? To quickly see output (which will block the terminal if the job is still running) after a submit, you can do:\n", + "\n", + "```bash\n", + "flux job attach $(flux job last)\n", + "```\n", + "\n", + "To provide a custom path to an output or error file, you can provide `--out` and `--err`, respectively. Let's try those both now." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "38a4da7f-2b84-4c67-9da1-02435005d392", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "ƒckWM1ZXM\n", + "749a39b51885\n" + ] + } + ], + "source": [ + "# What was the last job id again?\n", + "! flux job last\n", + "\n", + "# Attach to the last job id that was submitted (will block if still running and stream output)\n", + "! flux job attach $(flux job last)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "89a851d3-0179-4e5e-9e20-93bc11b5056f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "ƒfeTb2bBm\n", + "Did a polar bear with a soft drink write this...?! 
🐻‍❄️🥤️😎️ \n" + ] + } + ], + "source": [ + "# Now let's submit another one, and give it the same output and error file\n", + "! flux submit --out /tmp/hola-cola.txt --err /tmp/hola-cola.txt echo \"Did a polar bear with a soft drink write this...?! 🐻‍❄️🥤️😎️ \"\n", + "\n", + "# Take a look!\n", + "! cat /tmp/hola-cola.txt" + ] + }, { "cell_type": "markdown", "id": "a7e4c25e-3ca8-4277-bb70-a0e94bcd223b", @@ -1162,7 +1175,7 @@ "## flux bulksubmit\n", "\n", "
\n", - "Description: Bulk submission of jobs (not blocking)\n", + "Description: Submitting jobs in bulk (not blocking)\n", "
\n", "\n", "The `flux bulksubmit` command enqueues jobs based on a set of inputs which are substituted on the command line, similar to `xargs` and the GNU `parallel` utility, except the jobs have access to the resources of an entire Flux instance instead of only the local system." @@ -1201,12 +1214,6 @@ "The `--cc` option (akin to \"carbon copy\") to `submit` makes repeated submission even easier via, `flux submit --cc=IDSET`:" ] }, - { - "cell_type": "markdown", - "id": "392a8056-1661-4b76-9ca3-5e536c687e82", - "metadata": {}, - "source": [] - }, { "cell_type": "code", "execution_count": 16, @@ -1270,18 +1277,48 @@ "flux submit --cc=\"1-10\" echo \"Hello I am job {cc}\"\n", "\n", "# Submits scripts 0.sh through 6.sh from flux-workflow-examples/bulksubmit\n", - "flux submit --cc=1-10 myscript{cc}.sh\n", + "flux submit --cc=0-6 flux-workflow-examples/bulksubmit/{cc}.sh\n", "\n", "# Bypass the key value store and write output to file with jobid\n", - "flux submit --output=job-{{id}}.out echo \"This is job {cc}\"\n", + "flux submit --cc=1-10 --output=job-{{id}}.out echo \"This is job {cc}\"\n", "\n", "# Use carbon copy to submit identical jobs with different inputs\n", - "flux bulksubmit --dry-run --cc={0} echo {1} ::: a b c ::: 0-1 0-3 0-7\n", + "flux bulksubmit --dry-run --cc={1} echo {0} ::: a b c ::: 0-1 0-3 0-7\n", "```\n", "\n", - "Of course, Flux can launch more than just single-node, single-core jobs. We can submit multiple heterogeneous jobs and Flux will co-schedule the jobs while also ensuring no oversubscription of resources (e.g., cores).\n", - "\n", - "Note: in this tutorial, we cannot assume that the host you are running on has multiple cores, thus the examples below only vary the number of nodes per job. Varying the `cores-per-task` is also possible on Flux when the underlying hardware supports it (e.g., a multi-core node)." + "Of course, Flux can launch more than just single-node, single-core jobs. 
We can submit multiple heterogeneous jobs and Flux will co-schedule the jobs while also ensuring no oversubscription of resources (e.g., cores). Let's run the second example here, and add a clever trick to ask for output as we submit the jobs. This is a fun one, I promise!" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "2f089be5-6d32-40db-b9e9-328e5200b754", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Once upon a time... 📗️\n", + "There was a little duck 🦆️\n", + "Her name was pizzaquack 🍕️\n", + "She was very fond of cheese 🧀️\n", + "And running Flux 🌀️\n", + "And so she ran Flux, while she ate her cheese 😋️\n", + "And was so happy! The end. 🌈️\n" + ] + } + ], + "source": [ + "! for jobid in $(flux submit --cc=0-6 /bin/bash flux-workflow-examples/bulksubmit/{cc}.sh); do flux job attach ${jobid}; done" + ] + }, + { + "cell_type": "markdown", + "id": "6d3623b2-ca25-4d42-8e43-0c8e038464b4", + "metadata": {}, + "source": [ + "Note: in this tutorial, we cannot assume that the host you are running on has multiple cores, thus the examples below only vary the number of nodes per job. Varying the `cores-per-task` is also possible on Flux when the underlying hardware supports it (e.g., a multi-core node)." ] }, { @@ -1731,7 +1768,7 @@ "Description: Allocation for an interactive instance\n", "\n", "\n", - "You might want to request an allocation for a set of resources (an allocation) and then attach to the interactively. This is the goal of flux alloc. Since we can't easily do that in a cell, try opening up the and doing: \n", + "You might want to request an allocation for a set of resources (an allocation) and then attach to them interactively. This is the goal of flux alloc. 
Since we can't easily do that in a cell, try opening up the terminal and doing: \n", "\n", "```bash\n", "# Look at the resources you have outside of the allocation\n", @@ -2347,9 +2384,7 @@ "id": "75c0ae3f-2813-4ae8-83be-00be3df92a4b", "metadata": {}, "source": [ - "Each of `flux batch` and `flux alloc` hints at creating a Flux instance. How deep can we go into that rabbit hole, perhaps for jobs and workflows with nested logic or more orchestration complexity?\n", - "\n", - "" + "Each of `flux batch` and `flux alloc` hints at creating a Flux instance. How deep can we go into that rabbit hole, perhaps for jobs and workflows with nested logic or more orchestration complexity?" ] }, { @@ -2377,7 +2412,7 @@ "Now that we understand nested instances, let's look at another batch example that better uses them. Here we have two job scripts:\n", "\n", "- [sub_job1.sh](sub_job1.sh): Is going to be run with `flux batch` and submit sub_job2.sh\n", - "- [sub_job2.sh](sub_job2.sh): Is going to be submit by sub_job1.sh.\n", + "- [sub_job2.sh](sub_job2.sh): Is going to be submitted by sub_job1.sh.\n", "\n", "Take a look at each script to see how they work, and then submit it!" ] @@ -2455,17 +2490,40 @@ "You can also try a more detailed view with `flux pstree -a -X`!" 
] }, + { + "cell_type": "code", + "execution_count": 37, + "id": "72567af7-aa40-46b7-be43-c9e8124c1c7e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "flux-archive: shared-file.txt: write: Attempt to overwrite existing file\n", + "flux-archive: shared-file.txt: write: Attempt to overwrite existing file\n", + "flux-archive: shared-file.txt: write: Attempt to overwrite existing file\n", + "[1-3]: Exit 1\n" + ] + } + ], + "source": [ + "!flux exec -r all -x 0 flux archive extract --name myarchive --directory $(pwd) shared-file.txt" + ] + }, { "cell_type": "markdown", "id": "eda1a33c-9f9e-4ba0-a013-e97601f79e41", "metadata": {}, "source": [ + "
\n", + "\n", "# Process, Monitoring, and Job Utilities ⚙️\n", "\n", "## flux exec 👊️\n", "\n", "
\n", - "Description: Execute commands across ranks\n", + "Description: Executing commands across ranks\n", "
\n", "\n", "Have you ever wanted a quick way to execute a command to all of your nodes in a flux instance? It might be to create a directory, or otherwise interact with a file. This can be hugely useful in environments where you don't have a shared filesystem, for example. This is a job for flux exec! Here is a toy example to execute the command to every rank (`-r all`) to print." @@ -2591,7 +2649,7 @@ "## flux archive 📚️\n", "\n", "
\n", - "Description: Create file and content archives to access later and between ranks\n", + "Description: Creating file and content archives to access later and between ranks\n", "
\n", "\n", "As Flux is used more in cloud environments, we might find ourselves in a situation where we have a cluster without a shared filesystem. The `flux archive` command helps with this situation. At a high level, `flux archive` allows us to save named pieces of data (e.g., files) to the Flux KVS for later retrieval.\n", @@ -2640,27 +2698,6 @@ "Now that the directory has been created on all our nodes, we can extract the archive onto those nodes by combining `flux exec` and `flux archive extract`." ] }, - { - "cell_type": "code", - "execution_count": 37, - "id": "72567af7-aa40-46b7-be43-c9e8124c1c7e", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "flux-archive: shared-file.txt: write: Attempt to overwrite existing file\n", - "flux-archive: shared-file.txt: write: Attempt to overwrite existing file\n", - "flux-archive: shared-file.txt: write: Attempt to overwrite existing file\n", - "[1-3]: Exit 1\n" - ] - } - ], - "source": [ - "!flux exec -r all -x 0 flux archive extract --name myarchive --directory $(pwd) shared-file.txt" - ] - }, { "cell_type": "markdown", "id": "8b35f8a6-869b-4f4f-874a-074919dfcc51", @@ -2695,7 +2732,7 @@ "## flux uptime\n", "\n", "
\n", - "Description: Show how long this flux instance has been running\n", + "Description: Showing how long a flux instance has been running\n", "
\n", "\n", "Did someone say... [uptime](https://youtu.be/SYRlTISvjww?si=zDlvpWbBljUmZw_Q)? ☝️🕑️🕺️\n", @@ -2729,17 +2766,17 @@ "## flux top \n", "\n", "
\n", - "Description: Show a table of real-time Flux processes\n", + "Description: Showing a table of real-time Flux processes\n", "
\n", "\n", - "Flux provides a feature-full version of `top` for nested Flux instances and jobs. In the JupyterLab terminal, invoke `flux top` to see the \"sleep\" jobs. If they have already completed you can resubmit them. \n", + "Flux provides a feature-full version of `top` for nested Flux instances and jobs. In the terminal, invoke `flux top` to see the \"sleep\" jobs. If they have already completed you can resubmit them. \n", "\n", "We recommend not running `flux top` in the notebook as it is not designed to display output from a command that runs continuously.\n", "\n", "## flux pstree \n", "\n", "
\n", - "Description: Show a flux process tree (and see nesting in instances)\n", + "Description: Showing a flux process tree (and seeing nesting in instances)\n", "
\n", "\n", "In analogy to `top`, Flux provides `flux pstree`. Try it out in the terminal or here in the notebook.\n", @@ -2747,10 +2784,30 @@ "## flux proxy\n", "\n", "
\n", - "Description: Interact with a job hierarchy\n", + "Description: Interacting with a job hierarchy\n", "
\n", "\n", - "Flux proxy is used to route messages to and from a Flux instance. We can use `flux proxy` to connect to a running Flux instance and then submit more nested jobs inside it. You may want to edit `sleep_batch.sh` with the JupyterLab text editor (double click the file in the window on the left) to sleep for `60` or `120` seconds. Then from the run the commands below!" + "Flux proxy is used to route messages to and from a Flux instance. We can use `flux proxy` to connect to a running Flux instance and then submit more nested jobs inside it. From the terminal, run the commands below!\n", + "\n", + "```bash\n", + "# Outputs the JOBID\n", + "flux batch --nslots=2 --cores-per-slot=1 --nodes=2 ./sleep_batch.sh\n", + "\n", + "# Put the JOBID into an environment variable\n", + "JOBID=$(flux job last)\n", + "\n", + "# See the flux process tree\n", + "flux pstree -a\n", + "\n", + "# Connect to the Flux instance corresponding to JOBID above\n", + "flux proxy ${JOBID}\n", + "\n", + "# Note the depth is now 1 and the size is 2: we're one level deeper in a Flux hierarchy and we have only 2 brokers now.\n", + "flux uptime\n", + "\n", + "# This instance has 2 \"nodes\" and 2 cores allocated to it\n", + "flux resource list\n", + "```" ] }, { @@ -2761,7 +2818,7 @@ "## flux queue\n", "\n", "
\n", - "Description: Interact with and inspect Flux queues\n", + "Description: Interacting with and inspecting Flux queues\n", "
\n", "\n", "Flux has a command for controlling the queue within the `job-manager`: `flux queue`. This includes disabling job submission, re-enabling it, waiting for the queue to become idle or empty, and checking the queue status:" @@ -2804,7 +2861,7 @@ "## flux getattr\n", "\n", "
\n", - "Description: Get attributes about your system and environment\n", + "Description: Getting attributes about your system and environment\n", "
\n", "\n", "Each Flux instance has a set of attributes that are set at startup that affect the operation of Flux, such as `rank`, `size`, and `local-uri` (the Unix socket usable for communicating with Flux). Many of these attributes can be modified at runtime, such as `log-stderr-level` (1 logs only critical messages to stderr while 7 logs everything, including debug messages). Here is an example set that you might be interested in looking at:" @@ -2895,10 +2952,10 @@ "## flux module\n", "\n", "
\n", - "Description: Manage Flux extension modules\n", + "Description: Managing Flux extension modules\n", "
\n", "\n", - "Services within a Flux instance are implemented by modules. To query and manage broker modules, use `flux module`. Modules that we have already directly interacted with in this tutorial include `resource` (via `flux resource`), `job-ingest` (via `flux` and the Python API) `job-list` (via `flux jobs`) and `job-manager` (via `flux queue`), and we will interact with the `kvs` module in a few cells. For the most part, services are implemented by modules of the same name (e.g., `kvs` implements the `kvs` service and thus the `kvs.lookup` RPC). In some circumstances, where multiple implementations for a service exist, a module of a different name implements a given service (e.g., in this instance, `sched-fluxion-qmanager` provides the `sched` service and thus `sched.alloc`, but in another instance `sched-simple` might provide the `sched` service)." + "Services within a Flux instance are implemented by modules. To query and manage broker modules, use `flux module`. Modules that we have already directly interacted with in this tutorial include `resource` (via `flux resource`), `job-ingest` (via `flux` and the Python API) `job-list` (via `flux jobs`) and `job-manager` (via `flux queue`). For the most part, services are implemented by modules of the same name. In some circumstances, where multiple implementations for a service exist, a module of a different name implements a given service (e.g., in this instance, `sched-fluxion-qmanager` provides the `sched` service and thus `sched.alloc`, but in another instance `sched-simple` might provide the `sched` service)." ] }, { @@ -2951,7 +3008,7 @@ "## flux dmesg\n", "\n", "
\n", - "Description: View Flux system messages\n", + "Description: Viewing Flux system messages\n", "
\n", "\n", "\n", @@ -3007,11 +3064,61 @@ "!flux dmesg" ] }, + { + "cell_type": "markdown", + "id": "70e3df1d-32c9-4996-b6f7-2fa85f4c02ad", + "metadata": { + "tags": [] + }, + "source": [ + "### flux start\n", + "\n", + "
\n", + "Description: Interactively starting a set of resources\n", + "
\n", + "\n", + "Sometimes you need to interactively start a set of compute resources. We call this subset a flux instance. You can launch jobs under this instance, akin to how you've done above! In fact, this entire tutorial is started (to give you 4 faux nodes) with a `flux start` command: \n", + "\n", + "```bash\n", + "flux start --test-size=4\n", + "```\n", + "\n", + "A Flux instance may be running as the default resource manager on a cluster, a job in a resource manager such as Slurm, LSF, or Flux itself, or as a test instance launched locally. This is really neat because it means you can launch Flux under other resource managers where it is not installed as the system workload manager. You can also execute \"one off\" commands to it, for example, to see the instance size:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "d568de50-f9e0-452f-8364-e52853013d83", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4\n" + ] + } + ], + "source": [ + "!flux start --test-size=4 flux getattr size" + ] + }, + { + "cell_type": "markdown", + "id": "e693f2d9-651f-4f58-bf53-62528caa83d9", + "metadata": {}, + "source": [ + "When you run `flux start` without a command, it will give you an interactive shell to the instance. When you provide a command (as we do above) it will run it and exit. This is what happens for the command above! The output indicates the number of brokers started successfully. As soon as we get and print the size, we exit." + ] + }, { "cell_type": "markdown", "id": "997faffc", "metadata": {}, "source": [ + "
\n", + "\n", "# Python Submission API 🐍️\n", "Flux also provides first-class python bindings which can be used to submit jobs programmatically. \n", "\n", @@ -3102,13 +3209,11 @@ ] }, { - "cell_type": "code", - "execution_count": 47, - "id": "810b72b9-2de2-4b62-9330-252eede22abb", + "cell_type": "markdown", + "id": "197ee252-dfc9-4256-8d45-df40718c5c3f", "metadata": {}, - "outputs": [], "source": [ - "### `flux jobs`" + "You can now run `flux jobs` to see the jobs that we submit from Python." ] }, { @@ -3205,29 +3310,6 @@ "print(compute_jobreq.dumps(indent=2))" ] }, - { - "cell_type": "code", - "execution_count": 50, - "id": "pregnant-creativity", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " ƒ3VyvraktF jovyan compute.py F 1 1 0.014s 8660c254a8e5\n", - " ƒ3Vyv4d99Z jovyan compute.py F 1 1 0.020s 8660c254a8e5\n", - " ƒ2YnijmLwy jovyan compute.py F 1 1 0.031s 8660c254a8e5\n", - " ƒ2YiqfxNdm jovyan compute.py F 1 1 0.012s 8660c254a8e5\n", - " ƒ2YYgVHnyV jovyan compute.py F 1 1 0.062s 8660c254a8e5\n", - " ƒ2YYE7Ja9d jovyan compute.py F 1 1 0.048s 8660c254a8e5\n" - ] - } - ], - "source": [ - "!flux jobs -a | grep compute" - ] - }, { "cell_type": "markdown", "id": "a8051640", @@ -3350,7 +3432,7 @@ "1. Submitting jobs with Flux\n", "2. The Flux Hierarchy\n", "3. Flux Process and Job Utilities\n", - "4. Deeper Dive into Flux Internals\n", + "4. Python Submission API\n", "\n", "To continue with the tutorial, open [Chapter 2](./02_flux_framework.ipynb)" ] diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/02_flux_framework.ipynb b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/02_flux_framework.ipynb index e4030be..1ae33bc 100644 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/02_flux_framework.ipynb +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/02_flux_framework.ipynb @@ -18,6 +18,8 @@ "3. Examples `flux kvs` that powers a lot of higher level commands\n", "4. 
Advanced job specification interaction with flux job\n", "\n", + "
\n", + "\n", "## The structure of Flux instances\n", "\n", "As mentioned in [Chapter 1](./01_flux_tutorial.ipynb), a Flux instance is comprised of one or more Flux brokers. A high-level depiction of the design of a Flux broker is shown in the figure below.\n", @@ -29,7 +31,7 @@ "\n", "\n", "Each broker is a program built on top of the ∅MQ networking library. The broker contains two main components. First, the broker implements Flux-specific networking abstractions over ∅MQ, such as remote procedure call (RPC) and publication-subscription (pub-sub). Second, the broker contains several core services, such as PMI (for MPI support), run control support (for enabling automatic startup of other services), and, most importantly, broker module management. The remainder of a Flux broker's functionality comes from broker modules: specially designed services that the broker can deploy in independent OS threads. Some examples of broker modules provided by Flux include:\n", - "* Job scheduling (both [traditional and hierarchical](./02_flux_scheduling.ipynb))\n", + "* Job scheduling (both traditional and hierarchical)\n", "* [Fluxion](https://github.com/flux-framework/flux-sched) (Flux's advanced graph-based scheduler)\n", "* Banks and accounting (for system-wide deployments of Flux)\n", "* [PMIx](https://github.com/openpmix/openpmix) (for OpenMPI)\n", @@ -58,6 +60,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ + "
\n", + "\n", "## Flux Modules\n", "\n", "To manage and query modules, Flux provides the `flux module` command. The sub-commands provided by `flux module` can be seen by running the cell below." @@ -99,7 +103,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "While going through [Module 2](./02_flux_scheduling.ipynb), we've already encountered some built-in services provided by Flux, such as:\n", + "Some examples of Flux modules include:\n", "* `job-ingest` (used by Flux submission commands like `flux batch` and `flux run`)\n", "* `job-list` (used by `flux jobs`)\n", "* `sched-fluxion-qmanager` (used by `flux tree`)\n", @@ -289,6 +293,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ + "
\n", + "\n", "## flux jobspec generation\n", "\n", "Underlying much interaction with jobs is the creation of job specifications. When you use the command line or Python SDK and submit from a command or script, under the hood (back to that plumbing reference) we are creating a job specification \"Jobspec\" that is passed further through Flux. The command `flux submit` makes it possible to provide a similar command, but instead of running it, to generate the jobspec. Let's do that now. We will generate and view a Jobspec for a simple \"hello world\" job. We do that by adding `--dry-run`." diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/03_flux_tutorial_conclusions.ipynb b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/03_flux_tutorial_conclusions.ipynb index 8eae6e9..e89ec3d 100644 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/03_flux_tutorial_conclusions.ipynb +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/03_flux_tutorial_conclusions.ipynb @@ -31,7 +31,7 @@ "\n", "> Where can I learn to set this up on my own?\n", "\n", - "If you're interested in installing Flux on your cluster, take a look at the [system instance instructions](https://flux-framework.readthedocs.io/en/latest/adminguide.html). If you are interested in running Flux on Kubernetes, check out the [Flux Operator](https://github.com/flux-framework/flux-operator). \n", + "If you're interested in installing Flux on your cluster, take a look at the [system instance instructions](https://flux-framework.readthedocs.io/projects/flux-core/en/latest/guide/admin.html). If you are interested in running Flux on Kubernetes, check out the [Flux Operator](https://github.com/flux-framework/flux-operator). 
\n", "\n", "> How can I run this tutorial on my own?\n", "\n", @@ -60,8 +60,6 @@ "* [DYAD's ReadTheDocs page](https://dyad.readthedocs.io/en/latest/)\n", "* [DYAD's GitHub repository](https://github.com/flux-framework/dyad)\n", "* [eScience 2022 Short Paper](https://dyad.readthedocs.io/en/latest/_downloads/27090817b034a89b76e5538e148fea9e/ShortPaper_2022_eScience_LLNL.pdf)\n", - "* [SC 2023 ACM Student Research Competition Extended Abstract](https://github.com/flux-framework/dyad/blob/main/docs/_static/ExtendedAbstract_2023_SC_ACM_SRC_DYAD.pdf)\n", - "* [IPDPS 2024 HiCOMB Workshop Paper](https://github.com/flux-framework/dyad/blob/main/docs/_static/Paper_2024_IPDPS_HiCOMB_DYAD.pdf)\n", "\n", "And, of course, you can always reach out to us on any of our [project repositories](https://flux-framework.org) and ask any questions that you have. We'd love your contribution to code, documentation, or just saying hello!\n", "\n", diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/0.sh b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/0.sh new file mode 100755 index 0000000..a11e231 --- /dev/null +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/0.sh @@ -0,0 +1,3 @@ +#!/bin/bash + +echo "Once upon a time... 
📗️" diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/1.sh b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/1.sh new file mode 100755 index 0000000..f958b58 --- /dev/null +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/1.sh @@ -0,0 +1,3 @@ +#!/bin/bash + +echo "There was a little duck 🦆️" diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/2.sh b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/2.sh new file mode 100755 index 0000000..fe6930b --- /dev/null +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/2.sh @@ -0,0 +1,3 @@ +#!/bin/bash + +echo "Her name was pizzaquack 🍕️" diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/3.sh b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/3.sh new file mode 100755 index 0000000..7ba6b82 --- /dev/null +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/3.sh @@ -0,0 +1,3 @@ +#!/bin/bash + +echo "She was very fond of cheese 🧀️" diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/4.sh b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/4.sh new file mode 100755 index 0000000..089c949 --- /dev/null +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/4.sh @@ -0,0 +1,3 @@ +#!/bin/bash + +echo "And running Flux 🌀️" diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/5.sh b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/5.sh new file mode 100755 index 0000000..3c00920 --- /dev/null +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/5.sh @@ -0,0 +1,3 @@ +#!/bin/bash + +echo "And so she ran Flux, while she ate her cheese 😋️" diff
--git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/6.sh b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/6.sh new file mode 100755 index 0000000..9233634 --- /dev/null +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/flux-workflow-examples/bulksubmit/6.sh @@ -0,0 +1,3 @@ +#!/bin/bash + +echo "And was so happy! The end. 🌈️"