Merge remote-tracking branch 'upstream-main/main' into main
bgunnar5 committed Oct 28, 2024
2 parents acb1820 + 9e27798 commit 5bf7516
Showing 94 changed files with 2,977 additions and 610 deletions.
22 changes: 20 additions & 2 deletions .github/workflows/push-pr_workflow.yml
@@ -9,12 +9,29 @@ jobs:
if: github.event_name == 'pull_request'

steps:
- uses: actions/checkout@v1
- name: Checkout code
uses: actions/checkout@v2
with:
fetch-depth: 0 # Checkout the whole history, in case the target is way far behind

- name: Check if target branch has been merged
run: |
if git merge-base --is-ancestor ${{ github.event.pull_request.base.sha }} ${{ github.sha }}; then
echo "Target branch has been merged into the source branch."
else
echo "Target branch has not been merged into the source branch. Please merge in target first."
exit 1
fi
- name: Check that CHANGELOG has been updated
run: |
# If this step fails, this means you haven't updated the CHANGELOG.md file with notes on your contribution.
git diff --name-only ${{ github.event.pull_request.base.sha }} ${{ github.sha }} | grep '^CHANGELOG.md$' && echo "Thanks for helping keep our CHANGELOG up-to-date!"
if git diff --name-only ${{ github.event.pull_request.base.sha }} ${{ github.sha }} | grep -q '^CHANGELOG.md$'; then
echo "Thanks for helping keep our CHANGELOG up-to-date!"
else
echo "Please update the CHANGELOG.md file with notes on your contribution."
exit 1
fi
Lint:
runs-on: ubuntu-latest
@@ -95,6 +112,7 @@ jobs:
python3 -m pip install --upgrade pip
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
pip3 install -r requirements/dev.txt
pip freeze
- name: Install singularity
run: |
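
The two new pull-request checks in this workflow can be tried locally before pushing. The following is a minimal sketch, not part of the commit: it builds a throwaway repository, and the variables `base_sha` and `head_sha` are stand-ins (assumptions for illustration) for `github.event.pull_request.base.sha` and `github.sha`. Only standard `git` and `grep` calls are used.

```shell
#!/bin/sh
# Sketch: reproduce the two CI checks in a temporary repository.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"   # placeholder identity for the demo commits
git config user.name "Example Dev"

# One commit standing in for the tip of the target branch.
echo "base" > file.txt
git add file.txt
git commit -q -m "base commit"
base_sha=$(git rev-parse HEAD)

# A second commit that touches CHANGELOG.md, standing in for the PR head.
echo "- note on my contribution" > CHANGELOG.md
git add CHANGELOG.md
git commit -q -m "update changelog"
head_sha=$(git rev-parse HEAD)

# Check 1: the target commit must be an ancestor of the PR head,
# i.e. the target branch has already been merged into the source branch.
if git merge-base --is-ancestor "$base_sha" "$head_sha"; then
    merged=yes
else
    merged=no
fi

# Check 2: CHANGELOG.md must appear among the files changed between the two commits.
if git diff --name-only "$base_sha" "$head_sha" | grep -q '^CHANGELOG.md$'; then
    changelog=updated
else
    changelog=missing
fi

echo "merged=$merged changelog=$changelog"
```

Wrapping the `grep -q` pipeline in an `if`, as the new workflow step does, lets the failure branch print a helpful message before `exit 1`, where the older one-line `grep ... && echo ...` form failed without explanation.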
52 changes: 52 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,58 @@ All notable changes to Merlin will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.12.2b1]
### Added
- Conflict handler option to the `dict_deep_merge` function in `utils.py`
- Ability to add module-specific pytest fixtures
- Added fixtures specifically for testing status functionality
- Added tests for reading and writing status files, and status conflict handling
- Added tests for the `dict_deep_merge` function
- Pytest-mock as a dependency for the test suite (necessary for using mocks and fixtures in the same test)
- New github action test to make sure target branch has been merged into the source first, so we know histories are ok
- Check in the status commands to make sure we're not pulling statuses from nested workspaces
- Added `setuptools` as a requirement for python 3.12 to recognize the `pkg_resources` library
- Patch to celery results backend to stop ChordErrors being raised and breaking workflows when a single task fails
- New step return code `$(MERLIN_RAISE_ERROR)` to force an error to be raised by a task (mainly for testing)
- Added description of this to docs
- New test to ensure a single failed task won't break a workflow

### Changed
- `merlin info` is cleaner and gives python package info
- merlin version now prints with every banner message
- Applying filters for `merlin detailed-status` will now log debug statements instead of warnings
- Modified the unit tests for the `merlin status` command to use pytest rather than unittest
- Added fixtures for `merlin status` tests that copy the workspace to a temporary directory so you can see exactly what's run in a test
- Batch block and workers now allow for variables to be used in node settings
- Task id is now the path to the directory

### Fixed
- Bugfix for output of `merlin example openfoam_wf_singularity`
- A bug with the CHANGELOG detection test when the target branch isn't in the ci runner history
- Link to Merlin banner in readme
- Issue with escape sequences in ascii art (caught by python 3.12)
- Bug where Flux wasn't identifying total number of nodes on an allocation
- Not supporting Flux versions below 0.17.0


## [1.12.1]
### Added
- New Priority.RETRY value for the Celery task priorities. This will be the new highest priority.
- Support for the status command to handle multiple workers on the same step
- Documentation on how to run cross-node workflows with a containerized server (`merlin server`)

### Changed
- Modified some tests in `test_status.py` and `test_detailed_status.py` to accommodate bugfixes for the status commands

### Fixed
- Bugfixes for the status commands:
- Fixed "DRY RUN" naming convention so that it outputs in the progress bar properly
- Fixed issue where a step that was run with one sample would delete the status file upon condensing
- Fixed issue where multiple workers processing the same step would break the status file and cause the workflow to crash
- Added a catch for the JSONDecodeError that would potentially crash a run
- Added a FileLock to the status write in `_update_status_file()` of `MerlinStepRecord` to avoid potential race conditions (potentially related to JSONDecodeError above)
- Added in `export MANPAGER="less -r"` call behind the scenes for `detailed-status` to fix ASCII error

## [1.12.0]
### Added
- A new command `merlin queue-info` that will print the status of your celery queues
2 changes: 1 addition & 1 deletion Makefile
@@ -6,7 +6,7 @@
#
# LLNL-CODE-797170
# All rights reserved.
# This file is part of Merlin, Version: 1.12.0.
# This file is part of Merlin, Version: 1.12.2b1.
#
# For details, see https://github.com/LLNL/merlin.
#
2 changes: 1 addition & 1 deletion README.md
@@ -5,7 +5,7 @@
[![Pull requests](https://img.shields.io/github/issues-pr/LLNL/merlin)](https://github.com/LLNL/merlin/pulls)
[![Language grade: Python](https://img.shields.io/lgtm/grade/python/g/LLNL/merlin.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/LLNL/merlin/context:python)

![Merlin](https://raw.githubusercontent.com/LLNL/merlin/main/docs/images/merlin.png)
![Merlin](https://raw.githubusercontent.com/LLNL/merlin/main/docs/assets/images/merlin_banner_white.png)

## A brief introduction to Merlin
Merlin is a tool for running machine learning based workflows. The goal of
26 changes: 13 additions & 13 deletions docs/tutorial/4_run_simulation.md
@@ -72,8 +72,8 @@ In the `openfoam_wf_singularity` directory you should see the following:
<figcaption>Fig 3. openfoam_wf Directory Structure</figcaption>
</figure>

- `openfoam_wf.yaml` -- this spec file is partially blank. You will fill in the gaps as you follow this module's steps.
- `openfoam_wf_template.yaml` -- this is a complete spec file. You can always reference it as an example.
- `openfoam_wf_singularity.yaml` -- this spec file is partially blank. You will fill in the gaps as you follow this module's steps.
- `openfoam_wf_singularity_template.yaml` -- this is a complete spec file. You can always reference it as an example.
- `scripts` -- This directory contains all the necessary scripts for this module.
- We'll be exploring these scripts as we go with the tutorial.
- `requirements.txt` -- this is a text file listing this workflow's python dependencies.
@@ -87,22 +87,22 @@ We are going to build a spec file that produces this DAG:
<figcaption>Fig 4. OpenFOAM DAG</figcaption>
</figure>

**To start, open** `openfoam_wf.yaml` **using your favorite text editor.**
**To start, open** `openfoam_wf_singularity.yaml` **using your favorite text editor.**

It should look something like this:

???+ abstract "Initial Contents of the Spec"

<!--codeinclude-->
[openfoam_wf.yaml](../../merlin/examples/workflows/openfoam_wf_singularity/openfoam_wf.yaml)
[openfoam_wf_singularity.yaml](../../merlin/examples/workflows/openfoam_wf_singularity/openfoam_wf_singularity.yaml)
<!--/codeinclude-->

### Variables

First we specify some variables to make our life easier. Locate the `env` block in our yaml spec:

<!--codeinclude-->
[](../../merlin/examples/workflows/openfoam_wf_singularity/openfoam_wf.yaml) lines:9-15
[](../../merlin/examples/workflows/openfoam_wf_singularity/openfoam_wf_singularity.yaml) lines:9-15
<!--/codeinclude-->

The `OUTPUT_PATH` variable is set to tell Merlin where you want your output directory to be written. The default is the current working directory.
@@ -254,7 +254,7 @@ nonsimworkers:

### Putting It All Together

By the end, your `openfoam_wf.yaml` should look like the template version in the same directory:
By the end, your `openfoam_wf_singularity.yaml` should look like the template version in the same directory:

???+ abstract "Complete Spec File"

@@ -273,21 +273,21 @@ Now that you are done with the Specification file, use the following commands fr
Create the DAG and send tasks to the server with:

```bash
merlin run openfoam_wf.yaml
merlin run openfoam_wf_singularity.yaml
```

Open a new terminal window, then start the workers that will consume the tasks we just queued by using:

```bash
merlin run-workers openfoam_wf.yaml
merlin run-workers openfoam_wf_singularity.yaml
```

But wait! We realize that 10 samples is not enough to train a good model. We would like to restart with 100 samples instead of 10 (should take about 6 minutes):

After sending the workers to start on their queues, we need to first stop the workers:

```bash
merlin stop-workers --spec openfoam_wf.yaml
merlin stop-workers --spec openfoam_wf_singularity.yaml
```

!!! tip
@@ -297,25 +297,25 @@ merlin stop-workers --spec openfoam_wf.yaml
We stopped these tasks from running but if we were to run the workflow again (with 100 samples instead of 10), we would continue running the 10 samples first! This is because the queues are still filled with the previous attempt's tasks. This can be seen with:

```bash
merlin status openfoam_wf.yaml
merlin status openfoam_wf_singularity.yaml
```

We need to purge these queues first in order to repopulate them with the appropriate tasks. This is where we use the `merlin purge` command:

```bash
merlin purge openfoam_wf.yaml
merlin purge openfoam_wf_singularity.yaml
```

Now we are free to repopulate the queues with the 100 samples. In our terminal window that's not designated for our workers, we'll queue up tasks again, this time with 100 samples:

```bash
merlin run openfoam_wf.yaml --vars N_SAMPLES=100
merlin run openfoam_wf_singularity.yaml --vars N_SAMPLES=100
```

Then in our window for workers, we'll execute:

```bash
merlin run-workers openfoam_wf.yaml
merlin run-workers openfoam_wf_singularity.yaml
```

To see your results, look inside the `learn` output directory. You should see a png that looks like this: