Fixes for typo and directory changes in DYAD #40

Closed
6 changes: 3 additions & 3 deletions .github/workflows/docker-builds.yaml
Original file line number Diff line number Diff line change
@@ -10,9 +10,9 @@ jobs:
strategy:
fail-fast: false
matrix:
-test: [["2024-RADIUSS-AWS/JupyterNotebook", "docker/Dockerfile.hub", "ghcr.io/flux-framework/flux-jupyter-hub:radiuss-2024"],
-["2024-RADIUSS-AWS/JupyterNotebook", "docker/Dockerfile.init", "ghcr.io/flux-framework/flux-jupyter-init:radiuss-2024"],
-["2024-RADIUSS-AWS/JupyterNotebook", "docker/Dockerfile.spawn", "ghcr.io/flux-framework/flux-jupyter-spawn:radiuss-2024"]]
+test: [["2024-HPCIC-AWS/JupyterNotebook", "docker/Dockerfile.hub", "ghcr.io/flux-framework/flux-jupyter-hub:hpcic-2024"],
+["2024-HPCIC-AWS/JupyterNotebook", "docker/Dockerfile.init", "ghcr.io/flux-framework/flux-jupyter-init:hpcic-2024"],
+["2024-HPCIC-AWS/JupyterNotebook", "docker/Dockerfile.spawn", "ghcr.io/flux-framework/flux-jupyter-spawn:hpcic-2024"]]

# Tutorials are over - these builds are disabled
# ["2023-RADIUSS-AWS/JupyterNotebook", "docker/Dockerfile.hub", "ghcr.io/flux-framework/flux-jupyter-hub:2023"],
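Each matrix entry appears to be a (build context, Dockerfile, image tag) triple. The workflow steps that consume `matrix.test` are outside this hunk, so the mapping below is an assumption modeled on the README's `docker build` commands; `build_command` is a hypothetical helper, not part of the repository.

```python
import shlex

# Matrix entries from the updated workflow: (build context, Dockerfile, image tag).
MATRIX = [
    ["2024-HPCIC-AWS/JupyterNotebook", "docker/Dockerfile.hub", "ghcr.io/flux-framework/flux-jupyter-hub:hpcic-2024"],
    ["2024-HPCIC-AWS/JupyterNotebook", "docker/Dockerfile.init", "ghcr.io/flux-framework/flux-jupyter-init:hpcic-2024"],
    ["2024-HPCIC-AWS/JupyterNotebook", "docker/Dockerfile.spawn", "ghcr.io/flux-framework/flux-jupyter-spawn:hpcic-2024"],
]


def build_command(context: str, dockerfile: str, tag: str) -> tuple:
    """Hypothetical mapping of one matrix entry to (working directory, command line).

    Mirrors the README, which runs `docker build` from inside the context directory.
    """
    argv = ["docker", "build", "-t", tag, "-f", dockerfile, "."]
    return context, shlex.join(argv)


if __name__ == "__main__":
    for entry in MATRIX:
        cwd, cmd = build_command(*entry)
        print(f"(cd {cwd} && {cmd})")
```

For the first entry, the generated command is identical to the README's `hpcic-2024` hub build line.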
Original file line number Diff line number Diff line change
@@ -22,9 +22,9 @@ Let's build a set of images - one spawner and one hub, and an init. You can cust
Remember that if you just want to test locally, you can jump to the [local usage](#local-usage) section.

```bash
-docker build -t ghcr.io/flux-framework/flux-jupyter-hub:radiuss-2024 -f docker/Dockerfile.hub .
-docker build -t ghcr.io/flux-framework/flux-jupyter-spawn:radiuss-2024 -f docker/Dockerfile.spawn .
-docker build -t ghcr.io/flux-framework/flux-jupyter-init:radiuss-2024 -f docker/Dockerfile.init .
+docker build -t ghcr.io/flux-framework/flux-jupyter-hub:hpcic-2024 -f docker/Dockerfile.hub .
+docker build -t ghcr.io/flux-framework/flux-jupyter-spawn:hpcic-2024 -f docker/Dockerfile.spawn .
+docker build -t ghcr.io/flux-framework/flux-jupyter-init:hpcic-2024 -f docker/Dockerfile.init .
```

Note that these are available under the flux-framework organization GitHub packages, so you shouldn't need
@@ -79,8 +79,8 @@ you should install.
# Create an EKS cluster with autoscaling with default storage
eksctl create cluster --config-file aws/eksctl-config.yaml

-# Create an EKS cluster with io1 node storage but no autoscaling, used for the RADIUSS 2023 tutorial
-eksctl create cluster --config-file aws/eksctl-radiuss-tutorial-2023.yaml
+# Create an EKS cluster with io1 node storage but no autoscaling, used for the HPCIC 2024 tutorial
+eksctl create cluster --config-file aws/eksctl-hpcic-tutorial-2023.yaml
```

You can find vanilla (manual) instructions [here](https://z2jh.jupyter.org/en/stable/kubernetes/amazon/step-zero-aws-eks.html) if you
Original file line number Diff line number Diff line change
@@ -23,7 +23,7 @@ hub:
# This is the image I built based off of jupyterhub/k8s-hub, 3.0.2 at time of writing this
image:
name: ghcr.io/flux-framework/flux-jupyter-hub
-tag: "radiuss-2024"
+tag: "hpcic-2024"
pullPolicy: Always

# # https://z2jh.jupyter.org/en/latest/administrator/optimization.html#scaling-up-in-time-user-placeholders
@@ -46,7 +46,7 @@ proxy:
singleuser:
image:
name: ghcr.io/flux-framework/flux-jupyter-spawn
-tag: "radiuss-2024"
+tag: "hpcic-2024"
pullPolicy: Always
cpu:
limit: 2
@@ -59,7 +59,7 @@ singleuser:
# This runs as the root user, who clones and changes ownership to uid 1000
initContainers:
- name: init-myservice
-image: ghcr.io/flux-framework/flux-jupyter-init:radiuss-2024
+image: ghcr.io/flux-framework/flux-jupyter-init:hpcic-2024
command: ["/entrypoint.sh"]
volumeMounts:
- name: flux-tutorial
Original file line number Diff line number Diff line change
@@ -17,7 +17,7 @@ hub:
# This is the image I built based off of jupyterhub/k8s-hub, 3.0.2 at time of writing this
image:
name: ghcr.io/flux-framework/flux-jupyter-hub
-tag: "radiuss-2024"
+tag: "hpcic-2024"
pullPolicy: Always

# https://z2jh.jupyter.org/en/latest/administrator/optimization.html#scaling-up-in-time-user-placeholders
@@ -32,7 +32,7 @@ scheduling:
singleuser:
image:
name: ghcr.io/flux-framework/flux-jupyter-spawn
-tag: "radiuss-2024"
+tag: "hpcic-2024"
pullPolicy: Always
cpu:
limit: 1
@@ -43,7 +43,7 @@ singleuser:
# This runs as the root user, who clones and changes ownership to uid 1000
initContainers:
- name: init-myservice
-image: ghcr.io/flux-framework/flux-jupyter-init:radiuss-2024
+image: ghcr.io/flux-framework/flux-jupyter-init:hpcic-2024
command: ["/entrypoint.sh"]
volumeMounts:
- name: flux-tutorial
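The Helm values hunks split the hub and single-user images into `name` and `tag`, while the init container carries one `image` string, so a retag has to touch three places per values file. A minimal check that reassembles the references and collects their tags; the dict mirrors only the image keys visible in the diff, and its nesting under `hub` and `singleuser` is inferred from the hunk headers rather than shown. Loading a real file would need a YAML parser such as PyYAML's `yaml.safe_load`.

```python
from typing import List, Set

# Image keys from the values diff; the surrounding nesting is assumed.
VALUES = {
    "hub": {
        "image": {"name": "ghcr.io/flux-framework/flux-jupyter-hub", "tag": "hpcic-2024"},
    },
    "singleuser": {
        "image": {"name": "ghcr.io/flux-framework/flux-jupyter-spawn", "tag": "hpcic-2024"},
        "initContainers": [
            {"name": "init-myservice", "image": "ghcr.io/flux-framework/flux-jupyter-init:hpcic-2024"},
        ],
    },
}


def image_refs(values: dict) -> List[str]:
    """Collect every image reference as one "name:tag" string."""
    refs = []
    for section in ("hub", "singleuser"):
        image = values[section]["image"]
        refs.append(f"{image['name']}:{image['tag']}")
    # Init containers already store the full reference in a single field.
    refs.extend(c["image"] for c in values["singleuser"].get("initContainers", []))
    return refs


def tags(values: dict) -> Set[str]:
    """Tag of each reference; assumes every reference ends in an explicit ":tag"."""
    return {ref.rpartition(":")[2] for ref in image_refs(values)}


if __name__ == "__main__":
    found = tags(VALUES)
    print("consistent" if len(found) == 1 else f"mixed tags: {sorted(found)}")
```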
Original file line number Diff line number Diff line change
@@ -48,7 +48,7 @@ RUN python3 -m pip install -r requirements.txt && \
python3 -m IPython kernel install

# This is code to install DYAD
-# This was added to the RADIUSS 2023 tutorials on AWS
+# This was added to the RADIUSS 2023 tutorials on AWS (changed to HPCIC in 2024)
RUN git clone https://github.com/openucx/ucx.git \
&& cd ucx \
&& git checkout v1.13.1 \
@@ -91,7 +91,7 @@ RUN apt-get update && apt-get install -y nodejs && apt-get clean && rm -rf /var/
RUN wget https://nodejs.org/dist/v20.15.0/node-v20.15.0-linux-x64.tar.xz && \
apt-get update && apt-get install -y xz-utils && rm -rf /var/lib/apt/lists/* && \
xz -d -v node-v20.15.0-linux-x64.tar.xz && \
-tar -C /usr/local --strip-components=1 -xvf node-v20.15.0-linux-x64.tar
+tar -C /usr/local --strip-components=1 -xvf node-v20.15.0-linux-x64.tar

# This customizes the launcher UI
# https://jupyter-app-launcher.readthedocs.io/en/latest/usage.html
@@ -113,6 +113,11 @@ COPY ./docker/flux-icon.png $HOME/flux-icon.png
# note that previous examples are added via git volume in config.yaml
ENV SHELL=/usr/bin/bash
ENV FLUX_URI_RESOLVE_LOCAL=t
+# Prepend /usr/lib to LD_LIBRARY_PATH because Ubuntu Jammy comes with
+# UCX 1.12. Without this, DYAD will build (correctly) with UCX 1.13.1, but
+# it will try to run with the system install of UCX 1.12, which can cause
+# either a crash or a hang.
+ENV LD_LIBRARY_PATH="/usr/lib:$LD_LIBRARY_PATH"

EXPOSE 8888
ENTRYPOINT ["tini", "--"]
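The new comment relies on the dynamic loader trying `LD_LIBRARY_PATH` entries, in order, before its cache and default directories. The sketch below is a simplified model of that lookup (it ignores RPATH/RUNPATH, `;` separators, and hardware-capability subdirectories); the directory layout and the `libucp.so.0` file name are assumptions for illustration. It also shows one side effect of the `ENV` line: if the base image does not already set `LD_LIBRARY_PATH`, the value becomes `/usr/lib:`, and glibc treats the empty trailing entry as the current directory.

```python
from typing import List, Optional

# Hypothetical layout: UCX 1.13.1 built from source into /usr/lib, and
# Ubuntu Jammy's packaged UCX 1.12 in the multiarch directory.
INSTALLED = {
    "/usr/lib": {"libucp.so.0": "UCX 1.13.1"},
    "/usr/lib/x86_64-linux-gnu": {"libucp.so.0": "UCX 1.12"},
}
# Simplified stand-in for the loader's cache and default directories.
DEFAULT_DIRS = ["/usr/lib/x86_64-linux-gnu", "/usr/lib"]


def search_dirs(ld_library_path: str) -> List[str]:
    """LD_LIBRARY_PATH entries first, in order, then the default directories."""
    if not ld_library_path:
        return list(DEFAULT_DIRS)
    # An empty entry (e.g. the trailing one in "/usr/lib:") means the current directory.
    env_dirs = [d or "." for d in ld_library_path.split(":")]
    return env_dirs + DEFAULT_DIRS


def resolve(lib: str, ld_library_path: str) -> Optional[str]:
    """Return what the first directory containing `lib` provides, if any."""
    for d in search_dirs(ld_library_path):
        if lib in INSTALLED.get(d, {}):
            return INSTALLED[d][lib]
    return None


if __name__ == "__main__":
    print("unset:", resolve("libucp.so.0", ""))
    print("prepended:", resolve("libucp.so.0", "/usr/lib:"))
```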
@@ -132,7 +137,7 @@ RUN mkdir -p $HOME/.local/share && \
# flux start flux account add-user --username=jovyan --bank=default && \
# flux start flux jobtap load mf_priority.so && \
# flux start flux account-update-db

USER ${NB_USER}

CMD ["flux", "start", "--test-size=4", "jupyter", "lab"]
Original file line number Diff line number Diff line change
@@ -7,5 +7,4 @@

# We need to clone to the user home, and then change permissions to uid 1000
# That uid is shared by jovyan here and the spawn container
-# git clone https://github.com/rse-ops/flux-radiuss-tutorial-2023 /home/jovyan/flux-tutorial
chown -R 1000 /home/jovyan
Original file line number Diff line number Diff line change
@@ -17,7 +17,7 @@ hub:
# This is the image I built based off of jupyterhub/k8s-hub, 3.0.2 at time of writing this
image:
name: ghcr.io/flux-framework/flux-jupyter-hub
-tag: "radiuss-2024"
+tag: "hpcic-2024"
pullPolicy: Always

# https://z2jh.jupyter.org/en/latest/administrator/optimization.html#scaling-up-in-time-user-placeholders
@@ -32,22 +32,14 @@ scheduling:
singleuser:
image:
name: ghcr.io/flux-framework/flux-jupyter-spawn
-tag: "radiuss-2024"
+tag: "hpcic-2024"
pullPolicy: Always
cpu:
limit: 1
memory:
limit: '4G'
cmd: /entrypoint.sh

-# initContainers:
-# - name: init-myservice
-# image: alpine/git
-# command: ["git", "clone", "https://github.com/rse-ops/flux-radiuss-tutorial-2023", "/home/jovyan/flux-tutorial"]
-# volumeMounts:
-# - name: flux-tutorial
-# mountPath: /home/jovyan
-
# This is how we get the tutorial files added
storage:
type: none
Original file line number Diff line number Diff line change
@@ -86,7 +86,7 @@
"\n",
"# Getting started with Flux\n",
"\n",
-"The code and examples that this tutorial is based on can be found at [flux-framework/Tutorials](https://github.com/flux-framework/Tutorials/tree/master/2024-RADIUSS-AWS). You can also find python examples in the `flux-workflow-examples` directory from the sidebar navigation in this JupyterLab instance. "
+"The code and examples that this tutorial is based on can be found at [flux-framework/Tutorials](https://github.com/flux-framework/Tutorials/tree/master/2024-HPCIC-AWS). You can also find python examples in the `flux-workflow-examples` directory from the sidebar navigation in this JupyterLab instance. "
]
},
{
Original file line number Diff line number Diff line change
@@ -27,7 +27,7 @@
"\n",
"> But what do I do now?\n",
"\n",
-"Feel free to experiment more with Flux here, or (for more freedom) in the terminal. You can try more of the examples in the `flux-workflow-examples` directory in the window to the left. If you're using a shared system like the one on the RADIUSS AWS tutorial please be mindful of other users and don't run compute intensive workloads. If you're running the tutorial in a job on an HPC cluster... compute away! ⚾️\n",
+"Feel free to experiment more with Flux here, or (for more freedom) in the terminal. You can try more of the examples in the `flux-workflow-examples` directory in the window to the left. If you're using a shared system like the one on the HPCIC AWS tutorial please be mindful of other users and don't run compute intensive workloads. If you're running the tutorial in a job on an HPC cluster... compute away! ⚾️\n",
"\n",
"> Where can I learn to set this up on my own?\n",
"\n",
Original file line number Diff line number Diff line change
@@ -20,7 +20,7 @@ reader:
read_threads: 1
file_shuffle: seed
sample_shuffle: seed
-multiprocessing_context: spawn
+multiprocessing_context: fork
data_loader_classname: dyad_torch_data_loader.DyadTorchDataLoader
data_loader_sampler: index

@@ -32,4 +32,4 @@ checkpoint:
checkpoint_folder: checkpoints/unet3d
checkpoint_after_epoch: 5
epochs_between_checkpoints: 2
-model_size: 499153191
+model_size: 499153191
Original file line number Diff line number Diff line change
@@ -48,11 +48,11 @@ def __init__(self, format_type, dataset_type, epoch, num_samples, num_workers, b
self.reader = None
self.num_images_read = 0
self.batch_size = batch_size
-self.broker_per_node = 1
args = ConfigArguments.get_instance()
self.serial_args = pickle.dumps(args)
if num_workers == 0:
self.worker_init(-1)
+self.broker_per_node = 1

def worker_init(self, worker_id):
# Configure PyTorch components
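The `__init__` hunk moves the `self.broker_per_node = 1` assignment ahead of the `worker_init(-1)` call. The body of `worker_init` is not visible in this diff, so the sketch assumes it reads `self.broker_per_node`; under that assumption, the old order raises `AttributeError` whenever `num_workers == 0`. `OldOrder` and `NewOrder` are hypothetical stand-ins, not DYAD's data loader.

```python
class OldOrder:
    def __init__(self, num_workers: int):
        if num_workers == 0:
            self.worker_init(-1)  # runs before broker_per_node exists
        self.broker_per_node = 1

    def worker_init(self, worker_id: int) -> None:
        # Assumption: worker setup needs the broker count.
        self.brokers = self.broker_per_node


class NewOrder(OldOrder):
    def __init__(self, num_workers: int):
        self.broker_per_node = 1  # set first, as in the updated code
        if num_workers == 0:
            self.worker_init(-1)


if __name__ == "__main__":
    try:
        OldOrder(num_workers=0)
    except AttributeError as err:
        print("old order:", err)
    print("new order brokers:", NewOrder(num_workers=0).brokers)
```

With `num_workers > 0` neither version calls `worker_init` in the constructor, which would explain why the ordering bug only shows up in the single-process path.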
@@ -138,12 +138,10 @@ def read(self):
prefetch_factor = math.ceil(self._args.prefetch_size / self._args.read_threads)
else:
prefetch_factor = self._args.prefetch_size
-if prefetch_factor > 0:
-if self._args.my_rank == 0:
-else:
+if prefetch_factor <= 0:
prefetch_factor = 2
if self._args.my_rank == 0:
-logging.debug(f"{utcnow()} Setup dataloader with {self._args.read_threads} workers {torch.__version__}")
+logging.debug(f"{utcnow()} Setup dataloader with {self._args.read_threads} workers {torch.__version__}")
if self._args.read_threads==0:
kwargs={}
else:
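The removed lines left `if self._args.my_rank == 0:` with no body directly before `else:`, which Python rejects as a syntax error. After the fix, the logic reduces to: divide `prefetch_size` across reader threads, then fall back to 2 whenever the result is not positive. The condition guarding the `math.ceil` branch sits above the hunk and is not shown, so `read_threads > 0` below is an assumption (it at least rules out division by zero); `compute_prefetch_factor` is a hypothetical standalone function.

```python
import math


def compute_prefetch_factor(prefetch_size: int, read_threads: int) -> int:
    """Hypothetical standalone version of the fixed prefetch-factor logic."""
    if read_threads > 0:  # assumed condition; the real one is above the hunk
        prefetch_factor = math.ceil(prefetch_size / read_threads)
    else:
        prefetch_factor = prefetch_size
    # After the fix: any non-positive result falls back to 2.
    if prefetch_factor <= 0:
        prefetch_factor = 2
    return prefetch_factor


if __name__ == "__main__":
    for size, threads in [(8, 3), (0, 4), (5, 0)]:
        print(size, threads, compute_prefetch_factor(size, threads))
```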