This recipe creates a ParallelCluster system where you can try out Amazon EC2 Hpc7a instances.
The cluster design includes the following features:
- It is configured with four queues: one for each Hpc7a instance size.
- Memory-aware Slurm scheduling is enabled.
- General-purpose shared storage is available at `/shared`.
- High-performance shared scratch storage (based on Amazon FSx for Lustre) is available at `/fsx`.
- Navigate to the AWS Service Quotas console. Change to either the us-east-2 or eu-west-1 Region, depending on where you want to launch your test cluster.
- Search for hpc7a and make sure your Applied quota value is sufficient to allow instance launches.
- If it is not, choose the Request increase at account-level option and wait for your request to be processed. Then, return to this exercise.
- Ensure you have an Amazon EC2 SSH key created in the Region where you want to launch your cluster.
- Launch the template:
- Follow the instructions in the AWS CloudFormation console. When you configure the queue sizes (e.g. `ComputeInstanceMax12`), choose a value that is consistent with your service quota.
- Monitor the status of the AWS CloudFormation stack. When its status reaches `CREATE_COMPLETE`, navigate to its Outputs tab. There you will find information you can use to access the new cluster.
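The service quota check from the setup steps above can also be done from the command line. This is a sketch assuming AWS CLI v2 with working credentials; the `HPC` name filter is an assumption and may need adjusting to match the exact name of your quota:

```shell
# List applied EC2 service quotas whose names mention HPC, in the target Region
aws service-quotas list-service-quotas \
    --service-code ec2 --region us-east-2 \
    --query "Quotas[?contains(QuotaName, 'HPC')].[QuotaName,Value]" \
    --output table
```

Swap `us-east-2` for `eu-west-1` if that is where you plan to launch the cluster.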
Note: This template creates a VPC and subnets associated with the cluster. If you wish to use your own networking configuration, launch your cluster using the alternative CloudFormation template.
If you want to use SSH to access the cluster, you will need its public IP address (from the stack outputs above). Using your local terminal, connect via SSH like so: `ssh -i KeyPair.pem ec2-user@HeadNodeIp`, where `KeyPair.pem` is the path to the EC2 key pair you specified when launching the cluster and `HeadNodeIp` is the IP address from above. If you chose one of the Ubuntu operating systems for your cluster, the login name may be `ubuntu` rather than `ec2-user`.
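If you connect often, the same details can go in an SSH config entry. This is a sketch; substitute the IP address and key path from your own stack, and use `ubuntu` as the user on Ubuntu-based clusters:

```
# ~/.ssh/config entry (host alias "hpc7a-cluster" is an arbitrary choice)
Host hpc7a-cluster
    HostName HeadNodeIp
    User ec2-user
    IdentityFile ~/KeyPair.pem
```

After that, `ssh hpc7a-cluster` is enough to reach the head node.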
You can also use AWS Systems Manager to access the cluster. You can follow the link found in Outputs > SystemManagerUrl. Or, you can navigate to the Instances panel in the Amazon EC2 Console. Find the instance named HeadNode - this is your cluster's access node. Select that instance, then choose Actions followed by Connect. On the Connect to instance page, navigate to Session Manager then choose Connect.
Once you are on the system, you can inspect the queues. You will see one queue for each Hpc7a instance size.
```
% sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
q96x*        up   infinite     64  idle~ q96x-dy-nodes-[1-32]
q12x         up   infinite     64  idle~ q12x-dy-nodes-[1-32]
q24x         up   infinite     64  idle~ q24x-dy-nodes-[1-32]
q48x         up   infinite     64  idle~ q48x-dy-nodes-[1-32]
```
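To confirm that the queues work, you can run a trivial job in one of them. This is a sketch; the `q24x` queue and node count are arbitrary choices, and the first launch takes several minutes while the EC2 instances boot:

```shell
# Ask Slurm for 2 nodes from the q24x partition and print their hostnames
srun --partition=q24x --nodes=2 hostname
```

The `idle~` state in the `sinfo` output means those nodes are powered down until a job like this causes ParallelCluster to launch them.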
You can use the `/shared` directory for common software and data files, while the `/fsx` directory is well-suited for running jobs.
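For example, a batch job might load software installed under `/shared` while keeping its working data on `/fsx`. This is a sketch; the application paths and names are hypothetical:

```shell
#!/bin/bash
#SBATCH --partition=q96x
#SBATCH --nodes=2
#SBATCH --exclusive
# Hypothetical software tree, installed once on the shared volume
source /shared/apps/myapp/env.sh
# Fast Lustre scratch space for job input and output
cd /fsx/$USER/run-001
srun myapp --input input.dat --output results.dat
```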
When you are done using your cluster, you can delete it and all its associated resources by navigating to the AWS CloudFormation console and deleting the relevant stack. Note that data on the `/shared` and `/fsx` volumes will be deleted. If you want to keep it, find the relevant Amazon Elastic Block Store and FSx for Lustre volumes in the AWS console and back them up.