
Hpc7a Test Cluster

Info

This recipe creates a ParallelCluster system where you can try out Amazon EC2 Hpc7a instances.

The cluster design includes the following features:

  • It is configured with four queues: one for each Hpc7a instance size.
  • Memory-aware Slurm scheduling is enabled.
  • General-purpose shared storage is available at /shared.
  • High-performance shared scratch storage (based on Amazon FSx for Lustre) is available at /fsx.
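Because memory-aware scheduling is enabled, Slurm tracks the real memory on each compute node, so jobs can request memory explicitly. A minimal sketch of what that looks like once the cluster is up (the partition and memory figure are only illustrations):

% srun --partition=q96x --mem=64G hostname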

Usage

Check your Service Quota

  1. Navigate to the AWS Service Quotas console. Change to either the us-east-2 or eu-west-1 Region, depending on where you want to launch your test cluster.
  2. Search for hpc7a and make sure your Applied quota value is sufficient to allow instance launches (see the optional CLI check after this list).
  3. If it is not, choose the Request increase at account-level option and wait for your request to be processed. Then, return to this exercise.
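You can also inspect the quota from the command line. A minimal sketch using the AWS CLI; the Region is an example, and the exact quota names may differ in your account:

% aws service-quotas list-service-quotas --service-code ec2 --region us-east-2 \
    --query "Quotas[?contains(QuotaName, 'hpc7a')].[QuotaName,Value]" --output table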

Launch the Cluster

  1. Ensure you have an Amazon EC2 SSH key pair created in the Region where you want to launch your cluster (see the CLI sketch after this list if you need to create one).
  2. Launch the template using the Launch stack link.
  3. Follow the instructions in the AWS CloudFormation console. When you configure the queue sizes (e.g. ComputeInstanceMax12), choose a value that is consistent with your service quota.
  4. Monitor the status of the AWS CloudFormation stack. When its status reaches CREATE_COMPLETE, navigate to its Outputs tab, which contains information you can use to access the new cluster.
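If you need to create a key pair, or you want to watch the stack status from a terminal instead of the console, here is a minimal AWS CLI sketch. The key pair and stack names are assumptions; substitute your own:

% aws ec2 create-key-pair --key-name KeyPair --query KeyMaterial --output text > KeyPair.pem
% chmod 400 KeyPair.pem
% aws cloudformation describe-stacks --stack-name hpc7a-test-cluster \
    --query "Stacks[0].StackStatus" --output text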

Note: This template creates a VPC and subnets associated with the cluster. If you wish to use your own networking configuration, launch your cluster using the alternative CloudFormation template.

Access the Cluster

If you want to use SSH to access the cluster, you will need its public IP address (from the Outputs tab). From your local terminal, connect like so: ssh -i KeyPair.pem ec2-user@HeadNodeIp, where KeyPair.pem is the path to the EC2 key pair you specified when launching the cluster and HeadNodeIp is the IP address from the Outputs tab. If you chose one of the Ubuntu operating systems for your cluster, the login name may be ubuntu rather than ec2-user.
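If you prefer the command line, you can also pull the head node IP from the stack outputs. This sketch assumes the output key is named HeadNodeIp and the stack is named hpc7a-test-cluster; check your Outputs tab for the exact values:

% aws cloudformation describe-stacks --stack-name hpc7a-test-cluster \
    --query "Stacks[0].Outputs[?OutputKey=='HeadNodeIp'].OutputValue" --output text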

You can also use AWS Systems Manager to access the cluster. You can follow the link found in Outputs > SystemManagerUrl. Or, you can navigate to the Instances panel in the Amazon EC2 Console. Find the instance named HeadNode - this is your cluster's access node. Select that instance, then choose Actions followed by Connect. On the Connect to instance page, navigate to Session Manager then choose Connect.
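If you have the AWS CLI and the Session Manager plugin installed locally, you can also open a session directly from a terminal. The instance ID below is a placeholder; use the HeadNode instance's actual ID:

% aws ssm start-session --target i-0123456789abcdef0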

Once you are on the system, you can inspect the queues. You will see one queue for each Hpc7a instance size.

% sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
q96x*       up   infinite     64  idle~ q96x-dy-nodes-[1-32]
q12x        up   infinite     64  idle~ q12x-dy-nodes-[1-32]
q24x        up   infinite     64  idle~ q24x-dy-nodes-[1-32]
q48x        up   infinite     64  idle~ q48x-dy-nodes-[1-32]

You can use the /shared directory for common software and data files, while the /fsx directory is well-suited for running jobs.
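To exercise a queue end to end, you might submit a small test job from /fsx. A minimal sketch of a Slurm batch script; the partition, node count, and memory request are only illustrations:

#!/bin/bash
#SBATCH --partition=q96x
#SBATCH --nodes=1
#SBATCH --mem=8G
# Runs on a dynamically launched Hpc7a node; output lands in the submission directory
hostname

Save it as /fsx/hello.sbatch and, from /fsx, submit it with sbatch hello.sbatch. Slurm will power up a node in the q96x queue, run the job, and release the node after its idle timeout.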

Cleaning Up

When you are done using your cluster, you can delete it and all its associated resources by navigating to the AWS CloudFormation console and deleting the relevant stack. Note that data on the /shared and /fsx volumes will be deleted. If you want to keep that data, find the relevant Amazon Elastic Block Store (EBS) volume and Amazon FSx for Lustre file system in the AWS console and back them up before deleting the stack.
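You can also delete the stack from the command line. A minimal sketch, assuming the stack is named hpc7a-test-cluster:

% aws cloudformation delete-stack --stack-name hpc7a-test-cluster
% aws cloudformation wait stack-delete-complete --stack-name hpc7a-test-cluster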