Yuegroup

Hardware

  • Head node: yuegroup.cac.cornell.edu.
  • Access modes: ssh
  • OpenHPC 3 with Rocky Linux 9.3
  • 4 compute nodes (c0001-c0004). Each node has dual 64-core AMD EPYC 7713 processors, 1 TB of RAM, and 4 NVIDIA A100 GPUs.
  • Hyperthreading is enabled on all nodes, i.e., each physical core is considered to consist of two logical CPUs
  • Interconnect is 10 Gbps ethernet
  • Submit HELP requests: use the help page OR send an email to CAC Support; please include Yuegroup in the subject line.
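
For example, a typical connection to the head node from your own machine looks like this; <CAC username> is a placeholder for your CAC account name:

ssh <CAC username>@yuegroup.cac.cornell.edu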

File Systems

  • 82 TB in /home

Home Directories

  • Path: ~

    User home directories are located on an NFS export from the head node. Use your home directory (~) for archiving the data you wish to keep. Data in users' home directories is NOT backed up.

Scheduler/Partitions

  • The cluster scheduler is Slurm.
  • Partitions are: sy593_0001 (nodes c0001-c0004) and eht45_0001 (node c0005).
  • There are no time limits.
  • See the Slurm documentation page for details; the Slurm Quick Start guide is a great place to start. See the Requesting GPUs section for information on how to request GPUs on compute nodes for your jobs; an example batch script follows this list.

    1. --gres=gpu:3g.20gb:<number of MIG devices> to request MIG devices. The job will land on c0004.
    2. --gres=gpu:a100:<number of GPUs> to request entire A100 GPUs. The job will land on node c0001, c0002, or c0003.
    3. --gres=gpu:a6000:<number of GPUs> to request A6000 GPUs. The job will land on c0005.
    4. Interactive login: srun --pty --gres=gpu:a100:<number of GPUs to request, up to 4> /bin/bash
    5. Submit a batch job: sbatch --gres=gpu:a100:<number of GPUs to request, up to 4> <job script>
  • Remember, hyperthreading is enabled on the cluster, so Slurm considers each physical core to consist of two logical CPUs.
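
As an illustration, a minimal batch script requesting two full A100 GPUs might look like the following. The job name, CPU count, and commands are placeholders, not site requirements; adjust them for your own work.

#!/bin/bash
#SBATCH --job-name=a100-test          # placeholder job name
#SBATCH --partition=sy593_0001        # partition containing nodes c0001-c0004
#SBATCH --gres=gpu:a100:2             # request two entire A100 GPUs
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16            # logical CPUs (two per physical core with hyperthreading)

nvidia-smi                            # confirm which GPUs were allocated to the job

Submit the script with sbatch <job script>; for an interactive session, use the srun form shown in item 4 above.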

Software

Work with Environment Modules

Set up the working environment for each package using the module command. The module command will activate dependent modules if there are any.

To show currently loaded modules (these are loaded by the default system configuration):

-bash-4.2$ module list

To show all available modules:

-bash-4.2$ module avail
Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

Manage Modules in Your Python Virtual Environment

Using Anaconda

To use Anaconda for the first time:

-bash-4.2$ source /opt/ohpc/pub/utils/anaconda/2024.02/bin/activate 
-bash-4.2$ conda init

Log out and log back in. The "base" environment should be automatically loaded.
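
To confirm that the setup worked, you can list your conda environments; the active one (base, at this point) is marked with an asterisk in the output:

-bash-4.2$ conda env list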

Users can create their own environment like this:

-bash-4.2$ conda create -n <name of the environment> <packages to be installed>
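
For example, to create and use an environment with a specific Python version and a few packages (the environment name and package list here are only examples):

-bash-4.2$ conda create -n myenv python=3.11 numpy scipy
-bash-4.2$ conda activate myenv
-bash-4.2$ conda install matplotlib       # add more packages later as needed
-bash-4.2$ conda deactivate               # return to the base environment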

Help

  • Submit HELP requests: use the help page OR send an email to CAC Support; please include Yuegroup in the subject line.