    Submitting Jobs to Sariyer Cluster

    In order to use the Sariyer cluster, you should ssh to sariyer.uhem.itu.edu.tr.


    Quick Start Manual

    To use the Sariyer cluster, one needs to log in to sariyer.uhem.itu.edu.tr (via ssh or x2go). Jobs cannot be run on this login node. Instead, a SLURM script should be prepared and the job should be submitted to the queue. SLURM will then try to allocate compute node(s) according to the resources requested in the script, and start your job if such nodes are already available. If the resources are not available, SLURM will queue the job and schedule it to run as soon as they become available.

    At a glance, these are the steps involved in using Sariyer (a sample terminal session is sketched right after this list):

    1. Establish a VPN connection and keep it on during your session (VPN Connection to UHeM),
    2. Upload any files you may need to run your job to sariyer.uhem.itu.edu.tr, using scp/WinScp, etc.,
    3. Establish an ssh connection (e.g., using Putty) to sariyer.uhem.itu.edu.tr and execute the following commands/tasks there (Accessing the Login Servers via ssh):
      1. Prepare a SLURM Job Script (Job Script):
        1. Select a project. The projelerim command shows your projects,
        2. Choose a queue. The bosmakinalar command shows the available queues and their current availability,
        3. Load the necessary module(s) if any,
        4. Prepare a SLURM script, using nano or any other text editor,
      2. Submit the job with the sbatch command (Submitting jobs),
      3. Check the job(s) status with the isler command,
    4. Transfer (if needed) output files from sariyer.uhem.itu.edu.tr to your local machine using Scp/WinScp,
    5. Close the ssh/Putty connection,
    6. Close the VPN connection.
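
    The following is a minimal sketch of such a terminal session; the script name (is.sh) is a placeholder and the module shown is just an example taken from the scripts later in this page, so substitute your own values:

    $ ssh your_username@sariyer.uhem.itu.edu.tr   # requires an active VPN connection
    $ projelerim                                  # list your projects (account names)
    $ bosmakinalar                                # check the queues and their availability
    $ module load mpi/openmpi-x86_64              # load the module(s) your program needs
    $ nano is.sh                                  # prepare the SLURM script
    $ sbatch is.sh                                # submit the job to the queue
    $ isler                                       # check the status of your jobs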

    Once your job is submitted and accepted by the queue, no matter what the status of the job is (PENDING, RUNNING, etc.), the ssh and VPN connections can be closed. Your job will start whenever the required resources become available and any output will be saved to a file by SLURM. You can check your jobs/outputs any time you like, by connecting to VPN again and logging in to sariyer via ssh.

    Sarıyer cluster hardware information

    Compute node hardware:

    Node Name             Processor Information                       Cores  Memory  Specialized Hardware        Queues Defined on These Nodes
    sariyer (login node)  Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz   28     128GB   N/A                         Jobs cannot be run on the login node
    s001 - s080           Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz   28     128GB   N/A                         defq (default queue, 7 days), shortq (1 hour)
    f001 - f003           Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz   28     128GB   Nvidia GP-GPU (Tesla K20m)  gpuq (7 days)
    f004 - f013           Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz   28     512GB   N/A                         bigmemq (7 days)
    s101 - s115           Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz   24     64GB    N/A                         longq (21 days)
    s201 - s235           Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz    40     192GB   N/A                         core40q (7 days)

    All computers in the Sariyer cluster run the CentOS 7 Linux operating system.

    Disk System

    The disk structure of the Sarıyer cluster is different from the one on the Karadeniz machine. On Sariyer, there are no separate home directories in the form of /AKDENIZ and /RS. Also, because Sariyer's disk is completely separate from the disks mounted on the older systems, whenever you need to access files located on another cluster, you will have to copy them. For instance, if on the Sariyer system you need to work on files that actually reside on the older systems (i.e., Karadeniz, Anadolu, Ege, etc.), you will have to copy/transfer those files to the sariyer machine (via scp, sftp, etc.).

    Sariyer’s disk is named /okyanus and /okyanus/progs is the directory where you will find some of the most widely used programs at UHeM.

    Directories such as /RS/progs and /RS/progs2 that were present in the older systems, were carried to Sariyer as is. There is no guarantee that any of the programs in these directories will work properly, so try not to use them unless really needed. If you need to use them, please make sure they produce meaningful results in terms of accuracy and timing.

    The module files in the old systems have also been copied to Sariyer. In order to access modules from the Karadeniz and Maslak machines, you need to load the modules eski-moduller-karadeniz and eski-moduller-maslak, respectively.

    Many of the modules needed to run programs that were natively compiled for the Sariyer cluster can be seen without loading any modules (module avail will do the job). However, there are many more modules that can be seen only after loading additional modules such as ilave-moduller, ilave-moduller2, etc. (i.e., module load ilave-moduller, etc.).
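
    A minimal sketch of how this looks on the command line (ilave-moduller and mpi/openmpi-x86_64 are taken from this page; check module avail for the actual names on the system):

    $ module avail                       # list the modules visible by default
    $ module load ilave-moduller         # make the additional module set visible
    $ module avail                       # the list now also includes the extra modules
    $ module load mpi/openmpi-x86_64     # load a module needed by your program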

    Queuing System

    A queuing system is essential for managing a computer cluster and ensuring its efficient use. It is software that queues, prioritizes, and monitors the computing jobs submitted by the users. It prevents different user jobs from competing for the same resources (and from crashing when those resources are exhausted by other jobs) by scheduling particular resources (such as compute nodes) for each job, based on the requirements specified by the users.

    Sariyer uses SLURM as a queuing system.

    [Figure: Cluster structure]

    Users cannot access the compute nodes directly. Instead, they connect to a machine called the login node, which serves as a means to submit their computing jobs to the compute nodes, using SLURM directives. If there are enough empty processors (and other required resources, such as RAM) in the system, the job starts immediately. If the required resources are not available, the job is held in the queue until the requested resources become available.

    The computer serving as the login node for the Sariyer cluster is called sariyer. Once you have established a VPN Connection, you can reach this machine via ssh (ssh your_username@sariyer.uhem.itu.edu.tr).


    Available Queues and Usage Stats

    To see the available SLURM queues and their current usage stats, you can either use the standard SLURM commands, or our script bosmakinalar. A sample output of this script looks like this:

    $ bosmakinalar
    
          KUYRUK	ISLEMCI	ISLEMCI	BEKLYEN	MAKINA	MAKINA	ENFAZLA-SURE	ENFAZLA
             ADI	   BOS	  TUMU	 ISTEK	   BOS	  TUMU	GUN-SA:DA:SN	MAKINA/IS
      ==========	=======	=======	=======	======	======	=============	==========
            defq	  392	 1176	  224	   14	   42	  5-00:00:00	SINIRSIZ
          shortq	   56	  196	    0	    2	    7	    01:00:00	SINIRSIZ
           longq	  196	  560	    0	    7	   20	 21-00:00:00	SINIRSIZ
            gpuq	   56	   84	    0	    2	    3	  3-00:00:00	SINIRSIZ
         bigmemq	    0	  112	    0	    0	    4	  7-00:00:00	SINIRSIZ
            mixq	  168	  364	    0	    6	   13	  5-00:00:00	SINIRSIZ
    

    Here the columns provide the following information for the given queue:

    • Column 1: available queues
    • Column 2: number of cores ready for immediate allocation
    • Column 3: total number of cores
    • Column 4: total number of cores requested by the pending jobs
    • Column 5: nodes ready for immediate use
    • Column 6: total number of nodes assigned to the queue
    • Column 7: maximum duration (wall time) a job may run in this queue, in Days-Hours:Minutes:Seconds format
    • Column 8: maximum number of nodes per job
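
    The standard SLURM command that provides similar (though less condensed) information is sinfo; a minimal sketch:

    $ sinfo                      # partitions, availability, time limits, node counts and states
    $ sinfo -p shortq            # restrict the output to a single partition (queue)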

    defq (default queue) is the queue where your job will be submitted if you do not specify a queue name in your SLURM script. This queue is usually the best choice for ordinary jobs, as it is defined on more compute nodes compared to other queues.

    shortq is for quick tests (max 1 hour), and gpuq is intended for jobs that require a GPGPU. bigmemq is intended for jobs requiring a very large amount of RAM (512 GB available on each node). The use of this queue is subject to an extra x1.5 charge (that is, each hour of use is charged as 1.5 hours to your account). Also, keep in mind that a job that does not need much memory will run somewhat slower than normal on a node with a large amount of RAM. Therefore, you are advised not to use bigmemq unless it is really needed. Nodes that are not assigned to bigmemq have 128 GB of RAM, which is usually more than enough for most jobs.
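
    A queue is selected in the job script with the #SBATCH -p (partition) directive, as the more detailed script below also shows; for instance (which line you use depends on your job):

    #SBATCH -p shortq      # quick test, at most 1 hour
    #SBATCH -p gpuq        # job needs a GPGPU (combine with --gres=gpu:1)
    #SBATCH -p bigmemq     # job needs more than 128 GB of RAM (x1.5 charge)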

    A Simple SLURM Script

    Although SLURM is different from the LSF queuing system (which is the one UHeM is using on the lnode cluster), the general logic is the same. Therefore, the structure of a SLURM script is very similar to that of an LSF one. The parameters given by #BSUB in LSF are given by #SBATCH in SLURM. In its simplest form, a job file looks like the following. Of course, there are many additional parameters for different needs. You can find this script in /okyanus/progs/slurm_betikleri/basit.sh:

    #!/bin/bash
    #SBATCH -A hsaat                # account / project name
    #SBATCH -n 4                    # number of cores 
    
    module load mpi/openmpi-x86_64
    
    mpirun ./your_executable.x
    

    In this example, a queue (partition) has not been specified; therefore the job will be submitted to the default queue (defq). The program output will be written to a file named slurm-XXX.out, where XXX stands for the job ID.

    A More Complicated SLURM Script

    #!/bin/bash
    
    #SBATCH -J "text here"     # job name 
    
    #SBATCH -A hsaat                         # account (project name)
    #SBATCH -p bigmemq                       # partition (queue) name
    
    #SBATCH -o slurm.%j.out                  # output file: %j will be replaced with the job ID. This parameter is optional.
    #SBATCH -e slurm.%j.err                  # error file: %j will be replaced with the job ID. This parameter is optional.
    
    #SBATCH -n 4                                  # number of cores
    #SBATCH -N 1                                  # number of nodes
    #SBATCH -t 0-3:00                             # run the job for 3 hours (D-HH:MM)
                                               
    #SBATCH --mail-type=END,FAIL                  # send an e-mail when the job ends or fails
    #SBATCH --mail-user=ahmetmc@itu.edu.tr        # e-mail address
    
    #SBATCH --checkpoint=0-0:45                   # checkpoint frequency (every 45 min)
    #SBATCH --checkpoint-dir=/okyanus/users/ali   # where to save checkpoint file
    
    #SBATCH --gres=gpu:1                          # extra resources (need 1 gpu)
    #SBATCH --exclude=m014                        # a list of nodes to be excluded
    
    
    module load mpi/openmpi-x86_64
    
    mpirun ./your_executable.x
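
    Both scripts above are MPI examples. For a single-node, multithreaded (non-MPI) program, a minimal sketch could look as follows; the account name and the executable are placeholders:

    #!/bin/bash
    #SBATCH -A hsaat                 # account (project name)
    #SBATCH -N 1                     # one node
    #SBATCH -n 1                     # one task
    #SBATCH -c 28                    # 28 cores for that task (a defq node has 28 cores)
    #SBATCH -t 0-1:00                # 1 hour (D-HH:MM)

    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # use all requested cores

    ./your_threaded_executable.x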
    

    Submitting a Job and Checking its Status

    Once you have written the job script, it can be submitted to SLURM using the sbatch command:

     $ sbatch your_script.sh
    Submitted batch job 182
    

    You can check the status of the submitted job using the squeue command.

     $ squeue -u your_username
    

    Typing just squeue (without the -u parameter) will print information about all users' jobs.

    Alternatively, you can use our "isler" script to get information about the status of your jobs:

     $ isler
    Tue Feb  2 16:08:09 2016
                 JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
                   182      defq basit.sh    ahmet  RUNNING       0:32 5-00:00:00      1 s001
    

    The same job, if submitted under another SLURM account (with the -A parameter) that has run out of CPU time, will still be accepted by SLURM, but its status will be listed as PENDING. Under the "REASON" column, the explanation AssocGrpCPUMinsLimit will appear, notifying you that there is no CPU time left in that particular account.

     $ isler
    Tue Feb  2 16:09:09 2016
                 JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
                   183      defq basit.sh    ahmet  PENDING       0:00 5-00:00:00      1 (AssocGrpCPUMinsLimit)
    


    After a job is completed, its status (such as COMPLETED, CANCELED, or FAILED) is updated in a few seconds:

     $ isler
           JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
    ------------ ---------- ---------- ---------- ---------- ---------- --------
    181            basit.sh       defq      hsaat          4 CANCELLED+      0:0
    181.batch         batch                 hsaat          4  CANCELLED     0:15
    182            basit.sh       defq      hsaat          4    RUNNING      0:0
    183            basit.sh       defq      t0111          4    PENDING      0:0
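
    The same accounting information can also be obtained with the standard SLURM command sacct; a minimal sketch (the job ID and date are illustrative):

    $ sacct -j 182                                       # accounting record of a single job
    $ sacct -u your_username --starttime 2016-02-01      # all of your jobs since a given date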
    

    Checking the CPU and Memory Usage of Running Jobs

    It is important to check whether your job is running efficiently (or as intended). One way of doing this is to check the load on every node you use by connecting to it via ssh. This is not very practical for jobs using many nodes, but our "isler" script can do it for you if you pass the Job ID to it. For example:

    $ isler 1500
    | MAKINA  YUK      HAFIZA  SWAP | MAKINA  YUK      HAFIZA  SWAP | MAKINA  YUK      HAFIZA  SWAP |
    |-------------------------------|-------------------------------|-------------------------------|
    | s001  27/28     21/3/128  0/4 | s002  26/28     19/3/128  0/4 | s003  26/28     19/3/128  0/4 |
    |-------------------------------|-------------------------------|-------------------------------|
    | s004  26/28     19/3/128  0/4 | s005  26/28     19/3/128  0/4 | s006  26/28     19/3/128  0/4 |
    |-------------------------------|-------------------------------|-------------------------------|
    | s007  26/28     19/3/128  0/4 | s008  26/28     19/3/128  0/4 | s009  26/28     19/3/128  0/4 |
    |-------------------------------|-------------------------------|-------------------------------|
    | s010  27/28     19/3/128  0/4 | s011  26/28     19/3/128  0/4 | s012  26/28     19/3/128  0/4 |
    |-------------------------------|-------------------------------|-------------------------------|
    | s013  26/28     19/3/128  0/4 | s014  26/28     19/3/128  0/4 | s015  26/28     19/3/128  0/4 |
    |-------------------------------|-------------------------------|-------------------------------|
    | s016  26/28     19/3/128  0/4 | s017  25/28     19/3/128  0/4 | s018  26/28     19/3/128  0/4 |
    |-------------------------------|-------------------------------|-------------------------------|
    | s019  25/28     18/3/128  0/4 |
    |-------------------------------|-------------------------------|-------------------------------|
    

    Here the MAKINA column lists the node names, the YUK column shows the load and the number of cores on that node, the HAFIZA column lists the memory information (requested memory in GB / actual usage in GB / total RAM installed on the node in GB), and the SWAP column shows whether your job has run out of memory and started using swap (used swap in GB / total swap in GB).

    "Requested memory" in the above output is the sum of Virtual Memory Size values ("VIRT" column in the Unix "top" command, or "VSZ" in the "ps" command"). This information is important, since programs can run into trouble when they request a big amount of memory and fail to get it.


    If your job is running on a node containing a GPU, the output of the "isler" script will also list the name of the GPU card and its load and memory usage percentages:

    $ isler 1504
    | MAKINA  YUK      HAFIZA  SWAP | MAKINA  YUK      HAFIZA  SWAP | MAKINA  YUK      HAFIZA  SWAP |
    |-------------------------------|-------------------------------|-------------------------------|
    | f001  28/28    93/47/128  0/4 | f001 Tesla-K20m Yuk% 0 Hfz% 0 |
    |-------------------------------|-------------------------------|-------------------------------|
    

    Here, the first column lists the load, memory, and swap information for the compute node (f001 in this example), and the second column lists the load and memory usage of the GPU card (Tesla-K20m here).

    Ending a Running Job

    If you want to end a job that is currently running (or queued), you can use the scancel command with the job ID as a parameter:

     $ scancel  182
    

    scancel will end job 182 without printing anything on the screen. This job will show as CANCELLED+ in the output of the sacct command.
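
    scancel can also operate on more than one job at a time; for example:

    $ scancel 182 183            # cancel several jobs by ID
    $ scancel -u your_username   # cancel all of your own jobs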

    Logging in to the Compute Node where a Job is Running

    Job 182 in the above examples is labeled as RUNNING. One may want to access the actual node(s) where this job is running for the purpose of debugging, checking the memory or CPU usage, etc. It is possible to do this via ssh. To obtain a list of the nodes where a job is running, one can use the squeue command (e.g., squeue -l -u your_username).
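
    A minimal sketch (the node name s001 is read from the NODELIST column of the squeue output and will differ for your job):

    $ squeue -l -u your_username         # the NODELIST column shows the allocated node(s)
    $ ssh s001                           # log in to one of those nodes
    $ top                                # inspect CPU and memory usage, then quit with 'q'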

    Please note that you can access the relevant compute nodes only while your job is running. If you try to log in to a node where you are not currently running a job, you will get the following message:

    $ ssh s006
    Access denied: user ahmet (uid=14) has no active jobs on this node.
    Connection closed by 10.12.73.6
    


    Starting an Interactive Job

    Sometimes you may need to run a job interactively on the machine you are logged in to. However, this is discouraged on the login (sariyer) node, in order to prevent a system crash that would affect all users. Any user process using considerable CPU or memory resources on the login node will be killed automatically.

    Instead, for testing or debugging purposes you can start an interactive job on a compute node from the command line using the srun command. Any queue can be used for this; however, if all compute nodes are busy at the moment, it usually makes sense to use the shortq queue for this purpose.

    [ahmet@sariyer ~ ]$ srun -A hsaat -N 1 -n 28 -p shortq --pty bash -i
    srun: job 8475 queued and waiting for resources
    srun: job 8475 has been allocated resources
    [ahmet@f002 ~ 8475]$
    

    As seen above, we started a job with the srun command while on the login (sariyer) node, but now we are on node f002. This machine is ours until we exit the bash shell that we started (using the exit command or pressing Ctrl-C), or until the time limit of the queue is reached. Please note that even if you are not actually running any programs on the compute node, your account will be charged as long as the node is allocated to you. While a node is allocated to you, you will see the JobID assigned by SLURM on the command line (8475 in this example).
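
    srun can also be used to run a single command on a compute node without opening an interactive shell; a minimal sketch, with the account and queue taken from the example above:

    $ srun -A hsaat -p shortq -N 1 -n 4 hostname    # runs 'hostname' in 4 tasks on a compute node and returns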


    Accessing Sariyer via a Graphical User Interface

    In order to provide easy graphical access to Sariyer, x2go has been installed. This software alleviates the problem of extremely slow graphics typical of opening X11 windows over ssh connections. However, establishing a VPN connection prior to using x2go is still required. In order to use x2go, the client program (http://wiki.x2go.org/doku.php/download:start) should be installed on the computer being used to connect to Sariyer.

    Using Singularity and Docker Containers

    Docker containers, which are very popular today, are unfortunately not very suitable for the HPC environment. One of the few software packages that address the problems targeted by Docker while bringing container technology to compute (i.e., HPC) clusters is Singularity. Thanks to this software package, Docker containers can be used by converting them to the Singularity container format in an automated way.


    Singularity is installed on the Sariyer cluster. In addition, many prepackaged containers, such as:

    • ubuntu-16.04
    • ubuntu-18.04
    • ubuntu-18.10
    • centos5-vault
    • centos-6
    • openfoam.com-v1812
    • wgd (A software suite for processing genome data)
    • bowtie2

    are readily available in "/okyanus/progs/Containers" on our Sariyer system.

    If you want to use a container other than these, there are two options: you can either send us a link to the container you want to use and let us convert it into a usable form, or you can install Singularity on your own Linux computer, prepare your own container, and then transfer it to our system for future use.
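
    As an illustration, a prepackaged image could be run along the following lines. This is only a sketch: the exact image file names (and whether Singularity is provided as a module) should be checked on the system first.

    $ module load singularity                         # assumption: Singularity may be provided as a module
    $ ls /okyanus/progs/Containers                    # see the actual image file names
    $ singularity exec /okyanus/progs/Containers/ubuntu-18.04.simg cat /etc/os-release   # image name is illustrative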