# ISS-AF HOWTO and usage documentation
The ISS Analysis Facility (ISS-AF) is a small cluster dedicated to local research activities (it is not part of the GRID).
## Cluster information

### Compute resources
Per-node specifications:
```
HOSTNAMES  PARTITION  S:C:T    CPUS  MEMORY(MB)  WEIGHT
issaf-0-0  CLUSTER    2:32:2    128      515000     100
issaf-0-1  CLUSTER    2:32:2    128      515000     100
issaf-0-2  CLUSTER    1:64:2    128      515000      50
issaf-0-3  CLUSTER    1:64:2    128      515000      50
issaf-0-4  CLUSTER    1:64:2    128      515000      50
issaf-0-5  CLUSTER    1:64:2    128      515000      50
issaf-0-6  CLUSTER    2:128:2   512     2097152      10
issaf-0-7  CLUSTER    2:144:2   576     2310755      10
```
Total resources:
```
Partition   State  #Nodes  #CPU_cores  Job_Nodes    MaxJobTime  Cores  Mem/Node
Name               Total   Total       Min   Max    Day-hr:mn   /node  (GB)
CLUSTER:*   up     8       1856        1            2-00:00     128    515+
MPI         up     8       1856        1     infin  1-12:00     128    515+
GPU         up     1       512         1     infin  1-12:00     512    2097
```
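The tables above can be regenerated at any time on the front-end with standard Slurm commands (the exact output layout depends on the `sinfo` options used):

```bash
# Per-partition summary: state, node count, time limit
sinfo -s

# One line per node, long format: sockets:cores:threads, memory, weight
sinfo -N -l

# Full details for a single node
scontrol show node issaf-0-0
```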
The issaf-0-6 node has two NVIDIA H200 GPUs and is associated with the GPU partition:
```
HOSTNAMES  PARTITION  S:C:T    CPUS  MEMORY(MB)  WEIGHT
issaf-0-6  GPU        2:128:2   512     2097152      10
```
Output of `nvidia-smi` on issaf-0-6:

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.71.05              Driver Version: 595.71.05      CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H200 NVL                Off |   00000000:D1:00.0 Off |                    0 |
| N/A   31C    P0             73W /  600W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H200 NVL                Off |   00000000:F1:00.0 Off |                    0 |
| N/A   32C    P0             74W /  600W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
```

GPU interconnect topology (`nvidia-smi topo -m`):

```
        GPU0   GPU1   CPU Affinity      NUMA Affinity  GPU NUMA ID
GPU0     X     NV18   224-231,480-487   28             N/A
GPU1    NV18    X     128-135,384-391   16             N/A
```
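GPU jobs should be submitted to the GPU partition. Below is a minimal batch-script sketch; it assumes the GPUs are exposed as a `gres/gpu` resource (the GRES name, the resource numbers, and `my_gpu_program` are placeholders, not cluster-confirmed values):

```bash
#!/bin/bash
#SBATCH --job-name=gpu-test          # placeholder job name
#SBATCH --partition=GPU
#SBATCH --gres=gpu:1                 # assumes the GRES is named "gpu"
#SBATCH --cpus-per-task=8            # placeholder resource numbers
#SBATCH --mem=64G
#SBATCH --time=1-00:00:00            # must stay below the 1-12:00 GPU limit

# Check which GPU(s) were allocated to the job
nvidia-smi

# Placeholder for the actual GPU application
srun ./my_gpu_program
```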
### Storage resources
```
Filesystem               Size  Mounted on
/dev/md126p1             1.8T  /export      <-- /home, RAID1 NVMe
/dev/md123               1.8T  /software    <-- RAID1 HDD
/dev/md120p1              66T  /data        <-- RAID6, 8x HDD
/dev/md121p1             200T  /data2       <-- RAID6, 12x HDD
/dev/md119p1             200T  /data3       <-- RAID6, 12x HDD
/dev/md124p1             200T  /data4       <-- RAID6, 12x HDD
10.99.99.200:/storage01  200T  /data5       <-- RAID6, 12x HDD (NFS)
10.99.99.200:/storage02  200T  /data6       <-- RAID6, 12x HDD (NFS)
10.99.99.200:/storage03  200T  /data7       <-- RAID6, 12x HDD (NFS)
```
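Current usage of the shared areas can be checked from the front-end, for example:

```bash
# Human-readable size and usage for the shared data areas
df -h /data /data2 /data3 /data4 /data5 /data6 /data7
```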
## Resource usage

### Compute resources
> **Warning**
> No compute loads or long-running processes should be run on the front-end node.
> The front-end is both the login and the service node, so heavy local load can degrade the cluster for all users.

Slurm scheduling is currently of the "first come, first served" type; it is planned to be converted to fair-share.
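Instead of running directly on the front-end, wrap any compute load in a Slurm batch job. A minimal sketch (job name, resource numbers, and the `my_analysis` command are placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=myanalysis        # placeholder job name
#SBATCH --partition=CLUSTER          # default partition (see table above)
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8            # placeholder resource numbers
#SBATCH --mem=16G
#SBATCH --time=1-00:00:00            # must stay below the 2-00:00 CLUSTER limit

srun ./my_analysis                   # placeholder for the real workload
```

Submit with `sbatch job.sh` and monitor with `squeue -u $USER`.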
### Storage resources
- `/home` should be used for small personal files and small analysis codes
- `/software` is dedicated to software frameworks (locally compiled)
- each user has a symbolic link `$HOME/data` pointing to `/data/data_${USER}`, a much larger storage area (see the check sketched below)
- for requirements over 1 TB, a request should be sent to the administrator for the creation of a personal directory in `/data{2..7}`; institute management approval is required

For data usage of more than 1 TB, approval from institute management is required. The first step is to send an e-mail to the institute director with cc to issaf AT spacescience DOT ro; more information will follow in subsequent e-mails.
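As a quick sanity check of the layout described above, you can verify where the per-user symlink points and how much space you are using (paths follow the convention described in this section):

```bash
# Show where the per-user data symlink points
ls -ld "$HOME/data"

# Summarize current usage of the personal data area
du -sh "/data/data_${USER}"
```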