
Model | User | Pro SMT (smile) | Contra SMT (sad)
idle | admin | - | ∑ Esocket[0-7] according to lm_sensors: nodes need 30% more power (3500 kJ) and get warmer, compared to ~2500 kJ without SMT
stress-ng stream | admin | - | ~13% slower
FESOM | NEC | using 128 threads per node: 3% faster (probably because the (buggy) GXFS daemon can use a virtual core) | using 256 threads per node: 10% slower
Python AI | vhelm | no impact/difference | no impact/difference
matlab (#SBATCH --cpus-per-task=16) | vhelm | | runtime 1440 s instead of 1366 s → ~5% slower
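
The idle-row energy numbers sum the per-socket energy counters over the measurement interval; a hypothetical way to sample them (only the Esocket label names are taken from the cell above, everything else is an assumption and driver/label names vary between systems):

# Hypothetical sampling of the summed socket energy counters, ∑ Esocket[0-7].
# Assumes `sensors -u` prints a label line "Esocket0:".."Esocket7:" followed by
# a line whose second field is the energy value in Joules; treat as a sketch.
read_kj() { sensors -u | awk '/^Esocket[0-7]:/ {grab=1; next} grab {sum+=$2; grab=0} END {printf "%d\n", sum/1000}'; }
E0=$(read_kj); sleep 3600; E1=$(read_kj)   # integrate over one idle hour
echo "idle energy: $(( E1 - E0 )) kJ"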
gunzip, decompressing 262 files of ~50 MB each in parallel | lkalesch, mthoma

S=$(date +%s); parallel -P$P gunzip -c > /dev/null ::: /tmp/input/*  ; echo "$(( $(date +%s) - $S )) sec"

Allocations used for the two partitions:
salloc -psmp   --qos=12h --time=12:00:00 --ntasks-per-node=128
salloc -psmpht --qos=12h --time=12:00:00 --ntasks-per-node=256 --mem=249G
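
For reproducibility, a minimal driver sketch around the one-liner above; the $P values mirror the tables below and the input path comes from the one-liner, everything else is an assumption:

#!/bin/bash
# Hypothetical sweep over the parallelism values benchmarked below.
# Assumes the 262 gzipped input files are already staged in /tmp/input.
for P in 1 2 6 12 36 128 256 0; do        # -P0 lets GNU parallel use all cores
  S=$(date +%s)
  parallel -P"$P" 'gunzip -c {} > /dev/null' ::: /tmp/input/*
  echo "P=$P: $(( $(date +%s) - S )) sec"
done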

If a user only requests --ntasks-per-node=1 (the default) and uses P=0 (use all available cores), the result is:

$P | smp (sec) | smpht (sec)
0 | 175 | 113

But please note: this is improper use of Slurm/HPC (the job uses far more cores than it requested), so this "advantage" does not justify SMT.

If Slurm is used properly (cores requested to match $P), there is no advantage; a matching request is sketched after the table below.

$P | smp (sec) | smpht (sec) | Note
1 | 175 | 175 | without parallelization
2 | 88 | |
6 | 33 | 33 |
12 | 20 | 20 |
36 | 12 | 12 |
128 | 10 | 10 |
256 | 9 | 9 |
0 | 8 | 8 | use all cores
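
A request where the allocated cores match $P might look like this (partition and QOS names are copied from the salloc lines above; the value 36 is just an example):

# Hypothetical matching request: ask Slurm for exactly the cores that
# GNU parallel will use, instead of oversubscribing a one-task job.
salloc -psmp --qos=12h --time=12:00:00 --ntasks-per-node=36
parallel -P36 'gunzip -c {} > /dev/null' ::: /tmp/input/*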



GPU nodes (A40 vs. A100)

Model | User | A40 vs. A100
tensorflow-gpu AI application | vhelm | no difference
python3, matrix operations with numpy (fat) vs. cupy (gpu) | sviquera |






...



Application | User | albedo: node internal /tmp (NVMe) | albedo: 100 Gb Infiniband /albedo (GPFS) | albedo: 10 Gb Ethernet /isibhv (NVMe) | ollie: node internal /tmp (SSD) | ollie: 100 Gb Omnipath /work (BeeGFS) | ollie: 10 Gb Ethernet /isibhv (NVMe)

idl: reading 244 data files | vhelm | ~9 sec | 10~13 sec | 8~11 sec (spikes up to 181 sec) | 27~29 sec | 27~37 sec | 29~60 sec (spikes up to 98 sec)
ls -f (directory with 30000 entries) | | 0.08 sec | 0.04 sec | 0.03 sec | 0.1 sec | 0.2 sec | 0.08 sec
ls, default with stat/color (same directory) | | 0.19 sec | 6~15 sec / 0.3 sec | 0.2 sec | 0.4 sec | 1.6 sec | 0.3~0.7 sec
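
The listing times above can be reproduced with something like the following; the directory path is a placeholder, not one of the measured file systems:

# Hypothetical reproduction of the listing benchmark: the default-style ls
# stats every entry (for color/metadata), while ls -f prints raw directory
# order without sorting or per-entry stat calls, which is why it is faster.
cd /path/to/dir/with/30000/entries        # placeholder path
time ls --color=always > /dev/null        # default-style listing with stat/color
time ls -f > /dev/null                    # no sort, no per-entry stat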








  • ...

Runtime compared with ollie


Model | User | albedo GPFS | albedo local NVMe | ollie BeeGFS
idl | vhelm | per-metric breakdown below (columns: albedo GPFS | albedo local NVMe | ollie BeeGFS)

Cumulative time for loop and if conditions: 0.32 s (3.11 %) | 0.13 s (1.40 %) | 3.48 s (12.73 %)
Cumulative time file open: 0.03 s (0.26 %) | 0.01 s (0.06 %) | 0.05 s (0.19 %)
Cumulative time data read and file close: 9.95 s (96.62 %) | 8.94 s (98.54 %) | 23.77 s (87.07 %)
Total cumulative time: 10.30 s | 9.07 s | 27.30 s
Total amount of imported data (MB): 34269 (3442 MB/s) | 34269 (3832 MB/s) | 34269 (1441 MB/s)
tensorflow, AI inference with a trained model | vhelm | 4 times slower (~10 minutes)