...

| Model | User | Pro SMT | Contra SMT |
|---|---|---|---|
| idle | admin | - | Nodes need ~30% more power and get warmer: ∑ Esocket[0-7] according to lm_sensors is ~3500 kJ with SMT vs. ~2500 kJ without. |
| stress-ng stream | admin | - | ~13% slower |
| FESOM | NEC | Using 128 threads per node: 3% faster (probably because the (buggy) GPFS daemon can use a virtual core) | Using 256 threads per node: 10% slower |
| Python AI | vhelm | no impact/difference | no impact/difference |
| matlab (#SBATCH --cpus-per-task=16) | vhelm | - | Runtime: 1440 s instead of 1366 s → ~5% slower |
| gunzip 262 files of ~50 MB each in parallel (commands and timings below) | lkalesch, mthoma | - | no advantage |
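
The energy figures in the idle row can be reproduced with a small wrapper around lm_sensors. A minimal sketch, assuming the amd_energy driver is loaded and sensors prints cumulative per-socket counters as lines like "Esocket0: 123456.00 J" (the exact labels and units may differ per node):

# Sketch: sum the Esocket* counters before and after a workload to get
# the energy it consumed; assumes the counters are cumulative Joules.
read_energy() {
  sensors | awk '/^Esocket/ { sum += $2 } END { printf "%.0f\n", sum }'
}
E0=$(read_energy)
"$@"                     # run the workload passed as arguments
E1=$(read_energy)
echo "consumed $(( (E1 - E0) / 1000 )) kJ"

Saved as, say, measure.sh, "bash measure.sh sleep 3600" would give the idle energy over one hour.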
Details of the parallel gunzip benchmark: 262 files of ~50 MB each, decompressed with $P parallel jobs.

S=$(date +%s); parallel -P$P gunzip -c > /dev/null ::: /tmp/lkaleschke-hu/ ; echo "$(( $(date +%s) - $S )) sec"

Allocations:

salloc -psmp   --qos=12h --time=12:00:00 --ntasks-per-node=128
salloc -psmpht --qos=12h --time=12:00:00 --ntasks-per-node=256 --mem=249G

| $P | smp (sec) | smpht (sec) | Note |
|---|---|---|---|
| 1 | 177 | | without parallelisation |
| 6 | 33 | 33 | |
| 12 | 20 | 20 | |
| 36 | 12 | 12 | |
| 128 | 10 | 10 | |
| 256 | 9 | 9 | |
| 0 | 8 | 8 | use all cores |
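
The timings above come from a sweep over $P. A minimal sketch of such a sweep, assuming the gzipped files sit directly under /tmp/lkaleschke-hu/ (the * glob is an assumption; the command above abbreviates the argument list):

# Sweep the number of parallel gunzip jobs; -P0 makes GNU parallel start
# as many jobs as there are cores (the "use all cores" row).
for P in 1 6 12 36 128 256 0; do
  S=$(date +%s)
  parallel -P$P gunzip -c > /dev/null ::: /tmp/lkaleschke-hu/*   # assumed glob
  echo "P=$P: $(( $(date +%s) - S )) sec"
done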



GPU nodes (A40 vs. A100)

| Model | User | A40 vs. A100 |
|---|---|---|
| tensorflow-gpu AI application | vhelm | no difference |
| python3, matrix operations with numpy (fat node) vs. cupy (GPU) | sviquera | |
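
No result is recorded for the numpy/cupy row yet. A minimal sketch of such a comparison, assuming a cupy build matching the node's CUDA version is installed (the matrix size is arbitrary):

# Sketch: time an 8192x8192 float32 matrix product on CPU (numpy) and GPU (cupy).
python3 - <<'EOF'
import time
import numpy as np

n = 8192                               # arbitrary size, ~256 MB per matrix
a = np.random.rand(n, n).astype(np.float32)
t0 = time.time(); a @ a
print(f"numpy: {time.time() - t0:.2f} s")

import cupy as cp                      # assumes cupy is installed
b = cp.asarray(a)
b @ b                                  # warm-up run
cp.cuda.Device().synchronize()
t0 = time.time(); b @ b; cp.cuda.Device().synchronize()
print(f"cupy:  {time.time() - t0:.2f} s")
EOF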






...

Runtime compared with ollie




idl (user vhelm), importing the same 34269 MB of data on each filesystem:

| Metric | albedo GPFS | albedo local NVMe | ollie BeeGFS |
|---|---|---|---|
| Cumulative time for loop and if conditions | 0.32 s (3.11 %) | 0.13 s (1.40 %) | 3.48 s (12.73 %) |
| Cumulative time file open | 0.03 s (0.26 %) | 0.01 s (0.06 %) | 0.05 s (0.19 %) |
| Cumulative time data read and file close | 9.95 s (96.62 %) | 8.94 s (98.54 %) | 23.77 s (87.07 %) |
| Total cumulative time | 10.30 s | 9.07 s | 27.30 s |
| Total imported data | 34269 MB (3442 MB/s) | 34269 MB (3832 MB/s) | 34269 MB (1441 MB/s) |
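
The read rates can be cross-checked at the shell level, independent of IDL. A sketch with placeholder paths; the test file should be larger than RAM (or freshly copied) so the page cache does not inflate repeated runs:

# Sequential-read throughput per filesystem; paths are placeholders.
for f in /path/on/gpfs/testfile /tmp/testfile; do
  echo "== $f"
  dd if="$f" of=/dev/null bs=8M        # GNU dd prints the rate on completion
done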