...

| Model | User | Pro SMT | Contra SMT |
|---|---|---|---|
| idle | admin | - | Nodes need ~30% more power and get warmer: ∑ Esocket[0-7] according to lm_sensors is ~3500 kJ with SMT vs. ~2500 kJ without. |
| stress-ng stream | admin | - | ~13% slower |
| FESOM | NEC | Using 128 threads per node: 3% faster (probably because the (buggy) GPFS daemon can use a virtual core) | Using 256 threads per node: 10% slower |
| Python AI | vhelm | no impact/difference | no impact/difference |
| matlab (#SBATCH --cpus-per-task=16) | vhelm | - | Runtime: 1440 s instead of 1366 s → ~5% slower |
| gunzip 262 files of ~50 MB each in parallel (commands and timings below) | lkalesch, mthoma | - | no advantage |
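
The energy figures in the idle row can be reproduced with a small wrapper around lm_sensors. A minimal sketch, assuming the amd_energy driver is loaded and sensors prints cumulative per-socket counters as lines like "Esocket0: 123456.00 J" (the exact labels and units may differ per node):

# Sketch: sum the Esocket* counters before and after a workload to get
# the energy it consumed; assumes the counters are cumulative Joules.
read_energy() {
  sensors | awk '/^Esocket/ { sum += $2 } END { printf "%.0f\n", sum }'
}
E0=$(read_energy)
"$@"                     # run the workload passed as arguments
E1=$(read_energy)
echo "consumed $(( (E1 - E0) / 1000 )) kJ"

Saved as, say, measure.sh, "bash measure.sh sleep 3600" would give the idle energy over one hour.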
Details of the parallel gunzip benchmark: 262 files of ~50 MB each, decompressed with $P parallel jobs.

S=$(date +%s); parallel -P$P gunzip -c > /dev/null ::: /tmp/lkaleschke-hu/ ; echo "$(( $(date +%s) - $S )) sec"

Allocations:

salloc -psmp   --qos=12h --time=12:00:00 --ntasks-per-node=128
salloc -psmpht --qos=12h --time=12:00:00 --ntasks-per-node=256 --mem=249G

| $P | smp (sec) | smpht (sec) | Note |
|---|---|---|---|
| 1 | 177 | | without parallelisation |
| 6 | 33 | 33 | |
| 12 | 20 | 20 | |
| 36 | 12 | 12 | |
| 128 | 10 | 10 | |
| 256 | 9 | 9 | |
| 0 | 8 | 8 | use all cores |
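
The timings above come from a sweep over $P. A minimal sketch of such a sweep, assuming the gzipped files sit directly under /tmp/lkaleschke-hu/ (the * glob is an assumption; the command above abbreviates the argument list):

# Sweep the number of parallel gunzip jobs; -P0 makes GNU parallel start
# as many jobs as there are cores (the "use all cores" row).
for P in 1 6 12 36 128 256 0; do
  S=$(date +%s)
  parallel -P$P gunzip -c > /dev/null ::: /tmp/lkaleschke-hu/*   # assumed glob
  echo "P=$P: $(( $(date +%s) - S )) sec"
done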



GPU nodes (A40 vs. A100)

| Model | User | A40 vs. A100 |
|---|---|---|
| tensorflow-gpu AI application | vhelm | no difference |
| python3, matrix operations with numpy (fat node) vs. cupy (GPU) | sviquera | |
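
No result is recorded for the numpy/cupy row yet. A minimal sketch of such a comparison, assuming a cupy build matching the node's CUDA version is installed (the matrix size is arbitrary):

# Sketch: time an 8192x8192 float32 matrix product on CPU (numpy) and GPU (cupy).
python3 - <<'EOF'
import time
import numpy as np

n = 8192                               # arbitrary size, ~256 MB per matrix
a = np.random.rand(n, n).astype(np.float32)
t0 = time.time(); a @ a
print(f"numpy: {time.time() - t0:.2f} s")

import cupy as cp                      # assumes cupy is installed
b = cp.asarray(a)
b @ b                                  # warm-up run
cp.cuda.Device().synchronize()
t0 = time.time(); b @ b; cp.cuda.Device().synchronize()
print(f"cupy:  {time.time() - t0:.2f} s")
EOF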






...

Runtime compared with ollie




idl (user vhelm), importing the same 34269 MB of data on each filesystem:

| Metric | albedo GPFS | albedo local NVMe | ollie BeeGFS |
|---|---|---|---|
| Cumulative time for loop and if conditions | 0.32 s (3.11 %) | 0.13 s (1.40 %) | 3.48 s (12.73 %) |
| Cumulative time file open | 0.03 s (0.26 %) | 0.01 s (0.06 %) | 0.05 s (0.19 %) |
| Cumulative time data read and file close | 9.95 s (96.62 %) | 8.94 s (98.54 %) | 23.77 s (87.07 %) |
| Total cumulative time | 10.30 s | 9.07 s | 27.30 s |
| Total imported data | 34269 MB (3442 MB/s) | 34269 MB (3832 MB/s) | 34269 MB (1441 MB/s) |
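
The read rates can be cross-checked at the shell level, independent of IDL. A sketch with placeholder paths; the test file should be larger than RAM (or freshly copied) so the page cache does not inflate repeated runs:

# Sequential-read throughput per filesystem; paths are placeholders.
for f in /path/on/gpfs/testfile /tmp/testfile; do
  echo "== $f"
  dd if="$f" of=/dev/null bs=8M        # GNU dd prints the rate on completion
done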