Suggestions/good to know

  1. Storing millions of files in one directory slows down any filesystem. We therefore suggest limiting the number of files whenever possible (by design, workflow, zip, tar, ...).
  2. We suggest limiting the size of each individual file to a maximum of 500 GB, e.g., with
    tar -cPf - $INDIRECTORY | pigz -c | split -a3 -d -b500GB - $OUTFILE.
    (A sketch for unpacking such a split archive follows after this list.)
  3. If you access (read) more than one file from the HSM, please stage all files you plan to access beforehand, at once! This reduces the necessary robotic tape accesses to a minimum and thus decreases the overall access time significantly. However, please limit yourself to accessing only about a dozen TB at once.
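
To unpack such a split archive later, the parts can be concatenated and streamed back through pigz and tar. This is only a sketch, assuming the parts were produced by the split command above (the exact part names depend on the prefix you chose):

Code Block
languagebash
# reassemble the numbered parts, decompress and unpack in one stream;
# add -P to tar if you want to restore the original absolute paths
cat ${OUTFILE}* | pigz -dc | tar -xf -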

...

File Storage (POSIX, smb, nfs)

...

Connect to \\hssrv2.dmawi.de\ within the Windows Explorer (right-click and choose "Map network drive"). You can either browse the shared directories or connect directly to, e.g., \\hssrv2.dmawi.de\projects\<project>.

The Linux Way

...

nfs

...

The HSM file systems are shared and mounted automatically on most Linux servers and clients (but not on laptops). A simple ls /hs/projects/ should do. If a directory is missing, please contact hsm-support@awi.de

cifs/smb

On laptops you can mount HSM directories via smb. You can use any file manager (thunar, nautilus, ...) or the command line:

Code Block
languagebash
sudo mount    -t cifs -o username=$USER@dmawi.de          //hssrv2/Projects          /mnt  # short form
sudo mount -v -t cifs -o username=$USER@dmawi.de,vers=3.0 //hssrv2.dmawi.de/Projects /mnt # long form


# "sometimes" these work, too:
# kinit $USER
# sudo mount -t cifs -o sec=krb5 //hssrv2a.dmawi.de/Projects /mnt # short form
# sudo mount -t cifs -o username=$USER,domain=DMAWI,workgroup=DMAWI.DE,password=$PW,sec=krb5,vers=3.11 //hssrv2b.dmawi.de/Projects /mnt  # long form
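
To detach the share again, unmount it. A minimal example, assuming it was mounted on /mnt as above:

Code Block
languagebash
sudo umount /mnt  # unmount the cifs share again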

Read/Write Access

The HSM system is shared/exported/mounted read-only. To write data into the tape archive, use one of the following options:

Suggestion: (big grin) Best choice
Command: rsync -e ssh -Pauv <file|dir> <username>@hssrv2.dmawi.de:<destination-dir>
Important Notes: rsync is the most versatile way of transferring data. E.g., it allows updates with the -u option. This ensures that only new files are copied (and overwritten); existing (unchanged) files are not touched. This is important to avoid tape waste. Note: You do not want to use -c, because this would stage all files from tape to the disk cache for a complete file comparison. When copying directories you need -r (recursive, already included in -a).

Suggestion: (big grin) Fast choice
Command: sftp, filezilla
Important Notes: sftp provides a fast way of transferring large amounts of data. Use your favorite ftp client. However, note that only two connections per user (per HSM server) are allowed. If you request more, your connection will be terminated. sftp uses the secure ssh protocol and should be preferred. Use port 22 for sftp. (A short example session is shown after this table.)

Suggestion: (minus) Do not use!
Command: scp <file|dir> <username>@hssrv2.dmawi.de:<destination-dir>
Important Notes: scp seems convenient, but it is slightly slower than ftp and/or rsync when transferring data. It also simply overwrites existing files; no update (like rsync -u) is possible. This would also create new tape copies, and you do not want to do that!
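
For the sftp option above, a session could look like the following sketch. The destination directory is a placeholder and must already exist in your project share:

Code Block
languagebash
sftp -P 22 <username>@hssrv2.dmawi.de            # port 22; at most two connections per user
sftp> put mydata.tar.gz /hs/projects/<project>/  # upload a single (pre-packed) file
sftp> bye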

Note: If you have to archive many (>100 000) small (<100 MB) files, this will stress the system more than necessary. Please zip or tar[.gz] your directories and upload these compressed files.
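
A minimal sketch for packing such a directory and uploading the archive with rsync, following the suggestions above (the directory and destination names are placeholders):

Code Block
languagebash
# pack the directory with many small files into one compressed archive
tar -czf mydata.tar.gz mydata/
# upload it; -u ensures repeated runs do not overwrite unchanged files already on tape
rsync -e ssh -Pauv mydata.tar.gz <username>@hssrv2.dmawi.de:<destination-dir>/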

Example: Copy a lot of data from HSM to HPC

Assuming you want to copy a lot of data from "/hs/platforms/WORM/aircraft/polar6/macs/exdata/P6-244_ANT_23_24_2311300801/20231130-141433_[Record All]/111498_RGB" (note the special characters, which you should avoid at all costs (warning) )

  1. Check how many files are online:
    Code Block
    languagebash
    ssh hssrv2 "saminfo.sh -Cc -f '/hs/platforms/WORM/aircraft/polar6/macs/exdata/P6-244_ANT_23_24_2311300801/20231130-141433_[Record All]/111498_RGB'"
    Checking >/hs/platforms/WORM/aircraft/polar6/macs/exdata/P6-244_ANT_23_24_2311300801/20231130-141433_[Record All]/111498_RGB<
       Total files:   ... 12083
       Online files:  ... 7637
       Staging files: ... 4299
       Size:          ... 857G
     
  2. Stage your data (only needed once; in the output shown above, staging had already been triggered some time ago): 
    Code Block
    languagebash
    ssh hssrv2 "saminfo.sh -Cstage  -f '/hs/platforms/WORM/aircraft/polar6/macs/exdata/P6-244_ANT_23_24_2311300801/20231130-141433_[Record All]/111498_RGB'"
  3. You can start copying already staged files whenever you want:
    Code Block
    languagebash
    ssh hssrv2 "samcli file find /hs/platforms/WORM/aircraft/polar6/macs/exdata/P6-244_ANT_23_24_2311300801/20231130-141433_\[Record\ All\]/111498_RGB/ --online " > /tmp/files-online.txt
    rsync -Pauv --no-R --chmod 664 --files-from=/tmp/files-online.txt hssrv2:/ .
  4. You can repeat step 3 whenever you feel it is necessary (a loop that automates this is sketched after this list), and finally copy all remaining files (once they are online) with

    Code Block
    languagebash
    rsync -Pauv --chmod 664 hssrv2:/hs/platforms/WORM/aircraft/polar6/macs/exdata/P6-244_ANT_23_24_2311300801/20231130-141433_\[Record\ All\]/111498_RGB/ .
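
If you prefer, the repetition of step 3 can be wrapped in a simple loop. This is only a sketch using the commands from above; the one-hour interval is arbitrary, and rsync -u makes repeated rounds skip files copied earlier. Stop it with Ctrl-C once everything is online and run the final rsync from step 4:

Code Block
languagebash
SRC='/hs/platforms/WORM/aircraft/polar6/macs/exdata/P6-244_ANT_23_24_2311300801/20231130-141433_\[Record\ All\]/111498_RGB/'
while true; do
    # list the files that are currently online and copy only those
    ssh hssrv2 "samcli file find $SRC --online" > /tmp/files-online.txt
    rsync -Pauv --no-R --chmod 664 --files-from=/tmp/files-online.txt hssrv2:/ .
    sleep 3600   # wait an hour before checking for newly staged files again
done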


Object Storage (S3)

  • ScoutAM provides an S3 gateway to access data.
  • Currently this is in the testing phase.
  • We consider/plan to provide S3 storage via eResources in the future (speculation: Q4/2024).
  • Only TLS connections are possible. Please use this certificate: HSM_S3gw.cert.pem (a client configuration sketch follows below).
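
As an illustration only: many S3 clients let you point to a custom endpoint and CA certificate. The sketch below uses the AWS CLI; the endpoint, port, and bucket names are placeholders, not documented values:

Code Block
languagebash
# list a bucket on the HSM S3 gateway (placeholder endpoint and bucket)
aws s3 ls s3://<bucket> \
    --endpoint-url https://<hsm-s3-gateway>:<port> \
    --ca-bundle ./HSM_S3gw.cert.pem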