
HSM at AWI

The HSM provides for your data

  1. unlimited storage space and
  2. two backups on tape

However, there are three caveats:

  1. Your data is archived on tape, so retrieving it takes a while.
  2. Your personal user space (domain C) has a quota of 1 TB online, 10 TB offline, and 200,000 files. (You can check your quota with ssh hssrv2 saminfo.sh -q; see the example below.)
  3. User data will be deleted six months after a user leaves AWI. But you can move your data from the user space into the project area, where it will be stored for a much longer period (domain B) or even permanently (domain A).
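
For example, to check your quota from any AWI machine (replace <username> with your account name; this is simply the full form of the saminfo.sh call mentioned above):

  ssh <username>@hssrv2.awi.de saminfo.sh -q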
Three domains of archiving at AWI


Domain A: Permanent archive for irrecoverable data. Metadata is required for this domain. Please contact Stefanie Schumacher for more information about Pangaea.

Domain B: Long-term project data with a predefined lifetime. A project can be created with eResources (https://cloud.awi.de/#/projects). A sticky bit in the project area protects files from being deleted by anyone other than the admin or the owner. The setgid bit on directories ensures that new (sub-)directories automatically belong to the UNIX group.

Domain C: Every user gets a personal user area with their account on hssrv2. However, user data will be deleted after the user leaves AWI.
Domain   File System    Tape Archive   Disk Archive   Purpose / How to apply
A        /hs/usera      Yes            Yes            Archive of individual projects
A        /hs/usero      Yes            Yes            Pangaea
B        /hs/bsys       Yes            Yes            Biological science
B        /hs/csys       Yes            Yes            Climate science
B        /hs/gsys       Yes            Yes            Geophysics science
B        /hs/tech       Yes            Yes            Technical science
B        /hs/potsdam    Yes            No
C        /hs/userc      Yes            No             User data
C        /hs/userm      Yes            No             User data

For domain B, please use eResources (https://cloud.awi.de/#/projects) to create a project.

To apply for an HSM account (domain C) you need an AWI-Unix account first.

A disk archive is available for specific files in domains A and B. This allows fast access to offline files. The availability of this disk archive depends on the actual resources/usage and might change.

Access to hssrv2

Read Access

Files might be offline for several reasons. If you access them, they are read from tape automatically, but this takes some time. If you want to read/copy more than one file, staging is strongly recommended (see below).

The Windows Way

Connect to \\hssrv2.awi.de\ in Windows Explorer (use the right mouse button and map a network drive). You can either browse the shared directories or connect directly to, e.g., \\hssrv2.awi.de\userm\<username>.
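
If you prefer the command line, the same share can be mapped with net use (a minimal sketch; the drive letter H: is an arbitrary choice):

  net use H: \\hssrv2.awi.de\userm\<username>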

The *nix Way

All HSM file systems are shared and mounted automatically on all AWI computers. A simple ls /hs/userm/ should do.
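
For example (results.tar.gz is a hypothetical file name; if the file is offline, it is staged from tape automatically, which may take a while):

  ls /hs/userm/<username>/
  cp /hs/userm/<username>/results.tar.gz /tmp/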

Read/Write Access

You need an HSM account if you want to write data to the tape system. After you have been informed via email, you can archive your files with the following methods:

Best choice :-)

  rsync -e ssh -uvP[r] <file|dir> <username>@hssrv2.awi.de:<destination-dir>

  rsync is the most versatile way of transferring data. For example, it allows updates with the -u option, which ensures that only new files are copied (and overwritten); existing (unchanged) files are not touched. This is important to reduce tape access. Do not use -a, because this would stage all files from tape to the disk cache for a complete file comparison. When copying directories you need -r (recursive).

Fast choice :-)

  sftp/ftp

  [s]ftp provides the fastest way of transferring large amounts of data. Use your favourite ftp client. Note, however, that only two connections per user are allowed; if you request more, your connection will be terminated. sftp uses the secure ssh protocol and should be preferred.

Do not use! :-(

  scp <file|dir> <username>@hssrv2.awi.de:<destination-dir>

  scp seems convenient, but it is slightly slower than ftp and/or rsync when transferring data. It also simply overwrites existing files; no update (like rsync -u) is possible. This would also create new tape copies, which you do not want!
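
A typical upload session might look like this (a sketch; mydata/ and the destination path under /hs/userm are placeholders for your own directory and archive location):

  rsync -e ssh -uvPr mydata/ <username>@hssrv2.awi.de:/hs/userm/<username>/mydata/

Running the same command again later transfers only new or changed files, which keeps tape access to a minimum.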

Note: If you have to archive many (>100,000) small (<100 MB) files, this stresses the system more than necessary. Please zip or tar your directories and upload the compressed files, as shown below.
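
For example, to pack a directory before uploading it (mydir is a placeholder):

  tar czf mydir.tar.gz mydir/
  rsync -e ssh -uvP mydir.tar.gz <username>@hssrv2.awi.de:/hs/userm/<username>/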

Execute commands on hssrv2

Direct access (login) to hssrv2 should not be necessary (and will not be possible, unless you provide information to prove us wrong :-)). However, you can execute remote commands on hssrv2 in a restricted shell to get information about your data, e.g., to release and stage your data if necessary. Some useful commands are shown below. They are executed with: ssh <username>@hssrv2 <command>
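
A few examples, assuming the standard SAM-FS user commands (sls, stage, release) are available in the restricted shell; the exact set of allowed commands may differ:

  # show detailed file status, including archive copies and online/offline state
  ssh <username>@hssrv2.awi.de sls -D /hs/userm/<username>/myfile.tar.gz
  # stage a whole directory back from tape before copying it
  ssh <username>@hssrv2.awi.de stage -r /hs/userm/<username>/mydata
  # release the disk-cache copy of a file that is safely archived on tape
  ssh <username>@hssrv2.awi.de release /hs/userm/<username>/myfile.tar.gz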

Create an ssh-key for hssrv2

You can always read data from hssrv2 via ftp. If you want to archive data in the tape archive, you have to apply for an account. You will get an email after the account has been set up; in it you will find the path to your $HOME and your user directory. You should create an ssh-key for hssrv2. Execute these commands in a terminal (e.g., PuTTY on Windows):

  • Execute ssh-keygen -t rsa on your computer/client and just press enter three times (confirming the key location and an empty passphrase).
  • As ssh-copy-id does not work for hssrv2, you have to do this:
    • scp ~/.ssh/id_rsa.pub <username>@hssrv2.awi.de:~/.ssh/authorized_keys
    • ssh <username>@hssrv2.awi.de "chmod 700 ~/.ssh/authorized_keys"
    • Alternatively, if you want access from more than one host, append your public key instead of overwriting the file:
      scp <username>@hssrv2.awi.de:.ssh/authorized_keys <local-file>
      cat ~/.ssh/*pub >> <local-file>
      scp <local-file> <username>@hssrv2.awi.de:.ssh/authorized_keys
    • This does not work for security reasons: cat ~/.ssh/*pub | ssh <username>@hssrv2.awi.de 'umask 077; cat >> ~/.ssh/authorized_keys'

  • On other systems (e.g., ollie.awi.de) ssh-copy-id <username>@hssrv2.awi.de is easier to use (this copies all your keys to the server).
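
To verify that the key is in place, run any allowed remote command; you should not be asked for a password, e.g.:

  ssh <username>@hssrv2.awi.de saminfo.sh -q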

The $HOME is not exported/shared and has a very limited quota. It should only be used for ssh-keys.

Note: Starting with the 7.0 release of OpenSSH, support for ssh-dss keys has been disabled by default. You can re-enable support locally by updating your client ssh configuration (/etc/ssh/ssh_config, /opt/local/etc/ssh/ssh_config, or ~/.ssh/config) with:

PubkeyAcceptedKeyTypes=+ssh-dss

General Information about SamFS & HSM

Principle Idea

  • SamFS stands for (S)torage (a)rchive (m)anager (F)ile (S)ystem.
  • SamFS is a (H)ierarchical (S)torage (M)anagement system (HSM). The HSM consists of two storage layers: the cache, which speeds up access, and the hierarchy. Based on a set of rules, data is stored on the connected storage devices (tapes and possibly disks).

The Circle of Life

Archiving

  • When creating a file in SamFS (e.g., by rsync, scp, ftp) the data is stored on a fast cache system (a hard drive).
  • Depending on predefined policies (e.g., when the file has not been modified for a specific amount of time) the file is automatically archived on slower (and much cheaper) tapes.
  • A file that has just been created is online.

Releasing

  • The metadata (filename, size, ownership, permissions, etc.) of a file always stays on the cache system and remains visible, but
  • when the cache system fills up (e.g., to 90% capacity), the data of large files and of files that have not been touched for some time is released.
  • If the data of a file is released, the file is called offline.
  • At first glance, the user does not see any difference between an online and an offline file (see the example after this list).
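
An ordinary ls -l looks identical for online and offline files, because the metadata stays in the cache; the SAM-FS sls command reveals the difference (a sketch, assuming sls is available in the restricted shell as described above):

  # ordinary listing: no visible difference between online and offline
  ls -l /hs/userm/<username>/myfile.tar.gz
  # detailed SAM-FS listing: shows the archive copies and the offline state when released
  ssh <username>@hssrv2.awi.de sls -D /hs/userm/<username>/myfile.tar.gz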

Staging

  • When offline data is accessed, SamFS intercepts the call and automatically retrieves the data from the archive media, using information from the metadata to find the media.
  • In the meantime, reads from this file are blocked, so the process accessing the data blocks, too.

Recycling

  • If the content of a file changes, a new archive copy has to be produced. (You cannot modify just the relevant bits on the tape.)
  • The previous archive copy becomes useless (aside from having an additional backup of a previous version).
  • If a file is deleted, the archive copy becomes useless, too.
  • Both processes result in unused (invalid) sections on a tape. 
  • Eventually only a small part of a tape contains relevant (up-to-date) information. The remaining valid data is archived on other tapes; the old tape is erased and can be reused for future archive copies.
  • This happens by the following tasks:
    1. The recycler marks the tape and/or files with R.
    2. The next archiver run finds these files and re-archives them; the R flag of a file vanishes and a new vsn (volume serial name) is set for the copy.
    3. The recycler recognises that all files have been copied elsewhere, because they have a new unique vsn. The old tape gets the status c (candidate) and, depending on the settings in /etc/opt/SUNWsamfs/recycler.delay, an at job is scheduled for /etc/opt/SUNWsamfs/scripts/recycler.sh.

