...

Code Block
languagebash
titlesoftware stack
linenumberstrue
conda create -y -n s3 python=3.12
conda activate s3
pip install aws-shell 
pip install s3cmd
conda install -y -c conda-forge s3fs boto3 python-magic pyyaml

...

Code Block
languagetext
titlecredentials
linenumberstrue
URL:PORT          =>  https://hssrv2.dmawi.de:$PORT
region/location   =>  bhv
ACCESS_KEY        =>  $GRP
SECRET_KEY        =>  $SECRET
CERTS_FILE        =>  https://spaces.awi.de/download/attachments/494210152/HSM_S3gw.cert.pem

...

Please make sure to download the certificate file.

aws

Use  aws configure  to adapt the credentials to this tool, or create the following files.

Code Block
languagetext
title~/.aws/credentials
linenumberstrue
[default]
aws_access_key_id=$GRP
aws_secret_access_key=$SECRET


Code Block
languagetext
title~/.aws/config
linenumberstrue
[default]
region = bhv
endpoint_url = https://hssrv2.dmawi.de:$PORT
ca_bundle = /Users/pasili001/Downloads/HSM_S3gw.cert.pem

Listing the buckets

Code Block
languagebash
titlelistings
linenumberstrue
# note: using tilde (~) or $HOME in the ca_bundle path does *NOT* work
> aws s3 ls
2024-04-06 01:11:30 testdir
> aws s3 ls s3://testdir
2024-04-06 01:11:30     385458 tmp.csv


s3cmd

s3cmd is a free command line tool and client for uploading, retrieving and managing data in Amazon S3 and other cloud storage service providers that use the S3 protocol.

s3cmd looks for credentials at ${HOME}/.s3cfg

create the config file as follows

Code Block
languagetext
title~/.s3cfg
linenumberstrue
[default]
host_base   = hssrv2.dmawi.de:$PORT
host_bucket = hssrv2.dmawi.de:$PORT
bucket_location = bhv
access_key = $GRP
secret_key = $SECRET
use_https = Yes
ca_certs_file = HSM_S3gw.cert.pem

Listing the buckets

Code Block
languagebash
titlelistings
linenumberstrue
> s3cmd ls
2024-04-06 01:11  s3://testdir
> s3cmd ls s3://testdir
2024-04-06 01:11       385458  s3://testdir/tmp.csv

upload a directory

Code Block
languagebash
titlesync
linenumberstrue
> s3cmd sync --stats demo-airtemp/ s3://testdir/demo-airtemp/
Done. Uploaded 5569414 bytes in 62.8 seconds, 86.61 KB/s.
Stats: Number of files transferred: 306 (5569414 bytes)

> s3cmd ls s3://testdir/demo-airtemp
                          DIR  s3://testdir/demo-airtemp/

>  s3cmd ls s3://testdir/demo-airtemp/
                          DIR  s3://testdir/demo-airtemp/air/
                          DIR  s3://testdir/demo-airtemp/lat/
                          DIR  s3://testdir/demo-airtemp/lon/
                          DIR  s3://testdir/demo-airtemp/time/
2024-04-07 15:57          307  s3://testdir/demo-airtemp/.zattrs
2024-04-07 15:57           24  s3://testdir/demo-airtemp/.zgroup
2024-04-07 15:57         3969  s3://testdir/demo-airtemp/.zmetadata

...

Code Block
languagetext
title~/.s3fs
linenumberstrue
key: $GRP
secret: $SECRET
client_kwargs:
  endpoint_url: https://hssrv2.dmawi.de:$PORT
  verify: /Users/pasili001/Documents/HSM_S3gw.cert.pem
  region_name: bhv
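
The YAML file above can be loaded in Python and passed straight to s3fs. A minimal sketch, assuming the file lives at ~/.s3fs as shown (the helper name load_s3fs_config is our own):

```python
import os
import yaml  # pyyaml, installed into the conda environment above


def load_s3fs_config(path="~/.s3fs"):
    """Read the YAML credentials file and return S3FileSystem keyword arguments."""
    with open(os.path.expanduser(path)) as fid:
        return yaml.safe_load(fid)


# usage sketch:
#   import s3fs
#   fs = s3fs.S3FileSystem(**load_s3fs_config())
#   fs.ls("testdir")
```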

...

check out their API for function signatures and their documentation for more examples.

Botocore - low level interface

  • Botocore is a low-level interface to a growing number of Amazon Web Services.
  • Botocore serves as the foundation for the AWS-CLI command line utilities.
  • Sort-of oriented towards library builders

Save the credentials as follows (you are free to choose a convenient file name and format)

Code Block
languagetext
title~/.s3fs_boto
linenumberstrue
service_name: s3
aws_access_key_id: $GRP
aws_secret_access_key: $SECRET
endpoint_url: https://hssrv2.dmawi.de:$PORT
region_name: bhv
verify: HSM_S3gw.cert.pem

Write a utility function to read the config file

Code Block
languagepy
titleload credentials
linenumberstrue
import os
import yaml
import boto3
    
def get_connection():
    with open(os.path.expanduser("~/.s3fs_boto")) as fid:
        credentials = yaml.safe_load(fid)
    return boto3.client(**credentials)

Listing buckets and objects

Code Block
languagepy
titlelistings
linenumberstrue
>>> conn = get_connection()
>>> # Listing buckets
>>> print(conn.list_buckets())
{'Buckets': [{'CreationDate': datetime.datetime(2024, 4, 7, 15, 57, 46, 944296, tzinfo=tzoffset(None, 7200)),
              'Name': 'testdir'}],
 'Owner': {'DisplayName': '', 'ID': '$GRP'},
 'ResponseMetadata': {'HTTPHeaders': {'connection': 'close',
                                      'content-length': '315',
                                      'content-type': 'application/xml',
                                      'date': 'Sun, 07 Apr 2024 21:50:03 GMT',
                                      'server': 'VERSITYGW'},
                      'HTTPStatusCode': 200,
                      'RetryAttempts': 0}}
>>>
>>> # filtering down the results just to show the bucket names
>>> for bucket in conn.list_buckets().get('Buckets'):
...    print(bucket['Name'])
...
'testdir'
>>> # Listing objects
>>> objs = conn.list_objects(Bucket='testdir')
>>> print(objs)
{'Delimiter': '',
 'EncodingType': '',
 'IsTruncated': False,
 'Marker': '',
 'MaxKeys': 1000,
 'Name': 'testdir',
 'NextMarker': '',
 'Prefix': '',
 'ResponseMetadata': {'HTTPHeaders': {'connection': 'close',
                                      'content-length': '67702',
                                      'content-type': 'application/xml',
                                      'date': 'Sun, 07 Apr 2024 21:58:15 GMT',
                                      'server': 'VERSITYGW'},
                      'HTTPStatusCode': 200,
                      'RetryAttempts': 0},
'Contents': [{'ETag': '5f0137574247761b438aa508333f487d',
  'Key': 'tmp.csv',
  'LastModified': datetime.datetime(2024, 4, 6, 1, 11, 30, 890787, tzinfo=tzoffset(None, 7200)),
  'Size': 385458,
  'StorageClass': 'STANDARD'},
 {'ETag': 'd776a1b6e8dc88615118832c552afd4c',
  'Key': 'demo-airtemp/lon/0',
  'LastModified': datetime.datetime(2024, 4, 7, 15, 58, 49, 37104, tzinfo=tzoffset(None, 7200)),
  'Size': 118,
  'StorageClass': 'STANDARD'},
 {'ETag': 'ffe3e35a2a10544db446cb5ffb64516b',
  'Key': 'demo-airtemp/time/.zarray',
  'LastModified': datetime.datetime(2024, 4, 7, 15, 58, 49, 410103, tzinfo=tzoffset(None, 7200)),
  'Size': 319,
  'StorageClass': 'STANDARD'},
 {'ETag': 'c3469e3ac4f2746bdb750335dbcd104a',
  'Key': 'demo-airtemp/time/.zattrs',
  'LastModified': datetime.datetime(2024, 4, 7, 15, 58, 49, 520103, tzinfo=tzoffset(None, 7200)),
  'Size': 172,
  'StorageClass': 'STANDARD'},
  ...
  ...
 {'ETag': '7c6e83fce9aa546ec903ca93f036a2fd',
  'Key': 'demo-airtemp/time/0',
  'LastModified': datetime.datetime(2024, 4, 7, 15, 58, 49, 630102, tzinfo=tzoffset(None, 7200)),
  'Size': 2549,
  'StorageClass': 'STANDARD'}]}
 

The output for listing the objects is truncated on purpose to avoid filling up this page. Unlike the other clients, botocore exposes a lot of metadata about buckets and objects.
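
As the MaxKeys field above shows, list_objects returns at most 1000 entries per call; when IsTruncated is true, the listing has to be continued from a marker. A minimal sketch of such a pagination loop (the helper name iter_objects is our own, built on the list_objects call shown above):

```python
def iter_objects(conn, bucket):
    """Yield every object dict in `bucket`, following the pagination markers."""
    marker = ""
    while True:
        resp = conn.list_objects(Bucket=bucket, Marker=marker)
        contents = resp.get("Contents", [])
        yield from contents
        if not resp.get("IsTruncated") or not contents:
            break
        # NextMarker is only returned when a Delimiter is set;
        # otherwise fall back to the last key of this page
        marker = resp.get("NextMarker") or contents[-1]["Key"]
```

boto3 clients also offer get_paginator('list_objects'), which does the same bookkeeping internally.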

This is a brief introduction to S3, with a focus on some tools and how to configure them in order to talk to S3.

Additional information related to this topic can be found at https://pad.gwdg.de/WH0xt_MGTkitDxP3NAM7Xw?view

A talk on this topic is also available at https://docs.gwdg.de/lib/exe/fetch.php?media=en:services:application_services:high_performance_computing:coffee:a_brief_introduction_on_ceph_s3-compatible_object_storage_at_gwdg.mp4