...
```shell
conda create -y -n s3 python=3.12
conda activate s3
pip install aws-shell
pip install s3cmd
conda install -y -c conda-forge s3fs boto3 python-magic pyyaml
```
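After activating the environment, one quick way to confirm that the Python packages are importable is to look up their modules. This is a minimal sketch; note that `python-magic` is imported as `magic` and `pyyaml` as `yaml`. The helper name `check_modules` is hypothetical.

```python
import importlib.util

# Import names of the packages installed above (python-magic -> magic, pyyaml -> yaml)
MODULES = ["s3fs", "boto3", "magic", "yaml"]


def check_modules(names=MODULES):
    """Return a mapping of module name -> True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name:6s} {'ok' if found else 'MISSING'}")
```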
...
```
URL:PORT        => https://hssrv2.dmawi.de:635$PORT
region/location => bhv
ACCESS_KEY      => HPC_user$GRP
SECRET_KEY      => t1H13sOUBD/H7NuL$SECRET
CERTS_FILE      => https://spaces.awi.de/download/attachments/494210152/HSM_S3gw.cert.pem
```
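The `$PORT`, `$GRP` and `$SECRET` parts above are placeholders for the values issued to your group. Assuming they are simply appended to the fixed prefixes shown, Python's `string.Template` can fill them in once so the same values are used for every tool below. The helper name and the values in the example are made up for illustration.

```python
from string import Template

# Connection parameters as listed above; $PORT, $GRP and $SECRET are placeholders
PARAMS = {
    "endpoint_url": "https://hssrv2.dmawi.de:635$PORT",
    "region": "bhv",
    "access_key": "HPC_user$GRP",
    "secret_key": "t1H13sOUBD/H7NuL$SECRET",
}


def fill_placeholders(params, **values):
    """Substitute $NAME placeholders; raises KeyError if a value is missing."""
    return {key: Template(text).substitute(values) for key, text in params.items()}


# Hypothetical values, for illustration only
filled = fill_placeholders(PARAMS, PORT="99", GRP="_demo", SECRET="XXXX")
print(filled["endpoint_url"])  # https://hssrv2.dmawi.de:63599
```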
...
Please make sure to download the certificate file.
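A minimal sketch for fetching the certificate with the standard library and saving it under an absolute path, which is what the configuration files below expect. The destination directory default and the helper name `download_cert` are assumptions; if the page requires you to log in, download the file through the browser instead.

```python
import os
import urllib.request

CERT_URL = "https://spaces.awi.de/download/attachments/494210152/HSM_S3gw.cert.pem"


def download_cert(url=CERT_URL, dest_dir="~/Downloads"):
    """Download the certificate and return its absolute path."""
    # Expand ~ here, because the config files need a fully resolved path
    dest_dir = os.path.abspath(os.path.expanduser(dest_dir))
    os.makedirs(dest_dir, exist_ok=True)
    target = os.path.join(dest_dir, os.path.basename(url))
    urllib.request.urlretrieve(url, target)
    return target
```

The returned path is the value to put into `ca_bundle`, `ca_certs_file` or `verify` below.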
aws
Use `aws configure` to adapt the credentials for this tool, or create the following two files, `~/.aws/credentials` and `~/.aws/config`.
```ini
[default]
aws_access_key_id=HPC_user$GRP
aws_secret_access_key=t1H13sOUBD/H7NuL$SECRET
```
```ini
[default]
region = bhv
endpoint_url = https://hssrv2.dmawi.de:635$PORT
ca_bundle = /Users/pasili001/Downloads/HSM_S3gw.cert.pem
```
Adjust the `ca_bundle` path to the location of your downloaded certificate. Use an absolute path: a tilde (~) or $HOME in the path does *NOT* work.
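If you prefer to script this step, `configparser` from the standard library can write both INI files. A minimal sketch; the helper name `write_aws_files` is hypothetical, `aws_dir` defaults to `~/.aws`, and the function rejects a relative `ca_bundle` (which includes paths starting with `~` or `$HOME`) following the note above. The files are made readable only by you because they contain the secret key.

```python
import configparser
import os


def write_aws_files(access_key, secret_key, endpoint_url, ca_bundle,
                    region="bhv", aws_dir="~/.aws"):
    """Write [default] 'credentials' and 'config' files for the aws CLI."""
    # Rejects relative paths, including ones starting with ~ or $HOME
    if not os.path.isabs(ca_bundle):
        raise ValueError(f"ca_bundle must be an absolute path, got {ca_bundle!r}")
    aws_dir = os.path.expanduser(aws_dir)
    os.makedirs(aws_dir, exist_ok=True)

    files = {
        "credentials": {"aws_access_key_id": access_key,
                        "aws_secret_access_key": secret_key},
        "config": {"region": region, "endpoint_url": endpoint_url,
                   "ca_bundle": ca_bundle},
    }
    for name, values in files.items():
        # interpolation=None keeps '%' or '$' characters literal
        parser = configparser.ConfigParser(interpolation=None)
        parser["default"] = values
        path = os.path.join(aws_dir, name)
        with open(path, "w") as fid:
            parser.write(fid)
        os.chmod(path, 0o600)  # keep the secret key private
    return aws_dir
```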
Listing the buckets
```
> aws s3 ls
2024-04-06 01:11:30 testdir
> aws s3 ls s3://testdir
2024-04-06 01:11:30     385458 tmp.csv
```
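The `aws s3 ls` output is plain text: one line per bucket (date, time, name) or per object (date, time, size in bytes, key), and `PRE name/` for a common prefix. If you need to post-process it in a script, a small parser like the following may be enough. It is a sketch based on the format shown above, not an official interface; for structured output, `aws s3api list-objects-v2` returns JSON.

```python
def parse_s3_ls(text):
    """Turn `aws s3 ls` output into a list of dicts (sketch of the text format)."""
    entries = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("PRE "):
            # Common prefix ("directory"), shown as "PRE name/"
            entries.append({"type": "prefix", "name": stripped[4:]})
            continue
        fields = stripped.split(maxsplit=3)
        date = f"{fields[0]} {fields[1]}"
        if len(fields) == 3:
            # Bucket line: date, time, bucket name
            entries.append({"type": "bucket", "date": date, "name": fields[2]})
        else:
            # Object line: date, time, size in bytes, key (may contain spaces)
            entries.append({"type": "object", "date": date,
                            "size": int(fields[2]), "name": fields[3]})
    return entries
```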
...
```ini
[default]
host_base = hssrv2.dmawi.de:635$PORT
host_bucket = hssrv2.dmawi.de:635$PORT
bucket_location = bhv
access_key = HPC_user$GRP
secret_key = t1H13sOUBD/H7NuL$SECRET
use_https = Yes
# < CORRECT ME >: replace with the absolute path of your downloaded certificate
ca_certs_file = /Users/pasili001/Downloads/HSM_S3gw.cert.pem
```
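The same settings can be written from Python. A minimal sketch, assuming the default location `~/.s3cfg` (s3cmd can also be pointed at another file with `-c`); the helper name `write_s3cfg` is hypothetical. Using `interpolation=None` keeps `%` or `$` characters in the secret from being interpreted.

```python
import configparser
import os


def write_s3cfg(access_key, secret_key, host, ca_certs_file,
                location="bhv", path="~/.s3cfg"):
    """Write a minimal s3cmd configuration; host is 'hostname:port'."""
    if not os.path.isabs(ca_certs_file):
        raise ValueError("ca_certs_file must be an absolute path")
    parser = configparser.ConfigParser(interpolation=None)
    parser["default"] = {
        "host_base": host,
        "host_bucket": host,  # same value as host_base, as in the example above
        "bucket_location": location,
        "access_key": access_key,
        "secret_key": secret_key,
        "use_https": "Yes",
        "ca_certs_file": ca_certs_file,
    }
    path = os.path.expanduser(path)
    with open(path, "w") as fid:
        parser.write(fid)
    os.chmod(path, 0o600)  # the file contains the secret key
    return path
```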
Listing the buckets
```
> s3cmd ls
2024-04-06 01:11  s3://testdir
> s3cmd ls s3://testdir
2024-04-06 01:11       385458  s3://testdir/tmp.csv
```
...
```yaml
key: HPC_user$GRP
secret: t1H13sOUBD/H7NuL$SECRET
client_kwargs:
  endpoint_url: https://hssrv2.dmawi.de:635$PORT
  verify: /Users/pasili001/Documents/HSM_S3gw.cert.pem
  region_name: bhv
```
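One way to use this file is to load it with PyYAML and pass its keys to `s3fs.S3FileSystem`, whose `key`, `secret` and `client_kwargs` parameters match the file's layout. A sketch, assuming the file is saved as `~/.s3fs` (a hypothetical name; use whatever path you chose); `load_s3fs_options` and `get_filesystem` are also hypothetical helpers.

```python
import os

import yaml


def load_s3fs_options(path="~/.s3fs"):  # hypothetical file name
    """Read the YAML settings into a dict of S3FileSystem keyword arguments."""
    with open(os.path.expanduser(path)) as fid:
        return yaml.safe_load(fid)


def get_filesystem(path="~/.s3fs"):
    """Create an s3fs filesystem from the YAML settings."""
    import s3fs  # imported here so the loader above works without s3fs

    return s3fs.S3FileSystem(**load_s3fs_options(path))
```

With the environment from the top of this page, `get_filesystem().ls("testdir")` should then list the bucket contents.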
...
```yaml
service_name: s3
aws_access_key_id: HPC_user$GRP
aws_secret_access_key: t1H13sOUBD/H7NuL$SECRET
endpoint_url: https://hssrv2.dmawi.de:635$PORT
region_name: bhv
verify: /Users/pasili001/Documents/HSM_S3gw.cert.pem
```
Write a utility function to read the config file
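```python
import os

import boto3
import yaml


def get_connection():
    """Return a boto3 S3 client configured from the YAML file ~/.s3fs_boto."""
    with open(os.path.expanduser("~/.s3fs_boto")) as fid:
        credentials = yaml.safe_load(fid)
    # The YAML keys (service_name, endpoint_url, verify, ...) are passed
    # directly as keyword arguments to boto3.client()
    return boto3.client(**credentials)
```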
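YAML values are plain strings, so a `~` in the `verify` path may reach boto3 unexpanded. Given the author's note that tilde paths do not work with the aws CLI, a defensive variant can expand it before the client is created. A sketch; `load_client_kwargs` and its `path` parameter are hypothetical additions, not part of the original helper.

```python
import os

import yaml


def load_client_kwargs(path="~/.s3fs_boto"):
    """Read the YAML file and expand ~ and $VARS in the 'verify' path."""
    with open(os.path.expanduser(path)) as fid:
        kwargs = yaml.safe_load(fid)
    verify = kwargs.get("verify")
    # verify may also be a boolean; only string paths are expanded
    if isinstance(verify, str):
        kwargs["verify"] = os.path.expandvars(os.path.expanduser(verify))
    return kwargs
```

`boto3.client(**load_client_kwargs())` then returns the same kind of client as `get_connection()`.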
...
```python
>>> conn = get_connection()
>>> # Listing buckets
>>> print(conn.list_buckets())
{'Buckets': [{'CreationDate': datetime.datetime(2024, 4, 7, 15, 57, 46, 944296, tzinfo=tzoffset(None, 7200)), 'Name': 'testdir'}], 'Owner': {'DisplayName': '', 'ID': 'HPC_user$GRP'}, 'ResponseMetadata': {'HTTPHeaders': {'connection': 'close', 'content-length': '315', 'content-type': 'application/xml', 'date': 'Sun, 07 Apr 2024 21:50:03 GMT', 'server': 'VERSITYGW'}, 'HTTPStatusCode': 200, 'RetryAttempts': 0}}
>>>
>>> # Filtering the results down to just the bucket names
>>> for bucket in conn.list_buckets().get('Buckets'):
...     print(bucket['Name'])
...
testdir
>>> # Listing objects
>>> objs = conn.list_objects(Bucket='testdir')
>>> print(objs)
{'Delimiter': '', 'EncodingType': '', 'IsTruncated': False, 'Marker': '', 'MaxKeys': 1000, 'Name': 'testdir', 'NextMarker': '', 'Prefix': '', 'ResponseMetadata': {'HTTPHeaders': {'connection': 'close', 'content-length': '67702', 'content-type': 'application/xml', 'date': 'Sun, 07 Apr 2024 21:58:15 GMT', 'server': 'VERSITYGW'}, 'HTTPStatusCode': 200, 'RetryAttempts': 0}, 'Contents': [
 {'ETag': '5f0137574247761b438aa508333f487d', 'Key': 'tmp.csv', 'LastModified': datetime.datetime(2024, 4, 6, 1, 11, 30, 890787, tzinfo=tzoffset(None, 7200)), 'Size': 385458, 'StorageClass': 'STANDARD'},
 {'ETag': 'd776a1b6e8dc88615118832c552afd4c', 'Key': 'demo-airtemp/lon/0', 'LastModified': datetime.datetime(2024, 4, 7, 15, 58, 49, 37104, tzinfo=tzoffset(None, 7200)), 'Size': 118, 'StorageClass': 'STANDARD'},
 {'ETag': 'ffe3e35a2a10544db446cb5ffb64516b', 'Key': 'demo-airtemp/time/.zarray', 'LastModified': datetime.datetime(2024, 4, 7, 15, 58, 49, 410103, tzinfo=tzoffset(None, 7200)), 'Size': 319, 'StorageClass': 'STANDARD'},
 {'ETag': 'c3469e3ac4f2746bdb750335dbcd104a', 'Key': 'demo-airtemp/time/.zattrs', 'LastModified': datetime.datetime(2024, 4, 7, 15, 58, 49, 520103, tzinfo=tzoffset(None, 7200)), 'Size': 172, 'StorageClass': 'STANDARD'},
 ...
 {'ETag': '7c6e83fce9aa546ec903ca93f036a2fd', 'Key': 'demo-airtemp/time/0', 'LastModified': datetime.datetime(2024, 4, 7, 15, 58, 49, 630102, tzinfo=tzoffset(None, 7200)), 'Size': 2549, 'StorageClass': 'STANDARD'}]}
```
The output of the object listing is truncated on purpose to avoid filling up this page. Unlike the other clients, boto3 (through botocore) returns a lot of metadata about buckets and objects.
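`list_objects` returns at most 1,000 keys per call (see `'MaxKeys': 1000` and `'IsTruncated': False` in the response above). For larger buckets, boto3's paginators follow the continuation markers for you. A sketch using the `list_objects_v2` paginator; whether the gateway supports the V2 listing call is an assumption (`"list_objects"` can be passed instead), and `iter_objects` is a hypothetical helper.

```python
def iter_objects(client, bucket, prefix=""):
    """Yield (key, size) for every object under prefix, across all pages."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages without matching objects have no 'Contents' entry
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["Size"]
```

For example, with `conn = get_connection()`, the loop `for key, size in iter_objects(conn, "testdir", "demo-airtemp/"): print(key, size)` prints every object under that prefix, however many pages the listing needs.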
This is a brief introduction to S3, focused on a few client tools and how to configure them to talk to an S3 endpoint.
Additional information on this topic can be found at https://pad.gwdg.de/WH0xt_MGTkitDxP3NAM7Xw?view
A talk on this topic is also available at https://docs.gwdg.de/lib/exe/fetch.php?media=en:services:application_services:high_performance_computing:coffee:a_brief_introduction_on_ceph_s3-compatible_object_storage_at_gwdg.mp4