Note: a trailing forward slash (`/`) matters, both when listing objects and when transferring files (e.g. with `sync`) to S3.
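S3 has no real directories. Keys live in a flat namespace, and "folders" are emulated by matching a key prefix and grouping on `/`. The sketch below is a pure-Python illustration, not s3fs code. It mimics the `Prefix`/`Delimiter` behaviour of S3's ListObjectsV2 call on an in-memory list of hypothetical keys, to show why `testdir/demo` and `testdir/demo/` select different objects.

```python
def list_level(keys, prefix, delimiter="/"):
    """Mimic one level of an S3 listing with Prefix and Delimiter.

    Returns (objects, common_prefixes). Illustration only, not s3fs code.
    """
    objects, common_prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue  # S3 matches the prefix character by character
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter collapses into one "folder"
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical keys, for illustration only
keys = [
    "testdir/demo-airtemp/.zattrs",
    "testdir/demo-airtemp/air/0.0.0",
    "testdir/demo/readme.txt",
]

# Without the trailing slash, "demo-airtemp/" matches as well
print(list_level(keys, "testdir/demo"))
# → ([], ['testdir/demo-airtemp/', 'testdir/demo/'])

# With the trailing slash, only keys inside "demo/" match
print(list_level(keys, "testdir/demo/"))
# → (['testdir/demo/readme.txt'], [])
```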

s3fs

  • `s3fs` is a Python library to talk to S3.
  • It builds on top of `botocore`.
  • It uses `fsspec` to present S3 through a file-system-like interface.

It provides the following:

  • `s3fs.S3FileSystem` for file-system operations (ls, remove, du, ...)
  • `s3fs.S3Map` for Python dictionary-like access (key → value)
  • `s3fs.S3File` for file-like objects (read, write, seek, ...)
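The dictionary-like interface is handy for formats stored as many small objects. The `testdir/demo-airtemp` directory listed further down appears to be a Zarr (v2) store, whose metadata lives in keys such as `.zattrs` and `<array>/.zarray`. The helper `describe_zarr_group` below is hypothetical. It accepts any mapping of keys to bytes, so it should work with a mapper returned by `fs.get_mapper("testdir/demo-airtemp")` (with `fs` from `get_fs()`) as well as with a plain `dict` used for testing.

```python
import json
from collections.abc import Mapping


def describe_zarr_group(store: Mapping) -> dict:
    """Summarise a Zarr v2 group held in a key -> bytes mapping.

    Hypothetical helper: `store` may be a plain dict or an s3fs mapper.
    """
    # Group attributes are a JSON document stored under ".zattrs"
    attrs = json.loads(store[".zattrs"]) if ".zattrs" in store else {}
    # Each array in the group has its own "<name>/.zarray" metadata key
    arrays = sorted(
        key.split("/")[0]
        for key in store
        if key.count("/") == 1 and key.endswith("/.zarray")
    )
    return {"attrs": attrs, "arrays": arrays}
```

Iterating over an S3-backed mapper lists every key under the root, so on large stores this is one full listing request, not a free operation.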

The `~/.s3fs` file used here is not read by s3fs itself: it is loaded by your own code (see the utility function below) and its contents are passed as keyword arguments to `s3fs.S3FileSystem`. You are therefore free to choose the file name and to store your credentials in YAML, JSON or any other format that is convenient to read and load. The credentials are shown in YAML here because it is a bit more reader-friendly.

~/.s3fs

```yaml
key: HPC_user
secret: <your-secret-key>
client_kwargs:
  endpoint_url: https://hssrv2.dmawi.de:635
  verify: /Users/pasili001/Documents/HSM_S3gw.cert.pem
  region_name: bhv
```
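Because this file holds a secret key, it is worth making sure that only you can read it. A minimal sketch for POSIX systems (Linux, macOS): the helper name `is_private` is hypothetical, and it simply checks that no group or other permission bits are set on the file.

```python
import os
import stat


def is_private(path: str) -> bool:
    """Return True if group and others have no access to `path` (POSIX only).

    Hypothetical helper for checking a credentials file such as ~/.s3fs.
    """
    mode = os.stat(os.path.expanduser(path)).st_mode
    # S_IRWXG / S_IRWXO are the read/write/execute masks for group and others
    return not (mode & (stat.S_IRWXG | stat.S_IRWXO))
```

If `is_private("~/.s3fs")` returns `False`, running `chmod 600 ~/.s3fs` restricts the file to its owner.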

Write a utility function that reads the config file and creates the file-system object:

```python
import os

import s3fs
import yaml


def get_fs():
    """Create an S3FileSystem from the credentials stored in ~/.s3fs."""
    with open(os.path.expanduser("~/.s3fs")) as fid:
        credentials = yaml.safe_load(fid)
    # key, secret and client_kwargs are passed straight to S3FileSystem
    return s3fs.S3FileSystem(**credentials)
```
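Since the file format is up to you, the loader can also pick a parser from the file extension. A sketch, assuming a hypothetical `load_credentials` helper that reads `.json` files with the standard library and treats everything else as YAML (PyYAML is imported only when it is actually needed). The `get_fs` variant here keeps the behaviour of the one above but takes an optional path.

```python
import json
import os


def load_credentials(path: str = "~/.s3fs") -> dict:
    """Read S3FileSystem keyword arguments from a JSON or YAML file.

    Hypothetical helper: ".json" files use the json module, anything
    else is parsed as YAML and needs PyYAML installed.
    """
    path = os.path.expanduser(path)
    with open(path) as fid:
        if path.endswith(".json"):
            return json.load(fid)
        import yaml  # optional dependency, only needed for YAML files

        return yaml.safe_load(fid)


def get_fs(path: str = "~/.s3fs"):
    """Create an S3FileSystem from the credentials file at `path`."""
    import s3fs  # imported lazily so load_credentials works without s3fs

    return s3fs.S3FileSystem(**load_credentials(path))
```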

Listing a bucket

```python
>>> fs = get_fs()
>>> fs.ls('testdir')
['testdir/demo-airtemp', 'testdir/tmp.csv']
>>>
>>> fs.ls('testdir/demo-airtemp')
['testdir/demo-airtemp/.zattrs',
 'testdir/demo-airtemp/.zgroup',
 'testdir/demo-airtemp/.zmetadata',
 'testdir/demo-airtemp/air',
 'testdir/demo-airtemp/lat',
 'testdir/demo-airtemp/lon',
 'testdir/demo-airtemp/time']
```
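With `detail=True`, `fs.ls` returns one dictionary per entry instead of a plain name. In fsspec-based file systems such as s3fs these carry at least `name`, `size` and `type` (`"file"` or `"directory"`). A hypothetical helper that splits such a listing into files and sub-directories and adds up the file sizes:

```python
def summarize_listing(entries: list) -> dict:
    """Split a detailed listing into files and directories.

    Hypothetical helper; `entries` is the result of fs.ls(path, detail=True).
    """
    files = [e["name"] for e in entries if e["type"] == "file"]
    directories = [e["name"] for e in entries if e["type"] == "directory"]
    # Only count file sizes; directory entries are emulated prefixes
    total_bytes = sum(e.get("size") or 0 for e in entries if e["type"] == "file")
    return {"files": files, "directories": directories, "total_bytes": total_bytes}
```

For example, `summarize_listing(fs.ls("testdir/demo-airtemp", detail=True))`. For a recursive total size, `fs.du(path)` is also available.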

Downloading a file

```python
>>> fs.get("testdir/demo-airtemp/.zattrs", "zattrs")
[None]
>>>
>>> # read the local copy `zattrs` back to check that the file arrived intact
>>> import json
>>> from pprint import pprint
>>> with open("zattrs") as fid:
...     content = json.load(fid)
...
>>> pprint(content)
{'Conventions': 'COARDS',
 'description': 'Data is from NMC initialized reanalysis\n'
                '(4x/day).  These are the 0.9950 sigma level values.',
 'platform': 'Model',
 'references': 'http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html',
 'title': '4x daily NMC reanalysis (1948)'}
```
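Parsing the JSON only shows that the local copy is readable. A stricter check compares the size of the local file with the object's size on S3, which `fs.size(path)` reports. A minimal sketch: `download_and_verify` is a hypothetical helper that works with any object providing fsspec-style `get` and `size` methods, such as the `fs` returned by `get_fs()`.

```python
import os


def download_and_verify(fs, rpath: str, lpath: str) -> int:
    """Download `rpath` to `lpath` and check that the byte counts match.

    Hypothetical helper; `fs` is expected to be an s3fs.S3FileSystem
    (or any object with fsspec-style get() and size() methods).
    """
    fs.get(rpath, lpath)
    remote_size = fs.size(rpath)
    local_size = os.path.getsize(lpath)
    if remote_size != local_size:
        raise OSError(
            f"incomplete download of {rpath}: {local_size} of {remote_size} bytes"
        )
    return local_size
```

For whole directories, `fs.get(rpath, lpath, recursive=True)` downloads everything under the prefix; as noted at the top, a trailing slash on the paths matters there.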

Reading a file directly from S3

```python
>>> from pprint import pprint
>>> with fs.open("testdir/demo-airtemp/.zattrs", mode="rb") as f:
...     content = f.read().decode()
...     content = json.loads(content)
...
>>> pprint(content)
{'Conventions': 'COARDS',
 'description': 'Data is from NMC initialized reanalysis\n'
                '(4x/day).  These are the 0.9950 sigma level values.',
 'platform': 'Model',
 'references': 'http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html',
 'title': '4x daily NMC reanalysis (1948)'}
```
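`json.load` accepts a binary file object directly, so the explicit `.decode()` step can be dropped. For small objects, `fs.cat(path)`, which returns the object's contents as bytes, is a shorter alternative. A sketch with two hypothetical helpers, assuming `fs` is the object returned by `get_fs()`:

```python
import json


def read_json(fs, path: str):
    """Stream a JSON object from S3 through a file-like object (hypothetical helper)."""
    with fs.open(path, mode="rb") as f:
        return json.load(f)  # json.load accepts binary files (UTF-8)


def read_json_cat(fs, path: str):
    """Fetch the whole object into memory first; fine for small files (hypothetical helper)."""
    return json.loads(fs.cat(path))
```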

Further documentation:

See the s3fs API reference for function signatures and the s3fs documentation for more examples.