
  • What shall it contain? (This depends heavily on your sensor's data.)
    • If your sensor produces only one file type, all files are stored in your sensor's root directory under platforms/urn/exdata/, there are no duplicates, and all existing files shall be published, then we require the following:
      • A description of how we can extract date and time information from your filenames
      • A data format description, if it is a non-standard data format. This can be a description, a persistent link to a format description, or software that can be used to open and process this data.
    • If your sensor produces more than one file type, we additionally need:
      • A file name description explaining how to distinguish between the different file types (e.g. file endings, prefixes in the filename, ...)
    • If there are directories below the sensor's root directory, we additionally need:
      • A description of whether we shall archive files from all directories or discard certain folders
      • A description of whether we can extract information such as Expedition, Event, Date and Time from the directory name
  •  What is it used for?  We use this data description to create an ingest template. Your descriptions are used to create regular expressions that match your files in your directories. If you are familiar with regular expressions, we appreciate it if you provide them directly in your data description. Your data description is translated into a template such as this:

      "_ingest": {
        "_sourceRegex": ".*(?P<campaign>PS[0-9]{2,3})",
        "_columns": [
          {
            "regex": "^((Screenshots|screenshots))\\/.*\\.((JPG|jpg|PNG|png))",
            "column": "Binary Object []",
            "comment": "Screenshots",
            "description": ""
          },
          {
            "regex": "^PHF_ASD_[1-2][0-9][0-9][0-9][0-1][0-9][0-3][0-9].*\\/HS3PHF_(?P<year>[1-2][0-9][0-9][0-9])-(?P<month>[0-1][0-9])-(?P<day>[0-3][0-9])T(?P<hour>[0-2][0-9])(?P<minute>[0-5][0-9])(?P<second>[0-5][0-9]).*Z_[0-9]*\\.asd",
            "column": "Binary Object []",
            "comment": "PHF_ASD files",
            "description": ""
          },
          {
            "regex": "^PHS_ASD_[1-2][0-9][0-9][0-9][0-1][0-9][0-3][0-9].*\\/HS3PHS_(?P<year>[1-2][0-9][0-9][0-9])-(?P<month>[0-1][0-9])-(?P<day>[0-3][0-9])T(?P<hour>[0-2][0-9])(?P<minute>[0-5][0-9])(?P<second>[0-5][0-9]).*Z_[0-9]*\\.asd",
            "column": "Binary Object []",
            "comment": "PHS_ASD files",
            "description": ""
          },
          {
            "regex": "^S7K_[1-2][0-9][0-9][0-9][0-1][0-9][0-3][0-9].*\\/(?P<year>[1-2][0-9][0-9][0-9])(?P<month>[0-1][0-9])(?P<day>[0-3][0-9])_(?P<hour>[0-2][0-9])(?P<minute>[0-5][0-9])(?P<second>[0-5][0-9])_.*\\.((s7k|S7K))$",
            "column": "Binary Object []",
            "comment": "RESON-S7K files",
            "description": ""
          }
        ]
      }
    
    
    
    
    • The key "_sourceRegex" can be used to extract campaign or event information from the directory name. This is important if you wish to publish data from several campaigns (expedition legs) or events together. The following regex groups are supported by our framework: "campaign", "leg", "science_operation", "device_operation".
    • Each element in the list below "_columns" describes one file type. Its "regex" key provides the regular expression that matches this file type. The following regex groups are supported to extract information here: "year", "month", "day", "hour", "minute", "second".
  • Finally, this template is used to create the data table for the PANGAEA publication page and to archive the list of files there accordingly. The date and time information extracted from the file names can be used to provide a georeference based on the campaign's mastertrack, if desired.
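
If you want to check your own regular expressions before sending them to us, a minimal Python sketch like the following may help. It reuses the "_sourceRegex" and the RESON-S7K "regex" from the template above with the JSON escaping removed. It assumes that the patterns are applied with Python-style matching from the start of the string and that the file regex is matched against the path relative to the sensor's root directory; the actual ingest framework may behave differently. The function names and the example paths are hypothetical.

```python
import re
from datetime import datetime

# Patterns copied from the template above (JSON "\\" becomes "\").
SOURCE_REGEX = r".*(?P<campaign>PS[0-9]{2,3})"
S7K_REGEX = (
    r"^S7K_[1-2][0-9][0-9][0-9][0-1][0-9][0-3][0-9].*\/"
    r"(?P<year>[1-2][0-9][0-9][0-9])(?P<month>[0-1][0-9])(?P<day>[0-3][0-9])_"
    r"(?P<hour>[0-2][0-9])(?P<minute>[0-5][0-9])(?P<second>[0-5][0-9])_"
    r".*\.((s7k|S7K))$"
)


def extract_campaign(source_dir):
    """Return the 'campaign' group from a directory name, or None (hypothetical helper)."""
    match = re.match(SOURCE_REGEX, source_dir)
    return match.group("campaign") if match else None


def extract_timestamp(relative_path):
    """Return a datetime built from the named groups, or None if the path does not match."""
    match = re.match(S7K_REGEX, relative_path)
    if match is None:
        return None
    # groupdict() contains only the named groups: year, month, day, hour, minute, second.
    parts = {name: int(value) for name, value in match.groupdict().items()}
    return datetime(**parts)


if __name__ == "__main__":
    # Made-up paths for illustration only.
    print(extract_campaign("expeditions/PS122/raw"))
    print(extract_timestamp("S7K_20190923_dive01/20190923_101530_survey.s7k"))
    print(extract_timestamp("notes/readme.txt"))
```

Running a handful of real file paths through such a script shows quickly whether a pattern is too strict (valid files return None) or too loose (unwanted files match).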

