HDF5 Extras  0.0.1
Convenience Functions for using HDF5 Better
File Format

This is a brief overview of how the hdf5-extras library implements its objects in an HDF5 file. HDF5 defines three kinds of objects: metadata, groups, and datasets. Groups are like directories in a filesystem, with names and metadata attached to them. Datasets are like files in a filesystem, with names and metadata attached to them. We use these three HDF5 objects to create our own "meta" objects.

File Metadata

There is an "HDF5 Group" named _Header, which contains the "file" metadata:

The descriptor is set by the user. The history is appended to over the lifetime of the file and records everything that is done to the file. Datetimes look like "15:50:36 12-Apr-2014". Each history line item is tagged with a datetime.
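As an illustration of the datetime style quoted above, here is a minimal Python sketch that produces such a stamp and tags a history line item with it. The helper names format_stamp and history_line, the zero-padding of the day, and the layout of a history line (stamp, a space, then the message) are assumptions made for the example; they are not taken from the library.

```python
from datetime import datetime

# Month abbreviations are spelled out so the result does not depend on the locale.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_stamp(when):
    """Return a datetime in the "HH:MM:SS DD-Mon-YYYY" style."""
    return "%02d:%02d:%02d %02d-%s-%04d" % (
        when.hour, when.minute, when.second,
        when.day, _MONTHS[when.month - 1], when.year)


def history_line(when, message):
    """Hypothetical helper: tag one history line item with its datetime."""
    return format_stamp(when) + " " + message


if __name__ == "__main__":
    print(history_line(datetime(2014, 4, 12, 15, 50, 36), "file created"))
```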

Other objects are created as "HDF5 Groups" under "/" (the root), with metadata and data beneath the group.

Image Objects

Image objects are created as groups. They have the following metadata attached to them:

The idea behind a multi-channel image is that it should hold all the channels that the user would like to group together as a single unit. For polarimetric SAR data, all the channels of the covariance matrix should be kept together in one image, to be operated on together. For multispectral data, like Landsat, all 7 channels need to be kept together, even though the resolution, and hence number of pixels, is different for some of the channels.

The descriptor is set by the user. The history is appended to over the lifetime of the image and records everything that is done to the image. Datetimes look like "15:50:36 12-Apr-2014". Each history line item is tagged with a datetime.

Other, optional metadata includes the header information that came with the original data, called the "rawheader". This is simply an HDF5 dataset that contains an exact copy of the metadata that was provided with the original data.

Another optional metadata item is the "processed" header data. Its format depends on the kind of data, such as SAR, Landsat, or lidar. Each kind of data has a specific XML-based metadata format that contains the generic metadata for that type. This metadata is a key component of any system that needs to apply processing algorithms to data without knowing which sensor took the data. Without it, all the processing codes would be specialized for specific sensors, there would be no consistency, and there would be far too many algorithms to keep track of easily. This data is stored in an IFile so that it is easy to adapt other codes to work with it as if it were a file in the filesystem.

Image Data or Rasters

Each channel in an image is a separate "HDF5 Dataset" within an image group. These are called "rasters."

YET:
Images can also have image pyramid data associated with them. These are also "HDF5 Datasets", and are named "p%d_%dX%d", where the first d is the channel number they are associated with, the second d is the x-dimension in pixels, and the third d is the y-dimension in pixels. As the pyramids are meant solely for speeding up the GUI, they are not mentioned in the metadata, and the code has to "look for" them.
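Because the pyramids are not listed in the metadata, code has to recognize them by name. The following Python sketch shows one way to build and parse the "p%d_%dX%d" names and to pick out the pyramids of one channel from a list of dataset names. The function names and the sort order (largest first) are assumptions for illustration, and the list of names would in practice come from whatever call enumerates the datasets in the image group.

```python
import re

# Matches names such as "p1_512X256": channel, x-size, y-size.
_PYRAMID_RE = re.compile(r"p(\d+)_(\d+)X(\d+)")


def pyramid_name(channel, xsize, ysize):
    """Build a pyramid dataset name from its three numbers."""
    return "p%d_%dX%d" % (channel, xsize, ysize)


def parse_pyramid_name(name):
    """Return (channel, xsize, ysize), or None if name is not a pyramid."""
    match = _PYRAMID_RE.fullmatch(name)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def find_pyramids(dataset_names, channel):
    """Return the pyramid names for one channel, largest first."""
    found = []
    for name in dataset_names:
        parsed = parse_pyramid_name(name)
        if parsed is not None and parsed[0] == channel:
            found.append((parsed[1], parsed[2], name))
    found.sort(reverse=True)
    return [name for _, _, name in found]
```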

Each raster dataset also has metadata:

optional metadata:

The typical value for dataset_type is "9", which means a raster dataset.

The type is a string that indicates the numeric data-type for the raster data. It can be one of: "UINT1", "UINT8", "INT8", "CINT8", "UINT16", "INT16", "CINT16", "UINT32", "INT32", "UINT64", "INT64", "CINT32", "CINT64", "R32", "R64", "C64", "C128".

This covers all the basic types that are possible: integers, reals, and complex numbers, with sizes of 1 bit, 1 byte, 2 bytes, 4 bytes, or 8 bytes. These strings can be decoded by noting that U stands for Unsigned, C stands for Complex, R stands for Real, INT stands for Integer, and the number at the end is the number of bits in the value. For example, "UINT16" is an unsigned 16-bit integer and "R64" is a 64-bit real.
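The decoding rule can be sketched in a few lines of Python. The function name decode_type and the wording of the returned kind strings are assumptions made for this example; only the prefixes and the trailing bit count come from the description above.

```python
import re

# Prefix -> kind of number, following the U / C / R / INT rule.
_KINDS = {
    "UINT": "unsigned integer",
    "INT": "integer",
    "CINT": "complex integer",
    "R": "real",
    "C": "complex",
}

# Longer prefixes are listed first so "CINT16" is not read as "C" + "INT16".
_TYPE_RE = re.compile(r"(UINT|CINT|INT|R|C)(\d+)")


def decode_type(type_string):
    """Split a type string such as "CINT16" into (kind, number of bits)."""
    match = _TYPE_RE.fullmatch(type_string)
    if match is None:
        raise ValueError("unrecognized type string: %r" % type_string)
    return _KINDS[match.group(1)], int(match.group(2))


if __name__ == "__main__":
    for name in ("UINT1", "INT8", "CINT16", "R32", "C128"):
        print(name, decode_type(name))
```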

The "pixel-size" parameters are given in the x and y directions, and they have units. These can all be updated as needed.

"npixels" is the count of pixels in the x-direction, "nlines" is the count of pixels in the y-direction.

The metadata "writeable" is a string containing "0" if the raster is write-locked and "1" if the raster is writeable.

YET:
The metadata "location" is a straightforward copy of the Location_t structure, with either an affine transform to the projected coordinates, (x0,y0), (dx,dy), etc., or a list of GCPs giving image coordinates and projected coordinates for up to 100 points. This is all stored in a simple dataset YET: in XML format.

YET:
The metadata "spatialref" is a string containing the EPSG (or other) code, the WKT string, and the proj.4 string.

Image Pyramids

YET

Image objects can contain other datasets besides the rasters. In particular, they can contain "image pyramids" named "p%d_xsizeXysize", where the d is the channel number they correspond to and xsizeXysize is the reduced size of that pyramid; there may be many such datasets. These datasets (as well as the rasters) have last-update datetimes, so that a program can decide to recompute the pyramids if they are older than the image rasters they go with. These datasets do not have "writeable" metadata, so they are always read/write.
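The recompute decision described above amounts to comparing two datetimes. The Python sketch below assumes that the last-update datetimes are stored in the same "15:50:36 12-Apr-2014" style as the history entries; that storage format, and the names parse_stamp and pyramid_is_stale, are assumptions for illustration only.

```python
from datetime import datetime

# Month abbreviations are spelled out so parsing does not depend on the locale.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def parse_stamp(text):
    """Parse a stamp such as "15:50:36 12-Apr-2014" into a datetime."""
    clock, date = text.split()
    hour, minute, second = (int(part) for part in clock.split(":"))
    day, month, year = date.split("-")
    return datetime(int(year), _MONTHS.index(month) + 1, int(day),
                    hour, minute, second)


def pyramid_is_stale(raster_stamp, pyramid_stamp):
    """True if the pyramid was last updated before its raster was."""
    return parse_stamp(pyramid_stamp) < parse_stamp(raster_stamp)
```

A caller would read the two stamps from the raster and the pyramid dataset and rebuild the pyramid whenever pyramid_is_stale returns True.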

Vectors

YET: Vector objects are also created as groups. They have the following metadata attached to them:

YET: Convert to XML. The "spatialref" metadata string contains three popular ways to specify the spatial reference that is used for coordinates: first the EPSG code, then the Well-Known Text (WKT) version, and lastly the proj4 form, separated by semicolons. An example:

EPSG:32613; wkt:PROJCS["WGS_1984_UTM_Zone_13N",
GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",
SPHEROID["WGS_1984",6378137.0,298.257223563]],
PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433],
AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],
PARAMETER["False_Easting",500000.0],PARAMETER["False_Northing",0.0],
PARAMETER["Central_Meridian",-105.0],PARAMETER["Scale_Factor",0.9996],
PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0],
AUTHORITY["EPSG","32613"]];
proj4:+proj=utm +zone=13 +ellps=WGS84 +units=m +no_defs;

Note that in the actual string there are no "line-feeds" as there are here.
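A reader of this string can recover the three parts by splitting on the semicolons and then on the first colon of each piece. The Python sketch below assumes that no semicolon occurs inside the WKT or proj4 text; the function name parse_spatialref and the lower-casing of the keys are choices made for the example, and the WKT in the demonstration is shortened.

```python
def parse_spatialref(text):
    """Split a "spatialref" string into a dict keyed by epsg, wkt, proj4."""
    parts = {}
    for piece in text.split(";"):
        piece = piece.strip()
        if not piece:
            continue  # skip the empty piece after the final semicolon
        key, separator, value = piece.partition(":")
        if not separator:
            raise ValueError("missing ':' in spatialref piece: %r" % piece)
        parts[key.strip().lower()] = value.strip()
    return parts


if __name__ == "__main__":
    example = ('EPSG:32613; wkt:PROJCS["WGS_1984_UTM_Zone_13N"]; '
               'proj4:+proj=utm +zone=13 +ellps=WGS84 +units=m +no_defs;')
    for key, value in parse_spatialref(example).items():
        print(key, "=", value)
```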

The data stored in the Vector object is in an "HDF5 Dataset" under the named Group. This dataset is always named "ifile1". It is set up as a 1-dimensional extendable dataset that is essentially equivalent to a "file". It is called an IFILE, and contains all the database tables needed to represent the vectors in a relational-spatial database. The database used is spatialite (www.gaia-gis.it/fossil/libspatialite), which is implemented on top of sqlite3 (www.sqlite.org). A driver was written to make it use the internal file instead of the usual file on a user's hard drive. This is described in chapter vfs_library.
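To make the idea of a 1-dimensional extendable dataset acting as a "file" concrete, here is a small in-memory Python stand-in. It is not the IFILE implementation and does not touch HDF5: the class name ExtendableFile and its methods are hypothetical, and a bytearray takes the place of the extendable dataset. It only shows how seek, read, and write map onto a byte array that grows on demand.

```python
class ExtendableFile:
    """In-memory stand-in for a file kept in a growable 1-D byte array."""

    def __init__(self):
        self._data = bytearray()  # plays the role of the extendable dataset
        self._pos = 0             # current file position

    def size(self):
        return len(self._data)

    def tell(self):
        return self._pos

    def seek(self, offset):
        if offset < 0:
            raise ValueError("negative file offset")
        self._pos = offset

    def write(self, payload):
        """Write bytes at the current position, growing the array if needed."""
        end = self._pos + len(payload)
        if end > len(self._data):
            # Extend with zero bytes, as a sparse write past the end would.
            self._data.extend(b"\0" * (end - len(self._data)))
        self._data[self._pos:end] = payload
        self._pos = end
        return len(payload)

    def read(self, count):
        """Read up to count bytes from the current position."""
        chunk = bytes(self._data[self._pos:self._pos + count])
        self._pos += len(chunk)
        return chunk
```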

Yet to Implement

LUTs, PCTs, GCPs

Bayesian classification signatures

random forests data

modality-specific header info.

2d, 3d meshes

2d, 3d triangulations

2d, 3d objects

We can also use sqlite3 to make a standard relational database, instead of the spatialite database we have already created for vectors. This can be used for standard spreadsheets, etc.

Many arbitrary data formats for various computer codes can be re-created as IFILEs with minor modifications to the original codes (as I've implemented all the C file operations for IFILEs).

Is it possible to include ESMF in this?

Need forward and inverse models of all kinds.

calibration

etc.