
Basics

Metadata is important for describing data, but it is also a pain to keep track of and standardize. mt_metadata was written to make it easier to standardize metadata, specifically MT metadata, though not exclusively. There are standard ways of creating schemas for metadata, for example in XML or JSON. We decided to be agnostic to those formats and internally use Python’s built-in dictionary object. Tools are provided to read and write XML and JSON formats if desired.

All input values are validated against the standards to make sure the data type is correct. More on that below.

Here, basic usage of the mt_metadata module is demonstrated.

Base Class

mt_metadata.base.Base is the base class upon which all metadata objects are built. Base provides convenience methods to input and output metadata in different formats: XML, JSON, Python dictionary, and Pandas Series. It also provides functions to help the user understand what’s inside.

The underlying attribute of Base that controls how inputs are validated and which keywords are included is _attr_dict. This dictionary can be input manually, but it is usually loaded automatically when the object is initialized. Base._attr_dict = {} to begin with, so to build a useful version of Base an _attr_dict needs to be input, commonly on initialization.

The metadata objects that inherit from Base have their _attr_dict input on initialization from JSON files that provide the keywords and the attributes of those keywords describing how to validate them. For example:

{
    "name": {
        "type": "string",
        "required": true,
        "style": "free form",
        "units": null,
        "description": "Persons name, should be full first and last name.",
        "options": [],
        "alias": [],
        "example": "person name",
        "default": null
        }
}

Here the keyword is name; it should be a free form string that describes the name of a person. The default value is null. Any keyword added needs to have this form, with the following attributes:

| Attribute | Description | Options |
|---|---|---|
| “type” | Data type this keyword should be; must be a native Python type | float, str, int, bool, list |
| “required” | Is this keyword required by the metadata standard | True or False |
| “style” | If the “type” is a string, what type of string it is and how it should be formatted | “name”, “url”, “email”, “number”, “date”, “free form”, “time”, “date time”, “name list”, “number list”, “controlled vocabulary”, “alpha numeric” |
| “units” | What units the keyword should be in | SI units |
| “description” | Full description of what this keyword describes | |
| “options” | If the “style” is controlled vocabulary, the list of options the keyword can take | list of options |
| “alias” | Other names this keyword is known by (not currently implemented) | |
| “example” | An example of what the keyword should look like | |
| “default” | Default value | depends on “type” |
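As a quick illustration of the table, the sketch below checks a hand-written standards entry for the nine attributes and a recognized style. check_standard_entry is a hypothetical helper, not part of mt_metadata, and the rule that a controlled vocabulary entry needs a non-empty options list is our reading of the “options” row.

```python
import json

# Attribute keys every standards entry is expected to carry (from the table above).
REQUIRED_KEYS = {
    "type", "required", "style", "units", "description",
    "options", "alias", "example", "default",
}

# Allowed values of "style", copied from the table above.
ALLOWED_STYLES = {
    "name", "url", "email", "number", "date", "free form", "time",
    "date time", "name list", "number list", "controlled vocabulary",
    "alpha numeric",
}


def check_standard_entry(keyword, entry):
    """Return a list of problems found in one standards entry (empty if OK)."""
    problems = []
    missing = REQUIRED_KEYS - set(entry)
    if missing:
        problems.append(f"{keyword}: missing {sorted(missing)}")
    style = entry.get("style")
    if style not in ALLOWED_STYLES:
        problems.append(f"{keyword}: unknown style {style!r}")
    if style == "controlled vocabulary" and not entry.get("options"):
        problems.append(f"{keyword}: controlled vocabulary needs options")
    return problems


# The "name" entry from the JSON example above.
standards = json.loads("""
{
    "name": {
        "type": "string",
        "required": true,
        "style": "free form",
        "units": null,
        "description": "Persons name, should be full first and last name.",
        "options": [],
        "alias": [],
        "example": "person name",
        "default": null
    }
}
""")

for keyword, entry in standards.items():
    print(keyword, check_standard_entry(keyword, entry))  # name []
```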

Under the hood

To make it easier for the user and to help standardize metadata, the standards for each metadata element and for more complex metadata objects are stored as JSON files within the package. When a class is initialized, it opens the appropriate JSON file and loads it to populate the _attr_dict, which is used to initialize the metadata object. For example, if you want a metadata object for location you would use from mt_metadata.timeseries import Location. Location opens the location.json file stored in mt_metadata.timeseries.standards to populate the _attr_dict, which is in turn used to initialize a Location object. In this way the standards are respected while the metadata objects remain user friendly, because all metadata attributes can be accessed in a Pythonic way, like

from mt_metadata.timeseries import Location

location = Location()
location.latitude = 50
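The loading step can be pictured with a toy class. This is a minimal sketch, not mt_metadata's code: LocationSketch and the inline LOCATION_JSON string are hypothetical stand-ins for Location and location.json, and the only behavior shown is that keywords come from the standards and unknown names are rejected.

```python
import json

# Hypothetical stand-in for a packaged standards file such as location.json;
# only two keywords are included to keep the sketch short.
LOCATION_JSON = """
{
    "latitude": {"type": "float", "required": true, "style": "number",
                 "units": "degrees", "description": "latitude",
                 "options": [], "alias": ["lat"], "example": 23.134,
                 "default": 0.0},
    "longitude": {"type": "float", "required": true, "style": "number",
                  "units": "degrees", "description": "longitude",
                  "options": [], "alias": ["lon"], "example": 14.23,
                  "default": 0.0}
}
"""


class LocationSketch:
    """Toy metadata object whose keywords come from a JSON standards string."""

    def __init__(self, standards_json=LOCATION_JSON):
        # Bypass our own __setattr__ so the attribute dictionary can be stored.
        object.__setattr__(self, "_attr_dict", json.loads(standards_json))
        # Both example keywords are required, so each starts at its default.
        for keyword, spec in self._attr_dict.items():
            setattr(self, keyword, spec["default"])

    def __setattr__(self, name, value):
        # Only keywords defined by the standards may be set.
        if name not in self._attr_dict:
            raise AttributeError(f"{name!r} is not defined in the standards")
        object.__setattr__(self, name, value)


loc = LocationSketch()
loc.latitude = 50
print(loc.latitude, loc.longitude)  # 50 0.0
```

Routing every assignment through __setattr__ is what lets a real implementation hook validation in at that point.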

Methods of Base

Base has the following methods. It also overloads built-in methods such as __eq__ and __ne__ for comparing two similar metadata objects, as well as __len__, __str__, and __repr__ (see below).

| Method | Purpose |
|---|---|
| add_base_attribute | add a base attribute with a standards dictionary as above |
| attribute_information | print attribute information for a given attribute or for all attributes |
| from_dict | fill keyword values from a dictionary |
| from_json | fill keyword values from a JSON string or file |
| from_series | fill keyword values from a pandas.Series |
| from_xml | fill keyword values from an XML string or file |
| get_attr_from_name | get an attribute from a complex name separated by a ., like location.latitude |
| get_attribute_list | get a list of attributes in the object |
| set_attr_from_name | set an attribute from a complex name separated by a ., like location.longitude |
| to_dict | export the keywords and values as a dictionary |
| to_json | export the keywords and values as a JSON string |
| to_series | export the keywords and values as a pandas.Series object |
| to_xml | export the keywords and values as an XML element |
| update | update the values from a similar metadata object |

Example

A simple demonstration of Base and how to add attributes and figure out what is in the metadata and standards.

from mt_metadata.base import Base

b = Base()

Add attributes

You can add attributes to an existing metadata object. All you need is a standards dictionary that describes the new attribute.

Here we will add an extra attribute for temperature that may only take two options, ‘ambient’ or ‘air’. It will be a string but is not required.

extra = {
    'type': str,
    'style': 'controlled vocabulary',
    'required': False,
    'units': 'celsius',
    'description': 'local temperature',
    'alias': ['temp'],
    'options': [ 'ambient', 'air'],
    'example': 'ambient',
    'default': None
}
b.add_base_attribute("temperature", "ambient", extra)

The __repr__

The __repr__ of the base class is the JSON representation of the object.

b
{ "base": { "temperature": "ambient" } }

The __str__

The __str__ of the class is a printed list of keyword = value pairs:

print(b)
base:
	temperature = ambient
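Both representations can be approximated with the standard library. The helpers below (metadata_repr, metadata_str) are hypothetical and only mimic the output shown above; json.dumps spaces the braces slightly differently from the real __repr__, and sorting keys alphabetically is an assumption based on the printed order.

```python
import json


def metadata_repr(class_name, values):
    """JSON-style representation, similar to the __repr__ shown above."""
    return json.dumps({class_name: values})


def metadata_str(class_name, values):
    """'name:' header followed by tab-indented 'key = value' lines."""
    lines = [f"{class_name}:"]
    lines += [f"\t{key} = {value}" for key, value in sorted(values.items())]
    return "\n".join(lines)


print(metadata_repr("base", {"temperature": "ambient"}))
print(metadata_str("base", {"temperature": "ambient", "height": 11.7}))
```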

Attribute Information and List

There are also convenience methods to list the attributes and to get attribute information.

b.get_attribute_list()
['temperature']
b.attribute_information()
temperature:
	alias: ['temp']
	default: None
	description: local temperature
	example: ambient
	options: ['ambient', 'air']
	required: False
	style: controlled vocabulary
	type: <class 'str'>
	units: celsius
==================================================
b.attribute_information("temperature")
temperature:
	alias: ['temp']
	default: None
	description: local temperature
	example: ambient
	options: ['ambient', 'air']
	required: False
	style: controlled vocabulary
	type: <class 'str'>
	units: celsius

Validation

Validation of attributes is the most important reason for having a separate module for the metadata. The validation process:

  1. First ensures the value has the type prescribed by the metadata. In the example above, the prescribed data type for temperature is a string, so when the value is set the validators make sure it is a string. If it is not, it is converted to a string if possible; if that fails, a ValueError is raised.
  2. If the style is controlled vocabulary, the value is checked against options. If other is in options, values that are not in the list are also accepted, acting as a kind of accept-anything key.
  3. If a value of None is given, the appropriate null value is set: if the style is a date, the value is set to 1980-01-01T00:00:00, and if list is in the style, the value is set to [].

When the standards are first read in, if required is True the value is set to the given default value. If required is False, the value is set to the appropriate null value.
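The three validation steps and the initialization rule can be sketched as plain functions. This is an illustration under stated assumptions rather than mt_metadata's implementation: validate_value, initial_value, null_value and TYPE_MAP are hypothetical names, an options mismatch raises ValueError here where mt_metadata raises MTSchemaError, and treating any style containing "date" as a date is a guess.

```python
# Hypothetical map from the "type" names used in standards files to Python types.
TYPE_MAP = {"string": str, "str": str, "float": float, "int": int, "bool": bool}


def null_value(style):
    """Null value for a style, following step 3 above."""
    if "date" in style:  # assumption: covers both "date" and "date time"
        return "1980-01-01T00:00:00"
    if "list" in style:
        return []
    return None


def validate_value(value, spec):
    """Sketch of the three validation steps; not mt_metadata's code."""
    # Step 3: None maps to the style-appropriate null value.
    if value is None:
        return null_value(spec["style"])
    # Step 1: coerce to the prescribed type, raising ValueError if impossible.
    # Note: bool("False") is True, so booleans would need more care in real code.
    wanted = spec["type"]
    if isinstance(wanted, str):
        wanted = TYPE_MAP[wanted]
    if not isinstance(value, wanted):
        try:
            value = wanted(value)
        except (TypeError, ValueError) as error:
            raise ValueError(f"cannot convert {value!r} to {wanted.__name__}") from error
    # Step 2: controlled vocabulary must match options unless "other" is allowed.
    if spec["style"] == "controlled vocabulary":
        options = spec["options"]
        if value not in options and "other" not in options:
            raise ValueError(f"{value} not found in options list {options}")
    return value


def initial_value(spec):
    """Required keywords start at their default; others at the null value."""
    return spec["default"] if spec["required"] else null_value(spec["style"])


temperature_spec = {"type": str, "style": "controlled vocabulary", "required": False,
                    "options": ["ambient", "air"], "default": None}
height_spec = {"type": float, "style": "number", "required": True,
               "options": [], "default": 0.0}

print(validate_value("11.7", height_spec))  # 11.7
print(initial_value(temperature_spec))      # None
```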

extra = {
    'type': float,
    'style': 'number',
    'required': True,
    'units': None,
    'description': 'height',
    'alias': [],
    'options': [],
    'example': 10.0,
    'default': 0.0
}
b.add_base_attribute("height", 0, extra)
b.height = "11.7"
print(b)
base:
	height = 11.7
	temperature = ambient
b.temperature = "fail"
2022-09-28 11:05:20,328 [line 359] mt_metadata.base.metadata.base.__setattr__ - ERROR: fail not found in options list ['ambient', 'air']
---------------------------------------------------------------------------
MTSchemaError                             Traceback (most recent call last)
~\AppData\Local\Temp\1\ipykernel_16268\1125618368.py in <cell line: 1>()
----> 1 b.temperature = "fail"

~\OneDrive - DOI\Documents\GitHub\mt_metadata\mt_metadata\base\metadata.py in __setattr__(self, name, value)
    358                     if not accept:
    359                         self.logger.error(msg.format(value, options))
--> 360                         raise MTSchemaError(msg.format(value, options))
    361                     if other and not accept:
    362                         self.logger.warning(msg.format(value, options, name))

MTSchemaError: fail not found in options list ['ambient', 'air']

A more complicated example

We will look at a more complicated metadata object, mt_metadata.timeseries.Location.

from mt_metadata.timeseries import Location
here = Location()
here.get_attribute_list()
['datum', 'declination.comments', 'declination.epoch', 'declination.model', 'declination.value', 'elevation', 'latitude', 'longitude', 'x', 'x2', 'y', 'y2', 'z', 'z2']
here.attribute_information()
latitude:
	alias: ['lat']
	default: 0.0
	description: latitude of location in datum specified at survey level
	example: 23.134
	options: []
	required: True
	style: number
	type: float
	units: degrees
==================================================
longitude:
	alias: ['lon', 'long']
	default: 0.0
	description: longitude of location in datum specified at survey level
	example: 14.23
	options: []
	required: True
	style: number
	type: float
	units: degrees
==================================================
elevation:
	alias: ['elev']
	default: 0.0
	description: elevation of location in datum specified at survey level
	example: 123.4
	options: []
	required: True
	style: number
	type: float
	units: meters
==================================================
datum:
	alias: []
	default: None
	description: Datum of the location values.  Usually a well known datum like WGS84.
	example: WGS84
	options: ['WGS84', 'NAD83', 'other']
	required: False
	style: controlled vocabulary
	type: string
	units: None
==================================================
x:
	alias: ['east', 'easting']
	default: None
	description: relative distance to the center of the station
	example: 10.0
	options: []
	required: False
	style: number
	type: float
	units: meters
==================================================
x2:
	alias: ['east', 'easting']
	default: None
	description: relative distance to the center of the station
	example: 10.0
	options: []
	required: False
	style: number
	type: float
	units: meters
==================================================
y:
	alias: ['north', 'northing']
	default: None
	description: relative distance to the center of the station
	example: 10.0
	options: []
	required: False
	style: number
	type: float
	units: meters
==================================================
y2:
	alias: ['north', 'northing']
	default: None
	description: relative distance to the center of the station
	example: 10.0
	options: []
	required: False
	style: number
	type: float
	units: meters
==================================================
z:
	alias: []
	default: None
	description: relative elevation to the center of the station
	example: 10.0
	options: []
	required: False
	style: number
	type: float
	units: meters
==================================================
z2:
	alias: []
	default: None
	description: relative elevation to the center of the station
	example: 10.0
	options: []
	required: False
	style: number
	type: float
	units: meters
==================================================
declination.comments:
	alias: []
	default: None
	description: any comments on declination
	example: estimated from WMM 2016
	options: []
	required: False
	style: free form
	type: string
	units: None
==================================================
declination.model:
	alias: []
	default: WMM
	description: geomagnetic reference model used to calculate declination
	example: WMM
	options: ['EMAG2', 'EMM', 'HDGM', 'IGRF', 'WMM', 'unknown', 'other']
	required: True
	style: controlled vocabulary
	type: string
	units: None
==================================================
declination.epoch:
	alias: []
	default: None
	description: Epoch for which declination was approximated in.
	example: 2020
	options: []
	required: False
	style: free form
	type: string
	units: None
==================================================
declination.value:
	alias: []
	default: 0.0
	description: declination angle relative to geographic north positive clockwise
	example: 12.3
	options: []
	required: True
	style: number
	type: float
	units: degrees
==================================================

Getting/Setting an attribute

get_attr_from_name and set_attr_from_name are convenience methods for getting and setting nested attributes with a single call, for instance the declination value via the dot-separated name declination.value. This is helpful when filling metadata from a file.

here.set_attr_from_name("declination.value", 10)
print(here)
location:
	declination.model = WMM
	declination.value = 10.0
	elevation = 0.0
	latitude = 0.0
	longitude = 0.0
here.get_attr_from_name("declination.value")
10.0
# This is the same as
here.declination.value
10.0
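Resolving a dot-separated name amounts to walking getattr one component at a time. The functions below are hypothetical standalone helpers (they perform no validation, unlike the Base methods), with a SimpleNamespace standing in for a Location that has a declination child.

```python
from types import SimpleNamespace


def get_by_dotted_name(obj, name):
    """Follow a dot-separated name such as 'declination.value'."""
    for part in name.split("."):
        obj = getattr(obj, part)
    return obj


def set_by_dotted_name(obj, name, value):
    """Set the last component of a dot-separated name on its parent object."""
    *parents, last = name.split(".")
    for part in parents:
        obj = getattr(obj, part)
    setattr(obj, last, value)


# Toy nested object standing in for a Location with a declination child.
here = SimpleNamespace(latitude=0.0, declination=SimpleNamespace(model="WMM", value=0.0))
set_by_dotted_name(here, "declination.value", 10.0)
print(get_by_dotted_name(here, "declination.value"))  # 10.0
```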

Dictionary

The most basic form the metadata can take is a Python dictionary of key-value pairs.

here.to_dict()
{'location': OrderedDict([('declination.model', 'WMM'), ('declination.value', 10.0), ('elevation', 0.0), ('latitude', 0.0), ('longitude', 0.0)])}
here.from_dict(
    {
        "location": {
            "declination.value": -11.0,
            "elevation": 759.0,
            "latitude": -34.0,
            "longitude": -104.0
        }
    }
)
print(here)
location:
	declination.model = WMM
	declination.value = -11.0
	elevation = 759.0
	latitude = -34.0
	longitude = -104.0

JSON

JSON is a standard human- and machine-readable format that is well supported in Python. There are methods to read and write JSON strings and files.

# Flat form: nested attributes use dot-separated keys
print(here.to_json())
{
    "location": {
        "declination.model": "WMM",
        "declination.value": -11.0,
        "elevation": 759.0,
        "latitude": -34.0,
        "longitude": -104.0
    }
}
here.from_json('{"location": {"declination.model": "WMM", "declination.value": 10.0, "elevation": 99.0, "latitude": 40.0, "longitude": -120.0}}')
print(here)
location:
	declination.model = WMM
	declination.value = 10.0
	elevation = 99.0
	latitude = 40.0
	longitude = -120.0
# Nested form
print(here.to_json(nested=True))
{
    "location": {
        "declination": {
            "model": "WMM",
            "value": 10.0
        },
        "elevation": 99.0,
        "latitude": 40.0,
        "longitude": -120.0
    }
}
here.from_json('{"location": {"declination": {"model": "WMM", "value": -12.0}, "elevation": 199.0, "latitude": 20.0, "longitude": -110.0}}')
print(here)
location:
	declination.model = WMM
	declination.value = -12.0
	elevation = 199.0
	latitude = 20.0
	longitude = -110.0
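The flat and nested JSON forms hold the same information; only the key layout differs. A minimal sketch of converting between them, assuming dot-separated keys and that no value is itself a dictionary (nest and flatten are hypothetical helpers, not mt_metadata functions):

```python
def nest(flat):
    """Turn {'declination.model': 'WMM'} into {'declination': {'model': 'WMM'}}."""
    nested = {}
    for key, value in flat.items():
        *parents, last = key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[last] = value
    return nested


def flatten(nested, prefix=""):
    """Inverse of nest(): join nested keys with dots."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{full_key}."))
        else:
            flat[full_key] = value
    return flat


flat = {"declination.model": "WMM", "declination.value": 10.0, "elevation": 99.0}
nested = nest(flat)
print(nested)  # {'declination': {'model': 'WMM', 'value': 10.0}, 'elevation': 99.0}
```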

XML

XML is also a common format for metadata, though not as human readable.

print(here.to_xml(string=True))
<?xml version="1.0" ?>
<location>
    <declination>
        <model>WMM</model>
        <value units="degrees">-12.0</value>
    </declination>
    <elevation units="meters">199.0</elevation>
    <latitude units="degrees">20.0</latitude>
    <longitude units="degrees">-110.0</longitude>
</location>

from xml.etree import ElementTree as et
location = et.Element('location')
lat = et.SubElement(location, 'latitude')
lat.text = "-10"
here.from_xml(location)
print(here)
location:
	declination.model = WMM
	declination.value = -12.0
	elevation = 199.0
	latitude = -10.0
	longitude = -110.0
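The XML output follows the same nesting as the nested JSON form, with units stored as an attribute. A rough sketch using only xml.etree.ElementTree; to_xml_sketch and the separate units mapping are hypothetical, whereas the real to_xml presumably takes units from the standards.

```python
from xml.etree import ElementTree as et


def to_xml_sketch(root_name, flat, units):
    """Build nested XML from dot-separated keys, adding a units attribute."""
    root = et.Element(root_name)
    for key, value in flat.items():
        node = root
        # Reuse an existing child element (e.g. <declination>) or create it.
        for part in key.split("."):
            child = node.find(part)
            if child is None:
                child = et.SubElement(node, part)
            node = child
        node.text = str(value)
        if units.get(key):
            node.set("units", units[key])
    return root


values = {"declination.model": "WMM", "declination.value": -12.0, "latitude": 20.0}
units = {"declination.value": "degrees", "latitude": "degrees"}
element = to_xml_sketch("location", values, units)
print(et.tostring(element, encoding="unicode"))
```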

Pandas Series

Pandas is a widely used library for working with tabular data. A pandas.Series is essentially a single row in a database table.

pd_series = here.to_series()
print(pd_series)
declination.model      WMM
declination.value    -12.0
elevation            199.0
latitude             -10.0
longitude           -110.0
dtype: object
from pandas import Series

location_series = Series(
    {
        'declination.model': 'WMM',
        'declination.value': -14.0,
        'elevation': 399.0,
        'latitude': -14.0,
        'longitude': -112.0
    }
)

here.from_series(location_series)
print(here)
location:
	declination.model = WMM
	declination.value = -14.0
	elevation = 399.0
	latitude = -14.0
	longitude = -112.0