Basics
Metadata is important for describing data, but it is also a pain to keep track of and standardize. mt_metadata was written to make it easier to standardize metadata, specifically MT metadata, though not exclusively. There are standard ways of creating schemas for metadata, for example in XML or JSON. We decided to be agnostic to those formats and internally use Python's built-in dictionary object. Tools are provided to read and write XML and JSON formats if desired.
All input values are validated against the standards to make sure the data type is correct. More on that below.
Here, basic usage of the mt_metadata module is demonstrated.
Base Class
mt_metadata.base.Base is the base class upon which all metadata objects are built. Base provides convenience methods to input and output metadata in different formats: XML, JSON, Python dictionary, and pandas Series. It also provides functions to help the user understand what's inside.
The underlying attribute of Base that controls how inputs are validated and which keywords are included is _attr_dict. This dictionary can be input manually, but it is usually loaded automatically when the object is initialized. Base._attr_dict = {} to begin with, so to build a useful version of Base an _attr_dict needs to be input, commonly on initialization.
The metadata objects that inherit from Base have their _attr_dict input on initialization from JSON files, which provide the keywords and, for each keyword, the attributes that describe how to validate it. For example:
{
"name": {
"type": "string",
"required": true,
"style": "free form",
"units": null,
"description": "Persons name, should be full first and last name.",
"options": [],
"alias": [],
"example": "person name",
"default": null
}
}
Here the keyword is name; it should be a free form string that describes the name of a person. The default value is null. Any keyword added needs to have this form, with the following attributes:
Attribute | Description | Options |
---|---|---|
“type” | Data type of the keyword; must be a native Python type | float, str, int, bool, list |
“required” | Is this keyword required by the metadata standard | True or False |
“style” | If the “type” is a string, what type of string is it, how should it be formatted | “name”, “url”, “email”, “number”, “date”, “free form”, “time”, “date time”, “name list”, “number list”, “controlled vocabulary”, “alpha numeric” |
“units” | What units the keyword should be in | SI units |
“description” | Full description of what this keyword describes | |
“options” | If the “style” is “controlled vocabulary”, the list of values the keyword can take | list of options |
“alias” | Is this keyword known by other names (not currently implemented) | |
“example” | An example of what the keyword should look like | |
“default” | Default value | depends on “type” |
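When writing a new standards file it can help to check that every keyword carries all of the attributes in the table. The sketch below does that with the standard library only; `REQUIRED_FIELDS` and `missing_fields` are hypothetical names for illustration, not part of mt_metadata, and the `email` entry is made up to show an incomplete keyword.

```python
import json

# The attributes every keyword entry is expected to carry, per the table above.
# (Hypothetical helper for illustration; not part of mt_metadata.)
REQUIRED_FIELDS = {
    "type", "required", "style", "units", "description",
    "options", "alias", "example", "default",
}


def missing_fields(standards):
    """Return {keyword: sorted missing attributes} for every incomplete entry."""
    problems = {}
    for keyword, attributes in standards.items():
        missing = REQUIRED_FIELDS - set(attributes)
        if missing:
            problems[keyword] = sorted(missing)
    return problems


# "name" mirrors the complete entry above; "email" is a made-up, incomplete one.
standards = json.loads("""
{
    "name": {"type": "string", "required": true, "style": "free form",
             "units": null, "description": "Persons name",
             "options": [], "alias": [], "example": "person name",
             "default": null},
    "email": {"type": "string", "style": "email"}
}
""")

print(missing_fields(standards))
# {'email': ['alias', 'default', 'description', 'example', 'options', 'required', 'units']}
```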
Under the hood
To make it easier for the user and to help standardize metadata, the standards for each metadata element and for more complex metadata objects are stored as JSON files within the package. When a class is initialized, it opens the appropriate JSON file and loads it to populate the _attr_dict, which is used to initialize the metadata object. For example, if you want a metadata object for a location you would do from mt_metadata.timeseries import Location. Location opens the location.json file stored in mt_metadata.timeseries.standards to populate the _attr_dict, which in turn is used to initialize a Location object. In this way the standards are respected while the metadata objects remain user friendly, because all metadata attributes can be accessed in a Pythonic way, like
l = Location()
l.latitude = 50
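As a rough illustration of this mechanism, the sketch below parses a standards JSON string and uses it to give each keyword an initial value. `ToyLocation` and the two-keyword standard are hypothetical simplifications written in the style of location.json; the real classes load their JSON files from the package and also validate and nest attributes. The initial values follow the rule given in the Validation section: required keywords get their default, optional ones get None.

```python
import json

# A two-keyword excerpt written in the style of location.json (illustrative only).
LOCATION_STANDARD = json.loads("""
{
    "latitude": {"type": "float", "required": true, "style": "number",
                 "units": "degrees", "description": "latitude of location",
                 "options": [], "alias": ["lat"], "example": 23.134,
                 "default": 0.0},
    "datum": {"type": "string", "required": false,
              "style": "controlled vocabulary", "units": null,
              "description": "Datum of the location values.",
              "options": ["WGS84", "NAD83", "other"], "alias": [],
              "example": "WGS84", "default": null}
}
""")


class ToyLocation:
    """Hypothetical stand-in that builds its attributes from a standards dict."""

    def __init__(self, attr_dict):
        self._attr_dict = attr_dict
        for name, spec in attr_dict.items():
            # Required keywords start at their default; optional ones start at None.
            setattr(self, name, spec["default"] if spec["required"] else None)


loc = ToyLocation(LOCATION_STANDARD)
print(loc.latitude, loc.datum)  # 0.0 None
loc.latitude = 50  # plain attribute access (no validation in this toy)
```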
Methods of Base
Base has the following methods. It also overloads built-in methods: __eq__ and __ne__ for comparing two similar metadata objects, plus __len__, __str__, and __repr__ (see below).
Method | Purpose |
---|---|
add_base_attribute | add a base attribute with a dictionary as above |
attribute_information | print attribute information for a given attribute or all |
from_dict | fill keyword values from a dictionary |
from_json | fill keyword values from a json string or file |
from_series | fill keyword values from a pandas.Series |
from_xml | fill keyword values from an XML string or file |
get_attr_from_name | get an attribute from a complex name separated by a . like location.latitude |
get_attribute_list | get a list of attributes in the object |
set_attr_from_name | set an attribute from a complex name separated by a . like location.longitude |
to_dict | export the keywords and values as a dictionary |
to_json | export the keywords and values as a JSON string |
to_series | export the keywords and values as a pandas.Series object |
to_xml | export the keywords and values as an XML element |
update | update the values from a similar metadata object |
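One way to picture the comparison operators is as comparing exported dictionaries. The sketch below assumes that design for illustration only; `ToyMetadata` is hypothetical and is not necessarily how mt_metadata implements `__eq__`.

```python
class ToyMetadata:
    """Hypothetical metadata object; not mt_metadata's implementation."""

    _name = "location"

    def __init__(self, **values):
        self._values = dict(values)

    def to_dict(self):
        # Sorted keyword/value pairs wrapped in the object name.
        return {self._name: dict(sorted(self._values.items()))}

    # Python 3 derives != from __eq__, so no separate __ne__ is needed here.
    def __eq__(self, other):
        if not isinstance(other, ToyMetadata):
            return NotImplemented
        return self.to_dict() == other.to_dict()


a = ToyMetadata(latitude=40.0, longitude=-120.0)
b = ToyMetadata(longitude=-120.0, latitude=40.0)
c = ToyMetadata(latitude=41.0, longitude=-120.0)
print(a == b, a != c)  # True True
```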
Example
A simple demonstration of Base: how to add attributes and how to find out what is in the metadata and the standards.
from mt_metadata.base import Base
b = Base()
Add attributes
You can add attributes to an existing metadata object. All you need is a standards dictionary that describes the new attribute.
Here we will add an extra attribute for temperature. It may only take one of two options, ‘ambient’ or ‘air’. It will be a string but is not required.
extra = {
'type': str,
'style': 'controlled vocabulary',
'required': False,
'units': 'celsius',
'description': 'local temperature',
'alias': ['temp'],
'options': [ 'ambient', 'air'],
'example': 'ambient',
'default': None
}
b.add_base_attribute("temperature", "ambient", extra)
The __repr__
The __repr__ of the base class is the JSON representation of the object.
b
{
"base": {
"temperature": "ambient"
}
}
The __str__
The __str__ of the class is a printed list of keyword = value pairs.
print(b)
base:
temperature = ambient
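Both representations can be produced from a dictionary of keyword/value pairs. This minimal sketch assumes the layouts seen in the outputs above (JSON with an indent of 4 for `__repr__`, indented `keyword = value` lines for `__str__`); `ToyBase` is a hypothetical class, not mt_metadata's Base.

```python
import json


class ToyBase:
    """Hypothetical class that mimics the repr/str layout shown above."""

    _name = "base"

    def __init__(self, **values):
        self._values = dict(values)

    def __repr__(self):
        # JSON text with the values wrapped in the object name.
        return json.dumps({self._name: self._values}, indent=4)

    def __str__(self):
        # The object name, then one "keyword = value" line per attribute.
        lines = [f"{self._name}:"]
        for key, value in sorted(self._values.items()):
            lines.append(f"    {key} = {value}")
        return "\n".join(lines)


b = ToyBase(temperature="ambient")
print(repr(b))
print(b)
```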
Attribute Information and List
There are also convenience methods to list the attributes and to get information about them.
b.get_attribute_list()
['temperature']
b.attribute_information()
temperature:
alias: ['temp']
default: None
description: local temperature
example: ambient
options: ['ambient', 'air']
required: False
style: controlled vocabulary
type: <class 'str'>
units: celsius
==================================================
b.attribute_information("temperature")
temperature:
alias: ['temp']
default: None
description: local temperature
example: ambient
options: ['ambient', 'air']
required: False
style: controlled vocabulary
type: <class 'str'>
units: celsius
Validation
Validation of the attributes is the most important part of having a separate module for the metadata. The validation process:
- First ensures the type is the type prescribed by the metadata. In the example above, the prescribed data type for temperature is a string. Therefore, when the value is set, the validators make sure the value is a string. If it is not, it is converted to a string if possible; if not, a ValueError is raised.
- If the style is controlled vocabulary, the value is checked against options. If other is in options, values that are not in the list are also accepted, so other acts as a kind of "accept anything" key.
- If a value of None is given, the proper None type is set. If the style is a date, the None value is set to 1980-01-01T00:00:00; if list is in the style, the value is set to [].
When the standards are first read in, if required is True the value is set to the given default value. If required is False, the value is set to the appropriate None value.
extra = {
'type': float,
'style': 'number',
'required': True,
'units': None,
'description': 'height',
'alias': [],
'options': [],
'example': 10.0,
'default': 0.0
}
b.add_base_attribute("height", 0, extra)
b.height = "11.7"
print(b)
base:
height = 11.7
temperature = ambient
b.temperature = "fail"
2022-09-28 11:05:20,328 [line 359] mt_metadata.base.metadata.base.__setattr__ - ERROR: fail not found in options list ['ambient', 'air']
MTSchemaError: fail not found in options list ['ambient', 'air']
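The validation rules listed earlier can be sketched in a few plain functions. This is an approximation written from the description, not mt_metadata's code: the function names and the `TYPE_NAMES` mapping are hypothetical, and it raises `ValueError` for a value outside the options, whereas mt_metadata raised `MTSchemaError` in the run above.

```python
# Hypothetical mapping for standards that spell the type as a string.
TYPE_NAMES = {"string": str, "str": str, "float": float, "int": int,
              "bool": bool, "list": list}


def none_value(spec):
    """The 'proper None' for a keyword, following the rules listed above."""
    if "date" in spec["style"]:
        return "1980-01-01T00:00:00"
    if "list" in spec["style"]:
        return []
    return None


def initial_value(spec):
    """Value used when the standards are first read in."""
    return spec["default"] if spec["required"] else none_value(spec)


def validate(value, spec):
    """Check or convert the type, then enforce a controlled vocabulary."""
    if value is None:
        return none_value(spec)
    expected = spec["type"]
    if isinstance(expected, str):
        expected = TYPE_NAMES[expected]
    if not isinstance(value, expected):
        try:
            value = expected(value)
        except (TypeError, ValueError) as error:
            raise ValueError(f"cannot convert {value!r} to {expected.__name__}") from error
    if spec["style"] == "controlled vocabulary":
        options = spec["options"]
        # "other" in the options list means values outside the list are accepted.
        if value not in options and "other" not in options:
            raise ValueError(f"{value} not found in options list {options}")
    return value


# Minimal specs (only the keys this sketch reads), modeled on the examples above.
HEIGHT = {"type": float, "style": "number", "required": True,
          "options": [], "default": 0.0}
TEMPERATURE = {"type": str, "style": "controlled vocabulary", "required": False,
               "options": ["ambient", "air"], "default": None}
DATUM = {"type": "string", "style": "controlled vocabulary", "required": False,
         "options": ["WGS84", "NAD83", "other"], "default": None}

print(validate("11.7", HEIGHT))  # 11.7
print(validate("NAD27", DATUM))  # NAD27 (accepted because of "other")
```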
A more complicated example
Now we will look at a more complicated metadata object, mt_metadata.timeseries.Location.
from mt_metadata.timeseries import Location
here = Location()
here.get_attribute_list()
['datum',
'declination.comments',
'declination.epoch',
'declination.model',
'declination.value',
'elevation',
'latitude',
'longitude',
'x',
'x2',
'y',
'y2',
'z',
'z2']
here.attribute_information()
latitude:
alias: ['lat']
default: 0.0
description: latitude of location in datum specified at survey level
example: 23.134
options: []
required: True
style: number
type: float
units: degrees
==================================================
longitude:
alias: ['lon', 'long']
default: 0.0
description: longitude of location in datum specified at survey level
example: 14.23
options: []
required: True
style: number
type: float
units: degrees
==================================================
elevation:
alias: ['elev']
default: 0.0
description: elevation of location in datum specified at survey level
example: 123.4
options: []
required: True
style: number
type: float
units: meters
==================================================
datum:
alias: []
default: None
description: Datum of the location values. Usually a well known datum like WGS84.
example: WGS84
options: ['WGS84', 'NAD83', 'other']
required: False
style: controlled vocabulary
type: string
units: None
==================================================
x:
alias: ['east', 'easting']
default: None
description: relative distance to the center of the station
example: 10.0
options: []
required: False
style: number
type: float
units: meters
==================================================
x2:
alias: ['east', 'easting']
default: None
description: relative distance to the center of the station
example: 10.0
options: []
required: False
style: number
type: float
units: meters
==================================================
y:
alias: ['north', 'northing']
default: None
description: relative distance to the center of the station
example: 10.0
options: []
required: False
style: number
type: float
units: meters
==================================================
y2:
alias: ['north', 'northing']
default: None
description: relative distance to the center of the station
example: 10.0
options: []
required: False
style: number
type: float
units: meters
==================================================
z:
alias: []
default: None
description: relative elevation to the center of the station
example: 10.0
options: []
required: False
style: number
type: float
units: meters
==================================================
z2:
alias: []
default: None
description: relative elevation to the center of the station
example: 10.0
options: []
required: False
style: number
type: float
units: meters
==================================================
declination.comments:
alias: []
default: None
description: any comments on declination
example: estimated from WMM 2016
options: []
required: False
style: free form
type: string
units: None
==================================================
declination.model:
alias: []
default: WMM
description: geomagnetic reference model used to calculate declination
example: WMM
options: ['EMAG2', 'EMM', 'HDGM', 'IGRF', 'WMM', 'unknown', 'other']
required: True
style: controlled vocabulary
type: string
units: None
==================================================
declination.epoch:
alias: []
default: None
description: Epoch for which declination was approximated in.
example: 2020
options: []
required: False
style: free form
type: string
units: None
==================================================
declination.value:
alias: []
default: 0.0
description: declination angle relative to geographic north positive clockwise
example: 12.3
options: []
required: True
style: number
type: float
units: degrees
==================================================
Getting/Setting an attribute
These are convenience methods for getting and setting nested attributes, for instance getting or setting the declination value in a single call. This is helpful when filling metadata from a file.
here.set_attr_from_name("declination.value", 10)
print(here)
location:
declination.model = WMM
declination.value = 10.0
elevation = 0.0
latitude = 0.0
longitude = 0.0
here.get_attr_from_name("declination.value")
10.0
# This is the same as
here.declination.value
10.0
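A dotted name can be resolved by calling getattr once per part. The sketch below shows that idea on plain namespaces; `get_dotted` and `set_dotted` are hypothetical helpers, and unlike mt_metadata's methods they do no validation (note how the real call above turned 10 into 10.0).

```python
from functools import reduce
from types import SimpleNamespace


def get_dotted(obj, name):
    """Follow a dotted name such as 'declination.value' one attribute at a time."""
    return reduce(getattr, name.split("."), obj)


def set_dotted(obj, name, value):
    """Walk to the owner of the last name part, then set it there."""
    *parents, last = name.split(".")
    owner = reduce(getattr, parents, obj)
    setattr(owner, last, value)


# A nested stand-in for Location built from plain namespaces.
here = SimpleNamespace(latitude=0.0,
                       declination=SimpleNamespace(model="WMM", value=0.0))
set_dotted(here, "declination.value", 10.0)
print(get_dotted(here, "declination.value"))  # 10.0
```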
Dictionary
The most basic form the metadata can take is a Python dictionary of key-value pairs.
here.to_dict()
{'location': OrderedDict([('declination.model', 'WMM'),
('declination.value', 10.0),
('elevation', 0.0),
('latitude', 0.0),
('longitude', 0.0)])}
here.from_dict(
{
"location": {
"declination.value": -11.0,
"elevation": 759.0,
"latitude": -34.0,
"longitude": -104.0
}
}
)
print(here)
location:
declination.model = WMM
declination.value = -11.0
elevation = 759.0
latitude = -34.0
longitude = -104.0
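The to_dict output above lists the attributes in sorted order and leaves out ones that were never set, such as datum. One rule consistent with that output, assumed here rather than taken from the mt_metadata source, is to skip values that are still None. `to_wrapped_dict` and `update_from_wrapped_dict` are hypothetical helpers, and neither validates values.

```python
from collections import OrderedDict


def to_wrapped_dict(name, values):
    """Sorted keyword/value pairs under the object name, skipping None values."""
    kept = OrderedDict(
        (key, value) for key, value in sorted(values.items()) if value is not None
    )
    return {name: kept}


def update_from_wrapped_dict(name, values, incoming):
    """Update a flat values dict in place from a {name: {...}} dictionary."""
    values.update(incoming[name])
    return values


values = {"datum": None, "declination.model": "WMM", "declination.value": 10.0,
          "elevation": 0.0, "latitude": 0.0, "longitude": 0.0}
print(list(to_wrapped_dict("location", values)["location"]))
# ['declination.model', 'declination.value', 'elevation', 'latitude', 'longitude']
update_from_wrapped_dict("location", values, {"location": {"elevation": 759.0}})
```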
JSON
JSON is a standard format that is both human and machine readable and is well supported in Python. There are methods to read and write JSON.
# Compact form
print(here.to_json())
{
"location": {
"declination.model": "WMM",
"declination.value": -11.0,
"elevation": 759.0,
"latitude": -34.0,
"longitude": -104.0
}
}
here.from_json('{"location": {"declination.model": "WMM", "declination.value": 10.0, "elevation": 99.0, "latitude": 40.0, "longitude": -120.0}}')
print(here)
location:
declination.model = WMM
declination.value = 10.0
elevation = 99.0
latitude = 40.0
longitude = -120.0
# Nested form
print(here.to_json(nested=True))
{
"location": {
"declination": {
"model": "WMM",
"value": 10.0
},
"elevation": 99.0,
"latitude": 40.0,
"longitude": -120.0
}
}
here.from_json('{"location": {"declination": {"model": "WMM", "value": -12.0}, "elevation": 199.0, "latitude": 20.0, "longitude": -110.0}}')
print(here)
location:
declination.model = WMM
declination.value = -12.0
elevation = 199.0
latitude = 20.0
longitude = -110.0
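The compact and nested forms above differ only in whether dotted names are split into nested objects. A minimal sketch of that conversion in both directions, using hypothetical `nest` and `flatten` helpers (not mt_metadata functions):

```python
import json


def nest(flat):
    """{'declination.model': 'WMM'} -> {'declination': {'model': 'WMM'}}"""
    nested = {}
    for dotted, value in flat.items():
        *parents, last = dotted.split(".")
        level = nested
        for part in parents:
            level = level.setdefault(part, {})
        level[last] = value
    return nested


def flatten(nested, prefix=""):
    """Inverse of nest(): join nested keys back into dotted names."""
    flat = {}
    for key, value in nested.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


flat = {"declination.model": "WMM", "declination.value": -12.0,
        "elevation": 199.0, "latitude": 20.0, "longitude": -110.0}
print(json.dumps({"location": nest(flat)}, indent=4))
```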
XML
XML is also a common format for metadata, though it is not as human readable.
print(here.to_xml(string=True))
<?xml version="1.0" ?>
<location>
<declination>
<model>WMM</model>
<value units="degrees">-12.0</value>
</declination>
<elevation units="meters">199.0</elevation>
<latitude units="degrees">20.0</latitude>
<longitude units="degrees">-110.0</longitude>
</location>
from xml.etree import ElementTree as et  # cElementTree was removed in Python 3.9
location = et.Element('location')
lat = et.SubElement(location, 'latitude')
lat.text = "-10"
here.from_xml(location)
print(here)
location:
declination.model = WMM
declination.value = -12.0
elevation = 199.0
latitude = -10.0
longitude = -110.0
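The XML layout above, with dotted names turned into nested elements and units stored as attributes, can be reproduced with the standard library's `xml.etree.ElementTree`. This is a sketch under assumptions: `to_xml` here is a standalone hypothetical function, and the `units` lookup is supplied by hand, whereas mt_metadata presumably reads units from its standards.

```python
import xml.etree.ElementTree as et


def to_xml(name, flat, units):
    """Build an element tree from dotted keys, adding a units attribute where known."""
    root = et.Element(name)
    for dotted, value in sorted(flat.items()):
        *parents, last = dotted.split(".")
        parent = root
        for part in parents:
            # Reuse an existing child element (e.g. <declination>) if present.
            child = parent.find(part)
            if child is None:
                child = et.SubElement(parent, part)
            parent = child
        element = et.SubElement(parent, last)
        element.text = str(value)
        if units.get(dotted):
            element.set("units", units[dotted])
    return root


flat = {"declination.model": "WMM", "declination.value": -12.0,
        "elevation": 199.0, "latitude": 20.0, "longitude": -110.0}
# Hypothetical units lookup, filled in by hand for this sketch.
units = {"declination.value": "degrees", "elevation": "meters",
         "latitude": "degrees", "longitude": "degrees"}
root = to_xml("location", flat, units)
print(et.tostring(root, encoding="unicode"))
```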
Pandas Series
pandas is a widely used library for working with columnar data. A Series is basically like a single row in a database table.
pd_series = here.to_series()
print(pd_series)
declination.model WMM
declination.value -12.0
elevation 199.0
latitude -10.0
longitude -110.0
dtype: object
from pandas import Series
location_series = Series(
{
'declination.model': 'WMM',
'declination.value': -14.0,
'elevation': 399.0,
'latitude': -14.0,
'longitude': -112.0
}
)
here.from_series(location_series)
print(here)
location:
declination.model = WMM
declination.value = -14.0
elevation = 399.0
latitude = -14.0
longitude = -112.0