Example of Working with a Version 0.1.0 MTH5 File
There are currently two flavors of MTH5 files. Version 0.1.0 can contain only one survey, whereas version 0.2.0 can contain multiple surveys. There are benefits to both. If you are storing a single station per file, for example for quicker archiving and sharing, then version 0.1.0 is a good choice. If you are storing multiple stations across different surveys, or transfer functions from multiple surveys, then version 0.2.0 is the correct choice.
We will demonstrate here how to work with a version 0.1.0 file.
from mth5.mth5 import MTH5
2022-09-26 13:35:58,864 [line 135] mth5.setup_logger - INFO: Logging file can be found C:\Users\jpeacock\OneDrive - DOI\Documents\GitHub\mth5\logs\mth5_debug.log
Initialize an MTH5 object with file version 0.1.0
m = MTH5(file_version="0.1.0")
File Attributes
Have a look at the attributes of the file. These are the high-level metadata that can be used for a quick validation of the file, such as checking the version and file type. file_attributes is a property of an MTH5 object that conveniently summarizes these attributes. Each of them can be set, except for file.access, by doing for example m.file.type = "mth5".
m.file_attributes
{'file.type': 'MTH5',
'file.version': '0.1.0',
'file.access.platform': 'Windows-10-10.0.19044-SP0',
'file.access.time': '2022-09-26T20:35:59.111224+00:00',
'mth5.software.version': '0.3.0',
'mth5.software.name': 'mth5',
'data_level': 1}
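A quick validation of this kind could be sketched as follows. This is only an illustration in plain Python: check_file_attributes is a hypothetical helper, not part of mth5, and the dictionary is copied from the output above.

```python
def check_file_attributes(attrs, expected_version="0.1.0"):
    """Return a list of problems found in an MTH5-style attribute dictionary.

    Hypothetical helper for illustration; it is not part of the mth5 package.
    """
    problems = []
    # Compare the file type case-insensitively.
    if str(attrs.get("file.type", "")).upper() != "MTH5":
        problems.append(f"unexpected file.type: {attrs.get('file.type')!r}")
    if attrs.get("file.version") != expected_version:
        problems.append(
            f"expected version {expected_version}, "
            f"found {attrs.get('file.version')!r}"
        )
    return problems


# Attribute dictionary as printed by m.file_attributes above.
attrs = {
    "file.type": "MTH5",
    "file.version": "0.1.0",
    "file.access.platform": "Windows-10-10.0.19044-SP0",
    "file.access.time": "2022-09-26T20:35:59.111224+00:00",
    "mth5.software.version": "0.3.0",
    "mth5.software.name": "mth5",
    "data_level": 1,
}

print(check_file_attributes(attrs))  # prints []
```

In a real session you would pass m.file_attributes instead of the hard-coded dictionary.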
Dataset Options
Dataset options dictate how datasets (time series or transfer function data) are compressed and stored. Normally the defaults work well, but if you would like smaller files you may want to change the compression type or the compression options. For more information see h5py compression.
Note that if you use MTH5 in parallel then you cannot have any compression.
Here are the dataset options, where compression_opts will vary depending on compression.
m.dataset_options
{'compression': 'gzip',
'compression_opts': 4,
'shuffle': True,
'fletcher32': True}
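To give a feel for what compression_opts and shuffle trade off, the sketch below uses Python's zlib module, which implements the same DEFLATE algorithm that gzip compression is built on. It is an analogy in plain Python, not h5py or MTH5 code. The sample signal and the byte_shuffle and byte_unshuffle helpers are made up for illustration.

```python
import math
import struct
import zlib

# Hypothetical stand-in for a channel: 10,000 int32 samples of a slow sine wave.
samples = [int(1000 * math.sin(i / 50)) for i in range(10_000)]
raw = struct.pack(f"<{len(samples)}i", *samples)


def byte_shuffle(buf, itemsize):
    """Group byte 0 of every item, then byte 1, and so on.

    This mimics the idea of a shuffle filter: similar bytes end up
    next to each other, which usually helps the compressor.
    """
    return b"".join(buf[k::itemsize] for k in range(itemsize))


def byte_unshuffle(buf, itemsize):
    """Invert byte_shuffle."""
    n = len(buf) // itemsize
    out = bytearray(len(buf))
    for k in range(itemsize):
        out[k::itemsize] = buf[k * n:(k + 1) * n]
    return bytes(out)


shuffled = byte_shuffle(raw, 4)
sizes_plain = {lvl: len(zlib.compress(raw, lvl)) for lvl in (1, 4, 9)}
sizes_shuffled = {lvl: len(zlib.compress(shuffled, lvl)) for lvl in (1, 4, 9)}

print("raw bytes:", len(raw))
for lvl in (1, 4, 9):
    print(f"level {lvl}: plain={sizes_plain[lvl]} shuffled={sizes_shuffled[lvl]}")
```

Higher levels spend more CPU time looking for redundancy, which is why a middle value such as 4 is a common compromise.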
The file is not open yet, so printing the object gives a message saying as much.
m
HDF5 file is closed and cannot be accessed.
Open a new file
We will open the file in mode w here, which will overwrite the file if it already exists. If you don't want to do that, or are unsure whether a file already exists, the safest option is mode a. This is the default mode for opening a file and gives the user the most flexibility. In mode r you will not have write privileges, so you can't change anything in the file. This is a good option if you just want to plot data.
m = m.open_mth5(r"example.h5", "w")
2022-09-26 13:35:59,179 [line 605] mth5.mth5.MTH5.open_mth5 - WARNING: example.h5 will be overwritten in 'w' mode
2022-09-26 13:35:59,673 [line 672] mth5.mth5.MTH5._initialize_file - INFO: Initialized MTH5 0.1.0 file example.h5 in mode w
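The choice between the three modes described above can be written down as a small decision function. This is a sketch only: choose_mode and its overwrite and read_only flags are hypothetical and not part of mth5; only the mode letters r, w, and a come from the text.

```python
from pathlib import Path


def choose_mode(path, overwrite=False, read_only=False):
    """Pick an open mode: 'r' to only read, 'w' to overwrite, otherwise 'a'.

    Hypothetical helper for illustration; it is not part of the mth5 package.
    """
    if read_only:
        # Reading only makes sense if the file is already there.
        if not Path(path).exists():
            raise FileNotFoundError(path)
        return "r"
    if overwrite:
        # 'w' discards any existing file of the same name.
        return "w"
    # 'a' keeps existing content and creates the file if it is missing.
    return "a"


print(choose_mode("example.h5"))                  # prints a
print(choose_mode("example.h5", overwrite=True))  # prints w
```

The returned string could then be passed as the second argument of open_mth5, as in the call above.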
Now that we have initialized a file, let's see what's in an empty file.
m
/:
====================
|- Group: Survey
----------------
|- Group: Filters
-----------------
|- Group: coefficient
---------------------
|- Group: fap
-------------
|- Group: fir
-------------
|- Group: time_delay
--------------------
|- Group: zpk
-------------
|- Group: Reports
-----------------
|- Group: Standards
-------------------
--> Dataset: summary
......................
|- Group: Stations
------------------
--> Dataset: channel_summary
..............................
--> Dataset: tf_summary
.........................
We can see that several groups are created by default. An MTH5 object also provides methods to open and close a file; add and remove a station, run, or channel; read from an mt_metadata.timeseries.Experiment object to fill the metadata and structure before adding data; and create an mt_metadata.timeseries.Experiment object for archiving.
MTH5 Methods
There are a number of convenience methods to access different parts of the file. Here is a list of those methods.
print("\n".join(sorted([func for func in dir(m) if callable(getattr(m, func)) and not func.startswith("_")])))
2022-09-26 13:35:59,736 [line 471] mth5.mth5.MTH5.experiment_group - INFO: File version 0.1.0 does not have an Experiment Group
2022-09-26 13:35:59,763 [line 498] mth5.mth5.MTH5.surveys_group - INFO: File version 0.1.0 does not have a survey_group, try surveys_group
add_channel
add_run
add_station
add_survey
add_transfer_function
close_mth5
from_experiment
from_reference
get_channel
get_run
get_station
get_survey
get_transfer_function
h5_is_read
h5_is_write
has_group
open_mth5
remove_channel
remove_run
remove_station
remove_survey
remove_transfer_function
to_experiment
validate_file
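The log messages printed before the list appear to come from properties being evaluated when getattr touches them. A variant that inspects the class instead of the instance avoids that. The sketch below uses only the standard inspect module; public_methods and the Demo class are hypothetical stand-ins, so the same call can be tried on m in a live session.

```python
import inspect


def public_methods(obj):
    """List public method names by inspecting the class of obj.

    Looking at the class means properties are returned as property
    objects and never evaluated, so they cannot log or raise.
    """
    return sorted(
        name
        for name, _ in inspect.getmembers(type(obj), inspect.isfunction)
        if not name.startswith("_")
    )


class Demo:
    """Hypothetical stand-in with a property that has a side effect."""

    @property
    def noisy(self):
        raise RuntimeError("property was evaluated")

    def open_file(self):
        pass

    def close_file(self):
        pass

    def _private(self):
        pass


print(public_methods(Demo()))  # prints ['close_file', 'open_file']
```

Note that this variant skips class methods, which inspect.isfunction does not match.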
Add a station
Here we will add a station called mt001. This will return a StationGroup object.
station_group = m.add_station("mt001")
Add some metadata to this station like location, who acquired it, and the reference frame in which the data were collected.
station_group.metadata.location.latitude = "40:05:01"
station_group.metadata.location.longitude = -122.3432
station_group.metadata.location.elevation = 403.1
station_group.metadata.acquired_by.author = "me"
station_group.metadata.orientation.reference_frame = "geomagnetic"
# IMPORTANT: Must always use the write_metadata method when metadata is updated.
station_group.write_metadata()
station_group.metadata
{
"station": {
"acquired_by.name": "me",
"channels_recorded": [],
"data_type": "BBMT",
"geographic_name": null,
"hdf5_reference": "<HDF5 object reference>",
"id": "mt001",
"location.declination.model": "WMM",
"location.declination.value": 0.0,
"location.elevation": 403.1,
"location.latitude": 40.08361111111111,
"location.longitude": -122.3432,
"mth5_type": "Station",
"orientation.method": null,
"orientation.reference_frame": "geomagnetic",
"provenance.creation_time": "1980-01-01T00:00:00+00:00",
"provenance.software.author": "none",
"provenance.software.name": null,
"provenance.software.version": null,
"provenance.submitter.email": null,
"provenance.submitter.organization": null,
"run_list": [],
"time_period.end": "1980-01-01T00:00:00+00:00",
"time_period.start": "1980-01-01T00:00:00+00:00"
}
}
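Note that the latitude was entered as the string "40:05:01" and is stored as 40.08361111111111. The arithmetic behind that conversion from degrees:minutes:seconds to decimal degrees can be sketched as below. dms_to_decimal is a hypothetical helper written for this illustration; it is not the function the metadata object uses internally.

```python
def dms_to_decimal(value):
    """Convert 'DD:MM:SS' (or a plain number) to decimal degrees.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, (int, float)):
        return float(value)
    text = value.strip()
    parts = text.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected DD:MM:SS, got {value!r}")
    degrees, minutes, seconds = (float(p) for p in parts)
    # The sign applies to the whole angle, not only to the degrees part.
    sign = -1.0 if text.startswith("-") else 1.0
    return sign * (abs(degrees) + minutes / 60 + seconds / 3600)


# 40 + 5/60 + 1/3600 degrees
print(round(dms_to_decimal("40:05:01"), 6))  # prints 40.083611
```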
Add a Run
We can now add a run to the new station. We can do this in two ways: directly from m, the MTH5 object, or from the newly created station_group.
run_01 = m.add_run("mt001", "001")
run_02 = station_group.add_run("002")
station_group
/Survey/Stations/mt001:
====================
|- Group: 001
-------------
|- Group: 002
-------------
|- Group: Transfer_Functions
----------------------------
Add a Channel
Again we can do this in two ways: directly from m, the MTH5 object, or from the newly created run_01 or run_02 group. There are only three types of channels: electric, magnetic, and auxiliary. The type needs to be specified when a channel is initiated. We will initiate the channel with data=None, which will create an empty dataset.
ex = m.add_channel("mt001", "001", "ex", "electric", None)
hy = run_01.add_channel("hy", "magnetic", None)
run_01
/Survey/Stations/mt001/001:
====================
--> Dataset: ex
.................
--> Dataset: hy
.................
Now, let's see what the contents of this file are.
m
/:
====================
|- Group: Survey
----------------
|- Group: Filters
-----------------
|- Group: coefficient
---------------------
|- Group: fap
-------------
|- Group: fir
-------------
|- Group: time_delay
--------------------
|- Group: zpk
-------------
|- Group: Reports
-----------------
|- Group: Standards
-------------------
--> Dataset: summary
......................
|- Group: Stations
------------------
|- Group: mt001
---------------
|- Group: 001
-------------
--> Dataset: ex
.................
--> Dataset: hy
.................
|- Group: 002
-------------
|- Group: Transfer_Functions
----------------------------
--> Dataset: channel_summary
..............................
--> Dataset: tf_summary
.........................
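The tree above implies a simple path scheme for a version 0.1.0 file: stations live under /Survey/Stations, runs under their station, and channels under their run. A small sketch of that scheme follows. group_path is a hypothetical helper, and the channel-level path is inferred from the printed tree rather than taken from the mth5 API.

```python
import posixpath

# Root of all stations in a version 0.1.0 file, as shown in the tree above.
STATIONS_ROOT = "/Survey/Stations"


def group_path(station, run=None, channel=None):
    """Build the HDF5 path of a station, run, or channel.

    Hypothetical helper for illustration; it is not part of the mth5 package.
    """
    parts = [STATIONS_ROOT, station]
    if run is not None:
        parts.append(run)
        if channel is not None:
            parts.append(channel)
    elif channel is not None:
        raise ValueError("a channel path needs a run")
    return posixpath.join(*parts)


print(group_path("mt001"))               # prints /Survey/Stations/mt001
print(group_path("mt001", "001"))        # prints /Survey/Stations/mt001/001
print(group_path("mt001", "001", "ex"))  # prints /Survey/Stations/mt001/001/ex
```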
Channel Summary
This is a summary of all channels in the file and can be used to query the data easily.
%time
m.channel_summary.clear_table()
m.channel_summary.summarize()
ch_df = m.channel_summary.to_dataframe()
ch_df
Example of using the Channel Summary
Because the channel summary is now a pandas.DataFrame (ch_df), we can use all the fancy indexing and searching that pandas provides. Let's say we want to get a run, but aren't sure of the names.
run_id = ch_df.run.unique()[0]
print(run_id)
001
Now we want to use the HDF5 reference to get that run without knowing much about the other metadata.
run_hdf5_reference = ch_df.loc[ch_df.run == run_id, "run_hdf5_reference"].iloc[0]
run_group = m.from_reference(run_hdf5_reference)
Close MTH5 file
This part is important: be sure to close the file in order to save any changes. This function flushes metadata and data to the HDF5 file and then closes it. Note that once a file is closed, all groups lose their link to the file and cannot retrieve any data.
m.close_mth5()
station_group
2022-09-26 13:36:00,337 [line 753] mth5.mth5.MTH5.close_mth5 - INFO: Flushing and closing example.h5
2022-09-26 13:36:00,342 [line 113] mth5.groups.base.Station.__str__ - WARNING: MTH5 file is closed and cannot be accessed.
MTH5 file is closed and cannot be accessed.
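Because forgetting to close the file can lose changes, one option is to wrap the work in a context manager that always calls close_mth5, even if an error occurs. The sketch below uses only contextlib. closing_mth5 is a hypothetical helper, and FakeMTH5 is a stand-in so the example runs without an HDF5 file; in practice you would pass the real MTH5 object.

```python
from contextlib import contextmanager


@contextmanager
def closing_mth5(mth5_obj):
    """Yield the object and always call its close_mth5 method afterwards.

    Hypothetical helper for illustration; it is not part of the mth5 package.
    """
    try:
        yield mth5_obj
    finally:
        mth5_obj.close_mth5()


class FakeMTH5:
    """Stand-in that only records whether close_mth5 was called."""

    def __init__(self):
        self.closed = False

    def close_mth5(self):
        self.closed = True


fake = FakeMTH5()
try:
    with closing_mth5(fake):
        raise ValueError("simulated failure while writing")
except ValueError:
    pass

print(fake.closed)  # prints True
```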