Example of Working with a Version 0.2.0 MTH5 File¶
There are currently 2 flavors of MTH5 files. Version 0.1.0 can contain only one survey, whereas version 0.2.0 can contain multiple surveys. There are benefits to both. If you are storing only a single station per file, for example for quicker archiving and sharing, then version 0.1.0 is a good choice. If you are storing multiple stations across different surveys, or transfer functions from multiple surveys, then version 0.2.0 is the correct choice.
We will demonstrate here how to work with a version 0.2.0 file.
from mth5.mth5 import MTH5
Initialize an MTH5 object with file version 0.2.0¶
m = MTH5(file_version="0.2.0")
File Attributes¶
Have a look at the attributes of the file. These are the high-level metadata that can be used for a quick validation of the file, such as checking the version and file type. file_attributes is a property of an MTH5 object that conveniently summarizes these attributes. Each of them can be set, except for file.access, by doing, for example, m.file.type = "mth5".
m.file_attributes
{'file.type': 'MTH5',
'file.version': '0.2.0',
'file.access.platform': 'Windows-10-10.0.22631-SP0',
'file.access.time': '2024-09-25T21:32:59.795011+00:00',
'mth5.software.version': '0.4.6',
'mth5.software.name': 'mth5',
'data_level': 1}
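The quick validation mentioned above can be sketched as a check on the dictionary that m.file_attributes returns. The helper name is_v2_mth5 is hypothetical, and the sample dictionary is a trimmed copy of the output shown above. The case-insensitive comparison is an assumption, made because the setter example uses "mth5" while the output shows 'MTH5'.

```python
def is_v2_mth5(attrs):
    """Return True if the attributes describe a version 0.2.0 MTH5 file."""
    # Compare the file type case-insensitively (assumption, see the text above).
    file_type = str(attrs.get("file.type", "")).lower()
    version = attrs.get("file.version")
    return file_type == "mth5" and version == "0.2.0"


# Trimmed copy of the m.file_attributes output shown above.
attrs = {"file.type": "MTH5", "file.version": "0.2.0", "data_level": 1}
print(is_v2_mth5(attrs))
```

With a real file, the dictionary would come from m.file_attributes rather than being typed in by hand.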
Dataset Options¶
Dataset options dictate how datasets (time series or transfer function data) are compressed and stored. Normally the defaults work well, but if you would like smaller files you may want to change the compression type or the compression options. For more information see h5py compression.
Note that if you use MTH5 in parallel then you cannot have any compression.
Here are the dataset options, where compression_opts will vary depending on compression.
m.dataset_options
{'compression': 'gzip',
'compression_opts': 4,
'shuffle': True,
'fletcher32': True}
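To give a feel for what the gzip and shuffle options do, here is a small illustration that uses only Python's standard library. It does not call h5py or MTH5. It mimics the gzip filter with zlib at level 4 (the value of compression_opts above) and mimics the shuffle filter with a hypothetical byte_shuffle helper that groups the first bytes of all samples, then the second bytes, and so on. The sample values are synthetic.

```python
import struct
import zlib

# Synthetic, slowly varying int32 samples (a stand-in for a time series).
samples = list(range(10_000))
raw = struct.pack(f"<{len(samples)}i", *samples)


def byte_shuffle(buf, itemsize=4):
    """Group the 1st bytes of every sample, then the 2nd bytes, and so on."""
    return b"".join(buf[i::itemsize] for i in range(itemsize))


plain = zlib.compress(raw, 4)                   # deflate at level 4, no shuffle
shuffled = zlib.compress(byte_shuffle(raw), 4)  # same level after byte shuffle

# Sizes in bytes: uncompressed, compressed, compressed after shuffling.
print(len(raw), len(plain), len(shuffled))
```

For slowly varying integer data like this, the shuffled bytes compress noticeably better. Real data may behave differently, so it is worth measuring before changing the defaults.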
The file is not open yet.
m
HDF5 file is closed and cannot be accessed.
Open a new file¶
We will open the file in mode w here, which will overwrite the file if it already exists. If you don’t want to do that, or are unsure whether a file already exists, the safest option is mode a. This is the default mode for opening a file, to provide the most flexibility to the user. If the mode is r, you will not have write privileges, so you can’t change anything in the file. This is a good option if you just want to plot data.
m = m.open_mth5(r"example.h5", "w")
2024-09-25T14:32:59.841538-0700 | WARNING | mth5.mth5 | open_mth5 | example.h5 will be overwritten in 'w' mode
2024-09-25T14:33:00.235291-0700 | INFO | mth5.mth5 | _initialize_file | Initialized MTH5 0.2.0 file example.h5 in mode w
Now that we have initialized a file, let’s see what’s in an empty file.
m
/:
====================
    |- Group: Experiment
    --------------------
        |- Group: Reports
        -----------------
        |- Group: Standards
        -------------------
            --> Dataset: summary
            ......................
        |- Group: Surveys
        -----------------
        --> Dataset: channel_summary
        ..............................
        --> Dataset: fc_summary
        .........................
        --> Dataset: tf_summary
        .........................
We can see that several groups are created by default. Note that the top level is now Experiment, which can contain multiple surveys.
MTH5 Methods¶
And here are the methods an MTH5 object contains. You can open/close an MTH5 file; add/remove a survey, station, run, or channel; read from an mt_metadata.timeseries.Experiment object to fill the metadata and structure before adding data; and create an mt_metadata.timeseries.Experiment object for archiving.
There are a number of convenience methods to access different parts of the file. Here is a list of those methods.
print("\n".join(sorted([func for func in dir(m) if callable(getattr(m, func)) and not func.startswith("_")])))
2024-09-25T14:33:00.275260-0700 | INFO | mth5.mth5 | filters_group | File version 0.2.0 does not have a FiltersGroup at the experiment level
2024-09-25T14:33:00.306863-0700 | INFO | mth5.mth5 | stations_group | File version 0.2.0 does not have a Stations. try surveys_group.
2024-09-25T14:33:00.307860-0700 | INFO | mth5.mth5 | survey_group | File version 0.2.0 does not have a survey_group, try surveys_group
add_channel
add_run
add_station
add_survey
add_transfer_function
close_mth5
from_experiment
from_reference
get_channel
get_run
get_station
get_survey
get_transfer_function
h5_is_read
h5_is_write
has_group
open_mth5
remove_channel
remove_run
remove_station
remove_survey
remove_transfer_function
to_experiment
validate_file
Add a Survey¶
The first step is to add a survey; here we will add the survey example. This will return a SurveyGroup object, which will commonly be the main group we work with.
survey_group = m.add_survey("example")
Add a station¶
Here we will add a station called mt001. This will return a StationGroup object. We can add a station in 2 ways: directly from the MTH5 object m, or from the newly created survey_group. Note that if we add it from m, we need to include the survey name example. The code below adds mt001 the first way and mt002 the second way, so station_group ends up pointing to mt002.
station_group = m.add_station("mt001", survey="example")
station_group = survey_group.stations_group.add_station("mt002")
Add some metadata to this station like location, who acquired it, and the reference frame in which the data were collected.
station_group.metadata.location.latitude = "40:05:01"
station_group.metadata.location.longitude = -122.3432
station_group.metadata.location.elevation = 403.1
station_group.metadata.acquired_by.author = "me"
station_group.metadata.orientation.reference_frame = "geomagnetic"
# IMPORTANT: Must always use the write_metadata method when metadata is updated.
station_group.write_metadata()
station_group.metadata
{
"station": {
"acquired_by.name": "me",
"channels_recorded": [],
"data_type": "BBMT",
"geographic_name": null,
"hdf5_reference": "<HDF5 object reference>",
"id": "mt002",
"location.declination.model": "WMM",
"location.declination.value": 0.0,
"location.elevation": 403.1,
"location.latitude": 40.08361111111111,
"location.longitude": -122.3432,
"mth5_type": "Station",
"orientation.method": null,
"orientation.reference_frame": "geomagnetic",
"provenance.archive.name": null,
"provenance.creation_time": "1980-01-01T00:00:00+00:00",
"provenance.creator.name": null,
"provenance.software.author": null,
"provenance.software.name": null,
"provenance.software.version": null,
"provenance.submitter.email": null,
"provenance.submitter.name": null,
"provenance.submitter.organization": null,
"release_license": "CC0-1.0",
"run_list": [],
"time_period.end": "1980-01-01T00:00:00+00:00",
"time_period.start": "1980-01-01T00:00:00+00:00"
}
}
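Note that the latitude was set as the string "40:05:01" but is reported as 40.08361111111111. This is consistent with a degrees:minutes:seconds string being converted to decimal degrees. A minimal sketch of that arithmetic follows. dms_to_decimal is a hypothetical helper written for illustration, not the routine the metadata object actually uses.

```python
def dms_to_decimal(text):
    """Convert 'DD:MM:SS' (optionally signed) to decimal degrees."""
    # Remember the sign separately so that minutes and seconds are
    # added to the magnitude of the degrees, not subtracted from it.
    sign = -1.0 if text.strip().startswith("-") else 1.0
    degrees, minutes, seconds = (abs(float(part)) for part in text.split(":"))
    return sign * (degrees + minutes / 60.0 + seconds / 3600.0)


print(dms_to_decimal("40:05:01"))
```

The result is 40 + 5/60 + 1/3600 degrees, which matches the stored latitude to within floating-point precision.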
Add a Run¶
We can now add a run to the new station. We can do this in 2 ways: directly from m, the MTH5 object, or from the newly created station_group.
run_01 = m.add_run("mt002", "001", survey="example")
run_02 = station_group.add_run("002")
station_group
/Experiment/Surveys/example/Stations/mt002:
====================
    |- Group: 001
    -------------
    |- Group: 002
    -------------
    |- Group: Fourier_Coefficients
    ------------------------------
    |- Group: Transfer_Functions
    ----------------------------
Add a Channel¶
Again we can do this in 2 ways: directly from m, the MTH5 object, or from the newly created run_01 or run_02 group. There are only 3 types of channels: electric, magnetic, and auxiliary. The type needs to be specified when a channel is initialized. We will initialize the channel with data=None, which will create an empty dataset.
ex = m.add_channel("mt002", "001", "ex", "electric", None, survey="example")
hy = run_01.add_channel("hy", "magnetic", None)
hy
Channel Magnetic:
-------------------
component: hy
data type: magnetic
data format: int32
data shape: (1,)
start: 1980-01-01T00:00:00+00:00
end: 1980-01-01T00:00:00+00:00
sample rate: 0.0
Now let’s see the contents of this file.
m
/:
====================
    |- Group: Experiment
    --------------------
        |- Group: Reports
        -----------------
        |- Group: Standards
        -------------------
            --> Dataset: summary
            ......................
        |- Group: Surveys
        -----------------
            |- Group: example
            -----------------
                |- Group: Filters
                -----------------
                    |- Group: coefficient
                    ---------------------
                    |- Group: fap
                    -------------
                    |- Group: fir
                    -------------
                    |- Group: time_delay
                    --------------------
                    |- Group: zpk
                    -------------
                |- Group: Reports
                -----------------
                |- Group: Standards
                -------------------
                    --> Dataset: summary
                    ......................
                |- Group: Stations
                ------------------
                    |- Group: mt001
                    ---------------
                        |- Group: Fourier_Coefficients
                        ------------------------------
                        |- Group: Transfer_Functions
                        ----------------------------
                    |- Group: mt002
                    ---------------
                        |- Group: 001
                        -------------
                            --> Dataset: ex
                            .................
                            --> Dataset: hy
                            .................
                        |- Group: 002
                        -------------
                        |- Group: Fourier_Coefficients
                        ------------------------------
                        |- Group: Transfer_Functions
                        ----------------------------
        --> Dataset: channel_summary
        ..............................
        --> Dataset: fc_summary
        .........................
        --> Dataset: tf_summary
        .........................
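The hierarchy above maps onto HDF5 paths: survey, then station, then run, then channel. As a rough illustration, the path of a channel dataset in a version 0.2.0 file can be assembled as follows. v2_channel_path is a hypothetical helper, not part of MTH5. The layout is inferred from the station path /Experiment/Surveys/example/Stations/mt002 printed earlier and from the run and channel groups added under it.

```python
import posixpath


def v2_channel_path(survey, station, run, channel):
    """Build the HDF5 path of a channel dataset in a version 0.2.0 layout."""
    # HDF5 paths always use forward slashes, hence posixpath.
    return posixpath.join(
        "/Experiment", "Surveys", survey, "Stations", station, run, channel
    )


print(v2_channel_path("example", "mt002", "001", "ex"))
# prints /Experiment/Surveys/example/Stations/mt002/001/ex
```

In practice the get_channel method, or the HDF5 reference from the channel summary, is the more convenient way to reach a channel. The path is mainly useful for understanding where things live in the file.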
Channel Summary¶
We can have a look at which channels are in this file. This can take a long time if you have lots of data. The summary is returned as a pandas.DataFrame object and can therefore be queried with the standard Pandas methods.
Note: the number of samples is 1 even though we did not add any data. This is because we initialize the dataset to be extendable and it needs at least 1 dimension to be initialized. We set the max shape to be (1, None) which means it can be extended to an arbitrary shape.
%time
m.channel_summary.clear_table()
m.channel_summary.summarize()
ch_df = m.channel_summary.to_dataframe()
ch_df
Run Summary¶
The channel summary is a great representation of all channels in an MTH5 file. However, if you want to process the data or have a look at synchronous blocks of data (runs), then a summary of runs is more helpful. The MTH5 object also includes a run summary.
m.run_summary
Access channel through HDF5 Reference¶
The channel summary table contains a column labeled hdf5_reference. This is an internal HDF5 reference that can be used to directly access that specific group or dataset. A method is provided in MTH5 to use this reference and return the proper group object. Here we will request the first channel in the table.
h5_reference = ch_df.iloc[0].hdf5_reference
ex = m.from_reference(h5_reference)
ex
Channel Electric:
-------------------
component: ex
data type: electric
data format: int32
data shape: (1,)
start: 1980-01-01T00:00:00+00:00
end: 1980-01-01T00:00:00+00:00
sample rate: 0.0
Close MTH5 file¶
This part is important: be sure to close the file in order to save any changes. This function flushes metadata and data to the HDF5 file and then closes it. Note that once a file is closed, all groups lose their link to the file and cannot retrieve any data.
m.close_mth5()
station_group
2024-09-25T14:33:01.073775-0700 | INFO | mth5.mth5 | close_mth5 | Flushing and closing example.h5
2024-09-25T14:33:01.075881-0700 | WARNING | mth5.groups.base | __str__ | MTH5 file is closed and cannot be accessed.
MTH5 file is closed and cannot be accessed.
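Because closing is what saves the changes, it can help to guarantee that close_mth5 runs even if an error occurs part-way through. One possible pattern is a small context manager built on try/finally. This is only a sketch: opened and StubMTH5 are hypothetical names, and the stub merely stands in for an MTH5 object so that the example runs without the library installed. It relies only on the open_mth5 and close_mth5 methods used in this example.

```python
from contextlib import contextmanager


@contextmanager
def opened(mth5_obj, filename, mode="a"):
    """Open the file, hand it to the caller, and always close it afterwards."""
    handle = mth5_obj.open_mth5(filename, mode)
    try:
        yield handle
    finally:
        # Runs on normal exit and on error, so changes are flushed either way.
        mth5_obj.close_mth5()


class StubMTH5:
    """Stand-in with the same two method names (not the real MTH5 class)."""

    def __init__(self):
        self.is_open = False

    def open_mth5(self, filename, mode="a"):
        self.is_open = True
        return self

    def close_mth5(self):
        self.is_open = False


stub = StubMTH5()
try:
    with opened(stub, "example.h5", "a") as m:
        raise RuntimeError("something went wrong while writing")
except RuntimeError:
    pass

print(stub.is_open)  # prints False: the file was closed despite the error
```

With the real library, an MTH5(file_version="0.2.0") instance would be passed in place of the stub.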