seisflows.preprocess.default
The SeisFlows Preprocessing module is in charge of interacting with seismic data (observed and synthetic). It should contain functionality to read and write seismic data, apply preprocessing such as filtering, quantify misfit, and write adjoint sources that are expected by the solver.
Classes
Default Preprocess [Preprocess Base] |
Functions
|
Waveform reading functionality. Imports waveforms as Obspy streams |
|
Waveform writing functionality. Writes waveforms back to format that |
|
Converts SPECFEM synthetics into ObsPy Stream objects with the correct |
|
SPECFEM requires that adjoint traces be present for every matching |
|
Parallelizable function to write out empty adjoint source |
Module Contents
- class seisflows.preprocess.default.Default(syn_data_format='ascii', obs_data_format='ascii', unit_output='VEL', misfit='waveform', adjoint='waveform', normalize=None, filter=None, min_period=None, max_period=None, min_freq=None, max_freq=None, mute=None, early_slope=None, early_const=None, late_slope=None, late_const=None, short_dist=None, long_dist=None, plot_waveforms=False, source_prefix=None, workdir=os.getcwd(), path_preprocess=None, path_solver=None, **kwargs)
Default Preprocess [Preprocess Base]
Data processing for seismic traces, with options for data misfit, filtering, normalization and muting.
Parameters
- type obs_data_format:
str
- param obs_data_format:
data format for reading observed traces into memory. Available formats: ‘su’, ‘ascii’, ‘sac’
- type unit_output:
str
- param unit_output:
Data units. Must match the synthetic output of external solver. Available: [‘DISP’: displacement, ‘VEL’: velocity, ‘ACC’: acceleration, ‘PRE’: pressure]
- type misfit:
str
- param misfit:
misfit function for waveform comparisons. For available see seisflows.plugins.preprocess.misfit
- type adjoint:
str
- param adjoint:
adjoint source misfit function (backprojection function for migration, or the objective function in FWI). For available see seisflows.plugins.preprocess.adjoint
- type normalize:
str
- param normalize:
Data normalization parameters used to normalize the amplitudes of waveforms. By default, set to NoneType, which means no normalization is applied. User can choose from one of the following options to normalize BOTH obs and syn data:
TNORML1: normalize per trace by the L1 norm of itself
TNORML2: normalize per trace by the L2 norm of itself
TNORM_MAX: normalize by the maximum positive amplitude in the trace
TNORM_ABSMAX: normalize by the absolute maximum amplitude in the trace
TNORM_MEAN: normalize by the mean of the absolute trace
- RNORM_OBS_MAX: normalize synthetic traces by the maximum amplitude of
the corresponding observed trace
- RNORM_OBS_ABSMAX: normalize synthetic traces by the absolute maximum
amplitude of the corresponding observed trace
- RNORM_SYN_MAX: normalize observed traces by the maximum amplitude of
the corresponding synthetic trace
- RNORM_SYN_ABSMAX: normalize observed traces by the absolute maximum
amplitude of the corresponding synthetic trace
Note: options ENORML? are not currently available. If this is a feature you would like to see, please open a GitHub Issue. - ENORML1: normalize per event by L1 of traces; OR - ENORML2: normalize per event by L2 of traces;
- type filter:
str
- param filter:
Data filtering type, by default no filtering is applied. Available options for user to choose are:
BANDPASS (requires: MIN_FREQ + MAX_FREQ OR MIN_PERIOD + MAX PERIOD);
LOWPASS (requires: MAX_FREQ OR MIN_PERIOD);
HIGHPASS (requires: MIN_FREQ OR MAX_PERIOD);
- type min_period:
float
- param min_period:
Minimum filter period applied to time series. See also MIN_FREQ, MAX_FREQ, if User defines FREQ parameters, they will overwrite PERIOD parameters.
- type max_period:
float
- param max_period:
Maximum filter period applied to time series. See also MIN_FREQ, MAX_FREQ, if User defines FREQ parameters, they will overwrite PERIOD parameters.
- type min_freq:
float
- param min_freq:
Maximum filter frequency applied to time series, See also MIN_PERIOD, MAX_PERIOD, if User defines FREQ parameters, they will overwrite PERIOD parameters.
- type max_freq:
float
- param max_freq:
Maximum filter frequency applied to time series, See also MIN_PERIOD, MAX_PERIOD, if User defines FREQ parameters, they will overwrite PERIOD parameters.
- type mute:
list
- param mute:
Data mute parameters used to zero out early / late arrivals or offsets. Choose the following: - EARLY: mute early arrivals - LATE: mute late arrivals; - SHORT: mute short source-receiver distances; - LONG: mute long source-receiver distances input comma separated list of strings, (e.g., mute: early,late,short)
- type plot_waveforms:
bool
- param plot_waveforms:
plot waveforms from each evaluation of the misfit. By default turned off as this can produce many files for an interative inversion.
Paths
- type path_preprocess:
str
- param path_preprocess:
scratch path for all preprocessing processes, including saving files
- syn_data_format = ''
- obs_data_format = ''
- unit_output = ''
- misfit = 'waveform'
- adjoint = 'waveform'
- normalize = None
- filter = None
- min_period = None
- max_period = None
- min_freq = None
- max_freq = None
- mute = []
- early_slope = None
- early_const = None
- late_slope = None
- late_const = None
- short_dist = None
- long_dist = None
- source_prefix = None
- plot_waveforms = False
- path
- _syn_acceptable_data_formats = ['SU', 'ASCII']
- _obs_acceptable_data_formats = ['SU', 'ASCII', 'SAC']
- _acceptable_unit_output = ['DISP', 'VEL', 'ACC', 'PRE']
- _acceptable_misfits
- _acceptable_adjsrcs
- _acceptable_norms
- _acceptable_mutes
- _acceptable_filters
- _iteration = None
- _step_count = None
- check()
Checks parameters and paths
- setup()
Sets up data preprocessing machinery
- finalize()
Teardown procedures for the default preprocessing class. Required to keep things general because Pyaflowa preprocessing module has some finalize procedures.
- _calculate_misfit(**kwargs)
Wrapper for plugins.preprocess.misfit misfit/objective function
- _generate_adjsrc(**kwargs)
Wrapper for plugins.preprocess.adjoint source/backproject function
- quantify_misfit(source_name=None, save_residuals=None, export_residuals=None, save_adjsrcs=None, components=None, iteration=1, step_count=0, _serial=False, **kwargs)
Prepares solver for gradient evaluation by writing residuals and adjoint traces. Meant to be called by solver.eval_func().
Reads in observed and synthetic waveforms, applies optional preprocessing, assesses misfit, and writes out adjoint sources and STATIONS_ADJOINT file. Processing for each station is done in parallel using concurrent.futures.
Warning
The concurrent processing in this function may fail in the case that a User is running N>1 events using the ‘Cluster’ system but on a local workstation, because each event is also run with multiprocessing, so their compute may quickly run out of RAM or cores. Might need to introduce max_workers_preproc and max_workers_solver to ensure that there is a good balance between the two values.
- Parameters:
source_name (str) – name of the event to quantify misfit for. If not given, will attempt to gather event id from the given task id which is assigned by system.run()
save_residuals (str) – if not None, path to write misfit/residuls to
export_residuals (str) – export all residuals (data-synthetic misfit) that are generated by the external solver to path_output. If False, residuals stored in scratch may be discarded at any time in the workflow
save_adjsrcs (str) – if not None, path to write adjoint sources to
components (list) – optional list of components to ignore preprocessing traces that do not have matching components. The adjoint sources for these components will be 0. E.g., [‘Z’, ‘N’]. If None, all available components will be considered.
iteration (int) – optional, current iteration of the workflow used for tagging output waveform figures if internal parameter plot_waveforms is set
step_count (int) – optional, current step count of the line search, used for tagging output waveform figures if internal parameter plot_waveforms is set
_serial (bool) – debug function to turn preprocessing to a serial task whereas it is normally a multiprocessed parallel task
- _setup_quantify_misfit(source_name, save_adjsrcs=None, components=None)
Gather a list of filenames of matching waveform IDs that can be run through the misfit quantification step, and generate empty adjoint sources so that Solver knows which components are zero’d out.
- Parameters:
source_name (str) – the name of the source to process
components (list) – optional list of components to ignore preprocessing traces that do not have matching components. The adjoint sources for these components will be 0. E.g., [‘Z’, ‘N’]. If None, all available components will be considered.
- Return type:
list of tuples
- Returns:
[(observed filename, synthetic filename)]. tuples will contain filenames for matching stations + component for obs and syn
- _quantify_misfit_single(obs_fid, syn_fid, source_name, save_residuals=None, save_adjsrcs=None)
Run misfit quantification for one pair of data-synthetic waveforms. This is kept in a separate function so that it can be parallelized for more efficient processing.
- Parameters:
obs_fid (str) – filename for the observed waveform to be processed
syn_fid (str) – filename for the synthetic waveform to be procsesed
source_name (str) – name of the source used for getting origintime of synthetic waveforms and tagging output waveform figures if internal parameter plot is set True
save_residuals (str) – if not None, path to write misfit/residuls to
save_adjsrcs (str) – if not None, path to write adjoint sources to
- Return type:
float
- Returns:
residual value, calculated by the chosen misfit function comparing obs and syn
- preprocess(obs, syn)
Convenience function that wraps all preprocessing steps so that they can be more easily overwritten if the User wants preprocessing that does not follow this convention
- Parameters:
obs (obspy.core.stream.Stream) – Stream containing observed waveforms
syn (obspy.core.stream.Stream) – Stream containing synthetic waveforms
- seisflows.preprocess.default.read(fid, data_format, **kwargs)
Waveform reading functionality. Imports waveforms as Obspy streams
- Parameters:
fid (str) – path to file to read data from
data_format (str) – format of the file to read data from
- Return type:
obspy.core.stream.Stream
- Returns:
ObsPy stream containing data stored in fid
- Raises:
TypeError – if the provided data format cannot be read
- seisflows.preprocess.default.write(st, fid, data_format)
Waveform writing functionality. Writes waveforms back to format that SPECFEM recognizes
- Parameters:
st (obspy.core.stream.Stream) – stream to write
fid (str) – path to file to write stream to
data_format (str) – format of the file to write to
- seisflows.preprocess.default.read_ascii(fid, origintime=None, **kwargs)
Converts SPECFEM synthetics into ObsPy Stream objects with the correct header information.
Note
This is a trimmed down version of pysep.utils.io.read_sem() which is copied here for better visibility within SeisFlows, and ignores things like SAC headers appending
- Parameters:
fid (str) – path of the given ascii file
origintime (obspy.UTCDateTime) – UTCDatetime object for the origintime of the event
- Rtype st:
obspy.Stream.stream
- Return st:
stream containing header and data info taken from ascii file
- seisflows.preprocess.default.initialize_adjoint_traces(data_filenames, fmt, path_out='./')
SPECFEM requires that adjoint traces be present for every matching synthetic seismogram. If an adjoint source does not exist, it is simply set as zeros. This function creates all adjoint traces as zeros, to be filled out later. Does this in parallel for speedup
- Parameters:
data_filenames (list of str) – existing solver waveforms (synthetic) to read. These will be copied, zerod out, and saved to path save. Should come from solver.data_filenames
fmt (str) – format of the input waveforms which will be fed to the read function to get a working Stream object used to write empty adjsrcs
path_out (str) – path to save the new adjoint traces to. Ideally this is set to ‘solver/traces/adj’
- seisflows.preprocess.default._write_adjsrc_single(st, fid, output, fmt)
Parallelizable function to write out empty adjoint source