seisflows.preprocess.default

The SeisFlows Preprocessing module is in charge of interacting with seismic data (observed and synthetic). It should contain functionality to read and write seismic data, apply preprocessing such as filtering, quantify misfit, and write adjoint sources that are expected by the solver.

Classes

Default

Default Preprocess [Preprocess Base]

Functions

read(fid, data_format, **kwargs)

Waveform reading functionality. Imports waveforms as Obspy streams

write(st, fid, data_format)

Waveform writing functionality. Writes waveforms back to format that

read_ascii(fid[, origintime])

Converts SPECFEM synthetics into ObsPy Stream objects with the correct

initialize_adjoint_traces(data_filenames, fmt[, path_out])

SPECFEM requires that adjoint traces be present for every matching

_write_adjsrc_single(st, fid, output, fmt)

Parallelizable function to write out empty adjoint source

Module Contents

class seisflows.preprocess.default.Default(syn_data_format='ascii', obs_data_format='ascii', unit_output='VEL', misfit='waveform', adjoint='waveform', normalize=None, filter=None, min_period=None, max_period=None, min_freq=None, max_freq=None, mute=None, early_slope=None, early_const=None, late_slope=None, late_const=None, short_dist=None, long_dist=None, plot_waveforms=False, source_prefix=None, workdir=os.getcwd(), path_preprocess=None, path_solver=None, **kwargs)

Default Preprocess [Preprocess Base]

Data processing for seismic traces, with options for data misfit, filtering, normalization and muting.

Parameters

type obs_data_format:

str

param obs_data_format:

data format for reading observed traces into memory. Available formats: ‘su’, ‘ascii’, ‘sac’

type unit_output:

str

param unit_output:

Data units. Must match the synthetic output of external solver. Available: [‘DISP’: displacement, ‘VEL’: velocity, ‘ACC’: acceleration, ‘PRE’: pressure]

type misfit:

str

param misfit:

misfit function for waveform comparisons. For available see seisflows.plugins.preprocess.misfit

type adjoint:

str

param adjoint:

adjoint source misfit function (backprojection function for migration, or the objective function in FWI). For available see seisflows.plugins.preprocess.adjoint

type normalize:

str

param normalize:

Data normalization parameters used to normalize the amplitudes of waveforms. By default, set to NoneType, which means no normalization is applied. User can choose from one of the following options to normalize BOTH obs and syn data:

  • TNORML1: normalize per trace by the L1 norm of itself

  • TNORML2: normalize per trace by the L2 norm of itself

  • TNORM_MAX: normalize by the maximum positive amplitude in the trace

  • TNORM_ABSMAX: normalize by the absolute maximum amplitude in the trace

  • TNORM_MEAN: normalize by the mean of the absolute trace

  • RNORM_OBS_MAX: normalize synthetic traces by the maximum amplitude of

    the corresponding observed trace

  • RNORM_OBS_ABSMAX: normalize synthetic traces by the absolute maximum

    amplitude of the corresponding observed trace

  • RNORM_SYN_MAX: normalize observed traces by the maximum amplitude of

    the corresponding synthetic trace

  • RNORM_SYN_ABSMAX: normalize observed traces by the absolute maximum

    amplitude of the corresponding synthetic trace

Note: options ENORML? are not currently available. If this is a feature you would like to see, please open a GitHub Issue. - ENORML1: normalize per event by L1 of traces; OR - ENORML2: normalize per event by L2 of traces;

type filter:

str

param filter:

Data filtering type, by default no filtering is applied. Available options for user to choose are:

  • BANDPASS (requires: MIN_FREQ + MAX_FREQ OR MIN_PERIOD + MAX PERIOD);

  • LOWPASS (requires: MAX_FREQ OR MIN_PERIOD);

  • HIGHPASS (requires: MIN_FREQ OR MAX_PERIOD);

type min_period:

float

param min_period:

Minimum filter period applied to time series. See also MIN_FREQ, MAX_FREQ, if User defines FREQ parameters, they will overwrite PERIOD parameters.

type max_period:

float

param max_period:

Maximum filter period applied to time series. See also MIN_FREQ, MAX_FREQ, if User defines FREQ parameters, they will overwrite PERIOD parameters.

type min_freq:

float

param min_freq:

Maximum filter frequency applied to time series, See also MIN_PERIOD, MAX_PERIOD, if User defines FREQ parameters, they will overwrite PERIOD parameters.

type max_freq:

float

param max_freq:

Maximum filter frequency applied to time series, See also MIN_PERIOD, MAX_PERIOD, if User defines FREQ parameters, they will overwrite PERIOD parameters.

type mute:

list

param mute:

Data mute parameters used to zero out early / late arrivals or offsets. Choose the following: - EARLY: mute early arrivals - LATE: mute late arrivals; - SHORT: mute short source-receiver distances; - LONG: mute long source-receiver distances input comma separated list of strings, (e.g., mute: early,late,short)

type plot_waveforms:

bool

param plot_waveforms:

plot waveforms from each evaluation of the misfit. By default turned off as this can produce many files for an interative inversion.

Paths

type path_preprocess:

str

param path_preprocess:

scratch path for all preprocessing processes, including saving files

***

syn_data_format = ''
obs_data_format = ''
unit_output = ''
misfit = 'waveform'
adjoint = 'waveform'
normalize = None
filter = None
min_period = None
max_period = None
min_freq = None
max_freq = None
mute = []
early_slope = None
early_const = None
late_slope = None
late_const = None
short_dist = None
long_dist = None
source_prefix = None
plot_waveforms = False
path
_syn_acceptable_data_formats = ['SU', 'ASCII']
_obs_acceptable_data_formats = ['SU', 'ASCII', 'SAC']
_acceptable_unit_output = ['DISP', 'VEL', 'ACC', 'PRE']
_acceptable_misfits
_acceptable_adjsrcs
_acceptable_norms
_acceptable_mutes
_acceptable_filters
_iteration = None
_step_count = None
check()

Checks parameters and paths

setup()

Sets up data preprocessing machinery

finalize()

Teardown procedures for the default preprocessing class. Required to keep things general because Pyaflowa preprocessing module has some finalize procedures.

_calculate_misfit(**kwargs)

Wrapper for plugins.preprocess.misfit misfit/objective function

_generate_adjsrc(**kwargs)

Wrapper for plugins.preprocess.adjoint source/backproject function

quantify_misfit(source_name=None, save_residuals=None, export_residuals=None, save_adjsrcs=None, components=None, iteration=1, step_count=0, _serial=False, **kwargs)

Prepares solver for gradient evaluation by writing residuals and adjoint traces. Meant to be called by solver.eval_func().

Reads in observed and synthetic waveforms, applies optional preprocessing, assesses misfit, and writes out adjoint sources and STATIONS_ADJOINT file. Processing for each station is done in parallel using concurrent.futures.

Warning

The concurrent processing in this function may fail in the case that a User is running N>1 events using the ‘Cluster’ system but on a local workstation, because each event is also run with multiprocessing, so their compute may quickly run out of RAM or cores. Might need to introduce max_workers_preproc and max_workers_solver to ensure that there is a good balance between the two values.

Parameters:
  • source_name (str) – name of the event to quantify misfit for. If not given, will attempt to gather event id from the given task id which is assigned by system.run()

  • save_residuals (str) – if not None, path to write misfit/residuls to

  • export_residuals (str) – export all residuals (data-synthetic misfit) that are generated by the external solver to path_output. If False, residuals stored in scratch may be discarded at any time in the workflow

  • save_adjsrcs (str) – if not None, path to write adjoint sources to

  • components (list) – optional list of components to ignore preprocessing traces that do not have matching components. The adjoint sources for these components will be 0. E.g., [‘Z’, ‘N’]. If None, all available components will be considered.

  • iteration (int) – optional, current iteration of the workflow used for tagging output waveform figures if internal parameter plot_waveforms is set

  • step_count (int) – optional, current step count of the line search, used for tagging output waveform figures if internal parameter plot_waveforms is set

  • _serial (bool) – debug function to turn preprocessing to a serial task whereas it is normally a multiprocessed parallel task

_setup_quantify_misfit(source_name, save_adjsrcs=None, components=None)

Gather a list of filenames of matching waveform IDs that can be run through the misfit quantification step, and generate empty adjoint sources so that Solver knows which components are zero’d out.

Parameters:
  • source_name (str) – the name of the source to process

  • components (list) – optional list of components to ignore preprocessing traces that do not have matching components. The adjoint sources for these components will be 0. E.g., [‘Z’, ‘N’]. If None, all available components will be considered.

Return type:

list of tuples

Returns:

[(observed filename, synthetic filename)]. tuples will contain filenames for matching stations + component for obs and syn

_quantify_misfit_single(obs_fid, syn_fid, source_name, save_residuals=None, save_adjsrcs=None)

Run misfit quantification for one pair of data-synthetic waveforms. This is kept in a separate function so that it can be parallelized for more efficient processing.

Parameters:
  • obs_fid (str) – filename for the observed waveform to be processed

  • syn_fid (str) – filename for the synthetic waveform to be procsesed

  • source_name (str) – name of the source used for getting origintime of synthetic waveforms and tagging output waveform figures if internal parameter plot is set True

  • save_residuals (str) – if not None, path to write misfit/residuls to

  • save_adjsrcs (str) – if not None, path to write adjoint sources to

Return type:

float

Returns:

residual value, calculated by the chosen misfit function comparing obs and syn

preprocess(obs, syn)

Convenience function that wraps all preprocessing steps so that they can be more easily overwritten if the User wants preprocessing that does not follow this convention

Parameters:
  • obs (obspy.core.stream.Stream) – Stream containing observed waveforms

  • syn (obspy.core.stream.Stream) – Stream containing synthetic waveforms

seisflows.preprocess.default.read(fid, data_format, **kwargs)

Waveform reading functionality. Imports waveforms as Obspy streams

Parameters:
  • fid (str) – path to file to read data from

  • data_format (str) – format of the file to read data from

Return type:

obspy.core.stream.Stream

Returns:

ObsPy stream containing data stored in fid

Raises:

TypeError – if the provided data format cannot be read

seisflows.preprocess.default.write(st, fid, data_format)

Waveform writing functionality. Writes waveforms back to format that SPECFEM recognizes

Parameters:
  • st (obspy.core.stream.Stream) – stream to write

  • fid (str) – path to file to write stream to

  • data_format (str) – format of the file to write to

seisflows.preprocess.default.read_ascii(fid, origintime=None, **kwargs)

Converts SPECFEM synthetics into ObsPy Stream objects with the correct header information.

Note

This is a trimmed down version of pysep.utils.io.read_sem() which is copied here for better visibility within SeisFlows, and ignores things like SAC headers appending

Parameters:
  • fid (str) – path of the given ascii file

  • origintime (obspy.UTCDateTime) – UTCDatetime object for the origintime of the event

Rtype st:

obspy.Stream.stream

Return st:

stream containing header and data info taken from ascii file

seisflows.preprocess.default.initialize_adjoint_traces(data_filenames, fmt, path_out='./')

SPECFEM requires that adjoint traces be present for every matching synthetic seismogram. If an adjoint source does not exist, it is simply set as zeros. This function creates all adjoint traces as zeros, to be filled out later. Does this in parallel for speedup

Parameters:
  • data_filenames (list of str) – existing solver waveforms (synthetic) to read. These will be copied, zerod out, and saved to path save. Should come from solver.data_filenames

  • fmt (str) – format of the input waveforms which will be fed to the read function to get a working Stream object used to write empty adjsrcs

  • path_out (str) – path to save the new adjoint traces to. Ideally this is set to ‘solver/traces/adj’

seisflows.preprocess.default._write_adjsrc_single(st, fid, output, fmt)

Parallelizable function to write out empty adjoint source