seisflows.preprocess.default ============================ .. py:module:: seisflows.preprocess.default .. autoapi-nested-parse:: The SeisFlows Preprocessing module is in charge of interacting with seismic data (observed and synthetic). It should contain functionality to read and write seismic data, apply preprocessing such as filtering, quantify misfit, and write adjoint sources that are expected by the solver. Classes ------- .. autoapisummary:: seisflows.preprocess.default.Default Functions --------- .. autoapisummary:: seisflows.preprocess.default.read seisflows.preprocess.default.write seisflows.preprocess.default.read_ascii seisflows.preprocess.default.initialize_adjoint_traces seisflows.preprocess.default._write_adjsrc_single Module Contents --------------- .. py:class:: Default(syn_data_format='ascii', obs_data_format='ascii', unit_output='VEL', misfit='waveform', adjoint='waveform', normalize=None, filter=None, min_period=None, max_period=None, min_freq=None, max_freq=None, mute=None, early_slope=None, early_const=None, late_slope=None, late_const=None, short_dist=None, long_dist=None, plot_waveforms=False, source_prefix=None, workdir=os.getcwd(), path_preprocess=None, path_solver=None, **kwargs) Default Preprocess [Preprocess Base] ------------------------------------ Data processing for seismic traces, with options for data misfit, filtering, normalization and muting. Parameters ---------- :type obs_data_format: str :param obs_data_format: data format for reading observed traces into memory. Available formats: 'su', 'ascii', 'sac' :type unit_output: str :param unit_output: Data units. Must match the synthetic output of external solver. Available: ['DISP': displacement, 'VEL': velocity, 'ACC': acceleration, 'PRE': pressure] :type misfit: str :param misfit: misfit function for waveform comparisons. For available see seisflows.plugins.preprocess.misfit :type adjoint: str :param adjoint: adjoint source misfit function (backprojection function for migration, or the objective function in FWI). For available see seisflows.plugins.preprocess.adjoint :type normalize: str :param normalize: Data normalization parameters used to normalize the amplitudes of waveforms. By default, set to NoneType, which means no normalization is applied. User can choose from one of the following options to normalize BOTH `obs` and `syn` data: - TNORML1: normalize per trace by the L1 norm of itself - TNORML2: normalize per trace by the L2 norm of itself - TNORM_MAX: normalize by the maximum positive amplitude in the trace - TNORM_ABSMAX: normalize by the absolute maximum amplitude in the trace - TNORM_MEAN: normalize by the mean of the absolute trace - RNORM_OBS_MAX: normalize synthetic traces by the maximum amplitude of the corresponding observed trace - RNORM_OBS_ABSMAX: normalize synthetic traces by the absolute maximum amplitude of the corresponding observed trace - RNORM_SYN_MAX: normalize observed traces by the maximum amplitude of the corresponding synthetic trace - RNORM_SYN_ABSMAX: normalize observed traces by the absolute maximum amplitude of the corresponding synthetic trace Note: options ENORML? are not currently available. If this is a feature you would like to see, please open a GitHub Issue. - ENORML1: normalize per event by L1 of traces; OR - ENORML2: normalize per event by L2 of traces; :type filter: str :param filter: Data filtering type, by default no filtering is applied. Available options for user to choose are: - BANDPASS (requires: MIN_FREQ + MAX_FREQ OR MIN_PERIOD + MAX PERIOD); - LOWPASS (requires: MAX_FREQ OR MIN_PERIOD); - HIGHPASS (requires: MIN_FREQ OR MAX_PERIOD); :type min_period: float :param min_period: Minimum filter period applied to time series. See also MIN_FREQ, MAX_FREQ, if User defines FREQ parameters, they will overwrite PERIOD parameters. :type max_period: float :param max_period: Maximum filter period applied to time series. See also MIN_FREQ, MAX_FREQ, if User defines FREQ parameters, they will overwrite PERIOD parameters. :type min_freq: float :param min_freq: Maximum filter frequency applied to time series, See also MIN_PERIOD, MAX_PERIOD, if User defines FREQ parameters, they will overwrite PERIOD parameters. :type max_freq: float :param max_freq: Maximum filter frequency applied to time series, See also MIN_PERIOD, MAX_PERIOD, if User defines FREQ parameters, they will overwrite PERIOD parameters. :type mute: list :param mute: Data mute parameters used to zero out early / late arrivals or offsets. Choose the following: - EARLY: mute early arrivals - LATE: mute late arrivals; - SHORT: mute short source-receiver distances; - LONG: mute long source-receiver distances input comma separated list of strings, (e.g., mute: early,late,short) :type plot_waveforms: bool :param plot_waveforms: plot waveforms from each evaluation of the misfit. By default turned off as this can produce many files for an interative inversion. Paths ----- :type path_preprocess: str :param path_preprocess: scratch path for all preprocessing processes, including saving files *** .. py:attribute:: syn_data_format :value: '' .. py:attribute:: obs_data_format :value: '' .. py:attribute:: unit_output :value: '' .. py:attribute:: misfit :value: 'waveform' .. py:attribute:: adjoint :value: 'waveform' .. py:attribute:: normalize :value: None .. py:attribute:: filter :value: None .. py:attribute:: min_period :value: None .. py:attribute:: max_period :value: None .. py:attribute:: min_freq :value: None .. py:attribute:: max_freq :value: None .. py:attribute:: mute :value: [] .. py:attribute:: early_slope :value: None .. py:attribute:: early_const :value: None .. py:attribute:: late_slope :value: None .. py:attribute:: late_const :value: None .. py:attribute:: short_dist :value: None .. py:attribute:: long_dist :value: None .. py:attribute:: source_prefix :value: None .. py:attribute:: plot_waveforms :value: False .. py:attribute:: path .. py:attribute:: _syn_acceptable_data_formats :value: ['SU', 'ASCII'] .. py:attribute:: _obs_acceptable_data_formats :value: ['SU', 'ASCII', 'SAC'] .. py:attribute:: _acceptable_unit_output :value: ['DISP', 'VEL', 'ACC', 'PRE'] .. py:attribute:: _acceptable_misfits .. py:attribute:: _acceptable_adjsrcs .. py:attribute:: _acceptable_norms .. py:attribute:: _acceptable_mutes .. py:attribute:: _acceptable_filters .. py:attribute:: _iteration :value: None .. py:attribute:: _step_count :value: None .. py:method:: check() Checks parameters and paths .. py:method:: setup() Sets up data preprocessing machinery .. py:method:: finalize() Teardown procedures for the default preprocessing class. Required to keep things general because Pyaflowa preprocessing module has some finalize procedures. .. py:method:: _calculate_misfit(**kwargs) Wrapper for plugins.preprocess.misfit misfit/objective function .. py:method:: _generate_adjsrc(**kwargs) Wrapper for plugins.preprocess.adjoint source/backproject function .. py:method:: quantify_misfit(source_name=None, save_residuals=None, export_residuals=None, save_adjsrcs=None, components=None, iteration=1, step_count=0, _serial=False, **kwargs) Prepares solver for gradient evaluation by writing residuals and adjoint traces. Meant to be called by solver.eval_func(). Reads in observed and synthetic waveforms, applies optional preprocessing, assesses misfit, and writes out adjoint sources and STATIONS_ADJOINT file. Processing for each station is done in parallel using concurrent.futures. .. warning:: The concurrent processing in this function may fail in the case that a User is running N>1 events using the 'Cluster' system but on a local workstation, because each event is also run with multiprocessing, so their compute may quickly run out of RAM or cores. Might need to introduce `max_workers_preproc` and `max_workers_solver` to ensure that there is a good balance between the two values. :type source_name: str :param source_name: name of the event to quantify misfit for. If not given, will attempt to gather event id from the given task id which is assigned by system.run() :type save_residuals: str :param save_residuals: if not None, path to write misfit/residuls to :type export_residuals: str :param export_residuals: export all residuals (data-synthetic misfit) that are generated by the external solver to `path_output`. If False, residuals stored in scratch may be discarded at any time in the workflow :type save_adjsrcs: str :param save_adjsrcs: if not None, path to write adjoint sources to :type components: list :param components: optional list of components to ignore preprocessing traces that do not have matching components. The adjoint sources for these components will be 0. E.g., ['Z', 'N']. If None, all available components will be considered. :type iteration: int :param iteration: optional, current iteration of the workflow used for tagging output waveform figures if internal parameter `plot_waveforms` is set :type step_count: int :param step_count: optional, current step count of the line search, used for tagging output waveform figures if internal parameter `plot_waveforms` is set :type _serial: bool :param _serial: debug function to turn preprocessing to a serial task whereas it is normally a multiprocessed parallel task .. py:method:: _setup_quantify_misfit(source_name, save_adjsrcs=None, components=None) Gather a list of filenames of matching waveform IDs that can be run through the misfit quantification step, and generate empty adjoint sources so that Solver knows which components are zero'd out. :type source_name: str :param source_name: the name of the source to process :type components: list :param components: optional list of components to ignore preprocessing traces that do not have matching components. The adjoint sources for these components will be 0. E.g., ['Z', 'N']. If None, all available components will be considered. :rtype: list of tuples :return: [(observed filename, synthetic filename)]. tuples will contain filenames for matching stations + component for obs and syn .. py:method:: _quantify_misfit_single(obs_fid, syn_fid, source_name, save_residuals=None, save_adjsrcs=None) Run misfit quantification for one pair of data-synthetic waveforms. This is kept in a separate function so that it can be parallelized for more efficient processing. :type obs_fid: str :param obs_fid: filename for the observed waveform to be processed :type syn_fid: str :param syn_fid: filename for the synthetic waveform to be procsesed :type source_name: str :param source_name: name of the source used for getting origintime of synthetic waveforms and tagging output waveform figures if internal parameter `plot` is set True :type save_residuals: str :param save_residuals: if not None, path to write misfit/residuls to :type save_adjsrcs: str :param save_adjsrcs: if not None, path to write adjoint sources to :rtype: float :return: residual value, calculated by the chosen misfit function comparing `obs` and `syn` .. py:method:: preprocess(obs, syn) Convenience function that wraps all preprocessing steps so that they can be more easily overwritten if the User wants preprocessing that does not follow this convention :type obs: obspy.core.stream.Stream :param obs: Stream containing observed waveforms :type syn: obspy.core.stream.Stream :param syn: Stream containing synthetic waveforms .. py:function:: read(fid, data_format, **kwargs) Waveform reading functionality. Imports waveforms as Obspy streams :type fid: str :param fid: path to file to read data from :type data_format: str :param data_format: format of the file to read data from :rtype: obspy.core.stream.Stream :return: ObsPy stream containing data stored in `fid` :raises TypeError: if the provided data format cannot be read .. py:function:: write(st, fid, data_format) Waveform writing functionality. Writes waveforms back to format that SPECFEM recognizes :type st: obspy.core.stream.Stream :param st: stream to write :type fid: str :param fid: path to file to write stream to :type data_format: str :param data_format: format of the file to write to .. py:function:: read_ascii(fid, origintime=None, **kwargs) Converts SPECFEM synthetics into ObsPy Stream objects with the correct header information. .. note:: This is a trimmed down version of pysep.utils.io.read_sem() which is copied here for better visibility within SeisFlows, and ignores things like SAC headers appending :type fid: str :param fid: path of the given ascii file :type origintime: obspy.UTCDateTime :param origintime: UTCDatetime object for the origintime of the event :rtype st: obspy.Stream.stream :return st: stream containing header and data info taken from ascii file .. py:function:: initialize_adjoint_traces(data_filenames, fmt, path_out='./') SPECFEM requires that adjoint traces be present for every matching synthetic seismogram. If an adjoint source does not exist, it is simply set as zeros. This function creates all adjoint traces as zeros, to be filled out later. Does this in parallel for speedup :type data_filenames: list of str :param data_filenames: existing solver waveforms (synthetic) to read. These will be copied, zerod out, and saved to path `save`. Should come from solver.data_filenames :type fmt: str :param fmt: format of the input waveforms which will be fed to the `read` function to get a working Stream object used to write empty adjsrcs :type path_out: str :param path_out: path to save the new adjoint traces to. Ideally this is set to 'solver/traces/adj' .. py:function:: _write_adjsrc_single(st, fid, output, fmt) Parallelizable function to write out empty adjoint source