Senayan

MLReal: Bridging the gap between training on synthetic data and real data applications in machine learning

Tariq Alkhalifah - Personal Name; Hanchen Wang - Personal Name; Oleg Ovcharenko - Personal Name;

Among the biggest challenges we face in utilizing neural networks trained on waveform (i.e., seismic, electromagnetic, or ultrasound) data is their application to real data. The requirement for accurate labels often forces us to train our networks using synthetic data, where labels are readily available. However, synthetic data often fail to capture the reality of the field/real experiment, and the trained neural networks (NNs) end up performing poorly at the inference stage. This is because synthetic data lack many of the realistic features embedded in real data, including an accurate waveform source signature, realistic noise, and accurate reflectivity. In other words, the real data set is far from being a sample from the distribution of the synthetic training set. Thus, we describe a novel approach to enhance our supervised NN training on synthetic data with real data features (domain adaptation). Specifically, for tasks in which the absolute values of the vertical axis (time or depth) of the input section are not crucial to the prediction, like classification, or can be corrected after the prediction, like velocity model building using a well, we suggest a series of linear operations on the input data to the network so that the training and application data have similar distributions. This is accomplished by applying two operations to the input data to the NN, whether the input is from the synthetic or the real data subset domain: (1) the crosscorrelation of the input data section (e.g., a shot gather or seismic image) with a fixed-location reference trace from the same input data section, and (2) the convolution of the resulting data with the mean (or a random sample) of the autocorrelated sections from the other subset domain.
In the training stage, the input data are from the synthetic subset domain and the autocorrelated sections (each trace is crosscorrelated with itself) are from the real subset domain, with a random selection of sections from the real data drawn at every epoch of the training. In the inference/application stage, the input data are from the real subset domain and the mean of the autocorrelated sections is from the synthetic data subset domain. Example applications on passive seismic data for microseismic event source location determination and on active seismic data for predicting low frequencies are used to demonstrate the power of this approach in improving the applicability of our trained NNs to real data.
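
The two operations described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative reading of the abstract, not the authors' implementation: the function names (`xcorr_with_reference`, `autocorr_section`, `mlreal_transform`), the `(n_traces, n_samples)` array layout, the trace-by-trace convolution, and the truncation of the convolution to the input length are all assumptions made for the example.

```python
import numpy as np


def xcorr_with_reference(section, ref_index):
    """Operation (1): crosscorrelate every trace with one fixed reference trace.

    section has shape (n_traces, n_samples); the result has 2*n_samples - 1 lags.
    """
    ref = section[ref_index]
    return np.stack([np.correlate(trace, ref, mode="full") for trace in section])


def autocorr_section(section):
    """Crosscorrelate each trace with itself (trace-wise autocorrelation)."""
    return np.stack([np.correlate(trace, trace, mode="full") for trace in section])


def mlreal_transform(section, other_domain_sections, ref_index=0):
    """Apply operations (1) and (2) to one input section.

    other_domain_sections is an iterable of sections from the other subset
    domain (real data when training on synthetic data, synthetic data at
    inference). Assumption: all sections share the same shape.
    """
    xc = xcorr_with_reference(section, ref_index)
    # Mean of the autocorrelated sections from the other domain.
    mean_ac = np.mean([autocorr_section(s) for s in other_domain_sections], axis=0)
    # Operation (2): trace-by-trace convolution, cropped to the lag axis length.
    return np.stack(
        [np.convolve(a, b, mode="same") for a, b in zip(xc, mean_ac)]
    )


# Hypothetical usage with random numbers standing in for waveform data.
rng = np.random.default_rng(0)
synthetic = rng.standard_normal((4, 16))                    # one synthetic section
real = [rng.standard_normal((4, 16)) for _ in range(3)]     # three real sections
training_input = mlreal_transform(synthetic, real, ref_index=0)
```

To follow the random-sample variant mentioned for training, one would pass a single randomly chosen real section in `other_domain_sections` at each epoch instead of the full set.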


Availability
286551, Perpustakaan BIG (External Hard Disk), Available
Detail Information
Series Title
Artificial Intelligence in Geosciences
Call Number
551
Publisher
Beijing : KeAi Communications Co. Ltd., 2022
Collation
14 pages, PDF, 3,698 KB
Language
English
ISBN/ISSN
2666-5441
Classification
551
Content Type
text
Media Type
-
Carrier Type
-
Edition
Vol.3, December 2022
Subject(s)
Neural networks
Induced seismicity
Image processing
Computational seismology
Waveform inversion
Specific Detail Info
-
Statement of Responsibility
-
Other version/related

No other version available

File Attachment
  • MLReal: Bridging the gap between training on synthetic data and real data applications in machine learning