Senayan

Classification of geological borehole descriptions using a domain adapted large language model

Hossein Ghorbanfekr; Pieter Jan Kerstens; Katrijn Dirix

Geological borehole descriptions contain detailed textual information about the composition of the subsurface. However, their unstructured format makes it difficult to extract relevant features into a structured form. This paper introduces GEOBERTje, a domain-adapted large language model trained on Dutch-language geological borehole descriptions from Flanders (Belgium). The model extracts relevant information from the borehole descriptions and represents it in a numeric vector space. To showcase one potential application of GEOBERTje, we fine-tune a classifier on a limited number of manually labeled observations. This classifier categorizes each borehole description into a main, second, and third lithology class. We show that our classifier outperforms a rule-based approach (by 30% on average), non-contextual Word2Vec embeddings combined with a random forest classifier (by 38% on average), and prompt engineering with large language models, namely GPT-4 (by 11% on average) and Gemma 2 (by 28% on average). This study exemplifies how domain-adapted large language models enhance the efficiency and accuracy of extracting information from complex, unstructured geological descriptions, which offers new opportunities for geological analysis and modeling using vast amounts of data.


Availability
Item Code: 229
Call Number: 551.136
Location: Perpustakaan BIG (external hard disk)
Status: Available
Detail Information
Series Title: Applied Computing and Geosciences - Open Access
Call Number: 551.136
Publisher: Amsterdam: Elsevier, 2025
Collation: 15 pp., PDF, 2,295 KB
Language: English
ISBN/ISSN: 2590-1974
Classification: 551.136
Content Type: text
Media Type: -
Carrier Type: -
Edition: Vol. 25, February 2025
Subject(s): Classification; Natural language processing; Borehole description; Large language model
Specific Detail Info: -
Statement of Responsibility: -