Optimizing 3D Finite Difference Solvers for the Elastic Wave Equation for Modern GPUs

Haugdahl, Tor Andre

Master's thesis

Permanent link

https://hdl.handle.net/11250/3039600

Publication date

2022

Collections

Institutt for datateknologi og informatikk (Department of Computer Science)

Description

Full text not available

Abstract

Partial differential equations (PDEs) are highly important mathematical tools for modeling real phenomena such as earthquakes and wave propagation. The Finite Difference Method (FDM) is a numerical method for approximating solutions to PDEs, and it is frequently used in scientific computing. Modeling seismic waves through numerical approximations of the elastic wave equation is central to geophysicists in their efforts to understand the properties of what lies beneath the surface. FDM can be used in a process called Full Waveform Inversion (FWI), in which a model is fitted so that it reflects real data. Simulation methods such as FDM can be used to verify the model.

FDM is well suited to parallelization on graphics processing units (GPUs), so it is of great interest to determine how GPUs can best be exploited.

This thesis describes our work in analyzing and optimizing a naïve FDM solver written for NVIDIA GPUs, based on an existing application written by engineers at Aker BP. Our solver is restricted to solving only the elastic wave equation in three-dimensional, homogeneous, and isotropic domains. We evaluate the program's efficiency in terms of memory, data representation, and instruction mix, and how these affect the precision of the simulation.

Our methods and implementations include replacing all double-precision floating-point computations with single-precision ones and analyzing the impact on GPU performance. Our experiments show that single-precision floating point provides sufficient precision to simulate our dataset. We also show that single precision yields drastically lower runtimes than double precision. In addition, we identify several optimizations that make efficient use of the memory hierarchy and that should increase memory and arithmetic utilization. This may improve the runtime further. We also include several possible directions for future research.

Partial Differential Equations (PDEs) are an essential mathematical tool for modeling real-world phenomena such as earthquakes and wave propagation. The Finite Difference Method (FDM) is a numerical method for approximating solutions to PDEs and is widely employed in scientific computing. Forward modeling of seismic wave propagation by numerically approximating the elastic wave equation is central to geoscience, as part of the effort to understand the structure and properties of the subsurface. FDM can be used within Full Waveform Inversion (FWI), a model-fitting approach that perturbs a model of the subsurface so that data simulated from it correlate with data gathered in real-world seismic surveys. Geophysics research places ever-increasing demands on modeling at higher resolutions and larger model sizes, which increases the memory demands and computational cost of methods such as FDM. These demands motivate the search for efficient algorithms and implementations that accelerate forward modeling on emerging hardware architectures.
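As a minimal illustration of how FDM approximates a PDE, the sketch below time-steps the one-dimensional scalar wave equation u_tt = c^2 u_xx using second-order central differences in both space and time (a leapfrog scheme). This is a deliberate simplification assumed for illustration. The thesis treats the three-dimensional elastic wave equation, and the function names, grid size, Gaussian pulse, and zero-valued boundaries here are hypothetical choices, not the thesis's setup.

```python
import math


def step_wave_1d(u_prev, u_curr, courant):
    """Advance u_tt = c^2 u_xx by one explicit leapfrog step.

    courant is c*dt/dx; this scheme is stable only for courant <= 1.
    Both endpoints are held at zero (fixed, reflecting boundaries).
    """
    r2 = courant * courant
    n = len(u_curr)
    u_next = [0.0] * n
    for i in range(1, n - 1):
        # Second-order central difference: approximates dx^2 * u_xx at point i.
        lap = u_curr[i + 1] - 2.0 * u_curr[i] + u_curr[i - 1]
        # Central difference in time, solved for the next time level.
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + r2 * lap
    return u_next


def simulate(n_points=101, n_steps=200, courant=0.5, width=5.0):
    """Propagate a Gaussian pulse released from rest (hypothetical setup)."""
    center = (n_points - 1) / 2
    u0 = [math.exp(-(((i - center) / width) ** 2)) for i in range(n_points)]
    u0[0] = u0[-1] = 0.0
    # Zero initial velocity, approximated to first order by u^{-1} = u^0.
    u_prev, u_curr = u0[:], u0[:]
    for _ in range(n_steps):
        u_prev, u_curr = u_curr, step_wave_1d(u_prev, u_curr, courant)
    return u_curr


if __name__ == "__main__":
    u = simulate()
    print(f"peak |u| after 200 steps: {max(abs(v) for v in u):.3f}")
```

Each interior update reads only values from the two previous time levels. All grid points within a time step are therefore independent, which is the property that makes FDM attractive for GPU parallelization. Running the same code with `courant=1.5` violates the stability condition, and the solution blows up.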

The Finite Difference Method is computationally heavy, which makes it worth investigating how well it parallelizes on GPUs. This entails exploring the potential of the underlying GPU architectures and the specialized hardware units they contain.

This thesis describes our analysis and optimization of a naïve FDM solver for NVIDIA GPUs, based on an existing implementation by engineers at Aker BP. We constrain the FDM solver to simulating only the elastic wave equation in three-dimensional, isotropic, homogeneous media. We evaluate the efficiency of the application with regard to memory throughput, data representation, and instruction mix, and examine how these affect the precision of the simulation.
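A back-of-the-envelope estimate shows how data representation and memory throughput are linked. The sketch rests on several assumptions that are not taken from the thesis:

- a velocity-stress formulation with 9 wavefields per grid point (3 velocity and 6 stress components);
- material parameters held as scalars because the medium is homogeneous;
- a memory-bound kernel;
- an idealized traffic of exactly one read and one write per field per time step.

The function names and the 900 GB/s bandwidth are hypothetical placeholders, not measured figures.

```python
# Hypothetical layout: 3 particle-velocity + 6 stress components per grid point.
ELASTIC_FIELDS = 9


def wavefield_bytes(nx, ny, nz, n_fields=ELASTIC_FIELDS, bytes_per_value=4):
    """Bytes needed to store every wavefield once (4 = float32, 8 = float64)."""
    return nx * ny * nz * n_fields * bytes_per_value


def min_step_time_s(nx, ny, nz, bandwidth_bytes_per_s,
                    n_fields=ELASTIC_FIELDS, bytes_per_value=4):
    """Lower bound on the time per step for a memory-bound stencil kernel.

    Idealized assumption: each field is read once and written once per step,
    i.e. perfect on-chip reuse of stencil neighbours.
    """
    traffic = 2 * wavefield_bytes(nx, ny, nz, n_fields, bytes_per_value)
    return traffic / bandwidth_bytes_per_s


if __name__ == "__main__":
    for size, name in ((4, "float32"), (8, "float64")):
        gb = wavefield_bytes(512, 512, 512, bytes_per_value=size) / 1e9
        ms = 1e3 * min_step_time_s(512, 512, 512, 900e9, bytes_per_value=size)
        print(f"{name}: {gb:.2f} GB of wavefields, >= {ms:.1f} ms per step")
```

For a 512 x 512 x 512 grid, the wavefields alone take about 4.8 GB in float32 versus 9.7 GB in float64. Under these assumptions, the bandwidth-bound time per step halves when switching to single precision.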

Our methods and implementation include replacing all double-precision floating-point operations with single-precision ones and analyzing the impact on GPU performance. Our experiments show that, for the chosen dataset, single-precision floating-point representations are sufficient to simulate with satisfactory precision. We further show that opting for single precision yields drastically faster runtimes. In addition, we identify further optimizations that make efficient use of the memory hierarchy, which should increase memory and arithmetic throughput and thereby improve the runtime further. Several directions for future research are also included.
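One way to quantify whether single precision is "sufficient" is to run the same stencil in both precisions and measure how far the results diverge. The sketch below does this for a toy 1D wave problem. It emulates binary32 arithmetic in Python by rounding every intermediate result through `struct` with the `"f"` format. The problem, the helper names, and the max-abs-difference metric are assumptions for illustration only. They are not the thesis's dataset or error measure, and a small error here does not imply that float32 is adequate in general.

```python
import math
import struct


def to_f32(x):
    """Round a binary64 Python float to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def keep_f64(x):
    """Identity rounding: leave values as binary64 (Python's float)."""
    return x


def step(u_prev, u_curr, r2, rnd):
    """One leapfrog step of u_tt = c^2 u_xx, rounding after every operation."""
    n = len(u_curr)
    u_next = [0.0] * n
    for i in range(1, n - 1):
        two_u = rnd(2.0 * u_curr[i])
        lap = rnd(rnd(u_curr[i + 1] - two_u) + u_curr[i - 1])
        u_next[i] = rnd(rnd(two_u - u_prev[i]) + rnd(r2 * lap))
    return u_next


def simulate(rnd, n_points=101, n_steps=100, courant=0.5, width=5.0):
    """Gaussian pulse released from rest, fixed zero boundaries (hypothetical)."""
    r2 = rnd(courant * courant)
    center = (n_points - 1) / 2
    u0 = [rnd(math.exp(-(((i - center) / width) ** 2))) for i in range(n_points)]
    u0[0] = u0[-1] = 0.0
    u_prev, u_curr = u0[:], u0[:]
    for _ in range(n_steps):
        u_prev, u_curr = u_curr, step(u_prev, u_curr, r2, rnd)
    return u_curr


def max_f32_error(**kwargs):
    """Max |float32 - float64| over the grid. The initial peak is 1.0, so this
    is also the error relative to the initial amplitude."""
    ref = simulate(keep_f64, **kwargs)
    low = simulate(to_f32, **kwargs)
    return max(abs(a - b) for a, b in zip(ref, low))


if __name__ == "__main__":
    print(f"max |f32 - f64| after 100 steps: {max_f32_error():.2e}")
```

Whether an error of this size is acceptable depends on the application's tolerance, which is why the thesis qualifies its conclusion as holding for the chosen dataset.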

Publisher

NTNU