
dc.contributor.advisor    Elster, Anne Cathrine    nb_NO
dc.contributor.advisor    Jarp, Sverre    nb_NO
dc.contributor.author    Lindal, Yngve Sneen    nb_NO
dc.date.accessioned    2014-12-19T13:38:02Z
dc.date.available    2014-12-19T13:38:02Z
dc.date.created    2011-11-04    nb_NO
dc.date.issued    2011    nb_NO
dc.identifier    454090    nb_NO
dc.identifier    ntnudaim:5795    nb_NO
dc.identifier.uri    http://hdl.handle.net/11250/252713
dc.description.abstract    A desired trend within high energy physics is to increase particle accelerator luminosities, leading to the production of more collision data and higher probabilities of finding interesting physics results. A central data analysis technique used to determine whether results are interesting is the maximum likelihood method, with the corresponding evaluation of the negative log-likelihood, which can be computationally expensive. As the amount of data grows, it is important to take advantage of the parallelism in modern computers. In essence, this means exploiting vector registers and all available cores on CPUs, as well as utilizing co-processors such as GPUs. This thesis describes the work done to optimize and parallelize a prototype of a central data analysis tool within the high energy physics community. The work consists of optimizations for multicore processors and GPUs, as well as a mechanism to balance the load between CPUs and GPUs, with the aim of fully exploiting the power of modern commodity computers. We explore the OpenCL standard thoroughly and give an overview of its limitations when used in a large real-world software package. For the physics model used throughout this thesis, we reach a single-core speedup of ∼7.8x compared to the original implementation of the toolkit. On top of that, we obtain a further speedup of ∼3.6x with 4 threads on a commodity Intel processor, as well as almost perfect scalability on NUMA systems when thread affinity is applied. GPUs give varying speedups depending on the complexity of the physics model used. With our model, price-comparable GPUs give a speedup of ∼2.5x compared to a modern Intel CPU utilizing 8 SMT threads. The balancing mechanism is based on real timings of each device and works optimally for large workloads when the API calls to the OpenCL implementation impose a small overhead and when computation timings are accurate.    nb_NO
dc.language    eng    nb_NO
dc.publisher    Institutt for datateknikk og informasjonsvitenskap    nb_NO
dc.subject    ntnudaim:5795    no_NO
dc.subject    MTDT datateknikk    no_NO
dc.subject    Komplekse datasystemer    no_NO
dc.title    Optimizing a High Energy Physics (HEP) Toolkit on Heterogeneous Architectures    nb_NO
dc.type    Master thesis    nb_NO
dc.source.pagenumber    152    nb_NO
dc.contributor.department    Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap    nb_NO
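The abstract combines two computational ideas: evaluating the negative log-likelihood (the sum of -ln f(x_i; θ) over all events) and splitting that work across devices according to their measured timings. The following is a minimal sketch of both, written under explicit assumptions rather than taken from the thesis. The Gaussian `gaussian_pdf` is a hypothetical stand-in for a physics model. Worker threads stand in for CPU and GPU devices; under CPython's GIL they do not actually compute in parallel, so they only mimic separate devices. The rule in the hypothetical helper `rebalance` assigns each device a share of events proportional to its measured throughput on the previous call. It is one simple timing-based scheme and is not necessarily the one used in the thesis. All function names are illustrative.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def gaussian_pdf(x, mu, sigma):
    # Hypothetical stand-in for a physics model: a normalized Gaussian PDF.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def nll_partial(events, mu, sigma):
    # Negative log-likelihood contribution of one slice of events.
    return -sum(math.log(gaussian_pdf(x, mu, sigma)) for x in events)


def split(n_events, fractions):
    # Turn per-device fractions into contiguous [start, end) ranges covering all events.
    bounds, acc = [0], 0.0
    for f in fractions[:-1]:
        acc += f
        bounds.append(min(round(acc * n_events), n_events))
    bounds.append(n_events)
    return list(zip(bounds[:-1], bounds[1:]))


def rebalance(counts, seconds, min_share=0.01):
    # New fractions proportional to each device's measured throughput (events/second).
    # A small minimum share keeps an idle device from being starved permanently.
    rates = [c / s if c > 0 and s > 0 else 0.0 for c, s in zip(counts, seconds)]
    total = sum(rates)
    if total == 0.0:
        return [1.0 / len(counts)] * len(counts)
    shares = [max(r / total, min_share) for r in rates]
    norm = sum(shares)
    return [s / norm for s in shares]


def timed(fn, *args):
    # Run fn(*args) and return (result, elapsed wall-clock seconds).
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def balanced_nll(events, mu, sigma, fractions, pool):
    # Sum the partial NLLs computed by each simulated device (a worker thread),
    # then derive the fractions for the next evaluation from the measured timings.
    ranges = split(len(events), fractions)
    futures = [pool.submit(timed, nll_partial, events[a:b], mu, sigma) for a, b in ranges]
    results = [f.result() for f in futures]
    total = sum(value for value, _ in results)
    new_fractions = rebalance([b - a for a, b in ranges], [t for _, t in results])
    return total, new_fractions


if __name__ == "__main__":
    events = [0.1 * i - 5.0 for i in range(100)]
    fractions = [0.5, 0.5]
    with ThreadPoolExecutor(max_workers=2) as pool:
        for _ in range(3):
            value, fractions = balanced_nll(events, 0.0, 1.0, fractions, pool)
            print(f"NLL = {value:.6f}, next fractions = {fractions}")
```

Because the fractions are recomputed from the latest timings, noisy or overhead-dominated measurements translate directly into noisy splits. This is consistent with the abstract's observation that a timing-based balancer works best for large workloads with small API-call overhead and accurate timings.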

