
dc.contributor.author: Meyer, Jan Christian [nb_NO]
dc.date.accessioned: 2014-12-19T13:39:21Z
dc.date.available: 2014-12-19T13:39:21Z
dc.date.created: 2012-12-17 [nb_NO]
dc.date.issued: 2012 [nb_NO]
dc.identifier: 578154 [nb_NO]
dc.identifier.isbn: 978-82-471-4015-4 (printed ver.) [nb_NO]
dc.identifier.isbn: 978-82-471-4016-1 (electronic ver.) [nb_NO]
dc.identifier.uri: http://hdl.handle.net/11250/253074
dc.description.abstract: As the complexity of parallel computers grows, constraints posed by the construction of larger systems require both larger and increasingly non-linear parameter sets to model their behavior realistically. These heterogeneous characteristics create a trade-off between the complexity and accuracy of performance models, which makes the models harder to use for design decisions. In this thesis, we take a bottom-up approach to realistically model software and hardware interactions, by composing system models from simpler, linear models, which allow parts of the analysis to be automated. We associate empirically benchmarked platform performance metrics with the core elements in a variant of bulk-synchronous execution, aiming to quantify application performance and the associated potential for computation and communication overlap on SMP clusters. The original bulk-synchronous performance model is introduced, and we identify areas of computation and communication where its abstractions impede realistic models of contemporary hardware. These are addressed independently, using experimental evidence to develop a representation that collects computation kernel characteristics and pairwise communications in matrices, which are combined into a system model. As bulk-synchronous execution strongly depends on periodic, global synchronization, we develop a cost model for it by combining latency measurements with a parametric representation of signalling patterns, and experimentally verify the resulting predictions for three common algorithms. We describe a design to implement the BSPLib programming interface, combining threads and message-passing parallelism to achieve overlap on commodity cluster platforms, and implementing its one-sided communication primitives using out-of-band control messages.
We augment and validate the cost model of one adapted synchronization algorithm with the corresponding bandwidth requirement, completing a framework for modeling BSPLib program performance. Finally, we test the utility of this framework as a proof-of-concept for guiding software performance adaptations, using two cases. First, we use the latency terms to automatically generate synchronization operations, using model predictions to generate customized patterns with respect to platform topology, showing that the resulting algorithms equal or outperform the system defaults. Second, the strong scaling characteristics of a 5-point stencil code are compared for three implementations. Experiments show the performance overhead of our implementation, but also its capability for predicting program cost, including parameter values to optimize for balanced overlapping of computation and communication. [nb_NO]
dc.language: eng [nb_NO]
dc.publisher: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap [nb_NO]
dc.relation.ispartofseries: Doktoravhandlinger ved NTNU, 1503-8181; 2012:344 [nb_NO]
dc.title: Performance Modeling of Heterogeneous Systems [nb_NO]
dc.type: Doctoral thesis [nb_NO]
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap [nb_NO]
dc.description.degree: PhD i informasjonsteknologi [nb_NO]
dc.description.degree: PhD in Information Technology [en_GB]


Files in this item

This item appears in the following collection(s)