Vis enkel innførsel

dc.contributor.advisorKjeldsberg, Per Gunnarnb_NO
dc.contributor.authorStenersen, Espennb_NO
dc.date.accessioned2014-12-19T13:43:30Z
dc.date.accessioned2015-12-22T11:41:05Z
dc.date.available2014-12-19T13:43:30Z
dc.date.available2015-12-22T11:41:05Z
dc.date.created2010-09-03nb_NO
dc.date.issued2008nb_NO
dc.identifier347616nb_NO
dc.identifier.urihttp://hdl.handle.net/11250/2369121
dc.description.abstract3D graphic accelerators are often limited by their floating-point performance. A Graphic Processing Unit (GPU) has several specialized floating-point units to achieve high throughput and performance. The floating-point units consume a large part of total area, and power consumption, and hence architectural choices are important to evaluate when implementing the design. GPUs are specially tuned for performing a set of operations on large sets of data. The task of a 3D graphic solution is to render a image or a scene. The scene contains geometric primitives as well as descriptions of the light, the way each object reflects light and the viewer position and orientation. This thesis evaluates four different pipelined, vectorized floating-point multipliers, supporting 16-bit, 32-bit and 64-bit floating-point numbers. The architectures are compared concerning area usage, power consumption and performance. Two of the architectures are implemented at Register Transfer Level (RTL), tested and synthesized, to see if assumptions made in the estimation methodologies are accurate enough to select the best architecture to implement given a set of architectures and constraints. The first architecture trades area for lower power consumption with a throughput of 38.4 Gbit/s at 300 MHz clock frequency, and the second architecture trades power for smaller area with equal throughput. The two architectures are synthesized at 200 MHz, 300 MHz and 400 MHz clock frequency, in a 65 nm low-power standard cell library and a 90 nm general purpose library, and for different input data format distributions, to compare area and power results at different clock frequencies, input data distributions and target technology. Architecture one has lower power consumption than architecture two at all clock frequencies and input data format distributions. At 300 MHz, architecture one has a total power consumption of 1.9210 mW at 65 nm, and 15.4090 mW at 90 nm. Architecture two has a total power consumption of 7.3569 mW at 65 nm, and 17.4640 mW at 90 nm. Architecture two requires less area than architecture one at all clock frequencies. At 300 MHz, architecture one has a total area of 59816.4414 um^2 at 65 nm, and 116362.0625 um^2 at 90 nm. Architecture two has a total area of 50843.0 um^2 at 65 nm, and 95242.0469 um^2 at 90 nm.nb_NO
dc.languageengnb_NO
dc.publisherInstitutt for elektronikk og telekommunikasjonnb_NO
dc.subjectntnudaimno_NO
dc.titleVectorized 128-bit Input FP16/FP32/FP64 Floating-Point Multipliernb_NO
dc.typeMaster thesisnb_NO
dc.source.pagenumber184nb_NO
dc.contributor.departmentNorges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for elektronikk og telekommunikasjonnb_NO


Tilhørende fil(er)

Thumbnail
Thumbnail
Thumbnail

Denne innførselen finnes i følgende samling(er)

Vis enkel innførsel