Hierarchical Memory Size Estimation for Loop Transformation and Data Memory Platform Optimization
MetadataShow full item record
In today’s embedded systems, the memory hierarchy is rapidly becoming a major bottleneck in terms of power, performance and area, due to the very large amount of (memory related) data need to be transferred and stored (temporarily). This is especially the case for portable multi-media applications systems. These applications are characterized by deep loop nests and multi-dimensional arrays at the high level. Due to the dramatically increasing size and complexity of system-on-a-chip (SoC) designs and stringent time-to-market requirement, the methodology and tools for chip design must be raised to the system level. Early analysis tools are particularly critical in enabling SoC designers to take full advantage of the many architectural options available. For memory optimization, the early high level techniques aim either to design an optimal memory platform for a given application or to optimize the application code in order to take advantage of the memory platform features, or even both. Loop transformation is such an important high level optimization technique. It modifies the execution order of loops and statements without changing the application functionality. Existing loop transformation algorithms are all performed based either on reduction of data access lifetime and on improvement in data locality and regularity to steer selection of loop transformations. These are, however, very abstract cost functions which do not represent the exact memory size requirement of the arrays and how the data will be mapped onto the memory platform later on. Existing algorithms all result in one final loop transformation solution. As different loop transformations may result in optimal utilization for different memory platform instances, ad-hoc decisions at this stage without estimating their impact on the actual hierarchy utilization can lead to a final sub-optimal solution. An evaluation of later design stages’ effort is hence required. On the other hand, there usually exist a huge number of loop transformation possibilities, the estimation is required to be performed repeatedly and its computation time of the estimation technique also becomes critical to make it useful during the loop transformation search space exploration. This dissertation proposes a memory footprint estimation methodology. An intra-array memory footprint estimation is performed first followed by an interarray estimation. In order to achieve a fast estimate to make it useful repeatedly during the early high level search space exploration, several techniques have been introduced. A fast intra-array memory footprint estimation is performed at the iteration domain based on the maximal lifetime of data accesses, which is defined by the maximal dependency vector. Two approaches, an ILP formulation and vertexes approach, have been introduced for achieving a fast maximal dependency vector calculation. The fast inter-array estimation has been achieved based on several Hanoi tower based approaches. A hierarchical memory size estimation methodology has also been proposed in this dissertation. It estimates the influence of any given sequence of loop transformation instances on the mapping of application data onto a hierarchical memory platform. As the exact memory platform instantiation is often not yet defined at this high level design stage, a platform independent estimation is introduced with a Pareto curve output for each loop transformation instance. It can steer the designer or an automatic steering tool to select all the interesting loop transformation instances that might later lead to low power data mapping for any of the many possible memory hierarchy instances. This is useful when the memory platform is not defined yet, or for a given memory hierarchy instance. It also allows to find the most appropriate low power memory hierarchy instance by performing an early power estimation of different memory hierarchy instances. Initially the source code is used as input for estimation, resulting in an initial approach. However, performing the estimation repeatedly from the source code is too slow for the large loop transformation search space exploration. An incremental approach, based on local updating of the previous result, is thus introduced to handle sequences of different loop transformations. Several advanced techniques have also been used on these two approaches in order to perform a fast estimation, such as bounding box geometrical model based data reuse analysis, platform independent memory hierarchy layer assignment estimation, fast intra- and inter-array memory footprint estimation. The feasibility and usefulness of the methodologies are substantiated using several representative real-life application demonstrators. It shows for instance that the fast memory footprint estimation can be two order of magnitude faster than compared techniques while still achieving fairly accurate estimation result. For hierarchical memory size estimation methodology, the initial approach is two order of magnitude faster than the compared technique and the incremental approach is another two order of magnitude faster than the initial approach, which can just take a few milliseconds. The fast computation time of the incremental approach make it feasible to be used repeatedly during the loop transformation exploration over a very large number of possibilities. Furthermore, prototype CAD tools has been developed that includes mast parts of the methodologies.