Bandwidth-Aware Prefetching in Chip Multiprocessors
Chip multiprocessors (CMPs) are an increasingly popular architecture, and a growing number of vendors now offer CMP solutions. The shift from uniprocessors to CMP architectures is driven by the increasing complexity of cores, the processor-memory performance gap, limitations in instruction-level parallelism (ILP), and increasing power requirements. Prefetching is a successful technique commonly used in high-performance processors to hide memory latency. In a CMP, prefetching offers new opportunities and challenges, as current uniprocessor heuristics will need adaptation or redesign to integrate with CMPs. In this thesis, I look at the state of the art in prefetching and CMP architecture. I conduct experiments on how unmodified uniprocessor prefetching heuristics perform in a CMP. In addition, I propose a new prefetching scheme based on bandwidth monitoring and prediction through performance counters, suited for embedded CMP systems. This new prefetching scheme has been simulated with SimpleScalar. It reduces bandwidth usage by up to 47.8% while retaining most of the performance gains from prefetching for low-accuracy prefetching heuristics.
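The abstract does not spell out the mechanism, so the following is only a minimal sketch of what gating prefetches on monitored and predicted bandwidth could look like, not the scheme evaluated in the thesis. It assumes a hypothetical per-interval counter of bytes moved over the memory bus, predicts the next interval's utilization with an exponential moving average, and suppresses prefetches when the prediction reaches a threshold. The class name `BandwidthGate`, its parameters, and the choice of predictor are all assumptions made for illustration.

```python
class BandwidthGate:
    """Hypothetical prefetch throttle driven by a bus-traffic counter.

    All names and parameters are illustrative assumptions; the thesis
    abstract does not specify the predictor or the threshold.
    """

    def __init__(self, peak_bytes_per_interval, threshold=0.75, alpha=0.5):
        self.peak = peak_bytes_per_interval  # assumed bus capacity per interval
        self.threshold = threshold           # utilization at which prefetching stops
        self.alpha = alpha                   # weight given to the newest sample
        self.predicted = 0.0                 # predicted utilization, next interval

    def end_interval(self, bytes_transferred):
        """Feed one sample of the (assumed) performance counter."""
        utilization = min(bytes_transferred / self.peak, 1.0)
        # Exponential moving average as a simple stand-in predictor.
        self.predicted = (self.alpha * utilization
                          + (1.0 - self.alpha) * self.predicted)
        return self.predicted

    def allow_prefetch(self):
        """Issue prefetches only while predicted utilization is below the threshold."""
        return self.predicted < self.threshold


gate = BandwidthGate(peak_bytes_per_interval=1000)
for sample in (1000, 1000, 0):
    gate.end_interval(sample)
    print(sample, gate.predicted, gate.allow_prefetch())
```

In this sketch a saturated bus turns prefetching off after two intervals, and an idle interval turns it back on. Demand misses are never gated, which is one way a low-accuracy heuristic could shed wasted traffic while keeping its useful prefetches.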