Reducing Address Translation Overheads with Virtual Caching

Author: Hongil Yoon
Pages: 126


Book Description
This dissertation addresses the overheads of supporting virtual memory, especially the performance, power, and energy costs of virtual-to-physical address translation via a Translation Lookaside Buffer (TLB). To mitigate these overheads, we revisit virtually indexed, virtually tagged caches. In practice, such caches have been rare in commercial microarchitecture designs, and the crux of the problem is the complication of handling virtual address synonyms.

This thesis makes novel empirical observations, based on real-world applications, about the temporal properties of synonym accesses. Exploiting these observations, we propose a practical virtual cache design with dynamic synonym remapping (VC-DSR) that substantially reduces the design complications of virtual caches. The proposed approach (1) dynamically chooses a unique virtual page number for all the synonymous virtual pages that map to the same physical page, and (2) uses this unique page number to place and look up data in the virtual cache while data from that physical page resides in the cache. Accesses via the unique page number proceed without any intervention; accesses via other synonymous pages are dynamically detected and remapped to the corresponding unique virtual page number so that data in the cache is accessed correctly. Because of the temporal properties of synonyms, such remapping operations are rare, allowing our proposal to achieve most of the performance, power, and energy benefits of virtual caches without software involvement.

We evaluate the effectiveness of the proposed virtual cache design by integrating it into modern CPUs as well as GPUs in heterogeneous systems. For the proposed L1 virtual cache for CPUs, experimental results show that our design saves about 92% of the dynamic energy consumed by TLB lookups and achieves most of the latency benefit (about 99.4%) of ideal (but impractical) virtual caches. For the proposed all-virtual GPU cache hierarchy, we see an average performance improvement of 77% over a conventional GPU MMU.
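The two-step mechanism described above can be illustrated with a minimal behavioral sketch. This is an assumption-laden simplification, not the dissertation's hardware design: the class `VirtualCacheDSR`, the `translate` hook, and the dictionary-based structures are all hypothetical stand-ins for the real TLB, remapping table, and cache arrays. It only shows the core idea: the first virtual page observed for a physical page becomes the unique page under which all synonyms are cached, and later accesses through other synonyms are detected and remapped to it.

```python
class VirtualCacheDSR:
    """Behavioral sketch (not RTL) of a virtual cache with dynamic synonym remapping."""

    def __init__(self):
        # Physical page -> the unique virtual page chosen for caching its data.
        self.unique_vpn = {}
        # Cache indexed purely by (virtual page, offset) -- no translation on hits
        # to the unique page.
        self.cache = {}

    def _resolve(self, vpn, translate):
        # translate() stands in for a TLB/page-table walk (hypothetical hook).
        ppn = translate(vpn)
        unique = self.unique_vpn.setdefault(ppn, vpn)
        # If this access uses a synonym of the unique page, remap it.
        # Per the thesis's observations, this path is rarely taken.
        return unique

    def fill(self, vpn, offset, data, translate):
        vpn = self._resolve(vpn, translate)
        self.cache[(vpn, offset)] = data

    def access(self, vpn, offset, translate):
        vpn = self._resolve(vpn, translate)
        return self.cache.get((vpn, offset))


# Usage: two synonymous virtual pages (0x10 and 0x20) map to physical page 0x5.
# Data filled via one synonym is found when accessed via the other, because both
# are remapped to the same unique virtual page before indexing the cache.
page_table = {0x10: 0x5, 0x20: 0x5}
vc = VirtualCacheDSR()
vc.fill(0x10, 0, "payload", page_table.__getitem__)
print(vc.access(0x20, 0, page_table.__getitem__))  # -> payload
```

Note that in this sketch every access calls `translate`; in the actual VC-DSR design, hits under the unique virtual page proceed without consulting the TLB at all, which is the source of the energy and latency savings reported above.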