Reducing Address Translation Overheads with Virtual Caching

Author: Hongil Yoon
Publisher:
ISBN:
Category:
Languages: en
Pages: 126

Book Description
This dissertation addresses the overheads of supporting virtual memory, especially the performance, power, and energy overheads of virtual-to-physical address translation via a Translation Lookaside Buffer (TLB). To overcome these overheads, we revisit virtually indexed, virtually tagged caches. In practice, they have not been common in commercial microarchitectures, and the crux of the problem is the complication of dealing with virtual address synonyms. This thesis makes novel, empirical observations, based on real-world applications, that reveal temporal properties of synonym accesses. Exploiting these observations, we propose a practical virtual cache design with dynamic synonym remapping (VC-DSR), which effectively reduces the design complications of virtual caches. The proposed approach (1) dynamically chooses a unique virtual page number for all the synonymous virtual pages that map to the same physical page and (2) uses this unique page number to place and look up data in the virtual cache while data from the physical page resides there. Accesses through this unique page number proceed without any intervention; accesses through other synonymous pages are dynamically detected and remapped to the corresponding unique virtual page number so that they correctly find data in the cache. Because of the temporal properties of synonyms, such remapping operations are rare, allowing our proposal to achieve most of the performance, power, and energy benefits of virtual caches without software involvement. We evaluate the effectiveness of the proposed design by integrating it into modern CPUs as well as GPUs in heterogeneous systems. For the proposed L1 virtual cache for CPUs, the experimental results show that our proposal saves about 92% of the dynamic energy consumed by TLB lookups and achieves most (about 99.4%) of the latency benefit of ideal (but impractical) virtual caches. For the proposed GPU-wide virtual cache hierarchy, we see an average performance benefit of 77% over a conventional GPU MMU.
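
To make the mechanism above concrete, here is a minimal software-level sketch of dynamic synonym remapping. It is only an illustration: the names (VirtualCacheDSR, pp_to_leading_vp, and so on) are assumptions of this example, and the dissertation implements the mechanism in hardware. The sketch shows the core idea that every synonymous virtual page is funneled through one unique (leading) virtual page before the virtual cache is indexed.

```python
PAGE_SHIFT = 12  # assume 4 KiB pages


class VirtualCacheDSR:
    """Toy virtual cache that indexes and tags purely by virtual address."""

    def __init__(self):
        self.pp_to_leading_vp = {}  # physical page -> unique leading virtual page
        self.cache = set()          # cached virtual line addresses

    def _leading_vpn(self, vpn, ppn):
        # The first virtual page seen for a physical page becomes the leading
        # one; later synonyms are detected and remapped to it (the rare case).
        return self.pp_to_leading_vp.setdefault(ppn, vpn)

    def access(self, vaddr, ppn):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        lead = self._leading_vpn(vpn, ppn)
        key = (lead << PAGE_SHIFT) | offset  # lookup uses the leading VA only
        hit = key in self.cache
        self.cache.add(key)
        return hit, lead != vpn              # second flag: a remap occurred


vc = VirtualCacheDSR()
print(vc.access(0x1000, ppn=7))  # (False, False): cold miss, no remapping
print(vc.access(0x9000, ppn=7))  # (True, True): synonym remapped, then hits
```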

Toward Efficient and Protected Address Translation in Memory Management

Author: Xiaowan Dong
Publisher:
ISBN:
Category:
Languages: en
Pages: 205

Book Description
"Virtual memory is widely employed in most computer systems to make programming easy and provide isolation among different applications. Since virtual-tophysical address translation is on the critical path of every memory access, a lot of software and hardware techniques to reduce address translation overhead have been designed and implemented. However, memory management support on state-of-the-art architectures and operating systems encounters both performance and security challenges for today's applications and execution environments. From a performance perspective, both data and instruction working sets of modern applications are growing tremendously, resulting in the potential for high address translation overheads. While data address translation has received significant attention, instruction address translation performance has seen less attention at both architecture and operating system (OS) level. However, our performance analysis shows that a variety of services, ranging from compilers to web user interface frameworks, which provide the infrastructure for many high-level applications, suffer from performance degradation due to instruction address translation overheads. Stall cycles due to instruction address translation account for up to 15% of the execution time. Moreover, instruction address translation overhead is likely to grow with the increasing degree of parallelism at both architecture and application levels.From a security perspective, attackers can leverage address translation information to steal confidential data via privileged side-channel attacks. Recent works such as Intel SGX and Virtual Ghost prevent OS kernels from reading or corrupting confidential application data. However, with the abilities to process page faults and configure page tables, a compromised OS can monitor the victim's memory access behavior and use this information to infer its secret data. In this dissertation, we show that (1) compaction and selective sharing of instruction address translation information can improve performance by reducing memory management overhead; (2) appropriate control of address space permissions can protect memory management data from being used to compromise application data confidentiality. In particular, we explore the efficacy of different OS-level approaches to automatically reduce instruction address translation overhead for widely-used mobile, desktop, and server applications, including automatic superpage promotion and sharing page table and translation lookaside buffer (TLB) entries. The combined effects of these approaches can reduce an application's total execution cycles by up to 18%. In addition, we also propose defenses against page table and last-level cache (LLC) side-channel attacks by privileged code and explore mitigations against bounds check bypass attacks launched by a compromised OS."--Pages xiix-xiii.

Revisiting Virtual Memory

Author:
Publisher:
ISBN:
Category:
Languages: en
Pages: 166

Book Description
Page-based virtual memory (paging) is a crucial piece of memory management in today's computing systems. However, I find that the need, purpose, and design constraints of virtual memory have changed dramatically since translation lookaside buffers (TLBs) were introduced to cache recently used address translations: (a) physical memory sizes have grown more than a million-fold, (b) workloads are often sized to avoid swapping information to and from secondary storage, and (c) energy is now a first-order design constraint. Nevertheless, level-one TLBs have remained the same size and are still accessed on every memory reference. As a result, large workloads waste considerable execution time on TLB misses, and all workloads spend energy on frequent TLB accesses. In this thesis I argue that it is time to reevaluate virtual memory management. I reexamine the virtual memory subsystem in light of the ever-growing latency overhead of address translation and of energy dissipation, developing three results. First, I propose direct segments to reduce the latency overhead of address translation for emerging big-memory workloads. Many big-memory workloads allocate most of their memory early in execution and do not benefit from paging. Direct segments provide a hardware-OS mechanism to bypass paging for part of a process's virtual address space, eliminating nearly 99% of TLB misses for many of these workloads. Second, I propose opportunistic virtual caching (OVC) to reduce the energy spent on translating addresses. Accessing TLBs on each memory reference burns significant energy, and virtual memory's page size constrains L1-cache designs to be highly associative, burning yet more energy. OVC makes hardware-OS modifications to expose energy-efficient virtual caching as a dynamic optimization, saving 94-99% of TLB lookup energy and 23% of L1-cache lookup energy across several workloads. Third, large pages are likely to be more appropriate than direct segments for reducing TLB misses under frequent memory allocations and deallocations. Unfortunately, prevalent chip designs like Intel's statically partition TLB resources among multiple page sizes, which can lead to performance pathologies when using large pages. I propose the merged-associative TLB to avoid such pathologies and reduce the TLB miss rate by up to 45% through dynamic aggregation of TLB resources across page sizes.
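
The direct-segment idea lends itself to a very small sketch: any virtual address falling inside a contiguous [BASE, LIMIT) region is translated with a single addition and bypasses the TLB and page walk entirely, while every other address falls back to ordinary paging. The register values and the fallback function below are illustrative assumptions, not the thesis's exact hardware interface.

```python
# Hypothetical per-process direct-segment registers (values are examples).
BASE, LIMIT, OFFSET = 0x1000_0000, 0x5000_0000, 0x70_0000_0000


def translate(vaddr, page_walk):
    if BASE <= vaddr < LIMIT:
        return vaddr + OFFSET   # direct segment: no TLB access, no page walk
    return page_walk(vaddr)     # conventional paged path for everything else


# Toy paged fallback that identity-maps addresses outside the segment.
print(hex(translate(0x1000_0040, page_walk=lambda va: va)))  # 0x7010000040
print(hex(translate(0x0000_0040, page_walk=lambda va: va)))  # 0x40
```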

Efficient Fine-grained Virtual Memory

Author: Tianhao Zheng (Ph.D.)
Publisher:
ISBN:
Category:
Languages: en
Pages: 252

Book Description
Virtual memory in modern computer systems provides a single abstraction of the memory hierarchy. By hiding the fragmentation and overlays of physical memory, virtual memory frees applications from managing physical memory and improves programmability. However, virtual memory often introduces noticeable overhead. State-of-the-art systems use paged virtual memory that maps virtual addresses to physical addresses at page granularity (typically 4 KiB). This mapping is stored in a page table, which must be accessed to translate virtual addresses to physical addresses before physically addressed memory can be accessed. Research shows that the overhead of accessing the page table can even exceed the execution time of some important applications. In addition, this fine-grained mapping changes the access patterns between the virtual and physical address spaces, complicating many architectural techniques such as caches and prefetchers. In this dissertation, I propose architectural mechanisms to reduce the overhead of accessing and managing fine-grained virtual memory without compromising its existing benefits. There are three main contributions. First, I investigate the impact of address translation on caches. I examine the restriction that fine-grained paging places on virtually indexed, physically tagged (VIPT) caches and conclude that it may lead to sub-optimal cache designs. I introduce a novel cache strategy, speculatively indexed, physically tagged (SIPT), to enable flexible cache indexing under fine-grained page mapping. SIPT speculates on the value of a few additional index bits (1-3 in our experiments) to access the cache before translation completes, and then verifies that the physical tag matches after translation. Exploiting the fact that a simple relation generally exists between virtual and physical addresses, because memory allocators often exhibit contiguity, I also propose low-cost mechanisms to predict and correct potential mis-speculations. Next, I focus on reducing the overhead of address translation for fine-grained virtual memory. I propose a novel architectural mechanism, Embedded Page Translation Information (EMPTI), to provide general fine-grained page translation information on top of coarse-grained virtual memory. EMPTI does so by speculating that a virtual address maps to a pre-determined physical location and then verifying the translation with a very low-cost access to metadata embedded with the data. Coarse-grained virtual memory mechanisms (e.g., segmentation) are used to suggest the pre-determined physical location of each virtual page. Overall, EMPTI achieves the benefits of low-overhead translation while keeping the flexibility and programmability of fine-grained paging. Finally, I improve the efficiency of metadata caching, based on the fact that memory mapping contiguity generally extends beyond a page boundary. In state-of-the-art architectures, caches treat page table entries (PTEs) as regular data. Although this is simple and straightforward, it fails to maximize the storage efficiency of metadata: each page in a contiguously mapped region costs a full 8-byte PTE, even though the delta between virtual and physical addresses remains the same and most metadata are identical. I propose a novel microarchitectural mechanism that expands the effective PTE storage in the last-level cache (LLC) and reduces the number of page-walk accesses that miss the LLC.
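
As a rough illustration of the SIPT strategy described above, the sketch below speculates on two index bits beyond the page offset so the cache set can be read before translation completes, then verifies the guess against the physical address once the TLB responds. The simple predictor (assume the extra physical index bits equal the virtual ones, which allocator contiguity often makes true) and all names are assumptions of this example.

```python
PAGE_SHIFT, EXTRA_BITS = 12, 2   # speculate on 2 index bits above bit 11


def sipt_lookup(vaddr, translate, cache_read):
    """Read the cache with predicted index bits; verify after translation."""
    mask = (1 << EXTRA_BITS) - 1
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    spec_bits = (vaddr >> PAGE_SHIFT) & mask   # predicted physical index bits
    spec_data = cache_read(offset, spec_bits)  # begins before translation ends
    real_bits = (translate(vaddr) >> PAGE_SHIFT) & mask
    if real_bits == spec_bits:
        return spec_data                       # speculation verified: done
    return cache_read(offset, real_bits)       # mis-speculation: replay access


# A 16-page delta keeps the low index bits of VPN and PPN equal, so the
# speculative read is verified on the first try.
print(sipt_lookup(0x3_4ABC,
                  translate=lambda va: va + (0x10 << PAGE_SHIFT),
                  cache_read=lambda off, bits: (off, bits)))
```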

Operating Systems

Author: Remzi H. Arpaci-Dusseau
Publisher: Createspace Independent Publishing Platform
ISBN: 9781985086593
Category: Operating systems (Computers)
Languages: en
Pages: 714

Book Description
"This book is organized around three concepts fundamental to OS construction: virtualization (of CPU and memory), concurrency (locks and condition variables), and persistence (disks, RAIDS, and file systems"--Back cover.

The Design and Evaluation of In-cache Address Translation

Author: David A. Wood
Publisher:
ISBN:
Category: Cache memory
Languages: en
Pages: 524

Book Description


TRON Project 1987: Open-Architecture Computer Systems

Author: Ken Sakamura
Publisher: Springer Science & Business Media
ISBN: 4431680691
Category: Computers
Languages: en
Pages: 311

Book Description
Almost four years have elapsed since Dr. Ken Sakamura of the University of Tokyo first proposed the TRON (The Real-time Operating system Nucleus) concept, and 18 months since the foundation of the TRON Association on 16 June 1986. Members of the Association from Japan and overseas currently exceed 80 corporations. The TRON concept, as advocated by Dr. Sakamura, is concerned with the problem of interaction between man and the computer (the man-machine interface), which had not previously been given a great deal of attention. Dr. Sakamura has gone back to basics to create a new and complete cultural environment relative to computers and to envisage a role for computers that will truly benefit mankind. This concept has indeed caused a stir in the computer field. The scope of the research work involved was initially regarded as so extensive and diverse that the completion of activities was scheduled for the 1990s. However, I am happy to note that the enthusiasm expressed by individuals and organizations both within and outside Japan has permitted an acceleration of the research and development activities. It is to be hoped that the presentations of the Third TRON Project Symposium will further the progress toward the creation of a computer environment compatible with the aspirations of mankind.

Euro-Par 2024: Parallel Processing

Author: Jesus Carretero
Publisher: Springer Nature
ISBN: 3031695771
Category:
Languages: en
Pages: 430

Book Description


Algorithms and Architectures for Parallel Processing

Author: Yongxuan Lai
Publisher: Springer Nature
ISBN: 3030953882
Category: Computers
Languages: en
Pages: 757

Book Description
The three-volume set LNCS 13155, 13156, and 13157 constitutes the refereed proceedings of the 21st International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2021, which was held online during December 3-5, 2021. The 145 full papers included in these proceedings were carefully reviewed and selected from 403 submissions. They cover the many dimensions of parallel algorithms and architectures, including fundamental theoretical approaches, practical experimental projects, and commercial components and systems. The papers are organized in topical sections as follows: Part I, LNCS 13155: deep learning models and applications; software systems and efficient algorithms; edge computing and edge intelligence; service dependability and security algorithms; data science. Part II, LNCS 13156: software systems and efficient algorithms; parallel and distributed algorithms and applications; data science; edge computing and edge intelligence; blockchain systems; deep learning models and applications; IoT. Part III, LNCS 13157: blockchain systems; data science; distributed and network-based computing; edge computing and edge intelligence; service dependability and security algorithms; software systems and efficient algorithms.

Computer Organization and Design - A Complete Overview

Author: Code Xtracts
Publisher: Mocktime Publication
ISBN:
Category: Computers
Languages: en
Pages: 61

Book Description
Computer Organization and Design - A Complete Overview for Engineering, BCA, and BSc Computer courses (BCA, Engineering, and BSc Computer semesters).