A Holistic System Support for Persistent Memory
Abstract:
Persistent memory (PM) technologies, such as Intel’s Optane, provide a class of high-performance, byte-addressable, and durable memory. These features allow software to manage persistent data directly in memory, rather than going through the file system as is conventional. Thanks to these advantages, persistent memory is being widely adopted, from low-power devices to high-performance servers. Despite its performance benefits, integrating this new class of memory requires significant changes throughout the system stack. First, programs that directly manage persistent data in PM must guarantee that the data can be recovered in the event of a failure. However, implementing a failure-recovery mechanism is hard and error-prone, because programs must carefully control the order in which writes become persistent. We refer to this requirement as the crash consistency guarantee. Second, PM is both a memory and a storage device. Thus, it needs system support designed for both memory and storage, such as memory encryption and integrity verification to secure the data, and memory compression to improve bandwidth. Among these supports, the security guarantees are the most important, but encryption and integrity verification increase access latency. Moreover, these mechanisms must also preserve the existing crash consistency guarantees. Third, even when data is encrypted and integrity-verified, real PM systems can still have other vulnerabilities. For example, Intel’s Optane PM uses multiple levels of caches and buffers to improve performance, and these hardware structures can potentially be leveraged as side channels.
My thesis aims to provide system support to overcome these new challenges. We hypothesize that a whole-system redesign, from programming support to hardware, that ensures correctness, security, and high performance is necessary to integrate persistent memory into practical systems. On the software side, to ensure correct failure recovery, we have developed two testing tools, PMTest and XFDetector, and a test-case generator, PMFuzz, that help programmers develop correct PM programs. On the hardware side, we have proposed efficient, crash-consistent, and secure hardware-software co-designs for PM systems. Taking one step further, we propose to study side-channel vulnerabilities in PM systems and to design mitigation solutions.
Committee:
- Kevin Skadron, Committee Chair (CS)
- Samira Khan, Advisor (CS)
- Yuan Tian (CS)
- Mircea Stan (ECE)
- Thomas Wenisch (CSE, University of Michigan)
- Baishakhi Ray (CS, Columbia University)