(Paper Reading) TrustShadow
[MobiSys 2017] Le Guan, Peng Liu, Xinyu Xing, Xinyang Ge, Shengzhi Zhang, Meng Yu, and Trent Jaeger, “TrustShadow: Secure Execution of Unmodified Applications with ARM TrustZone”.
This paper was published by Prof. Peng Liu’s team at Penn State University. It mainly discusses how to build a “secure” system that can run unmodified native applications on the ARM platform with the help of TrustZone.
Author and Background
The first author, Le Guan, received his B.E. from the University of Science and Technology of China (USTC) and his Ph.D. from the Chinese Academy of Sciences (CAS). Currently, he is a postdoctoral researcher in Prof. Liu’s lab. His research mainly focuses on security.
The background of this work is that, with the rapid development of IoT, the security of these mostly ARM-based devices has become a public concern, and a secure operating system is needed to address it.
Several ARM TrustZone based solutions have been proposed. A common approach is to modify the application and execute its most confidential parts, e.g. encryption, decryption, and important transactions, inside TrustZone. Another approach is to restructure the whole system to provide security-related functionality. Both approaches require substantial modification, which is neither practical nor easy to maintain.
The goal of this research is to provide a usable secure OS for the ARM/TrustZone platform that requires minimal modification to Linux (the whole work is only about 5.3K LoC) and can execute unmodified applications.
TrustZone is a technology that provides a trusted execution environment on ARM. As we can see in the figure, the execution environment is divided into two parts: the normal world and the secure world. Applications or systems in the normal world cannot read or write memory that belongs to the secure world, whereas the secure world can access memory that belongs to the normal world. Besides user mode and the privileged modes, there is a monitor mode, through which the processor switches between the two worlds. Usually, developers write the confidential code, encrypt it, load and execute it in the secure world, and let the secure world provide the security-sensitive functionality.
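To make the asymmetric access rule concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions: the names `World`, `Region`, `can_access`, and `Cpu.smc` are hypothetical, each SMC simply toggles the current world, and real hardware enforces the rule through the NS bit and the bus/address-space controller rather than a function call.

```python
from dataclasses import dataclass
from enum import Enum


class World(Enum):
    NORMAL = 0
    SECURE = 1


@dataclass(frozen=True)
class Region:
    name: str
    secure: bool  # True if the region is configured as secure memory


def can_access(world: World, region: Region) -> bool:
    """Secure-world code may touch any region; normal-world code only non-secure ones."""
    return world is World.SECURE or not region.secure


class Cpu:
    def __init__(self) -> None:
        self.world = World.NORMAL

    def smc(self) -> None:
        # Toy model of a trip through monitor mode, the only place the world flips.
        self.world = World.SECURE if self.world is World.NORMAL else World.NORMAL
```

For example, `can_access(World.NORMAL, Region("tz_ram", secure=True))` is `False`, while the same call with `World.SECURE` is `True`.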
The figure above shows the architecture of TrustShadow, which consists of some modifications to Linux and a runtime system in the secure world. The runtime catches exceptions raised in the secure world, handles a few of them there, and forwards most of them to the normal world. More specifically, floating-point (VFP) exceptions and random number requests (RNG) are the only two that are handled in the secure world, because VFP handling could be abused by attackers to perform DoS-like attacks, and random numbers may be confidential data of the program.
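The routing decision described above boils down to a small dispatch table. The sketch below follows this post's description rather than the paper's code; the enum `Exc`, the set `HANDLED_IN_SECURE_WORLD`, and the function `route` are hypothetical names, and the list of exception kinds is illustrative, not exhaustive.

```python
from enum import Enum, auto


class Exc(Enum):
    SYSCALL = auto()
    PAGE_FAULT = auto()
    IRQ = auto()
    VFP = auto()  # floating-point unit trap
    RNG = auto()  # request for random numbers


# Per the description above, only these two stay inside the secure world.
HANDLED_IN_SECURE_WORLD = {Exc.VFP, Exc.RNG}


def route(exc: Exc) -> str:
    """Return where a (hypothetical) runtime dispatches an exception raised by a HAP."""
    return "secure" if exc in HANDLED_IN_SECURE_WORLD else "forward_to_linux"
```

Keeping the in-world set this small is what lets the runtime stay tiny: everything else reuses Linux's existing handlers.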
When a program is loaded into the secure world, Linux creates a zombie HAP (high assurance process) in the normal world. Linux thinks it is operating on the zombie process; however, every operation is forwarded to the runtime system in the secure world. (I guess that by doing this the authors minimize the code modification in the scheduler.) When the scheduler picks this process, a world switch occurs and execution enters the secure world.
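A rough sketch of how the scheduler path could look, assuming the zombie is just an ordinary task with a flag. `Task`, `is_hap`, `Machine`, and `world_switch_and_run` are hypothetical names, and the trace list only records what would happen; it does not model real context switching.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    pid: int
    is_hap: bool = False  # True for the normal-world "zombie" stand-in of a HAP


@dataclass
class Machine:
    trace: list = field(default_factory=list)

    def run_normal(self, task: Task) -> None:
        self.trace.append(("normal", task.pid))

    def world_switch_and_run(self, task: Task) -> None:
        # The zombie holds no usable user context; the real registers and page
        # tables live in the secure world, so Linux issues an SMC to resume it there.
        self.trace.append(("smc", task.pid))
        self.trace.append(("secure", task.pid))

    def schedule(self, task: Task) -> None:
        # The scheduler itself is unchanged; only the "run this task" step differs.
        if task.is_hap:
            self.world_switch_and_run(task)
        else:
            self.run_normal(task)
```

The design point is that Linux's scheduling policy never needs to know about the secure world; it just treats the zombie as another runnable task.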
Memory is divided into three zones: ZONE_TZ_RT, ZONE_NORMAL, and ZONE_TZ_APP. ZONE_TZ_RT is for the runtime system in the secure world, ZONE_NORMAL is for Linux in the normal world, and ZONE_TZ_APP holds the pages of HAPs running in the secure world. The runtime system’s memory is statically (fixed) mapped.
Page faults may occur in the secure world because Linux allocates pages on demand. Page faults in the secure world fall into three types.
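The zone split and the fixed mapping of the runtime can be pictured as follows. All addresses here are made up for illustration (the real layout is board-specific), and `zone_of`, `RT_VIRT_BASE`, and `rt_virt_to_phys` are hypothetical helpers; the point is only that a fixed mapping turns address translation into a constant offset.

```python
from bisect import bisect_right

# Hypothetical physical layout: (start address, zone), sorted by start.
ZONES = [
    (0x1000_0000, "ZONE_NORMAL"),
    (0x4000_0000, "ZONE_TZ_APP"),
    (0x4F00_0000, "ZONE_TZ_RT"),
]
RAM_END = 0x5000_0000  # exclusive
_STARTS = [start for start, _ in ZONES]


def zone_of(paddr: int) -> str:
    """Classify a physical address into one of the three zones."""
    if not _STARTS[0] <= paddr < RAM_END:
        raise ValueError(f"address {paddr:#x} outside managed RAM")
    return ZONES[bisect_right(_STARTS, paddr) - 1][1]


RT_PHYS_BASE = ZONES[2][0]
RT_SIZE = RAM_END - RT_PHYS_BASE
RT_VIRT_BASE = 0xF000_0000  # hypothetical virtual base of the runtime


def rt_virt_to_phys(vaddr: int) -> int:
    """Fixed mapping: a constant offset, no page-table walk needed."""
    offset = vaddr - RT_VIRT_BASE
    if not 0 <= offset < RT_SIZE:
        raise ValueError(f"{vaddr:#x} is not a runtime address")
    return RT_PHYS_BASE + offset
```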
The first type of page fault happens when the missing page is an anonymous page that the HAP accesses for the first time. In this case, Linux and the runtime system simply allocate a new page in ZONE_TZ_APP and insert it into both the normal-world and the secure-world page tables. Inserting this page into the normal world’s page table is not needed for access (Linux cannot touch ZONE_TZ_APP anyway), but it makes page management easier.
If the missing page is an unprotected file-backed page, e.g. from the code segment, the fault belongs to the second type. In this case, Linux loads the page into ZONE_NORMAL and inserts it into the untrusted page table; the runtime system then copies it into ZONE_TZ_APP and inserts it into the trusted page table. To validate the page, the runtime system computes its hash value, so if the page is swapped out, the runtime can tell whether it was modified when it is swapped back in.
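The hash check on swap-in can be sketched with `hashlib`. This is an illustration, not TrustShadow’s code: `PageIntegrity` and `admit` are hypothetical names, SHA-256 is an assumed choice of hash, and the sketch trusts the first copy it sees, which is exactly where a reference hash from a trusted source would be needed in practice.

```python
import hashlib

PAGE_SIZE = 4096


class PageIntegrity:
    """Remember a digest per (file, page index) the first time a page is copied
    into the secure world, and reject a later copy whose digest differs."""

    def __init__(self) -> None:
        self._digests: dict[tuple[str, int], bytes] = {}

    def admit(self, key: tuple[str, int], page: bytes) -> bytes:
        if len(page) != PAGE_SIZE:
            raise ValueError("expected a full page")
        digest = hashlib.sha256(page).digest()
        known = self._digests.setdefault(key, digest)  # first sighting is trusted
        if known != digest:
            raise PermissionError(f"page {key} was modified while outside the secure world")
        # Return a private copy, standing in for the copy into ZONE_TZ_APP.
        return bytes(page)
```

Usage: call `admit(("libfoo.so", 0), page)` on the first fault and again on every swap-in of the same page; a tampered copy raises `PermissionError` instead of being mapped.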
The most complex type is the third one, which happens when an encrypted page is missing. In this case, the runtime must decrypt the page before inserting it into the trusted page table.
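Putting the three types together, the runtime’s side of the fault handler might be organized like this. It is a simplified sketch: `Kind`, `Runtime`, and `handle_fault` are hypothetical, frames are plain integers, and the integrity check and decryption are injected as callables because this post does not say which hash or cipher is used. The entry in `untrusted_pt` mirrors the one Linux itself would install.

```python
from enum import Enum, auto
from typing import Callable, Optional

PAGE_SIZE = 4096


class Kind(Enum):
    ANON = auto()            # type 1: first touch of an anonymous page
    FILE_PLAIN = auto()      # type 2: unprotected file-backed page (e.g. code)
    FILE_ENCRYPTED = auto()  # type 3: encrypted file-backed page


class Runtime:
    def __init__(self, verify: Callable[[int, bytes], bytes],
                 decrypt: Callable[[int, bytes], bytes]) -> None:
        self.trusted_pt: dict[int, int] = {}    # virtual page number -> ZONE_TZ_APP frame
        self.untrusted_pt: dict[int, int] = {}  # mirror of Linux's entry for the zombie
        self.frames: dict[int, bytes] = {}      # frame -> page contents
        self._next_frame = 0
        self._verify = verify
        self._decrypt = decrypt

    def _alloc_tz_app(self) -> int:
        frame = self._next_frame
        self._next_frame += 1
        return frame

    def handle_fault(self, vaddr: int, kind: Kind,
                     linux_copy: Optional[bytes] = None) -> int:
        vpn = vaddr // PAGE_SIZE
        frame = self._alloc_tz_app()
        if kind is Kind.ANON:
            data = bytes(PAGE_SIZE)         # fresh zero-filled page
            self.untrusted_pt[vpn] = frame  # bookkeeping only
        elif kind is Kind.FILE_PLAIN:
            data = self._verify(vpn, linux_copy)   # copy from ZONE_NORMAL + hash check
        else:
            data = self._decrypt(vpn, linux_copy)  # decrypt before mapping
        self.frames[frame] = data
        self.trusted_pt[vpn] = frame
        return frame
```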
Weakness of TrustShadow
TrustShadow involves a lot of work; however, I think there may still be a security concern. In TrustShadow, most system calls are forwarded to Linux in the normal world. Since Linux may be compromised, it may return an invalid value to the secure world. Although the authors say that the runtime system checks the return values of system calls, it is almost impossible to validate every system call, because sometimes we do not know what the system call actually did, and a “valid” value is not always a “correct” one. Also, an attacker may hijack system calls to perform DoS-like attacks against the secure world.
One way to mitigate this is to minimize the number of system calls that must be forwarded to Linux. Maybe this is just an implementation issue rather than a fundamental limitation.
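The “valid is not correct” gap is easy to see in a sketch of the kind of checks a runtime could apply. These helpers (`check_read_ret`, `check_mmap_ret`) are hypothetical and only illustrate plausibility checks; the one fact borrowed from Linux is that syscalls report errors as negative errno values in the range -4095..-1.

```python
MAX_ERRNO = 4095  # Linux returns errors as -errno in [-4095, -1]


def check_read_ret(requested: int, ret: int, data: bytes) -> int:
    """Sanity-check a forwarded read(): the return value and copied bytes must agree."""
    if -MAX_ERRNO <= ret < 0:
        return ret  # a well-formed error code
    if not 0 <= ret <= requested or len(data) != ret:
        raise RuntimeError(f"implausible read() result {ret} for count={requested}")
    return ret


def check_mmap_ret(addr: int, length: int, protected: list[tuple[int, int]]) -> int:
    """Reject a mapping that would overlap memory the secure world must keep private."""
    end = addr + length
    for lo, hi in protected:
        if addr < hi and lo < end:
            raise RuntimeError(f"mmap result {addr:#x} overlaps {lo:#x}-{hi:#x}")
    return addr
```

Note that `check_read_ret(10, 10, b"A" * 10)` passes even if the file actually contains different bytes: the result is well-formed, yet nothing proves it is what the file holds.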
Besides, the second type of page fault should not exist in a truly secure environment, because everything executed in the secure world should be encrypted; the secure world should not expose anything to the normal world. Hashing is another potential security concern, because false negatives (a modified page that still passes the hash check) may happen.