Binary Recompilation and Combined Compiler/Architecture Enhancement Studies

UM faculty: Trevor Mudge, Steve Reinhardt, Gary Tyson

    A binary recompiler is a software system that takes executable binaries as input, analyzes their structure, applies transformations and optimizations, and outputs new, optimized executable binaries (a toy sketch of this pipeline follows the list of research areas below).  Although a few recompilation systems are currently available, such as OM for the Alpha architecture from DEC's Western Research Laboratory and EEL for SPARC from the University of Wisconsin, these tools have been used primarily to instrument binaries (e.g., for profiling) rather than to optimize them.  (The Spike tool, also from DEC, is a notable exception.)  We believe that binary recompilation is a key technology for future high-performance systems.  We propose the following three areas of research:

1.    Recompiling legacy binaries to incorporate modern compiler optimizations
2.    Architectural support for recompilation (including dynamic, on-line recompilation)
3.    Architectural enhancements potentially made viable by recompilation
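
    To fix ideas before turning to these areas, the following is a minimal sketch of the recompiler pipeline described in the opening paragraph, written in C against a two-byte toy ISA of our own invention; the opcode names and the single peephole rewrite are purely illustrative, and a real recompiler must of course parse actual object formats and perform full control-flow analysis.

    #include <stdio.h>

    enum { OP_LOAD = 1, OP_MUL = 2, OP_SHL = 3, OP_STORE = 4 };

    /* One illustrative transformation: strength-reduce a multiply-by-two
       into a shift-left-by-one. */
    static void optimize(unsigned char *code, int len) {
        for (int i = 0; i + 1 < len; i += 2) {
            if (code[i] == OP_MUL && code[i + 1] == 2) {
                code[i] = OP_SHL;
                code[i + 1] = 1;
            }
        }
    }

    int main(void) {
        /* "Input binary": three two-byte (opcode, operand) instructions. */
        unsigned char binary[] = { OP_LOAD, 0, OP_MUL, 2, OP_STORE, 0 };
        int len = (int)sizeof binary;

        optimize(binary, len);                /* analyze + transform */

        for (int i = 0; i < len; i += 2)      /* emit the new binary */
            printf("op=%d arg=%d\n", binary[i], binary[i + 1]);
        return 0;
    }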

    The first area studies the benefits of recompiling legacy binaries simply by applying modern compiler optimizations that were unavailable when the binary was first produced.  If we estimate that compiler technology has contributed about 5% per year to system performance improvements over the last decade or more, the compounded performance loss suffered by legacy binaries is significant: 5% per year compounded over ten years amounts to a performance gap of roughly 60%.  Although any of the instruction set architectures (ISAs) employed by IBM could benefit from recompilation technology (including PowerPC and Intel IA32), the 3090 architecture stands to benefit most because it has the longest history and hence supports the largest proportion of legacy code.

    As a first step in this direction we propose to study the performance potential of recompiling legacy 3090 binaries.  Our initial goal is not to build a full recompiler but to set up experiments that model the transformations a statically optimizing recompiler could perform.  The results of this study will determine the extent to which statically optimizing recompilation can improve legacy codes, and thus whether a more detailed study of this type of recompilation is warranted.  Regardless of the outcome for static optimization, we will also consider the potential of profile-driven optimizations.  Profile-driven recompilation introduces a more complex optimization process: a binary is first recompiled to add instrumentation, executed on test data to generate profile data, and finally recompiled again to incorporate profile-based optimizations.  In exchange, profile data will enable more sophisticated optimization, benefiting even binaries compiled with modern optimizing compilers.  The goal of our initial study will be to determine the extent to which profile information improves on static optimization and, in particular, whether profiling is required to make recompilation-based optimization viable.
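
    The two-pass flow just described can be made concrete with a toy model.  In the C sketch below (our own construction, not a design commitment), the "binary" is a table of basic blocks, the instrumentation pass is a counter bumped at each block entry, and the second recompilation simply lays out blocks hottest-first, as a real recompiler might do to improve instruction-cache behavior.

    #include <stdio.h>
    #include <stdlib.h>

    #define NBLOCKS 4

    static long counts[NBLOCKS];   /* counters added by the instrumentation pass */

    /* Each "basic block" just bumps its counter; the real body is elided. */
    static void block(int id) { counts[id]++; }

    /* "Execute on test data": a control flow with one hot and one cold path. */
    static void run_instrumented(int trips) {
        for (int t = 0; t < trips; t++) {
            block(0);                          /* entry */
            if (t % 10 != 0) block(1);         /* hot path */
            else             block(2);         /* cold path */
            block(3);                          /* exit */
        }
    }

    /* Order blocks by descending execution count. */
    static int hotter(const void *a, const void *b) {
        long ca = counts[*(const int *)a], cb = counts[*(const int *)b];
        return (cb > ca) - (cb < ca);
    }

    int main(void) {
        run_instrumented(1000);                /* pass 1: gather the profile */

        int layout[NBLOCKS] = { 0, 1, 2, 3 };
        qsort(layout, NBLOCKS, sizeof layout[0], hotter);  /* pass 2: re-layout */

        for (int i = 0; i < NBLOCKS; i++)
            printf("emit block %d (executed %ld times)\n",
                   layout[i], counts[layout[i]]);
        return 0;
    }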

    The second area of research looks at possible architectural support for recompilation.  Existing binary recompilation tools have identified a number of challenges, particularly self-modifying code and statically unknown indirect jump targets.  Both are significant problems for the 3090 and IA32 ISAs, while only the latter is a problem for PowerPC.  A pure software recompiler may incur significant overhead in dealing with these problems, even when self-modifying code or unknown indirect jumps occur only rarely at run time.  For example, an architectural extension could tag memory regions containing recompiled code, trapping into a run-time fix-up recompiler on writes to code space and on indirect jumps to non-recompiled code (a software sketch of this tagging idea appears after the list below).  Virtual memory protection may be suitable for this purpose, but a mechanism dedicated to it would enable a more suitable granularity and lower overhead.  Other forms of architectural support that we will consider include:

  • Sophisticated bit-manipulation instructions for interpreting dynamically generated code.  PowerPC already has a very useful pair (rlwinm, rlwimi), presumably added for 68K emulation; a sketch of their use appears after this list.
  • Support for low-overhead profiling.
  • Additional registers reserved for the dynamic recompiler's use, e.g., to hold common base addresses and constants.
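
    As a point of reference for the tagging idea above, today's software-only translators approximate it with page protection: pages holding recompiled code are write-protected, and the fault raised by a write to them diverts control to a fix-up routine.  The C sketch below shows the mechanism (Linux/POSIX assumed; the handler and region names are ours); the proposed architectural extension would provide the same trap at finer granularity and lower cost.

    #include <signal.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static long page_size;               /* set once in main */
    static unsigned char *code_region;   /* pages holding "recompiled" code */

    /* On a write into the tagged (write-protected) region, trap into a
       fix-up routine.  A real system would invalidate or re-translate the
       affected code; here we just log and unprotect so the write retries. */
    static void smc_handler(int sig, siginfo_t *si, void *ctx) {
        (void)sig; (void)ctx;
        void *page = (void *)((uintptr_t)si->si_addr &
                              ~(uintptr_t)(page_size - 1));
        write(STDERR_FILENO, "code write trapped; fixing up\n", 30);
        mprotect(page, (size_t)page_size, PROT_READ | PROT_WRITE);
    }

    int main(void) {
        page_size = sysconf(_SC_PAGESIZE);
        code_region = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        memset(code_region, 0x90, (size_t)page_size);  /* stand-in code bytes */

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = smc_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);

        mprotect(code_region, (size_t)page_size, PROT_READ);  /* "tag" region */
        code_region[0] = 0xC3;     /* self-modifying write: traps, fixed up */
        printf("write completed after fix-up\n");
        return 0;
    }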
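
    The first bullet also deserves a concrete illustration.  rlwinm (rotate left word immediate then AND with mask) lets an interpreter or translator extract and reposition an arbitrary bit field in a single instruction.  Its effect can be modeled in C as below; the helper names are ours, and note that PowerPC numbers bit 0 as the most significant bit.

    #include <stdint.h>
    #include <stdio.h>

    /* 32-bit rotate left. */
    static uint32_t rotl32(uint32_t x, unsigned n) {
        n &= 31;
        return n ? (x << n) | (x >> (32 - n)) : x;
    }

    /* Build a PowerPC-style mask covering bits MB through ME, where bit 0
       is the most significant; MB > ME produces a wrapped mask. */
    static uint32_t ppc_mask(unsigned mb, unsigned me) {
        uint32_t m = 0;
        for (unsigned i = mb; ; i = (i + 1) & 31) {
            m |= 0x80000000u >> i;
            if (i == me) break;
        }
        return m;
    }

    /* rlwinm rA,rS,SH,MB,ME: rotate rS left by SH, AND with mask(MB,ME). */
    static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me) {
        return rotl32(rs, sh) & ppc_mask(mb, me);
    }

    int main(void) {
        uint32_t x = 0x00AB0000u;
        /* Extract PowerPC-numbered bits 8..15 into the low byte: rotate
           left 16, keep bits 24..31 -- i.e., (x >> 16) & 0xFF in one step. */
        printf("0x%02X\n", (unsigned)rlwinm(x, 16, 24, 31));  /* prints 0xAB */
        return 0;
    }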
    We will explore these ideas in the context of the 3090 and/or PowerPC ISAs (to be determined later in conjunction with IBM researchers).  We will develop a simple prototype recompiler to understand the exact nature of these problems and to refine our ideas about which architectural supports are useful.  The recompiler will be based on our MIRV intermediate representation, allowing us to reuse existing optimization modules and to share future developments with our source-code compilation infrastructure.  We will then incorporate the most promising features into architectural simulators to evaluate them in more detail.

    Finally, in the longer term, the existence of reliable recompilation technology increases the viability of architectural innovations.  Via recompilation, even pre-existing binaries can take advantage of newly introduced architectural extensions, such as an increased number of addressable registers or new domain-specific operations (e.g., MMX).

    Of the extensions we will explore, speculative optimization is the most novel and radical.  The idea is that the compiler/recompiler need not be conservative in applying transformations that may violate correctness under some unlikely (but possible) conditions.  In a conventional compile-then-execute framework, an incorrect assumption leads to incorrect results.  If the appropriate hooks are present in the microarchitecture, however, a transformation may violate correctness as long as the violation is detected before application state is irretrievably modified.  A new microarchitecture can exploit this capability by incorporating additional instructions that convey the assumptions made during optimization; if an assumption is found to be violated during execution, the microarchitecture restores the pre-speculation state and restarts in a non-speculative mode.  This freedom from conservative optimization can lead to significant performance benefits (a software rendering of the idea appears in the sketch at the end of this section).  We will explore a range of specific transformations within this speculative framework.

    Investigation of all these architectural extensions will build on the prototype recompiler and simulators developed for the preceding studies of architectural support for recompilation.  Again, our ongoing experience with the recompiler will help us refine our ideas; the most promising ones will be implemented in the recompiler and in the architectural simulators to provide a detailed evaluation of their feasibility and performance potential.
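
    To close, here is the software rendering of the speculative-optimization idea promised above, on one concrete case: a load is hoisted above a possibly aliasing store, and the no-aliasing assumption is checked before the result escapes.  In the hardware we envision, the check and the rollback would be carried by the new instructions described above; here both are spelled out in C, and the function names and aliasing example are ours.

    #include <stdio.h>

    /* The code as the recompiler finds it: the load of *p cannot legally
       be hoisted above the store through q unless p != q. */
    static int original(int *p, int *q, int v) {
        *q = v;
        return *p + 1;
    }

    /* Speculative version: assume no aliasing, issue the load early, and
       verify the assumption before the result is committed. */
    static int speculative(int *p, int *q, int v) {
        int early = *p;                  /* load hoisted above the store */
        *q = v;
        if (p == q)                      /* "check": assumption violated? */
            return original(p, q, v);    /* recovery: re-run conservatively */
        return early + 1;                /* fast path: commit early result */
    }

    int main(void) {
        int a = 10, b = 20;
        printf("%d\n", speculative(&a, &b, 99));  /* no aliasing: prints 11 */
        printf("%d\n", speculative(&a, &a, 99));  /* violation: prints 100 */
        return 0;
    }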