CERCS Technical Reports

GIT-CERCS-03-21

HIDE: Hardware-support for Leakage-Immune Dynamic Execution

Recently introduced secure processors enable new applications such as software anti-piracy, program execution certification, and secure mobile agents. These processors have built-in hardware support for cryptographic mechanisms and can prevent both software attacks and physical attacks. Several recent papers have shown how to construct a secure processor that protects the confidentiality [1][2][3] and integrity [4][3] of a program. The proposed designs are immune to spoofing, splicing, and replay attacks. However, none of the previous work addresses attacks that exploit information leakage on the address bus, a danger that has been acknowledged as both an important and a difficult problem [1]. In fact, this leakage is what triggers the replay attack described in [4].
In this paper, we show that several attacks are possible by monitoring the instruction access sequence on the address bus. An attacker could identify the core algorithms by pattern matching the control flow graph, or could find or narrow down the critical variables that decide the outcomes of conditional branches. We analyze the causes of this information leakage and determine the primary requirement that must be met to prevent it. Based on this requirement, we propose HIDE, a hardware-based approach to hiding the instruction access sequence. The main goal of HIDE is to issue a fixed instruction access sequence to memory, so that zero control flow information leaks and a security guarantee can be given. Our base approach constructs a fixed instruction access sequence covering the whole program (called the base access ring) to hide the actual instruction fetches. However, this can cause severe performance degradation, because the resulting stalls are numerous enough to make the framework infeasible. We therefore propose two approaches to overcome this problem. In our scheme, the architecture dynamically tracks a hot function set. The first approach uses the hot function set to prefetch blocks into an on-chip prefetch buffer. The second approach establishes a secondary access ring, which is smaller and faster than the base access ring; instruction blocks are then prefetched from the base ring into the secondary ring instead.
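The idea behind the base access ring can be illustrated with a small simulation. The sketch below is only a toy model under stated assumptions, not the HIDE hardware design: the function name `simulate_ring`, the one-block-per-slot timing, and the example block sequences are all hypothetical, and caches, encryption, the prefetch buffer, and the secondary ring are left out. It shows that when memory is swept in a fixed cyclic order, the addresses visible on the bus are the same for two different control flow paths, while the number of stall slots differs.

```python
def simulate_ring(num_blocks, needed_blocks, slots):
    """Toy model of a fixed access ring (hypothetical, for illustration only).

    num_blocks    -- number of instruction blocks in the program
    needed_blocks -- blocks the program actually wants, in control-flow order
    slots         -- number of bus slots to simulate

    Returns (bus_trace, served, stalls).
    """
    bus_trace = []  # addresses an observer would see on the address bus
    served = 0      # how many of the needed blocks have been delivered
    stalls = 0      # slots spent waiting for the wanted block to come around

    for t in range(slots):
        # The ring always issues the next block in a fixed cyclic order,
        # regardless of which block the processor is waiting for.
        addr = t % num_blocks
        bus_trace.append(addr)

        if served < len(needed_blocks):
            if needed_blocks[served] == addr:
                served += 1   # the wanted block arrived in this slot
            else:
                stalls += 1   # the processor waits; the bus pattern is unchanged

    return bus_trace, served, stalls


# Two different (made-up) control-flow paths through a 4-block program.
trace_a, served_a, stalls_a = simulate_ring(4, [0, 3, 1], slots=16)
trace_b, served_b, stalls_b = simulate_ring(4, [2, 2, 0], slots=16)

# The observable address sequence does not depend on the path taken.
print(trace_a == trace_b)
print(stalls_a, stalls_b)
```

In this toy model the stall count grows with the ring length, which is consistent with the motivation given above for prefetching hot blocks and for using a smaller secondary ring.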
Our architectural improvements eliminate a considerable part of this performance degradation. With a 512 KB L2 cache, the degradation is reduced from 73% to 38%; with a 1 MB L2 cache, it is cut from 65% to 34%, using a reasonable amount of hardware resources.