GIT-CERCS-03-21
Xiaotong Zhuang, Tao Zhang, Santosh Pande, Hsien-Hsin S. Lee,
HIDE: Hardware-support for Leakage-Immune Dynamic Execution
Recently introduced secure processors enable new applications involving
software anti-piracy, program execution certification, and secure mobile
agents. Secure processors have built-in hardware support for cryptographic
mechanisms and can prevent both software attacks and physical attacks. Several
recent papers have shown how to construct a secure processor to protect the
confidentiality [1][2][3] and integrity [4][3] of a program. The proposed
designs are immune to spoofing, splicing, and replay attacks. However, none of
the previous work addresses attacks that exploit information leakage on the
address bus. The danger of such leakage has been acknowledged as both an
important and a difficult problem [1]. In fact, in [4] this problem is what
triggers the replay attack described there.
In this paper, we show that several attacks are possible by
monitoring the instruction access sequence on the address bus. Such attacks
could identify the core algorithms by pattern matching the control flow graph,
or could find or narrow down the critical variables that
decide outcomes of conditional branches. We analyze the causes behind such
information leakage and then determine the primary requirement that must be met
to prevent it. Based on this requirement, we propose HIDE, a hardware-based
approach to hide the instruction access sequence. The main goal of HIDE is to
construct a fixed instruction access sequence issued to the memory to achieve
zero leakage of control flow information, giving a security guarantee. Our base
approach involves constructing a fixed instruction access sequence covering the
whole program (called base access ring) to hide the actual instruction fetch.
This, however, can cause severe performance degradation because of the
large number of stalls, making the framework infeasible. Therefore, we propose
two approaches to overcome this problem. In our scheme, the architecture
dynamically tracks a hot function set. Based on the hot function set, the first
approach involves prefetching blocks accordingly into an on-chip prefetch
buffer. The second approach establishes a secondary access ring, which is
smaller and faster than the base access ring. The instruction blocks are
prefetched from the base ring into the secondary ring instead.
We observe that our architectural improvements eliminate a considerable part
of this degradation. For a 512K L2 cache, the degradation is reduced from 73%
to 38%; for a 1M L2 cache, it is cut from 65% to 34%, with a reasonable amount
of hardware resources.
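
The base access ring idea described above can be illustrated with a toy model.
The sketch below is not the HIDE hardware design; it only assumes that memory
is always accessed in one fixed cyclic order over all instruction blocks and
that the processor stalls until the block it needs comes around. The function
name, the block numbering, and the stall counting are all hypothetical, and
caches, the prefetch buffer, and the secondary ring are ignored.

```python
def run_on_base_ring(num_blocks, needed_blocks):
    """Toy model of a fixed access ring (illustrative assumption only).

    num_blocks    -- number of instruction blocks in the program
    needed_blocks -- the blocks the program actually wants, in order
    Returns (bus_trace, stalls): the block addresses seen on the bus and
    the number of ring slots spent waiting for a wanted block.
    """
    bus_trace = []
    stalls = 0
    pos = 0  # next ring position to issue on the bus
    for want in needed_blocks:
        # The ring advances in its fixed order until the wanted block appears.
        while True:
            block = pos % num_blocks
            bus_trace.append(block)
            pos += 1
            if block == want:
                break
            stalls += 1  # a slot that fetched a block we did not need yet
    return bus_trace, stalls


if __name__ == "__main__":
    # Two different control-flow paths through a 4-block program.
    for path in ([0, 2, 3], [0, 1, 3], [2, 1]):
        trace, stalls = run_on_base_ring(4, path)
        print(path, "->", trace, "stalls:", stalls)
```

In this model the bus trace is always a prefix of the sequence 0, 1, 2, 3, 0,
1, ..., whichever path is taken, so the order of addresses reveals nothing
about the branches. The stall count also shows why a ring covering the whole
program is slow: a backward jump waits for almost a full revolution.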