GIT-CERCS-07-07
Aameek Singh, Mudhakar Srivatsa, Ling Liu,
Efficient and Secure Search of Enterprise File Systems
With fast paced growth of enterprise data, quickly locating relevant content has become a critical IT capability. Research has shown that nearly 85% of enterprise data lies in flat filesystems that allow multiple users and user groups with different access privileges to underlying data. Any search tool for such large scale systems needs to be efficient and yet cognizant of the access control semantics imposed by the underlying filesystem. Current multiuser enterprise search techniques use two disjoint search and access-control components by creating a single system-wide index and simply filtering search results for access control. This approach is ineffective as the index and query statistics subtly leak private information. The other available approach of using separate indices for each user is undesirable as it not only increases disk consumption due to shared files, but also increases the overheads of updating the indices whenever a file changes.
We propose a distributed approach that couples search and access-control into a unified framework and provides secure multiuser search. Our scheme (logically) divides data into independent access-privileges based chunks, called access-control barrels (ACB). ACBs not only manage security but also improve overall efficiency as they can be indexed and searched in parallel by distributing them to multiple enterprise machines. We describe the architecture of ACBs based search framework and propose two optimization technique that ensure the scalability of our approach. We also discuss other useful features of our approach \u2013 seamless integrationwith desktop search and an extenstion to provide secure search in untrusted storage service provider environments. We validate our approach with a detailed evaluation using industry benchmarks and real datasets. Our initial experiments show secure search with 38% improved indexing efficiency and low overheads for ACB processing.