Li Xiong, Subramanyam Chitti, Ling Liu
Topk Queries across Multiple Private Databases
Advances in distributed service-oriented computing and global communications have created a strong technology push for large-scale data integration among organizations and enterprises. It is widely observed that multiple organizations in the same market sector actively compete as well as collaborate, with constantly evolving alliances. Many such organizations want to learn aggregate statistics about sales in the sector without disclosing the sales data in their private databases. Privacy-preserving data sharing is becoming increasingly important for large-scale, mission-critical data integration applications.
In this paper we present a decentralized peer-to-peer protocol for supporting statistics queries over multiple private databases while respecting the privacy constraints of participants. Ideally, given a database query spanning multiple private databases, we wish to compute the answer to the query without revealing any information about each individual database beyond the query result. In practice, a popular approach is to relax this constraint to allow efficient information integration while minimizing information disclosure. The paper makes three contributions. First, we formalize the notion of loss of privacy in terms of information revealed and propose a data privacy metric. Second, we propose a novel probabilistic decentralized protocol for privacy-preserving top-$k$ selection. Third, we perform a formal analysis of the protocol and experimentally evaluate it in terms of its correctness, efficiency, and privacy characteristics.