SIGIR 2014 Workshop

Privacy-Preserving IR: When Information Retrieval Meets Privacy and Security

Invited Keynote Speakers

Christopher W. Clifton (Purdue University, currently on leave at the National Science Foundation, USA)
Keynote Title: (Semi)Private Information Retrieval: A Discussion of Privacy Risks and Relaxation
Abstract: Private Information Retrieval started with a very strict premise: Nobody should learn anything about the query or what is retrieved. The results are not encouraging: Even simple Document ID based retrieval becomes impractical. But we have also seen that openly disclosing queries and results can lead to clear privacy violations. The solution must lie in between these extremes. This talk will look at some past work that takes very different approaches to protecting privacy, and recent debates of privacy risks and harm. We will lay out foundations for discussing privacy requirements in information retrieval. The goal is to lead to a discussion among the workshop participants that will identify opportunities for future research in Privacy-Preserving Information Retrieval.

About the speaker: Dr. Clifton works on data privacy, particularly with respect to analysis of private data. This includes privacy-preserving data mining, data de-identification and anonymization, and limits on identifying individuals from data mining models. He also works more broadly in data mining, including data mining of text and data mining techniques applied to interoperation of heterogeneous information sources. Fundamental data mining challenges posed by these applications include extracting knowledge from noisy data, identifying knowledge in highly skewed data (few examples of "interesting" behavior), and limits on learning. He also works on database support for widely distributed and autonomously controlled information, particularly issues related to data privacy. Prior to joining Purdue in 2001, Dr. Clifton was a principal scientist in the Information Technology Division at the MITRE Corporation. Before joining MITRE in 1995, he was an assistant professor of computer science at Northwestern University. He has a Ph.D. (1991) and M.A. (1988) from Princeton University, and Bachelor's and Master's degrees (1986) from the Massachusetts Institute of Technology. He is currently on a leave of absence from Purdue serving as a rotating program director in the Division of Information and Intelligent Systems at the National Science Foundation.

David D. Lewis (David D. Lewis Consulting, USA)
Keynote Title: Privacy in Creating Text Classifiers for Electronic Discovery
Abstract: A large civil lawsuit, particularly in the United States, can require the categorization of billions of enterprise documents to determine which need to be produced to opposing parties. Text classifiers, whether created manually or trained by supervised learning, have become the only practical approach to this task, with positive predictions usually reviewed by attorneys prior to production. The adversarial setting poses challenges, however, since producing parties view both interactive development of search queries and labeling of training data as risking the revelation of sensitive information. I discuss the approaches that have developed for meeting these challenges in e-discovery, make connections with the literature on privacy-preserving IR and data mining, and suggest some directions for research that would be of great benefit in e-discovery.

About the speaker: Dave Lewis, Ph.D. is a consulting computer scientist working in the areas of information retrieval, data mining, natural language processing, and the evaluation of complex information systems. He formerly held research positions at AT&T Labs, Bell Labs, and the University of Chicago. He has published more than 75 scientific papers and 8 patents, and is a Fellow of the American Association for the Advancement of Science. Dr. Lewis has served as a consulting and testifying expert on e-discovery issues in civil litigation, including in the Kleen Products, Actos, and da Silva Moore cases.

Douglas W. Oard (University of Maryland, College Park, USA)
Keynote Title: Search Among Secrets: Separating the wheat from the buzzsaw
Abstract: A fundamental assumption of nearly all information retrieval research is that content that should not be shown to the user should simply not be included in the collection that is indexed. This assumption breaks down, however, when it is not practical to separate what we want (the “wheat”) from what needs to be protected from disclosure (the “buzzsaw” that can ruin your day). In this talk, I will motivate the problem from four perspectives: multi-level security in enterprise search, scholarly access to personal papers in archival institutions, requests by citizens for access to government records, and withholding of privileged content in the “discovery” process that arises from civil litigation in some jurisdictions. For each of the last three cases, I will describe a research project that sheds some light on requirements, challenges, and capabilities. I’ll then conclude the talk by offering thoughts on task design, evaluation design, and system design that may help to move us closer to being able to address this increasingly important grand challenge of search among secrets.

About the speaker: Douglas Oard is a Professor at the University of Maryland, College Park, with joint appointments in the College of Information Studies and the Institute for Advanced Computer Studies. Dr. Oard earned his Ph.D. in Electrical Engineering from the University of Maryland. His research interests center around the use of emerging technologies to support information seeking by end users. Additional information is available at

Website last updated: Jun 20th, 2014.