Waldo: A Private Time-Series Database from Function Secret Sharing Talk by Emma Dauterman, Ph.D. Student in Computer Science, UC Berkeley Article by Devesh Krishnani About Our Guest Speaker Soroco invited Emma Dauterman, a current 4th year Ph.D. student studying computer science in UC Berkeley’s RISELab where she is advised by Raluca Ada Popa and Ion Stoica. Emma’s work throughout her Ph.D. has focused broadly in building secure systems using cryptography, published in IEEE and ACM conferences such as S&P (Oakland), SOSP, OSDI, and others. We invited Emma to give a talk at Soroco as her work relates very closely to technologies we build that must be both secure and scalable. Why this talk at Soroco The work that Emma presented at Soroco, called Waldo, is a private time-series database focused on how to protect the privacy of user data stored in it by cryptographically securing their data, access patterns of their data, as well the query filter values. Particularly, the work also focused on how to support multi-predicate filtering which is common in database queries. All of this was published in S&P (Oakland) in “Waldo: A Private Time-Series Database from Function Secret Sharing.” At Soroco, we build the work graph that helps organizations understand how digital work gets done. The work graph is a connected sequence of steps that teams execute. It is, in essence, a map of how teams execute digital work, and it lies at the intersection of people, work, and technology. Once discovered, the work graph enables teams to collaborate and work more effectively. Since the work graph is a sequence of steps that teams execute and sourced from the activities that the teams perform, a major role of the work graph’s design and information access is to ensure end-user privacy is protected. How we protect end-user privacy is and will always be a focus of our system design. For these reasons, we invited Emma to give the talk so that we could learn more about furthering cryptographic storage and privacy. Watch the Talk Powerful ideas from Waldo As presented in Waldo’s system design, it is focused on a time-series database design that supports write-intensive workloads that have a high ratio of updates (‘appends’ being what is supported), where multiple features and multiple predicates are supported while keeping the data cryptographically protected. There are two types of clients in the system, data producers and queriers (or clients that are both). Data producers collect real-time data and update server state with it. Data queriers query the data collected stored on the server. Distributed Trust: The author’s work around Waldo first leverages distributed trust through multiple server deployments. For example, by storing the data in three servers present in three different trust domains. Therefore, if the data is compromised in one trust domain the other two trust domains could still be used to access the correct data. Practically, that means that these servers should be deployed in different clouds and managed by different organizations. By distributing information across these servers, if a majority of the servers are honest then a single malicious server cannot learn the data contents, query filter values, or any search access patterns. Clients need to send messages directly to each of the servers, distributing the information and trust. At Soroco, We think that this follows a powerful design pattern to protect against single or malicious compromised servers. However, it does increase system design complexity and cost. Therefore, when to leverage the distributed trust framework depends on what data is important. Encrypting query at client and decrypting at server using Function Secret Sharing: Waldo suggests the use of a cryptographic technique called Function Secret Sharing (FSS) to generate FSS keys for the query on the client side and evaluate it at the server. Combining this technique with replicated secret sharing ensures that the malicious attacker is not able to determine the access pattern or the filter values used to access the result. In the experiment setting, there is a medical practitioner who queries the data across two data servers using FSS keys generated at client side and the server returns the shares of data. These shares of data are again aggregated at client side to produce the final output. Although Waldo’s complete protocol uses 3 servers, we are showing a simplified example with two below. Emma covers the complete protocol in both the paper and the recording of her talk above! Using MAC key along with FSS key: The Waldo system design further expands on the above idea by introducing the concept of using MAC keys to verify at client side whether the result from server is correct. Here the client is only checking whether the data is compromised on the server or not. In the experimental setting, the medical practitioner sends pair of keys to the server. Waldo uses MAC techniques originating in multi-party computation to provide these security guarantees. Using Function Secret Sharing To ensure that a malicious party is not able to access the access pattern of the data: let’s deep dive into the problem in Soroco’s context. Please note that this is a simplified example of the Waldo system, and we encourage reading the original paper and watching the video for further real-world applications! Suppose there is an individual in a data analyst role that wants to identify how many applications were accessed between two periods of time X and Y. The query in the work graph would be: Query = “select count(distinct application) from workgraph where time between X and Y. “ In this simplified example and strawman-based approach, we will generate secure keys for this query, e.g., K1 and K2 using a Generate method. K1, K2 <- Gen(Query). Sticking with this simplified model, we can then take these keys K1, K2 and initiate a request to the data servers using one key each as shown below. At the server level, we can then use a method Eval and iterate across the entire dataset. Eval method will either produce zero or one… Continue reading Waldo: A Private Time-Series Database from Function Secret Sharing