Article by Devesh Krishnani
The work that Emma presented at Soroco, called Waldo, is a private time-series database that protects the privacy of the user data stored in it by cryptographically securing the data itself, the access patterns over that data, and the query filter values. In particular, the work addresses how to support multi-predicate filtering, which is common in database queries. All of this was published in S&P (Oakland) in “Waldo: A Private Time-Series Database from Function Secret Sharing.”
At Soroco, we build the work graph that helps organizations understand how digital work gets done. The work graph is a connected sequence of steps that teams execute. It is, in essence, a map of how teams execute digital work, and it lies at the intersection of people, work, and technology. Once discovered, the work graph enables teams to collaborate and work more effectively.
Since the work graph is a sequence of steps that teams execute and is sourced from the activities that those teams perform, a major goal of the work graph’s design and information access is to ensure that end-user privacy is protected. How we protect end-user privacy is and will always be a focus of our system design. For these reasons, we invited Emma to give the talk so that we could learn more about furthering cryptographic storage and privacy.
Waldo uses a cryptographic technique called Function Secret Sharing (FSS): the client generates FSS keys for a query, and the servers evaluate those keys. Combining this technique with replicated secret sharing ensures that a malicious attacker is not able to determine the access pattern or the filter values used to obtain the result. In the experimental setting, a medical practitioner queries the data across two data servers using FSS keys generated on the client side, and the servers return shares of the data. These shares are then aggregated on the client side to produce the final output. Although Waldo’s complete protocol uses 3 servers, we are showing a simplified example with two below. Emma covers the complete protocol in both the paper and the recording of her talk above!
Let’s take a closer look, in Soroco’s context, at the problem of keeping a malicious party from learning the access pattern of the data. Please note that this is a simplified example of the Waldo system, and we encourage reading the original paper and watching the video for further real-world applications! Suppose an individual in a data analyst role wants to identify how many applications were accessed between two points in time, X and Y. The query in the work graph would be:
Query = “select count(distinct application) from workgraph where time between X and Y”
In this simplified, strawman approach, we generate secure keys for this query, e.g., K1 and K2, using a Generate (Gen) method.
K1, K2 <- Gen(Query).
Sticking with this simplified model, we can then take the keys K1 and K2 and initiate a request to the two data servers, sending one key to each. Each server then runs a method Eval with its key while iterating across the entire dataset. For each index in the dataset, Eval produces either zero or one: one when the key matches the index that the analyst is interested in, and zero when it does not. If the index match is successful, Eval returns a “share” of the data corresponding to that index back to the client machine (in this case, the analyst’s). In the final Waldo system design, however, instead of just the two keys K1 and K2, you would scale this up to 6 pairs of keys, for a total of 12 keys.
On the client machine, the “shares”, or partial outputs, from both servers are aggregated to produce the final output required by the analyst.
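The Gen, Eval, and aggregate steps above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not Waldo's construction: each "key" here is simply an additive share of the full 0/1 indicator vector over the time slots, so it is as long as the time domain (real FSS keys are compact), the servers hold the rows in the clear, and the query counts matching rows rather than distinct applications. The names `gen`, `eval_key`, `server_answer`, `aggregate`, and `MOD` are hypothetical.

```python
import secrets

MOD = 2**32  # assumed modulus for additive shares (illustrative choice)

def gen(domain_size, x, y):
    """Strawman Gen: split the indicator of 'x <= time <= y' into two keys.

    Each key alone is a list of uniformly random numbers; only the sum of
    the two keys (mod MOD) reveals which time slots match.
    """
    k1 = [secrets.randbelow(MOD) for _ in range(domain_size)]
    k2 = [((1 if x <= t <= y else 0) - r) % MOD for t, r in enumerate(k1)]
    return k1, k2

def eval_key(key, t):
    """Strawman Eval: this server's share of 'does time slot t match?'."""
    return key[t]

def server_answer(key, rows):
    """Scan every row (so the access pattern is the same for any query)
    and accumulate this server's share of the matching-row count."""
    return sum(eval_key(key, t) for t, _app in rows) % MOD

def aggregate(share1, share2):
    """Client side: add the two partial outputs to get the final result."""
    return (share1 + share2) % MOD

# Toy work-graph rows: (time_slot, application). Both servers hold a copy.
rows = [(0, "mail"), (2, "crm"), (3, "mail"), (5, "erp"), (7, "crm")]

k1, k2 = gen(domain_size=8, x=2, y=5)
count = aggregate(server_answer(k1, rows), server_answer(k2, rows))
print(count)  # number of rows with time between 2 and 5
```

Each server sees only its own random-looking key and touches every row, so neither learns X, Y, or which rows contributed; the count appears only after the client adds the two partial outputs.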
Here, the Eval method takes the database contents as the input on which the FSS key (e.g., K1) is evaluated. But a server cannot simply evaluate its FSS key on the database contents, because the server should not be able to view those contents. At the same time, the servers need to evaluate their keys on identical copies of the database to produce correct outputs. Further, the authors propose storing the time-series data in a search index structure to reduce complexity during evaluation. This allows Waldo to apply FSS to the structure of the search index rather than to its contents.
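To make the "servers must not see the contents" constraint concrete: replicated secret sharing, mentioned earlier, is one standard way for three servers to hold a value so that no single server can read it while any two servers together can reconstruct it. The sketch below shows only that sharing step, with hypothetical names (`rss_share`, `rss_reconstruct`, `MOD`). How Waldo evaluates FSS keys over such shares is covered in the paper and is not reproduced here.

```python
import secrets

MOD = 2**32  # assumed modulus (illustrative choice)

def rss_share(value):
    """Split value into three additive parts; server i holds parts i and i+1.

    Returns one {part_index: part} view per server. Any single view is
    missing one part, and so reveals nothing about the value.
    """
    p0 = secrets.randbelow(MOD)
    p1 = secrets.randbelow(MOD)
    p2 = (value - p0 - p1) % MOD
    parts = [p0, p1, p2]
    return [{i: parts[i], (i + 1) % 3: parts[(i + 1) % 3]} for i in range(3)]

def rss_reconstruct(view_a, view_b):
    """Any two servers' views jointly contain all three parts."""
    merged = {**view_a, **view_b}
    if len(merged) != 3:
        raise ValueError("need the views of two different servers")
    return sum(merged.values()) % MOD

views = rss_share(42)
print(rss_reconstruct(views[0], views[1]))  # recovers the shared value
```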
Waldo is a unique piece of work that we believe can help further security and privacy in technology globally. Its implementation protects both the access pattern while querying the data and the filter values used for querying. Looking ahead, we see a few areas that could be strengthened.
There are several considerations at scale in Waldo’s design, which we think lead to future work in this area. Distributing trust, by spreading the data and queries across multiple servers in different trust domains (e.g., cloud providers), provides much stronger privacy guarantees and protects against malicious servers, but it comes at a high cost: the additional servers, storage, and maintenance needed to obtain that distributed trust. For these reasons, the data placed in this distributed trust framework may need to be limited to sensitive user information, as opposed to simply migrating one’s entire database to it.
Additionally, performance, in terms of the computational requirements and the additional latency introduced by the cryptographic storage and distributed trust framework, also needs to be considered when applying this approach to real data. Waldo’s evaluation has shown significant improvements in latency with multiple predicates compared to prior work (e.g., MP-SPDZ and ORAM), and we would encourage further advancements in reducing these latencies so that the approach can scale to the kinds of queries and data sizes we see in our application of the work graph.
We encourage our readers to watch Emma’s talk and to read the authors’ published paper.