muse: multimodal searchable encryption for cloud applications · 2018-07-23 · muse: multimodal...

MuSE: Multimodal Searchable Encryption forCloud Applications

Bernardo Ferreira, Joao Leitao, Henrique DomingosDI, FCT, Universidade NOVA de Lisboa & NOVA LINCS

{bf, jc.leitao, hj}@fct.unl.pt

Abstract—In this paper we tackle the practical challengesof searching encrypted multimodal data (i.e., data containingmultiple media formats simultaneously), stored in public cloudservers, with reduced information leakage. To this end wepropose MuSE, a Multimodal Searchable Encryption schemethat, by combining only standard cryptographic primitives andsymmetric-key block ciphers, allows cloud-backed applicationsto dynamically store, update, and search multimodal datasetswith privacy and efficiency guarantees. As searching encrypteddata requires a tradeoff between privacy and efficiency, we alsopropose a variant of MuSE that resorts to partially homomorphicencryption to further reduce information leakage, but at thecost of additional computational overhead. Both schemes areformally proven secure and experimentally evaluated regardingperformance and search precision. Experiments with realisticdatasets show that our contributions achieve interesting levelsof efficiency and privacy, making MuSE particularly suitable forpractical application scenarios.

I. INTRODUCTION

Applications nowadays manage increasingly larger datacollections [36], including data that simultaneously containsdifferent media formats (also known as multimodal data1) [2].This dataset growth has led to the popularity of cloud servicesfor data and computation outsourcing [1]. In the referred cloudservices, applications outsource the storage and computationsof their data to third-party managed infrastructures, decreasingoperational costs with flexible charging models and leveragingfrom highly available geo-replicated servers. Moreover, asdatasets increase in size, so does the importance of supportingefficient search operations that can return relevant subsets ofdata in response to multimodal queries [35].

Despite the clear advantages cloud services bring, theyalso lead to new security and privacy challenges that mustbe addressed, as outsourcing data and computations alsomeans outsourcing control over them [13]. Recent incidentshave shown that privacy is not preserved by cloud providerswhen using their services [43]. Governmental agencies imposeincreasing pressure on cloud companies to disclose users’data and build insecure backdoors [14], [22]. Malicious orsimply careless cloud administrators have been responsiblefor critical data disclosures [12], [19]. Last but not least,internet hackers exploiting software vulnerabilities in cloudinfrastructures must also be considered, as they may gainaccess to users’ data even if only for a limited time [33].

1An example of multimodal applications are those for medical centermanagement, where patient records can contain both text (written by themedical doctor) and visual data (images obtained from medical equipment).

The conventional approach for addressing such privacyissues is to have applications encrypt all data in transit andat rest [6]. However this leads to expensive computationand communication overheads, as large sets of multimodaldata have to be downloaded (and possibly re-uploaded) whenperforming operations, especially in applications with frequentsearch operations. Performing computations over encrypteddata directly in the cloud servers is possible, with recentadvances in Fully Homomorphic Encryption [21] and Obliv-ious RAM [48]. However existing schemes still impose toomuch computation, storage, and/or communication overheadsfor enabling practical adoption [39].

Cryptographic protocols designed for supporting a specificoperation, such as search over encrypted data, can nonethe-less be used with good privacy-efficiency tradeoffs. In thiscase, these protocols are known as Searchable SymmetricEncryption (SSE) schemes [7], [11], [15]. SSE was originallydesigned for text data [46], with a few recent schemes alsostudying how to search encrypted visual data (images) [18],[34], [50]. In this paper we study a more broad topic: howto support applications dynamically storing and searchingencrypted multimodal data, i.e., data that combines differentmedia formats, including text, images, audio, and video2.

We call our proposal MuSE - Multimodal SearchableEncryption, and base it solely on standard cryptographicprimitives, including Pseudorandom Functions (PRFs) andSymmetric-Key Block Ciphers [28]. At its core MuSE relieson inverted index structures [35] and algorithms that representdifferent media formats through these structures. Multimodalqueries (i.e., queries also composed of different media formats)can then be answered by searching in each format’s index andcombining results through an appropriate merging function.Using these techniques, the research challenge that mustbe addressed is how to securely protect indexing structureswhile allowing their privacy-preserving and efficient operationduring both multimodal data updating and searching.

Since having both full security (i.e., leaking zero informa-tion) and practical efficiency has been shown to be impossiblefor SSE schemes [39], MuSE is required to reveal some mini-mal information patterns when performing operations (namelysearch, access, and frequency patterns [10]). This leakageis common in SSE schemes [7] and results from a tradeoffbetween security and efficiency that is required to achieve sub-

2A solution to this problem can also be fine-tuned to support only onemedia format at a time, offering the same functionality as existing schemes.

1

linear search performance. Nonetheless, further exploring thistradeoff we propose a variant of MuSE, called PHom-MuSE,that employs Partially Homomorphic Encryption [42] whenencrypting index entries. This second scheme exhibits furtherreduced leakage by protecting frequency patterns, but at thecost of additional computational overhead. We formally provethe security properties of both schemes, implement them, andexperimentally evaluate their performance and scalability witha real world multimodal dataset.

In summary, this work provides the following contributions:• We start by revising the state of the art on SSE, followed

by an empirical analysis of existing schemes and theirleakage. From this analysis we propose a new frameworkthat will aid both researchers and developers in thecharacterization of SSE schemes through their leakage(Section II);

• We propose MuSE, an efficient dynamic multimodalsearchable encryption scheme that allows cloud appli-cations to securely store, update, and search multi-modal datasets, by resorting only to standard and ef-ficient cryptographic primitives. Compared to previousSSE schemes, MuSE provides additional functionality(multimodal ranked searching) while displaying similarefficiency and security (Section IV);

• We propose PHom-MuSE, a variant of MuSE that furtherreduces its leakage, namely the leakage of frequencypatterns, at the cost of additional computational overheadby resorting to Partially Homomorphic Encryption (Sec-tion IV-A);

• We formally prove the security properties of our schemesand implement them. Our prototype implementationsfocus on text and image media formats, nonetheless weexplain how to extend them to other media. Using theseprototypes we experimentally evaluate the performanceand scalability of our schemes. Real world datasets andpublicly-available commercial clouds are used in theseexperiments (Section VI).

II. RELATED WORK

With the increasing popularity of cloud services and itsassociated security issues, the topic of searching encrypteddata has quickly become an important research area in recentyears. In this field, Searchable Symmetric Encryption (SSE)strives for a practical balance between efficiency and security.

First proposed by Song et al. [46], searching encryptedtext documents initially required search time linear in thedataset size. Curtmola et al. [15] used an inverted index toachieve sub-linear search performance, while also providingthe first security definitions for SSE. While these works wereconfined to static datasets, Kamara et al. [26], [27] proposedthe first dynamic SSE schemes, where documents could beadded, removed, or updated. Naveed et al. [40] designed adynamic SSE scheme that only required storage services fromthe cloud, instead of storage and computation as in previousschemes. Cash et al. [11] proposed the most efficient dynamicSSE scheme to date. Stefanov et al. [47] presented the first

forward-private dynamic SSE scheme, where updates revealno information even when combined with previously issuedquery tokens. Raphael Bost [7] revisited the topic, proposinga more efficient scheme that achieved the same security notion.

The SSE schemes referred so far focused on exact-matchsearching of text documents, where all documents containinga keyword are returned when the keyword is searched. Rankedsearching, where documents are returned in a sorted order ofrelevance to the query, was addressed by Wang et al. [49]with single keyword queries and Cao et al. [9] with multi-keyword (conjunctive) queries. However these works lackeda formal security analysis. Baldimtsi and Ohrimenko [3]proposed the first ranked SSE scheme with a formal securityanalysis, however their scheme required a cryptographic co-processor to be deployed in the cloud, under the client’scontrol. Additionally, so far these ranked schemes have beenlimited to static document collections, as they depend on pre-computed and immutable ranking scores that would need tobe refreshed and re-encrypted with each document addition,update, or removal.

Searching encrypted data has also been designed for othermedia formats, including visual data (images). Lu et al. [34]presented the first scheme for encrypted image search. Xiaet al. [50] presented a more recent approach to the problem.However these works lack a formal security treatment and donot support dynamic updates. Ferreira et al. [18] presented thefirst dynamic SSE scheme for images with a formal securityanalysis, however it leaked more information than previousschemes for text data: it leaked frequency and update patternsfor all stored data, including the initial dataset (these patternswill be fully detailed in the next section). The problem ofencrypted multimodal searching was addressed for the firsttime by Ferreira et al. [17]. Their work supported dynamicupdates and provided a formal security analysis, however italso leaked update and frequency patterns for all stored data.Hence in this work we present the first dynamic, efficient, andprovably-secure multimodal SSE schemes achieving similarsecurity and leakage guarantees as the state of the art literatureon SSE for text data.

A. SSE Leakage AnalysisAs an extension to the related work analysis performed so

far, we now present an empirical study of the leakage of SSEschemes for different media formats. This study was initiatedby Cash et al. [10], who focused on the leakage of exact-matchqueries on text data. In contrast, we also consider the leakagewhen supporting ranked queries on text data and queries onother media formats.

The efficiency guarantees provided by SSE schemes areonly possible by leaking some information patterns with theexecution of operations [39]. The most commonly leakedpatterns are search and access patterns [15], both leakedby search operations. Search patterns reveal the history of aquery, i.e., how many times it has been performed so far. Thisinformation is leaked by deterministic query tokens submit-ted at search time. Access patterns reveal which documents

2

Level Leakage Name Patterns Leaked e.g. SchemesL2 Fully-Revealed Frequency Search, Access, Frequency & Update [17], [18]L1 Fully-Revealed Occurrence Search, Access & Update [27], [30], [40]

L0=>L2 Query-Revealed Frequency Search, Access & Frequency MuSE, [9], [50]L0=>L1 Query-Revealed Occurrence Search & Access PHom-MuSE, [3], [7]

L0 Blind (Leakage) – [20]

TABLE I: Characterization of SSE schemes according to their leakage.

are returned by a query, which is leaked by deterministicidentifiers of the documents accessed. These patterns havebeen revealed by all SSE schemes to date [7], and havebeen shown to be necessary leakage for achieving practicalefficiency [39]. The first dynamic SSE schemes [27], [40] addi-tionally leaked update patterns with the update operation: theyresorted to deterministic update tokens, revealing if updatesshared contents with previous updates and queried documents.Nonetheless, update leakage has been solved in more recentdynamic schemes [7], [11], [26], [47], by making updates non-deterministic. If additionally updates leak no information atall, even when combined with previously issued queries, SSEschemes are said to be forward-private [7], [47].

The leakage described so far is characteristic of the mostsimple type of queries: exact-match searching. As we moveto more complex queries, including ranked search of textdocuments, images, and multimodal data in general, there isan additional data leakage that must be considered: frequencypatterns, i.e., how many times a keyword (or a similar conceptin other formats, e.g., a keypoint or a feature in images)appears in a document. This is a basic metric required forsupporting most forms of ranked search [35], and may beleaked by update or search operations. As such, it should alsobe modeled in the formal treatment of ranked SSE schemes.

Given the previous patterns, Table I provides a new frame-work that helps characterizing SSE schemes according to theirleakage. The framework is divided in different levels3, withthe top level being the least secure (i.e., leaks more data) andthe bottom the more secure (leaks less). L0 reveals nothingexcept basic information like the dataset size; it representsO-RAM based schemes. L0=>L1 represents typical exact-match SSE schemes on text data as a transitory level: atinitialization nothing is revealed ,as in L0, but with eachsearch some patterns are leaked (more precisely, search andaccess patterns), eventually leading to the equivalent fully-revealed L1 level for an adversary observing all operations(e.g., the cloud provider; nonetheless for an attacker observingonly data at rest (e.g., an internet hacker), these transitorylevels will always offer L0 security). This level also representsthe leakage of our PHom-MuSE scheme (which additionallysupports multimodal ranked searching). L0=>L2 representsranked SSE schemes (as is the case of our MuSE scheme)that additionally reveal frequency patterns with queries. L1represents exact-match schemes (on text data) that leak updatepatterns, fully revealing the occurrence of keywords even ifno queries are performed (we assume databases can startempty, with all data being added through updates, possibly

3Comparing to [10] we omit the leakage of document’s structure forsimplicity, since SSE schemes generally do not reveal it.

in batches). L2 represents schemes that also reveal frequencypatterns with updates and queries.

As in these schemes there is usually a tradeoff between se-curity and efficiency, Table I can also be used for categorizingSSE schemes according to their efficiency, where the top levelis the most efficient and the bottom is the least efficient.

III. TECHNICAL OVERVIEW

This section initiates the technical description of our pa-per. We start with some notations and concepts, followingwith an overview presentation of our system and adversarymodels. A multimodal object is a data object with multiple(different) media formats. A multimodal dataset is a collectionof multimodal objects. Multimodal features are distinctivecharacterizations of a data object in its different media formats:e.g., a document’s keywords compose its text features, whilean image’s visual keypoints compose its visual features.

Multimodal searching is the operation used to search amultimodal dataset with a query, where the query is itselfa multimodal object. Results of a multimodal search arereturned ordered by relevance (or similarity) to the query, andare usually obtained for each media format in separate andaggregated through a merging function [38].

Multimodal indexing consists in building dictionary-likestructures, one for each media format, that compactly describea dataset and where each entry represents a feature (keywordor similar concept in other formats) and its frequency in a dataobject. Indexing structures allow searching in time sub-linearwith the dataset size.

Multimodal training is an operation that is usually requiredin rich, highly dimensional media formats, including images,audio, and video. It consists in performing clustering op-erations (e.g., k-means [23]) on the dataset, particularly onthe referred formats, reducing the data’s dimensionality andallowing it to be more efficiently indexed. The result is acodebook structure [41] that assists indexing procedures.

A. System ModelFigure 1 represents the system model employed in this work.

We consider a client application and one cloud server, wherethe client is outsourcing the storage (and some computations)of his multimodal dataset to the server. We assume three mainoperations between the client and the server: Setup, Update,and Search. The cryptographic protocols subjacent to theseoperations will be formally specified and detailed in the nextsection, while for now we focus on describing a high leveloverview of the possible interactions between client and server.

The Setup operation initiates the system. The client startsby generating the system’s cryptographic keys. Then, for eachrich media format where training is required (images, audio,and video) the client trains an appropriate training dataset,

3

ClientApplication

CloudServer

1.Setup

1.3InitializeIndexingStructures

1.1Trainmediaformatsw/trainingdataset

1.2RequestSetup

3.Search

3.1Processdata-objectforsearching

2.2RequestBatchUpdate

3.3SearchinIndexingStructures

2.1Processdata-objectforupdating

2.Update

3.2RequestSearch

2.3UpdateIndexingStructures

Fig. 1: System model with the interactions between client and cloud server.

storing the resulting codebooks on his side. These will beused in Update and Search operations to allow an efficientindexing and retrieval of multimodal data, respectively. Finally,the client tells the server to initialize the system’s indexingstructures, one for each media format supported.

As implied by our minimalistic Setup, the client’s datasetis initially empty. This means that all data can be addeddynamically through the Update operation. When processing anew multimodal object for storage (or an existing one for up-date), the client starts by processing and extracting its relevantfeatures in each media format. In formats where training isrequired, these features are additionally clustered through therespective codebook. Then each feature is encrypted and sentto the server through a batch Update request. When the serverreceives an Update request, he stores the encrypted features inthe respective indexing structures.

The Search operation is performed in a similar fashion as theUpdate. Given a multimodal object as query, the client extractsits features in each media format, clustering them with therespective training codebooks if required. Then each featureis encrypted and the resulting query tokens are sent to theserver through a Search request. Query tokens are then usedby the server to accesses its indexing structures and calculatesearch results, which are returned to the client as a single setof object identifiers ordered by relevance to the query.

All communications between the client and the servermust be done through secure channels (e.g., TLS/SSL [28]),nonetheless we consider these details to be easily imple-mentable and orthogonal to the main scope of the paper.

B. Adversary ModelIn this work we consider as main adversary the cloud server,

i.e., the cloud provider company and any system administratorsworking for it that may have access to the client’s data andcomputations. As in previous works [7], we assume the cloudserver to operate in an honest but curious fashion: it is expectedto fulfill its contract agreements and not destroy or tamper withdata and computations, but may eavesdrop on their contents atwill without detection by the client. In more detail, the cloud

server keeps a log of all operations done and all informationleaked by them, and may resort to any other backgroundinformation available in order to learn the contents of boththe dataset stored and the performed queries.

A second important adversary that should also be consideredis the snapshot attacker, i.e., an adversary that does not havecontinuous access to the server but may gain that access fora limited time window and may perform a snapshot copy ofall stored data. This adversary represents the typical Internethacker. We informally argue that by addressing the cloudserver adversary, our approach is also implicitly addressingthis second adversary, since his capabilities are a subset of thefirst. Hence, we focus our security analysis on the honest-but-curious cloud adversary.

Data integrity, availability, and verifiability are also impor-tant issues in cloud-based applications. However these issuescan be easily addressed by combining existing techniques inthe literature [8], [29], [45], and hence we also find them tobe orthogonal to the main scope of the paper.

IV. DESIGNING A MULTIMODAL SSE SCHEME

In this section we detail the main contributions of ourpaper. We start by defining what is a Dynamic MultimodalSearchable Encryption scheme.

Definition 1 (Dynamic Multimodal Searchable Encryption).A Dynamic Multimodal Searchable Encryption schemeconsists of three protocols SETUP, UPDATE, AND SEARCHexecuted between a client and a server, such that:• SETUP(SETUPC(1λ,m, {{wji , f

ji }ni=0}mj=0)), SETUPS(m))

is the protocol used to initiate the scheme. The client takesas input the security parameter λ, the number of modalitiesm, and a training dataset {{wji , f

ji }ni=0}mj=0 (where wji is

a feature and f ji is its frequency in the object). It trainsthe modalities that require training and generates thecryptographic keys of the scheme, outputting these keys andthe codebooks resulting from the training step. The serveralso takes m as input and initializes the indexing structuresof each modality as empty, returning no outputs.• UPDATE(UPDATEC({{wji , id

ji , f

ji }ni=0}mj=0),UPDATES(

{{utji}ni=0}mj=0)) is a protocol between the client with input agroup of n features wji , object identifiers idji , and frequenciesf ji in m different modalities, and the server with input nupdate tokens utji in the same m modalities. The client buildseach utji as a function of wji , idji , and f ji , while the serveruses utji to update its indexing structures accordingly. Thisprotocol reflects a batch update of multiple features n indifferent modalities m, where each sub-update can representthe addition of a new feature w to a (also possibly new)object id, an update to the frequency f of an existing w inid, or the deletion of w from id (in which case f is zero).• SEARCH(SEARCHC({{wji , f

ji }ni=0}mj=0), SEARCHS(

{{stji}ni=0}mj=0)) is the protocol used to perform a multimodalsearch. The client takes as input a query object, representedas a collection of n features and their frequencies in mdifferent modalities ({{wji , f

ji }ni=0}mj=0). The server receives

4

the respective search tokens {{stji}ni=0}mj=0 and returns a setof object identifiers ordered by relevance to the query.

We now detail the operations of our MuSE scheme. Webegin by designing a scheme that only supports exact-matchsearching in text documents, expanding its usability by stepsuntil we achieve full multimodal ranked searching.

An Exact-Match Text Searching Scheme. Exact-matchsearching in text documents has been extensively researchedin the literature [7], [11], [27]. From the previous works, wefound the methodology of Cash et al. [11] for dynamic SSEto be one of the most efficient and promising for extension toricher queries. In this approach the client stores D, a dictionaryof counters where each unique keyword in the dataset (thekey) is mapped to a counter (the value) initiated at zero.Each counter increment represents a new object where thekeyword is being added to, and counter values are used duringupdates/searches to determine where to store/find keyword-document occurrences in the server’s index I . Index positionsin I (i.e., the counters) are encrypted with a Pseudo-RandomFunction (PRF) [28] and a key derived from the respectivekeyword, while index values (the documents’ ids) are en-crypted with a RCPA block-cipher encryption scheme (i.e.,a block cipher scheme resistant to Chosen-Plaintext Attacks[28]) and a second key derived from the keyword. Encryptedindex positions and values combined form an update token.

When searching with a query keyword the client derivesits two keys, as in the update protocol, and sends them tothe server. By applying a PRF (with the first key) to anincrementing counter value c, initiated at zero and stoppingwhen an empty index position is found, the server is able toefficiently find all relevant index positions. These are thendecrypted with the second key and returned to the client.

From Exact-Match to Ranked Searching. In ranked textsearching we need to store not only keyword-documentoccurrences, but also their frequencies. Frequency is the basisfor most ranked scoring functions, including the popularTF-IDF [35] (which we will be using in MuSE). Since bothinformations (occurrence and frequency) are closely related,we design our extended scheme to concatenate frequencieswith document ids and store their RCPA encryption as indexvalues. For calculating ranking scores, other repository widemetrics may still be required, including the dataset size (i.e.,number of objects) and keyword dataset size (i.e., number ofobjects containing the keyword), nonetheless these are usuallygeneral information that the server already has access to.

Dynamic Updates and Deletions. The scheme describedso far efficiently supports new additions of keywords todocuments. However supporting updates of existing keyword-document occurrences, including frequency updates and dele-tions, is still challenging. This is a side effect of the countersapproach, since when performing updates there is no way forthe client or server to know if the specified keyword already

exists in the object and where in the index is this informationstored. Searching for the keyword before updating the indexwould solve this problem, however it would also lead toadditional unnecessary leakage.

We foresee two solutions to this problem. The first consistsin incrementing keyword counters with all updates. Whensearching, only the most recent frequencies for each documentid (given by higher counter values) will be used. This solutionworks better for applications with few updates, as it willmake index size grow significantly. Since we expect dynamicSSE schemes to receive many updates, we devise a secondsolution that requires larger server storage at setup time, but noadditional storage will be required as updates are performed.

Our solution consists in dividing index storage in two datastructures. In the first index, IA, we map PRFs on keywordcounters to encrypted document ids. IA represents our previ-ous index and allows efficient searching through the countersapproach. In the second index, IU , we map PRFs on documentids to encrypted frequencies. IU allows efficient updates tokeyword frequencies and keyword deletions, without requiringknowledge of the respective index entries in IA.

In more detail our update protocol will now give the servertwo update tokens (per feature) as input, utA = (lA, dA) andutU = (lU , dU ), where the first represents our old tokens andis used on index IA, and the second represents our new tokens(mapping ids to frequencies) and is used on IU . The serverstarts by accessing IU with (lU , dU ). If there already existsan entry for it (meaning that this is an update or deletionof an existing frequency) then it stores the new encryptedfrequency (which will be 0 for deletions) and discards utA.Otherwise, besides storing dU in IU [lU ], it also stores dA inIA[lA]. Finally, the server outputs to the client a bit r, wherevalue 0 means that this operation was a new addition and value1 means it was an update to an existing frequency. The clientnow waits for this response before incrementing c, and onlyupdates it if r is 0.

The search protocol now also needs a second search tokenfor each feature, and accesses both indexing structures: firstthe server accesses IA with the old query token and then,after decrypting the document id fetched from IA, it accessesIU with it and decrypts the corresponding frequency.

Supporting Multimodality. So far we have an index approachthat efficiently supports the storage, update, and ranked search-ing of text data. If we can find similar index representationsin other media formats, extending our approach to multimodalsearching will be straightforward to achieve.

Image features of any kind, (e.g., from facial recognition tokeypoint detection [16]) can be clustered and represented as vi-sual words [41], allowing their efficient indexing in dictionarystructures as performed for text features. Similar approachescan be used for indexing audio [32] and video features [44]. Inthese approaches, a training phase is usually required beforeindexing is possible, which takes a training dataset as input andbuilds a codebook cbj for each modality that requires training.Hence we change the Setup operation to have the client

5

Setup(1λ,m, {{wji , fji }ni=0}mj=0)

Client:1: for j = 0..m do2: KA

j ,KUj

$←− {0, 1}λ3: Dj ← Init()cb← Train({{wji , f

ji }ni=0}mj=0)

4: Send m to the server.Server:

5: for j = 0..m do6: IAj , I

Uj ← Init()

Update({{wji , idji , f

ji }ni=0}mj=0)

Client:1: {{cwji , id

ji , cf

ji }ni=0}mj=0 ← Cluster(cb, {{wji , id

ji , f

ji }ni=0}mj=0)

2: ut← []3: for j = 0..m do4: utj ← []5: for i = 0..n do6: K1A ← F (KA

j , cwji ||1); K2A ← F (KA

j , cwji ||2)

7: K1U ← F (KUj , cw

ji ||1); K2U ← F (KU

j , cwji ||2)

8: c← Dj [cwji ]

9: if c = ⊥ then c← 010: lA ← F (K1A, c); dA ← Enc(K2A, idji )11: lU ← F (K1U , idji ); d

U ← Enc(K2U , cf ji )12: utj ← {lA, dA, lU , dU} : utj13: ut← utj : ut

14: Send ut to the server.Server:

15: R← []16: for all utj ∈ ut do17: Rj ← []18: for all {lA, dA, lU , dU} ∈ utj do19: if IUj [lU ] = ⊥ then20: IAj [l

A]← dA; r ← 021: else22: r ← 123: IUj [l

U ]← dU ; Rj ← r : Rj

24: R← Rj : R

25: Send (R) to the client.Client:

26: for all Rj ∈ R do27: for all r ∈ Rj do28: if r = 0 then Dj [w]← Dj [w] + 1

Search({{wji , fji }ni=0}mj=0)

Client:1: {{cwji , cf

ji }ni=0}mj=0 ← Cluster(cb, {{wji , f

ji }ni=0}mj=0)

2: st← []3: for j = 0..m do4: stj ← []5: for i = 0..n do6: K1A ← F (KA

j , cwji ||1)

7: K2A ← F (KAj , cw

ji ||2)


ji ||1)


ji ||2)

10: stj ← {cf ji ,K1A,K2A,K1U ,K2U} : stj11: st← stj : st

12: Send st,N to the server . N is the current dataset sizeServer:

13: R← []14: for all stj ∈ st do15: Rj ← Init()16: for all {fq,K1A,K2A,K1U ,K2U} ∈ stj do17: L← []18: c← 019: lA ← F (K1A, c)20: dA ← IAj [l

A]21: while dA 6= ⊥ do22: id← Dec(K2A, dA)23: lU ← F (K1U , id)24: dU ← IUj [l

U ]25: f ← Dec(K2U , dU )26: L← {id, f} : L27: c← c+ 128: lA ← F (K1A, c)29: dA ← IA[lA]

30: idf ← log( N|L| )

31: for all {id, f} ∈ L do32: tf-idf ← f × idf × fq33: if Rj [id] = ⊥ then Rj [id]← 0

34: Rj [id]← Rj [id] + tf-idf

35: Rj ← Sort(Rj)36: R← Rj : R

37: S ← ISR(R)38: Send S to the client

Fig. 2: The MuSE scheme, based on PRF F and RCPA scheme (Enc,Dec).

perform this training and storing cb = {cbj}mj=0. Moreover,in the Update and Search operations the client now starts byusing cb to cluster the inputed features (in media requiring thisstep) before indexing/searching them.

Finally, multimodal searching (i.e., search in multipleformats simultaneously) can be achieved by searching in eachformat in separate and merging results with a multimodalmerging function, such as logarithmic Inverse Square Rank(ISR) rank-fusion [37]. Figure 2 presents MuSE, our finalefficient dynamic multimodal scheme.

Security and Leakage Analysis. We now sketch a proofof security for MuSE. A full proof of security can be

found in the Appendix Section of this paper. Our securityanalysis follows the real/ideal simulation paradigm that isstandard in secure multi-party computations [28]. We defineL=(LStp,LUpd,LSrch) as a leakage function that captures allinformation MuSE is ideally allowed to leak. Intuitively Loutputs the following:

• The setup protocol leaks m, the number of distinctmodalities supported (i.e., LStp = m).

• Updates leak the type of each of its sub-updates (new ad-dition or frequency update, with deletions indistinguish-able from other updates). If the added/updated featureshave already been searched for, it also leaks the corre-sponding object identifiers and frequencies (i.e., LUpd =

6

{opi ∈ {add, upd}, {idi, fi} : idi ∈ LSrch}ni=0).• Search protocols leak the size N of the dataset and,

for all features w contained in a query object, theyalso leak search, access, and frequency patterns. Searchpatterns, i.e., if queries are being repeated, are due to thedeterministic nature of the search tokens used. Accesspatterns correspond to the set of object ids that containeach feature queried for. Frequency patterns means thataccess patterns not only include occurrences, but alsofrequencies (i.e., LSrch = N, IDw, {idi, fi}|R|i=0).

These leakage components, particularly search and accesspatterns, are unavoidable in efficient SSE and considered min-imal leakage [39]. Frequency patterns are additional leakagecharacteristic of ranked SSE schemes [18], nonetheless we willaddress them next in our PHom-MuSE scheme at the costof additional cryptographic overhead. Forward privacy (i.e.,preventing updates from revealing if they match contents withprevious queries) can be orthogonally addressed, as in [7], byintroducing a public-key scheme in the encryption of keywordcounters (we leave this as future work).

Non-adaptive security [15] follows if we can prove thatMuSE leaks nothing beyond what is specified in L. This proofrelies on F being a secure PRF and (Enc,Dec) being a RCPA-secure encryption scheme. Additionally if F is modeled as arandom oracle [5], adaptive security can also be proven andwe can state that:

Theorem 1. MuSE is correct and L-secure against adaptiveattacks.

The proof of this theorem can be found in the AppendixSection.

A. Multimodal SSE without Frequency LeakageAn issue with MuSE, that was not present in previous

exact-match SSE schemes, is the leakage of frequency patternswith search operations. To solve this problem we propose avariant of MuSE that addresses this leakage, at the cost ofincreased cryptographic overhead. Our proposal, called PHom-MuSE, is based on Partially Homomorphic cryptography, moreconcretely on an additively homomorphic scheme such as thePaillier cryptosystem [42].

We design PHom-MuSE through simple modifications toMuSE. In the Setup operation the client now additionallygenerates a private/public key pair for the Paillier scheme.Then, in update operations, we replace the RCPA encryptionof keyword frequencies (dU ← Enc(K2U , f), line 11 inthe update protocol, Figure 2) with their public-key Paillierencryption. Only the client, who has the private key, candecrypt these values.

Given the use of homomorphic encryption, when respondingto search operations the server can calculate search scoresthrough encrypted frequency additions (and multiplicationswith public parameters, which can be seen as a series ofhomomorphic additions). The result is the protection of bothfrequency values and final search scores. In more detail, inthe TF-IDF function, frequencies f will be encrypted and

multiplied by public parameters idf and fq (line 32 in theSearch protocol) and the resulting scores for the same objectid will be homomorphically added (line 34). However, sinceorder is not preserved by homomorphic encryption (line 35),the server can not sort search results and perform multimodalmerging (as was done in MuSE). It must now be the clientto perform these operations. The client performs this afterreceiving encrypted results from the server and decryptingthem with the Paillier private key.

We now define LPHom , the leakage that PHom-MuSE isideally allowed to reveal, as an iteration of our previousleakage function L for MuSE. In more detail, the only dif-ference between LPHom and L is that frequency patterns arenot revealed when performing search operations, nor whenadding/updating a feature that has already been searched.Furthermore, we can prove that:

Theorem 2. PHom-MuSE is correct and LPHom -secureagainst adaptive attacks.

The proof for this theorem can be sketched by extendingthe proof of Theorem 1. The Paillier cryptosystem is used asa black-box component, and PHom-MuSE involves no addi-tional security protocols. Hence, a simulator S can simulate allthe interactions in the protocol using the information it obtainsfrom LPHom . Correctness and security against adaptive attacksfollows in the random oracle model and assuming Paillier is acorrect and RCPA Additively-Homomorphic scheme. Detailsare straightforward and thus omitted.

V. IMPLEMENTATION

We implemented prototype versions of our MuSE andPHom-MuSE schemes. These prototypes will be used forexperimental evaluation in the next section, while for nowwe focus on describing their implementation. We focused ourprototypes on supporting multimodal data with text and imagemedia formats. All code was developed in C++, with a littleover 2000 lines of original code. Cryptographic computationswere implemented using the OpenSSL 1.0.2 library4. PRFswere implemented with an HMAC function, using SHA1 asthe underlying cryptographic hash function. The (Enc,Dec)RCPA encryption scheme was implemented with AES in CTRmode, using a 256-bit key. For the Paillier scheme, we usedthe the ACSC project5, with a 1024-bit key.

Algorithms for processing and indexing text data wereimplemented by us. Text feature extraction was performed firstby keyword stemming (Porter Stemming algorithm) and stop-words removal [35]. Indexing was done through the Single-Pass in Memory Indexing (SPIMI) algorithm and as indexingstructures we used the inverted list index approach [35]. Forprocessing and indexing images we used the OpenCV 2.4.10library6. For feature extraction, we used the SURF keypointdetection [4] and Dense Pyramid descriptor extraction [31]algorithms. As a rich media format, images need to be trainedbefore they can be efficiently indexed. We used hierarchical

4https://www.openssl.org/5http://acsc.cs.utexas.edu/libpaillier/6http://opencv.org/

7

0

500

1000

1500

2000

2500

3000

3500

4000

0 5000 10000 15000 20000 25000

Time(s)

DatasetSize(NrofObjects)

Encryption Networking Indexing Total

Fig. 3: Performance of MuSE in the Update operation.

k-means and the Bag of Visual Words model for this [41]. Forthis model we trained a codebook tree with height four andleaf width ten, resulting in ten thousand clusters.

Ranking of search results in each media format was done us-ing the TF-IDF function, as described in Figure 2. Ranked re-sults were then merged into the final multimodal search resultsthrough rank fusion, more concretely the logarithmic InverseSquare Rank (ISR) rank-fusion algorithm [37]. Finally, weremark that our schemes have a high flexibility of deploymentand configuration, meaning that the implementation describedis just one possible combination of different cryptographic andinformation retrieval techniques.

VI. EXPERIMENTAL EVALUATION

In this section we perform an experimental evaluation of theperformance and search precision of our MuSE and PHom-MuSE schemes, comparing them with the state of the art onencrypted multimodal search [17]. We conducted experimentsas follows: client implementations were executed in a Mac-book Pro with Mac OS X 10.13.1, 16GB of RAM, 500GBSSD disk, and 2.3Ghz quad-core Core i7 CPU; server imple-mentations were deployed in the Amazon AWS cloud, usingan EC2 m5.large instance. Communications were performedon a 5MB/s connection, with 49.932ms round-trip time. Weused the MIR-Flickr dataset [24] as a multimodal dataset with25000 objects composed of both image (users’ photos) andtext (photo tags) media formats.

A. Update PerformanceIn our first experiment we measured the performance of

MuSE and PHom-MuSE when executing the Update opera-tion. To this end, we performed batch updates with increasingnumbers of objects, ranging from 1 to 25000 objects, andmeasured the performance cost of each sub-operation involved:Encryption (i.e., cryptographic computations), Networking(communication between the client and server), and Indexing(feature extraction, data-structure accesses, and other indexingrelated computations). For convenience, total results for eachscheme are also exhibited. Figure 3 shows the results forMuSE and Figure 4 for PHom-MuSE. Table II further com-pares the two schemes with MIE [17], a similar approach fromthe literature. Results represent an average of five executions.

Starting with MuSE, Figure 3 shows that its Update op-eration can be very efficient. An update for a single objectexhibits a total performance cost of around 0.18 seconds, whilea batch update for 25000 objects can be performed in under

0

20000

40000

60000

80000

100000

120000

140000

160000

0 5000 10000 15000 20000 25000

Time(s)

DatasetSize(NrofObjects)


Fig. 4: Performance of PHom-MuSE in the Update operation.Scheme Encryption Networking Indexing Total

MIE [17] 16.38 292.18 349.71 658.27MuSE 80.86 295.36 347.83 724.05

Phom-MuSE 25596.22 2076.69 344.66 28017.57

TABLE II: Update performance comparison between MuSE, PHom-MuSE,and MIE [17], for a batch update of 5000 objects. All values are in seconds.

3700 seconds (a linear increase). Analysing the sub-operationsinvolved, we can observe that MuSE’s cryptographic compu-tations (Encryption in Figure 3) are very efficient. Moreover,Indexing computations are the biggest bottleneck in MuSE’supdate, followed by Networking. Networking overheads canfurther be improved with a faster network connection. Re-garding indexing computations, these include typical featureextraction and data-structure accesses that are also required inplaintext multimodal retrieval systems with no security guar-antees. This means that MuSE’s security properties actuallyadd very little overhead to overall system performance.

Analysing Figure 4, however, we can conclude that PHom-MuSE has a much higher cryptographic overhead. For a batchupdate of 25000 objects, PHom-MuSE has a cryptographicperformance cost of around 128000 seconds (35.5 hours), anincrease of around 277 times in comparison with MuSE’scryptographic cost of 461 seconds. This is due to the use ofpartially homomorphic encryption to prevent the leakage offrequency patterns, namely the Paillier scheme. Furthermore,networking in PHom-MuSE also exhibits an increased per-formance cost in comparison with MuSE, due to the higherciphertext expansion of the Paillier scheme. For an update of250000 objects MuSE has a networking cost of around 1500seconds, while PHom-MuSE exhibits a cost of 10383 seconds.

Table II also shows us that MuSE exhibits similar per-formance as the state of the art on encrypted multimodalretrieval [17] (MIE in Table II). The major difference betweenMuSE and the competing solution MIE lies in the encryptionoverhead, which can be explained by the better securityguarantees offered by MuSE (see Table I in Section II-A,where MuSE has leakage level L0=>L2 and MIE has L2).

B. Search PerformanceIn our second experiment we measured the performance of

MuSE and PHom-MuSE when executing Search operations,while also comparing with the competing alternative MIE[17]. To achieve this goal, we performed queries with asample query object (chosen at random from the dataset) andmeasured the performance of both schemes as we scaled thedataset size, from 1 to 25000 objects. Again we show total

8

0

5

10

15

20

25

30

0 5000 10000 15000 20000 25000

Time(s)

DatabaseSize(NrofObjects)


Fig. 5: Performance of MuSE in the Search operation.

0

20

40

60

80

100

120

140

160

0 5000 10000 15000 20000 25000

Time(s)

DatabaseSize(NrofObjects)


Fig. 6: Performance of PHom-MuSE in the Search operation.

performance costs for each scheme, as well as for each sub-operation (Encryption, Networking, and Indexing). Figure 5shows the results for MuSE, Figure 6 for PHom-MuSE, andTable III compares the two with MIE. Results represent anaverage of 50 independent executions.

Comparing total results from the two figures, we canobserve that MuSE is once again more efficient than PHom-MuSE (notice the difference in y-axis scale). For a dataset sizeof 25000 objects, MuSE exhibits a total search performancecost of around 28 seconds while PHom-MuSE requires around147 seconds to complete the same operation. This is mostlydue to the higher Encryption and Networking overheads ofPHom-MuSE, and is observable at all dataset sizes.

Analysing sub-operations in detail, we can observe that inboth schemes Encryption overhead has the highest impacton total performance cost. Moreover, this cost increases aswe scale the dataset size. This is a natural observation, asincreasing the dataset size means that increasingly more entrieshave to be accessed to resolve the same query. Indexingcomputations, in contrast, exhibit constant performance as weincrease the dataset size in both schemes, since most overheadhere comes from processing the query. The same effect can beobserved for Networking performance in MuSE, however inPHom-MuSE this performance cost increases with the datasetsize. This can be explained by the fact that while in MuSEthe server has access to the decrypted relevance scores of thequery and is able to sort them and return only the top k (inour experiments, we set k to 20), in PHom-MuSE the serveronly sees homomorphically encrypted scores and has to returnthem to the client for decryption and sorting.

Comparing with the competing alternative MIE [17] (Ta-ble III), we can observe that MuSE (and Phom-MuSE as well)has a higher total performance cost. As in the update operation,

Scheme Encryption Networking Indexing TotalMIE [17] 0.008 0.072 0.6 0.68

MuSE 4.64 0.001 0.55 5.191Phom-MuSE 16.38 11.63 1.56 29.57

TABLE III: Search performance of MuSE, PHom-MuSE, and MIE [17]), fora query on a repository with 5000 objects. All values are in seconds.

Plaintext MIE MuSE PHom-MuSEmAP (%) 57.938 57.562 57.965 57.881

TABLE IV: Mean Average Precision (mAP) for the Holidays dataset.

this is mostly due to Encryption overheads, and is the tradeofffor achieving better security guarantees. Nonetheless, sincemost of this overhead comes from server side computationsand our prototype implementation is single-threaded, we be-lieve it can be further reduced through parallelization ofcryptographic tasks.

C. Search PrecisionThe final experiment we conducted assessed the search

precision of our schemes, comparing it with a plaintext sys-tem without any security guarantees and with the competingalternative MIE from the literature [17]. Since the MIR-Flickrdataset, although a good choice for performance evaluation,did not contain a group of queries with relevance set thatwould allow us to assess precision, we used the Inria Holidaysdataset [25] for this experiment. The Holidays dataset containsan online evaluation package, consisting of 500 pre-chosenqueries and their expected results, allowing a transparent andindependent evaluation of precision results. This is an imageonly dataset, showing that our schemes do not affect queryprecision for this media format. Nonetheless, similar resultsare expected for other formats and multimodal searching.

Table IV shows the results obtained, with an average of 50independent executions. Results for the four approaches arevery similar, which can be explained by the fact that both ourschemes, as well as the competing alternative MIE, preservethe search precision of the indexing and searching algorithmsused. Furthermore, these results fundamentally represent theprecision of the indexing and searching algorithms used inour prototype implementations, meaning that they can possiblybe further improved by exploring other algorithms from thestate of the art in information retrieval, without impacting thesecurity guarantees of our schemes.

VII. CONCLUSION

In this paper we addressed the problem of multimodalsearchable encryption, allowing client applications to store,update, and search their multimodal data in remote cloudservers with privacy guarantees. We started by providing a newframework, based on an empirical analysis of the literatureon searchable encryption and its common leakage, that re-searchers and developers can use to characterize their schemesand better understand their security properties. Then we for-mally defined dynamic multimodal searchable encryption anddesigned two schemes supporting its functionality: an efficientscheme, called MuSE, that exhibits similar performance asprevious exact-match schemes for text data, but that leaksa new type of patterns which we call frequency patterns;

9

and a less efficient scheme (although still practical) basedon partially homomorphic encryption, called PHom-MuSE,that prevents the leakage of frequency patterns and providesthe same security properties as previous text exact-matchschemes, although greatly improving usability. We formallyevaluated the security of our schemes and implemented them.Using our prototype implementations we conducted a thoroughexperimental evaluation of performance and search precision.Results showed that both MuSE and PHom-MuSE exhibitpractical performance for real world deployments, makingdifferent tradeoffs between security and performance, andpreserving the search precision of the multimodal retrievalalgorithms used.

ACKNOWLEDGMENTS

This work was supported by FCT/MCTES through projectNOVA LINCS (UID/CEC/04516/2013) and the EU, throughproject LightKone (grant agreement no 732505).

REFERENCES

[1] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski,G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia. A view ofcloud computing. Communications of the ACM, 53(4):50–58, 2010.

[2] P. K. Atrey, M. A. Hossain, A. El Saddik, and M. S. Kankanhalli.Multimodal fusion for multimedia analysis: A survey. MultimediaSystems, 16(6):345–379, 2010.

[3] F. Baldimtsi and O. Ohrimenko. Sorting and Searching Behind theCurtain. In Financial Cryptography - FC’15, 2015.

[4] H. Bay, T. Tuytelaars, and L. V. Gool. SURF: Speeded Up RobustFeatures. In ECCV’06, pages 404–417. Springer, 2006.

[5] M. Bellare and P. Rogaway. Random Oracles are Practical : A Paradigmfor Designing Efficient Protocols. In CCS’93, pages 1–20. ACM, 1993.

[6] A. Bessani, M. Correia, B. Quaresma, F. Andre, and P. Sousa. DepSKY:Dependable and Secure Storage in a Cloud-of-Clouds. ACM ToS, 9(4),2013.

[7] R. Bost. Sophos - Forward Secure Searchable Encryption. In CCS’16.ACM, 2016.

[8] M. Brandenburger, C. Cachin, and N. Knezevic. Don’t Trust theCloud, Verify: Integrity and Consistency for Cloud Object Stores. InSYSTOR’15. ACM, 2015.

[9] N. Cao, C. Wang, M. Li, K. Ren, and W. Lou. Privacy-PreservingMulti-Keyword Ranked Search over Encrypted Cloud Data. IEEE TPDS,25(1):222–233, 2014.

[10] D. Cash, P. Grubbs, J. Perry, and T. Ristenpart. Leakage-Abuse AttacksAgainst Searchable Encryption. In CCS’15, pages 668–679. ACM, 2015.

[11] D. Cash, J. Jaeger, S. Jarecki, C. Jutla, H. Krawczyk, M. Rosu, andM. Steiner. Dynamic searchable encryption in very-large databases:Data structures and implementation. In NDSS’14, 2014.

[12] A. Chen. GCreep: Google Engineer Stalked Teens, Spied on Chats.Gawker. http://gawker.com/5637234, 2010.

[13] R. Chow, P. Golle, M. Jakobsson, E. Shi, J. Staddon, R. Masuoka,and J. Molina. Controlling data in the cloud: outsourcing computationwithout outsourcing control. In CCSW’09, 2009.

[14] T. Cook. A Message to Our Customers. Apple. https://www.apple.com/customer-letter, 2016.

[15] R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky. Searchable Sym-metric Encryption: Improved Definitions and Efficient Constructions. InCCS’06, 2006.

[16] R. Datta, D. Joshi, J. Li, and J. Z. Wang. Image retrieval. ACMComputing Surveys (CSUR), 40(2):1–60, 2008.

[17] B. Ferreira, J. Leitao, and H. Domingos. Multimodal Indexable Encryp-tion for Mobile Cloud-based Applications. In DSN’17. IEEE, 2017.

[18] B. Ferreira, J. Rodrigues, J. Leitao, and H. Domingos. Privacy-Preserving Content-Based Image Retrieval in the Cloud. In SRDS’15.IEEE, 2015.

[19] T. Frieden. VA will pay $20 million to settle lawsuit over stolen laptop’sdata. CNN. http://tinyurl.com/lg4os9m, 2009.

[20] S. Garg, P. Mohassel, and C. Papamanthou. TWORAM: efficientoblivious RAM in two rounds with applications to searchable encryption.In CRYPTO’16, pages 563–592. Springer, 2016.

[21] C. Gentry, S. Halevi, and N. P. Smart. Homomorphic evaluation of theAES circuit. In CRYPTO’12, pages 850–867. Springer, 2012.

[22] G. Greenwald and E. MacAskill. NSA Prism program taps in to userdata of Apple, Google and others. The Guardian, 2013.

[23] J. A. Hartigan. Clustering algorithms. Wiley, 1975.[24] M. J. Huiskes and M. S. Lew. The MIR Flickr Retrieval Evaluation. In

MIR’08, New York, NY, USA, 2008. ACM.[25] H. Jegou, M. Douze, and C. Schmid. Hamming embedding and weak

geometric consistency for large scale image search. In ECCV’08.Springer, 2008.

[26] S. Kamara and C. Papamanthou. Parallel and dynamic searchablesymmetric encryption. In Financial Cryptography - FC’13, 2013.

[27] S. Kamara, C. Papamanthou, and T. Roeder. Dynamic searchablesymmetric encryption. In CCS’12, pages 965–976. ACM, 2012.

[28] J. Katz and Y. Lindell. Introduction to Modern Cryptography. CRCPRESS, 2007.

[29] B. H. Kim and D. Lie. Caelus: Verifying the Consistency of CloudServices with Battery-Powered Devices. In Proceedings of the 36thIEEE Symposium on Security and Privacy - S&P’15. IEEE, 2015.

[30] B. Lau, S. P. Chung, C. Song, Y. Jang, W. Lee, and A. Boldyreva.Mimesis Aegis: A Mimicry Privacy Shield-A System’s Approach to DataPrivacy on Public Cloud. In USENIX Security, pages 33–48, 2014.

[31] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatialpyramid matching for recognizing natural scene categories. In CVPR’06,volume 2, pages 2169–2178. IEEE, 2006.

[32] M. Levy and M. Sandler. Music information retrieval using social tagsand audio. IEEE Transactions on Multimedia, 11(3):383–395, 2009.

[33] D. Lewis. iCloud Data Breach: Hacking And Celebrity Photos. Forbes.https://tinyurl.com/nohznmr, 2014.

[34] W. Lu, A. Swaminathan, A. L. Varna, and M. Wu. Enabling Searchover Encrypted Multimedia Databases. IS&T/SPIE Electronic Imaging,7254, 2009.

[35] C. D. Manning, P. Raghavan, and H. Schutze. An Introduction toInformation Retrieval, volume 1. Cambridge University Press, 2009.

[36] M. Meeker. Internet Trends 2016. In Code Conference, 2016.[37] A. Mourao, F. Martins, and J. Magalhaes. NovaSearch at TREC

2013 Federated Web Search Track : Experiments with rank fusion. InTREC’13, 2013.

[38] A. Mourao, F. Martins, and J. Magalhaes. Multimodal medical infor-mation retrieval with unsupervised rank fusion. Computerized MedicalImaging and Graphics, May 2014.

[39] M. Naveed. The Fallacy of Composition of Oblivious RAM andSearchable Encryption. Technical report, Cryptology ePrint Archive,Report 2015/668, 2015.

[40] M. Naveed, M. Prabhakaran, and C. A. Gunter. Dynamic SearchableEncryption via Blind Storage. In S&P’14. IEEE, 2014.

[41] D. Nister, H. Stewenius, D. Nister, and H. Stewenius. Scalablerecognition with a vocabulary tree. In CVPR’06. IEEE, 2006.

[42] P. Paillier. Public-key cryptosystems based on composite degree resid-uosity classes. In EUROCRYPT’99, pages 223–238. IACR, 1999.

[43] D. Rushe. Google: don’t expect privacy when sending to Gmail. TheGuardian. http://tinyurl.com/kjga34x, 2013.

[44] P. Scovanner, S. Ali, and M. Shah. A 3-dimensional sift descriptor andits application to action recognition. In MM’07. ACM, 2007.

[45] A. Shraer, C. Cachin, A. Cidon, I. Keidar, Y. Michalevsky, andD. Shaket. Venus: Verification for untrusted cloud storage. In CCSW’10.ACM, 2010.

[46] D. X. Song, D. Wagner, and A. Perrig. Practical techniques for searcheson encrypted data. In S&P’00, pages 44–55. IEEE, 2000.

[47] E. Stefanov, C. Papamanthou, and E. Shi. Practical Dynamic SearchableEncryption with Small Leakage. In NDSS’14, 2014.

[48] E. Stefanov, M. Van Dijk, E. Shi, and et al. Path oram: An extremelysimple oblivious ram protocol. In CCS’13, pages 299–310. ACM, 2013.

[49] C. Wang, N. Cao, K. Ren, and W. Lou. Enabling Secure and EfficientRanked Keyword Search over Outsourced Cloud Data. IEEE TPDS,23(8):1467–1479, aug 2012.

[50] Z. Xia, X. Wang, L. Zhang, Z. Qin, X. Sun, and K. Ren. A privacy-preserving and copy-deterrence content-based image retrieval scheme incloud computing. IEEE TIFS, 11(11):2594–2608, 2016.

10

APPENDIX - PROOF OF MUSE SECURITY

In this appendix we formally prove Theorem 1. Formally,L=(LStp, LUpd, LSrch) is a stateful party in an ideal securitygame, defined as follows:

Definition 2. Let∏

=(Setup,Update,Search) be a dynamicmultimodal SSE scheme and L a leakage function. Foralgorithms A and S, define the following games:

Real∏A (λ): The game runs K ← Setup() and gives A(1λ) a

timestamp t. Then A repeatedly invokes Update and Searchprotocols, picking client inputs in. The game responds byrunning Search or Update protocols with client input (K,in)and server input EDB (the encrypted dataset), giving thetranscript to A (the server is deterministic so this constitutesits entire view). Eventually A returns a bit used as the game’soutput.

Ideal∏A,S(λ): The game runs S(L()) and gives A(1λ) a

timestamp t. Then A repeatedly invokes Update and Searchprotocols, picking client inputs in. The game responds bygiving the output of L(in) to S, which outputs a simulatedtranscript that is given to A. Eventually A returns a bit usedby the game.∏

is L-secure against adaptive attacks if for all adversariesA there is a simulator S such that:

Pr [Real∏A (λ) = 1] - Pr [Ideal

∏A,S(λ) = 1] ≤ negl(λ)

Amongst its state L keeps: a set ID initialized to containall object identifiers in the dataset; and a list Q describingall operations issued so far, where an entry takes the form(i,op,. . . ), meaning an operation counter, an operation type,and then one or more inputs to the operation.

We define sp(w,Q), the search pattern of feature w withrespect to Q, to be the indices of operations that searched forw: sp(w,Q) = {j : (j, srch, w) ∈ Q}.

For object id, feature w, and frequency f, the add patternof id, w, f with respect to Q corresponds to the indices thatadded/updated (w,f ) to id:ap(w,id,f,Q) = {j : (j, add, w, id, f ) ∈ Q, w ∈Wid} ∪ {j : (j,updt, w, id, f )∈Q, w∈Wid}.

Finally, the add pattern of w with respect to Q and ID isthe set of all ids to which w was ever added, along with therespective frequencies and indices showing when it was added:AP(w,Q,ID) = {(id, f, ap(w,id,f,Q)) : id∈ID, ap(w,id,f,Q) 6= ∅}.

Intuitively, sp captures what we previously called searchleakage, while AP captures what we called access and fre-quency leakage. Given a set of setup, update, and searchoperations, L produces outputs as follows:• On initial setup, L initiates its state with i← 0, empty list

Q and empty set ID, outputting the number of modalitiesm.

• For a search operation on w, L appends (i,srch,w) to Qand increments i, outputting sp(w,Q), AP(w,Q,ID), andthe current size of the dataset N.

• For an addition/update operation (w,id,f ), L appends(i,add/updt,w,id,f ) to Q, adds id to ID, and incrementsi. It outputs sp(w,Q) and, if this is non-empty, it alsooutputs id and f.

We are now ready to prove Theorem 1.

Proof. We begin by proving correctness and security againstnon-adaptive attacks. Correctness follows as collisions be-tween the outputs of PRF F will only happen with negligibleprobability. Additionally when we model F as a random oracleH (for proving adaptive security), simulator S can programH so that its outputs are truly random and hence withoutcollisions.

Proving non-adaptive security implies showing that S, givenonly the leakage output of L, can produce the view of theserver and the two are indistinguishable except for a negligibleprobability in λ. Setup operations are easy to simulate. Sincewhen Setup is performed L only outputs a timestamp ofexecution and the number of modalities m, S can be triviallyshown to have the same view as the server.

To simulate search operations, S iterates over the log ofqueries choosing keys K1Ai ,K2Ai ,K1Ui ,K2Ui

$←− {0, 1}λ forthe i-th query. Then, for each id ∈ DB(wi), S computes lA,dA, lU , and dU as specified in the real experiment (but usingthe keys it chose instead), adding each group of labels to alist L. Additionally it creates a dataset γ with N entries, fillingit with simulated objects picked uniformly at random (if γalready existed, S adjusts its size with the new N).

To simulate update operations, S iterates over the logof adds/updates and decides for each group of labels(lA,dA,lU ,dU ) sent if it is supposed to be random (andmeaningless) or if the pair should be computed with one of thekeys used for search operations. It does this by using both theadd pattern leakage AP from the search queries and the leakagefrom update operations, which includes id and f if the keywordwas previously searched. Finally S adds the computed labels toL and, after processing all operations, it outputs the simulateddataset EDB = Create(γ,L).

A simple hybrid argument shows that the simulator’s outputis indistinguishable from the real server view. The first hybridshows that selecting each K1Ai ,K2Ai ,K1Ui ,K2Ui at randomis indistinguishable from deriving them from KA,KU , bythe PRF security of F. The next hybrid shows that thelabels lA, lU and ciphertexts dA, dU for un-queried featuresare pseudorandom, by the RCPA security of (Enc,Dec). Thisproves non-adaptive security.

Finally, security against adaptive attacks can be proven byhaving S program a random oracle H to model the behavior ofPRF F, outputting truly random labels in response to adaptivequeries. The only defects in this new simulation occur whenan adversary manages to query the random oracle with a keybefore it is revealed, which can be shown to happen withnegligible probability in λ.

11

muse: multimodal searchable encryption for cloud applications · 2018-07-23 · muse: multimodal...

Documents