ppsp icassp17v10

Privacy Preserving Speech Processing. Gérard Chollet (IV & CNRS-IMT), Jean-Jacques Quisquater (UCL), Bhiksha Raj (CMU). www.intelligentvoice.com. ICASSP-2017

Upload: gerard-chollet

Post on 19-Mar-2017


TRANSCRIPT

Page 1: Ppsp icassp17v10

ICASSP-2017 1

Privacy Preserving Speech Processing

Gérard Chollet (IV & CNRS-IMT), Jean-Jacques Quisquater (UCL)

Bhiksha Raj (CMU)

www.intelligentvoice.com

Page 2

Some Issues

• Siri (or a hacker who breaks into Siri) can:
– Use (edit) your voice recordings to impersonate you
– Learn about you: your identity, gender, nationality (accent), emotional state, …
– Track you through uploads / communications of voice recordings

• Nothing specific to Siri: the same issues arise with Google Now, Alexa, Cortana, …

• Not a futuristic scenario
– Every time you use your voice, you leave a print behind!

Page 3

More problems

• Doctors / lawyers / government agencies / banks wish to use a cloud-based speech recognition service
– But can't: HIPAA / CNIL / laws prevent them from exposing the data

• Speech data warehouses could be mined for useful market patterns
– But the audio also contains recordings of people reciting their credit card numbers, social security numbers, etc.

[Figure: audio → Speech Recognition System → text]

Page 4

Opinions

How happy are we about living our lives in the thrall of the big harvesters of data, the Googles, the Amazons, the Apples?

Source: "We live in the Big Cloud: And we hate it… Is it time for Hipster IT?", Nigel Cannings, CTO Intelligent Voice, https://hackernoon.com/we-live-in-the-big-cloud-and-we-hate-it-is-it-time-for-hipster-it-1f130a44d2b8#.tqb3xnsdl
Survey: https://surveys.google.com/reporting/survey?survey=fwn6wwimqoqkezlhur3zrdh2oe

Page 5

The Problem

• Security: NSA/GCHQ must monitor calls for public safety
– Caller may be a known miscreant / terrorist
– Call may relate to planning subversive activity

• The gist of the problem:
– NSA is possibly looking for key words or phrases
• Did we hear "bomb the pentagon"??
– Or checking whether some key people are calling in
• Was that Ayman al Zawahiri's voice?

• But must have access to all audio to do so
– Including recordings by perfectly innocent people

Page 6

The NSA Problem as a Metaphor

• Telephone company unwilling to expose audio to NSA
– May provide encrypted data to NSA

• NSA cannot expose what it is trying to find to the telephone company
– May provide it in encrypted form though

Page 7

Abstracting the problem

• Data holder willing to provide encrypted data
– A locked box

• Mining entity willing to provide encrypted keywords
– Sealed packages

• Must find if keywords occur in data
– Find if contents of sealed packages are also present in the locked box

• Without unsealing packages or opening the box!

• Data are spoken audio

Page 8

Outline of the tutorial

• Section 1: INTRODUCTION: PRIVACY AND SPEECH (10 mins)
• Section 2: CRYPTO AND LSH TOOLS (80 mins)
• (Coffee break)
• Section 3: PRIVACY FOR SPEECH (80 mins)
• Section 4: CONCLUSIONS and DISCUSSIONS (10 mins)

Page 9

INTRODUCTION: PRIVACY AND SPEECH

Section 1

Page 10

Introduction: Privacy and Speech

• The problem of privacy
• Speech applications
• Privacy issues in speech applications

Page 11

Current market targets for IV

Intelligent Voice® takes your company's phone calls (and email and IM) and turns them into smart data

Page 12

Our motivation

• Clients do not want to maintain HW and SW!
• Sensitive data
• GPU hardware expensive
• Cloud cheap

Page 13

Privacy Preserving Speech to Text

[Diagram: Intelligent Voice → Client (holding the KEY) ↔ Cloud; arrows labeled 1) f (lexicon, AM, LM, search), 2) E(data, f), 3) E(f(data))]

1) The Client receives f = (software and models) from Intelligent Voice
2) The Client automatically encrypts both f and his data with his private KEY
3) The Cloud returns the encrypted result
4) The Client decrypts the encrypted result using his private KEY

Page 14

TOPICS IN SPEECH PROCESSING

[Diagram: a speech signal ("Bla-blabla … Bla-bla") linked to identity, phonetics, recognition, graphemes to phonemes, synthesis, analysis, compression and storage, restitution, coding and transmission, language, message, emotions]

Page 15

A Brief Primer on Speech Processing Tasks

• Speaker Identification: "Which one of Alice, Bob, Carol, Dave, … are you?" (multi-class)
• Speaker Verification: "Are you really Bob?" Yes/No (binary)
• Speech Recognition: "You said 'hello, world!'"

Speaker identification and verification are biometric tasks.

• All are pattern classification tasks
• Not addressing secure communication of speech (much literature on this topic)

Page 16

Automatic Speech Processing Technologies

• Lexical content comprehension
– Recognition
• Determining the sequence of words that was spoken
– Keyterm spotting
• Detecting if specified terms have occurred in a recording

• Biometrics and non-lexical content recognition
– Identification
• Identifying the speaker
– Verification/Authentication
• Confirming that a speaker is who he/she claims to be

• All of these involve statistical pattern classification

Page 17

Biometric Applications: Speaker ID

• This biometric application deals with determining the identity of the speaker:

$$\hat{C} = \arg\max_{C \in \mathcal{C}} \sum_t \log P(X_t; \lambda_C)$$

• Here, the set C is a closed set of candidate speakers for a recording
• In open-set speaker ID, speaker verification needs to be performed on the selected candidate
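The arg max rule can be sketched in Python. This is a minimal illustration under assumptions that are not on the slide: each frame X_t is a scalar rather than an MFCC vector, each speaker model λ_C is a 1-D GMM given as (weight, mean, variance) triples, and the helper names and model values are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def gmm_loglik(x, gmm):
    """log P(x; lambda) for a 1-D GMM [(weight, mean, var), ...], via log-sum-exp."""
    terms = [math.log(w) + log_gauss(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def identify(frames, models):
    """Return the candidate C maximising sum_t log P(X_t; lambda_C), plus all scores."""
    scores = {name: sum(gmm_loglik(x, gmm) for x in frames)
              for name, gmm in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy models and a short recording of scalar "features"
models = {
    "alice": [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)],
    "bob": [(1.0, 4.0, 1.0)],
}
best, scores = identify([0.9, -1.1, 1.2], models)
print(best)  # alice
```

Working in the log domain turns the product over frames into a sum and avoids underflow on long recordings.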

Page 18

Biometric Applications: Speaker ID

• C is a set of "candidate" speakers for a recording
– Parameters of their models are learned from data for the speaker

• The set C may include a "Universal" speaker representing the "none-of-the-above" option
– The parameters λ_U for the universal speaker are learned using data from many speakers
– U is often called a Universal Background Model (UBM)

Page 19

Biometric Applications: Speaker Verification

• A user claims an identity S with some data D
• System must confirm if the user is who he claims to be
• C consists of S and universal speaker U
– The parameters λ_S for speaker S are obtained by adapting λ_U to data from speaker S

• Decision is based on comparing the likelihoods that D could have been generated by S or U
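A minimal sketch of the verification decision, assuming scalar frames and 1-D GMMs. The adaptation step uses mean-only MAP adaptation with a relevance factor `tau` (a common GMM-UBM recipe, swapped in here because the slide does not specify one); `tau = 16`, the threshold 0, the helper names and all numbers are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def component_logs(x, gmm):
    """log(w_k N(x; m_k, v_k)) for every component k."""
    return [math.log(w) + log_gauss(x, m, v) for w, m, v in gmm]

def logsumexp(vals):
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def gmm_loglik(x, gmm):
    return logsumexp(component_logs(x, gmm))

def map_adapt_means(ubm, frames, tau=16.0):
    """Mean-only MAP adaptation of the UBM to a speaker's frames."""
    n = [0.0] * len(ubm)    # soft counts per component
    sx = [0.0] * len(ubm)   # posterior-weighted sums of frames
    for x in frames:
        logs = component_logs(x, ubm)
        total = logsumexp(logs)
        for k, lk in enumerate(logs):
            post = math.exp(lk - total)   # P(component k | x)
            n[k] += post
            sx[k] += post * x
    # New mean interpolates between the data and the UBM mean, weighted by tau
    return [(w, (sx[k] + tau * m) / (n[k] + tau), v) for k, (w, m, v) in enumerate(ubm)]

def verify(frames, speaker, ubm, threshold=0.0):
    """Accept if the average log-likelihood ratio of S versus U exceeds the threshold."""
    llr = sum(gmm_loglik(x, speaker) - gmm_loglik(x, ubm) for x in frames) / len(frames)
    return llr > threshold, llr

ubm = [(0.5, -2.0, 1.0), (0.5, 2.0, 1.0)]      # hypothetical lambda_U
enrol = [2.9, 3.1, 3.0, 2.8, 3.2] * 4          # speaker S's enrolment frames
speaker = map_adapt_means(ubm, enrol)          # lambda_S
accept_s, llr_s = verify([3.0, 2.9, 3.1], speaker, ubm)
accept_i, llr_i = verify([-2.1, -1.9, -2.0], speaker, ubm)
print(accept_s, accept_i)  # True False
```

Adapting only the means keeps λ_S tied to λ_U component by component, which is what makes the likelihood ratio a focused comparison.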

Page 20

Feature Computation

• Do not work on the speech signal
– Work on a sequence of feature vectors computed from speech
• E.g. MFCC vector sequence

• "Speech recording" ≡ sequence of feature vectors derived from it
– X is actually a sequence of feature vectors
• X = [X_0, X_1, …, X_{T−1}]

• For the privacy-preserving frameworks, we will assume that the user's client device can compute these features.

Page 21

Learning model parameters

• GMM parameters:
– Adapting Bob's GMM to Alice's data
• Only adapt means
– Outcome: Bob gets encrypted means m_k for each Gaussian k
– (Pathak and Raj, Interspeech 2011)

• HMM parameters
– Similarly complicated
– (Smaragdis and Shashanka, IEEE TASLP, May 2007)

Page 22

A More Complex Model: Hidden Markov Models

• "Probabilistic function of a Markov chain"
• A dynamical system for time-varying processes

Page 23

Three Basic HMM Problems

• What is the probability that the HMM will generate a specific observation sequence?
• What is the most probable state sequence for a given observation sequence?
– The state segmentation problem
• How do we learn the parameters of the HMM from observation sequences?
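The first two problems have classic dynamic-programming answers: the forward algorithm and the Viterbi algorithm. The sketch below assumes a discrete-observation HMM with hypothetical toy probabilities; the third problem (learning, usually Baum-Welch) is not shown.

```python
def forward(obs, start, trans, emit):
    """P(obs | HMM): sum over all state paths, computed left to right."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)

def viterbi(obs, start, trans, emit):
    """Most probable state sequence and its joint probability with obs."""
    n = len(start)
    delta = [start[s] * emit[s][obs[0]] for s in range(n)]
    backptrs = []
    for o in obs[1:]:
        ptrs, new_delta = [], []
        for s in range(n):
            best = max(range(n), key=lambda r: delta[r] * trans[r][s])
            ptrs.append(best)
            new_delta.append(delta[best] * trans[best][s] * emit[s][o])
        backptrs.append(ptrs)
        delta = new_delta
    last = max(range(n), key=lambda s: delta[s])
    best_prob = delta[last]
    path = [last]
    for ptrs in reversed(backptrs):   # trace the best predecessors backwards
        path.append(ptrs[path[-1]])
    return path[::-1], best_prob

start = [0.6, 0.4]                    # hypothetical 2-state, 2-symbol HMM
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 0, 1]
print(forward(obs, start, trans, emit))
print(viterbi(obs, start, trans, emit))  # path [0, 0, 1]
```

Both run in O(T·N²) for T observations and N states; the only difference is sum (forward) versus max (Viterbi).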

Page 24

CRYPTOGRAPHIC TOOLS

Section 2

Page 25

Cryptography

• What is secure computation (JJQ)
• Homomorphic Encryption (JJQ)
– What is encryption. Public key and symmetric encryption.
– Homomorphic encryption.
• Partially homomorphic encryption
• Fully homomorphic encryption
• State of the art and limitations
• Secure multiparty computation (JJQ)
– Basic ideas
– Garbled circuits
– Secure protocols
– Obfuscation
– Secret sharing
• Zero Knowledge proofs (JJQ)

Page 26

Section 2.1

• What is secure computation?
• Homomorphic Encryption
• Secure Multiparty Computation
• Zero Knowledge proofs

Page 27

What is Secure (multi-party) Computation

The goal: create methods for two or more parties to jointly compute a function over their inputs while keeping those inputs private.

Page 28

Cryptography Basics

• Encryption E_K1(·): Plaintext (M) → Ciphertext (C), using the encryption key K1: E_K1(M) = C
• Decryption D_K2(·): Ciphertext (C) → original plaintext (M), using the decryption key K2: D_K2(C) = M. Lossless transformation!

A good cryptosystem: all the security is inherent in the knowledge of the keys, and none in the knowledge of the algorithms (Kerckhoffs's principle)

Page 29

Cryptography Basics: Symmetric Cryptosystem (same key)

Client: "Hello!" → Encrypt → "t4$We9" → System: Decrypt → "Hello!"

The same key is used on both sides (K1 = K2)

Page 30

Cryptography Basics: Public-key (asymmetric) Cryptosystem

First described in (Diffie and Hellman, 1976)

Client: "Hello!" → Encrypt → "t4$We9" → System: Decrypt → "Hello!" (after a public key exchange)

• Anybody can encrypt!
• Only the receiver can decrypt!

Page 31

Properties of Ideal Cryptosystem

• Encryption not invertible (without key)
• Semantic security (without key)
• Ciphertexts jointly uninformative (with high probability, no information)

Page 32

Jointly Uninformative: Mutual Information

• Mutual information between any pair/collection of ciphertext messages is 0
• Semantic security model: mutual information between two separate encryptions of the same message is 0!! One cannot tell which ciphertext came from a given plaintext.

[Plot: mutual information between ciphertexts vs. distance between plaintexts]

Comparing ciphertexts directly is USELESS!!

Page 33

Section 2.2

• What is secure computation?
• Homomorphic Encryption
• Secure Multiparty Computation
• Zero Knowledge proofs

Page 34

Homomorphic Encryption

Allows operations to be performed on ciphertexts without requiring knowledge of the corresponding plaintexts

The first idea was based on RSA; it must be used with care.

Page 35

Homomorphism Example: RSA

Public key encryption scheme (Rivest, Shamir, Adleman '77)

Based on number theory and modulo n operations.

$$E[X] = X^g \bmod n$$

$$E[X]\,E[Y] = X^g Y^g = (XY)^g = E[XY] \pmod n$$

Homomorphic multiplication

• Cannot perform simple addition, however.
• It is not semantically secure
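A textbook-RSA sketch of the multiplicative homomorphism, keeping the slide's name g for the public exponent. The primes 61 and 53 are a hypothetical toy choice; real deployments add randomized padding (e.g. OAEP), which removes this homomorphism. Because encryption is deterministic, equal plaintexts give equal ciphertexts, which is why the scheme is not semantically secure.

```python
def rsa_toy_keys():
    """Textbook RSA with tiny primes (illustration only, no padding)."""
    p, q = 61, 53
    n = p * q                  # modulus, 3233
    phi = (p - 1) * (q - 1)    # 3120
    g = 17                     # public exponent, named g as on the slide
    d = pow(g, -1, phi)        # private exponent (modular inverse, Python 3.8+)
    return n, g, d

def encrypt(x, n, g):
    return pow(x, g, n)        # E[X] = X^g mod n

def decrypt(c, n, d):
    return pow(c, d, n)

n, g, d = rsa_toy_keys()
c = encrypt(6, n, g) * encrypt(7, n, g) % n    # E[6] * E[7] = E[42]
print(decrypt(c, n, d))                         # 42
print(encrypt(5, n, g) == encrypt(5, n, g))     # True: deterministic encryption
```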

Page 36

Homomorphism Example: Paillier

Public key encryption scheme (Pascal Paillier, Eurocrypt 99).

• Homomorphic addition (useful for counting, e.g. in voting applications)
– Encrypted numbers can be added together.
– Encrypted numbers can be added to non-encrypted scalars.
• Homomorphic multiplication:
– Encrypted numbers can be multiplied by a non-encrypted scalar.
• Cannot multiply two encrypted numbers (partially homomorphic)
• It is semantically secure!

$$E[X] = g^X$$

$$E[X]\,E[Y] = g^X g^Y = g^{X+Y} = E[X+Y]$$

$$E[X]^Y = (g^X)^Y = g^{XY} = E[XY]$$

(Simplified notation: the full scheme computes $E[X] = g^X r^n \bmod n^2$ with a fresh random $r$, which is what makes it semantically secure.)
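A toy Paillier sketch with g = n + 1 and the random factor r^n included, so the same plaintext encrypts to different ciphertexts. The primes 293 and 433, the helper names, and the explicit `r` arguments (fixed here only for reproducibility; in use they must be fresh random values coprime to n) are hypothetical choices.

```python
import math

def keygen(p=293, q=433):
    """Toy Paillier keys with g = n + 1 (tiny primes: illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                                # valid because g = n + 1
    return n, (lam, mu)

def encrypt(n, m, r):
    """E[m] = g^m * r^n mod n^2 with g = n + 1; r must be coprime to n."""
    n2 = n * n
    return pow(n + 1, m % n, n2) * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n        # L(u) = (u - 1) / n, then times mu

def add(n, c1, c2):                     # E[X] * E[Y] = E[X + Y]
    return c1 * c2 % (n * n)

def add_plain(n, c, k):                 # E[X] * g^k = E[X + k]
    return c * pow(n + 1, k % n, n * n) % (n * n)

def mul_plain(n, c, k):                 # E[X]^k = E[kX], for k >= 0
    return pow(c, k, n * n)

n, priv = keygen()
print(decrypt(n, priv, add(n, encrypt(n, 20, r=5), encrypt(n, 22, r=7))))   # 42
print(decrypt(n, priv, mul_plain(n, encrypt(n, 6, r=11), 7)))              # 42
print(decrypt(n, priv, add_plain(n, encrypt(n, 40, r=3), 2)))              # 42
print(encrypt(n, 5, r=2) != encrypt(n, 5, r=3))                             # True
```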

Page 37

A fully homomorphic encryption = FHE (addition and multiplication)

• Trivial (only for illustration) example: encoding a bit b

$$E[b] = b + 2r_b + k_b p$$

– r_b is a small random integer (the noise) of at most M bits
– k_b is a random integer
– p is a very large, odd L-bit number, L >> M
• p is the key

• To decode: $b = (E[b] \bmod p) \bmod 2$

Page 38

A fully homomorphic scheme (continued)

• $E[b] + E[c] = E[b + c]$ (bits added mod 2, i.e. XOR)
• $E[b] \cdot E[c] = E[bc]$ (bits multiplied, i.e. AND)
• Supports both addition and multiplication in the encrypted domain
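A minimal Python sketch of the toy bit scheme, assuming r_b is small (4 bits here) and k_b and p are 64-bit; these sizes and the helper names are hypothetical and far too small for any security. Adding ciphertexts XORs the bits and multiplying them ANDs the bits, as long as the noise stays below p.

```python
import random

def keygen(rng, bits=64):
    """Secret key p: a random odd integer with exactly `bits` bits."""
    return rng.getrandbits(bits) | (1 << (bits - 1)) | 1

def encrypt(p, b, rng, noise_bits=4, k_bits=64):
    r = rng.getrandbits(noise_bits)   # small noise r_b
    k = rng.getrandbits(k_bits)       # random multiple k_b of the key
    return b + 2 * r + k * p          # E[b] = b + 2 r_b + k_b p

def decrypt(p, c):
    return (c % p) % 2                # b = (E[b] mod p) mod 2

rng = random.Random(1)
p = keygen(rng)
for a in (0, 1):
    for b in (0, 1):
        ca, cb = encrypt(p, a, rng), encrypt(p, b, rng)
        # columns: a, b, decrypted sum (XOR), decrypted product (AND)
        print(a, b, decrypt(p, ca + cb), decrypt(p, ca * cb))
```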

Page 39

FHE and computation

• All computations can be expressed as a combination of multiplications and additions, i.e., we can work with polynomials
• An encryption scheme that permits both addition and multiplication in the encrypted domain permits computation of any function in the encrypted domain
• FHE schemes are typically expressed in terms of homomorphic computation of NAND gates (a universal gate)
– A NAND B = 1 − A·B [requires addition and multiplication]
– All computations can be expressed as circuits, which can be expressed as combinations of NAND gates
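A quick plaintext check (no encryption) that NAND needs only one multiplication and one addition, and that other gates can be built from NAND alone. The gate helper names are hypothetical.

```python
def nand(a, b):
    return 1 - a * b              # one multiplication, one addition

def nand_mod2(a, b):
    return (1 + a * b) % 2        # the same gate written mod 2, as the bit scheme computes it

def and_(a, b):
    t = nand(a, b)
    return nand(t, t)             # NOT(NAND) = AND

def or_(a, b):
    return nand(nand(a, a), nand(b, b))   # NAND(NOT a, NOT b) = OR

def xor(a, b):                    # the classic four-NAND XOR
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, nand(a, b), and_(a, b), or_(a, b), xor(a, b))
```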

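Page 40

Utility of Homomorphic Encryption

[Diagram: the client holds x and asks "f(x) = ??"; the server says "I can evaluate f(·) as a service". After a public key exchange, the client sends E[x], the server returns E[f(x)], and the client decrypts it to obtain f(x).]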

Page 41

Homomorphic Encryption: Problems

• The "noise" doubles in size (number of bits) after each bit-product operation: an M-bit noise times an M-bit noise gives a 2M-bit noise
– Noise quickly becomes greater than p
• Decryption (which requires "mod p") fails
• Even in this trivial symmetric-key toy system

• Each bit-level operation now requires an operation over L bits
– L is the size of p in bits, and must be large
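The noise blow-up can be watched directly with the toy scheme, assuming a hypothetical 41-bit odd key p, a hypothetical k_b = 12345 and a fresh noise term 1 + 2r_b = 15. Repeatedly squaring E[1] doubles the noise bit-length at each level; once the noise reaches p, "mod p" no longer recovers it and decryption is no longer guaranteed.

```python
def depth_demo(p, r=7, k=12345, max_depth=6):
    """Square E[1] repeatedly and track the noise term that decryption relies on."""
    c = 1 + 2 * r + k * p          # E[1] = 1 + 2 r_b + k_b p
    noise = 1 + 2 * r              # the part that 'mod p' must recover
    rows = []
    for depth in range(max_depth + 1):
        fits = noise < p           # decryption is only guaranteed while noise < p
        if fits:
            assert c % p == noise  # below p, 'c mod p' returns the noise exactly
        rows.append((depth, noise.bit_length(), fits, (c % p) % 2))
        c, noise = c * c, noise * noise   # one more level of multiplication
    return rows

p = (1 << 40) + 15                 # hypothetical 41-bit odd key
for depth, bits, fits, bit in depth_demo(p):
    print(f"depth {depth}: noise {bits:3d} bits, below p: {fits}, decrypted bit {bit}")
```

With these numbers the noise grows 4, 8, 16, 32, 63 bits, so the fourth squaring already exceeds the 41-bit key.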

Page 42

Fully Homomorphic Encryption (FHE)

• Unclear whether fully homomorphic schemes were even possible until 2009
• Breakthrough work by Craig Gentry (2009, 2010)
– Solves the noise problem
• Still not very practical, but an active area of research
– Individual bit-level computations still take too much time
– Computations 100,000x to 1,000,000,000,000x slower than unencrypted computation

Page 43

Section 2.3

• What is secure computation?
• Homomorphic Encryption
• Secure Multiparty Computation
• Zero Knowledge proofs

Page 44

Secure Multiparty Computation (SMC)

• A group of untrusting parties desire to compute a joint function of their private data
• "Ideal" situation: all of them send their data to a trusted third party
– Who computes the function and only reveals the results

Page 45

Practical SMC (no possible leak)

• Parties communicate directly with one another following specified protocols
• Outcome ideally identical to the "ideal" case
– Function computed without revealing data
• Protocol: a sequence of steps, involving two or more parties, to accomplish a computational task

Page 46

Practical SMC

• Employs many tools:
– Conventional encryption
– Partially homomorphic encryption
– Oblivious transfer
• Select one of N items without revealing which one
– Noise masking
• Hiding information by adding random noise
– Secret sharing
• Share data between M people so that at least K of them must collaborate to expose it
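Secret sharing with a K-of-M threshold is classically done with Shamir's scheme (named here as one standard choice; the slide does not say which scheme is meant). A minimal sketch over the prime field GF(2^61 − 1), with hypothetical helper names:

```python
import random

PRIME = 2**61 - 1   # Mersenne prime; all arithmetic is in the field GF(PRIME)

def split(secret, k, m, rng):
    """Give m parties points of a random degree-(k-1) polynomial with f(0) = secret."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(k - 1)]
    def f(x):
        acc = 0
        for c in reversed(coeffs):        # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, f(x)) for x in range(1, m + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 from any k distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

rng = random.Random(7)
shares = split(123456789, k=3, m=5, rng=rng)
print(reconstruct(shares[:3]))                          # 123456789
print(reconstruct([shares[0], shares[2], shares[4]]))   # 123456789
```

Any K shares determine the degree-(K−1) polynomial, while K−1 shares leave f(0) completely undetermined.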

Page 47

SMC: A Trivial Example

• A group of people want to find out their average salary
– Without revealing individual salaries

Page 48

SMC: A Trivial Example (problems!)

• A group wants to calculate the average salary (additive masking)
– Person 1 chooses a random r, adds his/her salary x_1, and passes on (x_1 + r)
– Person i receives (x_1 + … + x_{i−1} + r), adds his/her salary x_i, and passes on (x_1 + … + x_i + r)
– Person n passes (x_1 + … + x_n + r) back to person 1
– Person 1 subtracts the random number r to obtain x_1 + … + x_n, then divides by n
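A minimal simulation of the ring protocol, assuming salaries are non-negative integers and working modulo 2^64 so that every passed value looks uniformly random. The salaries and the helper name are hypothetical. The last lines illustrate one of the "problems": the two neighbours of a person can collude and subtract what they sent and received to learn that person's salary.

```python
import random

def ring_average(salaries, rng, modulus=2**64):
    """Additive-masking ring: returns the average and the value each person passes on."""
    r = rng.randrange(modulus)               # person 1's secret mask r
    running = (r + salaries[0]) % modulus    # person 1 passes x_1 + r
    passed = [running]
    for x in salaries[1:]:
        running = (running + x) % modulus    # person i adds x_i and passes it on
        passed.append(running)
    total = (running - r) % modulus          # back at person 1: subtract r
    return total / len(salaries), passed

salaries = [50000, 62000, 71000, 58000]      # hypothetical salaries
avg, passed = ring_average(salaries, random.Random(42))
print(avg)                                    # 60250.0
# Collusion: person 2 sent passed[1], person 4 received passed[2]; the gap is x_3
print((passed[2] - passed[1]) % 2**64)        # 71000
```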

Page 49

Secure Two-Party Computation

• Originally introduced by Yao in 1986
• Express all computation as a circuit
– Alice has a circuit with some inputs
– Bob has other inputs
– Alice and Bob must compute the circuit collaboratively without exposing their inputs to one another
• Garbled circuit
– Can do this through a combination of oblivious transfer (OT) and symmetric encryption

[Figure: one party: "I have a and c. I want the output f"; the other party: "I have b"]

Page 50

Secure Two-Party Computation

• Define a number of "primitive" operations that Alice and Bob can perform together
– Without revealing their respective inputs
– By following standard protocols
• Decompose the overall computation in terms of these primitives
• Collaboratively compute the chain of primitives for the final output

Page 51

Secure Inner Product Protocol

• Alice has a vector X
• Bob has a vector Y
• They wish to compute <X,Y> without exposing their vectors to one another

[Figure: Bob: "I have Y"; Alice: "I have X"; together: "We want <X,Y>"]

Page 52

Secure Inner Product

• Alice generates public and private keys. She sends the public key Ke to Bob
• She encrypts her vector using her public key and sends it to Bob:
– Alice → Bob: Enc[X] = Enc[X[0]], Enc[X[1]], …, Enc[X[N]]
• Bob homomorphically multiplies Y[i] with X[i]: $\mathrm{Enc}[X[i]]^{Y[i]} = \mathrm{Enc}[X[i]\,Y[i]]$
• He homomorphically adds the sample-wise products:

$$\prod_i \mathrm{Enc}[X[i]\,Y[i]] = \mathrm{Enc}\Big[\sum_i X[i]\,Y[i]\Big] = \mathrm{Enc}[\langle X,Y\rangle]$$

[Figure: Alice ("I have X") sends Ke and E[X] to Bob ("I have Y"), who obtains E[<X,Y>]. Bob has E[<X,Y>]. He never saw X.]

Scheme used: Paillier, E[x] ~ K^x, with E[x]E[y] = E[x+y] and E[x]^y = E[xy]

Page 53

Secure Inner Product

• Bob has Enc[<X,Y>]
• Bob homomorphically subtracts a random number r to get Enc[<X,Y> − r]
– And sends this to Alice, who decrypts it
– Alice gets <X,Y> − r; Bob has r
– They must add their answers to get <X,Y>
• They have additive shares of the answer

[Figure: Bob: "I have E[<X,Y>] and Ke", then "I have r"; Alice: "I have <X,Y> − r"]
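The protocol of the two slides above can be sketched end to end with toy Paillier, assuming non-negative integer vectors with <X,Y> < N so the shares do not wrap around. The primes and helper names are hypothetical, and both parties run in one process for brevity; in practice only Alice would hold LAM and MU.

```python
import math
import random

# Toy Paillier parameters (tiny primes, illustration only). Only Alice would hold LAM and MU.
P, Q = 293, 433
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def enc(m, rng):
    """Paillier encryption with g = N + 1 and a random r coprime to N."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            return pow(N + 1, m % N, N2) * pow(r, N, N2) % N2

def dec(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N

def secure_inner_product(x, y, rng):
    """Return Alice's share r_A and Bob's share r_B with r_A + r_B = <X,Y> (mod N)."""
    enc_x = [enc(xi, rng) for xi in x]       # Alice -> Bob: Enc[X[0]], Enc[X[1]], ...
    acc = enc(0, rng)                        # Bob starts from Enc[0]
    for cx, yi in zip(enc_x, y):
        acc = acc * pow(cx, yi, N2) % N2     # multiply in Enc[X[i]]^Y[i] = Enc[X[i]Y[i]]
    r_b = rng.randrange(N)                   # Bob's random share r
    masked = acc * enc(-r_b, rng) % N2       # Enc[<X,Y> - r]
    r_a = dec(masked)                        # Alice decrypts <X,Y> - r
    return r_a, r_b

r_a, r_b = secure_inner_product([1, 2, 3], [4, 5, 6], random.Random(3))
print((r_a + r_b) % N)                       # 32
```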

Page 54

Primitive: Secure Inner Product (SIP)

• SIP: Alice has vector X; Bob has Y
• Outcome:
– Bob and Alice get additive shares r_B and r_A, which must be added to get the answer: r_A + r_B = <X,Y>
• Alternately, either Alice or Bob gets the entire answer

[Diagram: Bob (Y) and Alice (X) compute E[<X,Y>] and end with shares r_B and r_A]

Page 55

SMC Primitives

• General format: computing a simple function f(X,Y) of Alice's private data X and Bob's private data Y
• One of the following outcomes:
– Both parties get random additive shares of the result
• Alice gets r_A, Bob gets r_B
• Actual result f(X,Y) = r_A + r_B
– One party gets the encrypted result Enc[f(X,Y)]
– One party gets the complete result f(X,Y)

[Diagram: Bob (Y) and Alice (X) end with shares r_B, r_A (r_A + r_B = f(X,Y)), with E[f(X,Y)], or with f(X,Y)]

Page 56

Examples of Primitives

• Secure inner product
– f(X,Y) = <X,Y>
– Also possible if Bob has E[Y]
• Secure max
– f(X,Y) = max_i (X_i + Y_i)
• Secure max-ID
– f(X,Y) = argmax_i (X_i + Y_i)
• Several such primitives can be defined

Page 57

Computing Complex functions with SMC Primitives

• Conventional computation: user Alice sends data to system Bob
• Bob computes an algorithm
• SMC: computation recast as a sequence of primitives
• Alice and Bob compute primitives via SMC
• Bob gets the result

[Diagram: conventional: ALICE → ALGORITHM run by BOB; SMC: ALICE ↔ BOB for each primitive]

Page 58

Typical Assumptions

• Parties are semi-honest, i.e. honest-but-curious
– Each party tries to get as much information as possible from the result and the outputs of intermediate steps
– However, the party does not act maliciously (e.g. by lying about the inputs used)
• They follow the protocol correctly
• Can be "fixed" through "zero-knowledge proofs"
– "Expensive" protocols that verify answers without knowing the answer

Page 59

Section 2.4

• What is secure computation?
• Homomorphic Encryption
• Secure Multiparty Computation
• Zero Knowledge proofs

Page 60

Zero Knowledge Proofs (ZKPs)

• ZKP:
– "Prover" has some information
– "Verifier" wants to ensure she has it
– But Prover will not reveal the information to Verifier
– She can use ZKPs to convince Verifier

Page 61

Zero Knowledge Proofs (ZKPs)

• Peggy has a magic word to open a secret door in a cave

• Victor wants to pay for the secret, but not until he’s sure she knows it

• Peggy will tell the secret but not until she receives the money

Quisquater et al. ’89, figure from Wikipedia

Page 62

Zero Knowledge Proofs (ZKPs)

[Figure: Peggy shows Victor a Sudoku puzzle: "I have its solution!"]

Page 63

Zero Knowledge Proofs (ZKPs)

Peggy goes to a different room from Victor and chooses a random permutation σ of {1,…,9}

Page 64

Zero Knowledge Proofs (ZKPs)

Victor can:
1. Choose one of the rows.
2. Choose one of the columns.
3. Choose one of the sub-boxes.
4. See the permuted version of the original puzzle.

Page 65

Zero Knowledge Proofs (ZKPs)

• Assume that Peggy's information is a solution to a hard problem
• Peggy converts her problem to an isomorphic one
• Peggy solves the new problem and commits to the answer
• Peggy reveals the new instance to Victor
• Victor asks Peggy either to
– prove the instances are isomorphic; or
– open the committed answer and prove it's a solution
• Repeat n times
• Typical hard problems: finding Hamiltonian cycles (NP-complete) or graph isomorphisms (believed hard, but not known to be NP-complete)
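For illustration, a sketch of the classic graph-isomorphism ZKP (Goldreich, Micali and Wigderson), which follows the same convert/challenge/respond pattern. Here Peggy's secret is an isomorphism `pi` between two public graphs; the graphs, the 200 rounds and the helper names are hypothetical. An honest prover always passes, while a prover with no isomorphism passes a round only when she guesses the challenge bit, so n rounds leave her a 2^-n chance.

```python
import random

def make_graph(edges):
    """A graph is a frozenset of undirected edges, each a 2-element frozenset."""
    return frozenset(frozenset(e) for e in edges)

def relabel(graph, perm):
    """Apply the vertex relabelling perm (perm[old] = new) to every edge."""
    return frozenset(frozenset(perm[v] for v in edge) for edge in graph)

def random_perm(n, rng):
    perm = list(range(n))
    rng.shuffle(perm)
    return perm

def honest_round(g0, g1, pi, n, rng):
    """One round with a prover who knows pi such that relabel(g0, pi) == g1."""
    sigma = random_perm(n, rng)
    h = relabel(g1, sigma)                      # Peggy sends H = sigma(G1)
    b = rng.randrange(2)                        # Victor's random challenge
    if b == 1:
        rho = sigma                             # reveal the map G1 -> H
    else:
        rho = [sigma[pi[v]] for v in range(n)]  # reveal the map G0 -> G1 -> H
    return relabel(g1 if b else g0, rho) == h   # Victor's check

def cheating_round(g0, g1, n, rng):
    """A prover without an isomorphism must guess the challenge in advance."""
    guess = rng.randrange(2)
    sigma = random_perm(n, rng)
    h = relabel(g1 if guess else g0, sigma)
    b = rng.randrange(2)
    return relabel(g1 if b else g0, sigma) == h # passes only when b == guess

rng = random.Random(0)
n = 4
g0 = make_graph([(0, 1), (1, 2), (2, 3)])       # a path
pi = [2, 0, 3, 1]                               # Peggy's secret isomorphism
g1 = relabel(g0, pi)
fake = make_graph([(0, 1), (0, 2), (0, 3)])     # a star: not isomorphic to g0
honest = sum(honest_round(g0, g1, pi, n, rng) for _ in range(200))
cheats = sum(cheating_round(g0, fake, n, rng) for _ in range(200))
print(honest, cheats)                           # honest is always 200
```

Victor never learns `pi`: each round he sees either a random relabelling of G1 or a random relabelling of G0, both of which he could have generated himself.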

Page 66

A Musical Conundrum

• Alice has just found a short piece of music on the web
– Possibly from an illegal site!
• She likes it. She would like to find out the name of the song

Page 67

Alice and her song

• Bob has a large, organized catalogue of songs
• Simple solution:
– Alice sends her song snippet to Bob
– Bob matches it against his catalogue
– Returns the ID and metadata of the song that best matches the snippet

Page 68

Alice has a problem

• Her snippet may have been illegally downloaded
• She may go to jail if Bob sees it
– Bob may be the DRM police…

Page 69

An Unacceptable Solution

• Alice distrusts Bob
– So…
• Bob could send his catalogue to Alice to do the matching herself…
– Really??
– Bob's catalogue is his IP
– Alice may be a competitor
• Or a malicious person wanting to expose Bob's catalogue
• Bob distrusts Alice
– Will not send her his catalogue

Page 70

Solving Alice's Problem

• Alice could encrypt her snippet and send it to Bob
• Bob could work entirely on the encrypted data
– And obtain an encrypted result to return to Alice!
• A job for Secure Multi-party Computation

Page 71

Lessons Learned

• Possible to perform complex collaborative operations without revealing information!
– Through careful use of cryptographic tools
• Illustrates a few concepts
– Homomorphic encryption
– SMC
– Oblivious Transfer
– Primitives

Page 72

COFFEE BREAK!

Page 73

Dealing with Speech

• Speech applications (GC)
• General formalism (GC)
• Speaker verification and diarization (GC)
• Recognition (GC)
– Isolated word recognition and keyword spotting
– Large Vocabulary Continuous Speech Recognition
• CNN, RNN, BLSTM, Sequence to Sequence, Attention
– Spoken Dialog Systems, Speech to Speech translation
• Computation with privacy (BR)
– Tools
– Application to problems in biometrics
– Application to problems in recognition
– Application to retrieval
– Computational and practical challenges
– Hashing based solutions

Page 74

Motivations for DNN approaches

Traditional speech processing such as ASR is complicated:
• Traditional ASR involves many hand-crafted stages of feature extraction: MFCCs, deltas, delta-deltas, CMVN, …
• Gaussian Mixture Models, i-Vectors, …
• Hidden Markov Models
• Weighted Finite State Automata
• Viterbi, Expectation Maximisation, Baum-Welch, …
• There are many steps requiring extensive insight and expertise

New end-to-end deep learning approaches are making significant simplifications to this workflow:
• Typically making use of DNNs, RNNs, and …
• Computational power is simplifying this process

Page 75

Convolution Networks: Brief History

Inspired by receptive fields in the visual cortex

Notable implementations:
• Fukushima's NeoCognitron (1980)
• Explicit parallel implementations (1988)
• LeCun's LeNet-5 (1998)
• Ciresan's GPU implementation (2011)
• GoogLeNet (2014)

Fukushima, Kunihiko, 'Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position,' Biological Cybernetics 36 (4): 193-202, 1980

LeNet-5 (1998), image source: http://yann.lecun.com/exdb/lenet/

Page 76

GoogLeNet

• State-of-the-art winner of the ImageNet 2014 competition: classifying 1.2M images into 1K classes
• Convolution neural network inspired by LeCun's LeNet-5
• Has 9 'Inception' modules, with multiple convolution sizes and pooling in each module
• Stochastic Gradient Descent used to train the network, with 'dropout', which helps prevent overfitting

Szegedy et al., 'Going deeper with convolutions,' arXiv, 2014

Page 77

GoogLeNet Structure

Topology consists of 'Inception' modules made of:
• Convolutions: filters for extracting features; filter size tends to be small in the early layers, bigger in later layers
• Pooling: dimensionality reduction
• Softmax loss for predicting classes at 3 progressive stages of the network
• Other: concatenations for combining convolutions

'Rinse and Repeat' 9 times

Page 78

NIST LRE Competition

6 language clusters, 20 dialects:
• Arabic (Egyptian, Iraqi, Levantine, Maghrebi, Modern Standard)
• Chinese (Cantonese, Mandarin, Min, Wu)
• English (British, General American, Indian)
• French (West African, Haitian Creole)
• Iberian (Caribbean Spanish, European Spanish, Latin American Spanish, Brazilian Portuguese)
• Slavic (Polish, Russian)

500+ hours of audio; data set very unbalanced

2015 NIST Language Recognition Evaluation, http://www.nist.gov/itl/iad/lre15.cfm

Page 79

Spectrogram Convolution Network

• Based on Nvidia's DIGITS implementation of GoogLeNet
• Converted speech to 256x256 pixel spectrograms
• Tried different spectral representations and coding…
• Tools: RASTA, cuFFT, MATLAB, SoX, Python

Page 80

GoogLeNet Processing

Page 81

GoogLeNet Processing

Database: 501,248 spectrograms for training; 24,352 spectrograms for validation; 51,501 spectrograms for testing

Page 82

GoogLeNet Processing

Apply convolutions to extract primitives such as edges

Page 83

GoogLeNet Processing

Object parts extracted

Page 84

GoogLeNet Processing

Full spectral features, e.g. phones, words

Page 85

GoogLeNet Processing

Refinement of accuracy

Page 86

GoogLeNet Processing

Dialect classification, with three softmax losses (Loss1, Loss2, Loss3)

Page 87

NIST LRE Results

Accuracy: 83.99% (top-1), 98.89% (top-5)

[Bar chart: per-dialect accuracy (0–100%) for English-South Asian (Indian), Portuguese-Brazilian, Spanish-European, Chinese-Min Dong, Arabic-Modern Standard, Chinese-Cantonese, Arabic-Egyptian, English-British, Spanish-Caribbean, Slavic-Russian, Arabic-Maghrebi, Chinese-Mandarin, Arabic-Iraqi, English-American, French-West African, Chinese-Wu, Slavic-Polish, French-Haitian, Arabic-Levantine]

Page 88

What about Speech Recognition?

• We can use spectrograms to train a convolution net to learn to recognise languages and dialects
• Can we apply the same idea to a speech recognition task?
• We decided to test this idea using the TIMIT corpus
– TIMIT is the most accurately transcribed corpus in existence
– Phoneticians exhaustively transcribed the position of the phones in every utterance of the corpus

Page 89

Speech Recognition

TIMIT Speech Corpus
• 1.4M spectrograms for the training set
• Sliding window used for timing
• 4 to 5 phones in each window
• 61 phoneme classes

Page 90

Speech Recognition

Page 91

Speech Recognition

Page 92

Speech Recognition

Page 93

Automatic Speech to Text transcription and Indexing

NTIMIT Speech Corpus Added

Page 94

Conclusions of these experiments with CNNs

• Convolution networks can do more than classify static images
• They were inspired by receptive fields, which are temporal classifiers
• Convolution networks automate the feature extraction process
• Initial findings with NTIMIT (the telephone-network version of TIMIT) hint at much more noise robustness than classical ASR approaches
• Technological advances in deep learning, such as embedded deep learning, are being designed with convolution networks in mind

What's next?

Page 95

Automatic Speech to Text transcription and Indexing

Sequence to Sequence Model
• Encrypted search based on phonetic strings
• Lexicons do not contain every possible word
• Developed a Seq2Seq model for converting plain text to phonetic strings

[Diagram: encoder LSTMs read x_n, x_{n−1}, …, </s>; attention links them to decoder LSTMs with a softmax output emitting y_1, …, y_m, </s>]

Page 96

Privacy challenges in speech processing

• Privacy challenges in speech processing are similar to those for other forms of data
• We will assume the presence of two entities
– A "capable" server entity
• "Capable" in the sense of computationally powerful, and with memory and storage
– A "lightweight" client entity
• "Lightweight" in the sense of weak computational abilities

Page 97

Assumed framework

• Who owns what:
• Query is always private to client
– Response to client may be private
– Alternately, response may be exposed to server
• Data/model on server may be "owned" by…
– Server, not exposable to client
– Client, not exposable to server
– Third party, not exposable to server or client

[Diagram: Client sends Query to Server (Model/Database, compute power); Server returns Response]

Page 98

When response is private

• Computation must be performed on private query to obtain private results
• Model/Database may be private or public
• Situation ideal for Homomorphic Encryption

[Diagram: Client sends Query to Server (Model/Database, compute power); Server returns Response]

Page 99: Ppsp icassp17v10

ICASSP-2017 99

HE basic formalism

• The basic idea of HE still follows the original proposal by Craig Gentry

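• The construction builds on a somewhat homomorphic encryption scheme, e.g. for a bit $b$:
$$E[b] = b + 2r + kp$$
where the secret key $p$ is a large odd integer, $r$ is small random noise and $k$ is a random integer

• To decode:
$$b = (E[b] \bmod p) \bmod 2$$
which is correct as long as the noise term $b + 2r$ stays smaller than $p$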
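A minimal Python sketch of this integer scheme, a toy in the style of the van Dijk, Gentry, Halevi and Vaikuntanathan (DGHV) construction. The parameter sizes `P_BITS`, `R_BITS` and `K_BITS` and the function names are illustrative assumptions, far too small for real security. Because ciphertexts are plain integers, adding them XORs the hidden bits and multiplying them ANDs the hidden bits.

```python
import secrets

# Toy sizes (assumptions, far too small for real security).
P_BITS = 256   # bit length of the secret odd key p
R_BITS = 16    # bit length of the fresh noise r
K_BITS = 512   # bit length of the random multiplier k


def keygen() -> int:
    """Return a random odd secret key p with exactly P_BITS bits."""
    return secrets.randbits(P_BITS - 1) | (1 << (P_BITS - 1)) | 1


def encrypt(bit: int, p: int) -> int:
    """E[b] = b + 2r + kp, with fresh small noise r and a random multiplier k."""
    r = secrets.randbits(R_BITS)
    k = secrets.randbits(K_BITS)
    return bit + 2 * r + k * p


def decrypt(c: int, p: int) -> int:
    """b = (E[b] mod p) mod 2, correct while the noise b + 2r stays below p."""
    return (c % p) % 2
```

For example, with `p = keygen()`, `decrypt(encrypt(1, p) + encrypt(1, p), p)` returns 0 (1 XOR 1) and `decrypt(encrypt(1, p) * encrypt(1, p), p)` returns 1 (1 AND 1).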

Page 100: Ppsp icassp17v10

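The problem with Somewhat HE

• Addition:
$$\begin{aligned}
E[b] + E[c] &= (b + 2r_b + k_b p) + (c + 2r_c + k_c p) \\
&= (b + c) + 2(r_b + r_c) + (k_b + k_c)p \\
&= (b + c) + 2r + kp = E[b + c]
\end{aligned}$$

• The noise increases by (at most) 1 bit
  – Adding two numbers adds at most one bit

• Multiplication:
$$\begin{aligned}
E[b] \cdot E[c] &= (b + 2r_b + k_b p)(c + 2r_c + k_c p) \\
&= bc + 4r_b r_c + 2(b r_c + c r_b) + kp \\
&= E[bc]
\end{aligned}$$
  where kp collects every term that contains p

• The noise doubles in bits
  – Multiplying two numbers doubles the number of bits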
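The noise growth can be observed directly in a toy version of the same scheme. This sketch (toy parameters chosen for illustration, not security; names are hypothetical) measures the bit length of the noise, i.e. of E mod p, for a fresh ciphertext, a sum and a product. With `R_BITS = 20` the fresh noise has 21 bits, the sum 22 bits and the product 41 or 42 bits.

```python
import secrets

# Toy sizes (assumptions): a 1024-bit odd secret key and 20-bit fresh noise.
P_BITS, R_BITS, K_BITS = 1024, 20, 2048
p = secrets.randbits(P_BITS - 1) | (1 << (P_BITS - 1)) | 1


def encrypt(bit: int) -> int:
    """E[b] = b + 2r + kp, with noise r of exactly R_BITS bits."""
    r = secrets.randbits(R_BITS - 1) | (1 << (R_BITS - 1))
    return bit + 2 * r + secrets.randbits(K_BITS) * p


def noise_bits(c: int) -> int:
    """Bit length of the noise term, i.e. of (c mod p), valid while the noise is below p."""
    return (c % p).bit_length()


a, b = encrypt(1), encrypt(0)
print("fresh:  ", noise_bits(a))      # 21 bits (R_BITS + 1)
print("sum:    ", noise_bits(a + b))  # 22 bits: one extra bit
print("product:", noise_bits(a * b))  # 41 or 42 bits: roughly doubled
```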

cryptographic solutions © Chollet, Petrovska, Raj

Page 101: Ppsp icassp17v10

Problem with Somewhat HE and bootstrapping

• An L-level circuit
  – Each level comprises multiplications
• At each level, the number of noise bits doubles
• Very soon we run out of bits
  – The noise becomes wider than the encryption (it exceeds p)
• Solution: Bootstrapping
  – DECRYPT in the encrypted domain

[Figure: example circuit with L = 4 levels]

Page 102: Ppsp icassp17v10

Bootstrapping
• Decryption is just another arithmetic operation
• Like any other arithmetic operation, it can be performed in the encrypted domain using HE to obtain an encrypted result
• But the result will be the encrypted decryption of the encrypted data
  – I.e. just another encryption of the original data
  – But with reset noise
• Bootstrapping permits computation of circuits of arbitrary depth, but at a great cost

Page 103: Ppsp icassp17v10

The cost of HE

• The computation speed of HE is measured via the per-gate computation time (ratio of encrypted to cleartext computation time)
  – Expressed as …, where λ is a security parameter (typically 100)
    • Actually a polynomial in the security parameter λ
  – Bootstrapping: Original …
  – Bootstrapping also requires hiding the private key in the public key and relying on the assumption that the (sparse) subset-sum problem is hard

Page 104: Ppsp icassp17v10

A less general solution

• If the depth of the circuit is known a priori, design the encryption such that the noise does not wrap around (exceed the modulus) within the required computations
  – Encryption customized to circuit depth

[Figure: example circuit with L = 4 levels]
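Under the simplified model of the previous slides (an assumption of this sketch: every multiplicative level doubles the number of noise bits, and constants are ignored), the key size needed for a circuit of known depth can be worked out in advance. The helper names are hypothetical.

```python
def noise_bits_after(levels: int, fresh_noise_bits: int) -> int:
    """Noise bit length after `levels` multiplicative levels, assuming
    (as on the slides) that every level doubles the number of noise bits."""
    return fresh_noise_bits * 2 ** levels


def min_key_bits(depth: int, fresh_noise_bits: int) -> int:
    """Smallest key size (bits) that keeps the final noise below the key,
    so that it does not wrap around within a circuit of this depth."""
    return noise_bits_after(depth, fresh_noise_bits) + 1


def max_depth(key_bits: int, fresh_noise_bits: int) -> int:
    """Largest depth whose final noise still fits below a key of key_bits bits."""
    depth = 0
    while noise_bits_after(depth + 1, fresh_noise_bits) < key_bits:
        depth += 1
    return depth
```

For example, with 20-bit fresh noise the L = 4 circuit ends with 20 × 2^4 = 320 noise bits, so `min_key_bits(4, 20)` is 321, while a 1024-bit key supports `max_depth(1024, 20) == 5` levels. The key size grows exponentially with depth, which is the motivation for the linear noise growth of the next approach.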

Page 105: Ppsp icassp17v10

Closer to state-of-the-art

• Brakerski Gentry Vaikuntanathan (BGV) encryption
  – Uses the learning with errors (LWE) problem as the basis for encryption
  – Uses a remodularization (modulus switching) step to reduce the noise increase with computations
    • Noise increases linearly with the number of layers, instead of exponentially
  – Customizes encryption to circuit depth
  – Processes multiple bits simultaneously
    • Enables parallel computation
• Improved speed BGV: … or … (L is the depth of the circuit)
• No need for the public key to carry the private key
• Limitation: circuit depth

Page 106: Ppsp icassp17v10

Publicly available tools
• HElib
  – Brakerski Gentry Vaikuntanathan (BGV) encryption
• https://github.com/shaih/HElib
• Incorporate it into your code and test
• Typical results
  – Speed (from “Subring Homomorphic Encryption” by Arita and Handa): time in ms, on a 2.8 GHz Celeron
  – Circuit depth limitation also remains

Page 107: Ppsp icassp17v10

HE: Other Limitations
• Can only perform computations that can be expressed as polynomials
• Cannot perform branching
  – A generalized implementation of “IF” is not possible
    • Can be done through expensive circuit expansion
• No arbitrary looping
  – Cannot do the … if required
• No binary search
• Division is not possible
• No max/argmax
• Limit on the number of operations
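To illustrate the “expensive circuit expansion” behind an encrypted IF, here is a plaintext Python sketch (function names are hypothetical) that writes both a selection and an equality test as polynomials, using only the additions and multiplications that HE supports. Both branches are always evaluated, and every test adds multiplications, and hence depth, to the circuit.

```python
def select(cond: int, if_true: int, if_false: int) -> int:
    """Branch-free IF for a bit `cond`: cond * a + (1 - cond) * b.

    Only additions and multiplications are used, so the same expression could be
    evaluated on encrypted values; note that both branches are always computed.
    """
    return cond * if_true + (1 - cond) * if_false


def bits_equal(x_bits, y_bits) -> int:
    """Equality test on bit vectors as a polynomial: product of (1 - (x_i - y_i)^2)."""
    result = 1
    for x, y in zip(x_bits, y_bits):
        result *= 1 - (x - y) ** 2  # 1 if the bits agree, 0 otherwise
    return result


# "if x == y: out = 10 else: out = 20", expanded into a fixed circuit
print(select(bits_equal([0, 1, 1], [0, 1, 1]), 10, 20))  # 10
print(select(bits_equal([0, 1, 1], [0, 0, 1]), 10, 20))  # 20
```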

Page 108: Ppsp icassp17v10

Public tools: Cryptonet

• Fully homomorphic implementation of Convolutional Neural Networks
  – Replaces ReLU activation with a polynomial
  – Replaces Maxout with a mean operation
  – Fixed depth of circuit
  – Weights are assumed to be cleartext and do not consume bits of noise
  – BGV encryption optimized for network depth
  – No argmax/softmax in the final layer; the entire output is returned to the user

• Possible to perform various computer-vision-like operations on encrypted data
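A plaintext NumPy sketch of the kind of network such a system evaluates, written with only additions and multiplications. The square activation is one common polynomial choice (an assumption here, as are the layer shapes and the helper names); no argmax or softmax is applied, so the raw scores go back to the user.

```python
import numpy as np


def square_activation(x: np.ndarray) -> np.ndarray:
    """Polynomial stand-in for ReLU: x**2 (one multiplication per unit)."""
    return x * x


def mean_pool(x: np.ndarray, size: int) -> np.ndarray:
    """Mean over non-overlapping windows, replacing a max operation.

    Dividing by the public window size is a scaling by a cleartext constant;
    plain sum pooling (a scaled mean) avoids even that.
    """
    trimmed = x[: len(x) // size * size]
    return trimmed.reshape(-1, size).mean(axis=1)


def polynomial_net(x, w1, b1, w2, b2, pool: int = 2):
    """Dense -> square -> mean pool -> dense; returns raw scores (no argmax/softmax)."""
    h = square_activation(w1 @ x + b1)
    h = mean_pool(h, pool)
    return w2 @ h + b2
```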

Page 109: Ppsp icassp17v10

When the outcome is made available to the server

• This situation is more amenable to functional encryption

[Figure: the client sends a query to the server, which holds the model/database and the compute power; the response is made available to the server]

Page 110: Ppsp icassp17v10

Functional Encryption
• An encryption scheme that permits the server to evaluate user-specified functions on private data
• Client produces
  – Public key pk
  – Function-specific secret key sk
    • Specific to the function f()
  – Public-key encrypted data may be stored on the server
    • By anyone, including the server
• Server computes
  – Dec(E[x], sk) to obtain f(x)
  – Learns nothing else

Page 111: Ppsp icassp17v10

Functional Encryption

• A variety of FE methods have been proposed
  – Can compute arbitrary Boolean functions on a public (exposed to server) index
  – Can only compute limited functions on private indices
    • Inner products, simple Boolean functions
• Restrictions:
  – Client in charge of generating the function-specific key
  – Only simple functions
  – Useful for some kinds of tasks, e.g. mining multi-user data

Page 112: Ppsp icassp17v10

The “problem” with conventional encryption

• The encryption of any number X carries NO information about the encryption of any other number Y
  – Or even about a second encryption of X itself, if the encryption is semantically secure
  – By requirement!

[Figure: X mapped to Enc[X]]

Page 113: Ppsp icassp17v10

Information Theoretic Security

• The mutual information between the encryptions of any two messages is 0
  – Regardless of the distance between them
• The MI between the encryptions of any group of messages is 0
• This ensures that you cannot learn about the data simply by viewing large numbers of encrypted messages and studying their patterns

[Figure: MI(Enc[X], Enc[Y]) plotted against d(X,Y)]

Page 114: Ppsp icassp17v10

Semantic Security

• Given: the encryption of X, and the encryption mechanism
  – i.e. given only the public key
  – But not the decryption key
• If we scan the space with a “probe” Y, by encrypting Y and comparing it to the encryption of X, we will not find X even if Y = X!!

[Figure: probe Y scanning the space around X]
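The probe argument can be seen with any randomized public-key scheme. This sketch uses textbook ElGamal over a tiny toy group (an assumption for illustration; the parameters are insecure and the names hypothetical): encrypting the probe Y = X with the public key yields a ciphertext that almost never equals the stored one, because each encryption draws a fresh random exponent.

```python
import secrets

# Toy group (assumption, insecure): safe prime P = 2Q + 1, G = 4 of prime order Q.
P, Q, G = 2039, 1019, 4
secret_key = secrets.randbelow(Q - 1) + 1
public_key = pow(G, secret_key, P)


def encrypt(m: int):
    """Textbook ElGamal: a fresh random r on every call randomizes the ciphertext."""
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), m * pow(public_key, r, P) % P


def decrypt(ct) -> int:
    c1, c2 = ct
    return c2 * pow(pow(c1, secret_key, P), -1, P) % P


x = 42
target = encrypt(x)  # what the server holds
probe = encrypt(x)   # probe Y = X, encrypted with the public key only
print(probe == target)                    # almost always False: the probe finds nothing
print(decrypt(probe) == decrypt(target))  # True, but only the key holder can check this
```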

Page 115: Ppsp icassp17v10

An alternate solution: Hashing

• An alternate form of encryption: cryptographic hashes
  – Weaker requirements: repeatable encryption
  – Not invertible
  – Very fast
  – Can we use these instead of encryption?

Page 116: Ppsp icassp17v10

Cryptographic hash: Basics
• A cryptographic hash function maps variable-length cleartext messages to a fixed-length ciphertext
• Not necessarily invertible
  – Encryption schemes can be cryptographic hashes, but not all cryptographic hashes are encryption schemes
• The bit pattern of the hash is (pseudo)random
  – And uninformative about the underlying cleartext
  – Repeated hashing of the same string produces the same output

• E.g. MD5, SHA-1, SHA-2, SHA-3

Page 117: Ppsp icassp17v10

Hashing: Weakening privacy

• Given: the encryption (hash) of X, and the encryption mechanism
• If we scan with probe Y, we will find X
  – Enc(Y) = Enc(X) if Y = X
  – Enables authentication using passwords

[Figure: probe Y scanning the space around X]
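A short Python sketch of the password case using SHA-256 (a SHA-2 member) from the standard library; the stored phrase and function names are made up for illustration. The digest always has the same length and is repeatable, so a probe matches only when Y equals X exactly.

```python
import hashlib


def digest(message: str) -> str:
    """SHA-256: fixed-length output, and the same input always gives the same digest."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


stored = digest("correct horse battery staple")  # hypothetical password; only the digest is kept


def probe_matches(candidate: str) -> bool:
    """A probe Y is found only if Enc(Y) == Enc(X), i.e. only if Y == X."""
    return digest(candidate) == stored
```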

Page 118: Ppsp icassp17v10

Information Theoretic Security: Hashing

• The mutual information between the encryptions (hashes) of any two messages is 0
  – Regardless of the distance between them
• Except if d(X,Y) = 0

[Figure: MI(Enc[X], Enc[Y]) plotted against d(X,Y)]

Page 119: Ppsp icassp17v10

Still not ok

• Exact matches rarely occur in pattern matching tasks like speech processing
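For instance, hashing two feature frames that differ only by floating-point noise gives unrelated digests. This sketch (the frame values and function name are hypothetical) serializes each vector to bytes with `struct` and hashes it with SHA-256.

```python
import hashlib
import struct


def hash_vector(values) -> str:
    """SHA-256 of a feature vector serialized as little-endian float64 values."""
    packed = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(packed).hexdigest()


frame_a = [12.31, -4.07, 0.88]         # hypothetical feature frame
frame_b = [12.31, -4.07, 0.88 + 1e-9]  # the same frame with tiny numerical noise

print(hash_vector(frame_a) == hash_vector(frame_a))  # True: identical input
print(hash_vector(frame_a) == hash_vector(frame_b))  # False: the digests are unrelated
```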

Page 120: Ppsp icassp17v10

The problem: revisited

• The computational and information-theoretic challenges arise because of the attempt at perfect security
  – Traditional encryption and hashing schemes attempt to hide all information about the original data

[Figure: X mapped to Enc[X]; MI(Enc[X], Enc[Y]) plotted against d(X,Y)]

Page 121: Ppsp icassp17v10

The bear and the hunters

• Pragmatic solution: be more secure than the next guy
  – Don’t be the easiest target

Page 122: Ppsp icassp17v10

Information Theoretic Security

• Hashing techniques that only reveal information if vectors are very close?
  – Still an exponentially hard problem to extract the geometry of high-dimensional data

[Figure: MI(Enc[X], Enc[Y]) plotted against d(X,Y)]

Page 123: Ppsp icassp17v10

Information Theoretic Security

• Hashing techniques that only reveal information if vectors are very close?
  – Still an exponentially hard problem to extract the geometry of high-dimensional data

[Figure: MI(Enc[X], Enc[Y]) plotted against d(X,Y), with a leaky region at small distances]

Page 124: Ppsp icassp17v10

Information Theoretic Security

• Hashing techniques that only reveal information if vectors are very close?
  – Still an exponentially hard problem to extract the geometry of high-dimensional data
• With user-selected leakage?

[Figure: MI(Enc[X], Enc[Y]) plotted against d(X,Y), with a leaky region at small distances]

Page 125: Ppsp icassp17v10

Information Leakage

• Fully secure hash: we learn nothing about X unless Y = X
• Leaky hash: we get some information about X if Y is within D of X

[Figure: target X and probe Y, for a fully secure hash and for a leaky hash]

Page 126: Ppsp icassp17v10

Challenges

• How do we design such a hash?
• Particularly one that works on real-valued vectors

Page 127: Ppsp icassp17v10

LSH with Euclidean Distance

• A vector X gets converted to a vector of M numbers: H(X) = [h1(X) h2(X) h3(X) … hM(X)]
• Vi is a random vector drawn from a normal distribution
• bi is a random number between 0 and w
• w is the quantization width

$$h_i(X) = h(X; V_i, b_i) = \left\lfloor \frac{V_i^T X + b_i}{w} \right\rfloor$$
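The definition above translates directly into NumPy. In this sketch (function names and the seeded generator are assumptions for illustration), `V` and `b` play the role of the user’s private parameters and must be kept secret.

```python
import numpy as np


def make_lsh(dim: int, num_components: int, w: float, seed: int = 0):
    """Draw the private parameters: V_i ~ N(0, I) and b_i ~ U[0, w)."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((num_components, dim))
    b = rng.uniform(0.0, w, size=num_components)
    return V, b


def lsh_key(x: np.ndarray, V: np.ndarray, b: np.ndarray, w: float) -> np.ndarray:
    """H(X) = [h_1(X) ... h_M(X)] with h_i(X) = floor((V_i^T X + b_i) / w)."""
    return np.floor((V @ x + b) / w).astype(int)
```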

Page 128: Ppsp icassp17v10

Euclidean LSH

• A 2-D example

Page 129: Ppsp icassp17v10

Euclidean LSH

• A 2-D example
• To calculate the first component in the hash key, h1(X):
  – Generate a random vector V1 and a bias b1
  – (V1, b1) are the user’s private parameters

[Figure: random vector V1 and bias b1 in the 2-D plane]

Page 130: Ppsp icassp17v10

Euclidean LSH

• A 2-D example
• “Stripe” the space orthogonal to the vector V1
• Count stripes starting from the bias location
• The first component in the hash key is the ID of X’s stripe: h1(X) = 1

[Figure: stripes orthogonal to V1, numbered -5 to 5 starting from the bias b1]

Page 131: Ppsp icassp17v10

Euclidean LSH

• A 2-D example
• The second component in the hash key: h2(X) = -2
  – (V2, b2) are also the user’s private parameters

[Figure: a second set of stripes orthogonal to V2, numbered -8 to 5 starting from the bias b2, overlaid on the V1 stripes]

Page 132: Ppsp icassp17v10

Euclidean LSH

• The two-component hash: H(X) = [h1(X) h2(X)] = [1 -2]
  – [(V1, b1), (V2, b2)] are the user’s private parameters

[Figure: the V1 stripes (numbered -5 to 5) and the V2 stripes (numbered -8 to 5) overlaid]

Page 133: Ppsp icassp17v10

Euclidean LSH

• H(X) = [1 -2]
• All vectors in the highlighted cell will have the same LSH key

[Figure: the V1 and V2 stripes overlaid, with the cell H(X) = [1 -2] highlighted]

Page 134: Ppsp icassp17v10

Trivial distance computation with Euclidean LSH

• If two vectors are in the same cell, they will have identical LSH keys

• As vectors move away, the Manhattan distance between their hashes increases
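A small NumPy experiment (sizes, seed and helper names are illustrative assumptions) showing this behaviour: the L1 distance between keys is 0 for identical vectors and, on average, grows in proportion to the Euclidean distance, since each component crosses about |V_i^T (Y - X)| / w stripe boundaries.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, M, w = 20, 256, 1.0
V = rng.standard_normal((M, dim))  # private projections (illustrative sizes)
b = rng.uniform(0.0, w, size=M)    # private offsets


def lsh_key(x: np.ndarray) -> np.ndarray:
    return np.floor((V @ x + b) / w).astype(int)


def key_distance(x: np.ndarray, y: np.ndarray) -> int:
    """Manhattan (L1) distance between the LSH keys of x and y."""
    return int(np.abs(lsh_key(x) - lsh_key(y)).sum())


x = rng.standard_normal(dim)
direction = rng.standard_normal(dim)
direction /= np.linalg.norm(direction)
for d in (0.0, 0.5, 2.0, 8.0):
    # 0 at d = 0; on average proportional to d beyond that
    print(d, key_distance(x, x + d * direction))
```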

Page 135: Ppsp icassp17v10

The size of the cell

• Increasing the number of components in H(X) makes the cell smaller

• H(X) = [h1(X) h2(X)] = [1 -2]

Page 136: Ppsp icassp17v10

The size of the cell

• Increasing the number of components in H(X) makes the cell smaller

• H(X) = [h1(X) h2(X) h3(X)] = [1 -2 7]

Page 137: Ppsp icassp17v10

The size of the cell

• Increasing the number of components in H(X) makes the cell smaller

• H(X) = [h1(X) h2(X) h3(X) h4(X)] = [1 -2 7 0]

Page 138: Ppsp icassp17v10

Randomness in the size of the cell

• Increasing the key length reduces the cell size
• A smaller cell makes it more likely that two vectors that fall in the same cell (have the same LSH key) are very close
• It also makes it more likely to miss valid vectors
  – Which may fall outside the cell simply because of the vagaries of its shape
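The trade-off can be simulated. In this sketch (dimension, trial count, distances and the helper name are assumptions), each trial draws its own private parameters, and a pair counts as a collision for key length M when its first M components all agree, so the rates can only fall as M grows. Longer keys almost never merge distant pairs, but they also split a growing share of close pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, w, trials, max_m = 10, 1.0, 4000, 12


def collision_rates(distance: float) -> list:
    """Fraction of (X, Y) pairs at the given distance whose first M key components
    all agree, for M = 1..max_m (each trial uses its own private parameters)."""
    x = rng.standard_normal((trials, dim))
    u = rng.standard_normal((trials, dim))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    y = x + distance * u
    agree_so_far = np.ones(trials, dtype=bool)
    rates = []
    for _ in range(max_m):
        v = rng.standard_normal((trials, dim))  # one new component per trial
        b = rng.uniform(0.0, w, size=trials)
        hx = np.floor((np.einsum("ij,ij->i", v, x) + b) / w)
        hy = np.floor((np.einsum("ij,ij->i", v, y) + b) / w)
        agree_so_far &= hx == hy
        rates.append(float(agree_so_far.mean()))
    return rates


near, far = collision_rates(0.1), collision_rates(1.0)
for m in (1, 4, 12):
    print(f"M={m:2d}  near pairs: {near[m - 1]:.3f}  far pairs: {far[m - 1]:.3f}")
```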

Page 139: Ppsp icassp17v10

Adapting LSH: Secure Modular Hashing

• Modular quantization of randomly shifted random projections of data

Page 140: Ppsp icassp17v10

Secure modular hashes

• Conventional LSH

[Figure: conventional LSH stripes orthogonal to V1, numbered -5 to 5 starting from the bias b1]

Page 141: Ppsp icassp17v10

Secure modular hashes

• Conventional LSH

[Figure: the stripes orthogonal to V1 relabelled with alternating 0/1 values, starting from the bias b1]

Page 142: Ppsp icassp17v10

Secure Modular Hashes

• Solution: Banded Hashing
  – Euclidean LSH with binary output: Q(X) = [1,1]

[Figure: alternating 0/1 stripes orthogonal to V1 and to V2]

Page 144: Ppsp icassp17v10

Secure Modular Hashes

• Solution: Banded Hashing
  – Euclidean LSH with binary output: Q(X) = [1,1]
  – Images of the green region are indistinguishable

[Figure: alternating 0/1 stripes orthogonal to V1 and to V2, with the green region highlighted]
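A NumPy sketch of banded hashing as described above: the Euclidean LSH stripe index is kept only modulo 2, giving one bit per projection. Drawing the dither uniformly over a full period of two stripes, and the function names, are assumptions of this sketch; [Jimenez, 2015] gives the actual Secure Modular Hashing construction.

```python
import numpy as np


def make_smh(dim: int, num_bits: int, w: float, seed: int = 0):
    """Private parameters: Gaussian projections V and dithers b ~ U[0, 2w).

    The dither spans one full period (two stripes) so that each bit is unbiased;
    this choice is an assumption of the sketch.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal((num_bits, dim)), rng.uniform(0.0, 2 * w, size=num_bits)


def smh_bits(x: np.ndarray, V: np.ndarray, b: np.ndarray, w: float) -> np.ndarray:
    """Banded (modular) hash: the LSH stripe index taken modulo 2."""
    return (np.floor((V @ x + b) / w).astype(int) % 2).astype(np.uint8)
```

With `num_bits=2` this produces a two-bit key such as the Q(X) = [1,1] of the slide; which bits appear depends on the random parameters.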

Page 145: Ppsp icassp17v10

Information Leakage

[Figure: target X and probe Y]

• Only provides information about other vectors that lie within a small ball around X

Page 146: Ppsp icassp17v10

SMH

• Plot of Hamming(Q(X),Q(Y)) vs Euclidean d(X,Y) for different values of D, and different numbers of bits in Q(X)

Simulations: L-dimensional vectors, M-bit hashes
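The shape of such a plot can be reproduced with a short simulation (dimension, number of bits, seed and stripe width `w` are illustrative assumptions; in this sketch `w` plays the role of the distance scale D). The normalized Hamming distance rises roughly linearly for small d(X,Y) and then saturates near 0.5, so distant vectors reveal almost nothing about each other.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_bits, w = 20, 4000, 1.0
V = rng.standard_normal((num_bits, dim))
b = rng.uniform(0.0, 2 * w, size=num_bits)  # dither over one full period (assumption)


def smh_bits(x: np.ndarray) -> np.ndarray:
    return np.floor((V @ x + b) / w).astype(int) % 2


def normalized_hamming(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.mean(smh_bits(x) != smh_bits(y)))


x = rng.standard_normal(dim)
u = rng.standard_normal(dim)
u /= np.linalg.norm(u)
for d in (0.05, 0.25, 1.0, 5.0, 20.0):
    print(f"d = {d:5.2f}   normalized Hamming = {normalized_hamming(x, x + d * u):.3f}")
```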

Page 147: Ppsp icassp17v10

Summary of information hiding methods

• Conventional encryption
  – Not useful
• Homomorphic encryption
  – Secure, expensive, only reveals outcome of computation to client/user
  – Many limitations in usage
• Secure multiparty computation
  – Security related to cost
  – Outcome of computation to either client or server
  – Technically unlimited in usage
• Functional encryption
  – Secure, expensive, only reveals outcome of computation to server
  – Many limitations in usage
• Hashing
  – Less secure, very low cost, result of computation to server
  – Only permits distance computation and comparison

Page 148: Ppsp icassp17v10

Applying these to speech tasks

• How feasible is private speech processing now?

• We will assume the presence of two entities
  – A “capable” server entity
    • “Capable” in the sense of being computationally powerful, with memory and storage
  – A “lightweight” client entity
    • “Lightweight” in the sense of having weak computational abilities

• Server and client do not trust one another

Page 149: Ppsp icassp17v10

Assumptions in what follows (and in what was presented)

• User Alice and system Bob
• The user has a smartphone or other computation-capable device
  – Communicates with the server using this device
• The user’s client device also performs feature computation and all other necessary computation
• The client may perform a one-time expensive computation in the setup stage

Page 150: Ppsp icassp17v10

Automatic speech recognition

• System has models
  – Traditional: acoustic and language models
  – End-to-end neural net based: neural network
• Client has audio, requires recognition output

[Figure: the client sends a query to the server, which holds the model and the compute power, and receives a response]

Page 151: Ppsp icassp17v10

Applying HE to ASR

User generates and transmits data

System performs convnet computations

Returns vector of probabilities to user

Can work on short data blocks

Can perform low-perplexity tasks

E.g. phone recognition

[Figure: the user sends audio to the system and receives the result]

Page 152: Ppsp icassp17v10

Feasibility of Private ASR
• Homomorphic Encryption:
  – Feasible for small, low-complexity tasks
  – Currently not feasible using any formalism in a generic setting
    • Not a computational limitation; the limitation results from inherent limitations of FHE
• SMC-based solutions:
  – Client and server perform computations collaboratively using SMC protocols
  – Theoretically feasible
    • Possible for small grammars
    • Impractical in more general settings
  – Extreme communication and computational overhead, particularly on the client
    • Will require devising zero-knowledge proofs to verify computation

Page 153: Ppsp icassp17v10

Speaker Mining

• Server possesses a speech database
  – Ownership issues to be considered shortly
• Client mines it for a particular speaker without revealing the query or the response to the server

[Figure: the client sends a query to the server, which holds data owned by the server and the compute power, and receives a response]

Page 154: Ppsp icassp17v10

Speaker Mining

• Setup: server retains an appropriately parameterized version of the speech
  – E.g. i-vectors
• Client queries with a similarly parameterized query vector
• Server computes the response by direct matching or classification

[Figure: the client sends a query to the server and receives a response]
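For reference, direct matching in the clear is often done with cosine scoring of i-vectors. This NumPy sketch (random vectors stand in for real i-vectors; the function names are hypothetical) ranks stored vectors by cosine similarity to the query; the private variants discussed next have to replace exactly this step.

```python
import numpy as np


def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two i-vectors; higher suggests the same speaker."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def search(query: np.ndarray, database: np.ndarray, top_k: int = 3) -> np.ndarray:
    """Indices of the top_k stored i-vectors with the highest cosine score."""
    unit_db = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = unit_db @ (query / np.linalg.norm(query))
    return np.argsort(scores)[::-1][:top_k]
```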

Page 155: Ppsp icassp17v10

Speaker Mining: Server owns data

• Server possesses a speech database owned by the server
• Client mines it for a particular speaker without revealing the query or the response to the server

[Figure: the client sends a query to the server, which holds data owned by the server and the compute power, and receives a response]

Page 156: Ppsp icassp17v10

Feasibility
• Setup:
  – Client obtains a model for the speaker
  – Server evaluates it on the entire corpus
• Homomorphic Encryption:
  – Feasible under specific formalisms
    • Very slow
• SMC-based solutions:
  – Theoretically feasible under specific formalisms
    • Practical under the “honest but curious” security assumption
  – Very slow for large corpora

Page 157: Ppsp icassp17v10

Speaker Mining: Client owns data

• Server possesses a speech database owned by the client
• Client mines it for a particular speaker without revealing the data, query or response to the server

[Figure: the client sends a query to the server, which holds data owned by the client and the compute power, and receives a response]

Page 158: Ppsp icassp17v10

Feasibility
• Homomorphic Encryption:
  – Feasible but impracticably slow
• SMC-based solutions:
  – Feasible, but slow
  – Insecure
• Hashing-based solutions:
  – Client stores hashes of i-vector representations of speech on the server
  – Client matches query i-vectors of recordings from a speaker to recover other recordings by that speaker in the server corpus
  – Feasible, practicable
  – Potential security issues over many searches
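A sketch of the hashing-based approach using banded hashing: the client keeps the projection parameters secret and uploads only bits, and the server ranks stored hashes by Hamming distance. Random vectors stand in for i-vectors, and all sizes and names are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_bits, w = 100, 512, 4.0  # illustrative sizes; w roughly sets the leakage radius

# Client-private banded-hash parameters, never sent to the server.
V = rng.standard_normal((num_bits, dim))
b = rng.uniform(0.0, 2 * w, size=num_bits)


def smh(x: np.ndarray) -> np.ndarray:
    """Client side: one bit per projection, the stripe index modulo 2."""
    return (np.floor((V @ x + b) / w).astype(int) % 2).astype(np.uint8)


# Setup: the client uploads only hashes of its recordings' i-vectors (random stand-ins here).
ivectors = rng.standard_normal((200, dim))
server_store = np.array([smh(v) for v in ivectors])


def server_search(query_hash: np.ndarray, top_k: int = 5) -> np.ndarray:
    """Server side: rank stored hashes by Hamming distance without seeing any i-vector."""
    distances = (server_store != query_hash).sum(axis=1)
    return np.argsort(distances, kind="stable")[:top_k]
```

A query such as `smh(ivectors[42] + 0.1 * rng.standard_normal(dim))` should rank recording 42 first, since nearby i-vectors share most bits while unrelated ones differ in about half of them.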

Page 159: Ppsp icassp17v10

Speech Mining

• Server possesses a speech database
  – Ownership issues mentioned shortly
• Client mines it for a particular speaker without revealing the query or the response to the server

[Figure: the client sends a query to the server, which holds data owned by the server and the compute power, and receives a response]

Page 160: Ppsp icassp17v10

Mining

The server can potentially compute phonetic recognition on blocks of audio homomorphically, without seeing the audio

Challenges: how to mine them

Page 161: Ppsp icassp17v10

Mining Speech: Server owns data

• Server possesses a speech database owned by the server
• Client mines it for words/phrases without revealing them to the server
• Fundamentally no different from the client owning the data
  – Data must be encrypted prior to processing to hide the response from the server

[Figure: the client sends a query to the server, which holds data owned by the server and the compute power, and receives a response]

Page 162: Ppsp icassp17v10

Mining Speech: Client owns data

• Server possesses a speech database owned by the client
  – May not “see” it
• Client must mine it for patterns
  – Result private to the client

[Figure: the client sends a query to the server, which holds data owned by the client and the compute power, and receives a response]

Page 163: Ppsp icassp17v10

Feasibility
• Approach:
  – Client stores a phoneme decode of the data on the server in encrypted form
  – Client later searches for other phonetic patterns, also privately
• SMC-based solutions:
  – Theoretically feasible, impractical
• Homomorphic Encryption:
  – Feasible, slow

Page 164: Ppsp icassp17v10

Mining Speech: Third party owns data

• Server possesses a speech database owned by one or more third parties
  – May not “see” it
• Client must mine it for patterns
  – Only works if:
    • Query and result may be exposed to the server
    • Query may be exposed to third parties

[Figure: the client sends a query to the server, which holds data owned by a third party and the compute power, and receives a response]

Page 165: Ppsp icassp17v10

Feasibility
• Approach:
  – Data owners upload their data to the server in encrypted form
  – Client/server broadcasts the query
  – Data owners provide access
• Functional-encryption-based solution
  – Feasible, but will not scale

Page 166: Ppsp icassp17v10

Verification

• Client enrolls with the server using private speech data
  – Server never sees the data in the clear
• Client attempts to authenticate using private data
  – Server never sees the data
  – Server authenticates (the result stays with the server)

[Figure: the client sends a query to the server, which holds the model and the compute power, and receives a response]

Page 167: Ppsp icassp17v10

Speaker Verification

• Client computes a parameterization of the voice (e.g. i-vector) and sends it to the server for registration
• Client queries with a similarly parameterized query vector
• Server matches the query to the model to decide whether to authenticate

[Figure: the client registers a model with the server, then sends a query and receives a response]

Page 168: Ppsp icassp17v10

Feasibility
• SMC-based solutions:
  – Theoretically feasible, impractical
• Homomorphic Encryption
  – Feasible, but impractical
• Functional encryption
  – Client computes features (e.g. i-vectors) during enrollment
  – Client ships features (e.g. i-vectors) for authentication
    • Server sees neither
  – Feasible, but expensive
• Hashing
  – Same setup as for functional encryption
  – Feasible and currently practicable, minimal computational overhead
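With hashing, verification reduces to a threshold on Hamming distance. This sketch (random stand-in i-vectors, a hypothetical threshold of 0.35 and hypothetical function names) shows the server-side decision; the client computes the hashes with its private parameters, so the server sees neither the features nor the parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, num_bits, w = 100, 512, 4.0
V = rng.standard_normal((num_bits, dim))    # client-private parameters
b = rng.uniform(0.0, 2 * w, size=num_bits)


def smh(x: np.ndarray) -> np.ndarray:
    """Client side: banded hash of an i-vector (stripe index modulo 2)."""
    return (np.floor((V @ x + b) / w).astype(int) % 2).astype(np.uint8)


def verify(enrolled_hash: np.ndarray, probe_hash: np.ndarray, threshold: float = 0.35) -> bool:
    """Server side: accept if the normalized Hamming distance is below the threshold."""
    return float(np.mean(enrolled_hash != probe_hash)) < threshold
```

In practice the threshold would be tuned on development data to balance false accepts against false rejects.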

Page 169: Ppsp icassp17v10

Section 4: SOME CONCLUSIONS AND DISCUSSIONS

Page 170: Ppsp icassp17v10

Some conclusions and discussions

• So where are we now?
  – What can we solve?
  – Where must we go?

• What if homomorphic encryption becomes a reality?

• The immediate and the distant future

Page 171: Ppsp icassp17v10

State of the union and future

• A variety of tools exist
  – But they are generally too inefficient or limited in scope
  – Not sufficiently advanced to provide generic solutions
  – Rapid advances are occurring
• The majority of applications remain infeasible
  – Primarily for computational reasons
  – But also because of theoretical limitations of current tools
  – HE and FE are partial solutions, but may never provide complete solutions
    • Theoretical limitations

Page 172: Ppsp icassp17v10

But the problem is real and serious

• Speech-based services are more popular than before
• The issues of privacy and security are increasingly relevant
• Future solutions:
  – Improvement in current tools
  – Improvement/modification in the manner in which we perform speech tasks, to make them better suit the tools
  – Development must happen in tandem
  – Much work remains

Page 173: Ppsp icassp17v10

Links :

• Lecture by Bhiksha Raj on Vimeo: https://vimeo.com/87341704

• With the slides at: http://mlsp2012.conwiz.dk/fileadmin/lectures/mlsp2012_Raj.pdf

Page 174: Ppsp icassp17v10

Further readings

[Aguilar, 2013] Aguilar-Melchor C., Fau S., Fontaine C., Gogniat G. & Sirdey R. “Recent Advances in Homomorphic Encryption: a Possible Future for Signal Processing in the Encrypted Domain”, IEEE Signal Processing Magazine Vol 30:2.

[Boufounos, 2011] Boufounos, P. & Rane, S. “Secure Binary Embeddings for Privacy Preserving Nearest Neighbors”, in Proc. IEEE Workshop on Information Forensics and Security, Brazil, Dec. 2011. MERL TR2011-077

[Gentry, 2009] Gentry, C. “A fully homomorphic encryption scheme”, PhD thesis, Stanford

[Gomez, 2016] Gomez-Barrero, M., Fierrez J. & Galbally, J. “Variable-length Template Protection based on Homomorphic Encryption with Application to Signature Biometrics”, 4th International Workshop on Biometrics and Forensics.

[Jimenez, 2015] Jimenez A., Raj B. “Secure Modular Hashing”, Proc. IEEE Workshop on Information Forensics and Security.

[Jimenez, 2017] Jimenez A., Raj B. “Privacy preserving distance computation using somewhat trusted third parties”, Special Session on “Privacy Preserving Signal Processing”, ICASSP

[Naehrig, 2011] Naehrig, M., Lauter, K. & Vaikuntanathan, V. “Can Homomorphic Encryption be Practical?”, Proc. of the 3rd ACM Workshop on Cloud Computing Security, pp 113-124

Page 175: Ppsp icassp17v10

[Pathak, 2013] Pathak M., Raj B., Rane S., Smaragdis P. “Privacy-preserving speech processing: cryptographic and string-matching frameworks show promise”, IEEE Signal Processing Magazine 30:2, pp. 62-74, March 2013

[Portelo, 2015] Portelo J., Trancoso I., Raj B. “Logsum using Garbled Circuits”, PLoS ONE, 10(3): e0122236

[Rane, 2013] Rane, S & Boufounos, P-T. “Privacy-preserving Nearest Neighbor Methods: Comparing Signals without revealing them”, MERL, TR 2013-004, IEEE Signal Processing Mag. Vol 30:2, pp. 18-28, Feb 2013.

[Smaragdis, 2007] Smaragdis, P. & Shashanka, M. “A framework for Secure Speech Recognition”, IEEE Trans. ASLP Vol 15:4, pp 1404-1413.

[vanDijk, 2010] Van Dijk, M. & Juels, A., “On the impossibility of cryptography alone for Privacy Preserving Cloud Computing”, Proc. of HotSec.

[Xie, 2015] Xie, P., Bilenko, M., Finley, T., Gilad-Bachrach, R., Lauter, K. & Naehrig, M., “Crypto-Nets: Neural Networks over Encrypted Data”, ICLR, arXiv: 1412.6181

[Zyskind, 2015] Zyskind, G., Nathan O. & Pentland, A., “Decentralizing Privacy using Blockchain to Protect Personal Data”, IEEE Workshop on Security and Privacy, San Jose