
ICASSP-2017 1

Privacy Preserving Speech Processing

Gérard Chollet (IV & CNRS-IMT)
Jean-Jacques Quisquater (UCL)

Bhiksha Raj (CMU)

www.intelligentvoice.com

Some Issues

• SIRI (or a hacker who breaks into SIRI) can
  – Use (edit) your voice recordings to impersonate you
  – Learn about you
    • Your identity, gender, nationality (accent), emotional state, ...
  – Track you from uploads / communications of voice recordings
• Nothing specific to SIRI: same issues with Google Now, Alexa, Cortana, …
• Not a futuristic scenario
  – Every time you use your voice, you leave a print behind!

More problems

• Doctors / Lawyers / Government agencies / Banks wish to use a cloud based speech recognition service
  – But can’t: HIPAA / CNIL / laws prevent them from exposing the data
• Speech data warehouses could be mined for useful market patterns
  – But the audio also contains recordings of people reciting their credit card numbers, social security numbers, etc.

Speech Recognition System


Opinions

How happy are we about living our lives in the thrall of the big harvesters of data, the Googles, the Amazons, the Apples?

Source: “We live in the Big Cloud: And we hate it… Is it time for Hipster IT?”, Nigel Cannings, CTO, Intelligent Voice
https://hackernoon.com/we-live-in-the-big-cloud-and-we-hate-it-is-it-time-for-hipster-it-1f130a44d2b8#.tqb3xnsdl
https://surveys.google.com/reporting/survey?survey=fwn6wwimqoqkezlhur3zrdh2oe

The Problem

• Security: NSA/GCHQ must monitor calls for public safety
  – Caller may be a known miscreant / terrorist
  – Call may relate to planning subversive activity
• The gist of the problem:
  – NSA is possibly looking for key words or phrases
    • Did we hear “bomb the pentagon”?
  – Or if some key people are calling in
    • Was that Ayman al Zawahiri’s voice?
• But must have access to all audio to do so
  – Including recordings by perfectly innocent people

The NSA Problem as a Metaphor

• Telephone company unwilling to expose audio to NSA
  – May provide encrypted data to NSA
• NSA cannot expose what it is trying to find to the telephone company
  – May provide it in encrypted form though

Abstracting the problem

• Data holder willing to provide encrypted data
  – A locked box
• Mining entity willing to provide encrypted keywords
  – Sealed packages
• Must find if keywords occur in data
  – Find if contents of sealed packages are also present in the locked box
    • Without unsealing packages or opening the box!
• Data are spoken audio

Outline of the tutorial

• Section 1: INTRODUCTION: PRIVACY AND SPEECH (10 mins)

• Section 2: CRYPTO AND LSH TOOLS (80 mins)
• (Coffee break)
• Section 3: PRIVACY FOR SPEECH (80 mins)
• Section 4: CONCLUSIONS and DISCUSSIONS (10 mins)

INTRODUCTION: PRIVACY AND SPEECH

Section 1

Introduction: Privacy and Speech

• The problem of privacy
• Speech applications
• Privacy issues in speech applications

Current market targets for IV

Intelligent Voice® takes your company's phone calls (and email and IM) and turns them into smart data

Our motivation

• Clients do not want to maintain HW and SW!
• Sensitive data
• GPU hardware expensive
• Cloud cheap

Privacy Preserving Speech to Text

[Diagram: Intelligent Voice and the Client exchange 1) f (lex, AM, LM, Search), 2) E(data, f), 3) E(f(data))]

1) The Client receives f = (Software and Models) from Intelligent Voice
2) The Client automatically encrypts both f and his data with his private KEY
3) The Cloud returns the encrypted result
4) The Client decrypts the encrypted result using his private KEY

TOPICS IN SPEECH PROCESSING

[Diagram: a spoken utterance (“Bla-bla …”) carries IDENTITY, PHONETICS, LANGUAGE, MESSAGE and EMOTIONS; the processing topics shown are RECOGNITION, GRAPHEMES TO PHONEMES, SYNTHESIS, ANALYSIS, COMPRESSION and STORAGE, RESTITUTION, CODING and TRANSMISSION]

A Brief Primer on Speech Processing Tasks

• Speaker Identification (biometrics, multi-class): Which one of Alice, Bob, Carol, Dave, … are you?
• Speaker Verification (biometrics, binary): Are you really Bob? Yes/No
• Speech Recognition: You said ”hello, world!”

• All are pattern classification tasks
• Not addressing secure communication of speech (much literature on this topic).

Automatic Speech Processing Technologies

• Lexical content comprehension
  – Recognition
    • Determining the sequence of words that was spoken
  – Keyterm spotting
    • Detecting if specified terms have occurred in a recording
• Biometrics and non-lexical content recognition
  – Identification
    • Identifying the speaker
  – Verification/Authentication
    • Confirming that a speaker is who he/she claims to be
• All of these involve statistical pattern classification

Biometric Applications : Speaker ID

• This biometric application deals with determining the identity of the speaker.
$$\hat{C} = \arg\max_{C \in \mathcal{C}} \sum_t \log P(X_t; \lambda_C)$$
• Here, the set $\mathcal{C}$ is a closed set of candidate speakers for a recording
• In open set speaker ID, speaker verification needs to be performed on the selected candidate.

Biometric Applications: Speaker ID

• C is a set of “candidate” speakers for a recording
  – Parameters of their models are learned from data for the speaker
• The set C may include a “Universal” speaker representing the “none-of-the-above” option
  – The parameters $\lambda_U$ for the universal speaker are learned using data from many speakers
  – U is often called a Universal Background Model

Biometric Applications: Speaker Verification

• A user claims an identity S with some data D
• System must confirm if the user is who he claims to be
• C consists of S and universal speaker U
  – The parameters $\lambda_S$ for speaker S are obtained by adapting $\lambda_U$ to data from the speaker S
• Decision is based on comparing the likelihoods that D could have been generated by S or U
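As a rough illustration of the two biometric decisions described in these slides, the sketch below scores a short sequence of feature vectors against diagonal-covariance Gaussian mixture models: identification takes the argmax over candidate speakers, and verification compares the claimed speaker's likelihood with that of the universal speaker U. The model parameters, the frames and the zero threshold are made-up toy values, and the function names are hypothetical.

```python
import math

def gmm_loglik(frames, weights, means, variances):
    """Total log-likelihood of the frames under a diagonal-covariance GMM.
    Frames are treated as independent, so per-frame log-likelihoods add up."""
    total = 0.0
    for x in frames:
        comp = []
        for w, mu, var in zip(weights, means, variances):
            ll = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                ll += -0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
            comp.append(ll)
        m = max(comp)  # log-sum-exp over mixture components
        total += m + math.log(sum(math.exp(c - m) for c in comp))
    return total

def identify(frames, models):
    """Closed-set speaker ID: pick the candidate with the highest log-likelihood."""
    return max(models, key=lambda name: gmm_loglik(frames, *models[name]))

def verify(frames, speaker_model, ubm, threshold=0.0):
    """Verification: log-likelihood ratio of claimed speaker S against universal speaker U."""
    llr = gmm_loglik(frames, *speaker_model) - gmm_loglik(frames, *ubm)
    return llr > threshold, llr

# Toy 2-D models as (weights, means, variances); all numbers are hypothetical.
ubm = ([0.5, 0.5], [[0.0, 0.0], [4.0, 4.0]], [[4.0, 4.0], [4.0, 4.0]])
bob = ([0.5, 0.5], [[1.0, 1.0], [3.0, 3.0]], [[0.5, 0.5], [0.5, 0.5]])
alice = ([0.5, 0.5], [[-2.0, 0.0], [6.0, 5.0]], [[0.5, 0.5], [0.5, 0.5]])

frames = [[1.1, 0.9], [2.9, 3.2], [0.8, 1.2]]
best = identify(frames, {"bob": bob, "alice": alice})
accepted, llr = verify(frames, bob, ubm)
```

In a real system the frames would be MFCC-like vectors and the speaker model would be adapted from the background model; here both are simply written down by hand.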


Feature Computation

• Do not work on speech signal
  – Work on sequence of feature vectors computed from speech
    • E.g. MFCC vector sequence
• “Speech recording” → sequence of feature vectors derived from it
  – X is actually a sequence of feature vectors
    • $X = [X_0\ X_1\ \dots\ X_{T-1}]$
• For the privacy-preserving frameworks we will assume that the user’s client device can compute these features.

Learning model parameters

• GMM parameters:
  – Adapting Bob’s GMM to Alice’s data
    • Only adapt means
  – Outcome: Bob gets encrypted means $m_k$ for each Gaussian
  – (Pathak and Raj, Interspeech 2011)
• HMM parameters
  – Similarly complicated
  – (Smaragdis and Shashanka, IEEE TASLP, May 07)

A More Complex Model: Hidden Markov Models

• “Probabilistic function of a Markov chain”
• A dynamical system for time-varying processes

Three Basic HMM Problems

• What is the probability that an HMM will generate a specific observation sequence?
• What is the most probable state sequence for a given observation sequence?
  – The state segmentation problem
• How do we learn the parameters of the HMM from observation sequences?
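The first two problems have compact dynamic-programming solutions. The sketch below shows the forward algorithm (probability of an observation sequence) and the Viterbi algorithm (most probable state sequence) for a discrete-output HMM. The two-state model and its probabilities are a toy example chosen for illustration, not taken from the tutorial; the third problem (parameter learning, e.g. Baum-Welch) is not shown.

```python
def forward(obs, pi, A, B):
    """Problem 1: P(obs | HMM), summing over all state sequences."""
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * A[r][s] for r in range(n)) * B[s][o]
                 for s in range(n)]
    return sum(alpha)

def viterbi(obs, pi, A, B):
    """Problem 2: most probable state sequence and its joint probability."""
    n = len(pi)
    delta = [pi[s] * B[s][obs[0]] for s in range(n)]
    back = []
    for o in obs[1:]:
        # best predecessor of each state s at this time step
        prev = [max(range(n), key=lambda r: delta[r] * A[r][s]) for s in range(n)]
        delta = [delta[prev[s]] * A[prev[s]][s] * B[s][o] for s in range(n)]
        back.append(prev)
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for prev in reversed(back):  # trace the best path backwards
        state = prev[state]
        path.append(state)
    return path[::-1], max(delta)

# Toy model: 2 hidden states, 3 output symbols (hypothetical numbers).
pi = [0.6, 0.4]                          # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]             # A[r][s] = P(next state s | state r)
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]   # B[s][o] = P(symbol o | state s)
obs = [0, 1, 2]

likelihood = forward(obs, pi, A, B)
best_path, best_prob = viterbi(obs, pi, A, B)
```

Practical implementations work with log probabilities to avoid underflow on long sequences.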

CRYPTOGRAPHIC TOOLS

Section 2

Cryptography

• What is secure computation (JJQ)
• Homomorphic Encryption (JJQ)
  – What is encryption. Public key and symmetric encryption.
  – Homomorphic encryption.
    • Partially homomorphic encryption
    • Fully homomorphic encryption
    • State of the art and limitations
• Secure multiparty computation (JJQ)
  – Basic ideas
  – Garbled circuits
  – Secure protocols
  – Obfuscation
  – Secret sharing
• Zero Knowledge proofs (JJQ)

Section 2.1

• What is secure computation?
• Homomorphic Encryption
• Secure Multiparty Computation
• Zero Knowledge proofs

What is Secure (multi-party) Computation

Methods that allow two or more parties to jointly compute a function over their inputs while keeping those inputs private.

Cryptography Basics

Encryption $E_{K_1}(\cdot)$: Plaintext (M), Encryption Key ($K_1$) → Ciphertext (C)
$$E_{K_1}(M) = C$$

Decryption $D_{K_2}(\cdot)$: Ciphertext (C), Decryption Key ($K_2$) → Original Plaintext (M)
$$D_{K_2}(C) = M$$
Lossless transformation!

A good cryptosystem: all the security is inherent in the knowledge of keys, and none in the knowledge of algorithms (Kerckhoffs's principle)

Cryptography Basics
Symmetric Cryptosystem (same key)

[Diagram: the Client encrypts “Hello!” to “t4$We9”; the System decrypts it back to “Hello!”. In general: encryption key = decryption key]

Cryptography Basics
Public-key (asymmetric) Cryptosystem

First described in (Diffie and Hellman, 1976)

[Diagram: after a public key exchange, the Client encrypts “Hello!” to “t4$We9” and the System decrypts it back to “Hello!”]

• Anybody can encrypt!
• Only the receiver can decrypt!

Properties of Ideal Cryptosystem

• Encryption not invertible (without key)
• Semantic security (without key)
• Ciphertexts jointly uninformative (with high probability, no information)

Jointly Uninformative: Mutual Information

• Mutual information between any pair/collection of ciphertext messages is 0
• Semantic security model: mutual information between two separate encryptions of the same message is 0! We cannot distinguish which ciphertext is coming from a given plaintext.

[Plot: mutual information between ciphertexts vs. distance between plaintexts]

Comparing ciphertexts directly is USELESS!

Section 2.2


Homomorphic Encryption

Allows for operations to be performed on ciphertexts without requiring knowledge of corresponding plaintexts

First idea based on RSA. Need to be careful about the use.

Homomorphism Example: RSA
Public key encryption scheme (Rivest, Shamir, Adleman ‘77)

Based on number theory and modulo n operations.
$$E[X] = X^g \bmod n$$

Homomorphic multiplication
$$E[X]\,E[Y] = X^g\,Y^g = (XY)^g = E[XY]$$

• Cannot perform simple addition, however.
• It is not semantically secure
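The multiplicative property of RSA is easy to check numerically. The sketch below uses textbook RSA with tiny, insecure primes picked purely for illustration; the public exponent `e` plays the role of the exponent g on the slide.

```python
# Textbook RSA with toy parameters (illustration only, not secure).
p, q = 61, 53
n = p * q                    # modulus, 3233
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # public exponent (the slide's g)
d = pow(e, -1, phi)          # private exponent; modular inverse needs Python 3.8+

def enc(x):
    return pow(x, e, n)

def dec(c):
    return pow(c, d, n)

x, y = 7, 12
product_ct = (enc(x) * enc(y)) % n    # multiply ciphertexts only
recovered = dec(product_ct)           # equals x * y mod n

# Encryption is deterministic: the same plaintext always gives the same
# ciphertext, which is why textbook RSA is not semantically secure.
same_ciphertext = enc(x) == enc(x)
```

The product only decrypts to x·y as long as x·y is smaller than n; beyond that it wraps around modulo n.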

Homomorphism Example: Paillier
Public key encryption scheme (Pascal Paillier, Eurocrypt 99).
$$E[X] = g^X$$

• Homomorphic addition (useful for counting: application to voting)
  – Encrypted numbers can be added together.
$$E[X]\,E[Y] = g^X\,g^Y = g^{X+Y} = E[X+Y]$$
  – Encrypted numbers can be added to non encrypted scalars.
• Homomorphic multiplication:
  – Encrypted numbers can be multiplied by a non encrypted scalar.
$$E[X]^Y = (g^X)^Y = g^{XY} = E[XY]$$
• Cannot multiply two encrypted numbers (partially homomorphic)
• It is semantically secure!
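A minimal Paillier sketch makes these properties concrete. It uses the common choice g = n + 1 and toy primes, both assumptions made for this example. Unlike the simplified form E[X] = g^X on the slide, each encryption also multiplies in a random factor r^n, which is what makes two encryptions of the same value look different.

```python
import math
import random

def keygen(p=293, q=433):
    """Toy key generation (tiny primes, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    g = n + 1
    mu = pow(lam, -1, n)          # valid shortcut when g = n + 1
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:    # r must be invertible modulo n
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

pub, priv = keygen()
n2 = pub[0] ** 2
cx, cy = encrypt(pub, 15), encrypt(pub, 27)

sum_ct = (cx * cy) % n2                       # E[15] * E[27]  -> E[42]
scaled_ct = pow(cx, 3, n2)                    # E[15] ** 3     -> E[45]
shifted_ct = (cx * pow(pub[1], 5, n2)) % n2   # E[15] * g**5   -> E[20]
```

All arithmetic is modulo n, so sums and scalar products must stay below n to be read back directly.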

A fully homomorphic encryption = FHE (addition and multiplication)

• Trivial (only for illustration) example: encoding a bit $b$
$$E[b] = b + 2r_b + k_b\,p$$
  – $r_b$ is a random integer
  – $k_b$ is some M-bit integer
  – $p$ is a very large, odd L-bit number, L >> M.
    • $p$ is the key
• To decode:
$$b = (E[b] \bmod p) \bmod 2$$

A fully homomorphic scheme (continued)

• Supports both addition and multiplication in the encrypted domain

FHE and computation

• All computations can be expressed as a combination of multiplications and additions, i.e., we can work with polynomials
• An encryption scheme that permits both addition and multiplication in the encrypted domain permits computation of any function in the encrypted domain
• FHE schemes are typically expressed in terms of homomorphic computation of NAND gates (a universal gate)
  – A NAND B = 1 – A.B [Requires addition and multiplication]
  – All computations can be expressed as circuits, which can be expressed as combinations of NAND gates
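The toy bit-encoding scheme from the preceding slides, a bit plus even noise plus a multiple of the secret odd number p, is small enough to try out. The sketch below checks that adding ciphertexts gives XOR, multiplying gives AND, and that a NAND gate follows from the two. The key size and noise sizes are arbitrary toy values, and the NAND is computed as 1 + A·B, which agrees with 1 – A·B modulo 2 and keeps the noise non-negative.

```python
import random

P = 1000003   # secret key: an odd number (toy size; the slides call for a very large p)

def enc(bit):
    r = random.getrandbits(4)         # small random noise term
    k = random.getrandbits(16) + 1    # random multiple of the key
    return bit + 2 * r + k * P

def dec(c):
    return (c % P) % 2

def nand(ca, cb):
    # 1 - A.B and 1 + A.B agree modulo 2; no key is needed to evaluate the gate
    return 1 + ca * cb

results = {}
for a in (0, 1):
    for b in (0, 1):
        ca, cb = enc(a), enc(b)
        results[(a, b)] = (dec(ca + cb),        # XOR of the two bits
                           dec(ca * cb),        # AND of the two bits
                           dec(nand(ca, cb)))   # NAND of the two bits
```

Decryption is only correct while the accumulated noise stays below p, which limits how many gates can be chained.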


Utility of Homomorphic Encryption

f(x) = ?? I can evaluate f(.) as a service

E [f(x)]

Public key exchange

Homomorphic Encryption: Problems

• The “noise” doubles in size after each bit-product operation (the product of two M-bit numbers has 2M bits)
  – Noise quickly becomes greater than p
    • Decryption (which requires “mod p”) fails
    • Even in this trivial symmetric key toy system
• Each bit-level operation now requires an operation over L bits
  – L is size of p in bits, and must be large

Fully Homomorphic Encryption (FHE)

• Unclear whether fully homomorphic schemes were even possible until 2009
• Breakthrough work by Craig Gentry (2009, 2010)
  – Solves noise problem
• Still not very practical but an active area of research
  – Individual bit-level computations still take too much time
  – Computations 100,000x to 1,000,000,000,000x slower than unencrypted computation

Section 2.3

Secure Multiparty Computation (SMC)

• A group of untrusting parties desire to compute a joint function of their private data
• “Ideal” situation: All of them send their data to a trusted third party
  – Who computes the function and only reveals results

Practical SMC (no possible leak)

• Parties communicate directly with one another following specified protocols

• Outcome ideally identical to “ideal” case– Function computed without revealing data

• Protocol: A sequence of steps, involving two or more parties, to accomplish a computational task

Practical SMC

• Employs many tools:
  – Conventional encryption
  – Partially homomorphic encryption
  – Oblivious transfer
    • Select one of N items without revealing which one
  – Noise masking
    • Hiding information by adding random noise
  – Secret sharing
    • Share data between M people so that at least K of them must collaborate to expose it
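One well-known way to realise K-out-of-M secret sharing is Shamir's polynomial scheme; the slide does not name a specific scheme, so this choice is an assumption made for illustration. The secret is the constant term of a random polynomial of degree K-1 over a prime field, each person receives one point on the polynomial, and any K points recover the constant term by Lagrange interpolation.

```python
import random

PRIME = 2**61 - 1   # field modulus (a Mersenne prime), chosen for this example

def split(secret, k, m):
    """Create m shares; any k of them reconstruct the secret."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, m + 1):
        y = 0
        for c in reversed(coeffs):        # Horner evaluation of the polynomial
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = split(123456789, k=3, m=5)
from_first_three = reconstruct(shares[:3])
from_other_three = reconstruct([shares[1], shares[3], shares[4]])
```

Fewer than K shares reveal nothing about the secret, since every candidate value is consistent with some polynomial through the known points.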

SMC: A Trivial Example

• A group of people want to find out their average salary
  – Without revealing individual salaries

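SMC: A Trivial Example (problems!)

• A group wants to calculate the average salary
  1. The first person chooses a random r, adds his/her salary, and passes on $(x_1 + r)$
  2. Person i receives $(x_1 + \dots + x_{i-1} + r)$, adds his/her salary, and passes on $(x_1 + \dots + x_i + r)$
  3. The last person returns $(x_1 + \dots + x_n + r)$ to the first person
  4. The first person subtracts the random number r (additive masking) to obtain the sum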
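The additive-masking round can be simulated in a few lines. In this sketch a single function plays all parties in turn, and arithmetic is done modulo a large number (an assumption added here) so that the running total looks uniformly random to everyone except the first party.

```python
import random

def masked_average(salaries, modulus=2**64):
    """Simulate the ring protocol: each party only ever sees a masked running sum."""
    r = random.randrange(modulus)            # first party's secret mask
    running = (salaries[0] + r) % modulus    # first party adds own salary
    for x in salaries[1:]:
        running = (running + x) % modulus    # each following party adds its salary
    total = (running - r) % modulus          # first party removes the mask
    return total / len(salaries)

average = masked_average([50000, 62000, 48000, 70000])
```

One of the problems the slide title hints at: if the parties just before and just after person i compare the values they handled, the difference is exactly that person's salary.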

Secure Two-Party Computation

• Originally introduced by Yao in 1986
• Express all computation as a circuit
  – Alice has a circuit with some inputs
  – Bob has other inputs
  – Alice and Bob must compute circuit collaboratively without exposing their inputs to one another
• Garbled circuit
  – Can do this through a combination of oblivious transfer (OT) and symmetric encryption

[Diagram: one party: “I have a and c. I want the output f”; the other party: “I have b”]

Secure Two-Party Computation

• Define a number of “primitive” operations that Alice and Bob can perform together
  – Without revealing their respective inputs
  – By following standard protocols

• Decompose overall computation in terms of these primitives

• Collaboratively compute the chain of primitives for the final output

Secure Inner Product Protocol

• Alice has a vector X
• Bob has a vector Y
• They wish to compute <X,Y> without exposing their vectors to one another

[Diagram: Bob: “I have Y”; Alice: “I have X”; both: “We want <X,Y>”]

Secure Inner Product

• Alice generates public and private keys. She sends the public key $K_e$ to Bob
• She encrypts her vector using her public key and sends it to Bob:
  – Alice → Bob: Enc[X] = Enc[X[0]], Enc[X[1]], …, Enc[X[N]]
• Bob homomorphically multiplies Y[i] with X[i]:
$$\mathrm{Enc}[X[i]]^{Y[i]} = \mathrm{Enc}[X[i]\,Y[i]]$$
• He homomorphically adds the sample-wise products:
$$\prod_i \mathrm{Enc}[X[i]\,Y[i]] = \mathrm{Enc}\Big[\sum_i X[i]\,Y[i]\Big] = \mathrm{Enc}[\langle X,Y\rangle]$$
• Bob has E[<X,Y>]. He never saw X

Uses Paillier: $E[x] \propto K^x$, so $E[x]\,E[y] = E[x+y]$ and $E[x]^y = E[xy]$

[Diagram: Alice (“I have X”) sends $K_e$ and E[X] to Bob (“I have Y”); Bob ends up with E[<X,Y>]]

Secure Inner Product

• Bob has Enc[<X,Y>]
• Bob homomorphically subtracts a random number r to get Enc[<X,Y> - r]
  – And sends this to Alice who decrypts
  – Alice gets <X,Y> - r. Bob has r
  – They must add their answers to get <X,Y>.
• They have additive shares of the answer

[Diagram: Bob: “I have E[<X,Y>] and Ke”, then “I have r”; Alice: “I have <X,Y> - r”]
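The whole secure inner product exchange can be simulated end to end with a small Paillier implementation. In the sketch below one function plays both roles, the primes are toy-sized, g = n + 1 is assumed, and the vectors hold small non-negative integers; these are all simplifications for illustration. The two returned shares add up to the inner product modulo n.

```python
import math
import random

def keygen(p=293, q=433):
    """Toy Paillier keys with g = n + 1 (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n))      # public n, private (lambda, mu)

def enc(n, m):
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def dec(n, priv, c):
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n

def secure_inner_product(X, Y):
    # Alice: generate keys, encrypt her vector element-wise, send to Bob.
    n, priv = keygen()
    enc_X = [enc(n, x) for x in X]

    # Bob: Enc[x]**y = Enc[x*y]; multiplying ciphertexts adds the plaintexts.
    n2 = n * n
    acc = enc(n, 0)
    for cx, y in zip(enc_X, Y):
        acc = acc * pow(cx, y, n2) % n2
    r = random.randrange(n)                    # Bob's random share
    masked = acc * enc(n, (-r) % n) % n2       # Enc[<X,Y> - r]

    # Alice: decrypts the masked value and obtains her share.
    share_alice = dec(n, priv, masked)
    return share_alice, r, n

share_a, share_b, modulus = secure_inner_product([1, 2, 3], [4, 5, 6])
inner_product = (share_a + share_b) % modulus
```

Neither share on its own says anything about <X,Y>: Bob's is random by construction and Alice's is the true value shifted by a random amount she does not know.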

Primitive: Secure Inner Product (SIP)

• SIP: Alice has vector X. Bob has Y.
• Outcome:
  – Bob and Alice get additive shares $r_A$ and $r_B$ which must be added to get the answer: $r_A + r_B = \langle X,Y\rangle$
• Alternately, either Alice or Bob gets the entire answer

[Diagram label: E[<X,Y>]]

SMC Primitives

• General format: Computing simple function f(X,Y) of Alice’s private data X and Bob’s private data Y
• One of the following outcomes:
  – Both parties get random additive shares of the result
    • Alice gets $r_A$, Bob gets $r_B$
    • Actual result $f(X,Y) = r_A + r_B$
  – One party gets encrypted result Enc[f(X,Y)]
  – One party gets the complete result f(X,Y)

Examples of Primitives

• Secure inner product
  – f(X,Y) = <X,Y>
  – Also possible if Bob has E[Y]
• Secure max
  – $f(X,Y) = \max_i (X_i + Y_i)$
• Secure max-ID
  – $f(X,Y) = \arg\max_i (X_i + Y_i)$
• Several such primitives can be defined

Computing Complex functions with SMC Primitives

• Conventional computation: User Alice sends data to system Bob
  • Bob computes an algorithm
• SMC: Computation recast as a sequence of primitives
  • Alice and Bob compute primitives via SMC
  • Bob gets the result

Typical Assumptions

• Parties are semi-honest, i.e. honest-but-curious
  – The party tries to get as much information as possible from the result and the outputs of intermediate steps
  – However, the party does not act maliciously (e.g. by lying about the inputs used)
    • They follow the protocol correctly
• Can be “fixed” through “zero-knowledge proofs”
  – “Expensive” protocols that verify answers without knowing the answer

Section 2.4

Zero Knowledge Proofs (ZKPs)

• ZKP:
  – “Prover” has some information
  – “Verifier” wants to ensure she has it
  – But Prover will not reveal information to Verifier
  – She can use ZKPs to convince Verifier

• Peggy has a magic word to open a secret door in a cave

• Victor wants to pay for the secret, but not until he’s sure she knows it

• Peggy will tell the secret but not until she receives the money

Quisquater et al. ’89, figure from Wikipedia

[Figure: Victor and Peggy with a puzzle. Peggy: “I have its solution!”]

Peggy goes to a different room than Victor and chooses a random permutation σ of {1,…,9}

Victor can:
1. Choose one of the rows.
2. Choose one of the columns.
3. Choose one of the sub-boxes.
4. See the permuted version of the original puzzle.

Zero Knowledge Proofs (ZKPs)

• Assume that Peggy’s information is a solution to a hard problem
• Peggy converts her problem to an isomorphic one
• Peggy solves the new problem and commits answer
• Peggy reveals the new instance to Victor
• Victor asks Peggy either to
  – prove the instances are isomorphic; or
  – open the committed answer and prove it’s a solution
• Repeat n times
• Typical hard problems: finding graph isomorphisms or Hamiltonian cycles (NP-complete problems, but …)
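A closely related and simpler variant is the classic interactive proof for graph isomorphism, sketched below. It is not the commit-and-open protocol listed above: here Peggy's secret is the relabelling that maps graph G0 onto G1, and in each round she answers Victor's coin flip by revealing how a freshly scrambled copy H relates to either G0 or G1, never both. The graph, its size and the helper names are made up for this example.

```python
import random

def relabel(perm, edges):
    """Apply a vertex relabelling to a graph given as a set of (u, v) edges."""
    return {tuple(sorted((perm[u], perm[v]))) for u, v in edges}

def random_perm(n):
    p = list(range(n))
    random.shuffle(p)
    return p

def compose(outer, inner):
    """Relabelling that applies `inner` first, then `outer`."""
    return [outer[inner[i]] for i in range(len(inner))]

N = 6
G0 = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (0, 3)}
secret = random_perm(N)          # Peggy's secret: G1 = secret(G0)
G1 = relabel(secret, G0)

def run_round(G0, G1, secret, n):
    # Peggy: scramble G1 with a fresh random relabelling and publish H.
    sigma = random_perm(n)
    H = relabel(sigma, G1)
    # Victor: flip a coin to choose which graph Peggy must relate to H.
    challenge = random.randrange(2)
    if challenge == 1:
        response = sigma                      # shows H = sigma(G1)
        return relabel(response, G1) == H
    response = compose(sigma, secret)         # shows H = sigma(secret(G0))
    return relabel(response, G0) == H

all_accepted = all(run_round(G0, G1, secret, N) for _ in range(20))
```

A prover who does not know the secret can prepare for only one of the two challenges, so each round catches a cheater with probability 1/2; after n rounds the chance of fooling Victor is 2^-n.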


A Musical Conundrum

• Alice has just found a short piece of music on the web– Possibly from an illegal site!

• She likes it. She would like to find out the name of the song

Alice and her song

• Bob has a large, organized catalogue of songs
• Simple solution:
  – Alice sends her song snippet to Bob
  – Bob matches it against his catalogue
  – Returns the ID and metadata of the song that has the best match to the snippet

Alice has a problem

• Her snippet may have been illegally downloaded
• She may go to jail if Bob sees it
  – Bob may be the DRM police..

An Unacceptable Solution

• Alice distrusts Bob
  – So…
• Bob could send his catalogue to Alice to do the matching herself..
  – Really??
  – Bob’s catalogue is his IP.
  – Alice may be a competitor
    • Or a malicious person wanting to expose Bob’s catalogue
• Bob distrusts Alice
  – Will not send her his catalogue

Solving Alice’s Problem

• Alice could encrypt her snippet and send it to Bob
• Bob could work entirely on the encrypted data
  – And obtain an encrypted result to return to Alice!
• A job for Secure Multi-party Computation

Lessons Learned

• Possible to perform complex collaborative operations without revealing information!
  – Through careful use of cryptographic tools
• Illustrates a few concepts
  – Homomorphic encryption
  – SMC
  – Oblivious Transfer
  – Primitives

COFFEE BREAK!

Dealing with Speech

• Speech applications (GC)
• General formalism (GC)
• Speaker verification and diarization (GC)
• Recognition (GC)
  – Isolated word recognition and keyword spotting
  – Large Vocabulary Continuous Speech recognition
    • CNN, RNN, BLSTM, Sequence to Sequence, Attention
  – Spoken Dialog Systems, Speech to Speech translation
• Computation with privacy (BR)
  – Tools
  – Application to problems in biometrics
  – Application to problems in recognition
  – Application to retrieval
  – Computational and practical challenges
  – Hashing based solutions

Motivations for DNN approaches

Traditional ASR involves many hand-crafted stages of feature extraction: mfccs, deltas, delta-deltas, cmvns, …

Gaussian Mixture Models, i-Vectors, …

Hidden Markov Models

Weighted Finite State Automata

Viterbi, Expectation Maximisation, Baum Welch, …

There are many steps requiring extensive insight and expertise

New end-to-end deep learning approaches making significant simplifications to this work flow

Typically making use of DNNs, RNNs, and …

Computational power is simplifying this process

Traditional Speech Processing such as ASR is complicated

Convolution Networks: Brief History

Inspired by receptive fields in the visual cortex

Notable implementations:
• Fukushima’s NeoCognitron (1980)
• Explicit parallel implementations (1988)
• LeCun’s LeNet-5 (1998)
• Ciresan’s GPU implementation (2011)
• GoogLeNet (2014)

Fukushima, Kunihiko, ‘Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position,’ Biological Cybernetics 36 (4): 193-202, 1980
LeNet 5 (1998), image source: http://yann.lecun.com/exdb/lenet/

GoogLeNet

State-of-the-art winner of the ImageNet 2014 competition: classifying 1.2M images into 1K classes

Convolution neural network inspired by LeCun’s LeNet-5

Has 9 ‘Inception’ modules, multiple convolution sizes, and pooling in each module

Stochastic Gradient Descent used to train the network with ‘dropout’, which helps prevent overfitting

Szegedy, ‘Going deeper with convolutions,’ arXiv, 2014

GoogLeNet Structure

Topology consists of ‘Inception’ modules consisting of:

Convolutions – Filters for extracting features, filter size tends to be small in the early layers, bigger in later layers

Pooling – dimensionality reduction

Softmax loss for predicting classes at 3 progressive stages of the network

Other – concatenations for combining convolutions

‘Rinse and Repeat’ 9 times

NIST LRE Competition

6 language clusters, 20 dialects:

Arabic (Egyptian, Iraqi, Levantine, Maghrebi, Modern Standard)

Chinese (Cantonese, Mandarin, Min, Wu)

English (British, General American, Indian)

French (West African, Haitian Creole)

Iberian (Caribbean Spanish, European Spanish, Latin American Spanish, Brazilian Portuguese)

Slavic (Polish, Russian)

500+ hours audio

data set very unbalanced

2015 NIST Language Recognition Evaluation, http://www.nist.gov/itl/iad/lre15.cfm

Spectrogram Convolution Network

[Diagram: processing pipeline with components labelled RASTA, CuFFT, MATLAB, SOX, PYTHON]

Based on Nvidia’s Digits implementation of GoogLeNet

Converted speech to 256x256 pixel spectrograms

Tried different spectral representations and coding…

GoogLeNet Processing

Database:
• 501248 spectrograms for training
• 24352 spectrograms for validation
• 51501 spectrograms for testing

GoogLeNet Processing

[Figure sequence: successive stages of the network]
• Apply convolutions to extract primitives such as edges
• Object parts extracted
• Full spectral features, e.g. phones, words
• Refinement of accuracy
• Dialect classification (Loss1, Loss2)

NIST LRE Results

Accuracy: 83.99% (Top-1), 98.89% (Top-5)

[Bar chart: accuracy (0 to 100) for each of the 20 dialects]

What about Speech Recognition?

We can use spectrograms to train a convolution net to learn to recognise languages and dialects

Can we apply the same idea to a speech recognition task?

We decided to test this idea using the TIMIT corpus

TIMIT is the most accurately transcribed corpus in existence

Phoneticians exhaustively transcribed the position of the phones in every utterance of the corpus

TIMIT Speech Corpus

1.4M spectrograms for the training set

Sliding window used for timing

4 to 5 phones in each window

61 Phoneme Classes

Speech Recognition

Automatic Speech to Text transcription and Indexing

NTIMIT Speech Corpus Added

Conclusions of these experiments with CNNs

Convolution networks can do more than classify static images

They were inspired by receptive fields, which are temporal classifiers

Convolution networks automate the feature extraction process

Initial findings with NTIMIT (telephone speech version) hint at much more noise robustness than classical ASR approaches

Technological advances in deep learning, such as embedded deep learning, are being designed with convolution networks in mind

What’s next?

Automatic Speech to Text transcription and Indexing
Sequence to Sequence Model

[Diagram: encoder LSTMs over the inputs (…, xn-1, xn, </s>), attention, decoder LSTMs and a softmax layer producing the outputs (y1, …, </s>)]

Encrypted search based on phonetic strings

Lexicons do not contain every possible word

Developed Seq2Seq model for converting plain text to phonetic strings

Privacy challenges in speech processing

• Privacy challenges in speech processing are similar to those for other forms of data
• We will assume the presence of two entities
  – A “capable” server entity
    • “Capable” in the sense of computationally powerful, and with memory and storage
  – A “lightweight” client entity
    • “Lightweight” in the sense of weak computational abilities.

Assumed framework

• Who owns what:
  • Query is always private to client
    – Response to client may be private
    – Alternately, response may be exposed to server
  • Data/model on server may be “owned” by ..
    – Server, not exposable to client
    – Client, not exposable to server
    – Third party, not exposable to server or client

[Diagram: Client and Server (Model/Database, compute power); the Server returns a Response]

When response is private

• Computation must be performed on private query to obtain private results
• Model/Database may be private or public
• Situation ideal for Homomorphic Encryption

[Diagram: Client, Server, Response]

HE basic formalism

• The basic idea of HE still follows the original proposal by Craig Gentry
• These schemes build on a somewhat homomorphic encryption scheme, e.g.
$$E[b] = b + 2r_b + k_b\,p$$
• To decode:
$$b = (E[b] \bmod p) \bmod 2$$

The problem with Somewhat HE

• Addition:
$$E[b] + E[c] = (b + 2r_b + k_b p) + (c + 2r_c + k_c p) = (b+c) + 2(r_b + r_c) + (k_b + k_c)p = (b+c) + 2r + kp = E[b+c]$$
• The noise increases by 1 bit
  – Addition of two numbers adds one bit
• Multiplication:
$$E[b] \cdot E[c] = (b + 2r_b + k_b p)(c + 2r_c + k_c p) = bc + 4 r_b r_c + kp + 2(\text{other terms}) = E[bc]$$
• The noise doubles in bits
  – Multiplying two numbers doubles the number of bits
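The effect of the doubling noise on circuit depth can be observed directly with the toy scheme. The sketch below repeatedly squares an encryption of the bit 1 and tracks the size of the noise term; once the noise would exceed the key, decryption can no longer be trusted. The 127-bit key and the 9-bit starting noise are arbitrary choices made so that the outcome is the same on every run.

```python
import random

P = (1 << 127) - 1     # toy secret key: an odd 127-bit number

def enc(bit):
    r = random.getrandbits(8) | 0x80     # 8-bit noise term with its top bit set
    k = random.getrandbits(64) + 1
    return bit + 2 * r + k * P

def dec(c):
    return (c % P) % 2

c = enc(1)
noise = c % P                  # b + 2r: exactly 9 bits here
history = [noise.bit_length()]
depth = 0                      # number of multiplications that still decrypt correctly
while True:
    c = c * c                  # homomorphic AND: 1 AND 1 = 1
    noise = noise * noise      # the true noise, tracked without wrap-around
    if noise >= P:             # noise no longer fits below the key: "mod p" breaks
        break
    assert dec(c) == 1
    depth += 1
    history.append(noise.bit_length())
```

With these sizes the noise goes from 9 bits to roughly 18, 36 and 72 bits, and the fourth squaring would need about 144 bits, more than the 127-bit key allows.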

cryptographic solutions© Chollet, Petrovska, Raj


Problem with Somewhat HE and bootstrapping

• An L-level circuit
  – Each level comprises multiplications
    • At each level, the number of bits doubles
    • Very soon we run out of bits
  – The level of noise is greater than the width of the encryption
• Solution: Bootstrapping
  – DECRYPT in the encrypted domain

Bootstrapping

• Decryption is just another arithmetic operation
• Like any other arithmetic operation, it can be performed in the encrypted domain using HE to obtain an encrypted result
• But the result will be the encrypted decryption of the encrypted data
  – I.e. just another encryption of the original data
  – But with reset noise
• Bootstrapping permits computation of circuits of arbitrary depth, but at a great cost

The cost of HE

• The computation speed of HE is measured via the per-gate computation time (ratio of encrypted to cleartext computation time)
  – Expressed in terms of $\lambda$, where $\lambda$ is a security parameter (typically 100)
    • Actually a polynomial in the security parameter $\lambda$
  – Bootstrapping: Original
  – Bootstrapping also requires hiding the private key in the public key and hoping subset sum is complicated

A less general solution

• If the depth of the circuit is known a priori, design the encryption such that the noise does not fold over within the required computations
  – Encryption customized to circuit depth

Closer to state-of-the-art

• Brakerski Gentry Vaikuntanathan (BGV) encryption
  – Uses the learning with errors problem as basis for encryption
  – Uses a remodularization step to reduce noise increase with computations
    • Increases linearly with layers, instead of exponentially
  – Customizes encryption to circuit depth
  – Processes multiple bits simultaneously
    • Enables parallel computation
• Improved speed BGV: or ( is depth of circuit)
• No need for public key to carry private key
• Limitation: Circuit depth

Publicly available tools

• HElib
  – Brakerski Gentry Vaikuntanathan (BGV) encryption
  – https://github.com/shaih/HElib
  – Incorporate it into your code and test
• Typical results
  – Speed (from “Subring Homomorphic Encryption” by Arita and Handa): time in ms, on 2.8 GHz Celeron
  – Circuit depth limitation also remains

HE: Other Limitations

• Can only perform computations that can be expressed as polynomials
• Cannot perform branching
  – Generalized implementation of “IF” not possible
    • Can be done through expensive circuit expansion
• No arbitrary looping
  – Cannot do the “IF” required to decide when to stop
• No binary search
• Division is not possible
• No max/argmax
• Limit on number of operations

Public tools: Cryptonet

• Fully-homomorphic implementation of Convolutional Neural Networks
  – Replaces ReLU activation with polynomial
  – Replaces Maxout with mean operation
  – Fixed depth of circuit
  – Weights are assumed to be cleartext, do not consume bits of noise
  – BGV encryption optimized for network depth
  – No argmax/softmax in the final layer; entire output returned to user

• Possible to perform various computer-vision-like operations on encrypted data
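To see why these substitutions matter, the sketch below evaluates a tiny network in the clear using only the operations a homomorphic scheme supports: additions, multiplications by cleartext weights, and one squaring per activation. The square activation and sum pooling stand in for the “polynomial” and “mean operation” mentioned above (an assumption about the exact choices), and the weights and input are invented for the example.

```python
def dense(x, W, b):
    """Affine layer: only additions and multiplications by cleartext weights."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def square(x):
    """Polynomial activation in place of ReLU: one multiplication per value."""
    return [v * v for v in x]

def sum_pool(x, size):
    """Pooling without comparisons: sum each window (a mean up to a constant factor)."""
    return [sum(x[i:i + size]) for i in range(0, len(x), size)]

# Hypothetical integer weights, so every intermediate value is exact.
W1 = [[1, 0, -1, 0], [0, 1, 0, -1], [1, 1, 0, 0], [0, 0, 1, 1]]
b1 = [0, 0, 0, 0]
W2 = [[1, -1], [2, 1]]
b2 = [0, 1]

def network(x):
    h = square(dense(x, W1, b1))     # multiplicative depth 1
    h = sum_pool(h, 2)
    return dense(h, W2, b2)          # raw scores; no argmax/softmax on the server

scores = network([1, 2, 3, 4])
predicted = max(range(len(scores)), key=scores.__getitem__)   # done by the client after decryption
```

Because the whole forward pass is a fixed-degree polynomial of the input, its depth is known in advance and the encryption parameters can be sized for it.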


When the outcome is made available to the server

• This situation is more amenable to functional encryption

Client Server

Response

Functional Encryption

• An encryption scheme that permits the server to evaluate user-specified functions on private data.
• Client produces
  – Public key pk
  – Function specific secret key sk
    • Specific to the function f()
  – Public-key encrypted data may be stored on the server
    • By anyone, including server
• Server computes
  – Dec(E[x],sk) to obtain f(x)
  – Learns nothing else

Functional Encryption

• A variety of FE methods have been proposed
  – Can compute arbitrary Boolean functions on public (exposed to server) index
  – Can only compute limited functions on private indices
    • Inner products, simple Boolean functions
• Restrictions:
  – Client in charge of generating function-specific key
  – Only simple functions
  – Useful for some kinds of tasks, e.g. mining multi-user data

The “problem” with conventional encryption

• The encryption of any number X carries NO information about the encryption of any other number Y
  – Or even of a second encryption of X itself, if the encryption is semantically secure
  – By requirement!

Enc[X]

Information Theoretic Security

• The mutual information between the encryption of any two messages is 0– Regardless of the distance between them

• The MI between the encryption of any group of messages is 0
• This ensures that you cannot learn about the data simply by viewing large numbers of encrypted messages and studying their patterns

d(X,Y)

MI(Enc[X], Enc[Y])

Semantic Security

• Given: The encryption of X, and the encryption mechanism – i.e. given only the public key
• But not the decryption key
• If we scan the space with a “probe” Y by encrypting Y and comparing to the Encryption of X we will not find X even if Y = X!!
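
A minimal sketch of why the probe fails, using a toy ElGamal-style scheme. The scheme, the modulus `P`, the base `G` and the fixed `r` values are all assumptions chosen for illustration (the slides name no particular cryptosystem), and the parameters are far too weak for real use.

```python
import secrets

P = 2**127 - 1   # a Mersenne prime; toy-sized, not secure
G = 3

def keygen():
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)          # (decryption key, public key)

def encrypt(pk, m, r=None):
    # Fresh randomness r makes every encryption of m look different
    if r is None:
        r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), (m * pow(pk, r, P)) % P

def decrypt(sk, ct):
    c1, c2 = ct
    # c1^(P-1-sk) is the inverse of c1^sk modulo the prime P
    return (c2 * pow(c1, P - 1 - sk, P)) % P

sk, pk = keygen()
x = 42
ct_x = encrypt(pk, x, r=12345)        # stored encryption of X
ct_probe = encrypt(pk, x, r=67890)    # attacker encrypts probe Y = X
print(ct_x == ct_probe)               # the comparison does not reveal the match
```

Both ciphertexts decrypt to the same value with `sk`, yet someone holding only `pk` cannot confirm a guess by re-encrypting it and comparing.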


An alternate solution: Hashing

• An alternate form of encryption: Cryptographic hashes
  – Weaker requirements: repeatable encryption
  – Not invertible
  – Very fast
  – Can we use these instead of encryption?

Cryptographic hash: Basics

• A cryptographic hash function maps variable length clear text messages to a fixed length cipher text
• Not necessarily invertible
  – Encryption schemes can be cryptographic hashes, but not all cryptographic hashes are encryption schemes
• The bit pattern of the hash is random
  – And uninformative about the underlying cleartext
  – Repeated hash of the same string results in the same output

• E.g. MD5, SHA-1, SHA-2, SHA-3

Hashing: Weakening privacy

• Given: The encryption of X, and the encryption mechanism
• If we scan with probe Y, we will find X
  – Enc(Y) = Enc(X) if Y = X
  – Enables authentication using passwords

Information Theoretic Security: Hashing

• The mutual information between the encryption of any two messages is 0
  – Regardless of the distance between them
    • Except if d(X,Y) = 0

[Figure: MI(Enc[X], Enc[Y]) versus d(X,Y)]

Still not ok

• Exact matches rarely occur in pattern matching tasks like speech processing
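
The behaviour described on the hashing slides can be checked with a standard hash. The sketch below uses SHA-256 from Python's `hashlib`; the feature values and the byte serialization via `struct` are illustrative assumptions.

```python
import hashlib
import struct

def digest(features):
    # Serialize the floats to bytes, then hash; output is always 32 bytes
    payload = struct.pack(f"{len(features)}d", *features)
    return hashlib.sha256(payload).digest()

def hamming_bits(a, b):
    # Number of differing bits between two equal-length byte strings
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

enrolled = [0.12, -1.50, 3.25]
same = [0.12, -1.50, 3.25]
nearby = [0.12, -1.50, 3.2500001]   # tiny perturbation, as in real speech features

exact_match = digest(enrolled) == digest(same)
near_match = digest(enrolled) == digest(nearby)
bits_changed = hamming_bits(digest(enrolled), digest(nearby))
print(exact_match, near_match, bits_changed)
```

An exact repeat matches, but a perturbation in the seventh decimal place yields an unrelated digest, so the hash gives no notion of "close".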

The problem: revisited

• The computational and information theoretic challenges arise because of the attempt at perfect security
  – Traditional encryption and hashing schemes attempt to hide all information about the original data

[Figure: MI(Enc[X], Enc[Y]) versus d(X,Y)]

The bear and the hunters

• Pragmatic solution: Be more secure than the next guy
  – Don’t be the easiest target
• Hashing techniques that only reveal information if vectors are very close?
  – Still an exponentially hard problem to extract geometry of high-dimensional data

[Figure: MI(Enc[X], Enc[Y]) versus d(X,Y)]

• With user-selected leakage?

[Figure: MI(Enc[X], Enc[Y]) versus d(X,Y), with the leaky region marked]

Information Leakage

• Fully secure hash: Will not know anything about X unless Y = X
• Leaky hash: Will get some information about X if Y is within D of X

Challenges

• How do we design such a hash?
• Particularly one that works on real-valued vectors

LSH with Euclidean Distance

• A vector X gets converted to a vector of M numbers H(X) = [h1(X) h2(X) h3(X) … hM(X)]
• Vi is a random vector drawn from a normal distribution
• bi is a random number between 0 and w
• w is the quantization width

$$h_i(X) = h(X; V_i, b_i) = \left\lfloor \frac{X \cdot V_i + b_i}{w} \right\rfloor$$

Euclidean LSH

• A 2-D example
• To calculate the first component in the hash key: h1(X)
• Generate random vector V1 and bias b1
  – (V1, b1) are the user’s private parameters

Euclidean LSH

• “Stripe” the space orthogonal to the vector V1
• Count stripes starting from bias location
• The first component in the hash key is the ID of its stripe: h1(X) = 1

[Figure: 2-D plane divided into stripes along V1, numbered -5 to 5]

Euclidean LSH

• The second component in the hash key: h2(X) = -2
  – (V2, b2) are also the user’s private parameters

[Figure: 2-D plane divided into stripes along V2, numbered -8 to 5]

Euclidean LSH

• The two-component hash: H(X) = [h1(X) h2(X)] = [1 -2]
  – [(V1, b1), (V2, b2)] are the user’s private parameters

[Figure: the two stripe sets overlaid on the 2-D plane]

Euclidean LSH

• H(X) = [1 -2]
• All vectors in the highlighted cell will have the same LSH key

[Figure: the two stripe sets overlaid, with the cell containing X highlighted]

Trivial distance computation with Euclidean LSH

• If two vectors are in the same cell, they will have identical LSH keys

• As vectors move away, the Manhattan distance between their hashes increases
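
The Euclidean LSH construction and the distance property stated above can be sketched as follows. The dimension, key length M, width w, seed and test vectors are arbitrary assumptions, and the helper names are hypothetical.

```python
import math
import random

def make_params(dim, m, w, seed):
    # (V_i, b_i) pairs: the user's private parameters
    rng = random.Random(seed)
    vs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]
    bs = [rng.uniform(0.0, w) for _ in range(m)]
    return vs, bs

def lsh_key(x, vs, bs, w):
    # h_i(X) = floor((X . V_i + b_i) / w) for each of the M components
    return [math.floor((sum(a * c for a, c in zip(x, v)) + b) / w)
            for v, b in zip(vs, bs)]

def manhattan(h1, h2):
    return sum(abs(a - c) for a, c in zip(h1, h2))

w = 1.0
vs, bs = make_params(dim=8, m=32, w=w, seed=0)
x = [0.5] * 8
near = [xi + 1e-4 for xi in x]   # almost the same point
far = [xi + 5.0 for xi in x]     # a distant point

key_x = lsh_key(x, vs, bs, w)
d_same = manhattan(key_x, lsh_key(list(x), vs, bs, w))
d_near = manhattan(key_x, lsh_key(near, vs, bs, w))
d_far = manhattan(key_x, lsh_key(far, vs, bs, w))
print(d_same, d_near, d_far)
```

A party that sees only the keys can rank candidates by Manhattan distance without ever seeing X, but also learns roughly how far apart any two hashed vectors are. That is the leakage the secure modular variant is meant to limit.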

The size of the cell

• Increasing the number of components in H(X) makes the cell smaller

• H(X) = [h1(X) h2(X)] = [1 -2]
• H(X) = [h1(X) h2(X) h3(X)] = [1 -2 7]
• H(X) = [h1(X) h2(X) h3(X) h4(X)] = [1 -2 7 0]

Randomness in the size of the cell

• Increasing key length reduces cell size
• A smaller cell makes it more likely that two vectors that fall in the same cell (have the same LSH key) are very close
• Also makes it more likely to miss valid vectors
  – Which may fall outside the cell simply because of the vagaries of its shape
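
The trade-off can be observed in a small simulation: pairs of vectors at a fixed distance are hashed, and the fraction that still share the first m key components is counted. The distance 0.5, dimension 4, width 1.0 and trial count are arbitrary assumptions.

```python
import math
import random

def lsh_key(x, vs, bs, w):
    # Euclidean LSH key: one stripe index per (V_i, b_i) pair
    return [math.floor((sum(a * c for a, c in zip(x, v)) + b) / w)
            for v, b in zip(vs, bs)]

rng = random.Random(1)
dim, w, max_m, trials, dist = 4, 1.0, 8, 500, 0.5
matches = {m: 0 for m in (1, 2, 4, 8)}

for _ in range(trials):
    vs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(max_m)]
    bs = [rng.uniform(0.0, w) for _ in range(max_m)]
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # y lies at Euclidean distance `dist` from x in a random direction
    step = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(s * s for s in step))
    y = [xi + dist * s / norm for xi, s in zip(x, step)]
    hx, hy = lsh_key(x, vs, bs, w), lsh_key(y, vs, bs, w)
    for m in matches:
        if hx[:m] == hy[:m]:       # same cell when only m components are used
            matches[m] += 1

rates = {m: matches[m] / trials for m in matches}
print(rates)
```

Because a longer key extends a shorter one, the match rate can only fall as m grows: fewer false matches, but more valid neighbours missed.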

Adapting LSH: Secure Modular Hashing

• Modular quantization of randomly shifted random projections of data

Secure modular hashes

• Conventional LSH

[Figure: stripes of conventional LSH numbered -5 to 5, then the same stripes relabeled with alternating bits 0 and 1]

Secure Modular Hashes

• Solution: Banded Hashing
  – Euclidean LSH with binary output Q(X)=[1,1]
  – Images of the green region are indistinguishable

[Figure: 2-D plane with stripes along V1 and V2 labeled with alternating bits 0 and 1; green region highlighted]

Information Leakage

• Only provides information about other vectors that lie within a small ball
• Plot of Hamming(Q(X),Q(Y)) vs Euclidean d(X,Y) for different values of D, and different numbers of bits in Q(X)

Simulations: L-dimensional vectors, M bit hashes
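
A rough re-creation of this kind of simulation, assuming the hash is the Euclidean LSH stripe index reduced modulo 2 (one reading of "Euclidean LSH with binary output"). The values L = 16, M = 512, w = 1.0 and the probed distances are arbitrary assumptions, not the settings behind the original plot.

```python
import math
import random

def smh_bits(x, vs, bs, w):
    # Euclidean LSH stripe index, reduced modulo 2 to one bit per projection
    return [math.floor((sum(a * c for a, c in zip(x, v)) + b) / w) % 2
            for v, b in zip(vs, bs)]

def hamming_fraction(q1, q2):
    return sum(a != c for a, c in zip(q1, q2)) / len(q1)

rng = random.Random(7)
dim, m_bits, w = 16, 512, 1.0
vs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m_bits)]
bs = [rng.uniform(0.0, w) for _ in range(m_bits)]
x = [rng.gauss(0.0, 1.0) for _ in range(dim)]

def shifted(point, dist):
    # Move `point` by exactly `dist` in a random direction
    step = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(s * s for s in step))
    return [p + dist * s / norm for p, s in zip(point, step)]

qx = smh_bits(x, vs, bs, w)
curve = {d: hamming_fraction(qx, smh_bits(shifted(x, d), vs, bs, w))
         for d in (0.0, 0.05, 0.2, 5.0, 50.0)}
for d, frac in curve.items():
    print(f"d={d:5.2f}  Hamming fraction={frac:.3f}")
```

For small d the Hamming fraction grows with distance; for large d it settles near one half regardless of d, so the hashes of distant vectors say nothing about how far apart they are. In this sketch the width w sets where the curve flattens.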

Summary of information hiding methods

• Conventional encryption
  – Not useful
• Homomorphic encryption
  – Secure, expensive, only reveals outcome of computation to client/user
  – Many limitations in usage
• Secure multiparty computation
  – Security related to cost
  – Outcome of computation to either client or server
  – Technically unlimited in usage
• Functional encryption
  – Secure, expensive, only reveals outcome of computation to server
  – Many limitations in usage
• Hashing
  – Less secure, very low cost, result of computation to server
  – Only permits distance computation and comparison

Applying these to speech tasks

• How feasible is private speech processing now?

• We will assume the presence of two entities
  – A “capable” server entity
    • “Capable” in the sense of computationally powerful, and with memory and storage
  – A “lightweight” client entity
    • “Lightweight” in the sense of weak computational abilities
• Server and client do not trust one another

Assumption in what follows (and what was presented)

• User Alice and System Bob
• User has a smart phone or computation capable device
  – Communicates with server using this device
• User’s client device also performs feature computation and all other necessary computation
• Client may perform one-time expensive computation in setup stage

Automatic speech recognition

• System has models
  – Traditional: Acoustic and language models
  – End-to-end neural net based: Neural network
• Client has audio, requires recognition output

[Figure: client-server diagram; labels: Client, Server, Compute power, Response]

Applying HE to ASR

User generates and transmits data

System performs conventcomputations

Returns vector of probabilities to user

Can work on short data blocks

Can perform low-perplexity tasks

E.g. phone recognition

result


Feasibility of Private ASR

• Homomorphic Encryption:
  – Feasible for small, low complexity tasks
  – Currently not feasible using any formalism in generic setting
    • Not a computational limitation; limitation results from inherent limitations of FHE
• SMC based solutions:
  – Client and server perform computations collaboratively using SMC protocols
  – Theoretically feasible
    • Possible for small grammars
    • Impractical in more general settings
      – Extreme communication and computational overhead, particularly on client
    • Will require devising of zero-knowledge proofs to verify computation

Speaker Mining

• Server possesses a speech database
  – Ownership issues to be considered shortly
• Client mines it for a particular speaker without revealing query or response to server

[Figure: client-server diagram; labels: Client, Server, Data owned by server, Compute, Response]

Speaker Mining

• Setup: Server retains appropriately parameterized version of speech
  – E.g. I-vectors
• Client queries with similarly parameterized query vector
• Server computes response by direct matching or classification

[Figure: client-server diagram; labels: Client, Server, Response]

Speaker Mining: Server owns data

• Server possesses a speech database owned by server
• Client mines it for a particular speaker without revealing query or response to server

[Figure: client-server diagram; labels: Client, Server, Response]

Feasibility

• Setup:
  – Client obtains model for the speaker
  – Server evaluates it on entire corpus
• Homomorphic Encryption:
  – Feasible under specific formalisms
    • Very slow
• SMC based solutions:
  – Theoretically feasible under specific formalisms
    • Practical under “honest but curious” assumption of security
  – Very slow for large corpora

Speaker Mining: Client owns data

• Server possesses a speech database owned by client
• Client mines it for a particular speaker without revealing data, query or response to server

[Figure: client-server diagram; labels: Client, Server, Data owned by client, Compute, Response]

Feasibility

• Homomorphic Encryption:
  – Feasible but impracticably slow
• SMC based solutions:
  – Feasible, but slow
  – Insecure
• Hashing based solutions:
  – Client stores hashes of i-vector representations of speech on server
  – Client matches query i-vectors of recordings from speaker to recover other recordings by speaker in the server corpus
  – Feasible, practicable
  – Potential security issues over many searches

Speech Mining

• Server possesses a speech database
  – Ownership issues mentioned shortly
• Client mines it for a particular speaker without revealing query or response to server

[Figure: client-server diagram; labels: Client, Server, Response]

Mining

Server can potentially compute phonetic recognition on blocks of audio homomorphically without seeing audio

Challenges: how to mine them

Mining Speech: Server owns data

• Server possesses a speech database owned by server
• Client mines it for words/phrases without revealing them to server
• Fundamentally no different from client owning data
  – Data must be encrypted prior to processing to hide response from server

[Figure: client-server diagram; labels: Client, Server, Response]

Mining Speech: Client owns data

• Server possesses a speech database owned by client
  – May not “see” it
• Client must mine it for patterns
  – Result private to client

[Figure: client-server diagram; labels: Client, Server, Compute power, Response, Data owned by client]

Feasibility

• Approach:
  – Client stores phoneme decode of data on server in encrypted form
  – Client searches for other phonetic patterns later, also privately
• SMC based solutions:
  – Theoretically feasible, impractical
• Homomorphic Encryption:
  – Feasible, slow

Mining Speech: Third party owns data

• Server possesses a speech database owned by one or more third parties
  – May not “see” it
• Client must mine it for patterns
  – Only works if
    • Query and result may be exposed to server
    • Query may be exposed to third parties

[Figure: client-server diagram; labels: Client, Server, Compute power, Response, Data owned by third party]

Feasibility

• Approach:
  – Data owners upload to server encrypted
  – Client/server broadcasts query
  – Data owners provide access
• Functional encryption based solution
  – Feasible, but will not scale

Verification

• Client enrolls with server using private speech data
  – Server never sees data in the clear
• Client attempts to authenticate using private data
  – Server never sees data
  – Server authenticates (result with server)

[Figure: client-server diagram; labels: Client, Server, Compute power, Response]

Speaker Verification

• Client computes parameterization of voice (e.g. I-vector) and sends to server for registration
• Client queries with similarly parameterized query vector
• Server matches to model to decide to authenticate

[Figure: client-server diagram; labels: Client, Server, Response, Registration, Model]

Feasibility

• SMC based solutions:
  – Theoretically feasible, impractical
• Homomorphic Encryption
  – Feasible, but impractical
• Functional encryption
  – Client computes features (e.g. I-vectors) during enrollment
  – Client ships features (e.g. I-vectors) for authentication
    • Server sees neither
  – Feasible, but expensive
• Hashing
  – Same setup as for functional encryption
  – Feasible and currently practicable, minimal computational overhead
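
The hashing-based option can be sketched as below. Everything specific here is an assumption for illustration: random vectors stand in for I-vectors, the hash is a modulo-2 Euclidean LSH, and the `Server` class, the 0.25 threshold and the noise level are hypothetical.

```python
import math
import random

def smh_bits(x, vs, bs, w):
    # Binary hash: Euclidean LSH stripe index reduced modulo 2
    return [math.floor((sum(a * c for a, c in zip(x, v)) + b) / w) % 2
            for v, b in zip(vs, bs)]

class Server:
    # Stores only the enrolled hash; never sees features or the (V, b) parameters
    def __init__(self, threshold):
        self.threshold = threshold
        self.enrolled = None

    def enroll(self, bits):
        self.enrolled = bits

    def verify(self, bits):
        diff = sum(a != c for a, c in zip(self.enrolled, bits))
        return diff / len(bits) < self.threshold

rng = random.Random(3)
dim, m_bits, w = 16, 256, 1.0
# Client-side private parameters
vs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m_bits)]
bs = [rng.uniform(0.0, w) for _ in range(m_bits)]

speaker = [rng.gauss(0.0, 1.0) for _ in range(dim)]    # stand-in for an I-vector
genuine = [s + rng.gauss(0.0, 0.02) for s in speaker]  # same speaker, new session
impostor = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # unrelated speaker

server = Server(threshold=0.25)
server.enroll(smh_bits(speaker, vs, bs, w))
accept_genuine = server.verify(smh_bits(genuine, vs, bs, w))
accept_impostor = server.verify(smh_bits(impostor, vs, bs, w))
print(accept_genuine, accept_impostor)
```

The server's work is a single Hamming distance per attempt, which is the "minimal computational overhead" noted above; the decision stays with the server, as in the verification setup.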


SOME CONCLUSIONS AND DISCUSSIONS

Section 4

Some conclusions and discussions

• So where are we now?
  – What can we solve?
  – Where must we go?
• What if homomorphic encryption becomes a reality?
• The immediate and the distant future

State of the union and future

• A variety of tools exist
  – But are generally too inefficient or limited in scope
  – Not sufficiently advanced to provide generic solutions
  – Rapid advances occurring
• The majority of applications remain infeasible
  – Primarily for computational reasons
  – But also for theoretical limitations of current tools
  – HE and FE are partial solutions, but may never provide complete solutions
    • Theoretical limitations

But the problem is real and serious

• Speech-based services more popular than before
• The issues of privacy and security are increasingly relevant
• Future solutions:
  – Improvement in current tools
  – Improvement/modification in the manner in which we perform speech tasks, to make them better suit tools
  – Development must happen in tandem
  – Much work remains

Links :

• Lecture of Bhiksha Raj on Vimeo: https://vimeo.com/87341704

• With the slides at: http://mlsp2012.conwiz.dk/fileadmin/lectures/mlsp2012_Raj.pdf

Further readings

[Aguilar, 2013] Aguilar-Melchor C., Fau S., Fontaine C., Gogniat G. & Sirdey R. “Recent Advances in Homomorphic Encryption: a Possible Future for Signal Processing in the Encrypted Domain”, IEEE Signal Processing Magazine Vol 30:2.

[Boufounos, 2011] Boufounos, P. & Rane, S. “Secure Binary Embeddings for Privacy Preserving Nearest Neighbors”, in Proc. IEEE Workshop on Information Forensics and Security, Brazil, Dec. 2011. MERL TR2011-077

[Gentry, 2009] Gentry, C. “A fully homomorphic encryption scheme”, PhD thesis, Stanford

[Gomez, 2016] Gomez-Barrero, M., Fierrez J. & Galbally, J. “Variable-length Template Protection based on Homomorphic Encryption with Application to Signature Biometrics”, 4th International Workshop on Biometrics and Forensics.

[Jimenez, 2015] Jimenez A., Raj B. “Secure Modular Hashing”, Proc. IEEE Workshop on Information Forensic and Security.

[Jimenez, 2017] Jimenez A., Raj B. “Privacy preserving distance computation using somewhat trusted third parties”, Special Session on “Privacy Preserving Signal Processing”, ICASSP

[Naehrig, 2011] Naehrig, M., Lauter, K. & Vaikuntanathan, V. “Can Homomorphic Encryption be Practical ?”, Proc. Of the 3rd ACM Workshop on Cloud Computing Security, pp 113-124

[Pathak, 2013] Pathak M., Raj B., Rane S., Smaragdis P. “Privacy-preserving speech processing: cryptographic and string-matching frameworks show promise”, IEEE Signal Processing Magazine 30:2, pp. 62-74, March 2013

[Portelo, 2015] Portelo J., Trancoso I., Raj B. “Logsum using Garbled Circuits”, PLoS one, 10(3): e0122236

[Rane, 2013] Rane, S & Boufounos, P-T. “Privacy-preserving Nearest Neighbor Methods: Comparing Signals without revealing them”, MERL, TR 2013-004, IEEE Signal Processing Mag. Vol 30:2, pp. 18-28, Feb 2013.

[Smaragdis, 2007] Smaragdis, P. & Shashanka, M. “A framework for Secure Speech Recognition”, IEEE Trans. ASLP Vol 15:4, pp 1404-1413.

[vanDijk, 2010] Van Dijk, M. & Juels, A., “On the impossibility of cryptography alone for Privacy Preserving Cloud Computing”, Proc. Of HotSec.

[Xie, 2015] Xie, P., Bilenko, M., Finley, T., Gilad-Bachrach, R., Lauter, K. & Naehrig, M., “Crypto-Nets: Neural Networks over Encrypted Data”, ICLR, arXiv: 1412.6181

[Zyskind, 2015] Zyskind, G., Nathan O. & Pentland, A., “Decentralizing Privacy using Blockchain to Protect Personal Data”, IEEE Workshop on Security and Privacy, San Jose
