8/19/2019 A Deep Dive Into Dex File Format--chiossi
Rodrigo Chiossi ABS 2014
A deep dive into DEX file format
Rodrigo Chiossi
-
Bio
● Rodrigo Chiossi
– Android Engineer @ Intel OTC
– AndroidXRef
● www.androidxref.com
– Dexterity
● https://github.com/rchiossi/dexterity
-
Overview
● DEX File Structure
– Characteristics
– LEB128
– Relative Indexing
– MUTF-8
– The Big Header and the Data
● DEX Instrumentation
– The String Add Case
● DEX Limitations
– Bitness restrictions
-
DEX Structure
-
DEX Properties
● Reduced Memory Footprint
– LEB128 encoding
– Relative Indexing
– Single file for all classes (vs. 1 file per class in .class format)
– No duplicate strings
● Modified UTF-8 String Encoding
● Strict requirements for alignment
● Even more strict runtime verifier (DexOpt)
-
LEB128
● Encoding format borrowed from DWARF3.
● Used to encode signed (SLEB128 and ULEB128p1) and unsigned (ULEB128) numbers.
● Used in DEX for encoding 32-bit numbers.
● Numbers are encoded using 1 to 5 bytes.
– The length depends on the position of the highest '1' bit.
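As a minimal sketch of these rules, the Python helpers below (hypothetical names, not part of any DEX library) emit 7 payload bits per byte, least significant group first, and set the high bit on every byte except the last. A 32-bit value therefore needs at most five bytes, and ULEB128p1 simply stores value + 1.

```python
def encode_uleb128(value: int) -> bytes:
    """Unsigned LEB128: 7 payload bits per byte, least significant group first."""
    if value < 0:
        raise ValueError("ULEB128 only encodes non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)         # last byte has the high bit clear
            return bytes(out)


def encode_sleb128(value: int) -> bytes:
    """Signed LEB128: stop once the remaining bits are only sign extension."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # Python's >> on ints is an arithmetic shift
        sign_bit_set = bool(byte & 0x40)
        if (value == 0 and not sign_bit_set) or (value == -1 and sign_bit_set):
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)


def encode_uleb128p1(value: int) -> bytes:
    """ULEB128p1: encode value + 1 as ULEB128, so -1 becomes the single byte 0x00."""
    return encode_uleb128(value + 1)


for v in (0, 127, 128, 0xFFFFFFFF):
    print(hex(v), encode_uleb128(v).hex())
```

The largest 32-bit value, 0xFFFFFFFF, comes out as the five bytes ff ff ff ff 0f.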
-
LEB128 Example
HEX     BIN                 SLEB128   ULEB128   ULEB128p1
00      00000000            0         0         -1
01      00000001            1         1         0
7f      01111111            -1        127       126
80 7f   10000000 01111111   -128      16256     16255
● -1 is used to represent the NO_INDEX value.
● Encoded as ULEB128p1, NO_INDEX requires only one byte to be encoded.
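The table can be reproduced with a small decoder. This is a sketch with hypothetical function names; it does not enforce the five-byte limit that the DEX format places on these values.

```python
from __future__ import annotations


def decode_uleb128(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, position after the value) for a ULEB128 number at data[pos]."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos


def decode_sleb128(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Same payload as ULEB128, but the top payload bit is the sign."""
    value, end = decode_uleb128(data, pos)
    bits = 7 * (end - pos)
    if value & (1 << (bits - 1)):
        value -= 1 << bits  # sign-extend
    return value, end


def decode_uleb128p1(data: bytes, pos: int = 0) -> tuple[int, int]:
    """ULEB128p1 stores value + 1, so subtract one after decoding."""
    value, end = decode_uleb128(data, pos)
    return value - 1, end


print("HEX    SLEB128  ULEB128  ULEB128p1")
for hex_bytes in ("00", "01", "7f", "807f"):
    raw = bytes.fromhex(hex_bytes)
    print(f"{hex_bytes:<6} {decode_sleb128(raw)[0]:<8} "
          f"{decode_uleb128(raw)[0]:<8} {decode_uleb128p1(raw)[0]}")
```

Running it prints the four rows of the table above, including -1 for the single 00 byte read as ULEB128p1.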
-
Relative Indexing
● Many DEX ob…
-
Relative Indexing Example
Field ID   Field Name
...
1024       field_1
1025       field_2
...
1036       field_3
...
● Field List:
– field_1, field_2, field_3
● Encoding:
– 1024, 1, 11
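The encoding above can be sketched as a running difference: the first ID is stored as-is and every later ID as the distance to its predecessor (in the DEX format this is, for example, the field_idx_diff of encoded_field inside class_data_item). The helper names are hypothetical.

```python
def encode_relative(indices):
    """Store the first index as-is and every following one as a delta."""
    encoded, previous = [], 0
    for idx in indices:
        encoded.append(idx - previous)
        previous = idx
    return encoded


def decode_relative(diffs):
    """Undo encode_relative by accumulating the deltas."""
    indices, current = [], 0
    for diff in diffs:
        current += diff
        indices.append(current)
    return indices


print(encode_relative([1024, 1025, 1036]))
print(decode_relative([1024, 1, 11]))
```

Because each delta is later written as ULEB128, small deltas such as 1 and 11 fit in one byte, whereas the absolute IDs 1025 and 1036 would each need two.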
-
Modified UTF-8
● Used for encoding all strings in the DEX format.
● Characters may take 1, 2 or 3 bytes.
● Strings are terminated by a single null byte.
● When parsing string_data_item, the utf16_size field cannot be used to calculate the size of the following data, as it only represents the number of UTF-16 code units in the string, not the number of MUTF-8 bytes.
● ASCII strings are legal MUTF-8 strings, as long as they contain no NUL character (MUTF-8 encodes U+0000 as two bytes).
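The rules above can be illustrated with a small encoder. This is a sketch with hypothetical helper names: each UTF-16 code unit is encoded on its own, so a character outside the BMP becomes two 3-byte surrogate sequences, and U+0000 becomes C0 80 so that a single 00 byte can serve as the terminator.

```python
def encode_mutf8(text: str) -> bytes:
    """Modified UTF-8: encode each UTF-16 code unit separately, NUL as two bytes."""
    utf16 = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        if 0x0001 <= unit <= 0x007F:
            out.append(unit)  # 1 byte: ASCII other than NUL
        elif unit <= 0x07FF:  # also catches U+0000, giving C0 80
            out += bytes((0xC0 | (unit >> 6), 0x80 | (unit & 0x3F)))
        else:  # 3 bytes; surrogate halves are encoded one by one
            out += bytes((0xE0 | (unit >> 12),
                          0x80 | ((unit >> 6) & 0x3F),
                          0x80 | (unit & 0x3F)))
    return bytes(out)


def utf16_size(text: str) -> int:
    """What string_data_item.utf16_size stores: UTF-16 code units, not bytes."""
    return len(text.encode("utf-16-be")) // 2


for s in ("dex", "\u00e9", "\u20ac", "\U0001F600", "a\x00b"):
    print(repr(s), "utf16_size =", utf16_size(s), "bytes =", len(encode_mutf8(s)))
```

For U+1F600 the utf16_size is 2 while the encoded data takes 6 bytes, which is why a parser has to scan for the terminating null byte instead of trusting utf16_size.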
-
The Big Header
● Besides the header_item, we have six other structures that describe the DEX file:
– string_id_item list
– type_id_item list
– proto_id_item list
– field_id_item list
– method_id_item list
– class_def_item list
● These structures define all the functional content of the DEX file.
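A minimal sketch of reading the big header with Python's struct module, assuming the standard 0x70-byte header_item layout (magic, checksum, SHA-1 signature, then twenty little-endian uint fields). The item sizes of the six lists come from the structure definitions (for example, a class_def_item is eight uints); the function names are hypothetical.

```python
import struct

# header_item layout: magic[8], checksum, signature[20], then 20 little-endian uints.
HEADER_FORMAT = "<8sI20s20I"
HEADER_FIELDS = (
    "file_size", "header_size", "endian_tag", "link_size", "link_off", "map_off",
    "string_ids_size", "string_ids_off", "type_ids_size", "type_ids_off",
    "proto_ids_size", "proto_ids_off", "field_ids_size", "field_ids_off",
    "method_ids_size", "method_ids_off", "class_defs_size", "class_defs_off",
    "data_size", "data_off",
)
# Fixed item sizes (bytes) of the six ID lists that make up the rest of the big header.
ID_LISTS = {"string_ids": 4, "type_ids": 4, "proto_ids": 12,
            "field_ids": 8, "method_ids": 8, "class_defs": 32}


def parse_header(dex: bytes) -> dict:
    """Unpack header_item into a dict keyed by the field names above."""
    magic, checksum, signature, *values = struct.unpack_from(HEADER_FORMAT, dex, 0)
    if not magic.startswith(b"dex\n"):
        raise ValueError("missing DEX magic")
    header = dict(zip(HEADER_FIELDS, values))
    header.update(magic=magic, checksum=checksum, signature=signature)
    return header


def describe_id_lists(header: dict) -> None:
    """Print count, offset and total byte size of each ID list."""
    for name, item_size in ID_LISTS.items():
        count, offset = header[f"{name}_size"], header[f"{name}_off"]
        print(f"{name:<11} count={count:<6} off=0x{offset:x} bytes={count * item_size}")
```

With a real file, something like `describe_id_lists(parse_header(open("classes.dex", "rb").read()))` would list the six sections; the file name is only an example.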
-
The Map
● The DEX file may contain an optional structure called the Map, composed of map_item structures.
● The Map contains information about all the offsets in the file and the type of content stored at each offset.
● Although optional according to the file format specification, the existence and correctness of the map are enforced by DexOpt.
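The map is a uint count followed by 12-byte map_item entries (type code, unused padding, item count, offset). A minimal parsing sketch, with only a subset of the type codes named and a hypothetical function name:

```python
import struct

# A subset of the map_item type codes defined by the DEX format.
MAP_TYPES = {
    0x0000: "header_item", 0x0001: "string_id_item", 0x0002: "type_id_item",
    0x0003: "proto_id_item", 0x0004: "field_id_item", 0x0005: "method_id_item",
    0x0006: "class_def_item", 0x1000: "map_list", 0x1001: "type_list",
    0x2000: "class_data_item", 0x2001: "code_item", 0x2002: "string_data_item",
}


def parse_map(dex: bytes, map_off: int) -> list:
    """Return (type name, item count, offset) for every map_item at map_off."""
    (count,) = struct.unpack_from("<I", dex, map_off)
    entries = []
    for i in range(count):
        # map_item: ushort type, ushort unused, uint size, uint offset = 12 bytes
        type_code, _unused, size, offset = struct.unpack_from(
            "<HHII", dex, map_off + 4 + 12 * i)
        entries.append((MAP_TYPES.get(type_code, hex(type_code)), size, offset))
    return entries
```

In a real file map_off comes from the header; these size and offset fields are exactly what an instrumentation tool has to rewrite after moving items.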
-
The Data
● All the content of the DEX file not in the big header goes to the Data area.
● Offsets to structures in the data area must be bigger than the end of the big header. This property is enforced by DexOpt.
● It is OK to have gaps in the middle of the data section.
● The map is part of the data area.
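The offset rule above can be expressed as a simple check. This is a sketch that mirrors the stated property, not DexOpt's actual code; it assumes a header dict keyed by the header_item field names, and the function names are hypothetical.

```python
# Fixed item sizes (bytes) of the ID lists that follow header_item.
ID_ITEM_SIZES = {"string_ids": 4, "type_ids": 4, "proto_ids": 12,
                 "field_ids": 8, "method_ids": 8, "class_defs": 32}


def big_header_end(header: dict) -> int:
    """First byte after header_item and the six ID lists."""
    end = header["header_size"]
    for name, item_size in ID_ITEM_SIZES.items():
        count = header[f"{name}_size"]
        if count:
            end = max(end, header[f"{name}_off"] + count * item_size)
    return end


def offsets_inside_big_header(header: dict, data_offsets) -> list:
    """Offsets that should point into the data area but do not."""
    limit = big_header_end(header)
    return [off for off in data_offsets if off < limit]
```

Since gaps are allowed, the check only looks at a lower bound; an offset anywhere past the big header passes.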
-
The Link Data
● Optional area at the end of the Data area.
● Format unspecified.
● Never present in normal APKs.
-
DEX Instrumentation
● Case Study: String add
– String manipulation is required for most obfuscation/deobfuscation techniques.
– Can be extended for replacing and removing strings.
-
String Structure
● Represented by the pair (string_id_item, string_data_item).
● The string_id_item list must be sorted.
– Sorted by the UTF-16 code points of the string.
● Strings are referenced by their index position in the string_id_item list.
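A small sketch of the ordering rule, assuming that "UTF-16 code points" means comparing the strings UTF-16 unit by unit (as Java's String.compareTo does). Big-endian UTF-16 bytes compare in that same order, which makes them a convenient sort key; the helper names are hypothetical.

```python
def utf16_key(s: str) -> bytes:
    """Big-endian UTF-16 bytes compare in the same order as UTF-16 code units."""
    return s.encode("utf-16-be")


def is_valid_string_order(strings) -> bool:
    """Strictly increasing: sorted and free of duplicates."""
    keys = [utf16_key(s) for s in strings]
    return all(a < b for a, b in zip(keys, keys[1:]))


# Code-unit order and Python's code-point order disagree above U+FFFF:
# U+1F600 is stored as the surrogate pair D83D DE00, which sorts before U+FFFF.
print(sorted(["\uffff", "\U0001F600"]))                 # Python default: code points
print(sorted(["\uffff", "\U0001F600"], key=utf16_key))  # UTF-16 code units
```

The difference only shows up for supplementary characters, but a tool that sorts with the wrong rule there would produce a string table that the verifier could reject.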
-
Adding a string_id_item
● Must be added at the position in the list that keeps the list sorted.
● Header adjustments:
– Data offset.
– File size.
● Map adjustments:
– string_id_item map size.
● Entire file adjustments:
– Offset references in the data area must be shifted by 4 bytes.
– String references equal to or bigger than the added string must be increased by 1.
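The index and offset bookkeeping above can be sketched as three small helpers (hypothetical names). The sketch assumes UTF-16 code-unit ordering for the sorted position, and it uses the old end of the string_id list as the boundary for shifting offsets: the data area lies after that point, so everything referenced there moves by one 4-byte item.

```python
import bisect

STRING_ID_ITEM_SIZE = 4  # one uint: string_data_off


def insertion_index(strings, new_string: str) -> int:
    """Position that keeps the string_id list sorted by UTF-16 code units."""
    keys = [s.encode("utf-16-be") for s in strings]
    key = new_string.encode("utf-16-be")
    idx = bisect.bisect_left(keys, key)
    if idx < len(keys) and keys[idx] == key:
        raise ValueError("string already present; DEX strings must be unique")
    return idx


def remap_string_index(old_idx: int, inserted_at: int) -> int:
    """References equal to or bigger than the new index move up by one."""
    return old_idx + 1 if old_idx >= inserted_at else old_idx


def shift_offset(offset: int, old_string_ids_end: int) -> int:
    """Anything at or after the old end of the string_id list moves by 4 bytes."""
    return offset + STRING_ID_ITEM_SIZE if offset >= old_string_ids_end else offset


idx = insertion_index(["<init>", "LFoo;", "V"], "Lbar;")
print(idx, [remap_string_index(i, idx) for i in range(3)])
```

Here "Lbar;" lands at index 2 (uppercase letters sort before lowercase ones), so the old reference to "V" changes from 2 to 3 while references 0 and 1 stay as they are.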
-
LEB128 Expansion
● Some offsets are encoded as ULEB128.
– E.g. code_off inside the encoded_method object.
● Some string_id_item references are encoded as ULEB128.
– E.g. name_idx inside the annotation_element object.
● After shifting offsets or increasing string_id_item references, the size of the LEB128 value in bytes may increase.
● If the expansion occurs, further shifting of offsets is needed in the file.
● Map sizes and offsets must be updated.
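Whether a rewritten value expands can be checked by comparing encoded lengths. This sketch uses hypothetical helper names; each extra 7 bits of value costs one more byte.

```python
def uleb128_len(value: int) -> int:
    """Bytes needed to store a non-negative value as ULEB128."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


def extra_bytes_after_update(old_value: int, new_value: int) -> int:
    """How many bytes a ULEB128 field grows when its value is rewritten."""
    return uleb128_len(new_value) - uleb128_len(old_value)


# A name_idx of 127 fits in one byte; after a string is inserted before it, 128 needs two.
print(extra_bytes_after_update(127, 128))
# A code_off of 0x3FFE (two bytes) shifted by 4 bytes crosses 0x3FFF and needs three.
print(extra_bytes_after_update(0x3FFE, 0x3FFE + 4))
```

Each extra byte shifts everything after it, which can push other ULEB128 offsets across a boundary in turn; one way to handle this is to repeat the pass until no field grows.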
-
Alignment
● Some structures in the DEX file must be 4-byte aligned.
– E.g., code_item.
● string_id_item is 4 bytes in size, so adding a new one will not misalign the DEX.
● LEB128 expansion will often shift the following data by 1 byte, which will break alignment.
● If realignment is required, offset references must be updated.
● Map sizes and offsets must be updated.
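The alignment arithmetic can be sketched as follows (hypothetical helper names). It assumes the item was aligned before the shift and ignores any existing padding that might absorb part of the shift.

```python
def align4(offset: int) -> int:
    """Round an offset up to the next multiple of 4."""
    return (offset + 3) & ~3


def relocate_aligned(old_offset: int, shift: int) -> tuple:
    """New offset and padding for a 4-byte-aligned item pushed forward by `shift` bytes."""
    moved = old_offset + shift
    new_offset = align4(moved)
    return new_offset, new_offset - moved


print(relocate_aligned(0x1000, 4))  # shift by a whole string_id_item
print(relocate_aligned(0x1000, 1))  # shift by a 1-byte LEB128 expansion
```

A 4-byte shift keeps the code_item aligned with no padding, while a 1-byte expansion needs 3 padding bytes, so the item still ends up 4 bytes further along and every reference to it must be updated.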
-
Adding a string_data_item
● Must be inside the data area.
● Header adjustments:
– Data size.
– File size.
● Map adjustments:
– string_data_item map size.
● Entire file adjustments:
– Offset references after the offset of the new string_data_item must be shifted by the size of the added object.
– String references equal to or bigger than the added string must be increased by 1.
● Check for LEB128 expansion and apply shifting.
● Check for alignment and apply shifting.
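A minimal sketch of building the new item and shifting the offsets behind it. To stay short it assumes ASCII text without NUL, which is already valid MUTF-8 and has one UTF-16 code unit per character; the helper names are hypothetical.

```python
def encode_uleb128(value: int) -> bytes:
    """Unsigned LEB128, 7 bits per byte, high bit marks continuation."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append((byte | 0x80) if value else byte)
        if not value:
            return bytes(out)


def build_string_data_item(text: str) -> bytes:
    """utf16_size as ULEB128, the string bytes, then the 0x00 terminator (ASCII only)."""
    if not text.isascii() or "\x00" in text:
        raise ValueError("this sketch only handles ASCII text without NUL")
    return encode_uleb128(len(text)) + text.encode("ascii") + b"\x00"


def shift_offsets(offsets, insert_at: int, size: int) -> list:
    """Offsets at or after the insertion point move by the size of the new item."""
    return [off + size if off >= insert_at else off for off in offsets]


item = build_string_data_item("Hello")
print(item.hex(), shift_offsets([0x100, 0x200, 0x300], 0x200, len(item)))
```

The same len(item) would also be added to data_size and file_size in the header, before any extra shifting caused by LEB128 expansion or realignment.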
-
DEX Bit Restrictions
● 32-bit encoding
– Static fields with a fixed 32-bit size (e.g. string_id_item).
– Offsets expected to be within 32-bit range.
● Less than 32-bit encoding
– Class, type, proto and other lists alike are limited to 16 bits in size.
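After instrumentation grows the file, a tool could re-check these limits. In this sketch, which counts are treated as 16-bit limited is an assumption based on the slide, and the function name is hypothetical; the header is a dict keyed by header_item field names.

```python
U16_MAX = 0xFFFF
U32_MAX = 0xFFFFFFFF

# Assumption: counts that must stay within 16 bits, following the slide.
SIXTEEN_BIT_COUNTS = ("type_ids_size", "proto_ids_size", "class_defs_size")


def limit_violations(header: dict) -> list:
    """Header fields that no longer fit the widths the format expects."""
    issues = [k for k in SIXTEEN_BIT_COUNTS if header.get(k, 0) > U16_MAX]
    issues += [k for k, v in header.items()
               if k.endswith(("_off", "_size")) and not 0 <= v <= U32_MAX]
    return issues


print(limit_violations({"type_ids_size": 70000, "proto_ids_size": 10, "data_off": 0x100}))
```

An empty list means every checked count and offset still fits its field.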
-
Rodrigo Chiossi
r.chiossi