Embedded Architecture Comparison: A Smartphone Approach

CSCE 5610 Computer Architecture Project
Dr. Kavi
5/5/2011
Brandon Gozick

With smartphone popularity increasing each month, we break down the hardware attributes of the most popular Android-based smartphones of the past two years. After a brief introduction to the relevant architectures, we run multiple benchmarks that test CPU, I/O, and memory performance and present the results. An overall comparison of the phones is given, along with an evaluation of the best-performing phone, which we then correlate with its hardware capabilities.

Table of Contents

Introduction
    Relevant Architectures
    ARMv6
    ARMv7
Related Work
Experimental Setup
    Linpack
    Nbench
    Quadrant
    DHTDroid
Results
    Linpack
    Quadrant
    Nbench
    DHTDroid
Problems and Conclusion
References
Appendix A

Introduction

A smartphone is a mobile phone that offers capabilities beyond those of the typical mobile devices of the past several years. In most cases today it provides functionality similar to that of a personal computer (PC). There is no industry-standard definition of a smartphone, beyond a general public expectation that it lets users accomplish tasks they would normally perform on a desktop or laptop. A portable device that can handle both daily and complex tasks is an ideal tool for an always-on-the-go society. For most people, a smartphone is a phone that runs a complete and efficient operating system, provides an easy-to-use interface, and offers a dedicated platform that attracts application developers, leading to an increased user base [1].

Smartphones are abundant among the population today and are only increasing in number. With this rise in popularity, the demand for these multitasking machines has also increased at a dramatic rate, which has resulted in rapid growth in embedded architectures. These enhancements have improved speed and response time, while demand for the next best thing continues to come from smartphone users, an ever-increasing share of whom are between the ages of 15 and 24. It is reported that 84% of mobile phone users in the United States between these ages own a smartphone [2]. Currently in the United States, over 86% of all mobile traffic originates from a smartphone [3]. As this age group gets older, the demand for more complex processor architectures will continue to grow. This growth, however, has pressure points that limit the continued success so many want.

As smartphones have developed, devices have begun to incorporate more and more functionality. The main concern seen by many users is that the more features the smartphone offers and the user exercises, the more processing chips and processing cycles are required in hardware. This increase brings an increase in power consumption. In other words, the more hardware that greater functionality requires, the more battery power is needed. This direct relationship is one of the greatest concerns when creating an embedded architecture, especially a processor for a smartphone. It limits the design, so the purpose of the device has to be taken into account. The main reason behind these hardware limitations is the lack of efficiency of the battery. Insufficient battery power plagues embedded devices, especially mobile phones and smartphones. Many manufacturers are forced to underclock the CPUs designed by ARM and others to reduce the power drained from the battery. The Motorola Droid, for example, has a CPU rated at 600 MHz that is underclocked to 550 MHz for this purpose. Following the same idea, many software developers have created switchable CPU speeds that overclock or underclock the processor on the fly based on the current task.
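
The link between clock speed and power described above can be made concrete with the standard first-order model of dynamic power in CMOS logic. This model is not taken from the report; it is a textbook approximation, and the symbols $\alpha$ (activity factor), $C$ (switched capacitance), and $V$ (supply voltage) are introduced here only for illustration:

```latex
P_{\text{dyn}} \approx \alpha C V^{2} f
\qquad\Longrightarrow\qquad
\frac{P_{\text{dyn}}(550\,\text{MHz})}{P_{\text{dyn}}(600\,\text{MHz})}
  = \frac{550}{600} \approx 0.92
  \quad \text{(at fixed } \alpha,\ C,\ V \text{)}
```

Under these assumptions, running at 550 MHz instead of 600 MHz would trim dynamic power by roughly 8%. In practice a lower frequency often permits a lower supply voltage as well, which would increase the saving.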

Relevant Architectures

A few main embedded architecture design companies create the blueprints for today’s smartphone processors. One company that is dominant today, with its architecture in over 85% of Android smartphones, is Advanced RISC Machines (ARM). ARM uses a 32-bit reduced instruction set computer (RISC) instruction set architecture (ISA) for most of its application processors [4].

The two notable ARM architectures that we focus on are ARMv6 and ARMv7, which produce the ARM11 and Cortex processors respectively. Figure 1 illustrates ARM’s processor and architecture layout: ARM’s processor line for the past few years, featuring classic, application, and embedded processors. For this project and its smartphone focus, we concentrate on ARM’s Classic and Application processors, which feature the ARMv6 and ARMv7 architectures respectively. We give an overview of ARM architectures in general, followed by a more detailed analysis of each architecture.

Figure 1 – ARM processor layout illustrating Classic, Application, and Embedded processors. ARM architectures are shown in the middle in gray text, with the processors listed in their respective categories above and ARM architectural details below.

ARM has developed its architecture to be implemented across a very wide range of devices, each with different performance needs and different ways of evaluating that performance. From DVD players, set-top boxes, and televisions to mobile phones and even laptops, an ARM processor is present in almost all of the embedded devices we use every day. Although these devices are very different from power-hungry desktop computers, ARM has found a way to deliver comparable performance without requiring a powerful power source. ARM’s simple architectural design has resulted in processors that meet the performance requirements of most of today’s laptops while consuming very little power. This low power consumption is the key idea that makes ARM stand out from other design companies, and it is what leads today’s mobile phone makers to build their processors on a 32-bit ARM architecture. Its RISC attributes include a large uniform register file. The load/store architecture simplifies data processing, since operations work only on register contents rather than directly on memory contents. Addressing modes are simple: load and store addresses are determined from register contents and instruction fields only. Load and store multiple instructions and conditional execution of instructions maximize data throughput and execution throughput respectively. Some of the features accessible in the ARM architecture are:

- Level 1/2 Instruction Cache
- Level 1/2 Instruction TLB
- Level 1/2 Data Cache Refill
- Level 1/2 Data Cache Access
- Level 1/2 Data TLB Refill
- Mispredicted Branch Execution
- No Prediction Branch Execution
- Cycle Count
- Predictable Branch Execution
- Level 1/2 Data Cache Write-Back

Figure 1 shows the processors under the ARMv6 and ARMv7 architectures. Figure 2, the ARM processor flow chart, illustrates the increasing functionality, performance, and capability of the processors. The ARM11, which uses the older ARMv6 architecture, is listed under the “classic” section, while the newer “application” processors house the ARMv7 architecture of the Cortex A8 processor, which is the focus of this project. To get a better idea of why the Cortex A8 is used more today than the ARM11, we compare some aspects of the ARMv6 and ARMv7 architectures.

Figure 2 - ARM processor flow chart illustrating the increasing performance and capabilities of each processor. Most of the phones used in this study have the Cortex A8 processor, which is an outdated processor according to ARM. As newer phones are released with more capable processors, we can expect a significantly greater performance level.

ARMv6

The ARMv6 architecture introduced data operations based on the Single Instruction Multiple Data (SIMD) technique. With SIMD, a level of parallelism is achieved by having multiple processing elements perform the same operation simultaneously on multiple data items. This technique allowed a level of performance never before seen on a mobile phone. ARM has been working to extend this work with a Multiple Instruction Multiple Data (MIMD) architectural implementation. ARMv6 was designed to combine low cost and high performance in a newly designed 32-bit device. The market at the time was dominated by 8-bit devices, which were no match for an ARM11 processor. The ARM11 also featured an 8-stage pipeline along with a variable cache and a memory management unit, which helped make efficient use of memory. We list a few more characteristics that helped make the ARMv6 architecture a dominant presence in the market.

- 8-stage pipeline
- SIMD capability
- Enhanced DSP instructions for increased performance
- Variable Cache and Memory Management Unit
- Typical DMIPS of 965 at Max
- Utilized at CPU speeds up to 600 MHz

ARMv7

ARMv7 features more advanced technology than ARMv6, delivering significantly greater processing performance while using less power than the previous architecture. ARMv7 introduces a new processor line called Cortex. Cortex uses a 16- and 32-bit instruction set that provides all the advantages of RISC along with the small code size of the 16-bit Thumb instruction set architecture, adding over 120 instructions. ARMv7 processors run at speeds from 600 MHz to 1 GHz and are used for applications requiring up to 2000 DMIPS. This low-power design reaches high levels of performance through the combination of a dual-issue integer pipeline, an integrated L2 cache, and an efficient 13-stage pipeline. With the dual-issue integer pipeline, ARM introduces a superscalar pipeline that can issue two instructions at the same time. It also features a symmetric dual ALU pipeline capable of handling most arithmetic instructions quickly and efficiently. The 13-stage pipeline operates at a higher frequency than previous architectures, which makes accurate branch prediction more important. To minimize branch mispredictions, the Cortex-A8 processor implements a two-level global history branch predictor consisting of a Branch Target Buffer (BTB) and a Global History Buffer (GHB). Both structures are accessed in parallel with instruction fetches, producing a highly optimized pipeline cycle. An example is shown in Figure 3.

Figure 3 – Branch prediction in the ARMv7 architecture, showing instruction fetch, decode, and execution of load and store instructions with both ALU pipes present.

ARMv7 has a single-cycle load-use penalty for fast access to the level 1 cache, which is configurable as 16 KB or 32 KB and is 4-way set associative. The level 1 data cache is write-back with no write allocation. ARMv7 features an integrated level 2 cache, giving it dedicated low latency and high bandwidth when interacting with the level 1 cache. The level 2 cache is 8-way set associative with a size of 64 KB. Having it dedicated improves both power and speed. The Cortex A8 processor implements an advanced virtual memory system architecture on an improved memory management unit, along with an advanced hardware floating point unit that allows higher-precision operations. These features show why most phones today use the ARMv7 architecture and the Cortex A8, which we discuss later and present in Table 1.
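
To make the two-level global history scheme described above more concrete, the following Python sketch models a small predictor with a branch target buffer and a table of 2-bit saturating counters indexed by the branch address XORed with a global history register. It is a simplified teaching model, not the Cortex-A8's actual design: the class name, table sizes, indexing function, and counter initialization are all assumptions made for illustration.

```python
class GlobalHistoryPredictor:
    """Toy two-level predictor: a BTB plus history-indexed 2-bit counters."""

    def __init__(self, history_bits=10, btb_entries=512):
        self.history_bits = history_bits      # assumed history length
        self.btb_entries = btb_entries        # assumed BTB size
        self.history = 0                      # recent outcomes, 1 = taken
        # 2-bit saturating counters, initialised to "weakly not taken".
        self.counters = [1] * (1 << history_bits)
        self.btb = {}                         # slot -> (branch pc, target)

    def _counter_index(self, pc):
        mask = (1 << self.history_bits) - 1
        return ((pc >> 2) ^ self.history) & mask

    def _btb_slot(self, pc):
        return (pc >> 2) % self.btb_entries

    def predict(self, pc):
        """Return (predicted_taken, predicted_target or None)."""
        entry = self.btb.get(self._btb_slot(pc))
        if entry is None or entry[0] != pc:
            return False, None                # BTB miss: assume fall-through
        taken = self.counters[self._counter_index(pc)] >= 2
        return taken, (entry[1] if taken else None)

    def update(self, pc, taken, target):
        """Train the counter, the BTB, and the history with the real outcome."""
        i = self._counter_index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
            self.btb[self._btb_slot(pc)] = (pc, target)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def run_trace(predictor, trace):
    """Fraction of (pc, taken, target) branches whose direction was predicted."""
    correct = 0
    for pc, taken, target in trace:
        predicted_taken, _ = predictor.predict(pc)
        correct += predicted_taken == taken
        predictor.update(pc, taken, target)
    return correct / len(trace)


if __name__ == "__main__":
    # A loop branch that is taken nine times and then falls through, repeated.
    loop = [(0x1000, i % 10 != 9, 0x0F00) for i in range(2000)]
    print(f"accuracy: {run_trace(GlobalHistoryPredictor(), loop):.3f}")
```

Because the history register records the last ten outcomes, the predictor can learn where the loop exit falls, which a single per-branch counter could not do.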

Related Work

Some prior work compares ARM architectures against earlier implementations, and some examines the effect of the architecture on the operating system and on phone/user interaction. Some of these tests were performed on the CPU, memory, and battery. We describe a few papers that have produced relevant data for comparing ARM architectures in smartphones, and we introduce benchmarking software written by DHTechnologies, a software company based in Austin, Texas. We use this benchmarking software as part of our ARM architecture comparison between different smartphones.

GreenDroid [5] is a project at the University of California that focuses on innovating and expanding on microprocessor technology. Believing this is where the future is headed, its authors try to adapt today’s architecture to multicore designs and to smaller environments as hardware sizes shrink. They address the silicon infrastructure of processors in order to produce an economical, performance-oriented processor, using dark silicon for what they dub conservation cores. These conservation cores are extremely energy-efficient compared with the microprocessors produced today. GreenDroid is an actual prototype 45 nm processor built from these conservation cores, which tries to address the limits of Moore’s Law and leakage problems. The GreenDroid architecture is shown in Figure 4.

Figure 4 – The GreenDroid architecture. GreenDroid is a multicore mobile application processor made up of (a) 16 non-identical tiles. (b) Each tile holds common components such as the CPU, the on-chip network, and a shared L1 data cache. (c) The connections between these components and the conservation cores.

Key characteristics of the ARMv7 architecture:

- 13-stage pipeline
- 4-way set associative 16K or 32K Level 1 Cache
- 8-way set associative 64K dedicated Level 2 Cache
- Variable Cache and Memory Management Unit
- Typical DMIPS of 2000 at Max
- Utilized at CPU speeds up to 1 GHz
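
The cache sizes and associativities listed above determine how an address is split into tag, set index, and line offset. The Python sketch below works this out for the Level 1 and Level 2 configurations named in the report. The 64-byte line size and 32-bit address width are assumptions made for illustration, since the report does not state them, and the function names are hypothetical.

```python
def cache_geometry(size_bytes, ways, line_bytes=64, address_bits=32):
    """Return the set count and address-bit split of a set-associative cache.

    line_bytes and address_bits are illustrative assumptions, not values
    taken from the report.
    """
    sets = size_bytes // (ways * line_bytes)
    if sets < 1 or sets & (sets - 1) or line_bytes & (line_bytes - 1):
        raise ValueError("set count and line size must be powers of two")
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    return {
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": address_bits - index_bits - offset_bits,
    }


def split_address(address, geometry):
    """Split an address into (tag, set index, line offset)."""
    offset = address & ((1 << geometry["offset_bits"]) - 1)
    index = (address >> geometry["offset_bits"]) & (geometry["sets"] - 1)
    tag = address >> (geometry["offset_bits"] + geometry["index_bits"])
    return tag, index, offset


if __name__ == "__main__":
    for name, size, ways in [("L1 16K", 16 * 1024, 4),
                             ("L1 32K", 32 * 1024, 4),
                             ("L2 64K", 64 * 1024, 8)]:
        print(name, cache_geometry(size, ways))
```

Under these assumptions the 16K, 4-way Level 1 cache has 64 sets, so doubling it to 32K adds one index bit rather than changing the associativity.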

Freescale [6], a semiconductor company, researched the architectures needed to deliver a product that would appeal to today’s market. They analyzed different embedded products, including a mobile phone, and considered the growth of the industry, the demand for performance, and the limitations of today’s architectures in producing their own design. They extended the ARM architecture into an improved architecture called Mobile Extreme Convergence (MXC), which is said to reduce power consumption, improve memory access times, and reach CPU speeds of around 500 MHz. They ran performance tests on their dedicated application processor, which features a 128 KB on-chip L2 cache and improved on past designs. We present some of their cache hit rate results in Figure 5, comparing their on-chip L2 cache with a system with no L2 cache.

Figure 5 – L2 cache performance comparison for an HTC G1 smartphone containing an ARM11 processor with the ARMv6 architecture. (a) L2 cache hit rate with no flash memory. (b) L2 cache hit rate with flash memory. The box highlights the typical operating range. Results show the on-chip L2 cache performing better in both scenarios.

DHTDroid [7] is a benchmarking toolset created to show how the Android operating system interacts with the ARM architecture. The goal of the project was to implement a set of Android-based system benchmarks that generate an operating system and hardware abstraction vector that can be compared across smartphones with different hardware. The DHTDroid toolset consists of 12 individual macro-benchmarks that stress-test the CPU, the TLB, the cache, the memory, the I/O, and the network capabilities. We can use this toolset to identify potential hardware and operating system issues, which arise often because the Android OS is updated every couple of months. We can also run a performance test, store the results, and compare them later to see whether a hardware issue is decreasing system performance. We discuss the toolset further in the Experimental Setup section and present its results in the Results section.

Experimental Setup

With all these advancements in embedded architecture, a comparison of the performance of each processor and architecture type is needed. With it we can try to gauge where the system enhancements are found and how far they go. To do this we first need a sample set of smartphones, all running Android, with different hardware implementations for a complete comparison. Figure 6 shows five of the six phones used to evaluate the hardware. From left to right, the phones are the HTC G1, HTC Hero, HTC Nexus One, Motorola Droid, and Samsung Nexus S. The HTC EVO is not in the picture but is used in this project and is listed with its specifications in Table 1.

Figure 6 - Five Android smartphones used to obtain CPU and memory benchmarks. From left to right: HTC G1, HTC Hero, HTC Nexus One, Motorola Droid, Samsung Nexus S. The HTC EVO is not present in the picture.

Since each phone has different hardware, we first have to recognize and analyze how they differ. Each of these phones uses a processor designed by ARM, but some feature the more advanced ARMv7 architecture rather than ARMv6, which should result in increased performance, as stated in the Relevant Architectures section. Table 1 shows each phone with its release date, processor type, instruction set (architecture), CPU speed, and internal memory specifications (RAM and ROM). These phones were released between late 2008 and the end of 2010. Within this two-year span there has been significant growth in embedded architecture, specifically by ARM with the upgrade from ARMv6 to ARMv7.

Table 1 – Specifications of the Android smartphones used to test performance by benchmark analysis.

| Phone | Release date | Processor (CPU core) | Instruction set | Max clock speed | Internal RAM | Internal ROM |
|---|---|---|---|---|---|---|
| HTC G1 | October 22, 2008 | Qualcomm MSM7201A (ARM11) | ARMv6 | 528 MHz | 192 MB | 256 MB |
| HTC Hero | October 11, 2009 | Qualcomm MSM7600A (ARM11) | ARMv6 | 600 MHz | 288 MB | 512 MB |
| Motorola Droid | October 17, 2009 | TI OMAP 3430 (ARM Cortex A8) | ARMv7 | 550 MHz | 256 MB | 512 MB |
| HTC Nexus One | March 16, 2010 | Qualcomm QSD8250 (Snapdragon ARM) | ARMv7 | 1 GHz | 512 MB | 512 MB |
| HTC EVO | June 4, 2010 | Qualcomm QSD8650 (Snapdragon) | ARMv7 | 1 GHz | 512 MB | 1 GB |
| Samsung Nexus S | December 16, 2010 | Samsung Hummingbird S5PC110 (ARM Cortex A8) | ARMv7 | 1 GHz | 512 MB | 16 GB |

Multiple benchmark programs were run on each phone to get an idea of which phone performs better in each area. We performed an MFLOPS analysis, a CPU analysis, a memory analysis, and an overall system benchmark analysis. We give an overview of each technique here and present the results of each in the next section.

Linpack

The Linpack benchmark [8] has been in use since the late 1970s and early 1980s. It was designed for performance tests on supercomputers, but it has since become a scale for grading computer system performance and the standard test for the TOP500 list, which ranks the world’s most powerful computer systems. “The Linpack benchmark is a measure of a system’s floating point computing power,” and it has recently been ported to the Android OS, where it can be used to evaluate the performance of each update to the operating system [9]. It measures how fast a computer solves a dense N by N system of linear equations, Ax = b, which is a very common task in engineering. The solution is obtained by Gaussian elimination with partial pivoting, using $\frac{2}{3}N^3 + 2N^2$ floating point operations. The end result is a number expressing millions of floating point operations per second (MFLOPS). Figure 7 shows an example screenshot of Linpack for Android reporting results in MFLOPS.

Figure 7 – Example screenshot of Linpack for Android running on the Nexus S. The benchmark reports the MFLOPS achieved and the time taken to execute.
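
The measurement described above can be sketched in a few lines of Python: solve a dense system by Gaussian elimination with partial pivoting, time it, and divide the nominal operation count 2/3·N^3 + 2·N^2 by the elapsed time. This is an illustrative sketch only, not the Linpack for Android code; the function names and the default problem size are assumptions, and an interpreted implementation will report far lower MFLOPS than a native one on the same hardware.

```python
import random
import time


def solve_dense(a, b):
    """Solve Ax = b in place by Gaussian elimination with partial pivoting."""
    n = len(a)
    for k in range(n):
        # Partial pivoting: move the largest entry of column k onto the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k from every row below the pivot row.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_flop_count(n):
    """Nominal Linpack operation count: 2/3*N^3 + 2*N^2."""
    return 2.0 * n ** 3 / 3.0 + 2.0 * n ** 2


def measure_mflops(n=100, seed=0):
    """Time one solve of a random N x N system; return (MFLOPS, solution)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]          # chosen so the exact solution is all ones
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    return linpack_flop_count(n) / elapsed / 1e6, x


if __name__ == "__main__":
    mflops, _ = measure_mflops()
    print(f"{mflops:.3f} MFLOPS")
```

Building the right-hand side from the row sums gives a known solution (all ones), so the result can be checked for correctness as well as timed.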

Nbench

Nbench is a performance evaluation tool that tests CPU efficiency. It is an old tool, originally written in 1995 for an old UNIX distribution. Since then it has been modified and ported to many platforms, such as Unix/Linux, Windows, ARM Evaluation, and, recently, Android. Nbench stresses the CPU in a number of areas, including numeric sort, Fourier, and Huffman, and concludes with an integer index and a floating point index. “The benchmark was designed to expose the capabilities of a system's CPU, FPU, memory and C compiler performance.” We can use it to gauge and compare performance levels across Android smartphones. Figure 8 shows an example screenshot of Nbench running on an Android phone, with the relevant CPU performance output.

Figure 8 – Example screenshot of Nbench running on an Android device. The output lists each of the CPU tests in the middle, while the memory, integer, and floating point indices are shown at the top in yellow.
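
A numeric sort test of the kind Nbench runs can be sketched as a loop that repeatedly sorts an array of integers and reports iterations per second, which is then divided by a baseline rate to form an index. The Python sketch below is illustrative only and is not Nbench's code: the choice of heapsort, the array length, the timing window, and the baseline value passed to `index` are all assumptions.

```python
import random
import time


def heapsort(values):
    """Sort a list of numbers in place using heapsort."""

    def sift_down(root, end):
        # Push values[root] down until the max-heap property holds up to end.
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            if child + 1 <= end and values[child] < values[child + 1]:
                child += 1
            if values[root] < values[child]:
                values[root], values[child] = values[child], values[root]
                root = child
            else:
                return

    n = len(values)
    for start in range(n // 2 - 1, -1, -1):   # build the heap
        sift_down(start, n - 1)
    for end in range(n - 1, 0, -1):           # repeatedly extract the maximum
        values[0], values[end] = values[end], values[0]
        sift_down(0, end - 1)


def numeric_sort_rate(array_len=1000, min_seconds=0.1, seed=0):
    """Return sort iterations per second (array copy time is included)."""
    rng = random.Random(seed)
    source = [rng.randrange(-2 ** 31, 2 ** 31) for _ in range(array_len)]
    iterations = 0
    start = time.perf_counter()
    while True:
        data = list(source)
        heapsort(data)
        iterations += 1
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            return iterations / elapsed


def index(rate, baseline_rate):
    """Score relative to a reference machine; baseline_rate is hypothetical."""
    return rate / baseline_rate


if __name__ == "__main__":
    rate = numeric_sort_rate()
    print(f"{rate:.1f} iterations/s, index {index(rate, baseline_rate=100.0):.2f}")
```

Reporting a ratio against a fixed baseline is what makes scores from different devices comparable, which is how the indices in Figure 8 are meant to be read.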

Quadrant

Quadrant is an independent benchmarking tool created specifically for Android devices. We run this benchmark to get an idea of total system performance, which includes the CPU, memory, I/O, and graphics. Since Quadrant combines all these aspects on its own scale, the absolute scores it outputs carry no meaning on their own. As a result, we can only compare each phone’s Quadrant score and identify which phone performs better than the rest. Table 2 shows the measurement tests performed in each of the four areas: CPU, memory, I/O, and graphics. Screenshots of the program running on an Android device are shown in Figure 9.

Table 2 – Quadrant Benchmark Tool, showing each hardware area tested along with its benchmarking measurements.

| Hardware | Measuring Test |
|---|---|
| CPU | Branch Logic, Integer, Long Int, Short Int, Byte, Floating Point, Double Precision, Checksum, Compression, XML Parsing, Video Decoding (H.264), Audio Decoding (AAC) |
| Memory | Throughput |
| I/O | File System Reads, File System Writes, Database Reads, Database Writes |
| Graphics | 2D/3D – Frames per Second |

Figure 9 – Example screenshots of the Quadrant Benchmarking Tool running on an Android device. (a) The tests performed on the device to obtain an overall score for Android device comparison. (b) The final output, which shows the score for each category by color: blue – CPU, red – memory, green – I/O, orange – 2D graphics, and yellow – 3D graphics.

DHTDroid

DHTDroid is a toolset of UNIX shell executables written to analyze the hardware functionality of the embedded architecture present on a device. It was custom written for use on Android smartphones and targets mainly CPU and memory performance. The toolset was ported from Linux and rewritten to meet embedded architecture standards. It consists of multiple performance evaluation scripts that are run by the kernel to analyze a number of benchmarks, including:

1) cacheperf – Measures TLB, cache, and memory performance

2) ctxswtch – Measures CPU and context switch performance

3) memcache – Measures data cache and memory efficiency/performance

4) syscpu – Measures CPU and system call subsystem performance

5) numsim – Measures the efficiency of executing math functions (vector, matrix)

Due to Android file system privileges, only a rooted phone, that is, a phone on which the user has gained access to the root (administrator) file system, can run these scripts. Only three of our six phones fell into this category: the Motorola Droid, HTC Nexus One, and HTC EVO. Each of the five benchmarks above was run on these three phones.
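
As a rough illustration of what a system call benchmark such as syscpu measures, the Python sketch below times a cheap system call in a tight loop and subtracts the cost of an empty call made the same way. It is not DHTDroid's code, whose internals are not described here; the function names, the use of `os.getppid` as the probed call, and the call count are assumptions chosen for illustration.

```python
import os
import time


def time_per_call(func, calls=100_000):
    """Average wall-clock seconds per call of func over a tight loop."""
    start = time.perf_counter()
    for _ in range(calls):
        func()
    return (time.perf_counter() - start) / calls


def syscall_overhead(calls=100_000):
    """Estimate the extra cost of one system call over an empty function call."""
    baseline = time_per_call(lambda: None, calls)   # loop and call overhead only
    with_syscall = time_per_call(os.getppid, calls) # same loop plus a kernel entry
    return max(0.0, with_syscall - baseline)


if __name__ == "__main__":
    print(f"approx. {syscall_overhead() * 1e9:.0f} ns per system call")
```

Subtracting a baseline loop isolates the kernel entry and exit cost from interpreter overhead, though the result is still only an estimate and varies from run to run.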

Results

As we run each benchmark on each phone, we look at each benchmark individually and compare only what that test analyzes across the six phones. Since the G1 is the oldest phone and the Nexus S is the newest, we expect the results to reflect the idea that a newer phone produces better benchmark numbers. However, this may not be the case, since the efficiency of the operating system also influences the performance of a device. With newer versions of the Android OS present on many of the devices, some benchmark results may reflect performance gains in software rather than hardware, but we do not neglect the fact that each phone has different, and to an extent better, hardware.

Table 3 – Performance table with results from the Linpack benchmark (MFLOPS) and Quadrant. Each phone’s BOGO-MIPS value and external memory read/write rates are also shown.

| Phone | MFLOPS, single precision (SP) | MFLOPS, double precision (DP) | Quadrant | BOGO-MIPS | Memory read (MB/s) | Memory write (MB/s) |
|---|---|---|---|---|---|---|
| HTC G1 | 2.213539 | 1.5514531 | 274 | 383.38 | 4.0 | 6.4 |
| HTC Hero | 5.9771547 | 4.1366262 | 746 | 599.65 | 4.3 | 6.5 |
| Motorola Droid | 12.462279 | 7.048375 | 1099 | 615.35 | 5.5 | 6.4 |
| HTC Nexus One | 16.530212 | 8.794049 | 1142 | 662.40 | 7.0 | 23.2 |
| HTC EVO | 17.004862 | 9.604989 | 1225 | 780.55 | 7.5 | 21.1 |
| Samsung Nexus S | 15.497228 | 11.538064 | 1407 | 996.31 | 8.3 | 23.9 |

Linpack

Table 3 shows the performance results of two of the benchmark tests we performed, Linpack and

Quadrant. First we focus on the Linpack Results which are shown in column 2 of the table. We calculated

the Floating point Operations per Second of each processor (FLOPS) which is a type of measure of a

computer’s performance. Because the ARM architecture supports single instruction, multiple data (SIMD) operations, we can measure 64-bit double precision (DP) FLOPS along with the standard single precision (SP) FLOPS. This parallelism allows these phones to achieve better performance on a single core than a single core without SIMD could. The FLOPS values in Table 3 are given in mega FLOPS (MFLOPS): a computer rated at 5 MFLOPS performs 5 x 10^6 floating point operations per second. This is an important number, and we can say, to an extent, that a computer with a high MFLOPS number will have good CPU performance. The single and double precision results follow a consistent pattern: better hardware/software implementations deliver better CPU performance. This is shown clearly when comparing the HTC G1 with the HTC EVO. The G1 has an ARMv6 ARM11 processor clocked at 528 MHz, while the EVO has an ARMv7 Cortex A8 processor clocked at 1 GHz. The MFLOPS measurements clearly reflect the increased clock speed and significantly improved hardware architecture. We can also see that the single precision MFLOPS number is generally higher than the double precision number. This is expected, since a 64-bit double precision operation handles twice as many bits as a 32-bit single precision operation and therefore tends to complete more slowly on the same hardware. The two numbers do not always rank the phones in the same order, however. The Samsung Nexus S, which also has a Cortex A8 processor at 1 GHz, has a lower single precision MFLOPS number but the highest double precision MFLOPS number, which can also have a great effect on the performance of a machine. Therefore, we should not simply compare these two numbers and declare the system with the higher single precision number faster overall without considering what else is happening in the system, such as I/O operations and memory limitations, both of which can have an even greater influence on performance.
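As a rough illustration of the MFLOPS unit described above, the following sketch converts an operation count and an elapsed time into MFLOPS. The function names and the example timing are hypothetical. The operation count 2n^3/3 + 2n^2 is the conventional Linpack count for solving an n x n dense system; whether the Android port used in this project counts operations the same way is an assumption.

```python
def linpack_flop_count(n: int) -> float:
    """Conventional Linpack operation count for solving an n x n dense system."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def mflops(operations: float, seconds: float) -> float:
    """Convert an operation count and elapsed time into millions of FLOPs per second."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return operations / seconds / 1e6


# 5 x 10^6 floating point operations completed in one second is 5 MFLOPS.
print(mflops(5e6, 1.0))

# Hypothetical run: a 200 x 200 system solved in 0.5 s (made-up timing).
print(round(mflops(linpack_flop_count(200), 0.5), 2))
```

The same conversion applies to both the single and double precision columns; only the measured time changes.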

Quadrant

Table 3 also shows the Quadrant scores, which try to quantify a system's overall performance based on CPU operations, I/O operations, memory, and graphics. We do not have individual scores for each category, only the final number, which we can compare across the phones in this project. As with the Linpack results, the HTC G1 performs at the bottom of the list. Given the architecture and processor limits at the time it was released, we are not surprised by this result. The Droid, Nexus One, and EVO all have similarly high Quadrant results, which means that performance on these phones is good, but not better than the Nexus S. With a score of 1407, the Nexus S outperforms all the other phones in this project. However, the Quadrant benchmark is heavily influenced by GPU hardware, since it tests 2D and 3D graphics. When the G1 was released and smartphones were first coming to market, embedded designers were not as focused on improving phone graphics as they are today. This benchmark should therefore be used only on recently released phones rather than on an older phone built as an introductory smartphone.

Figure 10 – Quadrant Benchmark Results. This illustrates an overall system benchmark which stresses the CPU, memory, I/O, and 2D/3D graphics. Since this requires a graphical analysis, all these phones must have a graphics unit to be compared. We can see that the Nexus S outperforms all the other phones, but given the limitations of the Quadrant software used, we do not know in which area. We can assume that its highest overall score is an accumulation across all areas. Since this is an overall benchmark tool, we can only compare the phones' scores and cannot make a statement on the contributions of the individual hardware components.

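[Chart for Figure 10: Quadrant scores, axis from 0 to above 1000, one bar each for the G1, Hero, Droid, Nexus One, EVO, and Nexus S. Data labels as extracted, with the bar order not preserved: 1142, 1099, 746, 274, 1407, 1225.]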


Nbench

Next, we look at the results given by the performance analysis tool Nbench. We focus here on the Memory, Integer, and Floating Point indexes shown in Figure 11. Each index covers a different aspect of the system: the Integer index tests how fast the CPU performs integer calculations, the Memory index measures memory performance within the system, and the Floating Point index also tests the CPU, this time using floating point calculations. The Nbench results are consistent with those of the two previous benchmark tools. Looking at the Integer index, the Nexus One, EVO, and Nexus S all top out around an index of 4, while the G1 and Hero max out around the 1.5 mark. Integer calculations will therefore complete faster on the newer phones. The same can be said about memory performance, with the Nexus S having a greater Memory index than any of the others. This time the values are not as close together as for the Integer index; they rise gradually, almost like a step function, from the Droid to the Nexus One, the EVO, and finally the Nexus S. The Floating Point index is less clear to us. We are not sure how this index is calculated, but compared with the earlier MFLOPS results from Linpack, it would be consistent if the index were on a 10^7 scale. In Linpack, the Nexus S was the fastest in double precision and the EVO in single precision. The Droid, Nexus One, EVO, and Nexus S all perform well in this area, significantly better than both the Hero and the G1. More detailed Nbench results for each phone are given in Appendix A.
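One way to see how the three Nbench indexes relate to the individual tests is to recompute them from the raw output in the appendix. The sketch below assumes that each index is the geometric mean of the "New Index" values of a fixed group of tests, and that the grouping is the one listed in `groups`; both are assumptions about Nbench rather than statements from this report. The input values are the HTC G1 "New Index" column.

```python
from math import prod

# "New Index" column of the HTC G1 Nbench output (baseline: AMD K6/233).
g1_new_index = {
    "NUMERIC SORT": 0.74, "STRING SORT": 0.27, "BITFIELD": 0.84,
    "FP EMULATION": 1.01, "FOURIER": 0.10, "ASSIGNMENT": 1.04,
    "IDEA": 1.41, "HUFFMAN": 1.27, "NEURAL NET": 0.13,
    "LU DECOMPOSITION": 0.22,
}

# Assumed grouping of the ten tests into the three indexes.
groups = {
    "MEMORY INDEX": ["STRING SORT", "BITFIELD", "ASSIGNMENT"],
    "INTEGER INDEX": ["NUMERIC SORT", "FP EMULATION", "IDEA", "HUFFMAN"],
    "FLOATING-POINT INDEX": ["FOURIER", "NEURAL NET", "LU DECOMPOSITION"],
}


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


def nbench_indexes(per_test):
    """Recompute each index as the geometric mean of its group of tests."""
    return {
        name: geometric_mean(per_test[test] for test in tests)
        for name, tests in groups.items()
    }


for name, value in nbench_indexes(g1_new_index).items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the recomputed values agree with the indexes reported for the G1 (0.618, 1.074, and 0.141) to within the rounding of the per-test values, which suggests the Floating Point index is a relative score against the baseline machine rather than a raw MFLOPS figure.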

Figure 11 – Nbench results from the three benchmark tests, Integer, Memory and Floating Point. From this graph we can see that the G1, which is an older phone, is not up to the performance levels of the newer phones released in 2010. These newer phones, the Nexus One, EVO, and Nexus S, all stand out, excelling in each category. Since the Samsung Nexus S has newer hardware and software, we expect it to be the best performing phone in this project, and these results reflect that. A more detailed result for each phone is given in Appendix A.

[Chart for Figure 11: "Nbench Performance Analysis". Categories: Integer, Memory, Floating Point. Vertical axis: Benchmark Value, 0 to 4.5. Series: HTC G1, HTC Hero, Motorola Droid, HTC Nexus One, HTC EVO, Samsung Nexus S.]


DHTDroid

Using this benchmark tool-set, we focused on testing the CPU and the memory. Using five different tests, we examine the performance of the Motorola Droid, HTC Nexus One, and HTC EVO. Since the tool-set can only be used on phones that have root access, these were the only phones tested with DHTDroid. We look first at memcache, a memory benchmark that walks through the data cache and the physical memory subsystems, invoking different levels of the memory hierarchy. Memcache performs read and write operations in memory and reports the resulting latency times in nanoseconds. We tested the 1024K memory level using increasing read and write sizes from 4B to 525KB. Figure 12 illustrates the results of the memcache test: each phone starts at the 1024K level, and access times vary as the read and write size increases. Latency values below 100ns reflect high data cache usage (read/write position), whereas latency values above 100ns reflect operations that are mainly executed at physical memory speeds. The complexity of the memory subsystem, as well as the number of CPUs present in the phone (in this case one for each phone), has a significant impact on these benchmark results. We can see that the Droid has higher latency times than the EVO and Nexus One. The Nexus One and EVO have different latency times when the cache is used, while their access times are equal when the physical memory system is used.

Figure 12 – Results from the memcache performance test. Cache latency times of the 1024K memory hierarchy subsystem using increasing read/write operations. Lower cache times reflect a high data cache usage while higher latency times represent operations that are executed at physical memory speeds. The EVO and Nexus One perform well in this area compared to the Droid, but are equal when analyzing times with high data cache usage.

[Chart for Figure 12: "Latency of Read/Write Operations in Memory". Horizontal axis: Memory Hierarchy Size, 0 to 5 x 10^5. Vertical axis: Cache Latency (ns), 0 to 600. Series: Motorola Droid, HTC EVO, HTC Nexus One.]


CPU Context Switching

For this performance test, we use the ctxswtch program to evaluate the CPU and context switches. The number of context switches to perform is passed into the program as a parameter. We evaluated each phone using 4,200,000 context switches on the CPU. The results from this test are shown in Table 4, which lists the total number of context switches performed and the total time spent for each phone. The Droid lacks in CPU performance here: its total time was more than twice that of the Nexus One and triple that of the EVO. This means the Droid's CPU sustains fewer context switches per second, so it takes longer to finish the context switching test, as the table reflects. We also present the context switch time in microseconds. The EVO produces a very low switch time of 16.80µs, followed by the Nexus One with 22.71µs and the Droid with almost 50µs. Each smartphone CPU performed the test at about 50% utilization.

Table 4 – Ctxswtch performance results - CPU and Context Switches

Phone | Total Time (s) | Total Number of Context Switches | Context Switch Frequency (Hz) | Time per Switch (µs)
Motorola Droid | 101.33 | 4195303 | 20155.71 | 49.61
HTC Nexus One | 47.12 | 4195800 | 44031.04 | 22.71
HTC EVO | 33.71 | 4196664 | 59527.84 | 16.80
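The last two columns of Table 4 are reciprocals of each other, and the total times give the slowdown of the Droid relative to the other phones. The short sketch below reproduces both calculations from the table values. The variable and function names are illustrative only. The source does not state how ctxswtch derives the reported frequency from the total time, so the sketch only checks the reciprocal relationship.

```python
# Values copied from Table 4: (total time in s, reported switch frequency in Hz).
results = {
    "Motorola Droid": (101.33, 20155.71),
    "HTC Nexus One": (47.12, 44031.04),
    "HTC EVO": (33.71, 59527.84),
}


def time_per_switch_us(frequency_hz: float) -> float:
    """Time per context switch in microseconds, as the reciprocal of the frequency."""
    return 1e6 / frequency_hz


droid_time = results["Motorola Droid"][0]
for phone, (total_s, freq_hz) in results.items():
    slowdown = droid_time / total_s  # how many times longer the Droid took
    print(f"{phone}: {time_per_switch_us(freq_hz):.2f} us per switch, "
          f"Droid time / this phone's time = {slowdown:.2f}")
```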

We next look at the results from the cacheperf benchmark. While running this tool, the Droid experienced an error and did not output the final results, so we can only present the performance of the Nexus One and EVO. Cacheperf is a memory performance tool that quantifies the performance of the cache/memory subsystem. It also measures the CPU and TLB access/latency values.
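To relate the cacheperf latencies to the average access time discussed in class, the following sketch applies the standard formula, average access time = hit time + miss rate x miss penalty. It uses the Nexus One values reported by cacheperf (13.95 ns CPU + L1 access, 241.28 ns cache miss latency). Treating the reported miss latency as the miss penalty is an assumption, and the miss rates are hypothetical since cacheperf does not report them here.

```python
def amat_ns(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time: hit time plus the expected cost of a miss."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns


HIT_NS = 13.95    # CPU + L1 access, Nexus One
MISS_NS = 241.28  # cache miss latency, Nexus One (assumed to act as the miss penalty)

for rate in (0.0, 0.05, 0.5):  # hypothetical miss rates, not measured
    print(f"miss rate {rate:.2f}: {amat_ns(HIT_NS, rate, MISS_NS):.2f} ns")
```

Even a small miss rate moves the average noticeably away from the L1 access time, because a miss costs roughly 17 times as much as a hit with these figures.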

Table 5 – Cacheperf performance of the cache/memory along with TLB latency values

Nexus One
CPU + L1 Access: 13.95 ns
Cache Subsystems:
Level | Size | Line Size | Cache Miss Latency | Cache Replace Time
1 | 256 KB | 128 bytes | 241.28 ns | 248.00 ns
TLB Subsystem:
Level | Size | Page Size | TLB Miss Latency
1 | 80 | 4 KB | 14.17 ns
2 | 1536 | 8 KB | 78.34 ns

EVO
CPU + L1 Access: 13.95 ns
Cache Subsystems:
Level | Size | Line Size | Cache Miss Latency | Cache Replace Time
1 | 256 KB | 128 bytes | 258.24 ns | 241.54 ns
TLB Subsystem:
Level | Size | Page Size | TLB Miss Latency
1 | 80 | 4 KB | 14.60 ns
2 | 1536 | 8 KB | 77.16 ns

Numsim performance analysis, a raw floating point performance evaluation of the CPU, was executed on each of the three phones. The benchmark operates in a closed loop using seven different matrix/vector scenarios (Bm1-Bm7), where Bm stands for Benchmark. The results are reported in millions of floating point operations per second (MFLOPS). The vector equations used in the numsim test are shown below for each benchmark scenario, while the results of each are shown in Table 6.


Bm1: Vector Copy D[i] = A[i]

Bm2: Vector Add D[i] = A[i] + B[i]

Bm3: Vector Multiply D[i] = A[i] * B[i]

Bm4: Vector Divide D[i] = A[i] / B[i]

Bm5: Vector Add-Multiply D[i] = A[i] + B[i] * C[i]

Bm6: Vector Add-Divide D[i] = A[i] + B[i] / C[i]

Bm7: Matrix Vector Product of a 5-Diagonal Sparse Matrix
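The vector scenarios listed above can be written out as plain loops. The sketch below is an illustration in Python, not the numsim source: the function names and the timing helper are hypothetical, and Bm7 is omitted because the report does not specify the layout of the 5-diagonal matrix. The helper reports element updates per second rather than MFLOPS, since the way numsim counts floating point operations per element is not stated.

```python
import time
from typing import Callable, List

Vector = List[float]


def bm1_copy(a: Vector) -> Vector:
    return [x for x in a]                               # D[i] = A[i]


def bm2_add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]                # D[i] = A[i] + B[i]


def bm3_multiply(a: Vector, b: Vector) -> Vector:
    return [x * y for x, y in zip(a, b)]                # D[i] = A[i] * B[i]


def bm4_divide(a: Vector, b: Vector) -> Vector:
    return [x / y for x, y in zip(a, b)]                # D[i] = A[i] / B[i]


def bm5_add_multiply(a: Vector, b: Vector, c: Vector) -> Vector:
    return [x + y * z for x, y, z in zip(a, b, c)]      # D[i] = A[i] + B[i] * C[i]


def bm6_add_divide(a: Vector, b: Vector, c: Vector) -> Vector:
    return [x + y / z for x, y, z in zip(a, b, c)]      # D[i] = A[i] + B[i] / C[i]


def elements_per_second(kernel: Callable[..., Vector], *vectors: Vector,
                        repeats: int = 100) -> float:
    """Run the kernel repeatedly and return element updates per second."""
    start = time.perf_counter()
    for _ in range(repeats):
        kernel(*vectors)
    elapsed = time.perf_counter() - start
    return repeats * len(vectors[0]) / elapsed


a = [float(i + 1) for i in range(1000)]
b = [2.0] * 1000
c = [4.0] * 1000
print(f"Bm5: {elements_per_second(bm5_add_multiply, a, b, c):.0f} elements/s")
```

The division scenarios (Bm4 and Bm6) are the ones that score lowest in Table 6, which is the usual pattern since floating point division is typically more expensive than addition or multiplication.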

Table 6 – Numsim performance results for each of the benchmark scenarios testing different vector equations (MFLOPS)

Phone | Bm1 | Bm2 | Bm3 | Bm4 | Bm5 | Bm6 | Bm7
Droid | 11.90 | 4.66 | 4.43 | 0.968 | 5.11 | 1.69 | 5.80
Nexus One | 26.70 | 7.35 | 9.79 | 2.02 | 8.26 | 3.24 | 9.27
EVO | 24.80 | 7.56 | 8.97 | 2.02 | 8.29 | 3.08 | 9.52

From the results in Table 6 we can see that the EVO and Nexus One both perform very well in floating point arithmetic operations. The Droid lags behind in this area, which could be due to its less capable processor. This agrees with the other benchmarks, in which the Nexus One and EVO also perform well.
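To put a number on how far the Droid lags, the sketch below divides each phone's Table 6 result by the Droid's result for the same scenario. The values are copied from Table 6; the dictionary and function names are illustrative.

```python
# MFLOPS values copied from Table 6, in the order Bm1..Bm7.
numsim = {
    "Droid": [11.90, 4.66, 4.43, 0.968, 5.11, 1.69, 5.80],
    "Nexus One": [26.70, 7.35, 9.79, 2.02, 8.26, 3.24, 9.27],
    "EVO": [24.80, 7.56, 8.97, 2.02, 8.29, 3.08, 9.52],
}


def ratios_vs_droid(phone: str) -> list:
    """Per-scenario speedup of the given phone relative to the Droid."""
    return [p / d for p, d in zip(numsim[phone], numsim["Droid"])]


for phone in ("Nexus One", "EVO"):
    formatted = ", ".join(f"{r:.2f}" for r in ratios_vs_droid(phone))
    print(f"{phone} / Droid: {formatted}")
```

With the Table 6 values, every ratio falls between about 1.5 and 2.3, so both phones are consistently faster than the Droid across all seven scenarios.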

Problems and Conclusion

Mobile phones continue to be upgraded in both hardware and architecture. The combination of the two results in greater performance and greater potential for a mobile device to operate more like a desktop computer. This seems to be the way we are headed as more advanced phones continue to be released. For this project we first looked at benchmarks provided by Android developers, optimized to run on the Java Virtual Machine as external applications on the Android operating system. Two of these benchmarks, Linpack and Nbench, are very well known programs that were ported to the Android OS to evaluate both hardware and software capabilities. We mentioned that benchmarks are influenced not only by hardware and architecture specifications but also by software. As the Android OS keeps being upgraded, and different phones run different OS versions, we will see increases and sometimes decreases in performance levels. We can use these benchmarks for future analysis of mobile devices as hardware requirements increase significantly. We performed MFLOPS tests in multiple ways with multiple benchmarks, which helps in grasping the capabilities and performance levels of the CPU.

In class, we discussed memory and cache performance in terms of average access time and latency values. The DHTDroid toolset helped us grasp these ideas by running tests on a familiar embedded device used daily by most people, a smartphone. The toolset offered a great analysis experience, as we were able to stress not only the CPU but also the memory and cache, including the TLB. We were able to capture cache latency values of the memory hierarchy and compare them across a number of different Android smartphones. Originally, getting the toolset to execute on the phone was a problem, as administrator (root) capabilities weren’t available on some of the phones. It took a while to realize this and to find an appropriate place in the file system with executable directory permissions. Once these problems were solved, we were able to run the DHTDroid toolset smoothly and analyze the results for each phone.

When analyzing the specifications of each phone, specifically the ARMv6 and ARMv7 architectures, we can clearly see a huge performance upgrade from the ARMv6 G1 and Hero smartphones to the ARMv7 Droid, Nexus One, EVO, and Nexus S smartphones. In every performance evaluation, the ARMv7 phones outperformed the ARMv6 phones by a significant margin. This suggests that ARM has deployed many architectural design improvements, as we discussed in the relevant architecture section of the Introduction. Looking at the Quadrant results, we can easily see that embedded mobile architectures have come a long way, and the design process does not seem to be slowing down, with increasing processor speeds, dual core processors, and even quad core processors already in the works. ARM has already released a design of the Cortex A15, a dual core processor architecture which is said to consume less power and have higher performance levels than the widely used Cortex A8 processor. Using these benchmark utilities, we were able to compare phones with different hardware specifications and finalize an analysis of each.


Annotated References

[1] P. Zheng and L. M. Ni, “Spotlight: the rise of the smart phone,” IEEE Distributed Systems Online, vol.

7, no. 3, March 2006.

This paper discusses an overview and introduction of the smartphone and what it takes to classify a

mobile device as a smartphone. A brief discussion about each operating system capable of

optimizing a smartphone is covered as well as hardware requirements and limitations.

[2] A. Gahran, “One-third of US youth have smartphones,” CNN, December 17, 2010. Available: http://articles.cnn.com/2010-12-17/tech/youth.cellphones.gahran_1_prepaid-phone-plans-teen-texting-mobile-users?_s=PM:TECH

CNN reports on the rise in popularity of the smartphone and the correlation to the young adults in

the United States. Other countries are also compared with the age that teenagers start using a

smartphone and what their main uses are during their daily use of the embedded device. Mainly this

report covers useful statistics covering smartphones and their users.

[3] “Smartphones generate 65 per cent of all mobile traffic worldwide,” Mobile Communications

International Magazine, Informatm, Issue. 168, pp. 11, December 2010.

A brief magazine article discussing cell phone networks and the data traffic caused by smartphones. These statistics were useful for this report in conveying to readers the rise in popularity of the smartphone and the convenience of operating it as a daily device. As smartphones continue to increase in popularity, data traffic directly reflects this with a continuous increase in network traffic.

[4] “ARM Architecture Reference Manual: Performance Monitors v2 Supplement,” ARM, 2009.

This reference manual gives an introduction to the ARM architecture along with its accessible

functions and performance evaluations of newer and older architectures. We used this to discuss

smartphone architecture to help the user have an idea what is being used today.

[5] S. Swanson and M. B. Taylor, “GreenDroid: exploring the next evolution in smartphone application

processors,” IEEE Communications Magazine, pp. 112-119, April 2011.

GreenDroid is a project at University of California that focused on innovating and expanding on

microprocessor technology. Since they believe this is where the future is headed, they try to

improve on the architecture used today to be useful in a dual core and even a smaller environment

as hardware sizes reduce. They try to redesign the silicon infrastructure of processors to produce an economical, performance oriented processor using dark silicon, which they dub conservation cores.

[6] “Mobile Extreme Convergence: A streamlined architecture to deliver mass-market converged mobile

devices,” Freescale Semiconductor Inc., rev. 5, 2009.


Freescale is a semiconductor company that researched the architectures needed to deliver a product which would appeal to today’s market. They analyzed different embedded products, including a mobile phone, focusing on the growth of the industry, the demand for performance, and the limitations present in today’s architectures, in order to produce their own design. They have extended the ARM architecture with their own to design an improved architecture called Mobile Extreme Convergence (MXC), which is said to reduce power consumption, improve memory access times, and reach CPU speeds of around 500MHz.

[7] D. Heger, “DHTDroid v3.2 Benchmark – Quantifying Android OS & HW Performance,”

DHTechnologies, April 5, 2011.

DHTDroid is a benchmarking utility toolset created to see how the Android operating system

interacts with the ARM architecture. The goal of this project was to implement a set of Android

based system benchmarks that generate an operating system and hardware abstraction vector that

can be compared across multiple smartphones with different hardware. The DHTDroid tool-set

consists of 12 individual macro-benchmarks that stress test the CPU, the TLB, the cache, the

memory, the I/O, and the network capabilities.

[8] Linpack for Android, GreeneComputing. Available: http://www.greenecomputing.com/apps/linpack/.

Linpack benchmarks have been in use since the late 1970s and early 1980s. The benchmark was designed for performance tests on supercomputers, but since then it has become a scale used to grade computer system performance and a standard test for the TOP500 list, which details the world’s most powerful computer systems. The Linpack benchmark is a measure of a system’s floating point computing power, and it has recently been ported to the Android OS, where it can be used to evaluate the performance of each update to the operating system.

[9] S. Weintraub, “Android 2.2 tests reveal stunning speed gains,” CNN Money: Fortune, May 12, 2010. Available: http://tech.fortune.cnn.com/2010/05/12/android-2-2-demonstrating-incredible-speed-gains/.

This CNN report discusses how smartphone performance increased with the Android 2.2 update to the Android operating system. Linpack was used to evaluate the upgraded version of Android, showing a 450% increase in performance.

Page 22: CSCE 5610 Project Report - Gozick

Appendix A NBench Results

HTC G1

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec. : Old Index   : New Index
                    :                 : Pentium 90* : AMD K6/233*
--------------------:-----------------:-------------:------------
NUMERIC SORT        : 87.416 : 2.24 : 0.74
STRING SORT         : 3.8943 : 1.74 : 0.27
BITFIELD            : 1.1819e+08 : 4.04 : 0.84
FP EMULATION        : 2.3572e+07 : 4.39 : 1.01
FOURIER             : 9.1485 : 0.17 : 0.10
ASSIGNMENT          : 149.09 : 4.01 : 1.04
IDEA                : 1.0527 : 4.74 : 1.41
HUFFMAN             : 143.03 : 3.97 : 1.27
NEURAL NET          : 0.19878 : 0.32 : 0.13
LU DECOMPOSITION    : 5.831 : 0.30 : 0.22
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 3.397
FLOATING-POINT INDEX: 0.254
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : ARMv7 Processor rev 3 (v7l)
L2 Cache            : 0
OS                  : Linux version 2.6.27-00393-g6607056 ([email protected]) (gcc version 4.2.1) #1 PREEMPT Mon May 11 10:38:09 PDT 2009
C compiler          : arm-eabi-gcc (GCC) 4.4.0
libc                : Android Bionic libc
MEMORY INDEX        : 0.618
INTEGER INDEX       : 1.074
FLOATING-POINT INDEX: 0.141
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

HTC Hero

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec. : Old Index   : New Index
                    :                 : Pentium 90* : AMD K6/233*
--------------------:-----------------:-------------:------------
NUMERIC SORT        : 142.45 : 3.65 : 1.74
STRING SORT         : 4.8218 : 2.15 : 0.33
BITFIELD            : 3.8314e+07 : 6.57 : 1.37
FP EMULATION        : 14.459 : 6.94 : 1.60
FOURIER             : 2010.6 : 2.29 : 1.28
ASSIGNMENT          : 2.3811 : 9.06 : 2.35
IDEA                : 491.44 : 7.52 : 2.23
HUFFMAN             : 230.07 : 6.38 : 2.04
NEURAL NET          : 0.36901 : 0.59 : 0.25
LU DECOMPOSITION    : 9.9089 : 0.51 : 0.37
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 5.519
FLOATING-POINT INDEX: 0.886
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : ARMv7 Processor rev 3 (v7l)
L2 Cache            : 0
OS                  : Linux version 2.6.32.17-g30929af (htc-kernel@and18-2) (gcc version 4.4.0 (GCC) ) #1 PREEMPT Wed Dec 1 15:10:40 CST 2010
C compiler          : arm-eabi-gcc (GCC) 4.4.0
libc                : Android Bionic libc
MEMORY INDEX        : 1.025
INTEGER INDEX       : 1.719
FLOATING-POINT INDEX: 0.491
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

Motorola Droid

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec. : Old Index   : New Index
                    :                 : Pentium 90* : AMD K6/233*
--------------------:-----------------:-------------:------------
NUMERIC SORT        : 316.49 : 8.12 : 2.67
STRING SORT         : 28.848 : 12.89 : 2.00
BITFIELD            : 1.1819e+08 : 20.27 : 4.23
FP EMULATION        : 40.657 : 19.51 : 4.50
FOURIER             : 2133 : 2.43 : 1.36
ASSIGNMENT          : 7.1511 : 27.21 : 7.06
IDEA                : 1232 : 18.84 : 5.59
HUFFMAN             : 621.69 : 17.24 : 5.51
NEURAL NET          : 0.80774 : 1.30 : 0.55
LU DECOMPOSITION    : 20.266 : 1.05 : 0.76
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 16.723
FLOATING-POINT INDEX: 1.490
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : ARMv7 Processor rev 3 (v7l)
L2 Cache            : 0
OS                  : Linux version 2.6.32.9_rMoD_250-1100_ (corcor67@corcor67-desktop) (gcc version 4.4.3 (GCC) ) #8 PREEMPT Sun Mar 13 22:03:01 CDT 2011
C compiler          : arm-eabi-gcc (GCC) 4.4.0
libc                : Android Bionic libc
MEMORY INDEX        : 2.254
INTEGER INDEX       : 3.173
FLOATING-POINT INDEX: 0.826
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

HTC Nexus One

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec. : Old Index   : New Index
                    :                 : Pentium 90* : AMD K6/233*
--------------------:-----------------:-------------:------------
NUMERIC SORT        : 319.53 : 8.19 : 2.69
STRING SORT         : 12.833 : 5.73 : 0.89
BITFIELD            : 1.0573e+08 : 18.14 : 3.79
FP EMULATION        : 37.708 : 18.09 : 4.18
FOURIER             : 3075.1 : 3.50 : 1.96
ASSIGNMENT          : 5.6452 : 21.48 : 5.57
IDEA                : 1126.3 : 17.23 : 5.11
HUFFMAN             : 495.69 : 13.75 : 4.39
NEURAL NET          : 0.77346 : 1.24 : 0.52
LU DECOMPOSITION    : 19.323 : 1.00 : 0.72
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 13.421
FLOATING-POINT INDEX: 1.632
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : ARMv7 Processor rev 3 (v7l)
L2 Cache            : 0
OS                  : Linux version 2.6.32.9-27240-gbca5320 ([email protected]) (gcc version 4.4.0 (GCC) ) #1 PREEMPT Tue Aug 10 16:42:38 PDT 2010
C compiler          : arm-eabi-gcc (GCC) 4.4.0
libc                : Android Bionic libc
MEMORY INDEX        : 2.656
INTEGER INDEX       : 3.985
FLOATING-POINT INDEX: 0.905
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

HTC EVO

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec. : Old Index   : New Index
                    :                 : Pentium 90* : AMD K6/233*
--------------------:-----------------:-------------:------------
NUMERIC SORT        : 326.07 : 8.36 : 2.75
STRING SORT         : 20.765 : 9.28 : 1.44
BITFIELD            : 1.0899e+08 : 18.70 : 3.91
FP EMULATION        : 38.539 : 18.49 : 4.27
FOURIER             : 3141.2 : 3.57 : 2.01
ASSIGNMENT          : 5.7634 : 21.93 : 5.69
IDEA                : 1150.3 : 17.59 : 5.22
HUFFMAN             : 506.6 : 14.05 : 4.49
NEURAL NET          : 0.78505 : 1.26 : 0.53
LU DECOMPOSITION    : 19.577 : 1.01 : 0.73
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 14.658
FLOATING-POINT INDEX: 1.659
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : ARMv7 Processor rev 3 (v7l)
L2 Cache            : 0
OS                  : Linux version 2.6.37.4-cyanogenmod-01295-gdc22375 (shade@toxygene) (gcc version 4.4.3 (GCC) ) #1 PREEMPT Wed Apr 6 22:14:12 EDT 2011
C compiler          : arm-eabi-gcc (GCC) 4.4.0
libc                : Android Bionic libc
MEMORY INDEX        : 3.172
INTEGER INDEX       : 4.071
FLOATING-POINT INDEX: 0.920
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

Samsung Nexus S

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec. : Old Index   : New Index
                    :                 : Pentium 90* : AMD K6/233*
--------------------:-----------------:-------------:------------
NUMERIC SORT        : 146.29 : 3.75 : 1.23
STRING SORT         : 7.7558 : 3.47 : 0.54
BITFIELD            : 5.4413e+07 : 9.33 : 1.95
FP EMULATION        : 18.394 : 8.83 : 2.04
FOURIER             : 1011.8 : 1.15 : 0.65
ASSIGNMENT          : 3.7009 : 14.08 : 3.65
IDEA                : 580.4 : 8.88 : 2.64
HUFFMAN             : 282.19 : 7.83 : 2.50
NEURAL NET          : 0.37025 : 0.59 : 0.25
LU DECOMPOSITION    : 9.5606 : 0.50 : 0.36
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 7.245
FLOATING-POINT INDEX: 0.697
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : ARMv7 Processor rev 3 (v7l)
L2 Cache            : 0
OS                  : Linux version 2.6.35.7-g7f1638a ([email protected]) (gcc version 4.4.3 (GCC) ) #1 PREEMPT Thu Dec 16 21:12:36 PST 2010
C compiler          : arm-eabi-gcc (GCC) 4.4.0
libc                : Android Bionic libc
MEMORY INDEX        : 3.598
INTEGER INDEX       : 4.231
FLOATING-POINT INDEX: 0.954
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.