
Tutorial: Enabling Performance Collection on an Embedded Linux* System

Intel® VTune™ Amplifier for Systems Linux* OS

C++ Sample Application Code

Legal Information

Important

This document was last updated for the Intel VTune Amplifier 2017 product release. If you are using this tutorial with a newer version of VTune Amplifier, you may see differences in analysis type names and user interface design.

Contents

Legal Information
Overview

Chapter 1: Navigation Quick Start

Chapter 2: Enabling Collection on a Remote Embedded Device
    Configure Yocto* Project Kernel
    Add Intel® VTune™ Amplifier Sampling Driver and Build Operating System
    Load Drivers and Configure SSH Access
    Prepare Your Sample Application
    Run Advanced Hotspot Analysis
    View Your Results

Chapter 3: Summary

Chapter 4: Key Terms


Legal Information

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

The products and services described may contain defects or errors which may cause deviations from published specifications. Current characterized errata are available on request.

Intel, the Intel logo, Intel Atom, Intel Core, Intel Xeon Phi, VTune and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.

Java is a registered trademark of Oracle and/or its affiliates.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.

Copyright 2014-2018 Intel Corporation.

This software and the related documents are Intel copyrighted materials, and your use of them is governed by the express license under which they were provided to you (License). Unless the License provides otherwise, you may not use, modify, copy, publish, distribute, disclose or transmit this software or the related documents without Intel's prior written permission.

This software and the related documents are provided as is, with no express or implied warranties, other than those that are expressly stated in the License.


Overview

Discover how to set up your embedded device for performance analysis using Intel® VTune™ Amplifier for Systems. After setting up your operating system and installing sampling drivers, run a performance analysis to understand where your embedded application is spending time by identifying hotspots - the most time-consuming program units. Advanced Hotspots Analysis is useful to analyze the performance of both serial and parallel applications. The Intel VTune Amplifier for Systems supports analysis of remote Linux* applications running on regular or embedded Linux systems, but this tutorial will focus on embedded platforms with Yocto Project*.

About This Tutorial

This tutorial uses the sample tachyon and guides you through the basic steps required to set up remote collection of performance data on a Linux target and use the VTune Amplifier GUI to analyze the code for hotspots.

This tutorial focuses on setting up the Yocto Project 2.1 environment on a target device. Review the system requirements for building Yocto Project before beginning this tutorial. Requirements are available from the Yocto Project Quickstart, which can be found at www.yoctoproject.org. If you are using an environment other than Yocto Project on your target system, you will need to provide your own sample application, kernel version, and kernel source directory.

Estimated Duration

• 35 minutes: Preparing your host and target device for use
• 60 minutes: Building the Yocto Project kernel (no user action required during this time)
• 15 minutes: Preparing your sample application and analyzing it

NOTE:

Actual time will vary because a build of the Yocto Project kernel is required.

Learning Objectives

After you complete this tutorial, you will be able to:

• Configure the Yocto Project kernel for your embedded device
• Add the sampling drivers and build the Yocto Project operating system
• Load the drivers onto the embedded device and configure SSH access
• Prepare your sample application, tachyon
• Run an Advanced Hotspot Analysis
• View your results in Intel VTune Amplifier for Systems

More Resources

• The Intel Developer Zone is a site devoted to software development tools, resources, forums, blogs, and knowledge bases; see http://software.intel.com

• The Intel Software Documentation Library is part of the Intel Developer Zone and is an online collection of Release Notes, User and Reference Guides, White Papers, Help, and Tutorials for Intel software products; see http://software.intel.com/en-us/intel-software-technical-documentation

• For troubleshooting the creation and installation of the sep drivers, see http://software.intel.com/en-us/articles/troubleshooting-issues-with-sep-in-the-embedded-tool-suite-intel-system-studio


Navigation Quick Start

Intel® VTune™ Amplifier provides information on code performance for users developing serial and multithreaded applications on Windows*, Linux*, Android, and OS X* operating systems. VTune Amplifier helps you analyze algorithm choices and identify where and how your application can benefit from available hardware resources.

VTune Amplifier for Systems Access

VTune Amplifier installation includes shell scripts that you can run in your terminal window to set up environment variables:

1. From the installation directory, type source amplxe-vars.sh for Bourne, Korn, or Bash shells, or source amplxe-vars.csh if you are using the C shell.

This script sets the PATH environment variable that specifies locations of the product graphical user interface utility and command line utility.

The default installation directory is /opt/intel/system_studio_version/vtune_amplifier_version_for_systems.

2. Type amplxe-gui to launch the product graphical interface.


VTune Amplifier GUI

Configure and manage projects and results, and launch new analyses from the primary toolbar. Click the Configure Project button on this toolbar and use the Analysis Target tab to manage result file locations. Newly completed and opened analysis results along with result comparisons appear in the results tab for easy navigation.

Use the VTune Amplifier menu to control result collection, define and view project properties, and set various options.

The Project Navigator provides an iconic representation of your projects and analysis results. Click the Project Navigator button on the toolbar to enable/disable the Project Navigator.

Click the (change) link to select a viewpoint, a preset configuration of windows/panes for an analysis result. For each analysis type, you can switch among several viewpoints to focus on particular performance metrics. Click the yellow question mark icon to read the viewpoint description.

Switch between window tabs to explore the analysis type configuration options and collected data provided by the selected viewpoint.

Use the Grouping drop-down menu to choose a granularity level for grouping data in the grid.

Use the filter toolbar to filter out the result data according to the selected categories.



Enabling Collection on a Remote Embedded Device

Use this tutorial to set up performance analysis of a remote embedded device by building a supported operating system (OS) with Intel® VTune™ Amplifier for Systems sampling drivers. Run performance analysis on the embedded device to identify and analyze hotspot functions in your serial or parallel application by performing a series of steps in a workflow. This tutorial guides you through these workflow steps while using a sample ray-tracer application named tachyon that runs on your embedded device.

While there are several supported embedded OS versions, this tutorial focuses on the Yocto Project* 2.1 environment installed on a MinnowBoard Max* device. Additional information can be found at https://www.yoctoproject.org/about. The tachyon sample code has been optimized for the Yocto Project environment. If you choose to run this tutorial on an embedded system with a different Linux* OS distribution, you will need to provide your own sample application, kernel version, and kernel source directory.

Copying the kernel and drivers from your host to your target system is a one-time setup procedure, after which you can run multiple data collection sessions and view and compare the results.

NOTE:

This tutorial focuses on obtaining the baseline results for Advanced Hotspots Analysis and the tachyon sample application. For more information on the iterative process of testing, modifying, improving, and retesting your code for comparative analysis, see Tutorial: Finding Hotspots - C++ Sample Code.


Step 1: Configure Yocto Project* Kernel

• Configure a Yocto* Project kernel with the appropriate settings for your target device

Step 2: Add Intel VTune Amplifier Sampling Driver and Build Operating System

• Install the VTune Amplifier sampling driver:
    • Using the Intel System Studio Integration Layer (on host system)
    • Using the Yocto* SDK for Cross-Build Environment (on host system)
    • Using the Native Build Environment (on target system)
• Build Yocto Project kernel
• Copy operating system to target device

Step 3: Load Drivers and Configure SSH Access

• Load drivers on target device
• Confirm ssh password-less connection between host and target system

Step 4: Prepare Your Sample Application

• Cross compile tachyon using the Yocto SDK
• Copy tachyon to your Yocto Project target

Step 5: Run Advanced Hotspot Analysis

• Use the Intel VTune Amplifier for Systems GUI to set up your remote configuration
• Run Advanced Hotspot Analysis

Step 6: View Your Results

• View your results in Intel VTune Amplifier for Systems

Next step: Configure Yocto Project Kernel

Configure Yocto* Project Kernel

Use the following steps to set up the Yocto Project* kernel for your target embedded device after you have installed the Intel® VTune™ Amplifier for Systems on your host. The default installation directory for VTune Amplifier for Systems is /opt/intel/system_studio_<version>/vtune_amplifier_for_systems. In this tutorial, a MinnowBoard Max* was used as the target device. Additional details about this device are available from http://wiki.minnowboard.org/MinnowBoard_MAX.

• Get Software Tools
• Configure the Yocto* Project Kernel

Get Software Tools

You need the following tools to try the tutorial steps yourself using the tachyon sample application:

• Intel® System Studio, including Intel VTune Amplifier for Systems and sample applications
• tar file extraction utility
• C++ compiler such as Intel® C++ Compiler or GNU* C++ compiler
• Eclipse* IDE (optional)

Configure the Yocto Project Kernel

Configure the Yocto Project kernel you want to install on your target embedded device. In this tutorial, the Yocto Project version 2.1 (Krogoth) for 64-bit is used. The kernel version and source directory information provided in the examples are specific to this version.

1. Follow the steps for downloading and building a Yocto image for a MinnowBoard Max device that are available from http://wiki.minnowboard.org/Yocto_Project. Modify the commands available from the MinnowBoard Wiki for the version of Yocto Project you want to install. The following commands are specific to the Yocto Project 2.1 64-bit version.


   a. Navigate to your home directory or another writeable directory on your host system.
   b. Download the Yocto Project* files.

        mkdir source
        cd source
        git clone -b krogoth http://git.yoctoproject.org/git/poky
        cd poky
        git clone -b krogoth http://git.yoctoproject.org/git/meta-intel

   c. Initialize the build environment.

        source oe-init-build-env

   d. Configure the build environment for the MinnowBoard Max using a text editor or by running this command:

        echo "BBLAYERS += \"$HOME/source/poky/meta-intel\"" >> conf/bblayers.conf

   e. Select the board support package using a text editor or by running this command:

        echo "MACHINE = \"intel-corei7-64\"" >> conf/local.conf

2. After configuring your version of Yocto Project on the host system, use bitbake linux-yocto -c menuconfig to set up and confirm other required kernel configurations. The required kernel configurations include:

        CONFIG_MODULES=y
        CONFIG_MODULE_UNLOAD=y
        CONFIG_PROFILING=y
        CONFIG_HAVE_OPROFILE=y
        CONFIG_SMP=y
        CONFIG_KPROBES=y
        CONFIG_RING_BUFFER=y
        CONFIG_FRAME_POINTER=y

NOTE:

The latest Yocto Project version includes all required kernel configurations by default. Review the required kernel configurations for the VTune Amplifier sampling driver in the "Building and Managing the Sampling Drivers" topic in the VTune Amplifier online help.
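The list of required kernel configurations can also be checked mechanically once a kernel .config file exists. The following is a minimal sketch, not part of the VTune Amplifier tooling: the function name check_kernel_config and the variable REQUIRED_OPTIONS are illustrative assumptions, and the location of the .config file depends on your build directory layout.

```shell
#!/bin/sh
# Sketch: report which of the kernel options required by the sampling
# driver are missing from a kernel .config file.
# The names below are illustrative assumptions, not VTune Amplifier tools.

REQUIRED_OPTIONS="CONFIG_MODULES CONFIG_MODULE_UNLOAD CONFIG_PROFILING \
CONFIG_HAVE_OPROFILE CONFIG_SMP CONFIG_KPROBES CONFIG_RING_BUFFER \
CONFIG_FRAME_POINTER"

# check_kernel_config FILE
# Prints one "MISSING <option>" line for each required option that is not
# set to "y" in FILE. Returns 0 if every option is set, 1 otherwise.
check_kernel_config() {
    config_file=$1
    missing=0
    for option in $REQUIRED_OPTIONS; do
        # -x matches the whole line, so CONFIG_SMP=y cannot be satisfied
        # by a longer option name that merely starts with CONFIG_SMP.
        if ! grep -qx "${option}=y" "$config_file"; then
            echo "MISSING ${option}"
            missing=1
        fi
    done
    return $missing
}
```

Call the function with the path to your kernel .config file, for example check_kernel_config path/to/.config. No output and a zero exit status mean that all eight options are enabled.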

Next step: Add Intel VTune Amplifier Sampling Driver and Build Operating System

Add Intel® VTune™ Amplifier Sampling Driver and Build Operating System

To perform the Advanced Hotspots analysis, VTune Amplifier sampling drivers must be incorporated into the kernel modules that are installed on the target system. There are three options for Yocto users to compile the sampling drivers - two on the host system and one on the target system. Select the appropriate option based on your development environment.

• Using the Intel System Studio Integration Layer (on host system)
• Using the Yocto* SDK for Cross-Build Environment (on host system)
• Using the Native Build Environment (on target system)

NOTE:

An example of building and installing a Yocto Project operating system is available at the Intel® Developer Zone: https://software.intel.com/en-us/forums/topic/507002.


Using the Intel System Studio Integration Layer (on host system)

The Intel System Studio integration layer allows the Intel System Studio products to be fully integrated with a target operating system by building the drivers and corresponding target packages into the operating system image automatically. Use this option when you have full control over the kernel sources and signature files.

1. On the host system, copy the integration layer from the Intel System Studio installation folder to the Yocto Project development folder. For example:

        cp -r /opt/intel/system_studio_2017.0.0/wr-iss-2017 $HOME/source/poky/

2. Run the post-installation script. For example:

        cd $HOME/source/poky
        wr-iss-2017/yp-setup/postinst_yp_iss.sh /opt/intel/system_studio_2017.0.0

3. Add the path to the wr-iss-<version> layer to the bblayers.conf file. For example:

        vi $HOME/source/poky/build/conf/bblayers.conf
        BBLAYERS += "$HOME/source/poky/wr-iss-2017"

   Your file should look similar to the following:

        BBLAYERS ?= " \
          /opt/source/poky/meta \
          /opt/source/poky/meta-poky \
          /opt/source/poky/meta-yocto-bsp \
          /opt/source/poky/meta-intel \
          /opt/source/poky/wr-iss-2017 \
          "

4. Add the Intel VTune Amplifier recipe to conf/local.conf. For example:

        vi "build/conf/local.conf"
        IMAGE_INSTALL_append = " intel-iss-vtune-target"

5. Run the following command to build the Yocto image:

        bitbake core-image-sato

NOTE:

Building the Yocto image can take an hour or more, depending on whether it has been built previously.

6. Run the following command to build the SDK toolchain:

        bitbake core-image-sato -c populate_sdk

7. Flash the final image to your target device using the instructions available from http://wiki.minnowboard.org/Yocto_Project.

Using the Yocto* SDK for Cross-Build Environment (on host system)

Use this option to build the drivers and install the VTune Amplifier collectors manually with a command line. This option is commonly used when you are developing applications using the Yocto SDK and the Yocto Project image is already available on the target system. This option requires root access to the configured kernel source and can be used if the compiler toolchain is not available on the target system.

Before You Begin


To build the VTune Amplifier kernel modules, you need the kernel source that is used to build your target Linux kernel. If you can't find the kernel source in your Yocto* SDK installation folder, the kernel-devsrc package was not included when the SDK was built. To enable the kernel-devsrc package, add the following to the conf/local.conf file before building the Yocto* SDK.

        TOOLCHAIN_TARGET_TASK_append = " kernel-devsrc"

1. On the host system, run the following command to build the SDK toolchain:

        bitbake core-image-sato -c populate_sdk

2. Install the Yocto SDK image on your host system.

3. Run the following commands on the host system to compile the VTune Amplifier sampling driver for a 64-bit system. Change the paths to match your directory structure and install locations. In the example commands below, the Yocto SDK image was installed in /opt/poky/2.1 and Intel VTune Amplifier was installed in the default location /opt/intel/vtune_amplifier_<version>_for_systems.

        source /opt/poky/2.1/environment-setup-*-linux
        cd /opt/poky/2.1/sysroots/corei7-64-poky-linux/usr/src/kernel; make scripts
        cd /opt/intel/vtune_amplifier_for_systems/target/linux/
        cp vtune_amplifier_target_x86_64.tgz /tmp
        tar zxvf /tmp/vtune_amplifier_target_x86_64.tgz -C /tmp
        cd /tmp/vtune_amplifier_2017_for_systems.0.0.0/sepdk/src
        ./build-driver -ni --c-compiler="$CC" --kernel-src-dir=/opt/poky/2.1/sysroots/corei7-64-poky-linux/usr/src/kernel --kernel-version=4.4.3-yocto-standard --make-args="PLATFORM=x64 ARITY=smp"

4. Copy /tmp/vtune_amplifier_<version>_for_systems.<version> to the /opt/intel/vtune_amplifier_<version>_for_systems.<version> directory on the target system.

Using the Native Build Environment (on target system)

Use this option to build the drivers and install the VTune Amplifier collectors manually with a command line. This option requires root access to the configured kernel source and can be used if the compiler toolchain is available on the target system.

Before You Begin

By default, the compiler toolchain is not included in the target Linux environment. Usually the following packages should be added in your conf/local.conf file when you build your target Linux image in the Yocto Project build environment.

        IMAGE_INSTALL_append = " autoconf automake binutils binutils-symlinks cpp cpp-symlinks gcc gcc-symlinks g++ g++-symlinks gettext make libstdc++ libstdc++-dev file coreutils"

The kernel source should also be included in your Linux target system to build the kernel modules.

        IMAGE_INSTALL_append = " kernel-dev"

Once you have all these packages installed on your target Linux system, you should be able to build the VTune Amplifier drivers directly on your target.

1. Flash the final image to your target device using the instructions available from http://wiki.minnowboard.org/Yocto_Project.

2. Copy the /opt/intel/vtune_amplifier_<version>_for_systems/target/linux/vtune_amplifier_target_x86_64.tgz (64-bit) or /opt/intel/vtune_amplifier_<version>_for_systems/target/linux/vtune_amplifier_target_x86.tgz (32-bit) file from the host system to the /opt/intel directory on the target system.

3. Run the following commands on the target system to compile the sampling driver for a 64-bit system:


        $ cd /usr/src/kernel; make scripts    (this is a one-time step on the target system)
        $ tar zxvf /opt/intel/vtune_amplifier_target_x86_64.tgz -C /opt/intel
        $ cd /opt/intel/vtune_amplifier_<version>_for_systems.<version>/sepdk/src
        $ ./build-driver -ni --kernel-src-dir=/usr/src/kernel/ --kernel-version=4.4.3-yocto-standard --make-args="PLATFORM=x64 ARITY=smp"
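The --kernel-version value in these commands is specific to the Yocto Project 2.1 kernel. When you build natively on the target, the release string of the running kernel can be read with uname -r instead of being typed by hand. The following is a minimal sketch under the assumption that the running kernel matches the source in /usr/src/kernel; the function name print_build_driver_cmd is illustrative, and the function only prints the command so that you can review it before running it from the sepdk/src directory.

```shell
#!/bin/sh
# Sketch: assemble the build-driver command line, taking the kernel
# version from the running kernel instead of hard-coding it.
# Assumption: the running kernel matches the source in /usr/src/kernel.
# The function name is illustrative only.

# print_build_driver_cmd [KERNEL_SRC_DIR]
# Prints the command without running it.
print_build_driver_cmd() {
    kernel_src_dir=${1:-/usr/src/kernel}
    kernel_version=$(uname -r)
    printf './build-driver -ni --kernel-src-dir=%s --kernel-version=%s --make-args="PLATFORM=x64 ARITY=smp"\n' \
        "$kernel_src_dir" "$kernel_version"
}

print_build_driver_cmd
```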

Next step: Load Drivers and Configure SSH Access

Load Drivers and Configure SSH Access

After building the Yocto Project* kernel, copy it to your target device and use the following sections to load the Intel® VTune™ Amplifier sampling drivers and set up SSH access from the host system to the target system.

• Load the Intel VTune Amplifier Sampling Drivers
• Configure SSH Access

Load the Intel VTune Amplifier Sampling Drivers

On the target device, run the following command to load the sampling drivers:

        /opt/intel/vtune_amplifier_<version>_for_systems.<version>/sepdk/src/insmod-sep

For example:

        /opt/intel/vtune_amplifier_2017_for_systems.0.0/sepdk/src/insmod-sep

If you are using a 64-bit target system, create a link between /lib64/ld-linux-x86-64.so.2 and /lib/ld-linux-x86-64.so.2 so the default search path for the application functions properly. This step only needs to be completed once and can be ignored if it was previously completed.

ln -s /lib/ld-linux-x86-64.so.2 /lib64/ld-linux-x86-64.so.2
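Because the link only needs to be created once, the step can be written so that it is safe to rerun. The following is a minimal sketch; the function name ensure_link is an illustrative assumption, and creating the parent directory is an extra precaution for images where it might be missing.

```shell
#!/bin/sh
# Sketch: create the loader link only if it does not exist yet, so the
# step can be repeated safely. The function name is illustrative only.

# ensure_link TARGET LINK
# Creates LINK pointing at TARGET unless LINK already exists.
ensure_link() {
    link_target=$1
    link_path=$2
    if [ -e "$link_path" ] || [ -L "$link_path" ]; then
        echo "exists: $link_path"
    else
        # Assumption: the parent directory may be missing on a minimal image.
        mkdir -p "$(dirname "$link_path")"
        ln -s "$link_target" "$link_path"
        echo "created: $link_path"
    fi
}

# On the target (as root), the call for this tutorial would be:
# ensure_link /lib/ld-linux-x86-64.so.2 /lib64/ld-linux-x86-64.so.2
```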

TIP:

At this point you can collect performance data on the target system directly using the sampling enabling collector (SEP). Refer to the Sampling Enabling Collector User's Guide for more information. The guide is available in your installed VTune Amplifier documentation at <install-dir>/documentation/en/SEP_Users_Guide.pdf.

Configure SSH Access

NOTE:

If you already have password-less access configured for your target system, these steps are not required.

Configure ssh to work in password-less mode so it does not prompt for a password on each invocation. To do this, use the key generation utility on the host system.

1. Generate the key with an empty passphrase:

        host> ssh-keygen

2. Copy the key to the target system:

        host> ssh-copy-id user@target


   If the ssh-copy-id command does not work, use the following alternative command to copy the key (substitute the name of the public key file that ssh-keygen created if it differs):

        host> cat ~/.ssh/id_dsa.pub | ssh user@target "cat >> ~/.ssh/authorized_keys"

   You will need the target user password to complete this operation. If this command completes successfully, you will not require it afterwards.

   Make sure that the $HOME/.ssh/ directory exists on the target and that only its owner (root) has read/write/execute permissions to it. In these examples, target can be a hostname or IP address.

3. After you set the password-less mode, run a command to verify that a password is no longer required. For example:

        host> ssh user@target ls

For more information, see the "Configuring SSH Access for Remote Collection" topic in the Intel VTune Amplifier help.
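The permission requirement and the verification step can be sketched as two small helpers. This is an illustration under stated assumptions rather than part of the product: the function names and the SSH_CMD variable are hypothetical, and the check relies on the OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting for a password.

```shell
#!/bin/sh
# Sketch: restrict the .ssh directory to its owner and check that ssh no
# longer prompts for a password.
# SSH_CMD and the function names are illustrative assumptions.

SSH_CMD=${SSH_CMD:-ssh}

# secure_ssh_dir DIR
# Creates DIR if needed and restricts it to its owner (mode 700).
secure_ssh_dir() {
    mkdir -p "$1" && chmod 700 "$1"
}

# check_passwordless USER@TARGET
# With BatchMode=yes ssh fails instead of asking for a password, so a
# zero exit status means that key-based login works.
check_passwordless() {
    if $SSH_CMD -o BatchMode=yes "$1" true; then
        echo "password-less login OK for $1"
    else
        echo "password still required for $1"
        return 1
    fi
}
```

Run secure_ssh_dir "$HOME/.ssh" on the target and check_passwordless user@target on the host, where target is the hostname or IP address of your device.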

Next step: Prepare Your Sample Application

Prepare Your Sample Application

The Intel® VTune™ Amplifier for Systems release includes sample code called tachyon for you to use on the target system. The tachyon sample code included with your distribution is modified for the Yocto* environment. After extracting the sample code, build the application and copy it to your target.

1. Extract Sample Code
2. Build Application

   • Build Application Using the Intel C++ Compiler
   • Build Application Using the GNU* C++ Compiler
   • Build Application Using Eclipse*

3. Copy Application to Target Device

Extract Sample Code

1. On the host Linux* system, change directories so you can untar the sample code:

        cd /~yocto

2. Extract the tachyon sample application:

        tar xvzf /opt/intel/vtune_amplifier_for_systems/samples/en/C++/tachyon_vtune_amp_xe.tgz

Build Application

Use the Intel C++ Compiler, GNU C++ Compiler, or Eclipse to build the sample application.

Build Application Using the Intel C++ Compiler

1. Source important environment variables:

        source /opt/poky/2.1/environment-setup-corei7-64-poky-linux-icc

NOTE:

This option is only available if you have integrated Intel System Studio with Yocto Project. For more information, see https://software.intel.com/en-us/articles/using-intel-system-studio-with-the-yocto-project.

2. Compile the tachyon code:

        make CXX="$CXX" build_find_hotspots

Build Application Using GNU C++ Compiler


1. Source important environment variables:

        source /opt/poky/2.1/environment-setup-corei7-64-poky-linux

2. Compile the tachyon code:

        make CXX="$CXX" build_find_hotspots

Build Application Using Eclipse

1. Source important environment variables:

        source /opt/poky/2.1/environment-setup-corei7-64-poky-linux

2. Open Eclipse from the command window. This allows Eclipse to inherit the environment variables set in the previous step.
3. In Eclipse, select File > Import to import the existing Tachyon project.
4. In the Import window, select C/C++ > Existing Code as Makefile Project and click Next.
5. Set a project name, click Browse, and navigate to the location where you extracted the Tachyon sample code. Click Finish to import the project.
6. Select the new project you created and select Project > Properties or right-click the project name and select Properties.
7. In the Properties window, select C/C++ Build to show the build settings.

   a. On the Builder Settings tab, uncheck the Use default build command checkbox and enter the following in the Build command field: make CXX="${CXX}"
   b. On the Behavior tab, enter the following in the Build (incremental build) field: build_find_hotspots


8. Click Apply to apply the changes and OK to close the Properties window.
9. Select Project > Build Project to build the Tachyon sample application.

Copy Application to Target Device

Copy the tachyon binary, the dat folder, and the libtbb.so libraries to an appropriate location on your target system.

For example, where target is the hostname or IP address of your device:

        scp -r tachyon_find_hotspots dat root@target:/home/root/
        scp tbbforexamples/linux/4.0.1/lib/intel64/libtbb.so* root@target:/usr/lib

Next step: Run Advanced Hotspot Analysis

Run Advanced Hotspot Analysis

The following steps show you how to launch the Intel® VTune™ Amplifier for Systems GUI and create a new project.

1. Run amplxe-gui. Refer to the steps in Navigation Quick Start to set the appropriate environment variables if you have not already done so.

2. Click New Project and enter an identifying project name such as tachyon1 so that you can distinguish this project from other projects. Keep or change the default project file Location: and click Create Project.


3. Set up the analysis target.

   a. Select remote Linux (SSH) for the target system.
   b. Specify the user name and the host name or IP address of the remote system you are profiling via SSH.
   c. Enter the full path for the target binary in the Application field. In this example, the path is /home/root/tachyon_find_hotspots.
   d. Enter the path to the data file in the Application parameters field. In this example, the path is /home/root/dat/balls.dat.
   e. Select Automatically stop collection after (sec) and enter 15.

When collecting data remotely, the VTune Amplifier looks for the collectors on the target device in its default location: /opt/intel/vtune_amplifier_201x_for_systems.<package_num>. It also temporarily stores performance results on the target system in the /tmp directory. If you followed the steps detailed in Prepare Your Target Device and Install Intel VTune Amplifier Drivers, then the collectors were installed in the default location. If you installed the target package to a different location or need to specify another temporary directory, make sure to configure your settings from the Analysis Target tab for your project.

• Use the VTune Amplifier installation directory on the remote system option to specify the path to the VTune Amplifier on the remote system. If the default location is used, the path is provided automatically.

• Use the Temporary directory on the remote system option to specify a non-default temporary directory.

• Alternatively, use the -target-install-dir and -target-temp-dir options from the command line.


4. Click Choose Analysis to switch to the Analysis Type tab.
5. Select the Advanced Hotspots analysis type. You will notice communication with the remote system before the Analysis Type screen appears.

6. Click the Start button to launch the Advanced Hotspots Analysis session.

The VTune Amplifier sets up the password-less SSH connection to your target device and launches the target application. It collects Advanced Hotspots data with default settings, and then copies those results back to the host.
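The same collection can, in principle, be started from the amplxe-cl command-line interface instead of the GUI. The following is a rough sketch only: the option spellings -collect advanced-hotspots and -target-system=ssh:user@host are assumptions that should be checked against amplxe-cl -help for your VTune Amplifier version, root@target stands in for your device, and the function name print_collect_cmd is illustrative. The sketch prints the command instead of running it and leaves out the 15-second automatic stop configured in the GUI.

```shell
#!/bin/sh
# Sketch: print an amplxe-cl command that mirrors the GUI settings.
# Assumptions: the options -collect advanced-hotspots and
# -target-system=ssh:user@host match your VTune Amplifier version
# (check amplxe-cl -help); root@target stands in for your device.

# print_collect_cmd USER@TARGET APP [APP_ARGS...]
print_collect_cmd() {
    remote=$1
    shift
    # Everything after "--" is the application and its parameters,
    # given as paths on the target system.
    echo "amplxe-cl -collect advanced-hotspots -target-system=ssh:${remote} -- $*"
}

print_collect_cmd root@target /home/root/tachyon_find_hotspots /home/root/dat/balls.dat
```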

Next step: View Your Results

View Your Results

After the target device sends the performance results to the host system, the results appear in Intel VTune Amplifier for Systems with the Summary window active.


Review the information on the Summary window to see a high-level overview of any performance optimization opportunities. For example, the Top Hotspots section lists the functions that took the most time to complete. Click a function name to switch to the Bottom-up window and view more detailed information about that function.


The Bottom-up window shows function data in the grid at the top of the window and a timeline of thread activity at the bottom of the window. Use this window to identify the most time-consuming functions and analyze their call flow at the bottom level - from a function to its parent functions. You can zoom in and filter on a section of the timeline for a more detailed view.

Double-click a function name to switch to the source code and assembly code view. VTune Amplifier will automatically navigate to the source code line related to the function you selected.


Next step: Prepare your own embedded applications for analysis using the VTune Amplifier to view hotspots.


Summary 3 You have completed the Enabling Performance Collection on an Embedded Linux* Device tutorial. Here

are some important things to remember when setting up your remote embedded device analysis using theIntel® VTune™ Amplifier for Systems:

1. Configure the Yocto Project* kernel

Tutorial Recap: You configured a stable Yocto Project kernel with the required configurations.

Key Tutorial Take-aways:
• Download and extract an appropriate toolchain from the Yocto Project web site and create an installation area.
• Set up a Yocto Project kernel for your target device.

2. Add VTune Amplifier sampling drivers and build operating system

Tutorial Recap: You compiled the VTune Amplifier sampling drivers, built the Yocto Project operating system, and copied the kernel to the target embedded device.

Key Tutorial Take-aways:
• Build a Yocto Project kernel for your target, including the appropriate VTune Amplifier drivers. Adding the drivers can be done on the host or target system.
• Copy the operating system and drivers from the host system to the target embedded device.

3. Load sampling drivers and configure SSH access

Tutorial Recap: You loaded the VTune Amplifier drivers on the target embedded device and set up password-less SSH access between the host and target systems.

Key Tutorial Take-aways:
• Configure ssh so there is no password request for file transfers between your server and target.

4. Prepare your sample application

Tutorial Recap: You extracted and compiled the tachyon code.

Key Tutorial Take-aways:
• Unarchive tachyon in the /~yocto directory.
• Compile the application on the host and copy it to the target or compile the application on the target.

5. Run Advanced Hotspot Analysis

Tutorial Recap: You ran the VTune Amplifier GUI to configure and launch Advanced Hotspot Analysis on the tachyon code on your target device. It ran on your target and the results were sent via ssh back to your server.

Key Tutorial Take-aways:
• Launch the GUI using the amplxe-gui command.
• Use the Analysis Target tab to choose and configure your analysis target.
• Use the Analysis Type tab to choose, configure, and run the Advanced Hotspot Analysis.

6. View your results

Tutorial Recap: You viewed the Advanced Hotspots analysis on the tachyon application in the VTune Amplifier for Systems GUI.

Key Tutorial Take-aways:
• You can also use the VTune Amplifier command-line interface by running the amplxe-cl command to test your code for hotspots and regressions. For details see the Command-line Interface Support section in the VTune Amplifier online help.
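The command-line take-away in step 6 could be scripted along the following lines. This Python sketch only assembles an amplxe-cl command line and prints it; it does not run the tool. The option spellings (-collect advanced-hotspots, -target-system=ssh:..., -result-dir) are assumptions recalled from the VTune Amplifier 2017 command-line interface and should be checked against the Command-line Interface Support section of the online help. The user name, address, application path, and result directory are hypothetical.

```python
import shlex


def build_collect_command(user, host, app_path, result_dir):
    """Assemble (but do not run) a remote Advanced Hotspots collection command."""
    return [
        "amplxe-cl",
        "-collect", "advanced-hotspots",      # assumed analysis type name
        f"-target-system=ssh:{user}@{host}",  # assumed remote-target option
        "-result-dir", result_dir,            # assumed result directory option
        "--", app_path,                       # everything after "--" is the target application
    ]


# Hypothetical target details, for illustration only.
command = build_collect_command("root", "192.168.1.10", "/home/root/tachyon/tachyon", "r000ah")
print(" ".join(shlex.quote(part) for part in command))
```

Building the argument list separately from running it makes it easy to log or review the exact command before a nightly regression job passes it to something like subprocess.run.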

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

Key Terms

Baseline: A performance metric used as a basis for comparison of the application versions before and after optimization. Baseline should be measurable and reproducible.
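As a small illustration of comparing against a baseline, the sketch below computes the relative change in run time between a baseline measurement and a later one. The function name and the timings are hypothetical; any reproducible metric could take the place of seconds.

```python
def percent_change(baseline_seconds, measured_seconds):
    """Return the change relative to the baseline, in percent (negative means faster)."""
    if baseline_seconds <= 0:
        raise ValueError("baseline must be a positive, measured duration")
    return (measured_seconds - baseline_seconds) / baseline_seconds * 100.0


# Hypothetical run times before and after an optimization.
baseline = 12.0
optimized = 9.0
print(f"Change versus baseline: {percent_change(baseline, optimized):+.1f}%")
```

A positive result on a later build would flag a regression relative to the same baseline.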

CPU Time: The amount of time a thread spends executing on a logical processor. For multiple threads, the CPU time of the threads is summed. The application CPU time is the sum of the CPU time of all the threads that run the application.
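The summation in this definition can be shown with a few lines of Python. The thread names and times are hypothetical.

```python
# Hypothetical per-thread CPU times, in seconds.
thread_cpu_seconds = {"main": 1.5, "worker-1": 4.0, "worker-2": 3.5}


def application_cpu_time(per_thread):
    """Application CPU time is the sum of the CPU time of all its threads."""
    return sum(per_thread.values())


total_cpu = application_cpu_time(thread_cpu_seconds)
print(f"Application CPU time: {total_cpu:.1f} s")
```

Because threads can run at the same time on different logical processors, this sum can be larger than the time the application took to run.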

CPU Usage: A performance metric for which the VTune Amplifier identifies a processor utilization scale, calculates the target CPU usage, and defines default utilization ranges depending on the number of processor cores.

Utilization Type    Description
Idle                All CPUs are waiting - no threads are running.
Poor                Poor usage. By default, poor usage is when the number of simultaneously running CPUs is less than or equal to 50% of the target CPU usage.
OK                  Acceptable (OK) usage. By default, OK usage is when the number of simultaneously running CPUs is between 51-85% of the target CPU usage.
Ideal               Ideal usage. By default, Ideal usage is when the number of simultaneously running CPUs is between 86-100% of the target CPU usage.
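The default ranges above can be expressed as a small classifier. This is a sketch under stated assumptions: the function name is hypothetical, percentages that fall between the listed integer bounds (for example 50.5%) are assigned to the next range up, and values above 100% of the target are rejected because the table does not cover them.

```python
def classify_utilization(running_cpus, target_cpus):
    """Map the number of simultaneously running CPUs to a default utilization type."""
    if target_cpus <= 0:
        raise ValueError("target CPU usage must be positive")
    if running_cpus < 0:
        raise ValueError("number of running CPUs cannot be negative")
    if running_cpus == 0:
        return "Idle"
    percent = running_cpus / target_cpus * 100.0
    if percent <= 50:
        return "Poor"
    if percent <= 85:   # assumption: anything above 50% and up to 85% counts as OK
        return "OK"
    if percent <= 100:
        return "Ideal"
    raise ValueError("usage above the target is not covered by the default ranges")


# Example with a hypothetical target of 4 CPUs.
for running in range(5):
    print(running, classify_utilization(running, 4))
```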

Elapsed Time: The total time your target ran, calculated as follows: Wall clock time at end of application – Wall clock time at start of application.
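The difference between elapsed time and CPU time can be observed with the Python standard library. In this sketch the workload is hypothetical; time.perf_counter() stands in for the wall clock, and time.process_time() reports the CPU time of the current process, which does not advance while the process sleeps.

```python
import time


def busy_work(n):
    """A hypothetical CPU-bound workload."""
    total = 0
    for i in range(n):
        total += i * i
    return total


start_wall = time.perf_counter()
start_cpu = time.process_time()

busy_work(200_000)   # consumes CPU time
time.sleep(0.05)     # adds elapsed time but almost no CPU time

elapsed = time.perf_counter() - start_wall   # end minus start
cpu = time.process_time() - start_cpu

print(f"Elapsed time: {elapsed:.3f} s, CPU time: {cpu:.3f} s")
```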

Finalization: A process during which the Intel® VTune™ Amplifier converts the collected data to a database, resolves symbol information, and pre-computes data to make further analysis more efficient and responsive.

Hotspot: A section of code that took a long time to execute. Some hotspots may indicate bottlenecks and can be removed, while other hotspots inevitably take a long time to execute due to their nature.

Advanced Hotspots Analysis: A non-default analysis type used to understand the application flow of control and to identify hotspots, that works directly with the CPU without the influence of the booted operating system. VTune Amplifier creates a list of functions in your application ordered by the amount of time spent in a function. It also detects the call stacks for each of these functions so you can see how the hot functions are called. VTune Amplifier uses a low overhead (about 5%) user-mode sampling and tracing collection that gets you the information you need without slowing down the application execution significantly.
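The idea of a function list ordered by time spent can be illustrated with a toy sampling model. In the sketch below, each entry in the sample list is the function that was executing when a sample was taken; the function names, the sample data, and the 1 ms sampling interval are all hypothetical, and this is not the collector's actual algorithm.

```python
from collections import Counter

# Hypothetical samples: the function executing each time a sample was taken.
samples = ["intersect", "shade", "intersect", "intersect", "parse_scene", "shade", "intersect"]
SAMPLING_INTERVAL_MS = 1.0  # assumed interval between samples


def top_hotspots(samples, interval_ms, limit=3):
    """Estimate time per function from sample counts, most expensive first."""
    counts = Counter(samples)
    return [(function, count * interval_ms) for function, count in counts.most_common(limit)]


hotspots = top_hotspots(samples, SAMPLING_INTERVAL_MS)
for function, milliseconds in hotspots:
    print(f"{function}: ~{milliseconds:.1f} ms")
```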

Target: An executable file you analyze using the Intel® VTune™ Amplifier.

Host System: The system on which you install Intel VTune Amplifier and from which you launch your application analysis and view those results.

Target System: The supported embedded device on which you install sampling drivers and run the application that you are analyzing.

Viewpoint: A preset result tab configuration that filters out the data collected during a performance analysis and enables you to focus on specific performance problems. When you select a viewpoint, you select a set of performance metrics the VTune Amplifier shows in the windows/panes of the result tab. To select the required viewpoint, click the (change) link and use the drop-down menu at the top of the result tab.
