With RTAI, MPICH2, MPE, Jumpshot, Sar and hopefully soon OProfile or VTune Dawn Nelson CSC523


Post on 26-Dec-2015



We already know that parallel programming makes good use of multiple processors, but we don’t always know if our programs are making the most efficient use of MPI.

Knowing how to use something doesn’t necessarily mean you know how it works.

There are many different ways to get data from point A to point B.

If you don’t know which way is better, try it, time it, profile it.

When adding in calls to the RTAI API, you can find out if you are adding overhead or benefit.

You can determine other bottlenecks or inefficiencies.

Within your code:

For overall timing, add gettimeofday() calls at the beginning and end of your program.

Use MPI_Wtime() to time MPI calls.

Add MPE tags for user-programmed events.

MPE already keeps track of send, receive, wait, barrier and other MPI calls.
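The overall-timing step above can be sketched as follows. This is a minimal illustration, not the code used in the project: do_work() is a hypothetical stand-in for the real program body, and elapsed_seconds() and time_work() are helper names invented here. The same start/stop pattern applies to MPI_Wtime(), which already returns seconds as a double.

```c
#include <stdio.h>
#include <sys/time.h>

/* Convert two gettimeofday() samples into elapsed seconds. */
double elapsed_seconds(const struct timeval *start, const struct timeval *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_usec - start->tv_usec) / 1e6;
}

/* Hypothetical stand-in for the real work of the program. */
double do_work(long n)
{
    double sum = 0.0;
    for (long i = 1; i <= n; i++)
        sum += 1.0 / (double)i;
    return sum;
}

/* Time one call of do_work() and return the elapsed wall-clock seconds.
 * The result of the work is stored through `result`. */
double time_work(long n, double *result)
{
    struct timeval start, end;

    gettimeofday(&start, NULL);   /* sample at the beginning */
    *result = do_work(n);
    gettimeofday(&end, NULL);     /* sample at the end */

    return elapsed_seconds(&start, &end);
}
```

In a real program the two gettimeofday() calls would sit at the top and bottom of main(), and the elapsed value would be printed or logged once per run.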

Without changing code:

Use the sysstat utility sar (yum install sysstat).

Use an open source profiler such as OProfile.

Use a commercial profiler such as VTune.

MPE tags in your code look like this:

#include <mpe.h>
int event, event1a, event1b;
MPE_Log_get_state_eventIDs(&event1a, &event1b);
MPE_Describe_state(event1a, event1b, "slave compute", "orange");
MPE_Start_log();
// start of code section to profile
……
// your code block here
MPE_Log_event(event1b, 0, "slave_compute_end");
// logging environment ends with MPI_Finalize();

A clog2 file is generated. Use Jumpshot to display the file graphically:

> jumpshot filename.clog2

Matrix Multiply, using 4 nodes, master-slave scenario

Matrix Multiply from the midterm with MPI_Barrier() call

Real Time computing is done at the highest priority (SCHED_FIFO 99)

Real Time computing is non-preemptible.

Real Time computing blocks interrupts and interference from the OS and from users.

Reduces jitter and latency: processes finish at nearly the same time.

For high compute-to-I/O ratios, Real Time improves overall processing speed.
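As a rough illustration of the priority setting mentioned above, the sketch below requests SCHED_FIFO through the standard Linux scheduler interface. This is an assumption made for illustration: it is plain POSIX/Linux code, not the RTAI LXRT API, whose calls are not shown in these slides. The function names fifo_max_priority() and request_fifo() are invented here.

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Highest priority the kernel offers for SCHED_FIFO (99 on stock Linux). */
int fifo_max_priority(void)
{
    return sched_get_priority_max(SCHED_FIFO);
}

/* Try to move the calling process to SCHED_FIFO at the given priority.
 * Returns 0 on success, or the errno value on failure. EPERM is typical
 * when the process lacks the privilege for real-time scheduling. */
int request_fifo(int priority)
{
    struct sched_param param;

    memset(&param, 0, sizeof param);
    param.sched_priority = priority;

    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0)
        return errno;
    return 0;
}
```

A program would typically call request_fifo(fifo_max_priority()) once at startup, before the compute phase, and report the returned error code if the request is refused.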

RTAI-LXRT Architecture

RTAI-MPI Architecture

Soft LXRT Real Time implementation.

Matrix Multiply without RTAI

Jitter is the variation in completion times of MPI workers, which affects the timing of the overall process.

Jitter is caused by interrupts and by interference from the operating system, other processes, and other users.

By running a process 100 times and collecting the overall time, jitter can be measured.
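One way to turn the collected run times into a jitter figure is sketched below. The slides do not say which statistic was used, so reporting the spread (slowest run minus fastest run) alongside the mean is an assumption made here, and struct run_summary and summarize_runs() are hypothetical names.

```c
#include <stddef.h>

/* Summary of a series of overall run times, in seconds. */
struct run_summary {
    double min;
    double max;
    double mean;
    double spread;   /* max - min, used here as the jitter figure */
};

/* Summarize n run times. Returns 0 on success, -1 on invalid input. */
int summarize_runs(const double *times, size_t n, struct run_summary *out)
{
    if (times == NULL || out == NULL || n == 0)
        return -1;

    double min = times[0];
    double max = times[0];
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (times[i] < min)
            min = times[i];
        if (times[i] > max)
            max = times[i];
        sum += times[i];
    }

    out->min = min;
    out->max = max;
    out->mean = sum / (double)n;
    out->spread = max - min;
    return 0;
}
```

With 100 measured times in an array, a single call fills the summary; comparing the spread with and without real time scheduling gives a simple before-and-after view.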

Jitter was measured using an MPI application with and without injected interference from a CPU hog.

Jitter was measured using an RTAI-MPI application with and without injected interference from a CPU hog.
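The CPU hog itself is not shown in the slides. A minimal sketch of what such a program could look like, assuming it simply spins for a fixed wall-clock duration, is given below; burn_cpu() is a hypothetical name.

```c
#include <stddef.h>
#include <sys/time.h>

/* Spin on the CPU for roughly `seconds` of wall-clock time.
 * Returns the number of loop iterations completed. */
unsigned long burn_cpu(double seconds)
{
    struct timeval start, now;
    volatile unsigned long counter = 0;  /* volatile keeps the loop from being optimized away */
    double elapsed;

    gettimeofday(&start, NULL);
    do {
        counter++;
        gettimeofday(&now, NULL);
        elapsed = (double)(now.tv_sec - start.tv_sec)
                + (double)(now.tv_usec - start.tv_usec) / 1e6;
    } while (elapsed < seconds);

    return counter;
}
```

Running one such loop per core on a worker node while the MPI job executes would inject the kind of competition for the CPU that the jitter measurements are meant to expose.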

MPE shows chronologically when MPI calls start and complete, but not the effects of programs on the CPU, memory, interrupts, swap, I/O, and TCP.

The sysstat command sar will collect information about the above listed items.

The open source Java program kSar parses the text file generated by sar and graphs everything.
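To make the interrupt figures less of a black box, the sketch below samples one such counter directly. It relies on the documented format of /proc/stat on Linux, where the first number on the "intr" line is the total count of interrupts serviced since boot. Whether sar reads exactly this value is an assumption here, and parse_intr_total() and read_intr_total() are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>

/* Parse the total interrupt count from an "intr" line of /proc/stat.
 * Returns 0 on success, -1 if the line is not an "intr" line. */
int parse_intr_total(const char *line, unsigned long long *total)
{
    if (strncmp(line, "intr ", 5) != 0)
        return -1;
    return sscanf(line + 5, "%llu", total) == 1 ? 0 : -1;
}

/* Read the running interrupt total from /proc/stat.
 * Returns 0 on success, -1 if the file or the line is unavailable. */
int read_intr_total(unsigned long long *total)
{
    char line[8192];
    int found = -1;
    FILE *f = fopen("/proc/stat", "r");

    if (f == NULL)
        return -1;

    /* The "intr" line can be longer than the buffer, but the total is
     * always at its start, so the first chunk is enough. */
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_intr_total(line, total) == 0) {
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Calling read_intr_total() before and after a run and dividing the difference by the elapsed time gives an interrupts-per-second figure that can be compared between the real time and non-real-time cases.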

Notice the big drop in the number of interrupts while the lxrt-soft program is running.

RTAI lxrt-soft real time

No RTAI

Finish testing the I/O hog program and implement it.

The current kSar graph data was obtained from my home system, not the C107 cluster. kSar could not import a text file on the cluster, and it also did not work remotely via ssh from home to the cluster.

Implement OProfile or VTune for additional profiling on master and one or more worker nodes.

Obtain sar and kSar data for master and one or more worker nodes on cluster.

Obtain MPE data while using a CPU hog and an interrupt hog.

Obtain sar and kSar data while using a CPU hog and an interrupt hog.

Determine what the interrupt levels mean: i014 etc.

Perform system benchmarking.

Write the corresponding paper for this project.

Determine the scalability of a real time and parallel application. There is a little bit of overhead using high priority and RTAI.
