report writing

How to write a good report

With material from Simon Peyton Jones’s “Research Skills”

(http://research.microsoft.com/~simonpj/papers/giving-a-talk/giving-a-talk.htm

)

Tony Field

Lesson 101: Get your story right

Many projects contain good ideas, but do not distil what they are.

Figure out what your story is What’s the problem (motivation)? What’s the key idea (your solution/approach)? What’s the likely benefit/tradeoff? What did you build (deliverable)? Did it work? What did you conclude (insight)? Is what you’ve produced usable/useful (utility)?

Hint: Sketch your story on a piece of paper using simple, clear words. Be 100% explicit.

Remember that your project is an investigation - finding out that your idea didn’t work is not the same as failing your project

The IDEA

The deliverable

The benefit

The problem

The conclusion

The utility

Example story sketch A

The Haskell GHC RTS has a garbage collector Real-time applications need real-time collectors Classical real-time collectors rely on read

barriers, which are expensive We can use GHC’s dynamic dispatch to build a

really cheap barrier by “hijacking” an object’s entry code

We already pay up front for the dynamic dispatch; the extra work to build the barrier is minimal

The payoff is that there is no barrier when the collector is off, and only then when an object has yet to be copied

We’ve built an implementation in GHC vX. It works! Via benchmarking, the barrier overhead is found to

be small – much smaller than existing schemes In principle, the idea can be used in any

implementation based on dynamic dispatch, e.g. the JVM

The IDEA

The deliverable

The benefit

The problem

The conclusion

The utility

Example story sketch B

The Haskell GHC RTS has a garbage collector Real-time applications need real-time collectors Classical real-time collectors rely on read

barriers, which are expensive We can use GHC’s dynamic dispatch to build a

really cheap barrier by “hijacking” an object’s entry code

We already pay up front for the dynamic dispatch; the extra work to build the barrier is minimal

The payoff is that there is no barrier when the collector is off, and only then when an object has yet to be copied

We’ve built an implementation in GHC vX. It only works with one or two simple examples

For those codes that work the overheads are enormous

More work is needed, but it looks like the idea doesn’t pay. But there is an alternative...

Report Structure

Title Abstract Introduction Background The details Evaluation/Discussion Conclusions and further work Bibliography (Appendices)

The abstract

An extremely condensed summary of the project, comprising a small number of sentences

Make minimal references to the background

Focus on the project content Hint: write the abstract last

Example

AbstractAn technique is described for incorporating Baker's incremental garbage collection algorithm into the run-time system of the Glasgow Haskell Compiler (GHC) at very low cost. The technique exploits the fact that GHC objects are always accessed by jumping to “entry code” rather than by dereferencing the object explicitly. The idea is to hijack the entry code pointer when an object is in the transient state of being evacuated but not scavenged. An attempt to access the object in this state causes it to “self-scavenge” transparently before executing its normal entry code. This supports the normally expensive read barrier, but at very low cost as the barrier is only active when the object is in the transient state; in particular, there is no barrier overhead at all when the collector is off. An implementation in v4.01 of GHC is described, together with performance results for a subset of the “nofib” benchmark suite. These experiments show that the Baker collector can be implemented in GHC with very short program pause times and with negligible overhead on execution time compared with stop-the-world schemes.

Report Structure


The introduction

Now tell your story Keep it simple and to the point Avoid philosophical rambling Avoid excessive background Use examples to explain the problem and

your idea State your contributions and where in the

report they can be found

Minimise the background – cut to the

chase

‘For example’s help to clarify the problem. Where possible list

code.

Example introduction

1. Introduction

A significant drawback of automatic garbage collection is the detrimental effect it can have on the responsiveness of a program. While the system is collecting garbage it is unavailable to user programs for useful work, and this can cause a program to appear `dead’ for extended periods of time. This problem is caused by the fact that garbage collection algorithms typically collect all of the system‘s free store in one go, an operation that takes time at least proportional to the number of live objects.

This “unbounded pause” problem has been exacerbated by the tendency towards larger memories and it has discouraged the use of garbage-collected programming languages for many interactive or real-time applications. In the former, for example, the acid test is the responsiveness of the system when performing intensive I/O, e.g. when tracking a mouse or performing continuous real-time animation.

An algorithm that avoids this problem, at least in the context of copying collection schemes [5], is Baker‘s incremental algorithm [3] (an excellent general overview is also given in [7]). Baker's scheme interleaves program execution with small incremental copying phases, causing the program to pause periodically for short periods rather than stopping completely to perform the copy in one go.

The major problem with the implementation of Baker‘s scheme, at least in software, is the very high cost associated with the read barrier, which is a check required at each pointer load to ensure that the referenced object has been copied if it needs to be.

In this report a technique is developed for supporting Baker’s algorithm at very low cost in systems that already make use of dynamic dispatch. The idea is to use the dispatch mechanism to change the behaviour of an object depending upon its state. Blah blah…

The contributions

The contributions drive the rest of the report: the report substantiates the claims you have made

The reader thinks “gosh, if they can really deliver this, that’s be exciting; I’d better read on”

No “rest of the report is...”

Not:

Instead: use forward references from the narrative in the introduction.

“The rest of the report is structured as follows. Chapter 2 presents the background. Chapter 3 ... Finally, Chapter 8 concludes...”.

Example contributions

Make the contributions clear, e.g.

by a bulleted list

A simple technique is presented that exploits GHC’s underlying dynamic-dispatch mechanism to support the read barrier invariant (Chapter 2). The key advantage is that there is no barrier overhead associated with object references when either i. the garbage collector is off, or ii. the garbage collector is on and the object has already been scavenged.

The idea is extended to treat stack activation records in a similar way, so that stacks, too, can be collected incrementally (Chapter 3).

A prototype implementation for the Glasgow Haskell Compiler (GHC) is described in detail, including some of the key design tradeoffs (Chapter 4).

Using benchmarks from the “nofib” suite, the average overhead on program execution time is shown to be less than 4% for the benchmarks tested; sub-millisecond average pause times are also achieved for all benchmarks (Chapter 5).

Report Structure


Background

Make it clear that you're aware of the important key work in the subject

Provide enough detail to make the report as self-contained as possible, but no more

Avoid 'routine' background, e.g. An intro to the C programming language

Don't quote material that’s irrelevant or that you haven't read

Report Structure


The details

Say what you did, how and why (“method”); this is highly project-specific

Justify your decisions regarding design, tools, techniques etc.

Emphasise the key challenges that you encountered and how you overcame them

Intuition first…

Explain your ideas as if you were speaking to someone using a whiteboard. Only then go into the details.

Conveying the intuition is primary, not secondary

Once your reader has the intuition, they can follow the details (but not vice versa)

Even if they skip the details, they still take away something valuable

ALWAYS use examples and diagrams to help

Presenting your ideas

Do not recapitulate your personal journey of discovery. This route may be soaked with your blood, but that is not interesting to the reader.

However, “failed attempts” can often add insight (First Attempt… Second Attempt…)

Omit obvious implementation details. Focus instead on those that really matter.

Report Structure


Evaluation

You must evaluate your successes and failures, bearing in mind what others have already done

Make sure that each of your claimed contributions in the introduction is backed up with evidence

Quantitative evaluation is about performing experiments and making sense of hard numerical results

Qualitative evaluation is about making judgements – it is more “slippery” and subjective

Quantitative evaluation

Detail your experiment(s) The reader must be able to reproduce your

results Present results in tables and/or graphs Discuss the results and provide insight Explain interesting artefacts. Don’t leave it

to the reader to interpret your results (this is the hard bit!)

Make it absolutely clear what you did

The reader must be able to reproduce your

results

Example: Detailing experiments

The experiments were conducted on a 2.8GHz Pentium IV with 1GB RAM, a 512K level-2 cache, a 16KB instruction cache and a 16KB, 4-way set-associative level-1 data cache with 32-byte lines. The system ran Mandrake Linux 10.1 with a 2.6.15.1 kernel in single-user mode. All results reported are averages taken over five runs and are expressed as overheads with respect to reference data.

Each application was run in two configurations: one with a heap large enough for the application to complete without causing a garbage collection (typically around 110Mb), and one with a smaller initial heap (256kB). It is worth noting that the execution time of the application must be controlled such that the total allocation does not exceed the larger heap value (which would result in a garbage collection). The value is constrained to just below the amount of memory in the benchmark machine. With profiling turned off the difference between the two execution times tells us the total cost of garbage collection. For the new scheme the application was first executed with profiling turned on in order to determine the total number of mutator pauses and the heap allocation rate in bytes/s. The timings taken from runs without profiling were used to determine the mean pause time.

Each application was run with four garbage collection algorithms: stop-and-copy (with all optimisations turned on), a traditional implementation of Baker with a software read-barrier, and the two versions of the new scheme, EA and EB. The Baker implementation was set to scavenge at each block allocation, as per EB. In each case the mark-ratio was 2 for the runs of EA and 2048 for the runs of EB.

The artefactThe insight

Example: Explaining artefacts

Table 5.3 demonstrates some unusual behaviour. Averaged over all 36 benchmarks the incremental-B schemes are actually running faster than stop-and-copy, which is unexpected. Similarly, some applications (Bernoulli, Gamteb, Pic and Symalg) exhibit substantial speed-ups (35-79%) across all variants of the incremental scheme. There are also significant outliers in the other direction (Paraffins and scs, for example). Wave4main curiously exhibits extreme speed-ups across all -B variants and extreme slow-downs across all -A variants.

Because the effects are seen across the board it suggests that it is an artefact of the incremental nature of the garbage collector, rather than the idiosyncrasies of any particular application.

The extreme behaviours are attributable to two different aspects of incremental collector behaviour. The extreme speed-ups are an artefact of long-lived data and the fact that stop-and-copy collection happens instantaneously with respect to mutator execution...

The extreme slow-downs are an artefact of incremental collection cycles that are forced to completion. The stop-and-copy collector sizes the nursery based on the amount of live data at the end of the current cycle, while the incremental collector can only use that of the previous cycle...

Qualitative evaluation

Try to relate experience, rather than rely on speculation, e.g. “Users were able to use the system by just being told

the rules of the game…” “The use of shadows gave much better depth

perception and made it easier to…” “The need to cast all numerical values to doubles was

a universally unpopular feature…” Hint: be quantitative wherever possible

E.g. size of code, time to solve a given problem, number of mouse clicks…

Report Structure


Conclusions and future work

The conclusion is not a summary of the project, although you might provide a brief reminder

Summarise those things that you have learnt by the end of the project, that may not have been obvious at the beginning

Use the further work to suggest fixes to your approach and to highlight new ideas that warrant a new investigation

Example conclusions

8. Conclusions and Future Work

A technique has been described for incorporating Baker’s incremental garbage collection into the GHC run-time. Each closure creates its own ‘personal’ read-barrier by manipulating its entry code-pointer during execution and garbage collection. The technique derives its efficiency from the twin facts that these manipulations are very cheap and that they are performed on a per-closure basis, rather than a per-pointer-load basis, as in most implementations of the read-barrier. Experiments suggest that the new scheme carries a very low overhead, averaging less than 4% in execution time for the benchmarks tested, with average pause times being sub-millisecond in all cases. Etc.

We have also seen that the block-based memory allocator of the GHC run-time system provides an excellent level of granularity at which to perform incremental scavenging. By scavenging on each block allocation we can keep the scavenger going for longer, at the expense of an increase in pause time. This substantially reduces the overheads in the proposed scheme and provides an additional mechanism (the block size) by which to control the behaviour of the garbage collector. Etc.

8.1 Further Work

In principle, the scheme can be applied to any system that supports pure dynamic dispatch. The Java Virtual Machine (JVM), for example, uses dynamic dispatch to implement virtual method invocations, although other method calls are supported using more conventional function calling mechanisms. An interesting approach would be to modify the JVM so that all method calls are implemented by dynamic dispatch so that the technique described here can be used to support incremental collection with minimal additional overhead. There is then an interesting performance trade-off: is it cheaper to build a permanent read barrier in the current JVM, or to pay the price of full virtualisation and then get the read barrier (almost) for free? This is left as an open question that is worthy of future investigation. Etc.

Report Structure


Bibliography

Cite your sources properly, e.g.[4] A. Printezis and A. Garthwaite, “Visualising the Train garbage collector”, Proceedings of the Third International Symposium on Memory Management (ISMM’02), ACM SIGPLAN Notices, pp. 100-105, Berlin, June 2002. ACM Press.

[5] The libunwind project, http://www.hpl.hp.com/research/linux/libunwind/

The reader should be able to locate all your sources of inspiration directly

Language and style

Submission

Submit two hard copies by the deadline (NO extensions)

More pages does not imply more marks – popular binder sizes are 75-100 and 100-130 Print double sided, single-spaced Do not narrow or expand the margins

Do not use 6pt font or 16pt font

Where appropriate, use an appendix for: a user guide, experimental data, traces, detailed proof, etc.

Do not include program listings in the report Always use a spell checker (Always use LaTeX!)

Visual structure

Give strong visual structure to your report using sections and sub-sections bullets italics laid-out code

Find out how to draw beautiful pictures, and use them

Always refer explicitly to figures in the text

Visual structure

Make it engaging

NO YESIt can be seen that... We can see that...

The code must then be analysed by the programmer

The programmer must then analyse the code

It might be thought that this would be a type error

You might think this would be a type error

Use the active voice (subject-verb-object) and engage the reader “We” = you

and the reader

Active verb

“You” = the reader

Use simple, direct language

NO YES

This pattern is exemplified pictorially in Figure 5.5

This is shown in Figure 5.5

The object under study was displaced horizontally

The ball moved sideways

Endeavour to ascertain Find out

It could be considered that the speed of storage reclamation left

something to be desired

The garbage collector was really slow

Avoid “noise words”

NO YES

The code is then simply re-compiled The code is then re-compiled

The actual memory used was never more than 256KB

The memory used was never more than 256KB

In actual fact, and completely contrary to what might have been expected, it turned out that the second approach proved to be much the more efficient

of the two

Surprisingly, the second approach was much more efficient

However, of course, this would break the type system

However, this would break the type system

Some other sources

Research Skills (Simon Peyton Jones) (http://research.microsoft.com/~simonpj/papers/giving-a-talk/giving-a-talk.htm) (Thanks to Simon for providing the template for this talk)

Scientific Communication http://undergraduate.csse.uwa.edu.au/units/CITS7200/

Advice on Research and Writing http://www-2.cs.cmu.edu/afs/cs.cmu.edu/user/mleone/web/how-to.html

How to write plain Englishhttp://www.plainenglish.co.uk/howto.pdf

report writing

Technology

barrier overhead

ghc objects

idea didnt work

problem realtime applications

ghc vx

useful work

areally cheap barrier

use of garbage