Download - Report writing
How to write a good report
With material from Simon Peyton Jones’s “Research Skills”
(http://research.microsoft.com/~simonpj/papers/giving-a-talk/giving-a-talk.htm
)
Tony Field
Lesson 101: Get your story right
Many projects contain good ideas, but do not distil what they are.
Figure out what your story is What’s the problem (motivation)? What’s the key idea (your solution/approach)? What’s the likely benefit/tradeoff? What did you build (deliverable)? Did it work? What did you conclude (insight)? Is what you’ve produced usable/useful (utility)?
Hint: Sketch your story on a piece of paper using simple, clear words. Be 100% explicit.
Remember that your project is an investigation - finding out that your idea didn’t work is not the same as failing your project
The IDEA
The deliverable
The benefit
The problem
The conclusion
The utility
Example story sketch A
The Haskell GHC RTS has a garbage collector Real-time applications need real-time collectors Classical real-time collectors rely on read
barriers, which are expensive We can use GHC’s dynamic dispatch to build a
really cheap barrier by “hijacking” an object’s entry code
We already pay up front for the dynamic dispatch; the extra work to build the barrier is minimal
The payoff is that there is no barrier when the collector is off, and only then when an object has yet to be copied
We’ve built an implementation in GHC vX. It works! Via benchmarking, the barrier overhead is found to
be small – much smaller than existing schemes In principle, the idea can be used in any
implementation based on dynamic dispatch, e.g. the JVM
The IDEA
The deliverable
The benefit
The problem
The conclusion
The utility
Example story sketch B
The Haskell GHC RTS has a garbage collector Real-time applications need real-time collectors Classical real-time collectors rely on read
barriers, which are expensive We can use GHC’s dynamic dispatch to build a
really cheap barrier by “hijacking” an object’s entry code
We already pay up front for the dynamic dispatch; the extra work to build the barrier is minimal
The payoff is that there is no barrier when the collector is off, and only then when an object has yet to be copied
We’ve built an implementation in GHC vX. It only works with one or two simple examples
For those codes that work the overheads are enormous
More work is needed, but it looks like the idea doesn’t pay. But there is an alternative...
Report Structure
Title Abstract Introduction Background The details Evaluation/Discussion Conclusions and further work Bibliography (Appendices)
The abstract
An extremely condensed summary of the project, comprising a small number of sentences
Make minimal references to the background
Focus on the project content Hint: write the abstract last
Example
AbstractAn technique is described for incorporating Baker's incremental garbage collection algorithm into the run-time system of the Glasgow Haskell Compiler (GHC) at very low cost. The technique exploits the fact that GHC objects are always accessed by jumping to “entry code” rather than by dereferencing the object explicitly. The idea is to hijack the entry code pointer when an object is in the transient state of being evacuated but not scavenged. An attempt to access the object in this state causes it to “self-scavenge” transparently before executing its normal entry code. This supports the normally expensive read barrier, but at very low cost as the barrier is only active when the object is in the transient state; in particular, there is no barrier overhead at all when the collector is off. An implementation in v4.01 of GHC is described, together with performance results for a subset of the “nofib” benchmark suite. These experiments show that the Baker collector can be implemented in GHC with very short program pause times and with negligible overhead on execution time compared with stop-the-world schemes.
Report Structure
Title Abstract Introduction Background The details Evaluation/Discussion Conclusions and further work Bibliography (Appendices)
The introduction
Now tell your story Keep it simple and to the point Avoid philosophical rambling Avoid excessive background Use examples to explain the problem and
your idea State your contributions and where in the
report they can be found
Minimise the background – cut to the
chase
‘For example’s help to clarify the problem. Where possible list
code.
Example introduction
1. Introduction
A significant drawback of automatic garbage collection is the detrimental effect it can have on the responsiveness of a program. While the system is collecting garbage it is unavailable to user programs for useful work, and this can cause a program to appear `dead’ for extended periods of time. This problem is caused by the fact that garbage collection algorithms typically collect all of the system‘s free store in one go, an operation that takes time at least proportional to the number of live objects.
This “unbounded pause” problem has been exacerbated by the tendency towards larger memories and it has discouraged the use of garbage-collected programming languages for many interactive or real-time applications. In the former, for example, the acid test is the responsiveness of the system when performing intensive I/O, e.g. when tracking a mouse or performing continuous real-time animation.
An algorithm that avoids this problem, at least in the context of copying collection schemes [5], is Baker‘s incremental algorithm [3] (an excellent general overview is also given in [7]). Baker's scheme interleaves program execution with small incremental copying phases, causing the program to pause periodically for short periods rather than stopping completely to perform the copy in one go.
The major problem with the implementation of Baker‘s scheme, at least in software, is the very high cost associated with the read barrier, which is a check required at each pointer load to ensure that the referenced object has been copied if it needs to be.
In this report a technique is developed for supporting Baker’s algorithm at very low cost in systems that already make use of dynamic dispatch. The idea is to use the dispatch mechanism to change the behaviour of an object depending upon its state. Blah blah…
The contributions
The contributions drive the rest of the report: the report substantiates the claims you have made
The reader thinks “gosh, if they can really deliver this, that’s be exciting; I’d better read on”
No “rest of the report is...”
Not:
Instead: use forward references from the narrative in the introduction.
“The rest of the report is structured as follows. Chapter 2 presents the background. Chapter 3 ... Finally, Chapter 8 concludes...”.
Example contributions
Make the contributions clear, e.g.
by a bulleted list
A simple technique is presented that exploits GHC’s underlying dynamic-dispatch mechanism to support the read barrier invariant (Chapter 2). The key advantage is that there is no barrier overhead associated with object references when either i. the garbage collector is off, or ii. the garbage collector is on and the object has already been scavenged.
The idea is extended to treat stack activation records in a similar way, so that stacks, too, can be collected incrementally (Chapter 3).
A prototype implementation for the Glasgow Haskell Compiler (GHC) is described in detail, including some of the key design tradeoffs (Chapter 4).
Using benchmarks from the “nofib” suite, the average overhead on program execution time is shown to be less than 4% for the benchmarks tested; sub-millisecond average pause times are also achieved for all benchmarks (Chapter 5).
Report Structure
Title Abstract Introduction Background The details Evaluation/Discussion Conclusions and further work Bibliography (Appendices)
Background
Make it clear that you're aware of the important key work in the subject
Provide enough detail to make the report as self-contained as possible, but no more
Avoid 'routine' background, e.g. An intro to the C programming language
Don't quote material that’s irrelevant or that you haven't read
Report Structure
Title Abstract Introduction Background The details Evaluation/Discussion Conclusions and further work Bibliography (Appendices)
The details
Say what you did, how and why (“method”); this is highly project-specific
Justify your decisions regarding design, tools, techniques etc.
Emphasise the key challenges that you encountered and how you overcame them
Intuition first…
Explain your ideas as if you were speaking to someone using a whiteboard. Only then go into the details.
Conveying the intuition is primary, not secondary
Once your reader has the intuition, they can follow the details (but not vice versa)
Even if they skip the details, they still take away something valuable
ALWAYS use examples and diagrams to help
Presenting your ideas
Do not recapitulate your personal journey of discovery. This route may be soaked with your blood, but that is not interesting to the reader.
However, “failed attempts” can often add insight (First Attempt… Second Attempt…)
Omit obvious implementation details. Focus instead on those that really matter.
Report Structure
Title Abstract Introduction Background The details Evaluation/Discussion Conclusions and further work Bibliography (Appendices)
Evaluation
You must evaluate your successes and failures, bearing in mind what others have already done
Make sure that each of your claimed contributions in the introduction is backed up with evidence
Quantitative evaluation is about performing experiments and making sense of hard numerical results
Qualitative evaluation is about making judgements – it is more “slippery” and subjective
Quantitative evaluation
Detail your experiment(s) The reader must be able to reproduce your
results Present results in tables and/or graphs Discuss the results and provide insight Explain interesting artefacts. Don’t leave it
to the reader to interpret your results (this is the hard bit!)
Make it absolutely clear what you did
The reader must be able to reproduce your
results
Example: Detailing experiments
The experiments were conducted on a 2.8GHz Pentium IV with 1GB RAM, a 512K level-2 cache, a 16KB instruction cache and a 16KB, 4-way set-associative level-1 data cache with 32-byte lines. The system ran Mandrake Linux 10.1 with a 2.6.15.1 kernel in single-user mode. All results reported are averages taken over five runs and are expressed as overheads with respect to reference data.
Each application was run in two configurations: one with a heap large enough for the application to complete without causing a garbage collection (typically around 110Mb), and one with a smaller initial heap (256kB). It is worth noting that the execution time of the application must be controlled such that the total allocation does not exceed the larger heap value (which would result in a garbage collection). The value is constrained to just below the amount of memory in the benchmark machine. With profiling turned off the difference between the two execution times tells us the total cost of garbage collection. For the new scheme the application was first executed with profiling turned on in order to determine the total number of mutator pauses and the heap allocation rate in bytes/s. The timings taken from runs without profiling were used to determine the mean pause time.
Each application was run with four garbage collection algorithms: stop-and-copy (with all optimisations turned on), a traditional implementation of Baker with a software read-barrier, and the two versions of the new scheme, EA and EB. The Baker implementation was set to scavenge at each block allocation, as per EB. In each case the mark-ratio was 2 for the runs of EA and 2048 for the runs of EB.
The artefactThe insight
Example: Explaining artefacts
Table 5.3 demonstrates some unusual behaviour. Averaged over all 36 benchmarks the incremental-B schemes are actually running faster than stop-and-copy, which is unexpected. Similarly, some applications (Bernoulli, Gamteb, Pic and Symalg) exhibit substantial speed-ups (35-79%) across all variants of the incremental scheme. There are also significant outliers in the other direction (Paraffins and scs, for example). Wave4main curiously exhibits extreme speed-ups across all -B variants and extreme slow-downs across all -A variants.
Because the effects are seen across the board it suggests that it is an artefact of the incremental nature of the garbage collector, rather than the idiosyncrasies of any particular application.
The extreme behaviours are attributable to two different aspects of incremental collector behaviour. The extreme speed-ups are an artefact of long-lived data and the fact that stop-and-copy collection happens instantaneously with respect to mutator execution...
The extreme slow-downs are an artefact of incremental collection cycles that are forced to completion. The stop-and-copy collector sizes the nursery based on the amount of live data at the end of the current cycle, while the incremental collector can only use that of the previous cycle...
Qualitative evaluation
Try to relate experience, rather than rely on speculation, e.g. “Users were able to use the system by just being told
the rules of the game…” “The use of shadows gave much better depth
perception and made it easier to…” “The need to cast all numerical values to doubles was
a universally unpopular feature…” Hint: be quantitative wherever possible
E.g. size of code, time to solve a given problem, number of mouse clicks…
Report Structure
Title Abstract Introduction Background The details Evaluation/Discussion Conclusions and further work Bibliography (Appendices)
Conclusions and future work
The conclusion is not a summary of the project, although you might provide a brief reminder
Summarise those things that you have learnt by the end of the project, that may not have been obvious at the beginning
Use the further work to suggest fixes to your approach and to highlight new ideas that warrant a new investigation
Example conclusions
8. Conclusions and Future Work
A technique has been described for incorporating Baker’s incremental garbage collection into the GHC run-time. Each closure creates its own ‘personal’ read-barrier by manipulating its entry code-pointer during execution and garbage collection. The technique derives its efficiency from the twin facts that these manipulations are very cheap and that they are performed on a per-closure basis, rather than a per-pointer-load basis, as in most implementations of the read-barrier. Experiments suggest that the new scheme carries a very low overhead, averaging less than 4% in execution time for the benchmarks tested, with average pause times being sub-millisecond in all cases. Etc.
We have also seen that the block-based memory allocator of the GHC run-time system provides an excellent level of granularity at which to perform incremental scavenging. By scavenging on each block allocation we can keep the scavenger going for longer, at the expense of an increase in pause time. This substantially reduces the overheads in the proposed scheme and provides an additional mechanism (the block size) by which to control the behaviour of the garbage collector. Etc.
8.1 Further Work
In principle, the scheme can be applied to any system that supports pure dynamic dispatch. The Java Virtual Machine (JVM), for example, uses dynamic dispatch to implement virtual method invocations, although other method calls are supported using more conventional function calling mechanisms. An interesting approach would be to modify the JVM so that all method calls are implemented by dynamic dispatch so that the technique described here can be used to support incremental collection with minimal additional overhead. There is then an interesting performance trade-off: is it cheaper to build a permanent read barrier in the current JVM, or to pay the price of full virtualisation and then get the read barrier (almost) for free? This is left as an open question that is worthy of future investigation. Etc.
Report Structure
Title Abstract Introduction Background The details Evaluation/Discussion Conclusions and further work Bibliography (Appendices)
Bibliography
Cite your sources properly, e.g.[4] A. Printezis and A. Garthwaite, “Visualising the Train garbage collector”, Proceedings of the Third International Symposium on Memory Management (ISMM’02), ACM SIGPLAN Notices, pp. 100-105, Berlin, June 2002. ACM Press.
[5] The libunwind project, http://www.hpl.hp.com/research/linux/libunwind/
The reader should be able to locate all your sources of inspiration directly
Language and style
Submission
Submit two hard copies by the deadline (NO extensions)
More pages does not imply more marks – popular binder sizes are 75-100 and 100-130 Print double sided, single-spaced Do not narrow or expand the margins
Do not use 6pt font or 16pt font
Where appropriate, use an appendix for: a user guide, experimental data, traces, detailed proof, etc.
Do not include program listings in the report Always use a spell checker (Always use LaTeX!)
Visual structure
Give strong visual structure to your report using sections and sub-sections bullets italics laid-out code
Find out how to draw beautiful pictures, and use them
Always refer explicitly to figures in the text
Visual structure
Make it engaging
NO YESIt can be seen that... We can see that...
The code must then be analysed by the programmer
The programmer must then analyse the code
It might be thought that this would be a type error
You might think this would be a type error
Use the active voice (subject-verb-object) and engage the reader “We” = you
and the reader
Active verb
“You” = the reader
Use simple, direct language
NO YES
This pattern is exemplified pictorially in Figure 5.5
This is shown in Figure 5.5
The object under study was displaced horizontally
The ball moved sideways
Endeavour to ascertain Find out
It could be considered that the speed of storage reclamation left
something to be desired
The garbage collector was really slow
Avoid “noise words”
NO YES
The code is then simply re-compiled The code is then re-compiled
The actual memory used was never more than 256KB
The memory used was never more than 256KB
In actual fact, and completely contrary to what might have been expected, it turned out that the second approach proved to be much the more efficient
of the two
Surprisingly, the second approach was much more efficient
However, of course, this would break the type system
However, this would break the type system
Some other sources
Research Skills (Simon Peyton Jones) (http://research.microsoft.com/~simonpj/papers/giving-a-talk/giving-a-talk.htm) (Thanks to Simon for providing the template for this talk)
Scientific Communication http://undergraduate.csse.uwa.edu.au/units/CITS7200/
Advice on Research and Writing http://www-2.cs.cmu.edu/afs/cs.cmu.edu/user/mleone/web/how-to.html
How to write plain Englishhttp://www.plainenglish.co.uk/howto.pdf