Upload: ganesh-samarthyam

Post on 06-May-2015



An Overview of MSIL

The .NET architecture addresses an important need - language interoperability. Instead of generating native code that is specific to one platform, programming languages can generate code in MSIL (Microsoft Intermediate Language) targeting the Common Language Runtime (CLR) to reap the rich benefits provided by .NET.

Advanced programmers occasionally use the Ildasm tool to peek into MSIL code when they are in doubt about what is happening under the hood. It is therefore essential that the C# programmer understands the basics of MSIL. This beginner-level article gives an overview of MSIL and of debugging with the Ildasm tool.

System Requirements

The programming examples in this article use C# as the source language for generating MSIL code, and so the reader is expected to have some basic understanding of C#. No prior exposure to MSIL is necessary. In addition, the reader is assumed to know what a stack data structure is. It is preferable that the reader has access to the Ildasm tool and the C# compiler.

Article Structure

The article has three main sections:

● An Overview of MSIL: The basics of MSIL, the data types, instruction types, and the way that the instructions are executed are explained.

● Examining MSIL: This section covers MSIL using simple example programs.

● Debugging Using the Ildasm tool: Explains the use of the intermediate language disassembler (Ildasm) and the way it can be used for debugging.

.NET supports several high-level languages such as C#, VB.NET and Managed C++, and MSIL is designed to accommodate this wide range of languages. In .NET, the unit of deployment is the PE (Portable Executable) file, a predefined binary format (similar to the class files of Java). MSIL, along with metadata, is stored inside the PE files generated by the compiler. Metadata describes the types (their definitions, signatures, and so on) and is used at runtime. MSIL itself is a simple language that does not require much effort to understand.

An Overview of MSIL

MSIL is a CPU-independent, stack-based instruction set that can be efficiently converted to the native code of a specific platform. In this stack-based approach, the representation assumes the presence of a run-time stack, and the code is generated with that stack in mind: operands are pushed onto the stack, instructions pop them, and intermediate results are stored back on the stack itself. Executing the instructions one by one against such a stack would be a form of interpretation. In practice, MSIL is not interpreted: a Just-In-Time (JIT) compiler translates the intermediate code to native code for the particular platform at runtime. The stack-based code facilitates portability across platforms and is easy to verify.
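To make the stack-based evaluation model concrete, here is a minimal sketch in C# that evaluates (a + b) * c the way a stack machine would. It is only an illustration: the class name StackSketch and the method Evaluate are hypothetical, and this is not how the CLR is implemented (the CLR compiles MSIL to native code rather than interpreting it).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of stack-based evaluation; not CLR internals.
public static class StackSketch
{
    // Evaluates (a + b) * c using only push and pop operations.
    public static int Evaluate(int a, int b, int c)
    {
        var stack = new Stack<int>();

        stack.Push(a);                          // load a
        stack.Push(b);                          // load b
        stack.Push(stack.Pop() + stack.Pop());  // add: pop two values, push the sum
        stack.Push(c);                          // load c
        stack.Push(stack.Pop() * stack.Pop());  // mul: pop two values, push the product

        return stack.Pop();                     // the result is the only value left
    }

    public static void Main()
    {
        Console.WriteLine(Evaluate(2, 3, 4));   // prints 20
    }
}
```

Note that no named temporaries are needed for the intermediate sum: it lives on the stack until the multiplication consumes it.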

The MSIL:

● Supports object-oriented programming.

● Works in terms of the data types available in the .NET Framework, for example, System.String and System.Int32.

● Has instructions that can be classified into various types such as: loading (ld*), storing (st*), method invocation, arithmetic operations, logical operations, control flow, memory allocation, and exception handling.

The following section covers basic instructions using examples.

Examining MSIL

Let us start with the following simple C# code, and see how it is compiled to intermediate code.

    Console.WriteLine("hello world");

The MSIL code looks like this (using the Ildasm tool that is discussed later).

    // disassembled code using ildasm tool
    ldstr "hello world"
    call void [mscorlib]System.Console::WriteLine(string)

Now let us examine how it works:

● The ldstr (standing for 'load string') instruction pushes a reference to the string constant "hello world" onto the evaluation stack.

● The call instruction calls a method. Here, the call is made to the static WriteLine method of the Console class in the System namespace, which is available in mscorlib.dll. The WriteLine method takes a string as its argument and its return type is void.

It executes as follows:

● The ldstr instruction pushes the reference to the constant "hello world" onto the stack.

● The call instruction calls the WriteLine method, which expects a string argument and pops it from the stack. Now the stack contains nothing. The WriteLine method executes, prints the message "hello world" on the screen, and returns.

As you can see, understanding the MSIL code is far from difficult! If you have prior exposure to any assembly language, it will be very easy for you to learn MSIL.

From this simple program, let us move on to a program illustrating branching and arithmetic instructions.

    // C# source code
    int i = 10;
    if (i != 20)
        i = i * 20;
    Console.WriteLine(i);

    // disassembled MSIL code using ildasm tool
    IL_0000: ldc.i4.s 10
    IL_0002: stloc.0
    IL_0003: ldloc.0
    IL_0004: ldc.i4.s 20
    IL_0006: beq.s IL_000d
    IL_0008: ldloc.0
    IL_0009: ldc.i4.s 20
    IL_000b: mul
    IL_000c: stloc.0
    IL_000d: ldloc.0
    IL_000e: call void [mscorlib]System.Console::WriteLine(int32)

You can see that lots of MSIL code has been generated for this simple C# code, but it is simple once you understand what the instructions do. You can see that the instructions are preceded by IL_xxxx: - these are labels used so that it is possible to 'jump' from one part of the code to another.

The ldc.i4.s (stands for 'load constant'.'four byte integer'.'single byte argument') instruction pushes the integer constant 10 onto the stack.

The stloc.0 (stands for 'store in location'.'zeroth variable') instruction pops the integer constant 10 from the stack and stores it in variable number 0 (local variables are identified by counting them from 0).

The ldloc.0 (stands for 'load from location'.'zeroth variable') instruction loads the value of the variable at location zero (i.e. variable i in the source code) and pushes it onto the stack.

The ldc.i4.s instruction pushes the integer constant 20 onto the stack.

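The beq.s (stands for 'branch if equal to'.'single byte argument') instruction pops two items from the stack and checks whether they are equal; if so, it transfers control to the instruction at the location identified by the label IL_000d. This is how the condition i != 20 is compiled: when the two values are equal, the branch skips over the body of the if statement.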

The ldloc.0 instruction pushes the value of variable i onto the stack.

The ldc.i4.s instruction pushes the integer constant 20 onto the stack.

The mul (stands for 'multiply') instruction pops two items from the stack, multiplies the values, and pushes the result back to the stack. Now the result of the multiplication is at the top of the stack.

The stloc.0 instruction pops the top value from the stack (the result of the multiplication in this case) and stores it in variable i.

The ldloc.0 instruction pushes the value of i onto the stack.

The call (stands for 'call the method') instruction calls the WriteLine method that takes an integer as an argument. The WriteLine method pops the value from the stack and displays it on the screen.
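The walkthrough above can be replayed by hand. The following sketch mimics each instruction of the listing with a Stack<int> and a single local variable. The class name BranchTrace, the method Run, and its parameter are hypothetical; the parameter replaces the constant 10 from the listing only so that both outcomes of the branch can be tried. The final call instruction is represented by returning the value it would pop.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration: replays the MSIL listing step by step.
public static class BranchTrace
{
    // 'initial' stands in for the constant 10 used in the listing (an assumption
    // made so that the branch-taken path can be exercised as well).
    public static int Run(int initial)
    {
        var stack = new Stack<int>();
        int loc0;                                   // local variable 0 (i in the C# source)

        stack.Push(initial);                        // IL_0000: ldc.i4.s 10
        loc0 = stack.Pop();                         // IL_0002: stloc.0
        stack.Push(loc0);                           // IL_0003: ldloc.0
        stack.Push(20);                             // IL_0004: ldc.i4.s 20

        int value2 = stack.Pop();                   // IL_0006: beq.s pops both operands
        int value1 = stack.Pop();
        if (value1 != value2)                       // equal values jump straight to IL_000d
        {
            stack.Push(loc0);                       // IL_0008: ldloc.0
            stack.Push(20);                         // IL_0009: ldc.i4.s 20
            stack.Push(stack.Pop() * stack.Pop());  // IL_000b: mul
            loc0 = stack.Pop();                     // IL_000c: stloc.0
        }

        stack.Push(loc0);                           // IL_000d: ldloc.0
        return stack.Pop();                         // IL_000e: call pops the argument for WriteLine
    }

    public static void Main()
    {
        Console.WriteLine(Run(10));                 // prints 200
        Console.WriteLine(Run(20));                 // prints 20 (branch taken, multiplication skipped)
    }
}
```

At every label the stack depth is the same regardless of the path taken, which is one reason stack-based code is easy to verify.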

Debugging Using the Ildasm Tool

Microsoft's .NET SDK is shipped with an IL disassembler, Ildasm.exe (usually located in the directory \Program Files\Microsoft.Net\FrameworkSDK\Bin). A disassembler loads your assemblies and shows the MSIL code along with other details in the assembly.

This tool can be handy in debugging code once you become proficient at understanding MSIL code. How can MSIL help in debugging?

Bugs happen in code when there is a mismatch between what we expect the code to do and what the code actually does. If we can dig down to a lower level and see what the machine is actually doing with our code, it is easier to spot the mismatch. That is the idea behind using Ildasm for debugging.

Let us look at an example and see how we can debug the code. The following innocent-looking code does not print "yes, o1 == o2" as we'd expect, even though the code looks straightforward.

    int i = 10;
    object o1 = i, o2 = i;
    if (o1 == o2)
        Console.WriteLine("yes, o1 == o2");

Now let us dig a little deeper and see what the machine is actually doing by looking at the MSIL code generated by the Ildasm tool:

    IL_0000: ldc.i4.s 10
    IL_0002: stloc.0
    IL_0003: ldloc.0
    IL_0004: box [mscorlib]System.Int32
    IL_0009: stloc.1
    IL_000a: ldloc.0
    IL_000b: box [mscorlib]System.Int32
    IL_0010: stloc.2
    IL_0011: ldloc.1
    IL_0012: ldloc.2
    IL_0013: bne.un.s IL_001f
    IL_0015: ldstr "yes, o1 == o2"
    IL_001a: call void [mscorlib]System.Console::WriteLine(string)
    IL_001f: ret

There lies the clue. Can you see that the boxing operation from int to object type is taking place twice? As the value type is converted to a reference type, an object is allocated on the heap. Since boxing is done twice, o1 and o2 refer to two different objects allocated in two different places on the heap. The bne.un.s instruction compares the two references, not the values stored inside the boxes, so it always branches past the WriteLine call. We have found where things went wrong, and this means we can make a simple correction to our code:

    int i = 10;
    object o1 = i, o2 = o1;
    if (o1 == o2)
        Console.WriteLine("yes, o1 == o2");
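The difference between comparing references and comparing boxed values can also be observed directly from C#. The sketch below is only an illustration, not part of the original example: the class BoxingSketch and its two helper methods are hypothetical names. It contrasts the == operator on object operands (a reference comparison) with the Equals method (which Int32 overrides to compare values).

```csharp
using System;

// Hypothetical illustration of reference equality versus value equality for boxed ints.
public static class BoxingSketch
{
    // With operands of static type object, == compares references.
    public static bool SameReference(object a, object b)
    {
        return a == b;
    }

    // Equals is virtual; for a boxed int it compares the boxed values.
    public static bool SameValue(object a, object b)
    {
        return object.Equals(a, b);
    }

    public static void Main()
    {
        int i = 10;
        object o1 = i, o2 = i;   // boxed twice: two distinct heap objects
        object o3 = o1;          // no boxing: a second reference to the same object

        Console.WriteLine(SameReference(o1, o2));  // prints False
        Console.WriteLine(SameValue(o1, o2));      // prints True
        Console.WriteLine(SameReference(o1, o3));  // prints True
    }
}
```

Calling Equals is therefore another possible way to compare the two boxed values when sharing a single box is not an option.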

Now when we look at the resulting MSIL code (again disassembling using the Ildasm tool), the boxing is done only once, and both the references are pointing to the same object now. So, the program now works as expected.

    IL_0000: ldc.i4.s 10
    IL_0002: stloc.0
    IL_0003: ldloc.0
    IL_0004: box [mscorlib]System.Int32
    IL_0009: stloc.1
    IL_000a: ldloc.1
    IL_000b: stloc.2
    IL_000c: ldloc.1
    IL_000d: ldloc.2
    IL_000e: bne.un.s IL_001a
    IL_0010: ldstr "yes, o1 == o2"
    IL_0015: call void [mscorlib]System.Console::WriteLine(string)
    IL_001a: ret

The example shown here is simple, but it shows how the tool can be employed effectively for debugging code.

Article Review

In this article we have explained the basics of MSIL, and using this knowledge, looked into how the Ildasm tool can be used to help debug your code. This is only a beginner-level article, and so interested readers are encouraged to look further into MSIL and the Ildasm tool.

All rights reserved. Copyright Jan 2004.