Reverse Software Engineering as a Project-Based Learning Tool

Paper ID #33764

Ms. Cynthia C. Fry, Baylor University

Cynthia C. Fry is currently a Senior Lecturer of Computer Science at Baylor University. She worked at NASA’s Marshall Space Flight Center as a Senior Project Engineer, a Crew Training Manager, and the Science Operations Director for STS-46. She was an Engineering Duty Officer in the U.S. Navy (IRR), and worked with the Naval Maritime Intelligence Center as a Scientific/Technical Intelligence Analyst. She was the owner and chief systems engineer for Systems Engineering Services (SES), a computer systems design, development, and consultation firm. She joined the faculty of the School of Engineering and Computer Science at Baylor University in 1997, where she teaches a variety of engineering and computer science classes. She is the Faculty Advisor for Women in Computer Science (WiCS), the Director of the Computer Science Fellows program, and a KEEN Fellow. She has authored and co-authored over fifty peer-reviewed papers.

Mr. Zachary Michael Steudel

Zachary Steudel is a 2021 graduate of Baylor University’s computer science department. In his time at Baylor, he worked as a Teaching Assistant under Ms. Cynthia C. Fry. As part of the Teaching Assistant role, Zachary designed and created the group project for the Computer Systems course. Zachary Steudel worked as a Software Developer Intern at Amazon in the summer of 2019 and as a Software Engineer Intern at Microsoft in the summer of 2020, and begins his full-time career with Amazon in the summer of 2021 as a software engineer.

Mr. Joshua Craig Hunter, Baylor University

Joshua Hunter is a sophomore Computer Science student at Baylor University working as a Computer Science and Calculus tutor. Joshua worked alongside Zachary Steudel to design and create the group project for the Computer Systems course in the Fall of 2020. Joshua is a member of the Theta Tau professional Engineering and Computer Science organization and will be working as a Software Engineering intern at L3 Harris this summer.

© American Society for Engineering Education, 2021




Reverse Engineering as a Project-Based Learning Tool

Abstract

Although the concept of reverse software engineering is used in many fields, in the context of software engineering and security it has come to include areas such as binary code patching, malware analysis, debugging, legacy compatibility, and network protocol analysis, to name a few.[1] Despite its broad use in software engineering, however, there is little work in computer science education that considers how reverse engineering can be taught effectively.[2] This may be a result of the compressed timetable of a four-year college education in computer science, where the core curriculum and the upper-level computer science electives constantly compete for the limited time available to produce a qualified computer scientist. Additionally, the constant changes in the discipline demand an ever-changing curriculum. It is therefore understandably difficult to find where and how the topic of reverse software engineering might be introduced within the curriculum; however, it has also become clear that it is a necessary inclusion.

This paper documents a long-term research effort on the effectiveness of using a very simple reverse software engineering project in a sophomore-level computer systems course. We report on the development of a series of class exercises that are inserted incrementally into the course. The goal of these projects is to lead students to a deeper understanding of computer systems, an appreciation of the continuing need for low-level understanding of software, and the development of critical thinking and problem-solving skills in the discernment and analysis of an unknown binary file.

Introduction

Reverse software engineering is “the practice of analyzing a software system, either in whole or in part, to extract design and implementation information.”[3] For the purposes of this paper, the term will refer to the process of determining the behavior of an unknown executable or binary file. In a sophomore-level course in computer systems at Baylor University, CSI 2334, “Introduction to Computer Systems,” a group project is introduced with the following goals:

• to lead students to a deeper understanding of computer systems,
• to understand the need for low-level understanding of software,
• to learn how to value and work with a team, and
• to develop critical thinking and problem-solving skills.

These project goals support several of the class’ learning objectives, namely that students should be able to:

• work effectively as a member of a small team, and
• illustrate their understanding of code hardening.


This project is introduced roughly halfway through the semester, when the class is told that an unknown binary has been found on the server and that each team must determine and change its behavior. Students are placed in two-person teams. The topic of reverse software engineering, or “reversing,” is discussed in sufficient detail, along with instructions on the initial steps to take in the project. These steps involve researching the variety of online tools that are available to help determine the behavior of binary files. Students are also encouraged to write a version of “Hello World” in C++ and run the resulting .exe file through some of these freely available tools to develop an understanding of the “topography” of compiled code. Once the tools are well understood and the students have worked with their unknown executable, they must identify any malicious code segments (if found) and modify the behavior of the code, depending on the functionality of the unknown executable. The teams document their journey, submit a final report and a scrubbed executable, and present their findings.

Leading up to the project introduction, several smaller, individual projects (mini projects, or MPs) guide students through preliminary research and build an understanding of computer architecture and the limitations of computer hardware:

• MP0, “Click Here” – Students are notified that an unknown executable has just been downloaded to their machine. They must do some initial research to determine whether the file is safe to open, what might happen when the file is opened, and how to determine functionality before executing a file. A research paper is submitted.

• MP1, “The Bank Problem” – Students must research why a series of ten thousand charges, read in as single-precision floating-point numbers, produces different results when the sum is calculated in the original (unsorted) order, in ascending order, and in descending order. Students must do research to discover the nature of the errors involved and which of the sums is the most accurate. A research paper is submitted.

• MP2, “Stack versus Heap” – Students must investigate the relationship between stack memory and the heap. They are asked to write a program that dynamically re-allocates an array of memory in the heap, and investigate where and how dynamic memory is allocated versus statically allocated memory. To help in this, they are asked to trace through the disassembly of this code and report on their findings. A research paper is submitted.
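The effect investigated in MP1 can be reproduced with a few lines of C. The following is a minimal sketch and not the course's assignment code; the function names (`sum_float`, `sum_double`, `sum_sorted`) are hypothetical, and the double-precision sum is included only as a reference against which the single-precision sums can be compared.

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: ascending order of float values. */
static int cmp_ascending(const void *a, const void *b) {
    float x = *(const float *)a;
    float y = *(const float *)b;
    return (x > y) - (x < y);
}

/* qsort comparator: descending order of float values. */
static int cmp_descending(const void *a, const void *b) {
    return cmp_ascending(b, a);
}

/* Left-to-right sum with a single-precision accumulator. */
float sum_float(const float *values, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        total += values[i];
    }
    return total;
}

/* Same sum with a double-precision accumulator, used as a reference. */
double sum_double(const float *values, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += values[i];
    }
    return total;
}

/* Sum a sorted copy of the input; the caller's array is left untouched.
 * Returns NAN if the temporary copy cannot be allocated. */
float sum_sorted(const float *values, size_t n, int descending) {
    if (n == 0) {
        return 0.0f;
    }
    float *copy = malloc(n * sizeof *copy);
    if (copy == NULL) {
        return NAN;
    }
    memcpy(copy, values, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, descending ? cmp_descending : cmp_ascending);
    float total = sum_float(copy, n);
    free(copy);
    return total;
}
```

A small input already shows the order dependence. For the three values 1.0e8f, -1.0e8f, and 1.0f taken in that order, `sum_float` returns 1, while both sorted sums return 0: near 1.0e8 adjacent single-precision values are 8 apart, so adding 1 to a running total of that magnitude is lost to rounding.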

These mini projects are designed to help students develop research, critical thinking, and communication skills in preparation for the group project.

Development of the Project

Before the semester begins, the group project is designed and developed. Typically, a small game or utility program is written in C/C++. After the core functionality of the program is implemented, malicious elements are added to it. Typical malicious elements added to a semester’s project program include:

• Fork bombs: Code that spawns new processes continuously, causing pop-ups and slowdown for the user’s computer. Below is C code used to create a fork bomb, pulled from the source code of the Fall 2019 group project. This code spawns 10,000 separate processes on the user’s computer, each printing the message “YOU_MADE_A_MISTAKE”. This is a controlled fork bomb; a truly malicious bomb would typically run indefinitely and spawn copies of itself so that the user cannot terminate it.

Figure 1

• Memory leaks: C/C++ instructions that allocate chunks of memory from the computer’s RAM and never deallocate them, causing slowdown if allowed to continue. Below is C code that allocates approximately 15 kilobytes of memory from the user’s RAM. Most modern computer systems have at least 4 gigabytes of RAM, so a leak of this size will likely go undetected unless it is carefully looked for. This code was used to cause a memory leak in the Fall 2019 project.

Figure 2

• File spawning: Code that spawns files on the user’s computer that clog the hard drive. A partial snippet of the code used in the Spring 2020 group project to spawn 50 files on the user’s system can be seen below. Each of these files contained hundreds of words of text from Lord of the Rings.

Figure 3
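The project's own snippets are not reproduced in this transcript, so the following is only a hedged illustration of the file-spawning idea, not the code from Figure 3. The function names (`spawn_files`, `remove_spawned_files`), the `spawn_NNN.txt` naming scheme, and the filler text are all assumptions made for this sketch. It is deliberately bounded by its `count` argument, and the second function acts as the cleanup counterpart that a "scrubbed" version would need.

```c
#include <stdio.h>

/* Build "<dir>/spawn_<index>.txt" into buf; returns 0 on success, -1 if it does not fit. */
static int spawn_path(char *buf, size_t len, const char *dir, int index) {
    int n = snprintf(buf, len, "%s/spawn_%03d.txt", dir, index);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Create `count` small text files in `dir`; returns how many were written. */
int spawn_files(const char *dir, int count) {
    int created = 0;
    char path[512];
    for (int i = 0; i < count; i++) {
        if (spawn_path(path, sizeof path, dir, i) != 0) {
            break;
        }
        FILE *f = fopen(path, "w");
        if (f == NULL) {
            break;
        }
        fputs("placeholder filler text\n", f);  /* hypothetical content */
        if (fclose(f) == 0) {
            created++;
        }
    }
    return created;
}

/* Cleanup counterpart: delete the same files; returns how many were removed. */
int remove_spawned_files(const char *dir, int count) {
    int removed = 0;
    char path[512];
    for (int i = 0; i < count; i++) {
        if (spawn_path(path, sizeof path, dir, i) != 0) {
            break;
        }
        if (remove(path) == 0) {
            removed++;
        }
    }
    return removed;
}
```

Calling `spawn_files(".", 50)` would mirror the count described above, and `remove_spawned_files(".", 50)` undoes it. In a compiled executable, the loop bound and the format string used to build the file names are the kind of details a student would look for in the disassembly to locate this behavior.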

Once the malicious portions of the program have been implemented, the final development step is to obfuscate the code. Code obfuscation is the purposeful muddling of a program’s source code to make it hard for humans to read and understand. The goal in obfuscating the group project executable in CSI 2334 is to motivate students to use different tools and analysis techniques. If an un-obfuscated executable is handed to the students, they can easily disassemble it and glean its functionality in a short period of time. Since the learning objectives for the project are to become more proficient at design and analysis in a team environment, obfuscation is necessary. With an obfuscated executable, the process of dismantling and analyzing becomes more complex and requires more thought, collaboration, and careful documentation.

When developing software, code readability and maintainability are generally of top priority. Developers follow tenets that allow other developers to edit and add features to their code in the future. In code obfuscation for this project, steps are taken to walk these good coding practices back and make it a challenge to follow the purpose of a segment of code. There are many ways that code can be obfuscated, ranging from easy to relatively complex. Some of the techniques used in obfuscating code are outlined below through examples taken from previous semesters’ projects.

• Function and variable re-naming: This is the simplest of the obfuscation techniques. Some reverse engineering tools can recover function and variable names when decompiling the program executable. To hide these details about the code, all functions and variables are re-named to meaningless or misleading text. Below are three function prototypes used in the Fall 2019 project executable, which is a game of Snake. These are the function and parameter names before any obfuscation is done. As can be seen, the function and parameter names give away a lot of detail about how each function works.


Figure 4

Below are the same prototypes with the functions and parameters re-named. For example, “showScore” is now “paintWalls”, which will temporarily throw off a novice reverse engineer’s efforts to document and understand the purpose of each function.

Figure 5

• “Rabbit-hole” function injection: This is a slightly more complex and time-consuming obfuscation technique. It entails injecting large numbers of function calls within necessary program code to confuse the reverse engineer trying to determine the functionality of the segment. When useless function calls call other functions, which in turn call still other functions, a reverse engineer may traverse this tree of calls for a long time, attempting to parse their purpose, before realizing that they have no effect on the core of the program. Below is a snippet of code from the beginning of the driver function for the Snake game in the Fall 2019 project. Already, we can see that many confusingly named functions are called here, with some leading the reverse engineer down a rabbit-hole. “printWall” is the useless function call that we will focus on.


Figure 6

Below is the code for the “printWall” function. It runs a useless loop and then calls the “message” function.

Figure 7

Below is the code for the “message” function, which branches out and calls three other functions, each of which makes further function calls. A reverse engineer could spend hours walking through this rabbit-hole and writing documentation before realizing that none of it contributes to the processing for the application.

Figure 8
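Because the figures are not reproduced here, the shape of a rabbit-hole can be illustrated with a minimal sketch. This is not the project's code: every name in it (`drawBorder`, `polishScales`, `checkWeather`, `addPoints`, `decoy_sink`) is hypothetical, and the misleading names are meant to echo the re-naming technique described earlier.

```c
/* Scratch variable for the decoys; volatile so the useless work is not optimized away. */
static volatile int decoy_sink = 0;

static void polishScales(int depth);

/* Decoy: touches only the scratch variable, then hops to the other decoy. */
static void checkWeather(int depth) {
    decoy_sink += depth;
    if (depth > 0) {
        polishScales(depth - 1);
    }
}

/* Decoy: runs a useless loop, then hops back to the first decoy. */
static void polishScales(int depth) {
    for (int i = 0; i < 3; i++) {
        decoy_sink ^= i;
    }
    if (depth > 0) {
        checkWeather(depth - 1);
    }
}

/* Entry to the rabbit-hole: looks important, changes nothing the game uses. */
static void drawBorder(void) {
    polishScales(4);
}

/* The only function here that affects real program state. */
int addPoints(int score, int points) {
    drawBorder();  /* decoy call injected between real statements */
    return score + points;
}
```

The result of `addPoints` never depends on `decoy_sink`, so every decoy call could be deleted without changing the program's behavior. A reader of the disassembly cannot know that without first following each call, which is where the time goes.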

• Stack obfuscation: This final obfuscation method is the most complicated of the three outlined here. In this method, the programmer writes in-line assembly code that modifies the program stack by pushing or popping values. If this is mixed carefully with malicious code, it can be hard for a reverse engineer analyzing the assembly code of the executable to quarantine the malicious code correctly. The stack is a last-in, first-out structure, so if a value is pushed and never popped, the process may later pop a bad value into the instruction pointer. Therefore, if a reverse engineer removes a push or a pop that belongs to the stack obfuscation but does not remove the corresponding pop or push, the patched executable may crash. Below is a small example of stack obfuscation used in the Fall 2020 project. This project was written in C; the “asm” keyword, a compiler extension rather than a system call, allows assembly-language instructions to be embedded in the C source.

Figure 9
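As a rough illustration of the idea, and not the Fall 2020 project's code, the sketch below embeds a balanced push/pop pair between ordinary C statements. It assumes GCC or Clang extended inline assembly on x86-64; on any other compiler or target the assembly is compiled out so that the function still builds. The function name `add_with_stack_noise` is hypothetical.

```c
/*
 * Adds two integers, with stack "noise" in the middle on x86-64 GCC/Clang.
 * The push and pop cancel each other, so the stack pointer is unchanged
 * when the asm statement finishes.
 */
int add_with_stack_noise(int a, int b) {
    int result = a + b;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile(
        "sub $128, %%rsp\n\t"  /* step past the System V red zone before pushing */
        "push %%rax\n\t"       /* noise: save a register ...                     */
        "pop %%rax\n\t"        /* ... and restore it, leaving the stack balanced */
        "add $128, %%rsp"      /* undo the red-zone adjustment                   */
        :
        :
        : "cc", "memory");
#endif
    return result;
}
```

If only the `push` or only the `pop` were patched out of the compiled binary, the stack pointer would typically be off by eight bytes when the function returns, which is the kind of crash described above. The red-zone adjustment is a precaution for this sketch under the System V x86-64 calling convention, and it is an assumption that the original code did anything similar.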

Examination of the Project Code

Once the students are introduced to the project, the first order of business is to research the freely available online tools that will assist them in discovering the behavior of the project executable. A wide variety of tools exists in each of the main categories (disassemblers, decompilers, hex editors, and debuggers) that students can use to develop an understanding of the executable. Some of the more common reverse software engineering tools examined include:

• Snowman,
• Binary Ninja,
• OllyDbg,
• IDA, and
• Ghidra.

These applications span the major categories of reverse engineering tools. Each tool is briefly analyzed below to highlight its important components and features. Students are asked to do this as part of their final report.

Snowman

Snowman is a native code-to-C/C++ decompiler and disassembler that supports the x86, AMD64, and ARM architectures.[4] This reverse engineering tool is relatively new, with the most recent version of the decompiler released in 2018. Snowman is a stand-alone application, but it is also available as a plug-in for IDA (the IDA disassembler and debugger).


The stand-alone Snowman application is very easy to install. Its interface is simple, user-friendly, and straightforward: there are two main windows that can be slightly customized, along with a short set of commands in the menu bar. To run the disassembler and decompiler, an executable is imported through the File menu, and Snowman immediately starts processing it. Even before its functionality is considered, the simple interface alone makes Snowman an approachable tool.

Overall, Snowman is an interesting and effective tool. It appears to apply its reconstruction rules mechanically, which leads to similar patterns throughout the entire reconstruction. While its verbose reconstruction may make it impractical for an in-depth understanding of unknown executables, it serves its purpose in reconstructing the executable as a high-level decompilation alongside the corresponding disassembly. Key things to note about Snowman’s output include the many seemingly arbitrary variables allocated at the beginning of each function, the many reinterpret_casts used to state the type a variable is inferred to have, the various exception checks that make the code very dense, and the calls to functions whose implementations are hidden in the reconstructed code.

Binary Ninja

Binary Ninja is a reverse engineering disassembler/decompiler developed by Vector 35 Inc. and released in 2016.[5] It performs an in-depth analysis of a binary file and produces low-level code, high-level code, disassembly, and a graphical representation of the flow of function calls. Binary Ninja also supports a multitude of CPU architectures, including x86 32-bit, x86 64-bit, ARMv7, Thumb2, and ARMv8. Binary Ninja can be used in one of two ways: as downloadable software with its own interface, or through an in-browser web interface.
For the purposes of understanding how this software works, the free in-browser version is examined here. The interface offers a great deal of functionality while remaining user-friendly, simple, and easy to understand. Its main parts are a disassembly view and a graphical flow chart view. A task bar allows switching among the low-level, medium-level, and high-level language views within the flow chart, and opcode bytes and memory addresses can be shown if necessary.

Binary Ninja is a very effective reverse engineering tool: it not only provides an accurate decompilation and disassembly of the executable, but also makes it easy to find the necessary functions and understand their implementation. The flow-graph design makes the smaller chunks of decompiled code easy to comprehend. Where it is lacking, however, is the ability to zoom. When tracing through some of the larger chunks of code, it is hard to keep everything on one screen at a time. A way to scale the flow graph to a smaller size would make Binary Ninja all the better.

OllyDbg

OllyDbg is an x86 debugger whose most recent version was released in September of 2013. OllyDbg supports analysis of binary code and has various other functions related to reverse engineering. This debugger is able to trace registers, API calls, and various types of variables as needed when stepping through a piece of software being reversed. It is also able to recognize and partition procedures and to locate routines from various libraries. It targets 32-bit x86 code; x64 is not yet supported. This application is useful to programmers for malware analysis purposes. OllyDbg can be downloaded for free as a stand-alone application, although it is shareware.

While OllyDbg is not the easiest way to understand a full-length executable, it still plays an important role in the process of reverse engineering. The debugger helps trace through function calls and jumps that cannot be easily understood through a static view of the disassembly. It also shows opcodes, relative memory positions, and other details in the registers, the stack, and the hex dump/ASCII views that make problems found during the reverse engineering process much simpler to decipher. However, one key challenge that OllyDbg presents is actually finding the relevant disassembly. Unlike other disassemblers, it has no search function, so the code that actually needs to be examined can often be difficult to find.

IDA

IDA, or Interactive Disassembler, is a disassembler that generates assembly language code from a machine executable.[6] IDA is a very diverse platform that supports a wide range of CPU architectures and operating systems, in contrast to many other reverse engineering tools that cater only to Linux and Windows. IDA was initially written by Ilfak Guilfanov in the early 1990s and has since expanded to combine a disassembler with a decompiler, hex editor, and debugger. Before the release of Ghidra, IDA was known as the best all-in-one tool.
However, although it seems superior to the other applications mentioned, it comes with some serious drawbacks: a complex, non-user-friendly interface and a very high price.

Other facets of IDA include the debugger and the hex editor. Both are present in the IDA application and can be used to trace through the assembly-language (AL) code. The combination of all these features makes IDA an important tool for reverse engineering an executable. In addition, with the capability to add extensions that make decompilation possible, as well as a variety of other modifications, IDA as a stand-alone application could reverse engineer most executables with enough work. The main downside of IDA is, of course, its complexity. With more functionality comes more confusion, so learning to use this tool properly and accurately takes a while.

Ghidra

Ghidra is an open-source reverse engineering tool that was developed and released by the National Security Agency (NSA).[7] It was first publicly shown at a cryptosystems conference in March of 2019 and was later released on GitHub in April of 2019. Ghidra is built as a decompiler; like IDA, it can be extended with separate functionality through plugins, including disassemblers, hex editors, and debuggers. Ghidra supports various architectures, including x86 16-bit, x86 32-bit, x86 64-bit, ARM, DEX, and many others.


Before the release of Ghidra in 2019, IDA was the long-reigning king of reverse engineering tools. Because Ghidra is free and open source, however, it rose to the forefront of reverse engineering. Unlike IDA, which is a disassembler and debugger, Ghidra is a decompiler. Depending on the reverse engineering need, either application might be better suited to the task at hand. Ghidra’s main strength is its ability to reverse engineer executables very accurately while also retrieving very specific pieces of information from binary files that are usually masked by other decompilers.

Ghidra’s functionality does not stop at the decompiled code. The list of tools available in Ghidra is extensive, and they serve different purposes that are beyond the needs of this paper. However, some aspects of Ghidra help illuminate why it is at the forefront of reverse engineering technology, including the graphical flowchart, the function call graph, and the wide range of instruction information. Ghidra combines almost all of the capabilities of the other reverse engineering tools into one. With the ability to install additional packages, it is very likely the only tool most reverse engineering projects will need. However, although Ghidra provides an extensive amount of information, its complexity can cause some confusion, similar to IDA.

One downside of Ghidra is that, unlike Binary Ninja, it offers no flowchart for the decompilation. This makes no difference for a simple factorial executable, but at a larger scale function calls could be somewhat confusing to trace through. The function call graph partly makes up for this, and a future update may add the feature.

Figure 10

The function in Figure 10 is a simple function that does nothing and then calls another function in the rabbit-hole. When the program executable is opened in Ghidra for reverse engineering, Ghidra reconstructs this function nearly perfectly in C:


Figure 11

It does not take long for a reverse engineer to read this decompiled C code and determine that this is a useless function, allowing them to move on in analyzing the executable’s code. IDA Freeware does provide similar output; however, it can be argued that IDA requires a greater level of user expertise to manage effectively, whereas it does not take much effort for the engineer to discover and use Ghidra’s most powerful features. This is an example of how the obfuscation techniques used each semester must become more complex to match the capability of available tools. In the Fall 2020 group project, stack manipulation was put to greater use to better obfuscate functions in the program executable. An example is below:

Figure 12

Here, we can see the use of the asm statement to insert assembly-language instructions within the C code of this function. This complicates the function for a reverse engineer viewing the corresponding code in Ghidra or other tools. In fact, this is what the “badfunc” function looks like in Ghidra when decompiled:


Figure 13

Ghidra struggles to decompile functions in the executable that have stack manipulation and stack obfuscation. In this case, the reverse engineer can no longer rely on reading the C code of the executable but must instead pore over the disassembled assembly instructions, a much longer and more laborious process. This is just one example of how obfuscation techniques must be adapted to the available reverse engineering tools to create a more thought-provoking group project experience for novice reverse engineering students in CSI 2334.

Summary of SRE Tools

Overall, there are clear differences and similarities among the reverse engineering tools examined; however, these are not the only tools available. Many other reverse engineering tools do similar things, each with its own functionality and implementation. Some of the tools that were tested perform very similarly to one of the tools discussed above and have been excluded from discussion to avoid redundancy. These tools include WinDbg, CFF Explorer, HIEW, ApkTool, and Hopper. The tools selected for a thorough examination of their implementation and functionality are believed to be the very best tools currently available. These tools all have different purposes, strengths, and weaknesses; Table 1 summarizes the findings:

| Tool | Type | Cost | OS | Plugin Support | Intuitive | Strengths | Weaknesses |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Snowman (https://derevenets.com/) | Decompiler, Disassembler | Freeware | Windows, Linux | No | Yes | Easy to install; free; intuitive; can be used as an IDA plugin | Not the most elegant decompilation; masks function naming |
| Binary Ninja (https://binary.ninja/) | Decompiler, Disassembler | Free version available; Personal: $299; Commercial: $1199 | Windows, Linux, Mac OS, FreeBSD, on-website | Somewhat | Yes | Easy to install; flowchart design; wide variety of OS; intuitive; low-, medium-, and high-level code available; free version; elegant design | Price; masks function naming; no zoom option, making full-view tracing confusing |
| OllyDbg (http://www.ollydbg.de/) | Disassembler, Debugger | Freeware | Windows | Somewhat | Somewhat | Slightly confusing to install; free; intuitive; plugin support; a lot of different information provided for debugging; ASCII comments to provide understanding of memory | Not a lot of updates to the application; no x64 support; AL code segmentation/grouping not super obvious, although there is some |
| IDA (https://www.hex-rays.com/products/ida/) | Disassembler, Debugger, Hex Editor | Free version available; Personal: $589; Commercial: $1129 | Windows, Linux, Mac OS | Yes | No | Easy to install, but confusing to set up; function naming and parameter data types provided; supports most architectures; flowchart design; often updated; wide variety of OS; wide range of functionality; super complex plugin support | Price; very confusing and non-intuitive; steep learning curve |
| Ghidra (https://ghidra-sre.org/) | Decompiler | Freeware | Windows, Linux, Mac OS | Yes | Somewhat | Easy to install; free; function naming and parameter data types provided; wide variety of OS; searching functionality; flowchart/zoom functionality; thorough instruction info; function flow chart | No decompiler flowchart; slight learning curve |

Table 1: Comparison of Features of SRE Tools.

Response to Evolution of Reverse Engineering Tools

One of the challenges faced when developing the project each semester is the proverbial “arms race” between obfuscation techniques and available reverse engineering tools. Each year, reverse engineering tools become more sophisticated and capable; obfuscation techniques must therefore become more complex to match.

One notable example of this challenge is the release of Ghidra. Ghidra is a sophisticated, open-source reverse engineering tool made public by the United States NSA in March of 2019.[8] Since its release, it has been actively maintained, receiving feature requests and new updates every few days. It has become the most popular tool among student groups in CSI 2334. What Ghidra offers over the existing competition is ease of use and details that other free software, such as IDA Freeware, does not provide. Ghidra allows for easy disassembly and decompilation of an executable, so that one can read the assembly code and the decompiled C/C++ code side by side and make inline edits. Ghidra also lets users easily document and navigate through the functions of the decompiled program, which makes function re-naming extremely important.


Many student groups have listed Ghidra as their only tool used in the project. Students are encouraged to explore the available tools and to use a range of them, in order to develop their ability to synthesize available resources. Because of Ghidra’s popularity with students, it has become necessary to add greater stack obfuscation to the project programs each semester. This technique exploits a weakness in Ghidra. For example, below is one useless rabbit-hole function used in the Spring 2020 group project.

Results and Modifications Needed for the Future

When looking at the results of previous semesters, it was clear that more rigorous obfuscation was required for the Fall 2020 semester project. Most student groups were making use of Ghidra and were able to find and isolate malicious elements such as fork bombs without much trouble. Ghidra’s main strength is the ease with which reverse engineers can find and view a function’s assembly code alongside its decompiled C code. To thwart this, a small weakness was exploited: Ghidra’s inability to decompile segments of code when an abundance of assembly-language stack manipulation is mixed with the C source code. This weakness was mentioned and detailed in the previous section.

Despite the stack manipulation in the Fall 2020 project, most students were still able to find and isolate the malicious code in the program. Specifically, 18 of the 24 groups were able to find and isolate all malicious code in the program, and 10 of the 24 groups were even able to add substantial features such as new calculator functions.

To create a larger challenge for students in the future, one modification needed if this project is to be reused is to increase the amount of mixed-in stack obfuscation and manipulation. In the final executable for Fall 2020, only 3 locations in the source code had stack manipulation. This means that the majority of the code was still decompilable within Ghidra.
A strategic mixing of necessary code and stack manipulation across a wider scope of the source code would help to increase the challenge for student reverse engineers.

Spring 2021

For the Spring 2021 group project, the plan is to create a small ASCII-based dungeon crawler in C. This game will be more complex than the calculator program used for the Fall 2020 project, which will allow more subtle malicious elements to be added. Rather than outright crashing and loudly spawning new processes, small malicious elements will be mixed within the game code. Examples of a small malicious element might be a tiny bomb that opens a couple of processes, or some file-spawning code that silently creates 5 files in a directory on the user’s computer. This will drive students to take a deeper dive into the disassembled code and force them to document their efforts in greater detail.

Alongside these more subtle malicious elements, a better mix of stack manipulation will be implanted into the program code. This stack manipulation will be mixed into more areas of the code than in the Fall 2020 project in order to thwart Ghidra at more steps. Students will have to rely on their assembly-language expertise and their documentation to understand and modify the executable. A greater amount of mixed stack manipulation will force students to take more care when editing the source code, as they will have to distinguish between stack manipulation instructions and necessary code.

Summary and Conclusions

Overall, it was clear that this project provided a relevant outlet for students to strengthen their research and collaboration skills, as well as their understanding of the relationship between hardware and software. Both individual and group research and communication skills improved steadily over the semester. Given that most students entered the CompSys course with no knowledge of reverse software engineering, the project successfully simulated a short professional software engineering project that required streamlined learning and strong communication, based on the students’ experience in the curriculum.

These conclusions apply to the nearly 650 Baylor Computer Science students who have taken this course through Fall 2020. The modifications to how the project was designed and obfuscated, made in response to the Ghidra tool, were seen in the ~60 students who took the course in the Fall 2020 semester, as well as those registered for the Spring 2021 semester. Our conclusions from semester to semester have been much the same, with the students indicating (through their pre-test, formative assessment, and summative assessment) a much wider and deeper appreciation for the importance of understanding the difference between what is written and designed in a high-level language versus what the machine actually executes.


Bibliography

1 I. Klimek, M. Keltika, F. Jakab, “Reverse engineering as an education tool in computer science,” ICETA 2011, 9th IEEE International Conference on Emerging eLearning Technologies and Applications, October 27-28, 2011, Stará Lesná, The High Tatras, Slovakia.

2 J. Aycock, A. Goeneveldt, H. Kroepfl, T. Copplestone, “Exercises for teaching reverse engineering,” ITiCSE 2018: Proceedings of the 23rd Annual ACM Conference on Innovation and Technology in Computer Science Education, July 2018, pages 188–193. https://doi.org/10.1145/3197091.3197111

3 T. Cipresso, M. Stamp, “Software Reverse Engineering,” in P. Stavroulakis, M. Stamp (eds), Handbook of Information and Communication Security. Springer, Berlin, Heidelberg, 2010. https://doi.org/10.1007/978-3-642-04117-4_31

4 Snowman decompiler download site: https://derevenets.com/. Last accessed on March 8, 2021.

5 Binary Ninja Reversing Platform site: https://binary.ninja/. Last accessed on March 3, 2021.

6 IDA Binary Code Analysis Solutions by Hex-Rays home: https://www.hex-rays.com/products/idahome/. Last accessed on March 7, 2021. Free version available for download at https://www.hex-rays.com/products/ida/support/download_freeware/

7 Ghidra software reverse engineering (SRE) suite of tools: https://ghidra-sre.org/. Last accessed on March 8, 2021.

8 National Security Agency: https://www.nsa.gov/resources/everyone/ghidra/. Last accessed on December 29, 2020.