Reverse Engineering WebAssembly: Learn from Real-World Examples

boecatfile1986
Aug 15, 2023

In practice, people don't reverse engineer stuff by going over every single instruction manually. Instead, they use tools like IDA Pro that perform most of the tedious, repetitive work automatically, such as detecting loops and if/else branches, or identifying local variables.
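Here is a minimal Python sketch of one kind of work these tools automate: loop detection on a control-flow graph (CFG). It computes dominators iteratively. Every edge whose target dominates its source is treated as a back edge, and each back edge closes a natural loop. The dict-based CFG format and the function names are assumptions for illustration. Real tools first have to build the CFG from disassembled instructions, and that step is omitted here.

```python
"""Minimal sketch: find natural loops in a control-flow graph (CFG).

Assumption (illustrative): the CFG is a dict mapping each basic-block name
to a list of successor names, with a single entry block.
"""


def dominators(cfg, entry):
    """Return (dom, preds): dom[b] is the set of blocks that dominate b."""
    blocks = set(cfg)
    for succs in cfg.values():
        blocks.update(succs)
    preds = {b: set() for b in blocks}
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].add(src)

    # Classic iterative data-flow algorithm: start from "everything dominates
    # everything" and shrink the sets until nothing changes.
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            if preds[b]:
                new = set.intersection(*(dom[p] for p in preds[b])) | {b}
            else:
                new = {b}  # block with no predecessors: dominated only by itself
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom, preds


def natural_loops(cfg, entry):
    """Return {header: set_of_blocks} for every loop closed by a back edge."""
    dom, preds = dominators(cfg, entry)
    loops = {}
    for tail, succs in cfg.items():
        for header in succs:
            if header not in dom[tail]:
                continue  # header does not dominate tail: not a back edge
            # Collect every block that reaches the tail without passing the header.
            body = {header, tail}
            stack = [tail] if tail != header else []
            while stack:
                node = stack.pop()
                for p in preds[node]:
                    if p not in body:
                        body.add(p)
                        stack.append(p)
            loops.setdefault(header, set()).update(body)
    return loops


if __name__ == "__main__":
    # A while loop: entry -> cond; cond -> body | exit; body -> cond (back edge).
    cfg = {"entry": ["cond"], "cond": ["body", "exit"], "body": ["cond"], "exit": []}
    print(natural_loops(cfg, "entry"))
```

The dominator-based definition is used instead of plain DFS back edges because it still identifies loops correctly when the traversal order changes.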

I don't think there is a flag that prevents reverse engineering. Stripping debug symbols obviously makes debugging more difficult, but you already mentioned that one. There are obfuscators that add unnecessary jumps to the control flow, but you can't realistically stop a skilled and determined reverse engineer. In particular, if you are thinking about embedding a private key in your binary, simply don't do that.
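Even the simplest tool will surface a secret stored in plaintext. A `strings`-style scan for runs of printable bytes is enough. The following is a minimal Python sketch of such a scan. The function name, the 6-byte minimum length, and the sample blob are illustrative assumptions and do not come from any particular tool.

```python
"""Minimal sketch of a `strings`-style scan over a binary file."""
import re
import sys


def printable_strings(data, min_len=6):
    """Yield (offset, text) for every run of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            data = f.read()
    else:
        # Hypothetical stand-in for a compiled binary with an embedded key.
        data = b"\x7fELF\x02\x01\x00\x00PRIVATE_KEY=hypothetical-secret\x00\x90\x90"
    for offset, text in printable_strings(data):
        print(f"{offset:#010x}  {text}")
```

Encrypting or splitting the key only moves the problem, because the code that reassembles it must also ship in the binary.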


I think WebAssembly is significantly easier to reverse engineer than obfuscated x86 assembly. Wasm enforces control-flow integrity and makes it impossible to manipulate locals outside the frame they belong to. x86, by contrast, lets you clobber return addresses and any locals on the stack, and even jump into the middle of an instruction. That makes it significantly harder for tools to decompile x86 in a way that is guaranteed to preserve semantics in the face of heavy obfuscation.

Wasm code also can't detect debuggers. x86 code can detect them unless the debugger deliberately prevents this. For example, segment registers are reset whenever an interrupt, such as a debug interrupt, occurs. Code can also read its own instructions to check that none has been replaced with a software breakpoint. Various OSes also provide APIs that detect debuggers. Wasm has none of this. If the x86 assembly is not obfuscated with any tricks, though, there isn't much difference from Wasm when using tools like a decompiler.
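The control-flow-integrity point is easy to illustrate. In Wasm, a branch never names an arbitrary address. `br N` and `br_if N` name the N-th enclosing `block`, `loop`, or `if`, where the innermost is 0. Branching to a `block` or `if` continues after its `end`, and branching to a `loop` goes back to its start. Every branch target can therefore be resolved statically with a simple label stack. The minimal Python sketch below shows this under some simplifying assumptions: it works on a hypothetical `(opcode, immediate)` list instead of decoding the binary format, and it ignores `br_table` and `else` for brevity.

```python
"""Minimal sketch: resolve WebAssembly branch targets from structured control flow.

Assumption: instructions are a simplified list of (opcode, immediate) tuples
modeled on the Wasm text format, for one function body without its final
`end`. A real tool would decode them from the binary code section.
"""

OPENERS = ("block", "loop", "if")


def match_ends(instrs):
    """Map the index of each block/loop/if to the index of its matching end."""
    end_of, stack = {}, []
    for i, (op, _) in enumerate(instrs):
        if op in OPENERS:
            stack.append(i)
        elif op == "end":
            if not stack:
                raise ValueError(f"unmatched end at {i}")
            end_of[stack.pop()] = i
    if stack:
        raise ValueError("unterminated block")
    return end_of


def resolve_branches(instrs):
    """Map each br/br_if index to the instruction index where control continues."""
    end_of = match_ends(instrs)
    labels = []  # indices of enclosing constructs, innermost last
    targets = {}
    for i, (op, imm) in enumerate(instrs):
        if op in OPENERS:
            labels.append(i)
        elif op == "end":
            labels.pop()
        elif op in ("br", "br_if"):
            if imm == len(labels):
                targets[i] = "return"  # the function body's own implicit label
            elif imm > len(labels):
                raise ValueError(f"invalid label depth {imm} at {i}")
            else:
                start = labels[-1 - imm]
                if instrs[start][0] == "loop":
                    targets[i] = start  # branching to a loop restarts it
                else:
                    targets[i] = end_of[start] + 1  # block/if: continue past its end
    return targets


if __name__ == "__main__":
    body = [
        ("block", None),     # 0
        ("loop", None),      # 1
        ("local.get", 0),    # 2
        ("br_if", 1),        # 3: leave the outer block
        ("br", 0),           # 4: next loop iteration
        ("end", None),       # 5: end of loop
        ("end", None),       # 6: end of block
        ("return", None),    # 7
    ]
    print(resolve_branches(body))  # {3: 7, 4: 1}
```

An out-of-range depth is rejected instead of being followed. This mirrors how a validator refuses such a module before it ever runs.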

I am a software engineer at Google, where I work on the V8 JavaScript engine, specifically on WebAssembly. I am broadly interested in programming languages, software engineering, systems, and security. I want to make software development enjoyable, and I would like the resulting programs to be correct, secure, and performant. Before that, I was a PhD student in the amazing Software Lab group at the University of Stuttgart, advised by Prof. Dr. Michael Pradel. I defended my thesis (with distinction) in July 2022. I worked on static and dynamic program analysis (Wasabi, type prediction from binaries), compilers and programming languages (during internships at Oracle Labs and Google), software security (Wobfuscator project, internship at Google, bachelor thesis), fuzzing (internship at Microsoft Research, Fuzzm), and automated testing (finding bugs in debuggers, master thesis). I enjoy both research and practice: finding out new things and making sure they are useful to others.

During my PhD, I applied several of the above topics to WebAssembly. I believe WebAssembly is well-suited for research due to its clean design and little accumulated cruft, while at the same time being tremendously important in practice as a universal bytecode for the web and increasingly beyond. One of my research projects analyzed WebAssembly's binary security, that is, whether and how memory vulnerabilities in source languages such as C can be exploited when compiled to a WebAssembly binary (USENIX Security 2020). During an internship at Google, I also looked into WebAssembly host security, that is, protecting the system from malicious WebAssembly binaries. In that internship, I implemented W^X in V8's WebAssembly compiler. I am the main author of Wasabi, a dynamic analysis framework for WebAssembly (ASPLOS 2019, best paper award), for which I developed my own binary parser and static instrumenter. To aid reverse engineering of WebAssembly binaries, I employed neural networks to recover high-level types from the low-level bytecode of functions. Together with Aaron Hilbig, I also collected WasmBench, a large set of more than 8000 real-world WebAssembly binaries for use in analysis, as test inputs, and as training data for machine-learning-based approaches. In several further projects with collaborators, I also worked on fuzzing and static analysis of WebAssembly binaries.

Dynamically Analyzing WebAssembly with Wasabi. Daniel Lehmann and Michael Pradel. Half-day tutorial session at PLDI 2019 on using our framework, e.g., for extracting a call graph or in reverse engineering. Phoenix, AZ, USA. June 23, 2019. There is an accompanying website with materials (tasks, required setup, solutions) and slides.

I am a complete noob in reverse engineering and have just started to learn it. Now I have this question on my mind: does a reverse engineer use any computer architecture knowledge in his/her work? I mean in any field (software/hardware RE).

It depends on what exactly you mean by computer architecture and on the area you are applying your reverse engineering skills to. Reverse engineering Java code, Android APKs, native binaries of any OS, and (for example) router or ECU firmware are completely different things. In some areas, you will not need the knowledge from a computer architecture course directly.

That said, it depends on your target. If you are reverse engineering a Java application or a web application, you most likely won't be dealing with assembly languages and other low-level concepts, though you may encounter bytecode or WebAssembly. Reverse engineering is more of a mindset than specific knowledge of computer internals.

The most important thing for reverse engineering isn't necessarily understanding how computer architecture works, but rather how computers and computer applications work in general. Take a login prompt as an example: the user credentials have to be validated somehow, either locally on the machine or over a network. You use this kind of knowledge to "dig deeper" and build an understanding of the internals of a program or machine that would otherwise be a black box.

Now, if you are reverse engineering compiled PE or ELF binaries, you'll likely need to understand assembly language at some level. I often need to identify cryptographic routines within compiled binaries. Without understanding assembly, the stack, the heap, system calls, control flow, etc., I would not be able to understand the code I am analyzing. I also would not have much success with whatever tools I'm using (or developing) to aid my analysis.
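One common way to spot cryptographic routines without reading every instruction is to search for constants that the algorithms require. Plugins such as FindCrypt are built on this idea. The Python sketch below is a minimal illustration of that technique and does not replace such tools. Its signature table is a small illustrative subset, with each 32-bit constant searched in both byte orders. It also only finds constants stored as raw bytes, for example in a data table or as a native instruction immediate. In a WebAssembly code section, `i32.const` immediates are LEB128-encoded and would have to be decoded first.

```python
"""Minimal sketch: scan a binary for well-known cryptographic constants."""
import struct
import sys

# Distinctive 32-bit constants from the MD5/SHA-1 and SHA-256 specifications
# (an illustrative subset, not an exhaustive signature database).
WORD_SIGNATURES = {
    0x67452301: "MD5/SHA-1 initial state word A",
    0x6A09E667: "SHA-256 initial hash value H0",
    0xBB67AE85: "SHA-256 initial hash value H1",
    0x428A2F98: "SHA-256 round constant K[0]",
}
# First 16 bytes of the AES S-box (FIPS 197).
AES_SBOX_PREFIX = bytes([0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5,
                         0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76])


def find_all(data, needle):
    """Yield every offset at which needle occurs in data."""
    start = data.find(needle)
    while start != -1:
        yield start
        start = data.find(needle, start + 1)


def find_crypto_constants(data):
    """Return a sorted list of (offset, description) hits."""
    hits = []
    for value, name in WORD_SIGNATURES.items():
        for fmt, order in (("<I", "little-endian"), (">I", "big-endian")):
            for off in find_all(data, struct.pack(fmt, value)):
                hits.append((off, f"{name} ({order})"))
    for off in find_all(data, AES_SBOX_PREFIX):
        hits.append((off, "AES S-box (first 16 bytes)"))
    return sorted(hits)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            data = f.read()
    else:
        # Hypothetical blob: padding followed by SHA-256's H0 stored little-endian.
        data = bytes(3) + struct.pack("<I", 0x6A09E667)
    for off, name in find_crypto_constants(data):
        print(f"{off:#010x}  {name}")
```

A hit only tells you where to start reading. The surrounding code still has to be understood to confirm which algorithm is used and how.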

Generally speaking, reverse engineering is the reverse of engineering. Instead of making a plan and building a product, you start with a product and try to reconstruct the plan that was used for it (or something as close to it as you can get).

The primary benefits of WASM are execution speed and language independence. By targeting WASM, you can create applications in multiple languages; popular choices today are C/C++, C#/Blazor, Go, and Rust. Commercial companies might be interested in delivering their technology via WASM because it acts as a form of obfuscation that makes reverse engineering harder. However, disassembling WASM modules is easy, so this is not a bullet-proof protection method, although it does deter all but the most determined.
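Disassembly is easy partly because of the binary format itself. A module starts with the 4-byte magic `\0asm` and a 4-byte little-endian version. What follows is a flat sequence of sections, each an ID byte followed by its payload size as an unsigned LEB128 integer. The minimal Python sketch below walks that top-level structure. It is an illustration under those assumptions, not a full parser, since it does not decode section contents. Tools such as WABT's `wasm-objdump` and `wasm2wat` go much further.

```python
"""Minimal sketch: list the top-level sections of a WebAssembly module."""
import sys

SECTION_NAMES = {
    0: "custom", 1: "type", 2: "import", 3: "function", 4: "table",
    5: "memory", 6: "global", 7: "export", 8: "start", 9: "element",
    10: "code", 11: "data", 12: "datacount",
}


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 integer at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:
            return result, pos


def list_sections(data):
    """Return (version, [(name, payload_offset, payload_size), ...])."""
    if data[:4] != b"\x00asm":
        raise ValueError("not a WebAssembly module (bad magic)")
    version = int.from_bytes(data[4:8], "little")
    pos, sections = 8, []
    while pos < len(data):
        section_id = data[pos]
        size, pos = read_uleb128(data, pos + 1)
        name = SECTION_NAMES.get(section_id, f"unknown({section_id})")
        if section_id == 0:
            # Custom sections begin with their own name, e.g. "name" for debug names.
            name_len, name_pos = read_uleb128(data, pos)
            name += ":" + data[name_pos:name_pos + name_len].decode("utf-8", "replace")
        sections.append((name, pos, size))
        pos += size
    return version, sections


# A tiny module: one function of type [] -> [] whose body is just `end`.
MINIMAL_MODULE = (
    b"\x00asm\x01\x00\x00\x00"    # magic + version 1
    b"\x01\x04\x01\x60\x00\x00"   # type section: 1 type, func () -> ()
    b"\x03\x02\x01\x00"           # function section: 1 function of type 0
    b"\x0a\x04\x01\x02\x00\x0b"   # code section: 1 body of size 2, no locals, end
)

if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            data = f.read()
    else:
        data = MINIMAL_MODULE
    version, sections = list_sections(data)
    print("version", version)
    for name, offset, size in sections:
        print(f"{name:<16} offset={offset:#x} size={size}")
```

Because every section carries its own size, a reader can skip sections it does not understand. This is also why custom sections, such as debug names, are easy to find or strip.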

At this point, I was more or less stuck. I could see that the WASM would return -2 or -1 in some cases, and I could see the code paths that would do this. But the functions were really difficult to reverse (it took me a few hours to figure out that one of them was strlen), and the CTF ended before I got further. I asked the organizers what the solution was, and they told me about the Progress header.
