Analysis and transformation tools


Some tools based on the ROSE library.

Source transformation tools

Name Location


identityTranslator ROSE

Simplest possible source-to-source translator built using ROSE.

dotGenerator and dotGeneratorWholeASTGraph ROSE

Generate dot graph dump of AST

pdfGenerator ROSE

Generate pdf dump of AST

autoPar ROSE

Automatic Parallelization using OpenMP

Declaration move tool ROSE

Re-scoping variable declarations

Binary analysis tools

Binary analysis tools all show a Unix-style man page when invoked with "--help". This list gives the location of the tool and a brief description of what it does. The locations are:

Name Location


astDiffBinary ROSE

Computes various edit distance metrics to measure the distance between two binary functions.

bat-ana Megachiropteran

This tool runs the initial steps needed by almost all other analysis tools: parsing ELF and PE containers, initializing simulated virtual memory; deciding which addresses are instructions and decoding them; partitioning decoded addresses into basic blocks and functions; generating the global control flow graph (CFG) and address usage map (AUM); optionally running post-partitioning analyses such as may-return, stack-delta, calling-convention, etc. The results are saved in a binary analysis state file (*.rba) that can be read by other tools.

bat-cc Megachiropteran

This tool runs the ROSE calling-convention analysis and either reports the results or inserts them into the binary analysis state file.

bat-cfg Megachiropteran

Prints various kinds of control flow graphs, such as the global CFG, function CFGs, or region CFGs. It can show the CFG in human readable format or as GraphViz output and has numerous switches for controlling the style.

bat-cg Megachiropteran

Prints information about the function call graph. It can show human-readable information about individual functions (the function's callers and callees), or it can produce GraphViz output of the entire call graph.

bat-dis Megachiropteran

Disassembly lister. This tool reads a binary analysis state file and produces assembly listings. It has numerous switches to control the format of the output. Note that ROSE listings are intended for human consumption and it generally doesn't work to feed them into an assembler to produce a new binary.

bat-lsb Megachiropteran

Reads a binary analysis state file and lists information about each basic block, such as the number of instructions and the virtual address segments. Note that ROSE's definition of basic block is different than some other tools in that ROSE does not require the instructions to be adjacent to each other in memory; a basic block can have internal unconditional branches as long as no interior instruction is a successor of some other basic block. The output is intended to be in a format that's easily used by other tools.

bat-lsf Megachiropteran

Reads a binary analysis state file and lists information about each function. The output is intended to be in a format that's easily used by other tools.

bat-mem Megachiropteran

Reads a binary analysis state file and shows information about the simulated virtual memory.

bat-stack-deltas Megachiropteran

Reads a binary analysis state file, runs the stack delta analysis for functions where it hasn't already been run (in parallel), and reports the results in a format that's easy for other tools to read. The output consists of memory address intervals for instruction sequences and how their incoming and outgoing stack pointers relate to the initial stack pointer at the start of the function. This is useful for unwinding call frames when there's no frame pointer register.

BinaryCloneDetection ROSE

This is a suite of tools to incrementally build a database and query it for results in order to find similar functions across executables based on their symbolic behavior.

binaryToSource ROSE

Generates a low-level C source code for any architecture for which ROSE has instruction semantics.

bROwSE-server ROSE

This was our first foray into embedding the ROSE library in a web server. The server analyzes the binary and the user connects to it with a web browser to see results. The server supports analyzing any binary format and architecture supported by ROSE, interactive adjustments to the virtual memory map, disassembling and partitioning, program indexed by functions showing various function properties that can be sorted, graph-based disassembly listings using the CFG, traditional linear assembly listings, cross references for constants, decoded strings, hexdumps of virtual memory, symbolic data-flow results, lists of magic numbers, etc.

checkExecutionAddresses ROSE

Compares a dynamically-generated execution trace of a Linux program with the statically-generated global control flow graph and reports differences.

debugSemantics ROSE

Instruction semantics trace tool. This tool runs instruction semantics in various domains and reports all operations and states per basic block. It's main purpose is to have an easy and extensible way for users to check whether semantics are operating as they expect, and to be able to report bugs that the ROSE developers can reproduce.

detectConstants ROSE

Demonstrates a few ways that constants can be found in binaries, such as by traversal of the AST or by examining machine states after a data-flow analysis.

dumpMemory ROSE

A tool to print information about virtual memory and to extract data in a few different formats. As with most binary analysis tools, this one can examine data in files, or simulated virtual memory initialized from ELF and PE files, raw memory dumps, or stopped or running Linux processes.

dwarfLines ROSE

This tool reads DWARF debugging information from an executable compiled with "-g" and emits two mappings: one that shows which source code files and lines correspond to each instruction, and vice versa.

findDeadCode ROSE

This is a simple demonstration of how the global control flow graph can be used to propagate reachability information. It uses a custom implementation to propagate reachability, which is a good example of how to traverse CFGs, but it's been superseded by a better analysis built into the ROSE library.

findPath ROSE

Given end-points of an execution path and addresses and/or branches to be avoided, this tool determines whether any execution path exists and the input conditions necessary to drive that path. Most of this tool is superseded by a path feasibility analysis that's part of the ROSE library.

findSimilarFunctions ROSE

Given two closely related executables, this tool finds the best mapping of functions from one executable to the other. This is for the common case when function's don't have names. It correlates functions by computing a difference metric (several are implemented) to create a bipartite graph, then solving the minimum weight perfect matching problem with the Kuhn-Munkres algorithm. This tool is superseded by a matching analysis that's part of the ROSE library.

generatePaths ROSE

This tool generates source code of arbitrary size with various kinds of control flow paths in order to test algorithms that operate on control flow graphs.

linearDisassemble ROSE

This is a simple linear sweep disassembler: it starts at some specified address(es) and decodes instructions one after another with no regard for control flow.

magicScanner ROSE

Scans binaries for magic numbers and reports them like the Unix "file" command. The differences between this tool and "file" is that this tool scans every address instead of just the beginning of the file, and it scans the simulated virtual memory rather than the file (i.e., addresses are virtual memory addresses rather than file offsets). Like most ROSE-based binary analysis tools, the virtual memory can be constructed from raw files, ELF or PE containers, Motorola S-Records, or running or stopped Linux processes.

maxBijection ROSE

This is a low-level tool that computes a minimum-cost 1:1 mapping between two sets of integers. It's used as part of a work flow to find where code should be mapped in memory if it is not position independent and no starting address is known.

nativeExecutionTrace ROSE

Generates a simple execution trace by single-stepping a Linux executable within the ROSE debugger.

recursiveDisassemble ROSE

The full-fledged ROSE disassembler with lots of command-line switches to control specimen loading into simulated virtual memory, the disassembly and partitioning process, analyses run by the partitioner, output of virtual memory information, statistics about the CFG and instruction cache, a detailed list of all CFG information, output of various kinds of control flow graphs and function call graphs, detailed information about what's at each virtual address, list of addresses that were not used during parsing, an index of functions, a list of instruction addresses for input to other tools, lists of string literals in various encodings, all details about PE or ELF containers, and assembly listings.

rxml ROSE

This tool generates an XML representation of various ROSE data structures, such as the entire Partitioner binary analysis state, or the binary components of an AST. Various free tools are able to convert XML to JSON.

simulator2 ROSE

This is a suite of tools to "execute" a program by using concrete instruction semantics and a Linux system call translation layer that can be modified by the user. It can handle amd64, m68k, or x86 instruction sets (those instructions for which ROSE knows the semantics) and Linux x86 and amd64 system calls or raw hardware; it can initialize its simulation memory from some combination of files, ELF or PE containers, Motorola S-Records, and running or stopped Linux executables; it can trace system calls similar to the "strace" command, instructions, and memory access; it partially supports multi-threaded applications; user can adjust instruction, memory, and syscall behavior through callbacks; it's able to selectively disable certain system calls; it can trace file and socket I/O; it has an interactive debugger for examining the specimen being simulated.

stringDecoder ROSE

Finds strings of various formats using the string analysis in the ROSE library. This analysis is able to search for ASCII, Unicode, etc. with variable width character encoding, termination or run-length encoding, etc. It can search for multiple encodings simultaneously and reports the string literal and encoding information. Another way it differs from the Unix "strings" command is that it searches the simulated virtual address space rather than a file. Like most ROSE binary analysis tools, the virtual memory can be constructed in a variety of ways.

symbolicSimplifier ROSE

This implements a read-eval-print loop (REPL) for symbolic expressions, where the "eval" step sends the expression through ROSE's built-in simplification layer. This layer is not a full simplifier like what might be found in SMT solvers, but rather tuned for those situations that commonly occur when emulating instructions symbolically. It's purpose is to give the user a way to interactively discover how the simplifier works, and to report bugs that the ROSE team can reproduce.

trace ROSE

Compares a dynamically-generated execution trace of a Linux program with the statically-generated global control flow graph and reports differences.

x86-call-targets ROSE

A low-level function that disassembles all bytes and reports the target addresses for all x86 CALL instructions it finds. This is part of a work flow to find the best address at which to map non-PIC code.

x86-function-vas ROSE

A low-level tool that reports addresses of functions. This is part of a work flow to find the best address at which to map non-PIC code.

xml2json ROSE

Translates the XML produced by boost::serialization to JSON and is able to handle XML inputs that are many gigabytes.

This page is generated from $ROSE/docs/Rose/Tools/roseBasedTools.dox

Collaboration diagram for Analysis and transformation tools:


 Declaration move tool
 dotGenerator and dotGeneratorWholeASTGraph
 Identity Translator
 PDF Generator