Rose::BinaryAnalysis::PointerDetection Namespace Reference


Pointer detection analysis.

This analysis attempts to discover which memory addresses store pointer variables and whether those pointer variables point to code or data. The goal is to detect the storage location of things like "arg1", "arg2", and "var2" in the following C code after it is compiled into a binary:

#include <stdbool.h>

int f1(bool (*arg1)(), int *arg2) {
    int *var2 = arg2;
    return arg1() ? 1 : *var2;
}

Depending on how the binary is compiled (e.g., which compiler optimizations were applied), it may or may not be possible to detect all the pointer variables. On the other hand, the compiler may generate temporary pointers that don't exist in the source code. Since binary files have no explicit type information (except perhaps in debug tables, upon which we don't want to depend), we have to discover that something is a pointer by how it's used. The property that distinguishes data pointers from non-pointers is that they're used as addresses when reading from or writing to memory.


The algorithm works by performing a data-flow analysis in the symbolic domain with each CFG vertex also keeping track of which memory locations are read. When the data-flow step completes, the algorithm scans all memory locations (across all CFG vertices) to get a list of addresses. Each address expression includes a list of all instructions that were used to define the address. For instance, given this simpler code:

; int deref(int *ptr, int index) { return ptr[index]; }
L0: push ebp
L1: mov ebp, esp
L3: mov eax, [ebp+8]
L6: mov ecx, [ebp+12]
L9: mov eax, [eax + ecx*4]
Lc: leave
Ld: ret

L9 reads from memory address eax + ecx*4. That address is computed by L9 itself from eax and ecx, whose values were loaded from memory by the previous instructions L3 and L6.

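Other addresses read by this function, in addition to the one read by L9, are esp_0 + 4 (read by L3 as [ebp+8]), esp_0 + 8 (read by L6 as [ebp+12]), esp_0 - 4 (read by Lc when leave pops the saved EBP), and esp_0 (read by Ld, which pops the return address), where esp_0 is the value of the stack pointer on entry to the function.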

A second step (which does not require a second data-flow pass, but uses information gathered by the first one) looks at addresses that were read by the instructions that defined an address. For instance, L3, L6, and L9 are the instructions that defined the address used by L9, and all three of them read memory: L3 reads [ebp+8], L6 reads [ebp+12], and L9 reads [eax + ecx*4], the very address being processed.

Since L9 reads from the same address whose definers we are processing, we discard the information from L9, keeping only the two reads from L3 and L6. Both of these reads have the same width as the stack pointer, so both are kept (this width check is an optional setting of the analysis), and the analysis deems them "addresses of data pointers". Incidentally, the width of the stack pointer is used as the width of data pointers, and the width of the instruction pointer is used as the width of code pointers. The result is that eight bytes on the stack (two 4-byte locations) are deemed addresses of data pointers. They are:

(add[32] esp_0[32] 0x00000004[32])
(add[32] esp_0[32] 0x00000005[32])
(add[32] esp_0[32] 0x00000006[32])
(add[32] esp_0[32] 0x00000007[32])
(add[32] esp_0[32] 0x00000008[32])
(add[32] esp_0[32] 0x00000009[32])
(add[32] esp_0[32] 0x0000000a[32])
(add[32] esp_0[32] 0x0000000b[32])

An astute observer will notice that the algorithm detects both "ptr" and "index" as pointers. Although "index" is not a pointer in the C language, both are pointers by an assembly-language definition: each is used as an index into the global memory address space.

The analysis also detects other pointers that are not evident from the C source code: the location just below the original top of stack, where EBP is saved, holds a pointer, and the return address stored at the top of the stack is a pointer.


Like most binary analysis functionality, binary pointer detection is encapsulated in its own namespace. The main class, Analysis, performs most of the work. A user instantiates an analysis object, supplying its configuration at construction time, then invokes one of its analysis methods, such as Analysis::analyzeFunction, one or more times and queries the results after each analysis. The results are returned as symbolic address expressions relative to some initial state.

The "testPointerDetection.C" tester has an example use case.


Classes

class  Analysis
 Pointer analysis.
class  PointerDescriptor
 Description of one pointer.
class  Settings
 Settings to control the pointer analysis.

Typedefs

using PointerDescriptors = std::list< PointerDescriptor >
 Set of pointers.

Functions

void initDiagnostics ()
 Initialize diagnostics.

Variables

Sawyer::Message::Facility mlog
 Facility for diagnostic output.

Typedef Documentation

using Rose::BinaryAnalysis::PointerDetection::PointerDescriptors = std::list<PointerDescriptor>

Set of pointers.

Definition at line 217 of file PointerDetection.h.

Function Documentation

void Rose::BinaryAnalysis::PointerDetection::initDiagnostics ( )

Initialize diagnostics.

This is normally called as part of ROSE's diagnostics initialization, but it is safe to call it more than once.

Variable Documentation

Sawyer::Message::Facility Rose::BinaryAnalysis::PointerDetection::mlog

Facility for diagnostic output.

The facility can be controlled directly or via ROSE's command-line.