ROSE 0.11.145.147
|
Various tools for data-flow analysis.
Binary data-flow analysis differs from source code data-flow analysis in that source code has well-known, named variables and the analysis is often simplified by considering only those variables (e.g., whole arrays, not array element granularity), and by ignoring aliasing issues. But because binaries have no named variables but rather one large array of memory with computed addresses, a binary data-flow analysis must be at address granularity and account for aliasing. In fact, because of the rich set of addressing modes, it is very common that within a single instruction some of the inputs and/or outputs depend on other inputs.
Another difference is in the number of basic operations that need to be supported. Source-level languages typically have at most a few dozen basic operations for which data-flow must be explicitly encoded in the analysis, while binary analysis needs basic data-flow information about hundreds of CPU instructions across the supported architectures. In fact, there are so many unique instructions that the implementation here takes a different approach: it leverages the fact that ROSE has instruction semantics and is able to discern data-flow within an instruction (or even across an entire basic block) by executing the instruction (or block) in a special data-flow semantic domain. The domain is an aggregate domain consisting of a main user-specified domain (often symbolic), a domain to track data-flow, and any number of additional user-specified domains. This approach works to discover multiple data-flow within the executed item (instruction or basic block); most CPU instructions have multiple data flows (e.g., "PUSH" has at least three independent data flows: incrementing of the instruction pointer, moving data between stack and register, incrementing the stack pointer).
Two additional minor things to be aware of in binary data-flow are (1) the difference in size between registers and memory locations, and (2) the use of idioms to perform common tasks. For instance, a read from memory to a 32-bit register is actually four separate 8-bit reads at the data-flow level since memory holds 8-bit values. An example of an idiom is the "XOR EAX, EAX" instruction to set the EAX register to zero (this is important for taint analysis, since it effectively removes any taint from EAX).
The core component of the binary data-flow is an Engine class template. Its purpose is to collect all the types and objects necessary for a data-flow analysis and to iteratively solve a data-flow equation.
Additional components provide functionality that's useful during the core solving process, but not necessary. For instance, one can use abstract locations (AbstractLocation) to represent variables. An abstract location is either a register or a memory address. Registers are represented by a RegisterDescriptor while addresses are represented by a semantic value in some user-specified domain (often symbolic, see Rose::BinaryAnalysis::InstructionSemantics::BaseSemantics::SValue).
See Engine for how data-flow is solved. In short, data-flow consists of values (taken from a lattice) that are stored in variables (abstract locations, memory, registers). A set of variables and their values (a.k.a., state) is stored at each vertex in a control flow graph. The solver iterates over the control flow graph. It invokes a user-defined transfer function at each vertex. The transfer function takes an incoming state and produces an outgoing state. The incoming state for a vertex is calculated by merging the outgoing states of its predecessor vertices. When the iteration reaches a fixed point the analysis is complete.
Definition at line 72 of file DataFlow.h.
#include <Rose/BinaryAnalysis/DataFlow.h>
Classes | |
class | DefaultVertexUnpacker |
Functor to return instructions for a cfg vertex. More... | |
class | Engine |
Data-flow engine. More... | |
class | Exception |
Data-flow exception base class. More... | |
class | NotConverging |
Exceptions when a fixed point is not reached. More... | |
class | PathAlwaysFeasible |
Trivial path feasibility predicate. More... | |
class | SemanticsMerge |
Basic merge operation for instruction semantics. More... | |
Public Types | |
typedef InstructionSemantics::DataFlowSemantics::DataFlowGraph | Graph |
Data-flow graph. | |
typedef AbstractLocation | Variable |
Variable participating in data flow. | |
typedef std::list< Variable > | VariableList |
List of variables. | |
typedef Sawyer::Container::Map< size_t, Graph > | VertexFlowGraphs |
Map from CFG vertex to data-flow graph. | |
Public Member Functions | |
DataFlow (const InstructionSemantics::BaseSemantics::DispatcherPtr &userDispatcher) | |
Constructor. | |
VariableList | getUniqueVariables (const VertexFlowGraphs &) |
Get list of unique variables. | |
Graph | buildGraph (const std::vector< SgAsmInstruction * > &) |
Compute data-flow. | |
template<class CFG , class VertexUnpacker > | |
VertexFlowGraphs | buildGraphPerVertex (const CFG &cfg, size_t startVertex, VertexUnpacker vertexUnpacker) |
Compute data-flow per CFG vertex. | |
template<class CFG > | |
VertexFlowGraphs | buildGraphPerVertex (const CFG &cfg, size_t startVertex) |
Compute data-flow per CFG vertex. | |
Static Public Member Functions | |
static void | initDiagnostics () |
Initialize diagnostics. | |
Static Public Attributes | |
static Sawyer::Message::Facility | mlog |
typedef InstructionSemantics::DataFlowSemantics::DataFlowGraph Rose::BinaryAnalysis::DataFlow::Graph |
Data-flow graph.
Vertices are abstract locations (register descriptors and/or memory addresses) and edges indicate the flow of data from abstract location to another. The edge IDs are in the order they were added; each edge also has a sequence number that is initially equal to the edge ID, although edge IDs can change if edges are erased from the graph).
Definition at line 79 of file DataFlow.h.
Variable participating in data flow.
A variable in binary data-flow analysis is an abstract location referencing either a register or memory cell. The address for memory locations is an arbitrary semantic expression (Rose::BinaryAnalysis::InstructionSemantics::BaseSemantics::SValue).
Definition at line 86 of file DataFlow.h.
typedef std::list<Variable> Rose::BinaryAnalysis::DataFlow::VariableList |
List of variables.
Definition at line 89 of file DataFlow.h.
typedef Sawyer::Container::Map<size_t, Graph> Rose::BinaryAnalysis::DataFlow::VertexFlowGraphs |
Map from CFG vertex to data-flow graph.
This data structure holds a data-flow graph for each control flow graph vertex. The data-flow graph represents data-flow from one abstract location to another. The union of all abstract locations across all control flow vertices is the set of variables present in the specimen being analyzed. The map is keyed by CFG vertex IDs.
Definition at line 96 of file DataFlow.h.
|
inline |
Constructor.
Constructs a new data-flow analysis framework using the specified user-level instruction semantics. The semantics should consist of an instruction dispatcher appropriate for the architecture being analyzed, and an associated semantic domain that will be used to calculate memory addresses. The SymbolicSemantics domain is typically used.
Definition at line 125 of file DataFlow.h.
|
static |
Initialize diagnostics.
This is called by Rose::Diagnostics::initialize.
Graph Rose::BinaryAnalysis::DataFlow::buildGraph | ( | const std::vector< SgAsmInstruction * > & | ) |
Compute data-flow.
Computes and returns a graph describing how data-flow occurs for the specified instructions (which must have linear control flow, e.g., from a single basic block). The vertices of the returned graph contain the abstract locations that are referenced (read or written) and the edges indicate the data-flow. The edge values are integers imparting an ordering to the data-flow.
Referenced by buildGraphPerVertex().
|
inline |
Compute data-flow per CFG vertex.
This method traverses the control flow graph to discover the abstract locations that are referenced by each vertex and the flow edges between those abstract locations.
The algorithm currently implemented here is to use an aggregate semantic domain consisting of a user-supplied semantic domain in conjunction with a data-flow discovery domain, and to process each CFG vertex one time in some arbitrary depth-first order. This isn't a rigorous analysis, but it is usually able to accurately identify local and global variables, although typically not through pointers or array indexing.
Definition at line 174 of file DataFlow.h.
References buildGraph(), Sawyer::Container::Map< K, T, Cmp, Alloc >::insert(), Rose::StringUtility::plural(), and Sawyer::Container::Map< K, T, Cmp, Alloc >::size().
Referenced by buildGraphPerVertex(), and Rose::BinaryAnalysis::TaintedFlow::computeFlowGraphs().
|
inline |
Compute data-flow per CFG vertex.
This method traverses the control flow graph to discover the abstract locations that are referenced by each vertex and the flow edges between those abstract locations.
The algorithm currently implemented here is to use an aggregate semantic domain consisting of a user-supplied semantic domain in conjunction with a data-flow discovery domain, and to process each CFG vertex one time in some arbitrary depth-first order. This isn't a rigorous analysis, but it is usually able to accurately identify local and global variables, although typically not through pointers or array indexing.
Definition at line 204 of file DataFlow.h.
References buildGraphPerVertex().
VariableList Rose::BinaryAnalysis::DataFlow::getUniqueVariables | ( | const VertexFlowGraphs & | ) |
Get list of unique variables.
This returns a list of unique variables by computing the union of all variables across the index. Uniqueness is determined by calling AbstractLocation::mustAlias. The variables are returned in no particular order.
Referenced by Rose::BinaryAnalysis::TaintedFlow::computeFlowGraphs(), and Rose::BinaryAnalysis::TaintedFlow::vertexFlowGraphs().
|
static |
Definition at line 117 of file DataFlow.h.