ROSE 0.11.145.147
Binary instruction semantics.
Entities in this namespace deal with the semantics of machine instructions, and with the process of "executing" a machine instruction in a particular semantic domain. Instruction "execution" is a very broad term and can refer to execution in the traditional sense, where each instruction modifies the machine state (registers and memory) in a particular domain (concrete, interval, sign, symbolic, user-defined). But it can also refer to any kind of analysis that depends on the semantics of individual machine instructions (def-use, tainted-flow, etc.). It can even refer to the transformation of machine instructions in ROSE's internal representation to some other representation (e.g., to ROSE RISC or LLVM assembly), where the other representation is built by "executing" the instruction.
ROSE's binary semantics framework has four major components: the dispatchers, RISC operators, states, and values. Each component has a base class to define the interface and common functionality, and subclasses to provide implementation details. A semantics framework is constructed at runtime by instantiating objects from these subclasses and connecting the objects together to form a lattice.
At the top of the lattice is a dispatcher (base class Dispatcher) that "executes" machine instructions by translating (or lowering) them to sequences of RISC-like operations. The subclasses of Dispatcher implement various instruction set architectures (ISAs).
The dispatcher points to an object that defines the RISC-like operators. This object is instantiated from a subclass of RiscOperators, and defines the few dozen RISC-like operators in terms of modifications to a state, or collection of values. Therefore, the RiscOperators needs to point to the current state. Depending on the subclass, it might also point to a lazily-initialized initial state. It is common for an analysis to swap new states in and out of the RiscOperators while the analysis runs.
The aforementioned states are objects instantiated from subclasses of State, which points to at least two substate objects: a MemoryState that describes the values stored at memory addresses, and a RegisterState that describes the values stored in registers. Depending on the State subclass, a state may also contain additional data. The MemoryState and RegisterState are base classes, and their subclasses provide various mechanisms for storing the memory and registers. For instance, memory might be stored as a chronological list or a map, and registers might be stored as an array or map.
Up to this point, we haven't nailed down the definition of "value". A semantic value is also an abstract concept whose interface is declared in the SValue base class, the subclasses of which define the details. A value could be a vector of bits (concrete); an interval defined by two concrete endpoints; a sign consisting of one of the values positive, negative, zero, top, or bottom; a symbolic expression composed of constants, variables, and operations; or anything else, as long as it implements the API defined in the SValue base class. Many of the objects mentioned above need to be able to create new values, and therefore they point to a prototypical value instance which forms the bottom of the lattice.
Not all combinations of dispatcher, operators, states, and values are possible, although they are intended to be mostly interchangeable. For instance, you could combine an x86 dispatcher with symbolic operators using chronological memory and generic register states and a symbolic value type, but it probably doesn't make sense to have all the same components but replacing the symbolic value type with a concrete value type. To help keep things organized, collections of compatible types are placed in namespaces such as SymbolicSemantics. These collections of compatible semantic types are called semantic domains. Mixing types between semantic domains sometimes works, depending on the domain.
As mentioned, a "semantic domain" is a collection of compatible semantic types contained in a namespace. ROSE provides a number of general-purpose domains but users are also expected to specialize these for specific purposes. Even within ROSE, many of the analyses specialize these general-purpose domains in order to do something more specific without needing to re-implement large portions of the infrastructure.
You can find the full list of general-purpose domains by looking for sub namespaces of this namespace whose names end with the word "Semantics". Some important examples:
Most of the instruction semantics objects are allocated on the heap and are reference counted. This is beneficial to the user because an analysis might create millions of objects and it would otherwise be a burden if the analysis author had to know when it was safe to delete objects. It also allows an analysis to return results to a higher level and not worry about who now owns those objects.
There are two ways to allocate such objects: (1) by naming the derived class from which to instantiate the object, or (2) by starting from an existing instance of the class you wish to instantiate. The former method is used when you're constructing a semantics framework, since that's the moment you know the names of the classes, and the latter method is used when the framework is running and the class names might not be known but an object is already available. The former method uses static member functions, and the latter method uses virtual member functions (the C++ implementation of the OO term "virtual constructor").
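The two allocation styles can be sketched in isolation. The block below is a minimal, self-contained illustration and does not use ROSE: the `toy` namespace, the `ConcreteValue` and `SymbolicValue` classes, the member names `instance`, `create`, and `domainName`, and the helper `makeSibling` are all assumptions chosen for this example, and `std::shared_ptr` stands in for whatever smart pointer the real classes use.

```cpp
#include <cassert>
#include <memory>
#include <string>

namespace toy {  // hypothetical stand-ins, not the ROSE classes

class SValue;
using SValuePtr = std::shared_ptr<SValue>;

// Base class: declares the virtual constructor used when only an object is known.
class SValue {
public:
    virtual ~SValue() = default;
    // Virtual constructor: a new value of the same dynamic type as *this.
    virtual SValuePtr create(unsigned nBits) const = 0;
    virtual std::string domainName() const = 0;
    unsigned nBits() const { return nBits_; }
protected:
    explicit SValue(unsigned nBits) : nBits_(nBits) {}
private:
    unsigned nBits_;
};

class ConcreteValue : public SValue {
public:
    // Static allocating constructor: the caller must name the class.
    static SValuePtr instance(unsigned nBits) {
        return SValuePtr(new ConcreteValue(nBits));
    }
    SValuePtr create(unsigned nBits) const override { return instance(nBits); }
    std::string domainName() const override { return "concrete"; }
protected:
    explicit ConcreteValue(unsigned nBits) : SValue(nBits) {}
};

class SymbolicValue : public SValue {
public:
    static SValuePtr instance(unsigned nBits) {
        return SValuePtr(new SymbolicValue(nBits));
    }
    SValuePtr create(unsigned nBits) const override { return instance(nBits); }
    std::string domainName() const override { return "symbolic"; }
protected:
    explicit SymbolicValue(unsigned nBits) : SValue(nBits) {}
};

// Framework-side code: it sees only the base class, so it must go through the
// virtual constructor of an object it already holds.
inline SValuePtr makeSibling(const SValuePtr &protoval, unsigned nBits) {
    assert(protoval != nullptr);
    return protoval->create(nBits);
}

} // namespace toy
```

Setup code would call `toy::SymbolicValue::instance(32)` once, because it knows the class name; everything downstream can then call `makeSibling` on that object and get values in the same domain without ever naming `SymbolicValue`.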
Additional information can be found under Specialization.
Let's say you have an analysis that needs to process x86 instructions symbolically. The first thing you need to do is instantiate a semantics framework: the lattice of objects that are instantiated from the particular semantics component subclasses. You'll need an InstructionSemantics::DispatcherX86 object to handle the instructions, which invokes the RISC-like operations defined by an instance of SymbolicSemantics::RiscOperators, which can use a chronological memory state and a generic register state, and whose values are symbolic (SymbolicSemantics::SValue). You'll need to tie all these objects together into a lattice with the dispatcher at the top. The constructors for the various components generally take the lower layers of the lattice as arguments, so you'll need to build the lattice from the bottom up: start by constructing a prototypical value (i.e., a value from which new values can be created), then the register and memory states, which are then joined together into a single state object. Then create the RISC-like operations object and give it an initial state, and finally create the dispatcher that points to the RISC-like operations.
Most semantic domains (SymbolicSemantics included) have a simplified RiscOperators constructor that uses default types for some of the lower components, but we'll show the full monty here:
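The bottom-up construction order can be illustrated with a small self-contained sketch. It deliberately does not use the ROSE headers: every type in the `toy` namespace is a simplified stand-in, and the `instance` signatures and the `buildLattice` helper are assumptions made for this example (the real ROSE constructors take additional arguments, such as a register dictionary, that are not modeled here).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

namespace toy {  // simplified stand-ins, not the ROSE classes

// Bottom of the lattice: the prototypical value.
struct SValue {
    std::string domain;
    explicit SValue(std::string d) : domain(std::move(d)) {}
    static std::shared_ptr<SValue> instance(const std::string &d) {
        return std::make_shared<SValue>(d);
    }
};
using SValuePtr = std::shared_ptr<SValue>;

struct RegisterState {
    SValuePtr protoval;  // used to create new register values
    explicit RegisterState(SValuePtr p) : protoval(std::move(p)) {}
    static std::shared_ptr<RegisterState> instance(const SValuePtr &p) {
        assert(p != nullptr);
        return std::make_shared<RegisterState>(p);
    }
};
using RegisterStatePtr = std::shared_ptr<RegisterState>;

struct MemoryState {
    SValuePtr protoval;  // used to create new memory values
    explicit MemoryState(SValuePtr p) : protoval(std::move(p)) {}
    static std::shared_ptr<MemoryState> instance(const SValuePtr &p) {
        assert(p != nullptr);
        return std::make_shared<MemoryState>(p);
    }
};
using MemoryStatePtr = std::shared_ptr<MemoryState>;

// Joins the register and memory substates into one state object.
struct State {
    RegisterStatePtr registers;
    MemoryStatePtr memory;
    State(RegisterStatePtr r, MemoryStatePtr m)
        : registers(std::move(r)), memory(std::move(m)) {}
    static std::shared_ptr<State> instance(const RegisterStatePtr &r,
                                           const MemoryStatePtr &m) {
        return std::make_shared<State>(r, m);
    }
};
using StatePtr = std::shared_ptr<State>;

struct RiscOperators {
    StatePtr state;  // current state; an analysis may swap it while running
    explicit RiscOperators(StatePtr s) : state(std::move(s)) {}
    static std::shared_ptr<RiscOperators> instance(const StatePtr &s) {
        return std::make_shared<RiscOperators>(s);
    }
};
using RiscOperatorsPtr = std::shared_ptr<RiscOperators>;

// Top of the lattice.
struct Dispatcher {
    RiscOperatorsPtr operators;
    explicit Dispatcher(RiscOperatorsPtr ops) : operators(std::move(ops)) {}
    static std::shared_ptr<Dispatcher> instance(const RiscOperatorsPtr &ops) {
        return std::make_shared<Dispatcher>(ops);
    }
};
using DispatcherPtr = std::shared_ptr<Dispatcher>;

// Build from the bottom up: value, substates, state, operators, dispatcher.
inline DispatcherPtr buildLattice() {
    SValuePtr protoval = SValue::instance("symbolic");
    RegisterStatePtr regs = RegisterState::instance(protoval);
    MemoryStatePtr mem = MemoryState::instance(protoval);
    StatePtr state = State::instance(regs, mem);
    RiscOperatorsPtr ops = RiscOperators::instance(state);
    return Dispatcher::instance(ops);
}

} // namespace toy
```

Each layer holds a shared pointer to the layer below it, so the dispatcher returned by `buildLattice` keeps the whole lattice alive, and both substates share the single prototypical value.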
The instruction semantics architecture is designed to allow users to specialize nearly every part of it, which is useful when creating an analysis that needs to override some small parts of the entire semantics framework. Let's say you need to write an analysis that uses a concrete domain (like a simulator) and you want it to report every memory address to which a value is written. Such a domain would be identical in every respect to the ROSE-provided ConcreteSemantics domain, except that the RISC-like operation for writing a value to memory needs to additionally print the address and value.
Since you're essentially creating a new domain derived from ROSE's concrete domain, you should create a namespace for your domain. Let's call it MySemantics. Since the value type, memory state, register state, and combined state are all the same, create typedefs within MySemantics that just alias the types in ROSE's concrete domain. Notice there's no alias for a dispatcher; this is because dispatchers are domain-agnostic: any dispatcher will work with any semantic domain.
The only class you need to change is the RiscOperators class in ROSE's concrete domain. Therefore, within your namespace, define a new class named MySemantics::RiscOperators that inherits from ROSE's ConcreteSemantics::RiscOperators, and override the writeMemory method so it prints the address and value before delegating to the base class. You'll also need to define three classes of constructors detailed in Writing Constructors for Reference-counted Classes.
Finally, your analysis can instantiate a semantics framework using the components from the new MySemantics. The code to do this looks almost identical to the example instantiation we already saw, except the word Symbolic would be changed to MySemantics wherever it appears.
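The shape of such a specialization can be sketched without ROSE. In the block below, `toy::ConcreteSemantics` is a deliberately tiny stand-in for the real concrete domain: its `SValue` is just a 64-bit integer, its `MemoryState` is a map, and the `writeMemory` and `readMemory` signatures are assumptions made for this example rather than the ROSE signatures. Only the pattern is meant to carry over: alias the unchanged types, subclass `RiscOperators`, and override `writeMemory` to print before delegating.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <map>
#include <memory>
#include <utility>

namespace toy {  // simplified stand-ins, not the ROSE classes

namespace ConcreteSemantics {

using SValue = std::uint64_t;  // stand-in for a concrete value

struct MemoryState {
    std::map<std::uint64_t, SValue> cells;
};
using MemoryStatePtr = std::shared_ptr<MemoryState>;

class RiscOperators {
public:
    virtual ~RiscOperators() = default;
    static std::shared_ptr<RiscOperators> instance(const MemoryStatePtr &mem) {
        assert(mem != nullptr);
        return std::shared_ptr<RiscOperators>(new RiscOperators(mem));
    }
    virtual void writeMemory(std::uint64_t addr, SValue value) {
        memory_->cells[addr] = value;
    }
    // Unwritten addresses read as zero in this toy model.
    SValue readMemory(std::uint64_t addr) const {
        auto found = memory_->cells.find(addr);
        return found == memory_->cells.end() ? 0 : found->second;
    }
protected:
    explicit RiscOperators(MemoryStatePtr mem) : memory_(std::move(mem)) {}
private:
    MemoryStatePtr memory_;
};
using RiscOperatorsPtr = std::shared_ptr<RiscOperators>;

} // namespace ConcreteSemantics

namespace MySemantics {

// Unchanged components are plain aliases of the concrete domain's types.
using SValue = ConcreteSemantics::SValue;
using MemoryState = ConcreteSemantics::MemoryState;
using MemoryStatePtr = ConcreteSemantics::MemoryStatePtr;

// The only new class: report each write, then delegate to the base class.
class RiscOperators : public ConcreteSemantics::RiscOperators {
public:
    static ConcreteSemantics::RiscOperatorsPtr instance(const MemoryStatePtr &mem) {
        assert(mem != nullptr);
        return ConcreteSemantics::RiscOperatorsPtr(new RiscOperators(mem));
    }
    void writeMemory(std::uint64_t addr, SValue value) override {
        std::cout << "write mem[0x" << std::hex << addr << "] = 0x" << value
                  << std::dec << "\n";
        ConcreteSemantics::RiscOperators::writeMemory(addr, value);
    }
protected:
    explicit RiscOperators(const MemoryStatePtr &mem)
        : ConcreteSemantics::RiscOperators(mem) {}
};

} // namespace MySemantics
} // namespace toy
```

Because the override is reached through a base-class pointer, code higher in the lattice needs no changes; it simply receives a `MySemantics::RiscOperators` object where it previously received the concrete one.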
Here are some additional details to help you implement subclasses of reference-counted classes: You should implement three versions of each constructor: the real C++ constructor, the static allocating constructor, and the virtual constructor. Fortunately, the amount of extra code needed is not substantial since the virtual constructor can call the static allocating constructor, which can call the real C++ constructor. You'll need to override each overload of the three versions of constructors from the base class. The three versions in more detail are:
When writing a subclass, the author should implement the three versions for each constructor inherited from the superclass. The author may also add any additional constructors that are deemed necessary, realizing that all subclasses of that class will also need to implement those constructors.
The subclass may define a public virtual destructor that will be called by the smart pointer implementation when the final pointer to the object is destroyed.
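A minimal sketch of the three constructor versions, plus the destructor, might look like the following. It is self-contained and does not use ROSE: the `toy` namespace, the `LoggedMemoryState` subclass, the names `instance` and `create`, and the `liveCount` helper are assumptions for illustration, and `std::shared_ptr` stands in for the smart pointer type of the real classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

namespace toy {  // hypothetical stand-ins, not the ROSE classes

class MemoryState;
using MemoryStatePtr = std::shared_ptr<MemoryState>;

class MemoryState {
public:
    virtual ~MemoryState() = default;
    // Static allocating constructor.
    static MemoryStatePtr instance() { return MemoryStatePtr(new MemoryState); }
    // Virtual constructor.
    virtual MemoryStatePtr create() const { return instance(); }
    virtual std::string kind() const { return "base"; }
protected:
    // Real C++ constructor, hidden so objects always live behind a pointer.
    MemoryState() = default;
};

class LoggedMemoryState : public MemoryState {
public:
    // Public virtual destructor: runs when the last shared pointer goes away.
    ~LoggedMemoryState() override { --liveCount(); }

    // (2) Static allocating constructor: calls the real constructor.
    static MemoryStatePtr instance(const std::string &label) {
        return MemoryStatePtr(new LoggedMemoryState(label));
    }

    // (3) Virtual constructor: calls the static allocating constructor.
    MemoryStatePtr create() const override { return instance(label_); }

    std::string kind() const override { return "logged:" + label_; }

    // Number of LoggedMemoryState objects currently alive (demo bookkeeping).
    static int &liveCount() {
        static int n = 0;
        return n;
    }

protected:
    // (1) Real C++ constructor: does the actual initialization.
    explicit LoggedMemoryState(std::string label) : label_(std::move(label)) {
        ++liveCount();
    }

private:
    std::string label_;
};

} // namespace toy
```

The chain runs in one direction only: `create` calls `instance`, which calls the protected constructor, so the initialization logic is written once. Calling `create` through a `MemoryStatePtr` that refers to a `LoggedMemoryState` yields another `LoggedMemoryState`, and each object is destroyed as soon as its last pointer is released.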
Here is an example of specializing a class that is itself derived from something in ROSE semantics framework.
Namespaces
namespace | BaseSemantics | Base classes for instruction semantics.
namespace | ConcreteSemantics | A concrete semantic domain.
namespace | IntervalSemantics | An interval analysis semantic domain.
namespace | LlvmSemantics | A semantic domain to generate LLVM.
namespace | MultiSemantics | Semantic domain composed of subdomains.
namespace | NullSemantics | Semantic domain that does nothing, but is well documented.
namespace | PartialSymbolicSemantics | A fast, partially symbolic semantic domain.
namespace | SourceAstSemantics | Generate C source AST from binary AST.
namespace | StaticSemantics | Generate static semantics and attach to the AST.
namespace | SymbolicSemantics | A fully symbolic semantic domain.
namespace | TaintSemantics | Adds taint information to all symbolic values.
namespace | TraceSemantics | A semantics domain wrapper that prints and checks all RISC operators as they occur.
Classes
class | DispatcherCil |
class | DispatcherM68k | Dispatches Motorola 68k instructions through the semantics layer.
class | DispatcherMips | Dispatches MIPS instructions through the semantics layer.
class | DispatcherPowerpc |
class | DispatcherX86 | Semantically evaluates Intel x86 instructions.
class | TestSemantics | Provides functions for testing binary instruction semantics.
Typedefs
typedef boost::shared_ptr< class DispatcherCil > | DispatcherCilPtr | Shared-ownership pointer to a CIL instruction dispatcher.
typedef boost::shared_ptr< class DispatcherM68k > | DispatcherM68kPtr | Shared-ownership pointer to an M68k instruction dispatcher.
using | DispatcherMipsPtr = boost::shared_ptr< class DispatcherMips > | Shared-ownership pointer to a MIPS instruction dispatcher.
typedef boost::shared_ptr< class DispatcherPowerpc > | DispatcherPowerpcPtr | Shared-ownership pointer to a PowerPC instruction dispatcher.
typedef boost::shared_ptr< class DispatcherX86 > | DispatcherX86Ptr | Shared-ownership pointer to an x86 instruction dispatcher.
Functions
void | initDiagnostics () | Initialize diagnostics for instruction semantics.
Variables
Sawyer::Message::Facility | mlog | Diagnostics logging facility for instruction semantics.
typedef boost::shared_ptr<class DispatcherCil> Rose::BinaryAnalysis::InstructionSemantics::DispatcherCilPtr
Shared-ownership pointer to a CIL instruction dispatcher.
Definition at line 20 of file DispatcherCil.h.
typedef boost::shared_ptr<class DispatcherM68k> Rose::BinaryAnalysis::InstructionSemantics::DispatcherM68kPtr
Shared-ownership pointer to an M68k instruction dispatcher.
Definition at line 23 of file DispatcherM68k.h.
using Rose::BinaryAnalysis::InstructionSemantics::DispatcherMipsPtr = boost::shared_ptr<class DispatcherMips>
Shared-ownership pointer to a MIPS instruction dispatcher.
Definition at line 26 of file DispatcherMips.h.
typedef boost::shared_ptr<class DispatcherPowerpc> Rose::BinaryAnalysis::InstructionSemantics::DispatcherPowerpcPtr
Shared-ownership pointer to a PowerPC instruction dispatcher.
Definition at line 26 of file DispatcherPowerpc.h.
typedef boost::shared_ptr<class DispatcherX86> Rose::BinaryAnalysis::InstructionSemantics::DispatcherX86Ptr
Shared-ownership pointer to an x86 instruction dispatcher.
Definition at line 28 of file DispatcherX86.h.