ROSE  0.11.87.0
Namespaces | Classes | Typedefs | Functions | Variables
Rose::BinaryAnalysis::InstructionSemantics2 Namespace Reference

Description

Binary instruction semantics.

Entities in this namespace deal with the semantics of machine instructions, and with the process of "executing" a machine instruction in a particular semantic domain. Instruction "execution" is a very broad term and can refer to execution in the tranditional sense where each instruction modifies the machine state (registers and memory) in a particular domain (concrete, interval, sign, symbolic, user-defined). But it can also refer to any kind of analysis that depends on semantics of individual machine instructions (def-use, tainted-flow, etc). It can even refer to the transformation of machine instructions in ROSE internal representation to some other representation (e.g., to ROSE RISC or LLVM assembly) where the other representation is built by "executing" the instruction.

Components of instruction semantics

ROSE's binary semantics framework has four major components: the dispatchers, RISC operators, states, and values. Each component has a base class to define the interface and common functionality, and subclasses to provide implementation details. A semantics framework is constructed at runtime by instantiating objects from these subclasses and connecting the objects together to form a lattice.

At the top of the lattice is a dispatcher (base class Dispatcher) that "executes" machine instructions by translating (or lowering) them to sequences of RISC-like operations. The subclasses of Dispatcher implement various instruction set architectures (ISAs).

The dispatcher points to an object that defines the RISC-like operators. This object is instantiated from a subclass of RiscOperators, and defines the few dozen RISC-like operators in terms of modifications to a state, or collection of values. Therefore, the RiscOperators needs to point to the current state. Depending on the subclass, it might also point to a lazily-initialized initial state. It is common for an analysis to swap new states in and out of the RiscOperators while the analysis runs.

The aforementioned states are objects instantiated from subclasses of State, which points to at least two substate objects: a MemoryState that describes the values stored at memory addresses, and a RegisterState that describes the values stored in registers. Depending on the State subclass, a state may also contain additional data. The MemoryState and RegisterState are base classes, and their subclasses provide various mechanisms for storing the memory and registers. For instance, memory might be stored as a chronological list or a map, and registers might be stored as an array or map.

Up to this point, we haven't nailed down the definition of "value". A semantic value is also an abstract concept whose interface is declared in the SValue base class, the subclasses of which define the details. A value could be a vector of bits (concrete); an interval defined by two concrete endpoints; a sign consisting of one of the values positive, negative, zero, top, or bottom; a symbolic expression composed of constants, variables, and operations; or pretty much anything you want as long as it implements the API defined in the SValue base class. Many of the objects mentioned above need to be able to create new values, and therefore they point to a proto-typical value instance which forms the bottom of the lattice.

Not all combinations of dispatcher, operators, states, and values are possible, although they are intended to be mostly interchangeable. For instance, you could combine an x86 dispatcher with symbolic operators using chronological memory and generic register states and a symbolic value type, but it probably doesn't make sense to have all the same components but replacing the symbolic value type with a concrete value type. To help keep things organized, collections of compatible types are placed in namespaces such as SymbolicSemantics. These collections of compatible semantic types are called semantic domains. Mixing types between semantic domains sometimes works, depending on the domain.

Major domains

As mentioned, a "semantic domain" is a collection of compatible semantic types contained in a namespace. ROSE provides a number of general-purpose domains but users are also expected to specialize these for specific purposes. Even within ROSE, many of the analyses specialize these general-purpose domains in order to do something more specific without needing to re-implement large portions of the infrastructure.

You can find the full list of general-purpose domains by looking for sub namespaces of this namespace whose names end with the word "Semantics". Some important examples:

Memory Management

Most of the instruction semantics objects are allocated on the heap and are reference counted. This is beneficial to the user because an analysis might create millions of objects and it would otherwise be a burden if the analysis author had to know when it was safe to delete objects. It also allows an analysis to return results to a higher level and not worry about who now owns those objects.

There are two ways to allocate such objects: (1) you must know the name of the derived class from which to instantiate an object, or (2) you must have an instance of an object of the class you wish to instantiate. The former method is used when you're constructing a semantics framework since that's the moment you know the names of the classes, and the latter method is used when the framework is running and the class names might not be known but an object is already available. The former method uses static member functions, and the latter method uses virtual member functions (C++ implementation of the OO term virtual constructor).

Additional information can be found under Specialization.

Instantiating a Semantics Framework

Let's say you have an analysis that needs to process x86 instructions symbolically. The first thing you need to do is instantiate a semantics framework – the lattice of objects that are instantiated from the particular semantics component subclasses. You'll need a DispatcherX86 object to handle the instructions, which invokes the RISC-like operations defined by an instance of SymbolicSemantics::RiscOperators, which can use a chronological memory state and a generic register state, and whose values are symbolic (SymbolicSemantics::SValue). You'll need to tie all these objects together into a lattice with the disptatcher at the top. The constructors for the various components generally take arguments which are the lower layers of the lattice, therefore you'll need to build the lattice from the bottom up; that is, start by constructing a proto-typical value (i.e., a value from which new values can be created), then the register and memory states, which are then joined together into a single state object. Then create the RISC-like operations object and give it an initial state, and finally create the dispatcher that points to the RISC-like operations.

Most semantic domains (SymbolicSemantics included) have a simplified RiscOperators constructor that uses default types for some of the lower components, but we'll show the full monty here:

P2::Partitioner partitioner = ....; // disassembly results
SmtSolverPtr solver = ....; // optional SMT solver
const RegisterDictionary *regdict = partitioner.instructionProvider().registerDictionary();
Base::SValuePtr prototval = Symbolic::SValue::instance();
Base::RegisterStatePtr regs = Symbolic::RegisterState::instance(protoval, regdict);
Base::MemoryStatePtr mem = Symbolic::MemoryListState::instance(protoval, protoval);
Base::StatePtr state = Symbolic::State::instance(regs, mem);
Base::RiscOperatorsPtr ops = Symbolic::RiscOperators::instance(state, solver);
Base::DispatcherPtr cpu = Base::DispatcherX86::instance(ops, 32, regdict);

Specialization

The instruction semantics architecture is designed to allow users to specialize nearly every part of it, which is useful when creating an analysis that needs to override some small parts of the entire semantics framework. Lets say you need to write an analysis that uses a concrete domain (like a simulator) and you want it to report every memory address to which a value is written. Such a domain would be identical in every respect to the ROSE-provided ConcreteSemantics domain except the RISC-like operation for writing a value to memory needs to additionally print the address and value.

Since you're essentially creating a new domain derived from ROSE's concrete domain, you should create a namespace for your domain. Let's call it MySemantics. Since the value type, memory state, register state, combined state are all the same, create typedefs within MySemantics that just alias the types in ROSE's concrete domain. Notice there's no alias for a dispatcher; this is because dispatchers are domain-agnostic–any dispatcher will work with any semantic domain.

The only class you need to change is the RiscOperators class in ROSE's concrete domain. Therefore, within your namespace, define a new class named MySemantics::RiscOperators that inherits from ROSE's ConcreteSemantics::RiscOperators, and override the writeMemory method so it prints the address and value before delegating to the base class. You'll also need to define three classes of constructors detailed in Writing Constructors for Reference-counted Classes.

Finally, your analysis can instantiate a semantics framework using the components from the new MySemantics. The code to do this looks almost identical to the example instantiation we already saw, except the word Symbolic would be changed to MySemantics wherever it appears.

Writing Constructors for Reference-counted Classes

Here are some additional details to help you implement subclasses of reference-counted classes: You should implement three versions of each constructor: the real C++ constructor, the static allocating constructor, and the virtual constructor. Fortunately, the amount of extra code needed is not substantial since the virtual constructor can call the static allocating constructor, which can call the real C++ constructor. You'll need to override each overload of the three versions of constructors from the base class. The three versions in more detail are:

When writing a subclass the author should implement the three versions for each constructor inherited from the super class. The author may also add any additional constructors that are deemed necessary, realizing that all subclasses of his class will also need to implement those constructors.

The subclass may define a public virtual destructor that will be called by the smart pointer implementation when the final pointer to the object is destroyed.

Here is an example of specializing a class that is itself derived from something in ROSE semantics framework.

// Smart pointer for the subclass
typedef boost::shared_ptr<class MyThing> MyThingPtr;
// Class derived from OtherThing, which eventually derives from a class
// defined in BinarySemantics::InstructionSemantics2::BaseSemantics--lets
// say BaseSemantics::Thing -- a non-existent class that follows the rules
// outlined above.
class MyThing: public OtherThing {
private:
char *data; // some data allocated on the heap w/out a smart pointer
// Real constructors. Normally this will be all the same constructors as
// in the super class, and possibly a few new ones. Thus anything you add
// here will need to also be implemented in all subclasses hereof. Lets
// pretend that the super class has two constructors: a copy constructor
// and one that takes a pointer to a register state.
protected:
explicit MyThing(const BaseSemantics::RegisterStatePtr &rstate)
: OtherThing(rstate), data(NULL) {}
MyThing(const MyThing &other)
: OtherThing(other), data(copy_string(other.data)) {}
// Define the virtual destructor if necessary. This won't be called until
// the last smart pointer reference to this object is destroyed.
public:
virtual ~MyThing() {
delete data;
}
// Static allocating constructors. One static allocating constructor
// for each real constructor, including the copy constructor.
public:
static MyThingPtr instance(const BaseSemantics::RegisterStatePtr &rstate) {
return MyThingPtr(new MyThing(rstate));
}
static MyThingPtr instance(const MyThingPtr &other) {
return MyThingPtr(new MyThing(*other));
}
// Virtual constructors. One virtual constructor for each static allocating
// constructor. It is of utmost importance that we cover all the virtual
// constructors from the super class. These return the most super type
// possible, usually something from BaseSemantics.
public:
virtual BaseSemantics::ThingPtr create(const BaseSemantics::RegisterStatePtr &rstate) {
return instance(rstate);
}
// Name the virtual copy constructor "clone" rather than "create".
virtual BaseSemantics::ThingPtr clone(const BaseSemantics::ThingPtr &other_) {
MyThingPtr other = MyThing::promote(other_);
return instance(other);
}
// Define the checking dynamic pointer cast.
public:
static MyThingPtr promomte(const BaseSemantics::ThingPtr &obj) {
MyThingPtr retval = boost::dynamic_pointer_cast<MyThingPtr>(obj);
assert(retval!=NULL);
return NULL;
}
// Define the methods you need for this class.
public:
virtual char *get_data() const {
return data; // or maybe return a copy in case this gets deleted?
}
virtual void set_data(const char *s) {
data = copy_string(s);
}
private:
void char *copy_string(const char *s) {
if (s==NULL)
return NULL;
char *retval = new char[strlen(s)+1];
strcpy(retval, s);
return retval;
}
};

Namespaces

 BaseSemantics
 Base classes for instruction semantics.
 
 ConcreteSemantics
 A concrete semantic domain.
 
 IntervalSemantics
 An interval analysis semantic domain.
 
 LlvmSemantics
 A semantic domain to generate LLVM.
 
 MultiSemantics
 Semantic domain composed of subdomains.
 
 NativeSemantics
 Domain related to an actual running process.
 
 NullSemantics
 Semantic domain that does nothing, but is well documented.
 
 PartialSymbolicSemantics
 A fast, partially symbolic semantic domain.
 
 SourceAstSemantics
 Generate C source AST from binary AST.
 
 StaticSemantics
 Generate static semantics and attach to the AST.
 
 SymbolicSemantics
 A fully symbolic semantic domain.
 
 TraceSemantics
 A semantics domain wrapper that prints and checks all RISC operators as they occur.
 

Classes

class  DispatcherCil
 
class  DispatcherM68k
 
class  DispatcherPowerpc
 
class  DispatcherX86
 
class  TestSemantics
 Provides functions for testing binary instruction semantics. More...
 

Typedefs

typedef boost::shared_ptr< class DispatcherCilDispatcherCilPtr
 Shared-ownership pointer to an CIL instruction dispatcher. More...
 
typedef boost::shared_ptr< class DispatcherM68kDispatcherM68kPtr
 Shared-ownership pointer to an M68k instruction dispatcher. More...
 
typedef boost::shared_ptr< class DispatcherPowerpcDispatcherPowerpcPtr
 Shared-ownership pointer to a PowerPC instruction dispatcher. More...
 
typedef boost::shared_ptr< class DispatcherX86DispatcherX86Ptr
 Shared-ownership pointer to an x86 instruction dispatcher. More...
 

Functions

void initDiagnostics ()
 Initialize diagnostics for instruction semantics. More...
 

Variables

Sawyer::Message::Facility mlog
 Diagnostics logging facility for instruction semantics. More...
 

Typedef Documentation

Shared-ownership pointer to an CIL instruction dispatcher.

See Shared ownership.

Definition at line 18 of file DispatcherCil.h.

Shared-ownership pointer to an M68k instruction dispatcher.

See Shared ownership.

Definition at line 18 of file DispatcherM68k.h.

Shared-ownership pointer to a PowerPC instruction dispatcher.

See Shared ownership.

Definition at line 25 of file DispatcherPowerpc.h.

Shared-ownership pointer to an x86 instruction dispatcher.

See Shared ownership.

Definition at line 24 of file DispatcherX86.h.

Function Documentation

void Rose::BinaryAnalysis::InstructionSemantics2::initDiagnostics ( )

Initialize diagnostics for instruction semantics.

Variable Documentation

Sawyer::Message::Facility Rose::BinaryAnalysis::InstructionSemantics2::mlog

Diagnostics logging facility for instruction semantics.