ROSE
0.11.50.0

Analysis to test the similarity of two functions.
This analysis compares two functions and returns a floating point value that represents how different they are from one another. The scale of the returned value depends on the metrics that are used, but generally a difference of zero represents two functions that are equivalent according to the metric, and values larger than zero indicate the presence of differences.
This FunctionSimilarity class provides the following things:
A characteristic value is data that measures some characteristic of a function. This analysis supports two types of characteristic values: floatingpoint Cartesian points and ordered lists of integers. Floatingpoint scalar values are a special case of Cartesian points.
A metric is a functor that takes input related to a disassembled function and produces characteristic values. The characteristic data can be either zero or more floatingpoint Cartesian points (all having the same dimensionality) or an ordered list of zero or more integers. Examples are: (1) a metric that returns a fourdimensional point describing the number of immediate and secondlevel predecessors and successors for each vertex of the function control flow graph; (2) a metric that returns an ordered list describing local variables detected by a dataflow analysis; (3) a metric that returns a histogram (multidimensional point) of the classes of instructions found in the function.
A metric category (or just category) is a collection of characteristic values from compatible metrics (usually just one metric).
Use this analysis by performing these steps. Steps 2 and 3 can be interleaved.
FunctionSimilarity
analysis objectDefinition at line 60 of file FunctionSimilarity.h.
#include <FunctionSimilarity.h>
Classes  
class  Exception 
Public Types  
enum  CValKind { CARTESIAN_POINT, ORDERED_LIST } 
Kinds of characteristic values. More...  
enum  Statistic { AVERAGE, MEDIAN, MAXIMUM } 
Ways that values can be combined. More...  
typedef std::vector< double >  CartesianPoint 
Characteristic value that's a Cartesian point. More...  
typedef std::vector< int >  OrderedList 
Characteristic value that's an ordered list of integers. More...  
typedef std::vector< CartesianPoint >  PointCloud 
Unordered collection of Cartesian points. More...  
typedef std::vector< OrderedList >  OrderedLists 
Ordered collection of ordered lists of integers. More...  
typedef size_t  CategoryId 
ID number unique within this analysis context. More...  
typedef std::pair< Partitioner2::Function::Ptr, double >  FunctionDistancePair 
Function and distance to some other function. More...  
typedef std::pair< Partitioner2::Function::Ptr, Partitioner2::Function::Ptr >  FunctionPair 
Pair of functions. More...  
typedef Matrix< double >  DistanceMatrix 
Square matrix representing distances. More...  
Public Member Functions  
void  clear () 
Rose::Progress::Ptr  progress () const 
Property: Object to which progress reports are made. More...  
CategoryId  declarePointCategory (const std::string &name, size_t dimensionality, bool allowExisting=true) 
Declare a new category of Cartesian points. More...  
CategoryId  declareListCategory (const std::string &name, bool allowExisting=true) 
Declare a new category of ordered lists of integers. More...  
size_t  nCategories () const 
Number of categories. More...  
CategoryId  findCategory (const std::string &name) const 
Find a category by name. More...  
CValKind  categoryKind (CategoryId) const 
Kind of category. More...  
size_t  categoryDimensionality (CategoryId) const 
Dimensionality of Cartesian characteristic points. More...  
void  insertPoint (const Partitioner2::Function::Ptr &, CategoryId, const CartesianPoint &) 
Insert a Cartesian point characteristic value. More...  
void  insertList (const Partitioner2::Function::Ptr &, CategoryId, const OrderedList &) 
Insert an ordered list characteristic value. More...  
size_t  size (const Partitioner2::Function::Ptr &, CategoryId) const 
Number of characteristic points in a category. More...  
const PointCloud &  points (const Partitioner2::Function::Ptr &, CategoryId) const 
Catesian points contained in a category. More...  
const OrderedLists &  lists (const Partitioner2::Function::Ptr &, CategoryId) const 
Ordered lists contained in a category. More...  
double  compare (const Partitioner2::Function::Ptr &, const Partitioner2::Function::Ptr &, double dflt=NAN) const 
Compare two functions. More...  
std::vector< FunctionDistancePair >  compareOneToAll (const Partitioner2::Function::Ptr &) const 
Compare one function with all others. More...  
std::vector< std::vector< double > >  compareManyToMany (const std::vector< Partitioner2::Function::Ptr > &, const std::vector< Partitioner2::Function::Ptr > &) const 
Compare many functions to many others. More...  
DistanceMatrix  compareManyToManyMatrix (std::vector< Partitioner2::Function::Ptr >, std::vector< Partitioner2::Function::Ptr >) const 
Compare many functions to many others. More...  
std::vector< FunctionPair >  findMinimumCostMapping (const std::vector< Partitioner2::Function::Ptr > &list1, const std::vector< Partitioner2::Function::Ptr > &list2) const 
Minimum cost 1:1 mapping. More...  
std::vector< double >  computeDistances (const std::vector< Partitioner2::Function::Ptr > &list1, const std::vector< Partitioner2::Function::Ptr > &list2, size_t nThreads) const 
Compute distances between sets of functions. More...  
void  printCharacteristicValues (std::ostream &) const 
Print characteristic values for this analysis. More...  
Statistic  categoryAccumulatorType () const 
Property: Statistic for combining category distances. More...  
void  categoryAccumulatorType (Statistic s) 
Property: Statistic for combining category distances. More...  
double  categoryWeight (CategoryId) const 
Property: category weight.  
void  categoryWeight (CategoryId, double) 
Property: category weight.  
CategoryId  declareCfgConnectivity (const std::string &categoryName) 
Control flow graph connectivity. More...  
void  measureCfgConnectivity (CategoryId, const Partitioner2::Partitioner &, const Partitioner2::Function::Ptr &, size_t maxPoints=UNLIMITED) 
Control flow graph connectivity. More...  
CategoryId  declareCallGraphConnectivity (const std::string &categoryName) 
Function calls. More...  
void  measureCallGraphConnectivity (CategoryId, const Partitioner2::Partitioner &, const Partitioner2::Function::Ptr &) 
Function calls. More...  
CategoryId  declareMnemonicStream (const std::string &categoryName) 
Instruction mnemonic stream. More...  
void  measureMnemonicStream (CategoryId, const Partitioner2::Partitioner &, const Partitioner2::Function::Ptr &) 
Instruction mnemonic stream. More...  
template<class FunctionIterator >  
std::vector< FunctionDistancePair >  compareOneToMany (const Partitioner2::Function::Ptr &needle, const boost::iterator_range< FunctionIterator > &haystack) const 
Compare one function with many others. More...  
template<class FunctionIterator >  
std::vector< FunctionDistancePair >  compareOneToMany (const Partitioner2::Function::Ptr &needle, const FunctionIterator &begin, const FunctionIterator &end) const 
Compare one function with many others. More...  
std::vector< FunctionDistancePair >  compareOneToMany (const Partitioner2::Function::Ptr &needle, const std::vector< Partitioner2::Function::Ptr > &haystack) const 
Compare one function with many others. More...  
Static Public Member Functions  
static bool  sortByIncreasingDistance (const FunctionDistancePair &a, const FunctionDistancePair &b) 
Predicate for sorting by ascending distance. More...  
static bool  sortByDecreasingDistance (const FunctionDistancePair &a, const FunctionDistancePair &b) 
Predicate for sorting by descending distance. More...  
static bool  sortByAddress (const FunctionDistancePair &a, const FunctionDistancePair &b) 
Predicate for sorting by function address. More...  
static void  initDiagnostics () 
Initializes and registers disassembler diagnostic streams. More...  
static double  cartesianDistance (const FunctionSimilarity::CartesianPoint &, const FunctionSimilarity::CartesianPoint &) 
Cartesian distance between two points. More...  
static std::vector< size_t >  findMinimumAssignment (const DistanceMatrix &) 
Find minimum mapping from rows to columns. More...  
static double  totalAssignmentCost (const DistanceMatrix &, const std::vector< size_t > &assignment) 
Total cost of a mapping. More...  
static double  maximumDistance (const DistanceMatrix &) 
Maximum value in the distance matrix. More...  
static double  averageDistance (const DistanceMatrix &) 
Average distance in the matrix. More...  
static double  medianDistance (const DistanceMatrix &) 
Median distance in the matrix. More...  
Static Public Attributes  
static const CategoryId  NO_CATEGORY = 1 
Invalid category ID. More...  
static Sawyer::Message::Facility  mlog 
Diagnostic streams.  
typedef std::vector<double> Rose::BinaryAnalysis::FunctionSimilarity::CartesianPoint 
Characteristic value that's a Cartesian point.
Definition at line 77 of file FunctionSimilarity.h.
typedef std::vector<int> Rose::BinaryAnalysis::FunctionSimilarity::OrderedList 
Characteristic value that's an ordered list of integers.
Definition at line 78 of file FunctionSimilarity.h.
typedef std::vector<CartesianPoint> Rose::BinaryAnalysis::FunctionSimilarity::PointCloud 
Unordered collection of Cartesian points.
Definition at line 81 of file FunctionSimilarity.h.
typedef std::vector<OrderedList> Rose::BinaryAnalysis::FunctionSimilarity::OrderedLists 
Ordered collection of ordered lists of integers.
Definition at line 82 of file FunctionSimilarity.h.
typedef size_t Rose::BinaryAnalysis::FunctionSimilarity::CategoryId 
ID number unique within this analysis context.
Definition at line 87 of file FunctionSimilarity.h.
typedef std::pair<Partitioner2::Function::Ptr, double > Rose::BinaryAnalysis::FunctionSimilarity::FunctionDistancePair 
Function and distance to some other function.
Definition at line 91 of file FunctionSimilarity.h.
typedef std::pair<Partitioner2::Function::Ptr, Partitioner2::Function::Ptr> Rose::BinaryAnalysis::FunctionSimilarity::FunctionPair 
Pair of functions.
Definition at line 94 of file FunctionSimilarity.h.
typedef Matrix<double> Rose::BinaryAnalysis::FunctionSimilarity::DistanceMatrix 
Square matrix representing distances.
Definition at line 100 of file FunctionSimilarity.h.
Kinds of characteristic values.
Enumerator  

CARTESIAN_POINT 
Values are Ndimensional Cartesian points. 
ORDERED_LIST 
Values are lists of integers. 
Definition at line 73 of file FunctionSimilarity.h.
Ways that values can be combined.
Definition at line 85 of file FunctionSimilarity.h.

inline 
Property: Statistic for combining category distances.
When computing the distance between two functions, we first compute the distance between the categories by comparing the characteristic values in those categories. The category distance are then combined using the specified statistic to obtain the distance between the functions.
Definition at line 167 of file FunctionSimilarity.h.

inline 
Property: Statistic for combining category distances.
When computing the distance between two functions, we first compute the distance between the categories by comparing the characteristic values in those categories. The category distance are then combined using the specified statistic to obtain the distance between the functions.
Definition at line 168 of file FunctionSimilarity.h.

inline 
Property: Object to which progress reports are made.
Definition at line 172 of file FunctionSimilarity.h.
CategoryId Rose::BinaryAnalysis::FunctionSimilarity::declarePointCategory  (  const std::string &  name, 
size_t  dimensionality,  
bool  allowExisting = true 

) 
Declare a new category of Cartesian points.
All characteristics values of this category have the same specified dimensionality. The order that points are inserted into this category is irrelevant. If allowExisting
is false then the specified name must not already exist or else an std::runtime_error
is thrown; if allowExisting
is true then either a new category is created or the ID of an existing category is returned. Returns a new category ID number, which are consecutive small integers starting at zero.
CategoryId Rose::BinaryAnalysis::FunctionSimilarity::declareListCategory  (  const std::string &  name, 
bool  allowExisting = true 

) 
Declare a new category of ordered lists of integers.
Each characteristic value inserted into this category may be a different length, but the order that they're inserted defines how they're compared when comparing two such categories from different functions. If allowExisting
is false then the specified name must not already exist or else an std::runtime_error
is thrown; if allowExisting
is true then either a new category is created or the ID of an existing category is returned. Returns a new category ID number, which are consecutive small integers starting at zero.

inline 
Number of categories.
Category ID numbers are values zero through one less than the number of categories.
Definition at line 198 of file FunctionSimilarity.h.

inline 
Find a category by name.
Returns the ID number for the category with the specified name, or NO_CATEGORY if no such name exists wihtin this analysis context.
Definition at line 204 of file FunctionSimilarity.h.
CValKind Rose::BinaryAnalysis::FunctionSimilarity::categoryKind  (  CategoryId  )  const 
Kind of category.
size_t Rose::BinaryAnalysis::FunctionSimilarity::categoryDimensionality  (  CategoryId  )  const 
Dimensionality of Cartesian characteristic points.
CategoryId Rose::BinaryAnalysis::FunctionSimilarity::declareCfgConnectivity  (  const std::string &  categoryName  ) 
Control flow graph connectivity.
This metric creates a point cloud with one Cartesian point for each basic block in the function. The space is four dimensional with the following dimensions:
Each coordinate is normalized to be one of the following values:
If maxPoints
is specified, then functions having more than the specified maximum number of CFG vertices will be truncated arbititrarily to limit the number of vertices.
void Rose::BinaryAnalysis::FunctionSimilarity::measureCfgConnectivity  (  CategoryId  , 
const Partitioner2::Partitioner &  ,  
const Partitioner2::Function::Ptr &  ,  
size_t  maxPoints = UNLIMITED 

) 
Control flow graph connectivity.
This metric creates a point cloud with one Cartesian point for each basic block in the function. The space is four dimensional with the following dimensions:
Each coordinate is normalized to be one of the following values:
If maxPoints
is specified, then functions having more than the specified maximum number of CFG vertices will be truncated arbititrarily to limit the number of vertices.
CategoryId Rose::BinaryAnalysis::FunctionSimilarity::declareCallGraphConnectivity  (  const std::string &  categoryName  ) 
Function calls.
This metric creates a point cloud with one Cartesian point for each function called from the specified function.
void Rose::BinaryAnalysis::FunctionSimilarity::measureCallGraphConnectivity  (  CategoryId  , 
const Partitioner2::Partitioner &  ,  
const Partitioner2::Function::Ptr &  
) 
Function calls.
This metric creates a point cloud with one Cartesian point for each function called from the specified function.
CategoryId Rose::BinaryAnalysis::FunctionSimilarity::declareMnemonicStream  (  const std::string &  categoryName  ) 
Instruction mnemonic stream.
This metric creates an ordered list of instruction mnemonics for the entire function by ordering the function's basic block by address and then, for each basic block, appending its instruction mnemonic to the list.
void Rose::BinaryAnalysis::FunctionSimilarity::measureMnemonicStream  (  CategoryId  , 
const Partitioner2::Partitioner &  ,  
const Partitioner2::Function::Ptr &  
) 
Instruction mnemonic stream.
This metric creates an ordered list of instruction mnemonics for the entire function by ordering the function's basic block by address and then, for each basic block, appending its instruction mnemonic to the list.
void Rose::BinaryAnalysis::FunctionSimilarity::insertPoint  (  const Partitioner2::Function::Ptr &  , 
CategoryId  ,  
const CartesianPoint &  
) 
Insert a Cartesian point characteristic value.
Inserts the specified Cartesian point into the collection of characteristic values for the category. The first point inserted into a category defines the dimensionality of all points in that category. An exception is thrown if the point does not match the dimensionality of the category or if the category is not one that stores points.
When comparing two functions, the order of points in the point clouds is irrelevant.
void Rose::BinaryAnalysis::FunctionSimilarity::insertList  (  const Partitioner2::Function::Ptr &  , 
CategoryId  ,  
const OrderedList &  
) 
Insert an ordered list characteristic value.
Inserts the specified ordered list of integers into the collection of characteristic values for the category. Ordered lists may have zero or more integers. The lists within a category need not all be the same length.
When comparing two functions, the order that ordered lists were inserted into the category matters since the distance between two categories is some function of the edit distances between corresponding ordered lists. If you want to insert listtype characteristic values in any order then each list needs to be in its own category.
size_t Rose::BinaryAnalysis::FunctionSimilarity::size  (  const Partitioner2::Function::Ptr &  , 
CategoryId  
)  const 
Number of characteristic points in a category.
const PointCloud& Rose::BinaryAnalysis::FunctionSimilarity::points  (  const Partitioner2::Function::Ptr &  , 
CategoryId  
)  const 
Catesian points contained in a category.
const OrderedLists& Rose::BinaryAnalysis::FunctionSimilarity::lists  (  const Partitioner2::Function::Ptr &  , 
CategoryId  
)  const 
Ordered lists contained in a category.
double Rose::BinaryAnalysis::FunctionSimilarity::compare  (  const Partitioner2::Function::Ptr &  , 
const Partitioner2::Function::Ptr &  ,  
double  dflt = NAN 

)  const 
Compare two functions.
Compare the two specified functions after having populated their characteristic values. Returns a floatingpoint distance between the functions where zero indicates that the functions are as similar as possible judging from their characteristic values.
The distance is calculated as follows:
Given the two functions, compute the pairwise distance between each of their corresponding categories and then combine these distances according to the category weights using maximum, mean, or median.
The distance between two categories containing ordered lists is computed as follows: for each pair of characteristic values (paired according to the order they were inserted into the category), compute the Levenshtein edit distance divided by the longer of the two lists. If one category has more lists than the other then pad the smaller category with empty lists. This gives a list of floatingpoint values in the closed interval zero to one, which are then combined using maximum, mean, or median.
The distance between two categories containing Cartesian points is computed as follows: If one category has fewer points than the other then temporarily pad the smaller category with points at the origin. Compute the minimum cost 1:1 mapping from points in one category to those in the other and use the total cost as the distance between the categories.
If either function has not been analyzed, or has been analyzed but resulted in no data, then return the dflt
value.
std::vector<FunctionDistancePair> Rose::BinaryAnalysis::FunctionSimilarity::compareOneToAll  (  const Partitioner2::Function::Ptr &  )  const 
Compare one function with all others.
Compare the given function with all other functions and return a list of function+cost pairs. The returned list is unsorted.

inline 
Compare one function with many others.
Compare the given needle
function with all other haystack
functions. The returned list is unsorted.
This analysis operates in parallel using multithreading. It honors the global thread count usually specified with the –threads=N
switch.
Definition at line 349 of file FunctionSimilarity.h.
Referenced by compareOneToMany().

inline 
Compare one function with many others.
Compare the given needle
function with all other haystack
functions. The returned list is unsorted.
This analysis operates in parallel using multithreading. It honors the global thread count usually specified with the –threads=N
switch.
Definition at line 359 of file FunctionSimilarity.h.
References compareOneToMany().
std::vector<FunctionDistancePair> Rose::BinaryAnalysis::FunctionSimilarity::compareOneToMany  (  const Partitioner2::Function::Ptr &  needle, 
const std::vector< Partitioner2::Function::Ptr > &  haystack  
)  const 
Compare one function with many others.
Compare the given needle
function with all other haystack
functions. The returned list is unsorted.
This analysis operates in parallel using multithreading. It honors the global thread count usually specified with the –threads=N
switch.
std::vector<std::vector<double> > Rose::BinaryAnalysis::FunctionSimilarity::compareManyToMany  (  const std::vector< Partitioner2::Function::Ptr > &  , 
const std::vector< Partitioner2::Function::Ptr > &  
)  const 
Compare many functions to many others.
Given two ordered lists of functions, calculate the distances from all functions of the first list to all functions of the second list. The return value is a rectangular matrix whose rows are indexed by the functions of the first list and whose columns are indexed by the functions of the second list. See also, compareManyToManyMatrix, which returns a square matrix of function distances.
This analysis operates in parallel using multithreading. It honors the global thread count usually specified with the –threads=N
switch.
DistanceMatrix Rose::BinaryAnalysis::FunctionSimilarity::compareManyToManyMatrix  (  std::vector< Partitioner2::Function::Ptr >  , 
std::vector< Partitioner2::Function::Ptr >  
)  const 
Compare many functions to many others.
Given two ordered lists of functions, temporarily pad the shorter list with null functions to make both lists equal in length. Then calculate the distances from all functions of the first list to all functions of the second list, returning a square distance matrix. See also, compareManyToMany, which may be much faster if the two function lists have wildly different sizes.
This analysis operates in parallel using multithreading. It honors the global thread count usually specified with the –threads=N
switch.
std::vector<FunctionPair> Rose::BinaryAnalysis::FunctionSimilarity::findMinimumCostMapping  (  const std::vector< Partitioner2::Function::Ptr > &  list1, 
const std::vector< Partitioner2::Function::Ptr > &  list2  
)  const 
Minimum cost 1:1 mapping.
Compute the minimum cost 1:1 mapping of functions in the first list to those in the second. The algorithm first calls compareManyToManyMatrix to obtain a square matrix of all functions compared with all other functions (null functions are added as necessary to make the result square). It then calls findMinimumAssignment to find a 1:1 mapping between the two (padded) lists of functions. The return value represents the 1:1 mapping.
Since findMinimumAssignment only works if ROSE is configured with dlib support, this function throws an Exception if that support is missing.
std::vector<double> Rose::BinaryAnalysis::FunctionSimilarity::computeDistances  (  const std::vector< Partitioner2::Function::Ptr > &  list1, 
const std::vector< Partitioner2::Function::Ptr > &  list2,  
size_t  nThreads  
)  const 
Compute distances between sets of functions.
This is a lowlevel function to compute the distance between all pairs of functions from list1 and list2 in parallel. The return value contains the distances so that the distance between the function list1
[i] and list2
[j] is at index i * list2.size() + j
in the return value.

inlinestatic 
Predicate for sorting by ascending distance.
Definition at line 419 of file FunctionSimilarity.h.

inlinestatic 
Predicate for sorting by descending distance.
Definition at line 424 of file FunctionSimilarity.h.

inlinestatic 
Predicate for sorting by function address.
Definition at line 429 of file FunctionSimilarity.h.

static 
Initializes and registers disassembler diagnostic streams.
void Rose::BinaryAnalysis::FunctionSimilarity::printCharacteristicValues  (  std::ostream &  )  const 
Print characteristic values for this analysis.
This is a multiline output intended for debugging.

static 
Cartesian distance between two points.

static 
Find minimum mapping from rows to columns.
Finds a 1:1 mapping from rows to columns of the specified square matrix such that the total cost is minimized. Returns a vector V such that V[i] = j maps rows i to columns j.
This function will only work if ROSE has been compiled with dlib support. Otherwise it throws an Exception.

static 
Total cost of a mapping.
Given a square matrix and a 1:1 mapping from rows to columns, return the total cost of the mapping. The assignment
is like the value returned by findMinimumAssignment.

static 
Maximum value in the distance matrix.

static 
Average distance in the matrix.

static 
Median distance in the matrix.

static 
Invalid category ID.
Definition at line 88 of file FunctionSimilarity.h.