ROSE 0.11.145.192
|
Analysis to test the similarity of two functions.
This analysis compares two functions and returns a floating point value that represents how different they are from one another. The scale of the returned value depends on the metrics that are used, but generally a difference of zero represents two functions that are equivalent according to the metric, and values larger than zero indicate the presence of differences.
This FunctionSimilarity class provides the following things:
A characteristic value is data that measures some characteristic of a function. This analysis supports two types of characteristic values: floating-point Cartesian points and ordered lists of integers. Floating-point scalar values are a special case of Cartesian points.
A metric is a functor that takes input related to a disassembled function and produces characteristic values. The characteristic data can be either zero or more floating-point Cartesian points (all having the same dimensionality) or an ordered list of zero or more integers. Examples are: (1) a metric that returns a four-dimensional point describing the number of immediate and second-level predecessors and successors for each vertex of the function control flow graph; (2) a metric that returns an ordered list describing local variables detected by a data-flow analysis; (3) a metric that returns a histogram (multi-dimensional point) of the classes of instructions found in the function.
A metric category (or just category) is a collection of characteristic values from compatible metrics (usually just one metric).
Use this analysis by performing these steps. Steps 2 and 3 can be interleaved.
FunctionSimilarity
analysis objectDefinition at line 60 of file FunctionSimilarity.h.
#include <Rose/BinaryAnalysis/FunctionSimilarity.h>
Classes | |
class | Exception |
Public Types | |
enum | CValKind { CARTESIAN_POINT , ORDERED_LIST } |
Kinds of characteristic values. More... | |
enum | Statistic { AVERAGE , MEDIAN , MAXIMUM } |
Ways that values can be combined. More... | |
typedef std::vector< double > | CartesianPoint |
Characteristic value that's a Cartesian point. | |
typedef std::vector< int > | OrderedList |
Characteristic value that's an ordered list of integers. | |
typedef std::vector< CartesianPoint > | PointCloud |
Unordered collection of Cartesian points. | |
typedef std::vector< OrderedList > | OrderedLists |
Ordered collection of ordered lists of integers. | |
typedef size_t | CategoryId |
ID number unique within this analysis context. | |
typedef std::pair< Partitioner2::FunctionPtr, double > | FunctionDistancePair |
Function and distance to some other function. | |
typedef std::pair< Partitioner2::FunctionPtr, Partitioner2::FunctionPtr > | FunctionPair |
Pair of functions. | |
typedef Matrix< double > | DistanceMatrix |
Square matrix representing distances. | |
Public Member Functions | |
void | clear () |
Rose::Progress::Ptr | progress () const |
Property: Object to which progress reports are made. | |
CategoryId | declarePointCategory (const std::string &name, size_t dimensionality, bool allowExisting=true) |
Declare a new category of Cartesian points. | |
CategoryId | declareListCategory (const std::string &name, bool allowExisting=true) |
Declare a new category of ordered lists of integers. | |
size_t | nCategories () const |
Number of categories. | |
CategoryId | findCategory (const std::string &name) const |
Find a category by name. | |
CValKind | categoryKind (CategoryId) const |
Kind of category. | |
size_t | categoryDimensionality (CategoryId) const |
Dimensionality of Cartesian characteristic points. | |
void | insertPoint (const Partitioner2::FunctionPtr &, CategoryId, const CartesianPoint &) |
Insert a Cartesian point characteristic value. | |
void | insertList (const Partitioner2::FunctionPtr &, CategoryId, const OrderedList &) |
Insert an ordered list characteristic value. | |
size_t | size (const Partitioner2::FunctionPtr &, CategoryId) const |
Number of characteristic points in a category. | |
const PointCloud & | points (const Partitioner2::FunctionPtr &, CategoryId) const |
Catesian points contained in a category. | |
const OrderedLists & | lists (const Partitioner2::FunctionPtr &, CategoryId) const |
Ordered lists contained in a category. | |
double | compare (const Partitioner2::FunctionPtr &, const Partitioner2::FunctionPtr &, double dflt=NAN) const |
Compare two functions. | |
std::vector< FunctionDistancePair > | compareOneToAll (const Partitioner2::FunctionPtr &) const |
Compare one function with all others. | |
std::vector< std::vector< double > > | compareManyToMany (const std::vector< Partitioner2::FunctionPtr > &, const std::vector< Partitioner2::FunctionPtr > &) const |
Compare many functions to many others. | |
DistanceMatrix | compareManyToManyMatrix (std::vector< Partitioner2::FunctionPtr >, std::vector< Partitioner2::FunctionPtr >) const |
Compare many functions to many others. | |
std::vector< FunctionPair > | findMinimumCostMapping (const std::vector< Partitioner2::FunctionPtr > &list1, const std::vector< Partitioner2::FunctionPtr > &list2) const |
Minimum cost 1:1 mapping. | |
std::vector< double > | computeDistances (const std::vector< Partitioner2::FunctionPtr > &list1, const std::vector< Partitioner2::FunctionPtr > &list2, size_t nThreads) const |
Compute distances between sets of functions. | |
void | printCharacteristicValues (std::ostream &) const |
Print characteristic values for this analysis. | |
Statistic | categoryAccumulatorType () const |
Property: Statistic for combining category distances. | |
void | categoryAccumulatorType (Statistic s) |
Property: Statistic for combining category distances. | |
double | categoryWeight (CategoryId) const |
Property: category weight. | |
void | categoryWeight (CategoryId, double) |
Property: category weight. | |
CategoryId | declareCfgConnectivity (const std::string &categoryName) |
Control flow graph connectivity. | |
void | measureCfgConnectivity (CategoryId, const Partitioner2::PartitionerConstPtr &, const Partitioner2::FunctionPtr &, size_t maxPoints=UNLIMITED) |
Control flow graph connectivity. | |
CategoryId | declareCallGraphConnectivity (const std::string &categoryName) |
Function calls. | |
void | measureCallGraphConnectivity (CategoryId, const Partitioner2::PartitionerConstPtr &, const Partitioner2::FunctionPtr &) |
Function calls. | |
CategoryId | declareMnemonicStream (const std::string &categoryName) |
Instruction mnemonic stream. | |
void | measureMnemonicStream (CategoryId, const Partitioner2::PartitionerConstPtr &, const Partitioner2::FunctionPtr &) |
Instruction mnemonic stream. | |
template<class FunctionIterator > | |
std::vector< FunctionDistancePair > | compareOneToMany (const Partitioner2::FunctionPtr &needle, const boost::iterator_range< FunctionIterator > &haystack) const |
Compare one function with many others. | |
template<class FunctionIterator > | |
std::vector< FunctionDistancePair > | compareOneToMany (const Partitioner2::FunctionPtr &needle, const FunctionIterator &begin, const FunctionIterator &end) const |
Compare one function with many others. | |
std::vector< FunctionDistancePair > | compareOneToMany (const Partitioner2::FunctionPtr &needle, const std::vector< Partitioner2::FunctionPtr > &haystack) const |
Compare one function with many others. | |
Static Public Member Functions | |
static bool | sortByIncreasingDistance (const FunctionDistancePair &a, const FunctionDistancePair &b) |
Predicate for sorting by ascending distance. | |
static bool | sortByDecreasingDistance (const FunctionDistancePair &a, const FunctionDistancePair &b) |
Predicate for sorting by descending distance. | |
static bool | sortByAddress (const FunctionDistancePair &a, const FunctionDistancePair &b) |
Predicate for sorting by function address. | |
static void | initDiagnostics () |
Initializes and registers disassembler diagnostic streams. | |
static double | cartesianDistance (const FunctionSimilarity::CartesianPoint &, const FunctionSimilarity::CartesianPoint &) |
Cartesian distance between two points. | |
static std::vector< size_t > | findMinimumAssignment (const DistanceMatrix &) |
Find minimum mapping from rows to columns. | |
static double | totalAssignmentCost (const DistanceMatrix &, const std::vector< size_t > &assignment) |
Total cost of a mapping. | |
static double | maximumDistance (const DistanceMatrix &) |
Maximum value in the distance matrix. | |
static double | averageDistance (const DistanceMatrix &) |
Average distance in the matrix. | |
static double | medianDistance (const DistanceMatrix &) |
Median distance in the matrix. | |
Static Public Attributes | |
static const CategoryId | NO_CATEGORY = -1 |
Invalid category ID. | |
static Sawyer::Message::Facility | mlog |
Diagnostic streams. | |
typedef std::vector<double> Rose::BinaryAnalysis::FunctionSimilarity::CartesianPoint |
Characteristic value that's a Cartesian point.
Definition at line 77 of file FunctionSimilarity.h.
typedef std::vector<int> Rose::BinaryAnalysis::FunctionSimilarity::OrderedList |
Characteristic value that's an ordered list of integers.
Definition at line 78 of file FunctionSimilarity.h.
typedef std::vector<CartesianPoint> Rose::BinaryAnalysis::FunctionSimilarity::PointCloud |
Unordered collection of Cartesian points.
Definition at line 81 of file FunctionSimilarity.h.
typedef std::vector<OrderedList> Rose::BinaryAnalysis::FunctionSimilarity::OrderedLists |
Ordered collection of ordered lists of integers.
Definition at line 82 of file FunctionSimilarity.h.
typedef size_t Rose::BinaryAnalysis::FunctionSimilarity::CategoryId |
ID number unique within this analysis context.
Definition at line 87 of file FunctionSimilarity.h.
typedef std::pair<Partitioner2::FunctionPtr, double > Rose::BinaryAnalysis::FunctionSimilarity::FunctionDistancePair |
Function and distance to some other function.
Definition at line 91 of file FunctionSimilarity.h.
typedef std::pair<Partitioner2::FunctionPtr, Partitioner2::FunctionPtr> Rose::BinaryAnalysis::FunctionSimilarity::FunctionPair |
Pair of functions.
Definition at line 94 of file FunctionSimilarity.h.
typedef Matrix<double> Rose::BinaryAnalysis::FunctionSimilarity::DistanceMatrix |
Square matrix representing distances.
Definition at line 100 of file FunctionSimilarity.h.
Kinds of characteristic values.
Enumerator | |
---|---|
CARTESIAN_POINT | Values are N-dimensional Cartesian points. |
ORDERED_LIST | Values are lists of integers. |
Definition at line 73 of file FunctionSimilarity.h.
Ways that values can be combined.
Definition at line 85 of file FunctionSimilarity.h.
|
inline |
Definition at line 146 of file FunctionSimilarity.h.
|
inline |
Definition at line 149 of file FunctionSimilarity.h.
|
inline |
Property: Statistic for combining category distances.
When computing the distance between two functions, we first compute the distance between the categories by comparing the characteristic values in those categories. The category distance are then combined using the specified statistic to obtain the distance between the functions.
Definition at line 167 of file FunctionSimilarity.h.
|
inline |
Property: Statistic for combining category distances.
When computing the distance between two functions, we first compute the distance between the categories by comparing the characteristic values in those categories. The category distance are then combined using the specified statistic to obtain the distance between the functions.
Definition at line 168 of file FunctionSimilarity.h.
|
inline |
Property: Object to which progress reports are made.
Definition at line 172 of file FunctionSimilarity.h.
CategoryId Rose::BinaryAnalysis::FunctionSimilarity::declarePointCategory | ( | const std::string & | name, |
size_t | dimensionality, | ||
bool | allowExisting = true |
||
) |
Declare a new category of Cartesian points.
All characteristics values of this category have the same specified dimensionality. The order that points are inserted into this category is irrelevant. If allowExisting
is false then the specified name must not already exist or else an std::runtime_error
is thrown; if allowExisting
is true then either a new category is created or the ID of an existing category is returned. Returns a new category ID number, which are consecutive small integers starting at zero.
CategoryId Rose::BinaryAnalysis::FunctionSimilarity::declareListCategory | ( | const std::string & | name, |
bool | allowExisting = true |
||
) |
Declare a new category of ordered lists of integers.
Each characteristic value inserted into this category may be a different length, but the order that they're inserted defines how they're compared when comparing two such categories from different functions. If allowExisting
is false then the specified name must not already exist or else an std::runtime_error
is thrown; if allowExisting
is true then either a new category is created or the ID of an existing category is returned. Returns a new category ID number, which are consecutive small integers starting at zero.
|
inline |
Number of categories.
Category ID numbers are values zero through one less than the number of categories.
Definition at line 198 of file FunctionSimilarity.h.
|
inline |
Find a category by name.
Returns the ID number for the category with the specified name, or NO_CATEGORY if no such name exists wihtin this analysis context.
Definition at line 204 of file FunctionSimilarity.h.
References NO_CATEGORY.
CategoryId Rose::BinaryAnalysis::FunctionSimilarity::declareCfgConnectivity | ( | const std::string & | categoryName | ) |
Control flow graph connectivity.
This metric creates a point cloud with one Cartesian point for each basic block in the function. The space is four dimensional with the following dimensions:
Each coordinate is normalized to be one of the following values:
If maxPoints
is specified, then functions having more than the specified maximum number of CFG vertices will be truncated arbititrarily to limit the number of vertices.
void Rose::BinaryAnalysis::FunctionSimilarity::measureCfgConnectivity | ( | CategoryId | , |
const Partitioner2::PartitionerConstPtr & | , | ||
const Partitioner2::FunctionPtr & | , | ||
size_t | maxPoints = UNLIMITED |
||
) |
Control flow graph connectivity.
This metric creates a point cloud with one Cartesian point for each basic block in the function. The space is four dimensional with the following dimensions:
Each coordinate is normalized to be one of the following values:
If maxPoints
is specified, then functions having more than the specified maximum number of CFG vertices will be truncated arbititrarily to limit the number of vertices.
CategoryId Rose::BinaryAnalysis::FunctionSimilarity::declareCallGraphConnectivity | ( | const std::string & | categoryName | ) |
Function calls.
This metric creates a point cloud with one Cartesian point for each function called from the specified function.
void Rose::BinaryAnalysis::FunctionSimilarity::measureCallGraphConnectivity | ( | CategoryId | , |
const Partitioner2::PartitionerConstPtr & | , | ||
const Partitioner2::FunctionPtr & | |||
) |
Function calls.
This metric creates a point cloud with one Cartesian point for each function called from the specified function.
CategoryId Rose::BinaryAnalysis::FunctionSimilarity::declareMnemonicStream | ( | const std::string & | categoryName | ) |
Instruction mnemonic stream.
This metric creates an ordered list of instruction mnemonics for the entire function by ordering the function's basic block by address and then, for each basic block, appending its instruction mnemonic to the list.
void Rose::BinaryAnalysis::FunctionSimilarity::measureMnemonicStream | ( | CategoryId | , |
const Partitioner2::PartitionerConstPtr & | , | ||
const Partitioner2::FunctionPtr & | |||
) |
Instruction mnemonic stream.
This metric creates an ordered list of instruction mnemonics for the entire function by ordering the function's basic block by address and then, for each basic block, appending its instruction mnemonic to the list.
void Rose::BinaryAnalysis::FunctionSimilarity::insertPoint | ( | const Partitioner2::FunctionPtr & | , |
CategoryId | , | ||
const CartesianPoint & | |||
) |
Insert a Cartesian point characteristic value.
Inserts the specified Cartesian point into the collection of characteristic values for the category. The first point inserted into a category defines the dimensionality of all points in that category. An exception is thrown if the point does not match the dimensionality of the category or if the category is not one that stores points.
When comparing two functions, the order of points in the point clouds is irrelevant.
void Rose::BinaryAnalysis::FunctionSimilarity::insertList | ( | const Partitioner2::FunctionPtr & | , |
CategoryId | , | ||
const OrderedList & | |||
) |
Insert an ordered list characteristic value.
Inserts the specified ordered list of integers into the collection of characteristic values for the category. Ordered lists may have zero or more integers. The lists within a category need not all be the same length.
When comparing two functions, the order that ordered lists were inserted into the category matters since the distance between two categories is some function of the edit distances between corresponding ordered lists. If you want to insert list-type characteristic values in any order then each list needs to be in its own category.
double Rose::BinaryAnalysis::FunctionSimilarity::compare | ( | const Partitioner2::FunctionPtr & | , |
const Partitioner2::FunctionPtr & | , | ||
double | dflt = NAN |
||
) | const |
Compare two functions.
Compare the two specified functions after having populated their characteristic values. Returns a floating-point distance between the functions where zero indicates that the functions are as similar as possible judging from their characteristic values.
The distance is calculated as follows:
Given the two functions, compute the pair-wise distance between each of their corresponding categories and then combine these distances according to the category weights using maximum, mean, or median.
The distance between two categories containing ordered lists is computed as follows: for each pair of characteristic values (paired according to the order they were inserted into the category), compute the Levenshtein edit distance divided by the longer of the two lists. If one category has more lists than the other then pad the smaller category with empty lists. This gives a list of floating-point values in the closed interval zero to one, which are then combined using maximum, mean, or median.
The distance between two categories containing Cartesian points is computed as follows: If one category has fewer points than the other then temporarily pad the smaller category with points at the origin. Compute the minimum cost 1:1 mapping from points in one category to those in the other and use the total cost as the distance between the categories.
If either function has not been analyzed, or has been analyzed but resulted in no data, then return the dflt
value.
std::vector< FunctionDistancePair > Rose::BinaryAnalysis::FunctionSimilarity::compareOneToAll | ( | const Partitioner2::FunctionPtr & | ) | const |
Compare one function with all others.
Compare the given function with all other functions and return a list of function+cost pairs. The returned list is unsorted.
|
inline |
Compare one function with many others.
Compare the given needle
function with all other haystack
functions. The returned list is unsorted.
This analysis operates in parallel using multi-threading. It honors the global thread count usually specified with the –threads=N
switch.
Definition at line 349 of file FunctionSimilarity.h.
References compareOneToMany().
Referenced by compareOneToMany(), and compareOneToMany().
|
inline |
Compare one function with many others.
Compare the given needle
function with all other haystack
functions. The returned list is unsorted.
This analysis operates in parallel using multi-threading. It honors the global thread count usually specified with the –threads=N
switch.
Definition at line 359 of file FunctionSimilarity.h.
References compareOneToMany().
std::vector< FunctionDistancePair > Rose::BinaryAnalysis::FunctionSimilarity::compareOneToMany | ( | const Partitioner2::FunctionPtr & | needle, |
const std::vector< Partitioner2::FunctionPtr > & | haystack | ||
) | const |
Compare one function with many others.
Compare the given needle
function with all other haystack
functions. The returned list is unsorted.
This analysis operates in parallel using multi-threading. It honors the global thread count usually specified with the –threads=N
switch.
std::vector< std::vector< double > > Rose::BinaryAnalysis::FunctionSimilarity::compareManyToMany | ( | const std::vector< Partitioner2::FunctionPtr > & | , |
const std::vector< Partitioner2::FunctionPtr > & | |||
) | const |
Compare many functions to many others.
Given two ordered lists of functions, calculate the distances from all functions of the first list to all functions of the second list. The return value is a rectangular matrix whose rows are indexed by the functions of the first list and whose columns are indexed by the functions of the second list. See also, compareManyToManyMatrix, which returns a square matrix of function distances.
This analysis operates in parallel using multi-threading. It honors the global thread count usually specified with the –threads=N
switch.
DistanceMatrix Rose::BinaryAnalysis::FunctionSimilarity::compareManyToManyMatrix | ( | std::vector< Partitioner2::FunctionPtr > | , |
std::vector< Partitioner2::FunctionPtr > | |||
) | const |
Compare many functions to many others.
Given two ordered lists of functions, temporarily pad the shorter list with null functions to make both lists equal in length. Then calculate the distances from all functions of the first list to all functions of the second list, returning a square distance matrix. See also, compareManyToMany, which may be much faster if the two function lists have wildly different sizes.
This analysis operates in parallel using multi-threading. It honors the global thread count usually specified with the –threads=N
switch.
std::vector< FunctionPair > Rose::BinaryAnalysis::FunctionSimilarity::findMinimumCostMapping | ( | const std::vector< Partitioner2::FunctionPtr > & | list1, |
const std::vector< Partitioner2::FunctionPtr > & | list2 | ||
) | const |
Minimum cost 1:1 mapping.
Compute the minimum cost 1:1 mapping of functions in the first list to those in the second. The algorithm first calls compareManyToManyMatrix to obtain a square matrix of all functions compared with all other functions (null functions are added as necessary to make the result square). It then calls findMinimumAssignment to find a 1:1 mapping between the two (padded) lists of functions. The return value represents the 1:1 mapping.
Since findMinimumAssignment only works if ROSE is configured with dlib support, this function throws an Exception if that support is missing.
std::vector< double > Rose::BinaryAnalysis::FunctionSimilarity::computeDistances | ( | const std::vector< Partitioner2::FunctionPtr > & | list1, |
const std::vector< Partitioner2::FunctionPtr > & | list2, | ||
size_t | nThreads | ||
) | const |
Compute distances between sets of functions.
This is a low-level function to compute the distance between all pairs of functions from list1 and list2 in parallel. The return value contains the distances so that the distance between the function list1
[i] and list2
[j] is at index i * list2.size() + j
in the return value.
|
inlinestatic |
Predicate for sorting by ascending distance.
Definition at line 419 of file FunctionSimilarity.h.
|
inlinestatic |
Predicate for sorting by descending distance.
Definition at line 424 of file FunctionSimilarity.h.
|
inlinestatic |
Predicate for sorting by function address.
Definition at line 429 of file FunctionSimilarity.h.
|
static |
Initializes and registers disassembler diagnostic streams.
void Rose::BinaryAnalysis::FunctionSimilarity::printCharacteristicValues | ( | std::ostream & | ) | const |
Print characteristic values for this analysis.
This is a multi-line output intended for debugging.
|
static |
Find minimum mapping from rows to columns.
Finds a 1:1 mapping from rows to columns of the specified square matrix such that the total cost is minimized. Returns a vector V such that V[i] = j maps rows i to columns j.
This function will only work if ROSE has been compiled with dlib support. Otherwise it throws an Exception.
|
static |
Total cost of a mapping.
Given a square matrix and a 1:1 mapping from rows to columns, return the total cost of the mapping. The assignment
is like the value returned by findMinimumAssignment.
|
static |
Invalid category ID.
Definition at line 88 of file FunctionSimilarity.h.
Referenced by findCategory().
|
static |
Diagnostic streams.
Definition at line 97 of file FunctionSimilarity.h.