ROSE 0.11.145.147

How to write good API and non-API documentation in ROSE.

This chapter is mainly for developers working on the ROSE library as opposed to users developing software that uses the library. It specifies how we would like to have the ROSE library source code documented. The style enumerated here does not necessarily need to be used for projects, tests, the tutorial, user-code, etc. Each item is also presented along with our motivation for doing it this way.

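ROSE uses Doxygen for two broad categories of documentation: API documentation, which describes the files, namespaces, classes, functions, and other entities that make up the library's interface, and non-API documentation, such as this page, which is not tied to any specific API element.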

Quick start

Here's an example that documents a couple of closely related class member functions:

  /** Most basic use of the partitioner.
   *
   *  This method does everything from parsing the command-line to generating an abstract syntax tree. If all is
   *  successful, then an abstract syntax tree is returned. The return value is a @ref SgAsmBlock node that contains all
   *  the detected functions. If the specimen consisted of an ELF or PE container then the parent nodes of the returned
   *  AST will lead eventually to an @ref SgProject node.
   *
   *  The command-line can be provided as a typical @c argc and @c argv pair, or as a vector of arguments. In the
   *  latter case, the vector should not include <code>argv[0]</code> or <code>argv[argc]</code> (which is always a
   *  null pointer).
   *
   *  The command-line supports a "--help" (or "-h") switch to describe all other switches and arguments, essentially
   *  generating output much like a Unix man(1) page.
   *
   *  The @p purpose should be a single-line string that will be shown in the title of the man page and should not start
   *  with an upper-case letter, a hyphen, white space, or the name of the command. E.g., a disassembler tool might
   *  specify the purpose as "disassembles a binary specimen".
   *
   *  The @p description is a full, multi-line description written in the [Sawyer](https://github.com/matzke1/sawyer)
   *  markup language where "@" characters have special meaning.
   *
   *  @{ */
  SgAsmBlock* frontend(int argc, char *argv[],
                       const std::string &purpose, const std::string &description) /*final*/;
  virtual SgAsmBlock* frontend(const std::vector<std::string> &args,
                               const std::string &purpose, const std::string &description);
  /** @} */
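The @{ and @} pair in the example is what lets a single comment document both frontend overloads. The sketch below applies the same pattern to a hypothetical countSwitches function (not part of ROSE) that, like frontend, accepts either an argc/argv pair or a vector of arguments. It relies on Doxygen member groups; whether Doxygen repeats the shared comment for every member depends on its configuration (for example the DISTRIBUTE_GROUP_DOC option), so treat that rendering detail as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

/** Count the switches on a command-line.
 *
 *  A switch is any argument that starts with a hyphen, such as "-h" or "--help". The command-line can be provided as a
 *  typical @c argc and @c argv pair, or as a vector of arguments. In the latter case, the vector should not include
 *  <code>argv[0]</code> or <code>argv[argc]</code> (which is always a null pointer).
 *
 *  @{ */
std::size_t countSwitches(const std::vector<std::string> &args) {
    std::size_t nSwitches = 0;
    for (const std::string &arg : args) {
        if (!arg.empty() && arg[0] == '-')
            ++nSwitches;
    }
    return nSwitches;
}

std::size_t countSwitches(int argc, char *argv[]) {
    assert(argc >= 0 && argv[argc] == nullptr);     // argv[argc] is always a null pointer
    std::vector<std::string> args;
    for (int i = 1; i < argc; ++i)                  // skip argv[0], the program name
        args.push_back(argv[i]);
    return countSwitches(args);                     // both overloads share one implementation
}
/** @} */
```

Keeping one comment for both overloads avoids two nearly identical descriptions drifting apart as the code evolves.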

General Doxygen style

Both categories of documentation (API and non-API) are written as comments in C++ source code and follow the same style conventions.

Doxygen documentation for non-API entities

As mentioned, one of ROSE's uses of Doxygen is for documentation not related to any specific API element (such as this page itself). This section shows how to document such material.

Pages or modules? Non-API documentation is generally organized into Doxygen "Related pages" and/or "Modules". The main difference is that pages are relatively large, non-hierarchical, chapter-like units, while modules are usually smaller and hierarchical. The distinction is blurry, though, because both support sections and subsections. Use this table to help decide:

| Use "Related pages" | Use "Modules" |
|---|---|
| Subject is important enough to be a chapter in a book? | Subject would be an appendix in a book? |
| Subject should be listed in the top-level table of contents? | Subject should be listed in some broader subject's page? |
| User would read the entire subject linearly? | User would jump around in the subject area? |
| Subject has two levels of nesting? | Subject has arbitrarily deep hierarchy? |
| Subject's sections should appear together in a single HTML page? | Subject's sections should each be on their own HTML page? |

Pages are created with Doxygen's @page directive, which takes a unique global identifier and a title. The first sentence is the auto-brief content (regardless of whether @brief is present) that will show up in the "Related pages" list. The auto-brief sentence should fit on one line, end with a period, and should not be identical to the title; it should restate the title in different words or else the table of contents looks awkward:

* @page binary_tutorial Getting started with binary analysis
* @brief Overview showing how to write binary analysis tools.
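Put together, the documentation source for such a page could look like the following sketch. Only the @page line and the @brief line come from the example above; the section identifiers, titles, and body text are hypothetical placeholders. @section and @subsection are standard Doxygen commands whose identifiers should be unique across the whole documentation, so prefixing them with the page identifier is an assumed, but convenient, convention.

```cpp
/** @page binary_tutorial Getting started with binary analysis
 *  @brief Overview showing how to write binary analysis tools.
 *
 *  Introductory paragraph for the chapter (placeholder).
 *
 *  @section binary_tutorial_first_section First section title
 *
 *  Body of the first section (placeholder).
 *
 *  @subsection binary_tutorial_first_subsection First subsection title
 *
 *  Body of the subsection (placeholder).
 */
```

Note that the brief restates the title in different words rather than repeating it.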

Modules, on the other hand, are created with Doxygen's @defgroup directive, and the hierarchy is formed by declaring one module to be inside another with @ingroup. A group is defined with a unique global identifier followed by a title. The @ingroup directive takes the global identifier of some other @defgroup. As with pages, the first sentence is the auto-brief content regardless of whether @brief is used:

* @defgroup installation_dependencies_boost How to install Boost
* @ingroup installation_dependencies
* @brief Instructions for installing Boost, a ROSE software dependency.
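A sketch of the resulting two-level hierarchy follows. The parent group installation_dependencies is referenced above but not shown, so its title, brief, and body here are hypothetical placeholders. Each group is its own comment block, and both blocks can live in the same source file.

```cpp
/** @defgroup installation_dependencies Software dependencies
 *  @brief Third-party packages needed to build ROSE.
 *
 *  Overview of the dependencies (placeholder).
 */

/** @defgroup installation_dependencies_boost How to install Boost
 *  @ingroup installation_dependencies
 *  @brief Instructions for installing Boost, a ROSE software dependency.
 *
 *  Installation steps (placeholder).
 */
```

Doxygen lists the Boost module under its parent because of the @ingroup line, so deeper levels are added the same way.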

Location of documentation source? Regardless of whether one chooses to write a page or a module, the documentation needs to be placed in a C++ source file. These files should have the extension ".dox" (".docs" is acceptable too, but avoid ".doc" and ".docx") and the documentation should be written as a block comment. IDEs can be told that these files are actually C++ code, so you'll get whatever fancy comment-handling features your IDE normally provides. For example, Emacs excels at formatting C++ block comments and can reflow paragraphs, add the vertical line of stars, spell check, highlight Doxygen tags, etc.

These ".dox" files can live anywhere in the ROSE source tree, but we prefer that they're somewhere under the top-level "docs" directory along with all the non-Doxygen documentation. Once you've added the new file, you should edit "docs/Rose/rose.cfg.in", find the INPUT variable, and add your new file to the list. For Doxygen "pages", the position in the list determines the order of that page relative to other pages. Doxygen might still find your file if you fail to list it in the INPUT variable, but it will be sorted more or less alphabetically.

Doxygen documentation for API entities

The original purpose of Doxygen is to document the files, namespaces, classes, functions, and other entities that compose an API. Doxygen automatically generates the document structure from C++ declarations, and the API author fills in what cannot be generated automatically, which is the majority of the text. Consider, as an example, this declaration:

public: std::vector<std::string> splitString(const std::string &inputString, const std::string &separator);
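A documented version of that declaration might look like the sketch below. It is written as a free function rather than a class member so that it compiles on its own, and the behavior it documents (how adjacent separators and an empty separator are handled) is an assumption chosen for illustration, not a description of any existing ROSE function.

```cpp
#include <cassert>
#include <string>
#include <vector>

/** Split a string into parts.
 *
 *  Splits @p inputString at each occurrence of @p separator and returns the parts in their original order, without
 *  the separators. Adjacent separators produce empty parts, so the result always has exactly one more element than the
 *  number of separators found; e.g., splitting <code>"a,b,,c"</code> at <code>","</code> yields "a", "b", "", and
 *  "c". The @p separator must not be empty. */
std::vector<std::string> splitString(const std::string &inputString, const std::string &separator) {
    assert(!separator.empty());                                 // an empty separator would never advance
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type found = inputString.find(separator, start);
        if (found == std::string::npos) {
            parts.push_back(inputString.substr(start));         // text after the last separator
            return parts;
        }
        parts.push_back(inputString.substr(start, found - start));
        start = found + separator.size();                       // resume just past this separator
    }
}
```

As in the quick-start example, the first sentence serves as the brief description and the parameters are referred to with @p in running prose.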

Doxygen documentation for AST Nodes

AST nodes (e.g., SgNode) are a special case because their classes are generated by ROSETTA, a code generator, so their documentation cannot be written next to the declarations in the source code. Instead, each documented AST node has its own separate file. (Many AST nodes have no such file yet because they are still undocumented.)

These files are in $ROSE_SRC/docs/testDoxygen and are named NodeType.docs, e.g., SgConstructorInitializer.docs.

The files use normal Doxygen formatting, except that each comment must name the item it documents, because the comment is not attached to a declaration. For example, the comment for a class starts with "\class SgConstructorInitializer".

Doxygen finds these files automatically, so new files can be added without further configuration, although each file's name should match the name of the AST node class it documents.
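For instance, a SgConstructorInitializer.docs file might begin as sketched below. The \class line names the node explicitly, as described above; the brief and the longer description are placeholder wording, not ROSE's actual documentation for that node.

```cpp
/** \class SgConstructorInitializer
 *  \brief One-sentence summary of what this node represents (placeholder).
 *
 *  Longer description of the node (placeholder): which source constructs produce it, what its children are, and
 *  any invariants that analyses can rely on.
 */
```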

Doxygen directives

Doxygen understands a subset of HTML, its own Javadoc-like directives, and Markdown.
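As an illustration, the sketch below documents a hypothetical joinStrings function (not part of ROSE) using all three: Javadoc-like directives (@p, @code and @endcode, @note), a Markdown bullet list, and the HTML code tag.

```cpp
#include <cstddef>
#include <string>
#include <vector>

/** Join strings with a separator.
 *
 *  Returns the concatenation of the strings in @p pieces with @p separator inserted between each adjacent pair.
 *  Edge cases:
 *
 *  - An empty @p pieces vector produces an empty string.
 *  - A single element is returned unchanged because there is no adjacent pair.
 *
 *  Example:
 *  @code
 *  std::string path = joinStrings({"usr", "local", "bin"}, "/");    // path is "usr/local/bin"
 *  @endcode
 *
 *  @note The result is a new <code>std::string</code>; neither argument is modified. */
std::string joinStrings(const std::vector<std::string> &pieces, const std::string &separator) {
    std::string result;
    for (std::size_t i = 0; i < pieces.size(); ++i) {
        if (i > 0)
            result += separator;    // separators go only between elements
        result += pieces[i];
    }
    return result;
}
```

Mixing the three is fine; choose whichever reads most naturally in the comment's plain-text form, since that is what developers see in the source.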

Build Doxygen Docs

To generate ROSE's Doxygen documentation locally, run:

You can ignore warning messages about undocumented entities. When the build finishes, open the documentation by running "firefox $ROSE_BUILD/docs/Rose/ROSE_WebPages/ROSE_HTML_Reference/index.html".

Next steps

Doxygen, as one would expect from a documentation generator, is well documented at its website. There are also a number of quick references available.
