ROSE 0.11.145.192
Sawyer::Lexer::TokenStream< T > Class Template Reference [abstract]

Description

template<class T>
class Sawyer::Lexer::TokenStream< T >

An ordered list of tokens scanned from input.

A token stream is an ordered list of tokens scanned from an unchanging input stream and consumed in the order they're produced.

Definition at line 91 of file Lexer.h.

#include <Sawyer/Lexer.h>
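
The class is abstract, so a concrete subclass must implement scanNextToken (see below). As a rough sketch of typical use, assuming a hypothetical concrete subclass MyLexer and using only members documented on this page:

    MyLexer lexer(boost::filesystem::path("input.txt"));   // scan tokens from a file
    while (!lexer.atEof()) {
        std::cout << lexer.lexeme() << "\n";                // text of the current token
        lexer.consume();                                    // advance to the next token
    }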

Inheritance diagram for Sawyer::Lexer::TokenStream< T >: (diagram not reproduced here)

Public Types

typedef T Token
 

Public Member Functions

 TokenStream (const boost::filesystem::path &fileName)
 Create a token stream from the contents of a file.
 
 TokenStream (const std::string &inputString)
 Create a token stream from a string.
 
 TokenStream (const Container::Buffer< size_t, char >::Ptr &buffer)
 Create a token stream from a buffer.
 
const std::string & name () const
 Property: Name of stream.
 
const Token & current ()
 Return the current token.
 
bool atEof ()
 Returns true if the stream is at the end.
 
const Token & operator[] (size_t lookahead)
 Return the current or future token.
 
void consume (size_t n=1)
 Consume some tokens.
 
std::pair< size_t, size_t > location (size_t position)
 Return the line number and offset for an input position.
 
std::pair< size_t, size_t > locationEof ()
 Returns the last line index and character offset.
 
std::string lineString (size_t lineIdx)
 Return the entire string for some line index.
 
virtual Token scanNextToken (const Container::LineVector &content, size_t &at)=0
 Function that obtains the next token.
 
std::string lexeme (const Token &t)
 Return the lexeme for a token.
 
std::string lexeme ()
 Return the lexeme for a token.
 
bool isa (const Token &t, typename Token::TokenEnum type)
 Determine whether token is a specific type.
 
bool isa (typename Token::TokenEnum type)
 Determine whether token is a specific type.
 
bool match (const Token &t, const char *s)
 Determine whether a token matches a string.
 
bool match (const char *s)
 Determine whether a token matches a string.
 

Member Typedef Documentation

◆ Token

template<class T >
typedef T Sawyer::Lexer::TokenStream< T >::Token

Definition at line 93 of file Lexer.h.

Constructor & Destructor Documentation

◆ ~TokenStream()

template<class T >
virtual Sawyer::Lexer::TokenStream< T >::~TokenStream ( )
inline, virtual

Definition at line 102 of file Lexer.h.

◆ TokenStream() [1/3]

template<class T >
Sawyer::Lexer::TokenStream< T >::TokenStream ( const boost::filesystem::path &  fileName)
inline, explicit

Create a token stream from the contents of a file.

Definition at line 105 of file Lexer.h.

◆ TokenStream() [2/3]

template<class T >
Sawyer::Lexer::TokenStream< T >::TokenStream ( const std::string &  inputString)
inline, explicit

Create a token stream from a string.

The string content is copied into the lexer, so the original string may be modified after the constructor returns without affecting the token stream.

Definition at line 112 of file Lexer.h.

◆ TokenStream() [3/3]

template<class T >
Sawyer::Lexer::TokenStream< T >::TokenStream ( const Container::Buffer< size_t, char >::Ptr &  buffer)
inline, explicit

Create a token stream from a buffer.

The token stream uses the specified buffer, which should not be modified while the token stream is alive.

Definition at line 118 of file Lexer.h.
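
A hedged sketch of the three construction forms, again via a hypothetical concrete subclass MyLexer whose constructors forward to these:

    MyLexer fromFile(boost::filesystem::path("grammar.txt"));   // token stream over a file's contents
    MyLexer fromString(std::string("a + b * c"));                // the string is copied into the lexer
    // A Container::Buffer<size_t, char>::Ptr can be passed similarly; that buffer
    // should not be modified while the token stream is alive.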

Member Function Documentation

◆ name()

template<class T >
const std::string & Sawyer::Lexer::TokenStream< T >::name ( ) const
inline

Property: Name of stream.

Definition at line 122 of file Lexer.h.

◆ current()

template<class T >
const Token & Sawyer::Lexer::TokenStream< T >::current ( )
inline

Return the current token.

The current token will be an EOF token when all tokens are consumed.

Definition at line 129 of file Lexer.h.

◆ atEof()

template<class T >
bool Sawyer::Lexer::TokenStream< T >::atEof ( )
inline

Returns true if the stream is at the end.

This is equivalent to obtaining the current token and checking whether it's the EOF token.

Definition at line 136 of file Lexer.h.
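
A sketch of this equivalence, reusing the hypothetical lexer from above and assuming the token type provides an isEof predicate (an assumption about T):

    while (!lexer.atEof())
        lexer.consume();                  // drain the input
    assert(lexer.current().isEof());      // current() now returns the EOF token
    lexer.consume();                      // consuming past the end is permitted
    assert(lexer.atEof());                // still at the end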

◆ operator[]()

template<class T >
const Token & Sawyer::Lexer::TokenStream< T >::operator[] ( size_t  lookahead)
inline

Return the current or future token.

The array operator obtains a token from a virtual array whose first element is the current token, whose second element is the token after that, and so on. The array is conceptually infinite in length, padded with EOF tokens.

Definition at line 144 of file Lexer.h.
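
A sketch of lookahead without consumption, where TOK_NAME and TOK_LPAREN are hypothetical members of the token enum:

    // Peek at the current token and the one after it; nothing is consumed.
    if (lexer.isa(lexer[0], TOK_NAME) && lexer.isa(lexer[1], TOK_LPAREN)) {
        // looks like the start of a call expression
    }
    // Indexing past the end of input is safe; those elements are EOF tokens.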

◆ consume()

template<class T >
void Sawyer::Lexer::TokenStream< T >::consume ( size_t  n = 1)
inline

Consume some tokens.

Consumes tokens by shifting n tokens off the low end of the virtual token array. It is permissible to consume EOF tokens, since more will be generated once the end of input is reached.

Definition at line 158 of file Lexer.h.
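
Continuing the lookahead sketch above:

    if (lexer.isa(lexer[0], TOK_NAME) && lexer.isa(lexer[1], TOK_LPAREN))
        lexer.consume(2);                 // shift both tokens off the front of the virtual array
    lexer.consume();                      // default: consume a single token
    lexer.consume(10);                    // fine even if fewer than 10 tokens remain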

◆ lexeme() [1/2]

template<class T >
std::string Sawyer::Lexer::TokenStream< T >::lexeme ( const Token &  t)
inline

Return the lexeme for a token.

Consults the input stream to obtain the lexeme for the specified token and converts that part of the stream to a string, which is returned. The lexeme for an EOF token is an empty string, although other tokens might also have empty lexemes. One may query the lexeme for any token regardless of whether it has been consumed; in fact, one can query lexemes even for tokens that the token stream itself has never produced.

The no-argument version returns the lexeme of the current token.

If you're trying to build a fast lexical analyzer, don't call this function to compare a lexeme against some known string. Instead, use match, which doesn't require copying.

Definition at line 182 of file Lexer.h.

References Sawyer::Container::LineVector::characters().
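
A sketch, reusing the hypothetical lexer from above:

    std::string word = lexer.lexeme();            // copy of the current token's text
    std::string peek = lexer.lexeme(lexer[1]);    // lexemes of lookahead tokens work too
    // For plain comparisons, prefer match, which avoids the copy:
    if (lexer.match("while"))
        lexer.consume();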

◆ lexeme() [2/2]

template<class T >
std::string Sawyer::Lexer::TokenStream< T >::lexeme ( )
inline

Return the lexeme for a token.

Consults the input stream to obtain the lexeme for the specified token and converts that part of the stream to a string, which is returned. The lexeme for an EOF token is an empty string, although other tokens might also have empty lexemes. One may query the lexeme for any token regardless of whether it has been consumed; in fact, one can query lexemes even for tokens that the token stream itself has never produced.

The no-argument version returns the lexeme of the current token.

If you're trying to build a fast lexical analyzer, don't call this function to compare a lexeme against some known string. Instead, use match, which doesn't require copying.

Definition at line 189 of file Lexer.h.

◆ isa() [1/2]

template<class T >
bool Sawyer::Lexer::TokenStream< T >::isa ( const Token &  t,
typename Token::TokenEnum  type 
)
inline

Determine whether token is a specific type.

This is sometimes easier to call since it gracefully handles EOF tokens. If called with only one argument, the desired type, then it checks the current token.

Definition at line 200 of file Lexer.h.
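
A sketch of both overloads, where TOK_NUMBER is a hypothetical member of the token enum:

    if (lexer.isa(TOK_NUMBER))                // checks the current token
        lexer.consume();
    if (lexer.isa(lexer[2], TOK_NUMBER)) {    // checks a lookahead token; EOF is handled gracefully
        // ...
    }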

◆ isa() [2/2]

template<class T >
bool Sawyer::Lexer::TokenStream< T >::isa ( typename Token::TokenEnum  type)
inline

Determine whether token is a specific type.

This is sometimes easier to call since it gracefully handles EOF tokens. If called with only one argument, the desired type, then it checks the current token.

Definition at line 204 of file Lexer.h.

◆ match() [1/2]

template<class T >
bool Sawyer::Lexer::TokenStream< T >::match ( const Token &  t,
const char *  s 
)
inline

Determine whether a token matches a string.

Compares the specified string to a token's lexeme and returns true if they are the same. This is faster than obtaining the lexeme as a std::string and comparing, since no string copying is involved.

The version that takes no token argument compares the string with the current token's lexeme.

Definition at line 217 of file Lexer.h.

References Sawyer::Container::LineVector::characters().
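
A sketch of both overloads:

    if (lexer.match("if"))                    // compare against the current token's lexeme
        lexer.consume();
    if (lexer.match(lexer[1], "("))           // compare against a lookahead token's lexeme
        lexer.consume(2);
    // Neither call constructs a std::string, unlike lexeme().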

◆ match() [2/2]

template<class T >
bool Sawyer::Lexer::TokenStream< T >::match ( const char *  s)
inline

Determine whether a token matches a string.

Compares the specified string to a token's lexeme and returns true if they are the same. This is faster than obtaining the lexeme as a std::string and comparing, since no string copying is involved.

The version that takes no token argument compares the string with the current token's lexeme.

Definition at line 226 of file Lexer.h.

◆ location()

template<class T >
std::pair< size_t, size_t > Sawyer::Lexer::TokenStream< T >::location ( size_t  position)
inline

Return the line number and offset for an input position.

Returns the zero-origin line number (a.k.a. line index) of the line containing the specified character position, and the offset of that character from the beginning of its line.

Definition at line 235 of file Lexer.h.

References Sawyer::Container::LineVector::location().
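
A sketch of diagnostic output built from these queries; it assumes the token type exposes the starting position of its lexeme via begin() (an assumption about T):

    std::pair<size_t, size_t> pos = lexer.location(lexer.current().begin());
    std::cerr << lexer.name() << ":" << (pos.first + 1) << ":" << (pos.second + 1)
              << ": unexpected token in line: " << lexer.lineString(pos.first) << "\n";
    // location() is zero-origin, hence the +1 for conventional 1-based reporting.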

◆ locationEof()

template<class T >
std::pair< size_t, size_t > Sawyer::Lexer::TokenStream< T >::locationEof ( )
inline

Returns the last line index and character offset.

Definition at line 240 of file Lexer.h.

References Sawyer::Container::LineVector::location(), and Sawyer::Container::LineVector::nCharacters().

◆ lineString()

template<class T >
std::string Sawyer::Lexer::TokenStream< T >::lineString ( size_t  lineIdx)
inline

Return the entire string for some line index.

Definition at line 246 of file Lexer.h.

References Sawyer::Container::LineVector::lineString().

◆ scanNextToken()

template<class T >
virtual Token Sawyer::Lexer::TokenStream< T >::scanNextToken ( const Container::LineVector content,
size_t &  at 
)
pure virtual

Function that obtains the next token.

Subclasses implement this function to obtain the next token that starts at or after the specified input position. Upon return, the function should adjust at to point to the next position for scanning a token, which is usually the first character after the returned token's lexeme. If the scanner reaches the end of input, or any condition that it deems to be the end, it should return the EOF token (a default-constructed token); after that, this function will not be called again.

Implemented in Sawyer::Document::Markup::TokenStream.
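
A hedged sketch of a minimal concrete subclass. It assumes the token type is Sawyer::Lexer::Token over a small user enum, that such tokens can be constructed from a type plus the begin/end positions of the lexeme, and that LineVector provides nCharacters() and characters() as referenced elsewhere on this page; consult Lexer.h for the exact interfaces.

    #include <Sawyer/Lexer.h>
    #include <cctype>

    enum MyTokenType { TOK_WORD, TOK_OTHER };                    // hypothetical token kinds
    typedef Sawyer::Lexer::Token<MyTokenType> MyToken;           // assumed token class template

    class MyLexer: public Sawyer::Lexer::TokenStream<MyToken> {
    public:
        explicit MyLexer(const std::string &input)
            : Sawyer::Lexer::TokenStream<MyToken>(input) {}

        MyToken scanNextToken(const Sawyer::Container::LineVector &content, size_t &at) override {
            // Skip white space; characters(at) is assumed to return a pointer to the
            // input characters starting at position `at`.
            while (at < content.nCharacters() && std::isspace((unsigned char)content.characters(at)[0]))
                ++at;
            if (at >= content.nCharacters())
                return MyToken();                                // default-constructed token == EOF
            size_t begin = at;
            if (std::isalpha((unsigned char)content.characters(at)[0])) {
                while (at < content.nCharacters() && std::isalnum((unsigned char)content.characters(at)[0]))
                    ++at;
                return MyToken(TOK_WORD, begin, at);             // assumed (type, begin, end) constructor
            }
            ++at;                                                // single-character token
            return MyToken(TOK_OTHER, begin, at);
        }
    };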


The documentation for this class was generated from the following file: