Package mars.assembler.token
Class Tokenizer
java.lang.Object
mars.assembler.token.Tokenizer
A tokenizer is capable of tokenizing a complete MIPS program, or a given line from
a MIPS program. Since MIPS is line-oriented, each line defines a complete statement.
Tokenizing is the process of analyzing the input MIPS program for the purpose of
recognizing each MIPS language element. The types of language elements are known as "tokens".
MIPS tokens are defined in the TokenType class.
Example:
here: lw $t3, 8($t4) #load third member of array
The above is tokenized as
IDENTIFIER, COLON, OPERATOR, REGISTER_NAME, COMMA, INTEGER_5, LEFT_PAREN, REGISTER_NAME, RIGHT_PAREN, COMMENT.
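The mapping from source text to token types can be illustrated with a toy, self-contained sketch. This is not the MARS implementation: the class name ToyTokenizerSketch, the short mnemonic list, the regular expression, and the placeholder type name INTEGER are all assumptions made for illustration. It also assumes that INTEGER_5 denotes a non-negative integer that fits in 5 bits (0 through 31).

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ToyTokenizerSketch {
    // Assumed mnemonic list; a real tokenizer would consult the full instruction set.
    private static final Set<String> MNEMONICS = Set.of("lw", "sw", "add", "addi");

    // Alternatives are tried in order at each position; unmatched characters
    // (such as whitespace) are silently skipped by Matcher.find().
    private static final Pattern TOKEN = Pattern.compile(
        "(?<comment>#.*)"
        + "|(?<register>\\$[A-Za-z0-9]+)"
        + "|(?<integer>[0-9]+)"
        + "|(?<word>[A-Za-z_.][A-Za-z0-9_.$]*)"
        + "|(?<punct>[:,()])");

    // Returns the token type names for one line, using the names from the example above.
    static List<String> tokenTypes(String line) {
        List<String> types = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            if (m.group("comment") != null) {
                types.add("COMMENT");
            } else if (m.group("register") != null) {
                types.add("REGISTER_NAME");
            } else if (m.group("integer") != null) {
                // Assumption: INTEGER_5 means the value fits in 5 bits (0..31).
                // "INTEGER" is a hypothetical placeholder for anything larger.
                BigInteger n = new BigInteger(m.group("integer"));
                types.add(n.compareTo(BigInteger.valueOf(31)) <= 0 ? "INTEGER_5" : "INTEGER");
            } else if (m.group("word") != null) {
                types.add(MNEMONICS.contains(m.group("word")) ? "OPERATOR" : "IDENTIFIER");
            } else {
                switch (m.group("punct")) {
                    case ":": types.add("COLON"); break;
                    case ",": types.add("COMMA"); break;
                    case "(": types.add("LEFT_PAREN"); break;
                    default:  types.add("RIGHT_PAREN"); break;
                }
            }
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(tokenTypes("here: lw $t3, 8($t4) #load third member of array"));
    }
}
```

Run on the example line, the sketch yields the same sequence of token types listed above. The real Tokenizer additionally records source locations, reports lexical errors to the AssemblerLog, and cooperates with the Preprocessor, none of which is modeled here.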
The original MARS tokenizer was written by Pete Sanderson in August 2003.
- Author:
- Pete Sanderson, August 2003; Sean Clarke, July 2024
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type / Method / Description
static int handleCharEscape(StringBuilder value, SourceLocation lineLocation, int columnIndex, String line, AssemblerLog log)
- Handle an escape for a character or string literal.
static int hexadecimalDigitValue(int ch)
- Interpret the given character as a hexadecimal digit, if possible.
static boolean isValidIdentifier(String value)
- COD2, A-51: "Identifiers are a sequence of alphanumeric characters, underbars (_), and dots (.) that do not begin with a number."
static SourceFile tokenizeFile(String filename, AssemblerLog log)
- Tokenize a complete MIPS program from a file, line by line.
static SourceFile tokenizeFile(String filename, AssemblerLog log, Preprocessor preprocessor)
static SourceLine tokenizeLine(String filename, String line, int lineIndex, AssemblerLog log, Preprocessor preprocessor)
- Tokenize one line of source code.
static SourceLine tokenizeLine(String filename, String line, int lineIndex, AssemblerLog log, Preprocessor preprocessor, boolean isInExpansionTemplate)
static SourceFile tokenizeLines(String filename, List<String> lines, AssemblerLog log)
- Tokenize a complete MIPS program, line by line.
static SourceFile tokenizeLines(String filename, List<String> lines, AssemblerLog log, boolean isInExpansionTemplate)
static SourceFile tokenizeLines(String filename, List<String> lines, AssemblerLog log, boolean isInExpansionTemplate, Preprocessor preprocessor)
-
Constructor Details
-
Tokenizer
public Tokenizer()
-
-
Method Details
-
tokenizeFile
Tokenize a complete MIPS program from a file, line by line. Each line of source code is translated into a SourceLine, which consists of both the original code and its tokenized form.
Note: Equivalences, includes, and macros are handled by the Preprocessor at this stage.
- Parameters:
filename
- The name of the file containing source code to be tokenized.
log
- The error list, which will be populated with any tokenizing errors in the given lines.
- Returns:
- The tokenized source file.
-
tokenizeFile
-
tokenizeLines
Tokenize a complete MIPS program, line by line. Each line of source code is translated into a SourceLine, which consists of both the original code and its tokenized form.
Note: Equivalences, includes, and macros are handled by the Preprocessor at this stage.
- Parameters:
filename
- The filename indicating where the given source code is from.
lines
- The source code to be tokenized.
log
- The error list, which will be populated with any tokenizing errors in the given lines.
- Returns:
- The tokenized source file.
-
tokenizeLines
public static SourceFile tokenizeLines(String filename, List<String> lines, AssemblerLog log, boolean isInExpansionTemplate)
-
tokenizeLines
public static SourceFile tokenizeLines(String filename, List<String> lines, AssemblerLog log, boolean isInExpansionTemplate, Preprocessor preprocessor)
-
tokenizeLine
public static SourceLine tokenizeLine(String filename, String line, int lineIndex, AssemblerLog log, Preprocessor preprocessor)
Tokenize one line of source code. If lexical errors are discovered, they are added to the given error list rather than being thrown as exceptions.
- Parameters:
filename
- The filename indicating where the given source code is from.
line
- The content of the line to be tokenized.
lineIndex
- The line index in the source file (for error reporting).
log
- The error list, which will be populated with any tokenizing errors in the given lines.
preprocessor
- The current preprocessor instance, which will process token substitutions.
- Returns:
- The generated tokens for the given line.
-
tokenizeLine
public static SourceLine tokenizeLine(String filename, String line, int lineIndex, AssemblerLog log, Preprocessor preprocessor, boolean isInExpansionTemplate)
-
isValidIdentifier
COD2, A-51: "Identifiers are a sequence of alphanumeric characters, underbars (_), and dots (.) that do not begin with a number."
DPS 14-Jul-2008: added '$' as valid symbol. Permits labels to include $. MIPS-target GCC will produce labels that start with $.
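The rule quoted above, including the later '$' extension, can be sketched as a small standalone check. This is an illustration only, not the MARS source: the class name IdentifierSketch and the helper isIdentifierChar are hypothetical, and the sketch assumes ASCII letters and digits and treats the empty string as invalid.

```java
final class IdentifierSketch {
    // Hypothetical helper: accepts ASCII letters, digits, '_', '.', and '$'.
    static boolean isIdentifierChar(char ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')
            || (ch >= '0' && ch <= '9') || ch == '_' || ch == '.' || ch == '$';
    }

    // Assumption: null and empty strings are rejected; the real method may differ.
    static boolean isValidIdentifier(String value) {
        if (value == null || value.isEmpty()) {
            return false;
        }
        char first = value.charAt(0);
        if (first >= '0' && first <= '9') {
            return false; // identifiers must not begin with a number
        }
        for (int i = 0; i < value.length(); i++) {
            if (!isIdentifierChar(value.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidIdentifier("here"));  // true
        System.out.println(isValidIdentifier("$L4"));   // true ('$' may start a label)
        System.out.println(isValidIdentifier("8ball")); // false (begins with a digit)
    }
}
```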
-
handleCharEscape
public static int handleCharEscape(StringBuilder value, SourceLocation lineLocation, int columnIndex, String line, AssemblerLog log)
Handle an escape for a character or string literal. It is assumed that columnIndex has already been incremented past the initial backslash.
- Parameters:
value
- The destination where the resulting character value will be appended.
lineLocation
- The filename of the current source file.
columnIndex
- The index in line of the character immediately following the initial backslash.
line
- The raw form of the current line of source code.
log
- The error list, which will be added to in case of an invalid character escape.
- Returns:
- The new value of columnIndex, which corresponds to the next character in the line.
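The calling convention described above (the index already points past the backslash, the decoded character is appended to the builder, and the index of the next unread character is returned) can be sketched as follows. This is a simplified stand-in, not the MARS method: the class name CharEscapeSketch is hypothetical, a List<String> replaces AssemblerLog, the SourceLocation parameter is omitted, and the set of recognized escapes is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class CharEscapeSketch {
    // Assumed escape set for illustration; the real tokenizer may accept more or fewer.
    static int handleCharEscape(StringBuilder value, int columnIndex, String line, List<String> errors) {
        if (columnIndex >= line.length()) {
            errors.add("escape sequence is cut off by end of line");
            return columnIndex;
        }
        char ch = line.charAt(columnIndex);
        switch (ch) {
            case 'n':  value.append('\n'); break;
            case 't':  value.append('\t'); break;
            case '0':  value.append('\0'); break;
            case '\\': value.append('\\'); break;
            case '"':  value.append('"');  break;
            case '\'': value.append('\''); break;
            default:
                // Report the problem instead of throwing, mirroring the error-list style.
                errors.add("unrecognized escape '\\" + ch + "' at column " + columnIndex);
                break;
        }
        return columnIndex + 1; // index of the next unread character
    }

    public static void main(String[] args) {
        String line = "\"a\\nb\""; // the six characters: " a \ n b "
        StringBuilder value = new StringBuilder();
        List<String> errors = new ArrayList<>();
        int next = handleCharEscape(value, 3, line, errors); // index 3 is the 'n'
        System.out.println(next);                            // 4
        System.out.println(errors.isEmpty());                // true
    }
}
```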
-
hexadecimalDigitValue
public static int hexadecimalDigitValue(int ch)
Interpret the given character as a hexadecimal digit, if possible. For digits A through F, both uppercase and lowercase are accepted.
- Parameters:
ch
- The character to interpret as a hexadecimal digit.
- Returns:
- The hexadecimal digit value in the range [0, 16), or -1 if ch is not a valid hexadecimal digit.
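A minimal sketch of a function with this contract is shown below. It is written from the description alone and is not taken from the MARS source; the class name HexDigitSketch is hypothetical, and only ASCII digits and letters are considered.

```java
final class HexDigitSketch {
    // Returns a value in [0, 16) for an ASCII hexadecimal digit, or -1 otherwise.
    static int hexadecimalDigitValue(int ch) {
        if (ch >= '0' && ch <= '9') {
            return ch - '0';
        }
        if (ch >= 'a' && ch <= 'f') {
            return ch - 'a' + 10;
        }
        if (ch >= 'A' && ch <= 'F') {
            return ch - 'A' + 10;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(hexadecimalDigitValue('7')); // 7
        System.out.println(hexadecimalDigitValue('a')); // 10
        System.out.println(hexadecimalDigitValue('F')); // 15
        System.out.println(hexadecimalDigitValue('g')); // -1
    }
}
```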
-