Interlude - A Little Refactoring
Over eight chapters, our code has become a little unwieldy. I want to make a few (mostly cosmetic) changes, to bring things back under some sort of control.
Goal
In this chapter, we want to:
- Put a more formal interface around the current token. No more manipulating the m_CurrentTokenBldr and m_CharPos variables directly.
- Create a formal method for parsing and discarding "noise" words like As and Then. Currently, we have repeated code in multiple methods, and that irks me.
- Implement better error management. We use tables for commands, types and loops: why manage errors in a huge Select statement?
The Approach
We don't want to introduce any new functionality in this chapter. But since we are refactoring anyway, we will introduce a few more sanity checks in places that need them.
I will not show all changes "on-camera" in this chapter. When a change spans multiple methods or files, I'll just tell you what has been done.
Get Yer Token Here
Replace the entire Scanner region in Parser.vb with the following:
Private ReadOnly Property LookAhead() As Char
Get
Dim result As Char
If m_CharPos < m_LineLength Then
result = m_ThisLine.Chars(m_CharPos)
Else
result = " "c
End If
Return result
End Get
End Property
Private ReadOnly Property CurrentToken() As String
Get
Return m_CurrentTokenBldr.ToString()
End Get
End Property
Private ReadOnly Property TokenLength() As Integer
Get
Return m_CurrentTokenBldr.Length()
End Get
End Property
Private ReadOnly Property EndOfLine() As Boolean
Get
Return m_CharPos >= m_LineLength
End Get
End Property
Private ReadOnly Property CurrentLine() As Integer
Get
Return m_linePos
End Get
End Property
Private ReadOnly Property CurrentPosition() As Integer
Get
Return m_CharPos
End Get
End Property
Private Function PeekAhead(ByVal count As Integer) As String
Dim result As String
Dim charcount As Integer = m_LineLength - m_CharPos
If charcount <= 0 Then
result = ""
Else
If charcount > count Then charcount = count
result = m_ThisLine.Substring(m_CharPos,charcount)
End if
Return result
End Function
Private Sub SkipWhiteSpace()
Do While IsWhiteSpace(LookAhead)
If EndOfLine() Then
Exit Do
Else
SkipCharacter()
End If
Loop
End Sub
Private Sub SkipRestOfLine()
m_CharPos = m_LineLength
End Sub
Private Sub Backtrack()
If TokenLength > 0 Then
m_CharPos -= TokenLength
ResetToken()
End If
End Sub
Private Sub AppendToToken()
m_CurrentTokenBldr.Append(LookAhead)
m_CharPos += 1
End Sub
Private Sub ResetToken()
m_CurrentTokenBldr = New StringBuilder()
End Sub
Private Function ScanLine() As Boolean
Dim result As Boolean
Dim line As String
line = m_InputStream.ReadLine()
If line Is Nothing Then
result = False
Else
' set up line, line length,
' increment line counter,
' and set character position back to 0
m_ThisLine = line
m_LineLength = m_ThisLine.Length
m_linePos += 1
m_CharPos = 0
result = True
End If
Return result
End Function
Private Sub ScanNumber()
ResetToken()
Do While m_CharPos < m_LineLength
If Not IsNumeric(LookAhead) Then
Exit Do
End If
AppendToToken()
Loop
End Sub
Private Sub ScanMulOrDivOperator()
ResetToken()
If IsMulOrDivOperator(LookAhead) Then
AppendToToken()
End If
End Sub
Private Sub ScanAddOrSubOperator()
ResetToken()
If IsAddOrSubOperator(LookAhead) Then
AppendToToken()
End If
End Sub
Private Sub ScanString()
m_EmptyStringFlag = False
ResetToken()
If Not LookAhead.Equals(""""c) Then
Exit Sub
End If
Do While LookAhead.Equals(""""c)
SkipCharacter()
Do While Not LookAhead.Equals(""""c)
If EndOfLine Then
ResetToken()
Exit Sub
End If
AppendToToken()
Loop
SkipCharacter()
If LookAhead.Equals(""""c) Then
m_CurrentTokenBldr.Append(LookAhead)
End If
Loop
If TokenLength = 0 Then
m_EmptyStringFlag = True
End If
End Sub
Private Sub ScanConcatOperator
ResetToken()
If IsConcatOperator(LookAhead) Then
AppendToToken()
End If
End Sub
Private Sub ScanRelOperator
ResetToken()
Do While IsRelOperator(LookAhead)
AppendToToken()
Loop
Select Case CurrentToken
Case "=","==", "===", "<>", "!=", "!==", ">", "<", ">=", "=>","<=","=<"
' Valid relational operator
Case Else
ResetToken()
End Select
End Sub
Private Sub ScanNotOperator()
ResetToken()
Do While IsNotOperatorSymbol(LookAhead)
AppendToToken()
Loop
Select Case CurrentToken.ToLowerInvariant()
Case "not", "!"
' Valid NOT operator
Case Else
ResetToken()
End Select
End Sub
Private Sub ScanAndOperator()
ResetToken()
Do While IsAndOperator(LookAhead)
AppendToToken()
Loop
Select Case CurrentToken.ToLowerInvariant()
Case "and", "&"
' Valid AND operator
Case Else
ResetToken()
End Select
End Sub
Private Sub ScanOrOperator()
ResetToken()
Do While IsOrOperator(LookAhead)
AppendToToken()
Loop
Select Case CurrentToken.ToLowerInvariant()
Case "or", "|"
' Valid OR operator
Case Else
ResetToken()
End Select
End Sub
Private Sub ScanName()
ResetToken()
Do While IsNameCharacter(LookAhead)
AppendToToken()
If EndOfLine Then
Exit Do
End If
Loop
End Sub
Private Sub ScanAssignmentOperator()
ResetToken()
Do While IsAssignmentCharacter(LookAhead)
AppendToToken()
Loop
Select Case CurrentToken
Case "=", ":="
' Valid Assignment operator
Case Else
ResetToken()
End Select
End Sub
This is mostly a rearrangement, ensuring that there is no repeated code. There are a few new methods, notably SkipRestOfLine and PeekAhead. These will allow methods outside the Scanner section to play with the current character position without using the m_CharPos field.
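To get a feel for how PeekAhead and SkipRestOfLine behave, here is a minimal standalone sketch. The MiniScanner class and the PeekDemo module are hypothetical stand-ins of mine, not part of the compiler. Only the bodies of PeekAhead, SkipRestOfLine and EndOfLine follow the listing above.

```vbnet
Imports System

Module PeekDemo
    ' Hypothetical stand-in for the parser's scanner state.
    ' Field names mirror the chapter; the class itself is an assumption.
    Class MiniScanner
        Private m_ThisLine As String
        Private m_LineLength As Integer
        Private m_CharPos As Integer = 0

        Public Sub New(ByVal line As String)
            m_ThisLine = line
            m_LineLength = line.Length
        End Sub

        Public ReadOnly Property EndOfLine() As Boolean
            Get
                Return m_CharPos >= m_LineLength
            End Get
        End Property

        ' Return up to count characters without consuming them
        Public Function PeekAhead(ByVal count As Integer) As String
            Dim charcount As Integer = m_LineLength - m_CharPos
            If charcount <= 0 Then Return ""
            If charcount > count Then charcount = count
            Return m_ThisLine.Substring(m_CharPos, charcount)
        End Function

        ' Jump straight to the end of the current line
        Public Sub SkipRestOfLine()
            m_CharPos = m_LineLength
        End Sub
    End Class

    Sub Main()
        Dim scanner As New MiniScanner("Rem anything")
        Console.WriteLine(scanner.PeekAhead(3))        ' Rem
        Console.WriteLine(scanner.PeekAhead(100))      ' Rem anything
        Console.WriteLine(scanner.EndOfLine)           ' False
        scanner.SkipRestOfLine()
        Console.WriteLine(scanner.PeekAhead(3).Length) ' 0
        Console.WriteLine(scanner.EndOfLine)           ' True
    End Sub
End Module
```

Note that PeekAhead never moves the position and never reads past the end of the line, so callers can ask for more characters than are left without any extra checks.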
To use the new Scanner functions, change the IsNotOperator and ParseCommand methods in Parser.vb, as follows:
Private Function IsNotOperator(ByVal c As Char) As Boolean
Dim result As Boolean = False
If c = "!"c Then
result = True
ElseIf Char.ToLowerInvariant(c) = "n"c Then
If PeekAhead(3).ToLowerInvariant() = "not" Then
result = True
End If
End If
Return result
End Function
Private Function ParseCommand() As ParseStatus
Dim result As ParseStatus
If TokenLength = 0 Then
result = CreateError(1, "a valid command")
Else
Dim commandname As String = _
CurrentToken.ToLowerInvariant()
If commandname = "comment" Then
result = ParseCommentCommand()
ElseIf commandname = "end" Then
result = ParseEndCommand()
ElseIf m_inCommentBlock Then
' Ignore rest of line
SkipRestOfLine()
' All is good in a comment block
result = CreateError(0, "Ok")
Else
If IsValidCommand(commandname) Then
Dim parser as CommandParser = _
m_commandTable(commandname)
result = parser()
Else
result = CreateError(1, "a valid command")
End If
End If
End If
Return result
End Function
Also change ParseRemCommand in Commands.vb:
Private Function ParseRemCommand() As ParseStatus
' Ignore the rest of the line
SkipRestOfLine()
Return CreateError(0, "Ok")
End Function
With these changes, the Scanner region is almost its own class. I'm tempted to just make it so, but will resist for now.
Cut Out The Noise
Next, a formal method for parsing and ignoring noise words. This will replace the ad hoc code in the Dim command parser (noise word: As) and the If and ElseIf parsers (noise word: Then).
Add the following to Parser.vb:
Private Function ParseNoiseWord( _
ByVal word As String,
Optional ByVal silent As Boolean = True
) As ParseStatus
Dim result As ParseStatus = CreateError(0, "Ok")
If Not EndOfLine Then
' If the next token is the noise word
' followed by whitespace, skip it silently.
' If it isn't, raise an error if required,
' but don't move the current position.
Dim peekby As Integer = word.Length + 1
Dim peektoken As String = PeekAhead(peekby)
word = word.ToLowerInvariant()
peektoken = peektoken.ToLowerInvariant()
If peektoken.StartsWith(word) _
AndAlso _
(
word.Length = peektoken.Length _
OrElse _
IsWhiteSpace( _
peektoken.Chars(peektoken.Length-1) _
) _
) Then
' The token matches, and the next character
' is empty or whitespace. Scan and ignore
' the word
ScanName()
SkipWhiteSpace()
Else
If Not silent Then
result = CreateError(1, word)
End If
End If
End if
Return result
End Function
Private Function ParseLastNoiseWord( _
ByVal word As String _
) As ParseStatus
Dim result As ParseStatus
' The noise word should be the last word in
' a line
result = ParseNoiseWord(word, False)
If result.Code = 0 Then
If Not EndOfLine Then
result = CreateError(1, "end of statement")
End If
End If
Return result
End Function
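The matching rule inside ParseNoiseWord is easier to see in isolation. The sketch below pulls it out into a hypothetical StartsWithNoiseWord function that takes the upcoming text as a parameter instead of reading it through PeekAhead. It assumes Char.IsWhiteSpace is a fair substitute for the parser's own IsWhiteSpace.

```vbnet
Imports System

Module NoiseWordDemo
    ' Hypothetical helper: True when "upcoming" (the text at the scan
    ' position) starts with the noise word, followed either by
    ' whitespace or by the end of the line.
    Function StartsWithNoiseWord( _
        ByVal upcoming As String, _
        ByVal word As String _
    ) As Boolean
        ' Peek one character past the word, if the line is long enough
        Dim peekby As Integer = Math.Min(word.Length + 1, upcoming.Length)
        Dim peektoken As String = _
            upcoming.Substring(0, peekby).ToLowerInvariant()
        word = word.ToLowerInvariant()

        If Not peektoken.StartsWith(word, StringComparison.Ordinal) Then
            Return False
        End If
        ' Nothing follows the word: it ends the line
        If peektoken.Length = word.Length Then Return True
        ' Otherwise the extra character must be whitespace
        Return Char.IsWhiteSpace(peektoken.Chars(peektoken.Length - 1))
    End Function

    Sub Main()
        Console.WriteLine(StartsWithNoiseWord("Then", "Then"))        ' True
        Console.WriteLine(StartsWithNoiseWord("THEN  x", "Then"))     ' True
        Console.WriteLine(StartsWithNoiseWord("Thenceforth", "Then")) ' False
        Console.WriteLine(StartsWithNoiseWord("Else", "Then"))        ' False
    End Sub
End Module
```

The one-character peek past the word is what stops a name like Thenceforth from being mistaken for Then.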
Next, change the ParseDimCommand, ParseIfCommand and ParseElseIfCommand methods in Commands.vb:
Public Function ParseDimCommand() As ParseStatus
Dim result As ParseStatus
' Read a variable name
SkipWhiteSpace()
ScanName()
If TokenLength = 0 Then
result = CreateError(1, "a variable name.")
Else
Dim varname As String
varname = CurrentToken
If m_SymbolTable.Exists(varname) Then
' Variable name already declared
result = CreateError(3, CurrentToken)
Else
SkipWhiteSpace()
' Check and ignore "As"
ParseNoiseWord("As")
' Read type
ScanName()
Dim typename As String
typename = CurrentToken
If TokenLength = 0 OrElse _
Not IsValidType(typename) Then
result = CreateError(1, "a valid type.")
Else
Dim symbol As New Symbol( _
varname,
GetTypeForName(typename)
)
symbol.Handle = m_Gen.DeclareVariable( _
symbol.Name, _
symbol.Type
)
m_SymbolTable.Add(symbol)
SkipWhiteSpace()
result = ParseDeclarationAssignment(symbol)
If result.Code = 0 Then
If Not EndOfLine Then
result = CreateError(1, "end of statement.")
End If
End If
End If
End If
End If
Return result
End Function
Public Function ParseIfCommand() As ParseStatus
Dim result As ParseStatus
SkipWhiteSpace()
result = ParseBooleanExpression()
If result.Code=0 Then
SkipWhiteSpace()
result = ParseLastNoiseWord("Then")
If result.Code=0 Then
' Store old value of Else flag for nesting
Dim oldelseflag As Boolean = m_ElseFlag
' ElseFlag should be false
' at start of a new If block
m_ElseFlag = False
Dim endpoint As Integer = m_Gen.DeclareLabel()
' If the condition just emitted is false, emit jump
' to endpoint
m_Gen.EmitBranchIfFalse(endpoint)
' Parse the "If" block
Dim ifblock As Block = New Block( _
"if", _
endpoint, _
endpoint
)
result = ParseBlock(ifblock)
' If the block was successfully parsed, emit
' the endpoint label, and restore saved else
' flag for nesting
If result.Code = 0 Then
' If there is a dangling StartPoint, emit
' it first
If ifblock.StartPoint<>ifblock.EndPoint _
AndAlso _
Not m_ElseFlag Then
m_Gen.EmitLabel(ifblock.StartPoint)
End If
m_Gen.EmitLabel(ifblock.EndPoint)
m_ElseFlag = oldelseflag
End If
End If
End If
Return result
End Function
Private Function ParseElseIfCommand As ParseStatus
Dim result As ParseStatus
Dim currBlock As Block = m_BlockStack.CurrentBlock
' The ElseIf command can only be in an If block, and the
' Else flag should not be set
If currBlock Is Nothing _
OrElse _
currBlock.BlockType <> "if" _
OrElse _
m_ElseFlag Then
result = CreateError(6, "ElseIf")
Else
' If the endpoint is the same as the startpoint, this
' is the first ElseIf. Generate new endpoint. This
' will mark the end of the If block.
If currBlock.EndPoint = currBlock.StartPoint Then
currBlock.EndPoint = m_Gen.DeclareLabel()
End If
' Emit jump to the endpoint, because the ElseIf condition
' and block should not be processed if the If condition
' was true.
m_Gen.EmitBranch(currBlock.EndPoint)
' Emit the "start" point. This marks the start of the
' ElseIf block
m_Gen.EmitLabel(currBlock.StartPoint)
SkipWhiteSpace()
' Parse the ElseIf condition
result = ParseBooleanExpression()
' If successful
If result.Code = 0 Then
' "Eat" the optional "Then"
SkipWhiteSpace()
result = ParseLastNoiseWord("Then")
If result.Code = 0 Then
' Generate new "start" point. This will mark
' the next elseif statement, or an else statement
currBlock.StartPoint = m_Gen.DeclareLabel()
' If the condition is FALSE, jump to the
' "start" point
m_Gen.EmitBranchIfFalse(currBlock.StartPoint)
End If
End If
End If
Return result
End Function
That should do it.
Put Your Errors On The Table
Since all our table-driven stuff is in Commands.vb, let's move error handling there, too.
Delete the CreateError method from Parser.vb. Since that was the only method in the Helper Functions region, you might as well delete the region itself.
Next, add the following to the Fields and Helper Functions regions of Commands.vb, as appropriate:
Private m_ErrorTable As Dictionary(Of Integer, String)
Private Function CreateError( _
ByVal errorcode As Integer, _
ByVal errorDetail As String, _
Optional ByVal beforeToken As Boolean = False
) As ParseStatus
Dim result As ParseStatus
Dim message As String
Dim errorpos As Integer
If beforeToken Then
' Some errors happen
' at the scan position
errorpos = CurrentPosition
Else
' Others happen after a token
' has been scanned
errorpos = CurrentPosition - TokenLength
End If
' Columns should be 1-based for reporting
errorpos = errorpos + 1
If m_ErrorTable.ContainsKey(errorcode) Then
message = String.Format( _
m_ErrorTable(errorcode), _
errorDetail _
)
Else
message = String.Format(
"Unknown error code {0}",
errorcode
)
End if
result = New ParseStatus(errorcode, _
message, _
errorpos, _
CurrentLine)
Return result
End Function
Public Function BlockEnd() As ParseStatus
Return CreateError(-1, "", True)
End Function
Private Function Ok() As ParseStatus
Return CreateError(0, "", True)
End Function
Private Sub InitErrors()
m_ErrorTable = New Dictionary(Of Integer, String)
' Add Error Numbers here
m_ErrorTable.Add(-1, "{0}")
m_ErrorTable.Add(0, "{0}")
m_ErrorTable.Add(1, "Expected {0}")
m_ErrorTable.Add(2, "{0}")
m_ErrorTable.Add(3, "Cannot redeclare variable '{0}'")
m_ErrorTable.Add(4, "Variable '{0}' not declared")
m_ErrorTable.Add(5, "Type mismatch for Variable '{0}'")
m_ErrorTable.Add(6, "'{0}' was unexpected at this time")
End Sub
The new CreateError method pulls the error description formatting from a table, which is initialized in the InitErrors method. In the future, adding a new error will mean adding one line to InitErrors.
The new Ok and BlockEnd methods provide a convenient way of writing error values (status codes) that are not really errors.
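To see the table lookup on its own, here is a minimal standalone sketch. The ErrorTableDemo module and the FormatError function are hypothetical names of mine, and error code 7 is invented purely to show how a new error would be added. The real CreateError also fills in the line and column.

```vbnet
Imports System
Imports System.Collections.Generic

Module ErrorTableDemo
    Private m_ErrorTable As New Dictionary(Of Integer, String)

    ' Look up the template for a code and fill in the detail.
    ' Unknown codes fall back to a generic message.
    Function FormatError( _
        ByVal errorcode As Integer, _
        ByVal errorDetail As String _
    ) As String
        Dim template As String = Nothing
        If m_ErrorTable.TryGetValue(errorcode, template) Then
            Return String.Format(template, errorDetail)
        End If
        Return String.Format("Unknown error code {0}", errorcode)
    End Function

    Sub Main()
        m_ErrorTable.Add(1, "Expected {0}")
        m_ErrorTable.Add(3, "Cannot redeclare variable '{0}'")
        ' Hypothetical new error: one extra line, no other changes
        m_ErrorTable.Add(7, "Division by zero in '{0}'")

        Console.WriteLine(FormatError(1, "a valid command"))
        ' Expected a valid command
        Console.WriteLine(FormatError(3, "counter"))
        ' Cannot redeclare variable 'counter'
        Console.WriteLine(FormatError(7, "x / 0"))
        ' Division by zero in 'x / 0'
        Console.WriteLine(FormatError(99, ""))
        ' Unknown error code 99
    End Sub
End Module
```

The sketch uses TryGetValue where the chapter's code uses ContainsKey followed by an indexer lookup. Both behave the same here; TryGetValue just does the lookup once.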
InitErrors needs to be called somewhere. As usual, we'll call it from the Parser constructor. Modify the constructor in Parser.vb as follows:
Public Sub New( _
ByVal newStream As TextReader, _
ByVal newGen As CodeGen _
)
m_InputStream = newStream
m_Gen = newGen
InitErrors()
InitTypes()
InitLoops()
InitCommands()
End Sub
Finally, some hard work. Find all calls to the CreateError method where the first parameter is 0, and replace the call with Ok(). Do this in both Commands.vb and Parser.vb. After this, in the ParseEndCommand method in Commands.vb, find the call to the CreateError method where the first parameter is -1, and replace it with a call to BlockEnd().
Compile, and test with all the exercises of the previous chapters. Everything should work exactly the same as before.
Conclusion
We have refactored our code to be slightly more readable, and slightly easier to maintain. Only slightly, but it will keep us going for a chapter or two more at least. And now, on to new things.