Debug Builds
Introduction
Debugging has around as long as programming itself, and I suspect will be around for a while after programming becomes obsolete. Debuggers, though, came later.
To state the obvious, a debugger is a program that can be used to debug other programs. At the very least, when a program crashes, a debugger should be able tell us something about the state of the machine when it crashed. For so called low-level debuggers, this usually means the machine language instruction that was last executed, the current values in the machine’s registers (if the machine has registers), and the contents of memory. If the debugger is a source-level or symbolic debugger, like most modern ones, it should be able to show the original line of source code in whatever language the program was written in, and also the names and current values of the variables used (as opposed to just a memory dump).
Modern debuggers provide many more facilities, such as running a program one instruction at a time (this is called single stepping), running a program until a marked point in source code (called a breakpoint) is reached and monitoring values of variables(called watching). A good debugger can be a very useful tool, not just for removing bugs, but also for understanding how a program works.
I’m sure you have realized that compiling for the CLR has many advantages. In this chapter, we see yet another advantage: the CLR has good built-in facilities for debuggers. In fact, there are several good debuggers already available for the CLR, and programs in any language (that has a properly implemented compiler) can be debugged using any of them.
Goal
The goal of this chapter is to enhance our compiler such that SIC programs can be debugged using a CLR debugger. We will test with the two debuggers available in the .NET Framework SDK: It's 2019. Those debuggers don't exist any more. You can test with any debugger of your choice.cordbg
, the command-line debugger, and DbgCLR, the GUI debugger.
The Approach
The approach in this chapter is quite different from recent ones. We will have to add all sorts of new capabilities to our CodeGen class to enable debugging. How is code generation relevant to debugging? Let’s find out.
Recall what we know about the target of our compiler. SIC compiles down to CIL instructions. Imagine for a moment that ultimately, what get executed are the CIL instructions.1 So, if we were to “single- step” through our program, would each step be a CIL instruction? Or would the debugger somehow have to know that each “step” should correspond to one line of SIC, which may translate to multiple CIL instructions?
Some debuggers might choose to take the former approach. These are the so-called “low-level” debuggers. Usually, they work at the level of machine language instructions rather than CIL. Most modern debuggers are “source-level”, which means they take the second approach.
Here’s another related fact. Recall what we learned about variables in chapter 7. Variable names are important in parsing the source code, but not important at the CIL stage. In CIL, variables are numbered in the order of declaration. CIL instructions involving variables use these numbers. Now, when we use a debugger, we want to see the original names of the variables. How would the debugger correlate the variable numbers that are available in CIL to the names used in the source language?
Our compiler needs to emit some extra information while emitting CIL. Specifically, it needs to somehow record the information that a particular set of CIL instructions correspond to a particular line of SIC source code. Also, it needs to record the symbols (in our case, the variable names) from the source code, such that these names can be used by the debugger.
Luckily for us, the Reflection Emit library that we have been using to generate CIL has these facilities built-in. By setting some parameters and calling a few extra methods, we will be able to produce an executable file which contains some extra debug information along with the actual CIL. In the process, we will also produce a file with the extension .pdb or .mdb, which will contain additional information that cannot readily be built into the EXE itself, such as the symbols. An executable file with debug information and an accompanying .pdb or .mdb file is known as a debug build.
First Step
Producing debug builds using Reflection Emit is fairly easy. In chapter 1, we saw that to generate an executable, we had to create an Assembly via the AssemblyBuilder class, create a module inside that using a ModuleBuilder object, create a class (type) inside the module using an instance of the TypeBuilder class, create a method in that class using a MethodBuilder object, and finally generate code inside the method by using an ILGenerator object. For generating a debug build, the following steps are required:
- When a module is being created, we need to specify that all code generated for the module needs to emit the relevant symbols as well.
- We have to specify a symbol writer for the module. This is an additional repository of information that gets written into the .pdb (or .mdb) file mentioned above. It specifies the location of the source code that the module was written in, the actual source language, and the vendor of said language.
- When we emit the declaration of a variable, we need to emit the symbol information as well. The name of the variable as recognized in the source code gets written to the .pdb (or .mdb) file.
- As we parse each line of source code, we need to mark a sequence point in the resulting CIL. A sequence point is something like a label. It indicates that all CIL instructions after itself until the next sequence point correspond to a single “line” of source code. When a debugger is being used to single-step through the code, it will perform all CIL instructions after a sequence point till the next sequence point as a single step.
The first three steps are all modifications to CodeGen. Let’s do that first.
Modifying CodeGen
Add the following to CodeGen.vb. Place the Imports
statement with the other imports, and the rest inside the CodeGen class.
The constructor that is shown here replaces the existing constructor. We need to pass in some additional information that is useful for debug builds.
Imports System.Diagnostics.SymbolStore
Private m_debugBuild As Boolean
Private m_symbolStore As ISymbolDocumentWriter
Public ReadOnly Property DebugBuild() As Boolean
Get
Return m_debugBuild
End Get
End Property
Public Sub New( _
ByVal sourceFileName As String, _
ByVal outputFileName As String, _
ByVal assemblyName As String, _
ByVal debugBuild As Boolean _
)
m_SaveToFile = outputFileName
m_debugBuild = debugBuild
' Compiling a CLR language produces an assembly.
' An assembly has one or more modules.
' Each module has one or more types:
' (structures or classes)
' Each type has one or more methods.
' Methods are where actual code resides.
' Create a new assembly. An assembly must have a name.
' An assembly's name consists of up to four parts:
' a simple name, a four-part version number, the default
' Culture of the assembly, and optionally a public key token.
' At the moment, we will only set the simple name. The
' version number will default to 0.0.0.0, and the culture
' will default to neutral.
Dim an As New assemblyName()
an.Name = assemblyName
m_producedAssembly = _
AppDomain.CurrentDomain.DefineDynamicAssembly( _
an, AssemblyBuilderAccess.Save _
)
' In the newly created assembly, create a
' module with the same name.
' The last parameter of DefineDynamicModule
' indicates whether debug information will
' be emitted for this assembly
m_producedmodule = m_producedAssembly.DefineDynamicModule( _
assemblyName, _
outputFileName, _
m_debugBuild _
)
' If this is a debug build, create a symbol
' store for the module.
If m_debugBuild Then
m_symbolStore = m_producedmodule.DefineDocument( _
sourceFileName, _
Nothing, _
Nothing, _
SymDocumentType.Text _
)
End If
' In the newly created module, create a class called
' "MainClass".
m_producedtype = m_producedmodule.DefineType("MainClass")
' In MainClass, create a Shared (static) method
' with Public scope, called "MainMethod".
m_producedmethod = m_producedtype.DefineMethod(
"MainMethod", _
MethodAttributes.Public _
Or _
MethodAttributes.Static, _
Nothing, _
Nothing _
)
' All IL code that we produce will be contained
' in MainMethod.
m_ILGen = m_producedmethod.GetILGenerator()
End Sub
There are three important changes here. To begin with, we pass the source file name, the target file name, the assembly name, and a flag indicating whether a debug build should be generated. So far, we had only been passing the target file name. Why are we passing the extra parameters?
When a source-level debugger is used to debug the applications that we compile, it will need to display the source code. There are two choices available: we can write the complete source code into the symbol store (as mentioned above, this is a file with the .pdb or .mdb extension. The debugger needs access to the executable and the symbol store file to work properly), or we can simply tell the symbol store that the source code can be found in a particular source file. The Reflection Emit library encourages the second option. Thus, we need to pass in the name of the source file.
Every CLR assembly has a name that is used to identify it. As the comments in the code above state, this name can consist of up to four parts. So far, we had not given much importance to this, and every assembly we compiled was called MainAssembly. We cannot be so laissez-faire about this forever. So we’ll make a start towards formalizing assembly names in this chapter, and set the simple name part of the assembly to a non-hard-coded value. Expect to hear much more about this in a future chapter.
We start the creation of an assembly by calling DefineDynamicAssembly, and create a module inside it by calling DefineDynamicModule. The third parameter to DefineDynamicModule tells the Reflection Emit library whether debug information is to be emitted. If it is set to True
, appropriate information will be written to the symbol store.
The symbol store itself needs to be set up, and we do that by calling the DefineDocument method on the newly created ModuleBuilder object. The parameters for DefineDocument are, in order, are:
- A URL pointing to a file which contains the source code for this module. We will pass the name of the source file, without any path specification, to this parameter. At the time of debugging, the debugger will look for a file with that name in the same location as the executable file.
- A value specifying the programming language of the source code. Certain languages are recognized by the symbol store, including Basic, C# and COBOL. In our case, we pass
Nothing
. - A value specifying the vendor of the programming language. The symbol store system recognizes some vendors of programming languages. Okay, it recognizes one vendor by default: Microsoft. In our case, we pass
Nothing
. - A value specifying the "kind" of the file passed in the first parameter. The constant SymDocumentType.Text identifies it as a text document.
A corresponding change needs to be made to the Save method in the same class. This gets called after all code in a source file has been parsed successfully. The Save method actually saves the assembly, after marking the single method that we create as the starting point or entry point of the assembly. One additional thing needs to be done for debug builds.
Make the change in Commands.vb, as follows:
Public Sub Save()
' Emit a RETurn opcode, which is the last
' opcode for any method
m_ILGen.Emit(OpCodes.Ret)
' Actually create the type in the module
m_producedtype.CreateType()
' Specify that when the produced assembly
' is run, execution will start from
' the produced method (MainMethod). Also, the
' produced assembly will be a console
' application.
m_producedAssembly.SetEntryPoint( _
m_producedmethod, _
PEFileKinds.ConsoleApplication
)
' If this is a debug build, set the User
' Entry Point, which is the point where a
' debugger will start stepping
If m_debugBuild Then
Dim isMono As Boolean = _
Type.GetType("Mono.Runtime") IsNot Nothing
If Not isMono Then
m_producedmodule.SetUserEntryPoint( _
m_producedmethod _
)
End If
End If
m_producedAssembly.Save(m_SaveToFile)
End Sub
As stated in the comment, we specify a user entry point (the point where a debugger starts stepping into the code) if this is a debug build. If we miss this step, some debuggers will not be able to properly single-step through our compiled programs.
Notice the check using the variable called isMono? That is required because the Mono implementation of the Reflection Emit library does not implement ModuleBuilder.SetUserEntryPoint. This is unfortunate, as it will mean that we will have to manually invoke the debugger before our SIC programs can be debugged. But apart from that, everything else remains the same as other CLR implementations such as the .NET Framework.2
Next, make the following change to the DeclareVariable method in Commands.vb.
Public Function DeclareVariable( _
Name As String, _
VariableType As System.Type _
) As Integer
Dim lb As LocalBuilder
lb = m_ILGen.DeclareLocal(VariableType)
m_LocalVariables.Add(lb.LocalIndex, lb)
If m_debugBuild Then
lb.SetLocalSymInfo(Name)
End If
Return lb.LocalIndex
End Function
You may recall that when we first created the DeclareVariable method in Chapter 7, I had pointed out that we were not doing anything with the variable name. At that point, we had no use for the name, because CIL deals with the variable's index. Now, for debug builds only, we provide the variable name to the symbol store by calling the SetLocalSymInfo method on the newly create LocalBuilder object.
Finally, add the following method to the CodeGen class in Codegen.vb:
Public Sub EmitSequencePoint( _
ByVal startLine As Integer, _
ByVal startColumn As Integer, _
ByVal endLine As Integer, _
ByVal endColumn As Integer _
)
m_ILGen.MarkSequencePoint( _
m_symbolStore, _
startLine, _
startColumn, _
endLine, _
endColumn _
)
End Sub
This will allow us to place a sequence point in the CIL that we generate. A sequence point, as mentioned above, is somewhat like a label, but useful for debugging only.
A sequence point appears just before a set of CIL instructions. The corresponding symbol information states that the instructions from this sequence point up to the next one represent one statement in source code. This source code statement starts at a particular line and column and ends at a particular line and column in the source code file pointed at by our symbol store.
We will call this method from our parsers where relevant.
Changing the Compiler
Now that we have changed the constructor for our CodeGen class, we need to change the class that invokes it as well. Make the following changes to the Main method of the Compiler class, in Compiler.vb.
Public Function Main(Byval CmdArgs() As String) As Integer
Dim reader As TextReader
Dim gen As CodeGen
Dim status As ParseStatus
Dim parser As Parser
Console.WriteLine("Compiler for CLR")
If CmdArgs.Length=0 Then
reader = Console.In
gen = New CodeGen( _
"",
"Test.exe", _
"Test", _
False
)
Else
If File.Exists(CmdArgs(0)) Then
Dim finfo As New FileInfo(CmdArgs(0))
reader = New StreamReader( _
finfo.FullName)
gen = new CodeGen( _
finfo.FullName, _
finfo.Name.Replace( _
finfo.extension, _
".exe" _
), _
finfo.Name.Replace( _
finfo.Extension, _
"" _
), _
True _
)
Else
Console.Write( _
"Error: Could not find the file {0}.", _
CmdArgs(0))
Return 1
End If
End If
parser = New Parser(reader, gen)
status = parser.Parse()
With status
If .Code <> 0 Then
Console.WriteLine( _
"Error at line {0}, column {1} : {2}", _
.Row, _
.Column, _
.Description)
Else
gen.Save()
Console.WriteLine("Done.")
End If
End With
Return status.Code
End Function
If there is no command-line argument to our compiler, we create a CodeGen object passing an empty source file name, “Test.exe” as the target file name, “Test” as the assembly name, and no debug information. If there is an argument, and it is the name of a valid file, we create a CodeGen object with the full source file name, the same name but with the extension replaced with .exe as the target file name, the same name with the extension removed as the assembly name, and debug information. We borrow a trick from Visual Studio here, and set the assembly name to the name of the target file, minus the .EXE extension.
So, at the moment, if we compile a file containing source code, a debug build will be generated.
Changing the Parser
Now, all that remains is to emit sequence points at appropriate locations. The place to do that is the Parser class.
Sequence points are use to mark the start an end of statements. In the SIC language as defined so far, a statement is a line, and a line can be one of the following:
- a command
- a type-first declaration
- an assignment statement
We need to emit sequence points in the relevant parsers.
Now, there are certain lines where we should not emit sequence points. These are:
- Variable declarations, unless they also have an assignment.
- Lines containing the REM command
- Entire COMMENT blocks
- Lines containing ELSE or DEFAULT statements
- Lines containing REPEAT or LOOP statements
- END statements, except in the case of END REPEAT
- Lines containing CASE or ELSEIF statements
The reason is that none of these, except the last, actually generate any CIL code. The statements in the last case, too, do not generate CIL that exactly corresponds to that line. An explanation of the previous sentence can be found below.
But before all that, how do we actually emit the sequence point?
Add to following to the Helper Functions section in Parser.vb:
Private Sub GenerateSequencePoint()
' If this is a debug build
If m_Gen.DebugBuild Then
' Generate a sequence point that corresponds
' to the current source line. Line numbers in
' the symbol store are 1-based, and column
' numbers are 0-based. We have to pass the
' starting and ending lines and columns
' in the source for the statement. We always
' pass the current line, and the first and
' last column of the current line.
m_Gen.EmitSequencePoint( _
m_LinePos, _
0, _
m_LinePos, _
m_LineLength + 1 _
)
End If
End Sub
The comments say it all. The nice thing about having a line-oriented language is that we can make blanket assumptions about the start and end positions of the statement.
Now we need to call this at appropriate places. The easiest one is ParseAssignment. Modify the ParseAssignment method in Parser.vb as follows:
Private Function ParseAssignment(variable As Symbol) _
As ParseStatus
Dim result As ParseStatus
SkipWhiteSpace()
GenerateSequencePoint()
Select Case variable.Type.ToString()
Case "System.Int32"
result = ParseNumericExpression()
Case "System.String"
result = ParseStringExpression()
Case "System.Boolean"
result = ParseBooleanExpression()
Case Else
result = CreateError(1, " a valid type.")
End Select
If result.Code = 0 Then
' Generate assignment code
m_Gen.EmitStoreInLocal(variable.Handle)
End If
Return result
End Function
This will take care of every statement where an assignment occurs, including type-first declarations.
Next, we should generate a sequence point at the start of every command that immediately generates CIL. Currently, that means Print
, If
, While
, Exit
, Continue
, Until
, For
, and Select
. This being a long list, I won't show all the changes in-camera. You can check them out in the companion code. Here are a couple, for reference:
Private Function ParsePrintCommand() As ParseStatus
Dim result As ParseStatus
SkipWhiteSpace()
GenerateSequencePoint()
result = ParseExpression()
If result.code = 0 Then
If m_LastTypeProcessed.Equals( _
Type.GetType("System.Int32") _
) Then
m_Gen.EmitWriteLine()
ElseIf m_LastTypeProcessed.Equals( _
Type.GetType("System.String") _
) Then
m_Gen.EmitWriteLineString()
ElseIf m_LastTypeProcessed.Equals( _
Type.GetType("System.Boolean") _
) Then
m_Gen.EmitWriteLineBoolean()
End If
If Not EndOfLine Then
result = CreateError(1, "end of statement.")
End If
End If
Return result
End Function
Private Function ParseUntilCommand() As ParseStatus
Dim result As ParseStatus
' Until is only valid if we are currently in a repeat block
If m_BlockStack.IsEmpty _
OrElse _
m_BlockStack.CurrentBlock.BlockType <> "repeat" Then
result = CreateError(6, "Until")
Else
Dim repeatblock As Block = m_BlockStack.CurrentBlock
SkipWhiteSpace()
' There should be a boolean expression after Until
If EndOfLine Then
result = CreateError(1, "a boolean expression")
Else
GenerateSequencePoint()
result = ParseBooleanExpression()
If result.Code = 0 Then
' If the boolean condition evaluates to FALSE
' we must jump to the start point.
m_Gen.EmitBranchIfFalse(repeatblock.StartPoint)
' End the block
result = BlockEnd()
End If
End If
End If
Return result
End Function
Notice how GenerateSequencePoint is called just before the first CIL-emitting sub-parser. We have to make sure that the sequence point is generated before the CIL it targets.
NOP And Roll
Some important commands where sequence points have not been generated are Repeat
, Loop
, Next
, and Else
. The Repeat
, Loop
and Next
commands, if you remember the previous chapter, do not generate any CIL per se, and as such should not have sequence points. Else
has similar characteristics.
We cannot just emit a sequence point when we parse these statements. A sequence point, as discussed, is just a marker for CIL instructions corresponding to a line of source. We will have to generate some CIL to correspond to these statements.
This is a cue to introduce my favourite CIL instruction: nop. This instruction, much like me, does nothing.
For debug builds, every time we encounter:
- a
Repeat
or aLoop
statement - an
Else
or aDefault
statement - an
End
statement (exceptEnd Repeat
)
we should emit mark a sequence point and emit a nop.
Does this seem wasteful? Are we taking too much trouble and generating “bloated” executable files just to give what I think is a good debugging experience? Well, debugging needs to be as explicit and comfortable a process as possible. Our language syntax has some amount of noise-words, but that is a part of its design. Tweaking the debugger experience around the syntax will make debugging seem more natural.
Besides, we are in very good company. The Visual Basic and C# compilers do exactly this, for their own noise words. Anyway, mop instructions are not generated for non-debug builds.
Let’s add the nop instruction to our CodeGen class. While we are at it, we will add another instruction: break. This simply triggers the debugger. It’s the same instruction that is patched into any code by debuggers themselves to implement breakpoints.
Add the following to the CodeGen class, in CodeGen.vb:
Public Sub EmitNOP()
m_ILGen.Emit(OpCodes.Nop)
End Sub
Public Sub EmitBreak()
m_ILGen.Emit(OpCodes.Break)
End Sub
Next, add the following method to the Helper Functions region in Parser.vb:
Public Sub GenerateNOP()
' We need to generate a NOP instruction for
' some commands, only for debug builds
If m_Gen.DebugBuild Then
m_Gen.EmitNOP()
End If
End Sub
Finally, we have to modify the parser methods themselves. As before, I won't show all the changes in-camera. Three of them are shown for reference below.
Private Function ParseRepeatCommand() As ParseStatus
Dim result As ParseStatus
SkipWhiteSpace()
' There should be nothing after Repeat
If Not EndOfLine Then
result = CreateError(1, "end of statement")
Else
Dim startpoint As Integer
Dim endpoint As Integer
' Generate and emit startpoint
startpoint = m_Gen.DeclareLabel()
m_Gen.EmitLabel(startpoint)
' Note that the sequence point and the
' NOP are emitted AFTER the startpoint
' label. This will ensure that when the
' debugger jumps back, it will highlight
' the Repeat statement
GenerateSequencePoint()
GenerateNOP()
' Generate endpoint
endpoint = m_Gen.DeclareLabel()
' Parse the Repeat block
Dim repeatblock As New Block( _
"repeat", _
startpoint, _
endpoint _
)
result = ParseBlock(repeatblock)
If result.Code = 0 Then
' Emit endpoint
m_Gen.EmitLabel(repeatblock.EndPoint)
End If
End If
Return result
End Function
Private Function ParseElseCommand() As ParseStatus
Dim result As ParseStatus
SkipWhiteSpace()
' There should be nothing after Else on a line
If Not EndOfLine Then
result = CreateError(1, "end of statement")
Else
Dim currBlock As Block = m_BlockStack.CurrentBlock
' We should be in an If block, and the Else flag should
' not be set
If currBlock Is Nothing _
OrElse _
currBlock.BlockType <> "if" _
OrElse _
m_ElseFlag Then
result = CreateError(6, "Else")
Else
' The end point is the same as the start point
' This means that only an If statement has been
' parsed before. Generate a new end point
If currBlock.EndPoint = currBlock.StartPoint Then
currBlock.EndPoint = m_Gen.DeclareLabel()
End If
' Emit a jump to the end point
' Because a True If condition should never cause
' code in the Else block to be executed
m_Gen.EmitBranch(currBlock.EndPoint)
' Emit the "start" point (of the Else block now)
' because a False If condition should cause
' execution to continue after the Else command.
m_Gen.EmitLabel(currBlock.StartPoint)
' Note that the sequence point and the
' NOP are emitted AFTER the startpoint
' label. This will ensure that when the
' debugger jumps to the Else part of an
' If block, it will highlight the Else
' statement
GenerateSequencePoint()
GenerateNOP()
' Set the Else flag
m_ElseFlag = True
result = Ok()
End If
End If
Return result
End Function
Private Function ParseEndCommand() As ParseStatus
Dim result As ParseStatus
Dim block As Block = m_BlockStack.CurrentBlock
' We should be in a block when the End command
' is encountered.
If Not block Is Nothing Then
SkipWhiteSpace()
ScanName()
' The token after the End command should
' match the current block type
If block.IsOfType(CurrentToken) Then
' If this is 'End Repeat'
If CurrentToken.ToLowerInvariant() = "repeat" Then
result = ParseUntilCommand()
Else
' For any End command other than
' End Repeat, we need to generate
' a NOP.
GenerateSequencePoint()
GenerateNOP()
result = BlockEnd()
End If
Else
' Unless we are inside a comment block
If m_inCommentBlock Then
result = Ok()
Else
result = CreateError(1, block.BlockType)
End If
End If
Else
result = CreateError(2, "Not inside a block")
End If
Return result
End Function
As the comments in the code point out, the sequence point and accompanying nop get emitted after a startpoint label, if one exists. This is to ensure that after a jump, the debugger correctly highlights the target statement. Whereas sequence points and nops do not affect execution logic in any way, they have to be placed at the correct spots in the CIL for the debugging experience to be accurate. The comments show where the correct spot is located.
At this time, there remain only two statements that do not get sequence points generated: the Case
and ElseIf
statements. Unlike the ones that we just covered, these statements do result in CIL being generated, and therefore do not require a nop. However, there are several jumps and labels involved. To get it right, the sequence point has to be generated at the right spot.
Which is the right spot? The clue is there in the commands above: if a statement can be the target of a jump, the sequence point should be generated after the label marking the jump target.
Modify ParseElseIfCommand and ParseCaseCommand in Commands.vb as follows:
Private Function ParseElseIfCommand As ParseStatus
Dim result As ParseStatus
Dim currBlock As Block = m_BlockStack.CurrentBlock
' The ElseIf command can only be in an If block, and the
' Else flag should not be set
If currBlock Is Nothing _
OrElse _
currBlock.BlockType <> "if" _
OrElse _
m_ElseFlag Then
result = CreateError(6, "ElseIf")
Else
' If the endpoint is the same as the startpoint, this
' is the first ElseIf. Generate new endpoint. This
' will mark the end of the If block.
If currBlock.EndPoint = currBlock.StartPoint Then
currBlock.EndPoint = m_Gen.DeclareLabel()
End If
' Emit jump to the endpoint, because the ElseIf condition
' and block should not be processed if the If condition
' was true.
m_Gen.EmitBranch(currBlock.EndPoint)
' Emit the "start" point. This marks the start of the
' ElseIf block
m_Gen.EmitLabel(currBlock.StartPoint)
' Note that the sequence point is emitted AFTER the
' startpoint label. This will ensure that when the
' debugger jumps to an ElseIf in an If block, it will
' highlight the ElseIf statement
GenerateSequencePoint()
SkipWhiteSpace()
' Parse the ElseIf condition
result = ParseBooleanExpression()
' If successful
If result.Code = 0 Then
' "Eat" the optional "Then"
SkipWhiteSpace()
result = ParseLastNoiseWord("Then")
If result.Code = 0 Then
' Generate new "start" point. This will mark
' the next elseif statement, or an else statement
currBlock.StartPoint = m_Gen.DeclareLabel()
' If the condition is FALSE, jump to the
' "start" point
m_Gen.EmitBranchIfFalse(currBlock.StartPoint)
End If
End If
End If
Return result
End Function
Private Function ParseCaseCommand() As ParseStatus
Dim result As ParseStatus
If m_BlockStack.IsEmpty _
OrElse _
m_BlockStack.CurrentBlock.BlockType <> "switch" Then
Return CreateError(6, "Case")
End If
Dim switchblock As Block = m_BlockStack.CurrentBlock
' If default statement has already been processed
' get out
If m_SwitchState.DefaultFlag Then
Return CreateError(6, "Case")
End If
' This could be the first case statement, or the end
' of the previous case statement. In the latter case,
' we need to emit a jump to the current block's
' endingpoint before we start processing this case
' statement
If Not m_SwitchState.CaseFlag Then
' This is the end of the previous case statement
m_Gen.EmitBranch(switchblock.EndPoint)
End If
' If the fallthrough flag is set, we need to jump over
' the condition. So we generate a label and a jump
' Fallthrough is weird
Dim fallThroughLabel As Integer
If m_SwitchState.FallThroughFlag Then
fallThroughLabel = m_Gen.DeclareLabel()
m_Gen.EmitBranch(fallThroughLabel)
End If
' Emit the current startpoint if there is one
If switchblock.StartPoint <> -1 Then
m_Gen.EmitLabel(switchblock.StartPoint)
End If
' Note that the sequence point is emitted AFTER
' the startpoint label is emitted. This will ensure
' that when the debugger jumps to a Case in a Select
' block, it will highlight the Case statement
GenerateSequencePoint()
' Create new startpoint for the next case
switchblock.StartPoint = m_Gen.DeclareLabel()
SkipWhiteSpace()
' Depending on the type of the switch expression
' do the appropriate type of expression
With m_SwitchState
If .ExpressionType.Equals( _
GetType(System.Int32) _
) Then
result = ParseNumericExpression()
ElseIf .ExpressionType.Equals( _
GetType(System.String) _
) Then
result = ParseStringExpression()
ElseIf .ExpressionType.Equals( _
GetType(System.Boolean) _
) Then
result = ParseBooleanExpression()
Else
result = CreateError(1, "a valid expression")
End If
If result.Code = 0 Then
' Emit the Switch Expression
m_Gen.EmitLoadLocal(.ExpressionVariable)
' Emit comparison
If .ExpressionType.Equals( _
GetType(System.String) _
) Then
m_Gen.EmitStringEquality()
Else
m_Gen.EmitEqualityComparison()
End If
' If NOT equal, jump to the next Case
m_Gen.EmitBranchIfFalse(switchBlock.StartPoint)
' Emit the Fallthrough label if the Fallthrough
' flag is set, and reset the flag
If m_SwitchState.FallThroughFlag Then
m_Gen.EmitLabel(fallThroughLabel)
m_SwitchState.FallThroughFlag = False
End If
' Reset the case flag
m_SwitchState.CaseFlag = False
End If
End With
Return result
End Function
Stop
Some languages have a command which immediately invokes the debugger. Javascript, for example, has the debugger
command, and Visual Basic has Stop
. We might want to do this when debugging a very complex If
or Select
block: if the execution ever gets to this point, I want to know about it, so pop up the debugger.
Such a command becomes even more useful when paired with a condition. Usually, if the condition evaluates to false, then the debugger should be invoked. This technique is called assertion. An "assert" is an assumption or a condition that is intended to be true. If the condition evaluates to false (there is an assertion failure), there is something wrong. In such a case, ideally, a debugger should be invoked.
Of course, this statement should only work for debug builds. For non-debug builds, it should be silently ignored.
We will borrow from Visual Basic, and call our statement Stop
. But we will extend it to add an optional When clause, which can specify a condition as mentioned above, except that the debugger will be invoked if the condition is true.
The required CIL instruction, break, is already in CodeGen. So all we have to do is add a Stop
command parser. Add it to Commands.vb, as follows:
Private Function ParseStopCommand() As ParseStatus
Dim result As ParseStatus
' If not a debug build, ignore the Stop
' instruction
If Not m_Gen.DebugBuild Then
Return Ok()
End If
SkipWhiteSpace()
' If there is no When clause
If EndOfLine Then
' Emit a break
m_Gen.EmitBreak()
GenerateSequencePoint()
GenerateNOP()
result = Ok()
Else
result = ParseWhenClause()
If result.Code = 0 Then
Dim afterBreakPoint As Integer = _
m_Gen.DeclareLabel()
' If the condition evaluates to
' false, jump past the break
m_Gen.EmitBranchIfFalse(afterBreakPoint)
' Emit the break
m_Gen.EmitBreak()
GenerateSequencePoint()
GenerateNOP()
' Emit the label
m_Gen.EmitLabel(afterBreakPoint)
End If
End If
Return result
End Function
There are two things to note here. One, we generate code only if this is a debug build. Otherwise, the Stop statement is silently ignored as far as code generation is concerned.
Secondly, notice that we call GenerateSequencePoint and GenerateNOP. The problem is that the debugger stops at the first sequence point after a break instruction. We want this to represent the Stop
statement itself. Therefore, rather than put the sequence point before the break instruction, we put it after it, followed by a nop.
Finally, add Stop
to the list of valid commands.
And we’re done. Let us compile our compiler, and then test all this. So, compile.
The Test program
Create a new file called TestDebug.ssc, and add the following into it:
Stop
Comment
This program uses just about every feature of
our language. It should be fun to debug.
End Comment
REM Test Repeat
Dim i As Integer:=1
Repeat
print i
if i=3
i:=i+2
continue repeat
end if
exit repeat when i=8
i=i+1
end repeat i>10
print "Done testing Repeat"
REM stop
REM Test If
I := 22
if i>30 Then
print "more than 30"
elseif i>15 then
print "more than 15 but less than 31"
elseif i>10
print "more than 10 but less than 16"
elseif i<5
print "less than 5"
else
print "between 6 and 10 inclusive"
end if
print "Done testing If"
REM Test While
i = 1
while i<5
if i<2 then
i = i + 2
else
i = i + 1
end if
REM stop when i=4
end while
print "Done Testing While"
REM Test Loop
int j
j := 1
loop
print j
exit loop when j==4
j = j + 1
end loop
print "Done testing Loop"
REM Test For
Dim k As Integer
Print "Countdown..."
For k:=10 To 1 Step -1
Print k
Next
Print "Blast off"
For i=1 to 5 Step 2
For j:=i To i+1
Exit For When j==2
Print "j"
Print j
End For
If i==3 Then
Continue For
Else
Print "i"
Print i
End If
Next
Print "Done testing For"
REM Test Switch
Int ItemPrice := 800
Dim CardType As String = "Platinum"
Dim DiscountPercent As Integer
Switch ItemPrice/100
Case 10
DiscountPercent:= 10
Case 9
FallThrough
Case 8
FallThrough
Case 7
Switch CardType
Case "Platinum"
DiscountPercent:=10
Default
DiscountPercent := 8
End Switch
Default
DiscountPercent := 0
End Switch
Print DiscountPercent
Print "Done testing Switch"
Note the initial Stop
. If you have been using Mono, you have to do this to use a debugger. This is because the Mono implementation of the Reflection Emit library does not implement the SetUserEntryPoint method, and therefore does not provide us with the location where a debugger can start. The Stop
command at the very beginning will provide that location for the debugger. Thereafter, all debugger features can be used.
If you have been using the .NET Framework instead, the initial Stop
is not needed, but will not cause any problems if present.
Compile this with our compiler. We are now ready to debug.
Debugging
The first edition of this book had instructions for using some debuggers, which used to be included with the .NET Framework SDK. I don't think these debuggers exist any more. So, I won't provide any instructions here. There are a lot of choices available these days: Visual Studio, mdbg, sdb, MonoDevelop, Visual Studio Code and so on. Use any one of them. They should all allow single stepping through SIC code, and monitoring of all declared variables.
Compiler switches
One more modification and we can close out this chapter. Debug builds are very helpful during the development process. However, as we have seen, a lot of additional information is generated in a debug build. Once we have the program working exactly the way we want it, we don’t need to generate all that any more. In fact, they may (they will, believe me) slow down the program.
So, we will modify the Compiler class such that we can pass a command-line switch that indicates if a debug build is desired. If nothing is indicated, we will compile a non-debug build. While we are at it, we will add one more command-line switch as well, for specifying the name of the output file.
Add to following to Utilities.vb:
Public Class CommandArgs
Public SourceFile As String
Public TargetFile As String
Public AssemblyName As String
Public DebugBuild As Boolean
Public Valid As Boolean
Public Message As String
End Class
Next, add the following functions to Compiler.vb. The Main function provided here replaces the old one:
Private Function ParseCommandLine( _
ByVal CmdArgs() As String _
) As CommandArgs
Dim result As New CommandArgs
result.Message = "Please specify a source file."
Dim switch As String
Dim param As String
For Each cmd As String In CmdArgs
' If the command-line paramater starts
' with a / or a -, it's a switch. If
' it contains a : , then the part after
' the : is the paramter to the switch
If cmd.StartsWith("/") _
OrElse _
cmd.StartsWith("-") Then
cmd = Mid(cmd, 2)
Dim paramStart As Integer = _
InStr(cmd, ":")
If paramStart <> 0 Then
switch = Left( _
cmd, _
paramStart - 1 _
)
param = Mid(cmd, paramStart + 1)
Else
switch = cmd
param = ""
End If
Select Case switch.ToLowerInvariant()
Case "debug", "d"
result.DebugBuild = True
Case "out"
result.TargetFile = param
Case Else
result.Message = String.Format( _
"Error: Invalid switch - /{0}", _
switch _
)
Exit For
End Select
Else
' If the command-line parameter is not
' a switch, it is considered to be the
' name of the source file.
result.SourceFile = cmd
result.Valid = True
End If
Next
Return result
End Function
Public Function Main(Byval CmdArgs() As String) As Integer
Dim reader As TextReader
Dim gen As CodeGen
Dim status As ParseStatus
Dim parser As Parser
Console.WriteLine("Compiler for CLR")
If CmdArgs.Length=0 Then
reader = Console.In
gen = New CodeGen( _
"",
"Test.exe", _
"Test", _
False
)
Else
' Parse the command line. If a source
' file has not been specified, show
' an error message and quit
Dim args As CommandArgs = ParseCommandLine(CmdArgs)
If Not args.Valid Then
Console.WriteLine(args.Message)
Return 1
End If
If File.Exists(args.SourceFile) Then
Dim finfo As New FileInfo(args.SourceFile)
reader = New StreamReader( _
finfo.FullName)
' If no target file has been specified, use the same
' name as the source file with the extension changed
' to .exe
If args.TargetFile = "" Then
args.TargetFile = finfo.Name.Replace( _
finfo.Extension, _
".exe" _
)
Else
' Validate the target filename
Dim finfoValid As FileInfo
Try
finfoValid = New FileInfo(args.TargetFile)
Catch ex As Exception
Console.WriteLine("Error: {0}", ex.Message)
Return 1
End Try
' Target file name must have
' an extension
If finfoValid.Extension = "" Then
args.TargetFile = finfoValid.Name & ".exe"
End If
End If
args.AssemblyName = finfo.Name.Replace( _
finfo.Extension, _
"" _
)
gen = new CodeGen( _
finfo.FullName, _
args.TargetFile, _
args.AssemblyName, _
args.DebugBuild _
)
Else
Console.Write( _
"Error: Could not find the file {0}.", _
args.SourceFile _
)
Return 1
End If
End If
parser = New Parser(reader, gen)
status = parser.Parse()
With status
If .Code <> 0 Then
Console.WriteLine( _
"Error at line {0}, column {1} : {2}", _
.Row, _
.Column, _
.Description)
Else
gen.Save()
Console.WriteLine("Done.")
End If
End With
Return status.Code
End Function
As you can see, the command line of the compiler requires a mini-parser in itself.
We parse two possible command-line parameters here. If the parameter /debug or /d (or –debug or –d) is passed on the command line, our compiler will perform a debug build. The parameter /out:<file>
allows us to specify the name of the output file.
Compile our compiler, and then use it to compile a debug and a release build from TestDebug.txt using the following command lines:
sicc /debug TestDebug.txt
sicc TestDebug.txt /out:TestRelease.exe
Only TestDebug.exe will be accompanied by a .pdb (or .mdb) file.
Conclusion
The CLR has many more features for debugging and tracing applications, but we won’t cover them here. The purpose of this chapter was to show how to emit "debuggable" code from our compiler; I think that purpose has been achieved.
In the next chapter, we are going to take a break, and re-design our compiler as I promised in the last chapter. See you then.
1. This is NOT true. CIL instructions never actually get executed; they get compiled into native machine code. However, for now, just imagine. ↩
2. Mono has a library called Cecil, which is more powerful than Reflection Emit, and can be used for the same purpose. That library does implement SetUserEntryPoint. If Cecil had existed when I had started writing this book, maybe I would have implemented the CodeGen class using it rather than Reflection Emit. ↩