Booleans Part II
Introduction
In the previous chapter, we gave our parser the ability to parse and translate Boolean conditions. In this chapter, we will extend it to recognize complex conditions involving logical operators.
Goal
Complex conditions are one or more conditions, combined with logical operators which affect the final value of these combinations. Common logical operators are:
Operator | What it does |
---|---|
AND | Combines two conditions together, usually using the form <condition> AND <condition> . The combined expression evaluates to true if both the conditions are true , and false in every other case. |
OR | Combines two conditions together, usually using the form <condition> OR <condition> . The combined expression evaluates to true if either one or both the conditions are true , and false otherwise. |
NOT | Negates a single Boolean condition, using the form NOT <condition> . The combined expression is false if the condition is true , and vice versa. |
The condition, in each case, can be a simple condition or a complex one.
An expression can contain any permutation and combination of conditions and operator, such as <condition> AND <condition> OR <condition> AND NOT <condition>
.
In processing such an expression, NOT
takes precedence over AND
, which takes precedence over OR
. So, in the expression :
<condition 2B> OR NOT <condition 2> AND <condition B>
,first, <condition 2>
will be evaluated and negated, then the result would be AND
-ed (pardon the expression) with condition B, and finally the result of that will be OR
-ed with <condition 2B>
.
We will also allow for brackets to increase priority, as we did in the case of numeric expressions. So, in the expression:
(<condition 2B> OR NOT <condition 2>) AND <condition B>
, first <condition 2>
will be negated, then the result OR
-ed with <condition 2B>
, and then the result of that AND
-ed with <condition B>
.
The Approach
The approach is similar to the one we took for mathematical expressions; we will deal with operator precedence by splitting the complex condition into smaller parts, like the terms and factors in Chapter 3. So, let us define these parts:
- BooleanFactor: As of right now, a BooleanFactor is a Condition. We will expand this below.
- NotOperation: this can be two things; a BooleanFactor, or a NotOperator.
- NotOperator this is the word
NOT
(uppercase or lowercase), or the symbol!
, followed by a followed by a BooleanFactor. - AndOperation: this is a NotOperation, followed by zero or more _AndOperator_s.
- AndOperator: this is either the word
AND
(uppercase or lowercase), or the symbol&
, followed by a NotOperation. - OrOperation: this is an AndOperation, followed by zero or more _OROperator_s.
- OrOperator: this is either the word
OR
(uppercase or lowercase), or the symbol|
, followed by an AndOperation.
As you can see, this is exactly the same as the way we process mathematical expressions. The only thing missing in the definition so far are the brackets, for precedence. We will handle these in the definition of BooleanFactor, by recursively calling the topmost level, BooleanExpression, just like we did in the case of factors in mathematical expressions. BooleanExpression itself, currently equivalent to Condition, needs to be modified to call the top of the complex condition recursive-descent hierarchy: OrOperation.
Here's the complete BNF for boolean expressions.
<booleanexpression> ::= <oroperation>
<oroperation> ::= <andoperation><oroperator>*
<oroperator> ::= <orsymbol><andoperation>
<orsymbol> ::= "OR"|"or"|"|"
<andoperation> ::= <notoperation><andoperator>*
<andoperator> ::= <andsymbol><notoperation>
<andsymbol> ::= "AND"|"and"|"&"
<notoperation> ::= <booleanfactor>|<notoperator>
<notoperator> ::= <notsymbol><booleanfactor>
<notsymbol> ::= "NOT"|"not"|"!"
<booleanfactor> ::= <condition>|<bracketedbooleanexpression>
<bracketedbooleanexpression> ::= "("<booleanexpression>")"
<condition> ::= <expression><reloperator><expression>
<reloperator> ::= "="|"=="|"==="|"<>"|"!="|"!=="|
"<"|"<="|"=<"|">"|">=","=>"
First Step
Before we parse any of the parts defined above, we have to enable CodeGen to emit logical operators. Let's start with that.
Modifying CodeGen
There are two IL instructions which we need to use. These are:
Opcode | What it does |
---|---|
And | Pops the last two values on the stack, performs an AND operation on them, and pushes the value back on the stack. An AND operation is a bitwise operation, in which each corresponding binary digit of the two values are compared against the other, and the corresponding bit of the result is 1 if they both are 1, and 0 otherwise. This operation could be applied to any integer number. For example, 4 AND 6 evaluates to 4 (100 AND 110 in binary evaluates to 100 ), and 4 AND 2 evaluates to 0 (100 AND 010 = 000 ). Before using this opcode, we will ensure that the last two values pushed are 'Boolean' values; i.e., either 1 or 0. Thus the result will be 1 (true ) if both values pushed are 1, and 0 otherwise, which is exactly what we want. |
Or | Pops the last two values on the stack, performs an OR operation on them, and pushes the value back on the stack. OR is also a bitwise operation. Since we ensure that the opcode will be emitted only after two 'boolean' integer values, we will get the effect we want. |
For NOT
, we use the technique we already have; we negate the last value on the stack by comparing it with 0; in other words, we call the NegateComparison
method we have already written.
Add the following to CodeGen.vb:
Public Sub EmitBitwiseAnd()
m_ILGen.Emit(OpCodes.And)
End Sub
Public Sub EmitBitwiseOr()
m_ILGen.Emit(OpCodes.Or)
End Sub
Public Sub EmitLogicalNot()
NegateComparison()
End Sub
Parsing Complex Conditions
As we saw in the interlude chapter, BNF translates very nicely to parser code. The parts concerned with terminal symbols get handled by Is and Scan methods. Let's do those first.
Add the following to the appropriate parts of Parser.vb.
Private Function IsNotOperator(ByVal c As Char) As Boolean
Return "nNoOtT!".IndexOf(c) > -1
End Function
Private Function IsAndOperator(ByVal c As Char) As Boolean
Return "aAnNdD&".IndexOf(c) > -1
End Function
Private Function IsOrOperator(ByVal c As Char) As Boolean
Return "oOrR|".IndexOf(c) > -1
End Function
Private Sub ScanNotOperator()
m_CurrentTokenBldr = New StringBuilder
Do While IsNotOperator(LookAhead)
m_CurrentTokenBldr.Append(LookAhead)
m_CharPos += 1
Loop
Select Case CurrentToken.ToLowerInvariant()
Case "not", "!"
' Valid NOT operator
Case Else
m_CurrentTokenBldr = New StringBuilder
End Select
End Sub
Private Sub ScanAndOperator()
m_CurrentTokenBldr = New StringBuilder
Do While IsAndOperator(LookAhead)
m_CurrentTokenBldr.Append(LookAhead)
m_CharPos += 1
Loop
Select Case CurrentToken.ToLowerInvariant()
Case "and", "&"
' Valid AND operator
Case Else
m_CurrentTokenBldr = New StringBuilder
End Select
End Sub
Private Sub ScanOrOperator()
m_CurrentTokenBldr = New StringBuilder
Do While IsOrOperator(LookAhead)
m_CurrentTokenBldr.Append(LookAhead)
m_CharPos += 1
Loop
Select Case CurrentToken.ToLowerInvariant()
Case "or", "|"
' Valid OR operator
Case Else
m_CurrentTokenBldr = New StringBuilder
End Select
End Sub
And now for the parsers. As you have probably guessed, CodeGen gets called after successfully parsing the Operators.
Add the following to the Parser region of Parser.vb.
Private Function ParseBooleanFactor( _
Optional lastexpressiontype As Type = Nothing _
) As ParseStatus
Dim result As ParseStatus
If LookAhead.Equals("("c) Then
SkipCharacter()
result = ParseBooleanExpression()
If result.Code = 0 Then
If Not LookAhead.Equals(")"c) Then
result = CreateError(1, ")")
Else
SkipCharacter()
End If
End If
Else
result = ParseCondition(lastexpressiontype)
End If
SkipWhiteSpace()
Return result
End Function
Private Function ParseNotOperator() _
As ParseStatus
Dim result As ParseStatus
SkipWhiteSpace()
result = ParseBooleanFactor()
If result.Code = 0 Then
m_Gen.EmitLogicalNot()
End If
Return result
End Function
Private Function ParseNotOperation( _
Optional lastexpressiontype As Type = Nothing _
) As ParseStatus
Dim result As ParseStatus
' If lastexpressiontype has a value, it means we are halfway
' through parsing a boolean condition, and have just met a
' relational operator. A NOT operator cannot appear at this
' point, so we just continue to parse a boolean factor
If lastexpressiontype IsNot Nothing Then
result = ParseBooleanFactor(lastexpressiontype)
Else
If IsNotOperator(LookAhead) Then
ScanNotOperator()
If TokenLength = 0 Then
result = CreateError(1, _
"NOT")
Else
result = ParseNotOperator()
End If
Else
result = ParseBooleanFactor()
End If
End If
Return result
End Function
Private Function ParseAndOperator() _
As ParseStatus
Dim result As ParseStatus
SkipWhiteSpace()
result = ParseNotOperation()
If result.Code = 0 Then
m_Gen.EmitBitwiseAnd()
End If
Return result
End Function
Private Function ParseAndOperation( _
Optional lastexpressiontype As Type = Nothing _
) As ParseStatus
Dim result As ParseStatus
result = ParseNotOperation(lastexpressiontype)
Do While result.Code = 0 And _
IsAndOperator(LookAhead)
ScanAndOperator()
If TokenLength = 0 Then
result = CreateError(1, _
"AND")
Else
result = ParseAndOperator()
SkipWhiteSpace()
End If
Loop
Return result
End Function
Private Function ParseOrOperator() _
As ParseStatus
Dim result As ParseStatus
SkipWhiteSpace()
result = ParseAndOperation()
If result.Code = 0 Then
m_Gen.EmitBitwiseOr()
End If
Return result
End Function
Private Function ParseOrOperation( _
Optional lastexpressiontype As Type = Nothing _
) As ParseStatus
Dim result As ParseStatus
result = ParseAndOperation(lastexpressiontype)
Do While result.Code = 0 And _
IsOrOperator(LookAhead)
ScanOrOperator()
If TokenLength = 0 Then
result = CreateError(1, _
"OR")
Else
result = ParseOrOperator()
SkipWhiteSpace()
End If
Loop
Return result
End Function
Notice the optional parameter indicating the type of the last expression parsed, which gets passed right down the hierarchy to ParseCondition
. As we saw in the last chapter, this is because we don't know that we are in a boolean expression until we hit the first relational or logical operator.
Next, we must change our definition of ParseBooleanExpression
. Currently, it directly calls ParseCondition
. We need to modify it to call the top level of the complex condition recursive-descent hierarchy, which is ParseOrOperation
. Just like the BNF shows.
Make the change in Parser.vb.
Private Function ParseBooleanExpression( _
Optional lastexpressiontype As Type = Nothing _
) As ParseStatus
Dim result As ParseStatus
result = ParseOrOperation(lastexpressiontype)
If result.Code = 0 Then
m_LastTypeProcessed = _
Type.GetType( _
"System.Boolean")
End If
Return result
End Function
Finally, we need to change the very top and bottom of the boolean recursive-descent hierarchy, ParseExpression
and ParseCondition
respectively, to ensure that boolean expressions can be used at any level of nesting. We need to modify ParseExpression
to ensure the following two things:
- So far, we have defined that expressions can begin with either a numeric character (a digit or a + or - sign), or a string character (double quote). We will now also specify that an expression can begin with a NOT operator, in which case it is a boolean expression.
- If
ParseExpression
identifies (via the lookahead character) the expression to be numeric or string, it calls the relevant parse function, and then checks if the lookahead character is a relational operator. This check should not be performed at all ifParseExpression
has been called fromParseCondition
, because in that case,ParseCondition
itself will scan and parse the relational operator. For this change, we will also need to changeParseCondition
.
Here are the changed ParseCondition
and ParseExpression
functions. Replace them in Parser.vb.
Private Function ParseCondition( _
Optional lastexpressiontype As Type = Nothing _
) As ParseStatus
Dim result As ParseStatus
If lastexpressiontype Is Nothing Then
result = ParseExpression(Type.GetType("System.Boolean"))
Else
m_LastTypeProcessed = lastexpressiontype
result = CreateError(0,"")
End If
If result.Code = 0 Then
ScanRelOperator()
If TokenLength = 0 Then
result = CreateError(1, "a relational operator.")
Else
result = ParseRelOperator()
SkipWhiteSpace()
End If
End If
Return result
End Function
Private Function ParseExpression( _
Optional ByVal expressiontype _
As Type = Nothing) _
As ParseStatus
Dim result As ParseStatus
' Since we are doing the work of the scanner by using the
' lookahead character, we need to initialize the token
' builder
m_CurrentTokenBldr = New StringBuilder
If LookAhead.Equals(""""c) Then
result = ParseStringExpression()
ElseIf IsNumeric(LookAhead) Then
result = ParseNumericExpression()
ElseIf IsNotOperator(LookAhead) Then
result = ParseBooleanExpression()
ElseIf LookAhead.Equals("("c) Then
' For now, assuming only numeric expressions can use ()
result = ParseNumericExpression()
Else
result = CreateError(1, "a boolean, numeric or string expression")
End If
If result.Code = 0 Then
If expressiontype Is Nothing Then
' If a relational operator follows the expression just parsed
' we are in the middle of a Boolean expression. Our work is
' not yet done.
If IsRelOperator(LookAhead) Then
' Parse a boolean expression, letting it know that the
' first part has already been parsed.
result = ParseBooleanExpression(m_LastTypeProcessed)
End If
End If
End If
Return result
End Function
Testing
To test the large number of possibilities we have now enabled, we will once again take advantage of our compiler's ability to process whole files. Type the following into a file called testcomplex.txt. Make sure there are no blank lines in the file - not even at the end. A blank line is not a valid expression, and as such will be rejected by our compiler, which at this point parses expressions only.
"Testing complex boolean conditions"
"----------------------------------"
"1=1"
1=1
"1=1 and 1=1"
1=1 and 1=1
"1=1 and 1=2"
1=1 and 1=2
"1=2 and 1=1"
1=2 and 1=1
"1=2 and 1=2"
1=2 and 1=2
"1=1 or 1=1"
1=1 or 1=1
"1=2 or 1=1"
1=2 or 1=1
"1=1 or 1=2"
1=1 or 1=2
"1=2 or 1=2"
1=2 or 1=2
"not 1=2"
not 1=2
"not 1=1"
not 1=1
"1=1 or 1=2 and 1=1"
1=1 or 1=2 and 1=1
"1=2 or 1=2 and 1=1"
1=2 or 1=2 and 1=1
"1=1 and not 1=2"
1=1 and not 1=2
"1=2 or not 1=1"
1=2 or not 1=1
"1=1 and not (1=2 or not 1=1)"
1=1 and not (1=2 or not 1=1)
Now, compile our compiler. You may want to review the instructions in the Development Environment chapter.
Compile with:
vbc /out:Compiler.exe Compiler.vb Parser.vb CodeGen.vb ParseStatus.vb
Run using:
Compiler.exe testcomplex.txt
This will produce testcomplex.exe. When you run that, your output should look like this:
Testing complex boolean conditions
----------------------------------
1=1
True
1=1 and 1=1
True
1=1 and 1=2
False
1=2 and 1=1
False
1=2 and 1=2
False
1=1 or 1=1
True
1=2 or 1=1
True
1=1 or 1=2
True
1=2 or 1=2
False
not 1=2
True
not 1=1
False
1=1 or 1=2 and 1=1
True
1=2 or 1=2 and 1=1
False
1=1 and not 1=2
True
1=2 or not 1=1
False
1=1 and not (1=2 or not 1=1)
True
Caveat - Brackets
There is one subtle bug in our expression parser. Boolean expressions may have as many brackets as required, but an expression currently cannot start with a bracket. At the moment, ParseExpression
assumes that brackets can begin only a numeric expression.
We will leave this bug in, for the moment, and formally correct it later. The reader is welcome to solve it as an exercise in the meantime.
Note - Short Circuiting
As our complex conditions stand right now, simple conditions on both sides of a logical operator are evaluated in every case. However, most modern languages do not treat logical operators this way. Instead, they use a process called short-circuiting, which involves evaluating only as much of a complex condition as is necessary. For instance, if the condition on the left side of an AND
operation is false
, there is no need to check the one on the right; the complex condition is guaranteed to be false
.
This can turn out to be quite an advantage in many situations. For example, the expression:
1=2 AND 2=1/0
will cause a divide by zero run-time error in our current implementation. In an implementation which uses short-circuiting, the run-time error would never happen.
In fact, if you go by pure definitions, short-circuiting behavior is what is really called logical. The kind of behavior we have implemented is called bitwise. The only operator which currently behaves is a pure logical manner is NOT
.
However, our current behavior works for our purposes. So, at the moment, purity be damned, we will use the current implementation. In a later chapter, we will add two more operators which will emit true logical AND
and OR
behavior.
Conclusion
At this point, our compiler correctly processes boolean conditions of arbitrary complexity. However, comparing constants against one another is not very exciting. We already know the results. Booleans are really useful when they involve variables.
Variables will be introduced in Chapter 7. But just before that, we will take an important decision about the nature of the "language" that our compiler will compile, and introduce a subject which will implement that decision: statements.