Let's build a transpiler! Part 25
This is the twenty-fifth post in a series of building a transpiler.You can find the previous ones here.
Last time I said we would parse array's subscripts.
Let's first create a class to hold both lower and upper subscripts and change Variable class to have a list of them:
Public Class SubscriptPair
Option Explicit
Private UpperBound_ As Token
Public LowerBound As Token
Public Property Get UpperBound() As Token
Set UpperBound = UpperBound_
End Property
Public Property Set UpperBound(ByVal Value As Token)
If Not UpperBound_ Is Nothing Then Set LowerBound = UpperBound_
Set UpperBound_ = Value
End Property
End Class
Public Class Variable
(...)
Public Static Property Get Subscripts() As KeyedList
Dim Hidden As New KeyedList
Set Subscripts = Hidden
End Property
End Class
Now, we need to deal with expressions. I've been postponing it because it has been proved to be a hard topic for me.
I've had several false starts and dead-ends. I don't know if I'm too dumb or just not up to the task of taming VB's expressions.
Anyway, expressions are things that are values or produce values. We'll create a bunch of classes to deal with them, but first, let's create an interface so we can link them together later.
Public Class IExpression
Option Explicit
Public Enum ExpressionKind
ekLiteral
ekSymbol
ekFileHandle
ekTuple
ekUnaryExpr
ekBinaryExpr
End Enum
Private Sub Class_Initialize()
Err.Raise 5
End Sub
Public Property Get Kind() As ExpressionKind
End Property
End Class
VB6 does not have proper interfaces. We create a class, then other classes declare they implement that class' "interface."
In order to not let it be instantiated by mistake, we're raising an error in its constructor.
Now back to the different ways we can have expressions. Right out of the bat, there are some tokens that are values/expressions:
Numbers, strings, and date literals are expressions, as are a few keywords like True, False, Nothing, Empty, Null, Me, and the pseudo-keyword Void that we'll use for missing parameters.
To distinguish these tokens from the others, we'll create a Literal class:
Public Class Literal
Option Explicit
Implements IExpression
Public Value As Token
Private Property Get IExpression_Kind() As ExpressionKind
IExpression_Kind = ekLiteral
End Property
End Class
Identifiers and escaped identifiers are expressions, too. We'll group them into a Symbol class:
Public Class Symbol
Option Explicit
Implements IExpression
Public Value As Token
Private Property Get IExpression_Kind() As ExpressionKind
IExpression_Kind = ekSymbol
End Property
End Class
Filehandles will be represented by the FileHandle class:
Public Class FileHandle
Option Explicit
Implements IExpression
Public Value As Token
Private Property Get IExpression_Kind() As ExpressionKind
IExpression_Kind = ekFileHandle
End Property
End Class
We can have expressions composed by unary operators - like Not something...
Public Class UnaryExpression
Option Explicit
Implements IExpression
Public Operator As Operator
Public Value As IExpression
Private Property Get IExpression_Kind() As ExpressionKind
IExpression_Kind = ekUnaryExpr
End Property
End Class
...and expressions composed by binary operators, like lhs And rhs:
Public Class BinaryExpression
Option Explicit
Implements IExpression
Public LHS As IExpression
Public Operator As Operator
Public RHS As IExpression
Private Property Get IExpression_Kind() As ExpressionKind
IExpression_Kind = ekBinaryExpr
End Property
End Class
For that to work, we need an Operator class:
Public Class Operator
Option Explicit
Public Value As Token
Public Property Get IsUnary() As Boolean
Select Case Value.Code
Case opAddressOf, opNew, opNot, opTypeOf, opId, opNeg, opWithDot, opWithBang
IsUnary = True
End Select
End Property
Public Property Get IsBinary() As Boolean
IsBinary = Not IsUnary
End Property
End Class
As I'm expecting our GetExpression function to grow a bit, I'm moving it to a new Expressionist class.
Because of that, I had to change the Fail, IsBreak, and IsOp methods from Private to Friend, and also all GetExpression calls to provide a Me argument.
An important change is that any call to NextToken immediately after a call to GetExpression has to be changed to recovering the token from Expresionnist's LastToken. The pattern is:
Dim Xp As Expressionist
Set Xp = New Expressionist
(...)
Set whatever = Xp.GetExpression(Me)
Set token = Xp.LastToken
Here it is Expressionist:
Private Class Expressionist
Option Explicit
Private LastToken_ As Token
Public Property Get LastToken() As Token
Set LastToken = LastToken_
End Property
Private Function Peek(ByVal Stack As KeyedList) As Variant
Set Peek = Stack(Stack.Count)
End Function
Private Function Pop(ByVal Stack As KeyedList) As Variant
Dim Index As Long
Index = Stack.Count
Set Pop = Stack(Index)
Stack.Remove Index
End Function
Public Function GetExpression(ByVal Parser As Parser) As IExpression
Rem TODO: Complete it.
End Function
End Class
Note that GetExpression now returns an IExpression instead of a Token.
Due to that, we'll need to update ParseConsts...
Set Cnt.Value = Xp.GetExpression(Me)
... and DataType...
Public FixedLength As IExpression
... and Parameter and Variable...
Public Init As IExpression
... and SubscriptPair...
Public Class SubscriptPair
Option Explicit
Private UpperBound_ As IExpression
Public LowerBound As IExpression
Public Property Get UpperBound() As IExpression
Set UpperBound = UpperBound_
End Property
Public Property Set UpperBound(ByVal Value As IExpression)
If Not UpperBound_ Is Nothing Then Set LowerBound = UpperBound_
Set UpperBound_ = Value
End Property
End Class
... and ConstConstruct and EnumerandConstruct...
Public Value As IExpression
... and ParseEnum:
Dim Lit As Literal
(...)
Set Lit = New Literal
Set Lit.Value = New Token
Lit.Value.Kind = tkIntegerNumber
Lit.Value.Text = CStr(Count)
Lit.Value.Suffix = "&"
Set Emd.Value = Lit
With that out of our way, let's deal with the code.
We will base ours on this Stack Overflow answer.
What follows is pretty much a translation from the answer to VB with these exceptions:
- We are taking advantage of the algorithm to disambiguate unary "+" and "-" from their binary counterparts.
- We don't have stack or queues, only KeyedList, so that's what we're using.
- Also, I've found it easier to have only "stacks", so no OutQueue.
- After we collected all expression tokens, we create an expression tree and return it as a result.
- We need to read one token too much to know when we're done with the expression. This token is made available to the calling code though the Expressionist's LastToken property.
Const EXPR_ERR = "Invalid expression"
Dim WantOperand As Boolean
Dim Cp As Integer
Dim Token As Token
Dim Sym As Symbol
Dim Lit As Literal
Dim Op As Operator
Dim Op2 As Operator
Dim OpStack As KeyedList
Dim OutStack As KeyedList
Dim Handle As FileHandle
Set OpStack = New KeyedList
Set OutStack = New KeyedList
WantOperand = True
Do
Set Token = Parser.NextToken
If WantOperand Then
WantOperand = False
Select Case Token.Kind
Case tkOperator
WantOperand = True
Select Case Token.Code
Case opSum
Token.Code = opId
Case opSubt
Token.Code = opNeg
Case opAddressOf, opNew, opNot, opTypeOf, opWithBang, opWithDot
Rem OK
Case Else
Parser.Fail Token, EXPR_ERR
End Select
Set Op = NewOperator(Token, Prefix)
OpStack.Add Op
Case tkLeftParenthesis
Rem Pseudo-operator
Set Op = NewOperator(Token)
OpStack.Add Op
WantOperand = True
Case tkIdentifier, tkEscapedIdentifier
Set Sym = New Symbol
Set Sym.Value = Token
OutStack.Add Sym
Case tkIntegerNumber, tkFloatNumber, tkSciNumber, tkBinaryNumber, _
tkOctalNumber, tkHexaNumber, tkString, tkDateTime
Set Lit = New Literal
Set Lit.Value = Token
OutStack.Add Lit
Case tkFileHandle
Set Handle = New FileHandle
Set Handle.Value = Token
OutStack.Add Handle
Case tkKeyword
Select Case Token.Code
Case kwTrue, kwFalse, kwNothing, kwEmpty, kwNull, kwMe
Set Lit = New Literal
Set Lit.Value = Token
OutStack.Add Lit
Case Else
Parser.Fail Token, EXPR_ERR
End Select
Case Else
Parser.Fail Token, EXPR_ERR
End Select
Else
If Parser.IsBreak(Token) Then
While OpStack.Count > 0
Move OpStack, OutStack
Wend
Exit Do
End If
Select Case Token.Kind
Case tkOperator
Select Case Token.Code
Case opAddressOf, opNew, opNot, opTypeOf, opCompAnd, opCompEqv, _
opCompImp, opCompMod, opCompOr, opCompXor, opCompSum, opCompSubt, opCompMul, _
opCompDiv, opCompIntDiv, opCompPow, opCompLSh, opCompRSh, opCompURSh, opCompConcat
Parser.Fail Token, EXPR_ERR
End Select
Set Op2 = NewOperator(Token)
Do While OpStack.Count > 0
Set Op = Peek(OpStack)
If Op.Value.Kind = tkLeftParenthesis Then Exit Do
Cp = ComparePrecedence(Op, Op2)
If Cp = -1 Then Exit Do
Move OpStack, OutStack, Op
Loop
OpStack.Add Op2
WantOperand = True
Case tkLeftParenthesis
Rem Apply operator
Set Token = New Token
Token.Kind = tkOperator
Token.Code = opApply
Set Op = NewOperator(Token)
OpStack.Add Op
Case tkRightParenthesis
Do While OpStack.Count > 0
Set Op = Peek(OpStack)
If Op.Value.Kind = tkLeftParenthesis Then Exit Do
If IsOp(Op.Value, opApply) Then Exit Do
Move OpStack, OutStack, Op
Loop
If OutStack.Count = 0 Then Parser.Fail Token, EXPR_ERR
Pop OpStack
Case tkListSeparator
Do While OpStack.Count > 0
Set Op = Pop(OpStack)
If Op.Value.Kind = tkLeftParenthesis Then Exit Do
Move OpStack, OutStack, Op
Loop
If OutStack.Count = 0 Then Parser.Fail Token, EXPR_ERR
WantOperand = True
Case Else
Parser.Fail Token, EXPR_ERR
End Select
End If
Loop
Do While OutStack.Count > 1
Set Op = Pop(OutStack)
If Op.Value.Kind = tkLeftParenthesis Then Exit Do
Move OpStack, OutStack, Op
Loop
Set LastToken_ = Token
Set GetExpression = Pop(OutStack)
End Function
Private Sub Move(ByVal OpStack As KeyedList, ByVal OutStack As KeyedList, Optional ByVal Op As Operator)
Dim IExpr As IExpression
Dim Uni As UnaryExpression
Dim Bin As BinaryExpression
If Op Is Nothing Then Set Op = Peek(OpStack)
If Op.IsUnary Then
Set Uni = New UnaryExpression
Set Uni.Operator = Op
Set Uni.Value = Pop(OutStack)
Set IExpr = Uni
Else
Set Bin = New BinaryExpression
Set Bin.Operator = Op
Set Bin.RHS = Pop(OutStack)
Set Bin.LHS = Pop(OutStack)
Set IExpr = Bin
End If
OutStack.Add IExpr
Pop OpStack
End Sub
Rem This goes into Globals module.
Public Function ComparePrecedence(ByVal LeftOp As Operator, ByVal RightOp As Operator) As Integer
Dim LHS As Integer
Dim RHS As Integer
LHS = Precedence(LeftOp)
RHS = Precedence(RightOp)
If LHS = RHS Then Exit Function
If LHS < RHS Then
ComparePrecedence = -1
Else
ComparePrecedence = 1
End If
End Function
Private Function Precedence(ByVal Op As Operator) As Integer
Select Case Op.Value.Code
Case opApply
Precedence = 19
Case opPow
Precedence = 18
Case opAddressOf, opNew
Precedence = 17
Case opId, opNeg, opDot, opBang, opWithDot, opWithBang, opTypeOf
Precedence = 16
Case opLSh, opRSh, opURSh
Precedence = 15
Case opMul, opDiv
Precedence = 14
Case opIntDiv
Precedence = 13
Case opMod
Precedence = 12
Case opSum, opSubt
Precedence = 11
Case opConcat
Precedence = 10
Case opGt, opGe, opEq, opLe, opLt, opNe, opIsNot, opIs, opLike, opTo
Precedence = 9
Case opNot
Precedence = 8
Case opAnd, opAndAlso
Precedence = 7
Case opOr, opOrElse
Precedence = 6
Case opXor
Precedence = 5
Case opEqv
Precedence = 4
Case opImp
Precedence = 3
Case opNamed
Precedence = 2
Case opCompSum, opCompSubt, opCompMul, opCompDiv, opCompIntDiv, opCompPow, opCompConcat, opCompLSh, _
opCompRSh, opCompURSh, opCompAnd, opCompEqv, opCompImp, opCompMod, opCompOr, opCompXor
Precedence = 1
Case Else
Debug.Assert False
End Select
End Function
Rem This goes into Messages.
Public Property Get Msg065() As String
Msg065 = "Invalid expression"
End Property
Now we will enhance GetExpression to deal with subscripts. Here it is the updated code:
Rem Add it to classe's declaration area.
Public CanHaveTo As Boolean
Public Function GetExpression(ByVal Parser As Parser) As IExpression
Dim HadTo As Boolean
Dim WantOperand As Boolean
Dim Cp As Integer
Dim Token As Token
Dim Sym As Symbol
Dim Lit As Literal
Dim Op As Operator
Dim Op2 As Operator
Dim OpStack As KeyedList
Dim OutStack As KeyedList
Dim Handle As FileHandle
Set OpStack = New KeyedList
Set OutStack = New KeyedList
WantOperand = True
Do
Set Token = Parser.NextToken
If WantOperand Then
WantOperand = False
Select Case Token.Kind
Case tkOperator
WantOperand = True
Select Case Token.Code
Case opSum
Token.Code = opId
Case opSubt
Token.Code = opNeg
Rem Remaining unary operators.
Case opAddressOf, opNew, opNot, opTypeOf, opWithBang, opWithDot
Rem OK
Case Else
Exit Do
End Select
Set Op = NewOperator(Token)
OpStack.Add Op
Case tkLeftParenthesis
Rem Pseudo-operator.
Set Op = NewOperator(Token)
OpStack.Add Op
WantOperand = True
Case tkIdentifier, tkEscapedIdentifier
Set Sym = New Symbol
Set Sym.Value = Token
OutStack.Add Sym
Case tkIntegerNumber, tkFloatNumber, tkSciNumber, tkBinaryNumber, _
tkOctalNumber, tkHexaNumber, tkString, tkDateTime
Set Lit = New Literal
Set Lit.Value = Token
OutStack.Add Lit
Case tkFileHandle
Set Handle = New FileHandle
Set Handle.Value = Token
OutStack.Add Handle
Case tkKeyword
Select Case Token.Code
Case kwTrue, kwFalse, kwNothing, kwEmpty, kwNull, kwMe
Set Lit = New Literal
Set Lit.Value = Token
OutStack.Add Lit
Case Else
Exit Do
End Select
Case Else
Exit Do
End Select
Else
If Parser.IsBreak(Token) Then
While OpStack.Count > 0
Move OpStack, OutStack
Wend
Exit Do
End If
Select Case Token.Kind
Case tkOperator
Rem Unary and compound operators.
Select Case Token.Code
Case opAddressOf, opNew, opNot, opTypeOf, opCompAnd, opCompEqv, opCompImp, _
opCompMod, opCompOr, opCompXor, opCompSum, opCompSubt, opCompMul, opCompDiv, _
opCompIntDiv, opCompPow, opCompLSh, opCompRSh, opCompURSh, opCompConcat
Parser.Fail Token, Msg065
End Select
Set Op2 = NewOperator(Token)
Do While OpStack.Count > 0
Set Op = Peek(OpStack)
If Op.Value.Kind = tkLeftParenthesis Then Exit Do
Cp = ComparePrecedence(Op, Op2)
If Cp = -1 Then Exit Do
Move OpStack, OutStack, Op
Loop
OpStack.Add Op2
WantOperand = True
Case tkLeftParenthesis
Set Token = New Token
Token.Kind = tkOperator
Token.Code = opApply
Set Op = NewOperator(Token)
OpStack.Add Op
Case tkRightParenthesis
Do While OpStack.Count > 0
Set Op = Peek(OpStack)
If Op.Value.Kind = tkLeftParenthesis Then Exit Do
If Parser.IsOp(Op.Value, opApply) Then Exit Do
Move OpStack, OutStack, Op
Loop
If OpStack.Count = 0 Then Exit Do
Pop OpStack
Case tkKeyword
Select Case Token.Code
Case kwTo
If Not CanHaveTo Or HadTo Then Err.Raise vbObjectError + 13
HadTo = True
Token.Kind = tkOperator
Token.Code = Parser.Scanner.Operators.IndexOf(vTo)
OpStack.Add NewOperator(Token)
WantOperand = True
Case Else
Debug.Assert False
End Select
Case Else
Exit Do
End Select
End If
Loop
Set LastToken_ = Token
Do While OutStack.Count > 1
Set Op = Peek(OpStack)
If Op.Value.Kind = tkLeftParenthesis Then Exit Do
Move OpStack, OutStack, Op
Loop
If OutStack.Count = 1 Then Set GetExpression = Pop(OutStack)
End Function
You may have noticed we introduced operators To and "apply":
Public Enum OperatorConstants
(...)
opOrElse
opTo
opTypeOf
(...)
opApply
End Enum
Private Sub Class_Initialize()
(...)
Values = Array(vAddressOf, vAndAlso, vIs, vIsNot, vLike, vNew, vNot, vOrElse, vTo, vTypeOf, _
(...), "&=", "")
(...)
End Sub
Now we can change ParseDim to take variable's subscripts into consideration:
Private Sub ParseDim( _
ByVal Access As Accessibility, _
ByVal Entity As Entity, _
Optional ByVal InsideProc As Boolean, _
Optional ByVal Token As Token _
)
Dim WasArray As Boolean
Dim Var As Variable
Dim Xp As Expressionist
Dim Expr As IExpression
Dim Bin As BinaryExpression
Dim Uni As UnaryExpression
Dim Subs As SubscriptPair
Dim Lit As Literal
Dim Tkn As Token
If InsideProc Then: If Access = AccessPublic Or Access = AccessPrivate Then Fail Token, Msg063
If Token Is Nothing Then Set Token = NextToken
Set Xp = New Expressionist
Xp.CanHaveTo = True
Do
Set Var = New Variable
Var.Access = Access
If IsKw(Token, kwWithEvents) Then
If Not Entity.IsClass Then Fail Token, Msg016
If InsideProc Then Fail Token, Msg063
Var.HasWithEvents = True
Set Token = NextToken
End If
If Not IsProperId(Token, CanHaveSuffix:=True) Then Fail Token, Msg061, Msg003
Set Var.Id.Name = Token
Set Token = NextToken
WasArray = False
If Token.Kind = tkLeftParenthesis Then
Do
Set Expr = Xp.GetExpression(Me)
Set Token = Xp.LastToken
If Not Expr Is Nothing Then
Select Case Expr.Kind
Case ekLiteral, ekSymbol, ekUnaryExpr
Set Subs = New SubscriptPair
Set Subs.LowerBound = SynthLower(Entity)
Set Subs.UpperBound = Expr
Case ekBinaryExpr
Set Bin = Expr
Set Subs = New SubscriptPair
If IsOp(Bin.Operator.Value, opTo) Then
Set Subs.LowerBound = Bin.LHS
Set Subs.UpperBound = Bin.RHS
Else
Set Subs.LowerBound = SynthLower(Entity)
Set Subs.UpperBound = Expr
End If
Case Else
Fail Token, Msg065
End Select
End If
Var.Subscripts.Add Subs
If Token.Kind <> tkListSeparator Then Exit Do
Loop
If Token.Kind <> tkRightParenthesis And _
Xp.LastToken.Kind <> tkRightParenthesis Then _
Fail Token, Msg057
WasArray = True
Set Token = NextToken
End If
If IsKw(Token, kwAs) Then
If Var.Id.Name.Suffix <> vbNullChar Then Fail Token, Msg024
Set Token = NextToken
If IsOp(Token, kwNew) Then
Var.HasNew = True
Set Token = NextToken
End If
If Not IsProperDataType(Token) Then Fail Token, Msg061, Msg025
Set Var.DataType = NewDataType(Token)
If Var.HasNew And Var.DataType.Id.Name.Kind = tkKeyword Then _
Fail Token, Msg062, Msg059
Set Token = NextToken
If IsOp(Token, opDot) Then
Set Token = NextToken
If Not IsProperDataType(Token) Then Fail Token, Msg061, Msg003
Set Var.DataType.Id.Name = Token
Set Token = NextToken
End If
ElseIf Var.Id.Name.Suffix <> vbNullChar Then
Set Var.DataType = FromChar(Var.Id.Name.Suffix)
Else
Set Var.DataType = Entity.DefTypes(NameOf(Var.Id.Name))
End If
Var.DataType.IsArray = WasArray
If Var.HasNew And Var.DataType.IsArray Then Fail Token, Msg064
If IsOp(Token, opEq) Then
Set Var.Init = Xp.GetExpression(Me)
Set Token = Xp.LastToken
End If
Entity.Vars.Add Var, NameOf(Var.Id.Name)
If IsBreak(Token) Then Exit Do
If Token.Kind <> tkListSeparator Then Fail Token, Msg061, ","
Set Token = NextToken
Loop
End Sub
It depends on two utilities: SynthLower and NewOperator:
Rem This belongs to Parser
Private Function SynthLower(ByVal Entity As Entity) As IExpression
Dim Lit As Literal
Dim Tkn As Token
Set Tkn = New Token
Tkn.Kind = tkIntegerNumber
Tkn.Text = CStr(Entity.OptionBase)
Set Lit = New Literal
Set Lit.Value = Tkn
Set SynthLower = Lit
End Function
Rem This belongs to Globals module
Public Function NewOperator(ByVal Token As Token) As Operator
Dim Result As Operator
Set Result = New Operator
Set Result.Value = Token
Set NewOperator = Result
End Function
Finally we are able to parse something like "Private Letters_(0 To LAST_INDEX) As Token", but not "ReDim CodePoints_(0 To Size \ SizeOf(kwInteger) - 1) As Integer".
We are not able to deal with function calls yet.
Next week, we'll parse Types and wrap up with the declaration area.
Andrej Biasic
2021-02-10