Let's build a transpiler! Part 31
This is the thirty-first post in a series of building a transpiler.You can find the previous ones here.
Last time I said we'd parse implicit Let and Call statements.
I was planning to tackle For, While, etc., but ended up playing with this instead, so be it!
Every VB line statement starts with a keyword, except for implicit Let and Call ones.
When we start parsing a new line, if we get an identifier instead of a keyword, it can be that it will be assigned some value further on.
I'm talking about something like this:
A = B
This is equivalent to an explicit Let:
Let A = B
We can't be sure until we find that equal sign in the correct place, though.
But there's a second option to getting an identifier. Maybe it is an implicit call to some procedure?
A B
This is equivalent to the following explicit Call:
Call A(B)
At first, it may seem that to figure it out the only thing we need to do is check what comes next after the identifier, but consider the following snippet:
A.B(C) = D
In this case, what comes next is a dot operator...
So, we must consider that we have an expression (instead of an identifier) that happens to start with an identifier, and then we'll have an equal sign.
But what if we have something like the snippet below?
A(B).C D
Then we have an expression that ends up being an implicit call to some method C having D as argument.
But before jumping to code, let's see one more ambiguity we have so far.
Visual Basic does not have an assignment operator that's different from the equal-to operator as other languages have.
If your background is C, the code below will be misleading:
A = B = C = D
What it does in VB is:
- Compare B to C.
- Compare the result of the comparison above with D.
- Assign the result of the second comparison above to A.
Let's start by completing LetConstruct and CallConstruct classes:
Public Class CallConstruct
Option Explicit
Implements IStmt
Implements IExpression
Private Arguments_ As KeyedList
Public LHS As IExpression
Private Sub Class_Initialize()
Set Arguments_ = New KeyedList
Set Arguments_.T = New ExprValidator
End Sub
Public Property Get Arguments() As KeyedList
Set Arguments = Arguments_
End Property
Private Property Get IExpression_Kind() As ExpressionKind
IExpression_Kind = ekIndexer
End Property
Private Property Get IStmt_Kind() As StmtNumbers
IStmt_Kind = snCall
End Property
End Class
Public Class LetConstruct
Option Explicit
Implements IStmt
Public Name As IExpression
Public Operator As Operator
Public Value As IExpression
Private Property Get IStmt_Kind() As StmtNumbers
IStmt_Kind = snLet
End Property
End Class
Now we'll create a GetStmt procedure in Expressionist.
Its GetExpression method is unable to handle function calls - it does not know what to do when it sees an unexpected left parenthesis.
GetStmt will consume it, collect the arguments, check if there is a closing parenthesis, then go back parsing the remaining of the expression.
When it finds an assignment operator (or any derivative one like +=, /=, etc.) or something that resembles an argument, it will stop and return what has been parsed so far, so we get the left-hand part of the assignment or call.
But only if we let it do it through a new property FullMode.
Public Function GetStmt(ByVal Parser As Parser, ByVal Token As Token, Optional ByVal LookAhead As Token) As IStmt
Dim Done As Boolean
Dim Result As IStmt
Dim Sym As Symbol
Dim Name As IExpression
Dim Asg As LetConstruct
Dim Xp As Expressionist
Dim Exec As CallConstruct
Dim Bin As BinaryExpression
Set Xp = New Expressionist
Set Sym = New Symbol
Set Sym.Value = Token
Set Name = Sym
If LookAhead Is Nothing Then
Set Token = Parser.NextToken
Else
Set Token = LookAhead
End If
Do
Done = True
Select Case Token.Kind
Case tkLeftParenthesis
Set Exec = New CallConstruct
Set Exec.LHS = Name
Set Token = CollectArgs(Exec.Arguments, Parser)
If Token.Kind = tkRightParenthesis Then Set Token = Parser.NextToken
Set Name = Exec
Rem Let's iterate again
Done = False
Case tkOperator
Select Case Token.Code
Case opAddressOf, opNew, opNot, opTypeOf, opWithDot, opWithBang
Rem Operator is being passed to CollectArgs through Token argument.
Set Exec = New CallConstruct
Set Exec.LHS = Name
Set Token = CollectArgs(Exec.Arguments, Parser, Token)
Set Result = Exec
Case opDot
Set Bin = New BinaryExpression
Set Bin.Operator = NewOperator(Token)
Set Bin.LHS = Name
Set Token = Parser.NextToken
If Token.Kind <> tkIdentifier And Token.Kind <> tkEscapedIdentifier Then Exit Do
Set Sym = New Symbol
Set Sym.Value = Token
Set Bin.RHS = Sym
Set Name = Bin
Set Token = Parser.NextToken
Done = False
Case opEq
Set Asg = New LetConstruct
Set Asg.Operator = NewOperator(Token)
Set Asg.Name = Name
Xp.FullMode = True
Set Asg.Value = Xp.GetExpression(Parser)
Set Token = Xp.LastToken
If Asg.Value Is Nothing Then Exit Do
Set Result = Asg
Case opSum
Rem Identity operator. We'll ignore it.
Set Exec = New CallConstruct
Set Exec.LHS = Name
Set Token = CollectArgs(Exec.Arguments, Parser)
Set Result = Exec
Case opSubt
Rem Operator is being passed to CollectArgs through Token argument.
Token.Code = opNeg
Set Exec = New CallConstruct
Set Exec.LHS = Name
Set Token = CollectArgs(Exec.Arguments, Parser, Token)
Set Result = Exec
Case opConcat, opCompAnd, opCompEqv, opCompImp, opCompMod, opCompOr, opCompXor, opCompSum, opCompSubt, _
opCompMul, opCompDiv, opCompIntDiv, opCompPow, opCompLSh, opCompRSh, opCompURSh, opCompConcat
Set Asg = New LetConstruct
Set Asg.Operator = NewOperator(Token)
Set Asg.Name = Name
Xp.FullMode = True
Asg.Value = Xp.GetExpression(Parser)
If Asg.Value Is Nothing Then Exit Do 'We'll return Nothing to sign a problem.
Set Result = Asg
End Select
Case tkIdentifier, tkEscapedIdentifier
Set Exec = New CallConstruct
Set Exec.LHS = Name
Rem Identifier is being passed to CollectArgs through Token
Set Token = CollectArgs(Exec.Arguments, Parser, Token)
Set Result = Exec
Case tkKeyword
Rem Keyword is being passed to CollectArgs through Token
Select Case Token.Code
Case kwByVal
Set Exec = New CallConstruct
Set Exec.LHS = Name
Set Token = CollectArgs(Exec.Arguments, Parser, Token)
Set Result = Exec
Case kwDate, kwString
Token.Kind = tkIdentifier
Set Exec = New CallConstruct
Set Exec.LHS = Name
Set Token = CollectArgs(Exec.Arguments, Parser, Token)
Set Result = Exec
Case kwEmpty, kwFalse, kwMe, kwNothing, kwNull, kwTrue
Set Exec = New CallConstruct
Set Exec.LHS = Name
Set Token = CollectArgs(Exec.Arguments, Parser, Token)
Set Result = Exec
Case Else
Exit Do
End Select
Case tkIntegerNumber, tkFloatNumber, tkSciNumber, tkBinaryNumber, tkOctalNumber, tkHexaNumber, tkString, tkDateTime
Set Exec = New CallConstruct
Set Exec.LHS = Name
Rem Literal is being passed to CollectArgs through Token
Set Token = CollectArgs(Exec.Arguments, Parser, Token)
Set Result = Exec
Case tkListSeparator
Set Exec = New CallConstruct
Set Exec.LHS = Name
Set Token = CollectArgs(Exec.Arguments, Parser, Token)
Set Result = Exec
Case Else
If Not Parser.IsBreak(Token) Then Exit Do
Rem Method call with no arguments.
Set Exec = New CallConstruct
Set Exec.LHS = Name
Set Result = Exec
End Select
Loop Until Done
Set LastToken_ = Token
Debug.Assert Parser.IsBreak(Token) Or Token.Code = kwElse
Set GetStmt = Result
End Function
Private Function CollectArgs(ByVal Args As KeyedList, ByVal Parser As Parser, Optional ByVal Token As Token) As Token
Dim Lit As Literal
Dim Expr As IExpression
Dim Xp As Expressionist
Dim Tkn As Token
Set Xp = New Expressionist
Xp.FullMode = True
If Not Token Is Nothing Then
If Token.Kind = tkListSeparator Then
Token.Kind = tkKeyword
Token.Code = kwVoid
Set Lit = New Literal
Set Lit.Value = Token
Args.Add Lit
Set Token = Nothing
End If
End If
Do
Set Expr = Xp.GetExpression(Parser, Token)
Set Token = Xp.LastToken
If Expr Is Nothing And Token.Kind = tkListSeparator Then
Set Tkn = New Token
Tkn.Column = Token.Column
Tkn.Line = Token.Line
Tkn.Kind = tkKeyword
Tkn.Code = kwVoid
Set Lit = New Literal
Set Lit.Value = Tkn
Set Expr = Lit
End If
Args.Add Expr
If Token.Kind = tkRightParenthesis Then Exit Do
If Token.Kind <> tkListSeparator Then Exit Do
Set Token = Nothing
Loop
Set CollectArgs = Token
End Function
What we are doing here is accumulating names (identifiers) connected by dots and dealing with the eventual open parenthesis.
As we are not using the GetExpression method, we had to replicate some of its functionalities here, like reclassifying keywords to operators, expressions, or functions, for instance.
We'll get back to Expressionist in a moment.
These are the changes in ParseBody.
Private Function ParseBody( _
ByVal Entity As Entity, _
ByVal Body As KeyedList, _
ByVal ClosingToken As Long, _
Optional ByVal SingleLine As Boolean, _
Optional ByVal LookAhead As Token _
) As Token
Dim Token As Token
Dim Stmt As IStmt
Dim Xp As Expressionist
Dim Label As LabelConstruct
Dim LinNum As LineNumberConstruct
Set Xp = New Expressionist
Do
If LookAhead Is Nothing Then
Set Token = SkipLineBreaks
Else
Set Token = LookAhead
Set LookAhead = Nothing
If IsBreak(Token) Then Set Token = SkipLineBreaks
End If
Rem Do we have a line number?
If Token.Kind = tkIntegerNumber Then
Set LinNum = New LineNumberConstruct
Set LinNum.Value = Token
Body.Add LinNum
Set Token = NextToken
End If
Rem Do we have a label?
If Token.Kind = tkIdentifier Then
Set LookAhead = NextToken
If LookAhead.Kind = tkSoftLineBreak Then
Set Label = New LabelConstruct
Set Label.Id = NewId(Token)
Body.Add Label
Set LookAhead = Nothing
Set Token = NextToken
End If
End If
Select Case Token.Kind
Case tkKeyword
Select Case Token.Code
Case kwEnd
Rem Is it the End statement of an "End Sub", "End Function", or "End Property"?
Set LookAhead = NextToken
If LookAhead.IsKeyword(ClosingToken) Then
Set Token = LookAhead
Exit Do
End If
If LookAhead.Kind = tkIdentifier And LookAhead.Code = cxProperty Then
Set Token = LookAhead
Exit Do
End If
Body.Add New EndConstruct
Case kwDim
ParseDim acLocal, Entity, Body, InsideProc:=True
Case kwStatic
ParseDim acLocal, Entity, Body, InsideProc:=True, IsStatic:=True
Case kwConst
Set Token = ParseConsts(acLocal, Entity, Body, InsideProc:=True)
Case kwIf
ParseIf Entity, Body
Case kwSelect
ParseSelect Entity, Body
Case kwElseIf, kwElse, kwCase
Exit Do
Case Else
Debug.Assert False
'TODO: Fail
End Select
Case tkIdentifier
Set Stmt = Xp.GetStmt(Me, Token, LookAhead)
Set Token = Xp.LastToken
Set LookAhead = Nothing
If Stmt Is Nothing Then Fail Token, Msg084
Body.Add Stmt
Case tkEndOfStream
Exit Do
Case Else
Debug.Assert False
'TODO: Fail
End Select
Loop Until SingleLine
Set ParseBody = Token
End Function
Rem Add it to Messages module:
Public Property Get Msg084() As String
Msg084 = "Expected: = or argument"
End Property
The change needed in GetExpression is minimal, but I had to do additional ones because sometimes we read too much of a token in CollectArgs and need to pass it to GetExpression:
Public Class Expressionist
Option Explicit
Private LastToken_ As Token
Public CanHaveTo As Boolean
Public FullMode As Boolean
Public Property Get LastToken() As Token
Set LastToken = LastToken_
End Property
Private Function Peek(ByVal Stack As KeyedList) As Variant
Set Peek = Stack(Stack.Count)
End Function
Private Function Pop(ByVal Stack As KeyedList) As Variant
Dim Index As Long
Index = Stack.Count
Set Pop = Stack(Index)
Stack.Remove Index
End Function
Rem Adapted from https://stackoverflow.com/questions/16380234/handling-extra-operators-in-shunting-yard/16392115#16392115
Public Function GetExpression(ByVal Parser As Parser, Optional ByVal Token As Token) As IExpression
Dim HadTo As Boolean
Dim WantOperand As Boolean
Dim Cp As Integer
Dim Sym As Symbol
Dim Lit As Literal
Dim Op As Operator
Dim Op2 As Operator
Dim OpStack As KeyedList
Dim OutStack As KeyedList
Dim Handle As FileHandle
Dim Args As TupleConstruct
Set OpStack = New KeyedList
Set OpStack.T = NewValidator(TypeName(New Operator))
Set OutStack = New KeyedList
Set OutStack.T = New ExprValidator
WantOperand = True
Do
If Token Is Nothing Then Set Token = Parser.NextToken
If WantOperand Then
WantOperand = False
Select Case Token.Kind
Case tkOperator
WantOperand = True
Select Case Token.Code
Case opSum
Token.Code = opId
Case opSubt
Token.Code = opNeg
Rem Unary operators
Case opAddressOf, opNew, opNot, opTypeOf, opWithBang, opWithDot
Rem OK
Case Else
Exit Do
End Select
Set Op = NewOperator(Token)
OpStack.Add Op
Case tkLeftParenthesis
Rem Pseudo-operator
Set Op = NewOperator(Token)
OpStack.Add Op
WantOperand = True
Case tkIdentifier, tkEscapedIdentifier
Set Sym = New Symbol
Set Sym.Value = Token
OutStack.Add Sym
Case tkIntegerNumber, tkFloatNumber, tkSciNumber, tkBinaryNumber, _
tkOctalNumber, tkHexaNumber, tkString, tkDateTime
Set Lit = New Literal
Set Lit.Value = Token
OutStack.Add Lit
Case tkFileHandle
Set Handle = New FileHandle
Set Handle.Value = Token
OutStack.Add Handle
Case tkKeyword
Select Case Token.Code
Case kwTrue, kwFalse, kwNothing, kwEmpty, kwNull, kwMe
Set Lit = New Literal
Set Lit.Value = Token
OutStack.Add Lit
Case kwInput, kwSeek
Set Sym = New Symbol
Set Sym.Value = Token
OutStack.Add Sym
Case kwByVal
Token.Kind = tkOperator
Token.Code = opByVal
GoTo Down
Case Else
Exit Do
End Select
Case Else
Exit Do
End Select
Else
If Parser.IsBreak(Token) Then
While OpStack.Count > 0
Move OpStack, OutStack
Wend
Exit Do
End If
Select Case Token.Kind
Case tkOperator
Down: Rem Unary and compound operators
Select Case Token.Code
Case opAddressOf, opNew, opNot, opTypeOf
Parser.Fail Token, Msg065
Case opCompAnd, opCompEqv, opCompImp, _
opCompMod, opCompOr, opCompXor, opCompSum, opCompSubt, opCompMul, opCompDiv, _
opCompIntDiv, opCompPow, opCompLSh, opCompRSh, opCompURSh, opCompConcat
Exit Do
End Select
Set Op2 = NewOperator(Token)
Do While OpStack.Count > 0
Set Op = Peek(OpStack)
If Op.Value.Kind = tkLeftParenthesis Then Exit Do
Cp = ComparePrecedence(Op, Op2)
If Cp = -1 Then Exit Do
Move OpStack, OutStack, Op
Loop
OpStack.Add Op2
WantOperand = True
Case tkRightParenthesis
Do While OpStack.Count > 0
Set Op = Peek(OpStack)
If Op.Value.Kind = tkLeftParenthesis Then Exit Do
Move OpStack, OutStack, Op
Loop
Rem It is allowed to not have a "(" on OpStack because we can be evaluating the following:
Rem Sub A(Optional B As Integer = 1)
Rem We'll get to ")" without having ")" on stack.
If OpStack.Count = 0 Then Exit Do
Pop OpStack
Case tkKeyword
If Token.Code <> kwTo Then Exit Do
If Not CanHaveTo Or HadTo Then Err.Raise vbObjectError + 13
HadTo = True
Token.Kind = tkOperator
Token.Code = Parser.Scanner.Operators.IndexOf(vTo)
OpStack.Add NewOperator(Token)
WantOperand = True
Case tkLeftParenthesis
If Not FullMode Then Exit Do
Token.Kind = tkOperator
Token.Code = opApply
OpStack.Add NewOperator(Token)
Set Args = New TupleConstruct
Set Token = CollectArgs(Args.Elements, Parser)
If Token.Kind <> tkRightParenthesis Then Debug.Assert False 'TODO: Error
OutStack.Add Args
Case Else
Exit Do
End Select
End If
Set Token = Nothing
Loop
Set LastToken_ = Token
Do While OutStack.Count > 1 Or OutStack.Count = 1 And OpStack.Count > 0
Set Op = Peek(OpStack)
If Op.Value.Kind = tkLeftParenthesis Then Exit Do
Move OpStack, OutStack, Op
Loop
Debug.Assert OpStack.Count = 0
Debug.Assert OutStack.Count <= 1
If OutStack.Count = 1 Then Set GetExpression = Pop(OutStack)
End Function
Private Sub Move(ByVal OpStack As KeyedList, ByVal OutStack As KeyedList, Optional ByVal Op As Operator)
Dim IExpr As IExpression
Dim Uni As UnaryExpression
Dim Bin As BinaryExpression
Dim Exec As CallConstruct
Dim Tup As TupleConstruct
Dim Elem As Variant
If Op Is Nothing Then Set Op = Peek(OpStack)
If Op.IsUnary Then
Set Uni = New UnaryExpression
Set Uni.Operator = Op
Set Uni.Value = Pop(OutStack)
Set IExpr = Uni
ElseIf Op.Value.Code = opApply Then
Set Exec = New CallConstruct
Set Tup = Pop(OutStack)
For Each Elem In Tup.Elements
Exec.Arguments.Add Elem
Next
Set Exec.LHS = Pop(OutStack)
Set IExpr = Exec
Else
Set Bin = New BinaryExpression
Set Bin.Operator = Op
Set Bin.RHS = Pop(OutStack)
Set Bin.LHS = Pop(OutStack)
Set IExpr = Bin
End If
OutStack.Add IExpr
Pop OpStack
End Sub
End Class
This is another piece of code that I struggled to get it right.
You can see that there is another GoTo there. Im my case it always seems to signal trouble.
Anyway, let's see how it goes.
Next week, even though we are not done with parsing yet, we'll have a glimpse of "transpiling".
Andrej Biasic
2021-03-31