Metamorphing Machine I rather be this walking metamorphosis
than having that old formed opinion about everything!

Let's build a transpiler! Part 31

This is the thirty-first post in a series of building a transpiler.
You can find the previous ones here.

Last time I said we'd parse implicit Let and Call statements.

I was planning to tackle For, While, etc., but ended up playing with this instead, so be it!

Every VB line statement starts with a keyword, except for implicit Let and Call ones.
When we start parsing a new line, if we get an identifier instead of a keyword, it can be that it will be assigned some value further on.

I'm talking about something like this:

A = B

This is equivalent to an explicit Let:

Let A = B

We can't be sure until we find that equal sign in the correct place, though.
But there's a second option to getting an identifier. Maybe it is an implicit call to some procedure?

A B

This is equivalent to the following explicit Call:

Call A(B)

At first, it may seem that to figure it out the only thing we need to do is check what comes next after the identifier, but consider the following snippet:

A.B(C) = D

In this case, what comes next is a dot operator...
So, we must consider that we have an expression (instead of an identifier) that happens to start with an identifier, and then we'll have an equal sign.
But what if we have something like the snippet below?

A(B).C D

Then we have an expression that ends up being an implicit call to some method C having D as argument.
But before jumping to code, let's see one more ambiguity we have so far.

Visual Basic does not have an assignment operator that's different from the equal-to operator as other languages have.
If your background is C, the code below will be misleading:

A = B = C = D

What it does in VB is:
This demonstrates that the first equal sign is the assignment one and that it has lower precedence than the remaining equal-to operators.

Let's start by completing LetConstruct and CallConstruct classes:

Public Class CallConstruct
Option Explicit
Implements IStmt
Implements IExpression

Private Arguments_ As KeyedList

Public LHS As IExpression

Private Sub Class_Initialize()
Set Arguments_ = New KeyedList
Set Arguments_.T = New ExprValidator
End Sub

Public Property Get Arguments() As KeyedList
Set Arguments = Arguments_
End Property

Private Property Get IExpression_Kind() As ExpressionKind
IExpression_Kind = ekIndexer
End Property

Private Property Get IStmt_Kind() As StmtNumbers
IStmt_Kind = snCall
End Property
End Class


Public Class LetConstruct
Option Explicit
Implements IStmt

Public Name As IExpression
Public Operator As Operator
Public Value As IExpression

Private Property Get IStmt_Kind() As StmtNumbers
IStmt_Kind = snLet
End Property
End Class

Now we'll create a GetStmt procedure in Expressionist.
Its GetExpression method is unable to handle function calls - it does not know what to do when it sees an unexpected left parenthesis.
GetStmt will consume it, collect the arguments, check if there is a closing parenthesis, then go back parsing the remaining of the expression.

When it finds an assignment operator (or any derivative one like +=, /=, etc.) or something that resembles an argument, it will stop and return what has been parsed so far, so we get the left-hand part of the assignment or call.
But only if we let it do it through a new property FullMode.

Public Function GetStmt(ByVal Parser As Parser, ByVal Token As Token, Optional ByVal LookAhead As Token) As IStmt
Dim Done As Boolean
Dim Result As IStmt
Dim Sym As Symbol
Dim Name As IExpression
Dim Asg As LetConstruct
Dim Xp As Expressionist
Dim Exec As CallConstruct
Dim Bin As BinaryExpression

Set Xp = New Expressionist

Set Sym = New Symbol
Set Sym.Value = Token
Set Name = Sym

If LookAhead Is Nothing Then
Set Token = Parser.NextToken
Else
Set Token = LookAhead
End If

Do
Done = True

Select Case Token.Kind
Case tkLeftParenthesis
Set Exec = New CallConstruct
Set Exec.LHS = Name
Set Token = CollectArgs(Exec.Arguments, Parser)
If Token.Kind = tkRightParenthesis Then Set Token = Parser.NextToken
Set Name = Exec
Rem Let's iterate again
Done = False

Case tkOperator
Select Case Token.Code
Case opAddressOf, opNew, opNot, opTypeOf, opWithDot, opWithBang
Rem Operator is being passed to CollectArgs through Token argument.
Set Exec = New CallConstruct
Set Exec.LHS = Name
Set Token = CollectArgs(Exec.Arguments, Parser, Token)
Set Result = Exec

Case opDot
Set Bin = New BinaryExpression
Set Bin.Operator = NewOperator(Token)
Set Bin.LHS = Name

Set Token = Parser.NextToken
If Token.Kind <> tkIdentifier And Token.Kind <> tkEscapedIdentifier Then Exit Do

Set Sym = New Symbol
Set Sym.Value = Token
Set Bin.RHS = Sym

Set Name = Bin

Set Token = Parser.NextToken
Done = False

Case opEq
Set Asg = New LetConstruct
Set Asg.Operator = NewOperator(Token)
Set Asg.Name = Name

Xp.FullMode = True
Set Asg.Value = Xp.GetExpression(Parser)
Set Token = Xp.LastToken
If Asg.Value Is Nothing Then Exit Do

Set Result = Asg

Case opSum
Rem Identity operator. We'll ignore it.
Set Exec = New CallConstruct
Set Exec.LHS = Name
Set Token = CollectArgs(Exec.Arguments, Parser)
Set Result = Exec

Case opSubt
Rem Operator is being passed to CollectArgs through Token argument.
Token.Code = opNeg
Set Exec = New CallConstruct
Set Exec.LHS = Name
Set Token = CollectArgs(Exec.Arguments, Parser, Token)
Set Result = Exec

Case opConcat, opCompAnd, opCompEqv, opCompImp, opCompMod, opCompOr, opCompXor, opCompSum, opCompSubt, _
opCompMul, opCompDiv, opCompIntDiv, opCompPow, opCompLSh, opCompRSh, opCompURSh, opCompConcat

Set Asg = New LetConstruct
Set Asg.Operator = NewOperator(Token)
Set Asg.Name = Name

Xp.FullMode = True
Asg.Value = Xp.GetExpression(Parser)
If Asg.Value Is Nothing Then Exit Do 'We'll return Nothing to sign a problem.

Set Result = Asg
End Select

Case tkIdentifier, tkEscapedIdentifier
Set Exec = New CallConstruct
Set Exec.LHS = Name

Rem Identifier is being passed to CollectArgs through Token
Set Token = CollectArgs(Exec.Arguments, Parser, Token)
Set Result = Exec

Case tkKeyword
Rem Keyword is being passed to CollectArgs through Token
Select Case Token.Code
Case kwByVal
Set Exec = New CallConstruct
Set Exec.LHS = Name
Set Token = CollectArgs(Exec.Arguments, Parser, Token)
Set Result = Exec

Case kwDate, kwString
Token.Kind = tkIdentifier
Set Exec = New CallConstruct
Set Exec.LHS = Name
Set Token = CollectArgs(Exec.Arguments, Parser, Token)
Set Result = Exec

Case kwEmpty, kwFalse, kwMe, kwNothing, kwNull, kwTrue
Set Exec = New CallConstruct
Set Exec.LHS = Name
Set Token = CollectArgs(Exec.Arguments, Parser, Token)
Set Result = Exec

Case Else
Exit Do
End Select

Case tkIntegerNumber, tkFloatNumber, tkSciNumber, tkBinaryNumber, tkOctalNumber, tkHexaNumber, tkString, tkDateTime
Set Exec = New CallConstruct
Set Exec.LHS = Name
Rem Literal is being passed to CollectArgs through Token
Set Token = CollectArgs(Exec.Arguments, Parser, Token)
Set Result = Exec

Case tkListSeparator
Set Exec = New CallConstruct
Set Exec.LHS = Name
Set Token = CollectArgs(Exec.Arguments, Parser, Token)
Set Result = Exec

Case Else
If Not Parser.IsBreak(Token) Then Exit Do

Rem Method call with no arguments.
Set Exec = New CallConstruct
Set Exec.LHS = Name
Set Result = Exec
End Select
Loop Until Done

Set LastToken_ = Token
Debug.Assert Parser.IsBreak(Token) Or Token.Code = kwElse
Set GetStmt = Result
End Function


Private Function CollectArgs(ByVal Args As KeyedList, ByVal Parser As Parser, Optional ByVal Token As Token) As Token
Dim Lit As Literal
Dim Expr As IExpression
Dim Xp As Expressionist
Dim Tkn As Token

Set Xp = New Expressionist
Xp.FullMode = True

If Not Token Is Nothing Then
If Token.Kind = tkListSeparator Then
Token.Kind = tkKeyword
Token.Code = kwVoid

Set Lit = New Literal
Set Lit.Value = Token

Args.Add Lit
Set Token = Nothing
End If
End If

Do
Set Expr = Xp.GetExpression(Parser, Token)
Set Token = Xp.LastToken

If Expr Is Nothing And Token.Kind = tkListSeparator Then
Set Tkn = New Token
Tkn.Column = Token.Column
Tkn.Line = Token.Line
Tkn.Kind = tkKeyword
Tkn.Code = kwVoid
Set Lit = New Literal
Set Lit.Value = Tkn
Set Expr = Lit
End If

Args.Add Expr

If Token.Kind = tkRightParenthesis Then Exit Do
If Token.Kind <> tkListSeparator Then Exit Do
Set Token = Nothing
Loop

Set CollectArgs = Token
End Function

What we are doing here is accumulating names (identifiers) connected by dots and dealing with the eventual open parenthesis.
As we are not using the GetExpression method, we had to replicate some of its functionalities here, like reclassifying keywords to operators, expressions, or functions, for instance.

We'll get back to Expressionist in a moment.
These are the changes in ParseBody.

Private Function ParseBody( _
ByVal Entity As Entity, _
ByVal Body As KeyedList, _
ByVal ClosingToken As Long, _
Optional ByVal SingleLine As Boolean, _
Optional ByVal LookAhead As Token _
) As Token
Dim Token As Token
Dim Stmt As IStmt
Dim Xp As Expressionist
Dim Label As LabelConstruct
Dim LinNum As LineNumberConstruct

Set Xp = New Expressionist

Do
If LookAhead Is Nothing Then
Set Token = SkipLineBreaks
Else
Set Token = LookAhead
Set LookAhead = Nothing
If IsBreak(Token) Then Set Token = SkipLineBreaks
End If

Rem Do we have a line number?
If Token.Kind = tkIntegerNumber Then
Set LinNum = New LineNumberConstruct
Set LinNum.Value = Token
Body.Add LinNum
Set Token = NextToken
End If

Rem Do we have a label?
If Token.Kind = tkIdentifier Then
Set LookAhead = NextToken

If LookAhead.Kind = tkSoftLineBreak Then
Set Label = New LabelConstruct
Set Label.Id = NewId(Token)
Body.Add Label
Set LookAhead = Nothing
Set Token = NextToken
End If
End If

Select Case Token.Kind
Case tkKeyword
Select Case Token.Code
Case kwEnd
Rem Is it the End statement of an "End Sub", "End Function", or "End Property"?
Set LookAhead = NextToken

If LookAhead.IsKeyword(ClosingToken) Then
Set Token = LookAhead
Exit Do
End If

If LookAhead.Kind = tkIdentifier And LookAhead.Code = cxProperty Then
Set Token = LookAhead
Exit Do
End If

Body.Add New EndConstruct

Case kwDim
ParseDim acLocal, Entity, Body, InsideProc:=True

Case kwStatic
ParseDim acLocal, Entity, Body, InsideProc:=True, IsStatic:=True

Case kwConst
Set Token = ParseConsts(acLocal, Entity, Body, InsideProc:=True)

Case kwIf
ParseIf Entity, Body

Case kwSelect
ParseSelect Entity, Body

Case kwElseIf, kwElse, kwCase
Exit Do

Case Else
Debug.Assert False
'TODO: Fail
End Select

Case tkIdentifier
Set Stmt = Xp.GetStmt(Me, Token, LookAhead)
Set Token = Xp.LastToken
Set LookAhead = Nothing
If Stmt Is Nothing Then Fail Token, Msg084
Body.Add Stmt

Case tkEndOfStream
Exit Do

Case Else
Debug.Assert False
'TODO: Fail
End Select
Loop Until SingleLine

Set ParseBody = Token
End Function


Rem Add it to Messages module:

Public Property Get Msg084() As String
Msg084 = "Expected: = or argument"
End Property

The change needed in GetExpression is minimal, but I had to do additional ones because sometimes we read too much of a token in CollectArgs and need to pass it to GetExpression:

Public Class Expressionist
Option Explicit

Private LastToken_ As Token

Public CanHaveTo As Boolean
Public FullMode As Boolean

Public Property Get LastToken() As Token
Set LastToken = LastToken_
End Property

Private Function Peek(ByVal Stack As KeyedList) As Variant
Set Peek = Stack(Stack.Count)
End Function

Private Function Pop(ByVal Stack As KeyedList) As Variant
Dim Index As Long

Index = Stack.Count
Set Pop = Stack(Index)
Stack.Remove Index
End Function

Rem Adapted from https://stackoverflow.com/questions/16380234/handling-extra-operators-in-shunting-yard/16392115#16392115
Public Function GetExpression(ByVal Parser As Parser, Optional ByVal Token As Token) As IExpression
Dim HadTo As Boolean
Dim WantOperand As Boolean
Dim Cp As Integer
Dim Sym As Symbol
Dim Lit As Literal
Dim Op As Operator
Dim Op2 As Operator
Dim OpStack As KeyedList
Dim OutStack As KeyedList
Dim Handle As FileHandle
Dim Args As TupleConstruct

Set OpStack = New KeyedList
Set OpStack.T = NewValidator(TypeName(New Operator))

Set OutStack = New KeyedList
Set OutStack.T = New ExprValidator
WantOperand = True

Do
If Token Is Nothing Then Set Token = Parser.NextToken

If WantOperand Then
WantOperand = False

Select Case Token.Kind
Case tkOperator
WantOperand = True

Select Case Token.Code
Case opSum
Token.Code = opId

Case opSubt
Token.Code = opNeg

Rem Unary operators
Case opAddressOf, opNew, opNot, opTypeOf, opWithBang, opWithDot
Rem OK

Case Else
Exit Do
End Select

Set Op = NewOperator(Token)
OpStack.Add Op

Case tkLeftParenthesis
Rem Pseudo-operator
Set Op = NewOperator(Token)
OpStack.Add Op
WantOperand = True

Case tkIdentifier, tkEscapedIdentifier
Set Sym = New Symbol
Set Sym.Value = Token
OutStack.Add Sym

Case tkIntegerNumber, tkFloatNumber, tkSciNumber, tkBinaryNumber, _
tkOctalNumber, tkHexaNumber, tkString, tkDateTime
Set Lit = New Literal
Set Lit.Value = Token
OutStack.Add Lit

Case tkFileHandle
Set Handle = New FileHandle
Set Handle.Value = Token
OutStack.Add Handle

Case tkKeyword
Select Case Token.Code
Case kwTrue, kwFalse, kwNothing, kwEmpty, kwNull, kwMe
Set Lit = New Literal
Set Lit.Value = Token
OutStack.Add Lit

Case kwInput, kwSeek
Set Sym = New Symbol
Set Sym.Value = Token
OutStack.Add Sym

Case kwByVal
Token.Kind = tkOperator
Token.Code = opByVal
GoTo Down

Case Else
Exit Do
End Select

Case Else
Exit Do
End Select
Else
If Parser.IsBreak(Token) Then
While OpStack.Count > 0
Move OpStack, OutStack
Wend

Exit Do
End If

Select Case Token.Kind
Case tkOperator
Down:  Rem Unary and compound operators
Select Case Token.Code
Case opAddressOf, opNew, opNot, opTypeOf
Parser.Fail Token, Msg065

Case opCompAnd, opCompEqv, opCompImp, _
opCompMod, opCompOr, opCompXor, opCompSum, opCompSubt, opCompMul, opCompDiv, _
opCompIntDiv, opCompPow, opCompLSh, opCompRSh, opCompURSh, opCompConcat
Exit Do
End Select

Set Op2 = NewOperator(Token)

Do While OpStack.Count > 0
Set Op = Peek(OpStack)
If Op.Value.Kind = tkLeftParenthesis Then Exit Do

Cp = ComparePrecedence(Op, Op2)
If Cp = -1 Then Exit Do
Move OpStack, OutStack, Op
Loop

OpStack.Add Op2
WantOperand = True

Case tkLeftParenthesis
Rem Apply operator
Set Token = New Token
Token.Kind = tkOperator
Token.Code = opApply
Set Op = NewOperator(Token)
OpStack.Add Op

Case tkRightParenthesis
Do While OpStack.Count > 0
Set Op = Peek(OpStack)
If Op.Value.Kind = tkLeftParenthesis Then Exit Do
If Op.Value.IsOperator(opApply) Then Exit Do

Move OpStack, OutStack, Op
Loop

Rem It is allowed to not have a "(" on OpStack because we can be evaluating the following:
Rem Sub A(Optional B As Integer = 1)
Rem We'll get to ")" without having ")" on stack.
If OpStack.Count = 0 Then Exit Do
Pop OpStack

Case tkKeyword
If Token.Code <> kwTo Then Exit Do
If Not CanHaveTo Or HadTo Then Err.Raise vbObjectError + 13
HadTo = True

Token.Kind = tkOperator
Token.Code = Parser.Scanner.Operators.IndexOf(vTo)
OpStack.Add NewOperator(Token)
WantOperand = True

Case tkLeftParenthesis
If Not FullMode Then Exit Do

Token.Kind = tkOperator
Token.Code = opApply
OpStack.Add NewOperator(Token)

Set Args = New TupleConstruct
Set Token = CollectArgs(Args.Elements, Parser)
If Token.Kind <> tkRightParenthesis Then Debug.Assert False 'TODO: Error
OutStack.Add Args

Case Else
Exit Do
End Select
End If

Set Token = Nothing
Loop

Set LastToken_ = Token

Do While OutStack.Count > 1 Or OutStack.Count = 1 And OpStack.Count > 0
Set Op = Peek(OpStack)
If Op.Value.Kind = tkLeftParenthesis Then Exit Do

Move OpStack, OutStack, Op
Loop

Debug.Assert OpStack.Count = 0
Debug.Assert OutStack.Count <= 1
If OutStack.Count = 1 Then Set GetExpression = Pop(OutStack)
End Function


Private Sub Move(ByVal OpStack As KeyedList, ByVal OutStack As KeyedList, Optional ByVal Op As Operator)
Dim IExpr As IExpression
Dim Uni As UnaryExpression
Dim Bin As BinaryExpression
Dim Exec As CallConstruct
Dim Tup As TupleConstruct
Dim Elem As Variant

If Op Is Nothing Then Set Op = Peek(OpStack)

If Op.IsUnary Then
Set Uni = New UnaryExpression
Set Uni.Operator = Op
Set Uni.Value = Pop(OutStack)
Set IExpr = Uni

ElseIf Op.Value.Code = opApply Then
Set Exec = New CallConstruct

Set Tup = Pop(OutStack)

For Each Elem In Tup.Elements
Exec.Arguments.Add Elem
Next

Set Exec.LHS = Pop(OutStack)
Set IExpr = Exec

Else
Set Bin = New BinaryExpression
Set Bin.Operator = Op
Set Bin.RHS = Pop(OutStack)
Set Bin.LHS = Pop(OutStack)
Set IExpr = Bin
End If

OutStack.Add IExpr
Pop OpStack
End Sub
End Class

This is another piece of code that I struggled to get it right.
You can see that there is another GoTo there. Im my case it always seems to signal trouble.
Anyway, let's see how it goes.

Next week, even though we are not done with parsing yet, we'll have a glimpse of "transpiling".

Andrej Biasic
2021-03-31