Metamorphing Machine I rather be this walking metamorphosis
than having that old formed opinion about everything!

Let's build a transpiler! Part 15

This is the fifteenth post in a series about building a transpiler.
You can find the previous ones here.

Last time I said we would improve the relationship between the Scanner and Parser classes.
We also noted two other problems: our error messages need improving, and there's a bug!

We have quite a bit to do.

Let's start with the error messages. Our Scanner class has a Fail procedure, so we can give Parser one too. It will have access to the SourceFile, so we can display its path in the error message.
We will also pass it the offending token, so we can add the token's line and column to the message.

Private Sub Fail(ByVal Token As Token, ByVal Message As String, Optional ByVal Expected As String)
    Dim Msg As String
    Dim Got As String
    Dim Ch As Integer

    Select Case Token.Kind
    Case tkEscapedIdentifier
        Got = "[" & Token.Text & "]"

    Case tkFileHandle
        Got = "#" & Token.Text

    Case Else
        Got = Token.Text
    End Select

    Rem Spaces and control characters are invisible, so show their code instead.
    If Len(Token.Text) = 1 Then
        Ch = AscW(Token.Text)
        If Ch >= 0 And Ch <= 32 Then Got = "Character " & Ch
    End If

    Msg = "Parser Error" & vbNewLine
    Msg = Msg & "File: '" & Source_.Path & "'" & vbNewLine
    Msg = Msg & "Line: " & Token.Line & vbNewLine
    Msg = Msg & "Column: " & Token.Column & vbNewLine
    If Expected <> "" Then Msg = Msg & "Expected: " & Expected & vbNewLine
    Msg = Msg & "Got: " & Got & vbNewLine
    Msg = Msg & Message
    Err.Raise vbObjectError + 13, , Msg
End Sub
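The control-character branch of Fail, and the way a caller might trap the error it raises, can be tried in isolation. The sketch below is only an illustration for a VB6 or VBA standard module: DescribeText and DemoFail are hypothetical names that are not part of the series' code, and the message text is made up.

```vb
Option Explicit

Rem Hypothetical helper, not part of the series' Parser class.
Rem Returns Text unchanged unless it is a single character with a
Rem code point of 32 or less, which would be invisible in a message.
Public Function DescribeText(ByVal Text As String) As String
    Dim Cp As Long

    DescribeText = Text

    If Len(Text) = 1 Then
        Rem AscW returns a signed Integer; masking yields 0..65535.
        Cp = AscW(Text) And &HFFFF&
        If Cp <= 32 Then DescribeText = "Character " & Cp
    End If
End Function


Rem Hypothetical caller: raises an error the way Fail does and traps it.
Public Sub DemoFail()
    On Error GoTo Failed

    Err.Raise vbObjectError + 13, , "Parser Error" & vbNewLine & "Got: " & DescribeText(vbTab)
    Exit Sub

Failed:
    Rem Subtracting vbObjectError recovers the 13 used in Err.Raise.
    Debug.Print Err.Number - vbObjectError
    Debug.Print Err.Description
End Sub
```

Running DemoFail from the Immediate window should show the error code followed by the two-line message, with the tab reported as "Character 9" instead of an invisible gap.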

Note that we are using a Source_ object there. We don't have it yet, so let's change our API.
Instead of instantiating a Scanner and a Parser and passing the scanner to Parser's TokenFrom method every time we call it, we'll make Parser instantiate a scanner of its own and use it.
By doing that we can change the TokenFrom method so it no longer receives a scanner, and rename it to NextToken.
The new name collides with one of the function's own static variables, so we'll need to rename that variable to SpareToken.

As the scanner needs a source file path to scan, we'll create a SourceFile Property Set on Parser to receive the source file object; inside it, we'll pass the source file's path to the scanner's OpenFile method.
Now, every time we receive a new source file to parse, we need to reset NextToken's static variables, so we'll upgrade them to class-level variables and reset them inside SourceFile.
Caveat: We need to expose Parser's inner scanner because our pretty-printer uses its IsLetter method.
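To see why the static variables have to move, here is a minimal sketch for a standard module. All names in it (Count_, NextWithStatic, NextWithField, ResetCounter, DemoReset) are hypothetical and only mirror the idea: a Static variable keeps its value between calls but cannot be reached from any other procedure, while a module-level (or class-level) variable can be reset from anywhere in the module.

```vb
Option Explicit

Rem Module-level variable: visible to every procedure in this module.
Private Count_ As Long


Public Function NextWithStatic() As Long
    Rem Survives between calls, but is invisible outside this function.
    Static Count As Long

    Count = Count + 1
    NextWithStatic = Count
End Function


Public Function NextWithField() As Long
    Count_ = Count_ + 1
    NextWithField = Count_
End Function


Public Sub ResetCounter()
    Rem Nothing equivalent can be written for the Static variable above.
    Count_ = 0
End Sub


Public Sub DemoReset()
    Debug.Print NextWithStatic, NextWithField   ' both return 1
    Debug.Print NextWithStatic, NextWithField   ' both return 2
    ResetCounter
    Debug.Print NextWithStatic, NextWithField   ' 3 and 1: only the field restarted
End Sub
```

This is the same situation the parser is in when it receives a second source file: state left over from the first file has to be cleared from outside NextToken.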

Here are the relevant changes:

Private Scanner_ As Scanner
Private Source_ As SourceFile

Rem The static variables that we "upgraded" are below.
Private Downgrade_ As Boolean
Private WasAs_ As Boolean
Private LastToken_ As Token
Private State_ As NarrowContext
Private SpareToken_ As Token


Private Sub Class_Initialize()
    Set Scanner_ = New Scanner
End Sub


Public Property Set SourceFile(ByVal Source As SourceFile)
    Set Scanner_ = New Scanner
    Set Source_ = Source
    Scanner_.OpenFile Source_.Path

    Rem Reset the state NextToken keeps between calls.
    Downgrade_ = False
    WasAs_ = False
    Set LastToken_ = Nothing
    State_ = NoContext
    Set SpareToken_ = Nothing
End Property


Rem Exposing Parser's inner scanner.
Public Property Get Scanner() As Scanner
    Set Scanner = Scanner_
End Property


Rem This is TokenFrom renamed to NextToken.
Rem Inside it we need to append an underscore to all variables' names
Rem except for Upgrade, Revoke, and Token.

Public Function NextToken() As Token
    (...)

Now, let's adapt our code to cope with the changes. Our Main sub will be like this:

Public Sub Main()
    Dim Source As SourceFile
    Dim Parser As Parser

    Set Source = New SourceFile
    Source.Path = Command$

    Set Parser = New Parser
    Set Parser.SourceFile = Source

    Parser.Parse
End Sub

Most of Main's old code now goes into Parser's Parse method:

Public Sub Parse()
    Dim Entity As Entity
    Dim Token As Token

    Do
        Rem A new Entity starts with its default accessibility, AccessLocal.
        Set Entity = New Entity

        Set Token = SkipLineBreaks
        If Token.Kind = tkEndOfStream Then Exit Do

        If Token.Suffix <> vbNullChar Then _
            Fail Token, "Rule: [Public | Private] (Class | Module) name", "Public, Private, Class, or Module"

        If Token.Kind = tkKeyword And Token.Text = "Public" Then
            Entity.Accessibility = AccessPublic
            Set Token = NextToken

        ElseIf Token.Kind = tkKeyword And Token.Text = "Private" Then
            Entity.Accessibility = AccessPrivate
            Set Token = NextToken
        End If

        If Token.Kind = tkKeyword And Token.Text = "Class" Then
            Entity.IsClass = True

        ElseIf Token.Kind = tkKeyword And Token.Text = "Module" Then
            Rem Nothing to do.

        Else
            Rem Without an access modifier, Public and Private were still possible here.
            If Entity.Accessibility = AccessLocal Then
                Fail Token, "Rule: [Public | Private] (Class | Module) name", "Public, Private, Class, or Module"
            Else
                Fail Token, "Rule: [Public | Private] (Class | Module) name", "Class or Module"
            End If
        End If

        Rem The access modifier is optional and defaults to Public.
        If Entity.Accessibility = AccessLocal Then Entity.Accessibility = AccessPublic
        Set Token = NextToken

        If Token.Kind <> tkEscapedIdentifier And Token.Kind <> tkIdentifier Then
            Fail Token, "Rule: [Public | Private] (Class | Module) name", "name"
        End If

        Set Entity.Name = Token
        MustEatLineBreak

        Set Token = NextToken
        If Token.Kind <> tkKeyword Or Token.Text <> "End" Then Fail Token, "Rule: End (Class | Module)", "End"

        Set Token = NextToken

        If Token.Kind <> tkKeyword Or Token.Text <> IIf(Entity.IsClass, "Class", "Module") Then
            Fail Token, "Rule: End (Class | Module)", "Class or Module"
        End If

        Source_.Entities.Add Entity
        MustEatLineBreak
    Loop
End Sub

Note that instead of "Cls As ClassConstruct" and "Mdl As ModuleConstruct" we are using an "Entity As Entity".
More about it later.
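The rule that Parse enforces, [Public | Private] (Class | Module) name, can also be illustrated on a plain string. This is only a sketch under loud assumptions: ParseHeader, EntityName, and DemoHeader are hypothetical names, words are assumed to be separated by exactly one space, and keywords are compared case-sensitively. The real Parse works on tokens coming from NextToken.

```vb
Option Explicit

Rem Hypothetical, string-based illustration of the rule
Rem   [Public | Private] (Class | Module) name
Rem Returns True when Header follows the rule and fills in the ByRef results.
Public Function ParseHeader(ByVal Header As String, ByRef IsPublic As Boolean, _
                            ByRef IsClass As Boolean, ByRef EntityName As String) As Boolean
    Dim Words() As String
    Dim Index As Long

    Words = Split(Trim$(Header), " ")
    If UBound(Words) < 1 Then Exit Function

    Rem The access modifier is optional and defaults to Public.
    IsPublic = True

    Select Case Words(0)
    Case "Public"
        Index = 1

    Case "Private"
        IsPublic = False
        Index = 1
    End Select

    Select Case Words(Index)
    Case "Class"
        IsClass = True

    Case "Module"
        IsClass = False

    Case Else
        Exit Function
    End Select

    Rem Exactly one word, the name, must follow Class or Module.
    If UBound(Words) <> Index + 1 Then Exit Function

    EntityName = Words(Index + 1)
    ParseHeader = True
End Function


Public Sub DemoHeader()
    Dim IsPublic As Boolean
    Dim IsClass As Boolean
    Dim EntityName As String

    Debug.Print ParseHeader("Private Class Stack", IsPublic, IsClass, EntityName)   ' True
    Debug.Print ParseHeader("Module Main", IsPublic, IsClass, EntityName)           ' True
    Debug.Print ParseHeader("Public Class", IsPublic, IsClass, EntityName)          ' False
End Sub
```

Unlike this sketch, Parse reports which word was expected through Fail instead of just returning False.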

There's also a somewhat cosmetic change we can make.
Parse can accept a SourceFile and pass it on to the SourceFile property itself:

Public Sub Parse(ByVal Source As SourceFile)
    Const RULE = "Rule: [Public | Private] (Class | Module) identifier"
    Dim Entity As Entity
    Dim Token As Token

    Set SourceFile = Source
    (...)

Rem Now Main would be like this:
Public Sub Main()
    Dim Source As SourceFile
    Dim Parser As Parser

    Set Source = New SourceFile
    Source.Path = Command$

    Set Parser = New Parser
    Parser.Parse Source
End Sub

Regarding the bug, I'm going with a kind of workaround.
When opening the file, we'll check whether its last character is a line break. If it's not, we'll append one to the file.
Here are the changes needed to make it happen:

Public Sub OpenFile(ByVal FilePath As String)
    Dim Cp As Integer

    If Dir(FilePath) = "" Then Err.Raise 53
    File_ = FreeFile
    Open FilePath For Binary Access Read Write As #File_

    Rem If the error below happens, we'll leave a newly created zero-length file behind.
    If LOF(File_) = 0 Then Err.Raise 53

    Rem Look at the end of the file to see whether it ends with a line break.
    Seek #File_, LOF(File_) - 3
    Cp = GetCodePoint

    If Cp <> 10 Then
        Rem Append the platform's line break after the last byte.
        Seek #File_, LOF(File_) + 1

        Select Case vbNewLine
        Case vbCr
            Put #File_, , 13

        Case vbLf
            Put #File_, , 10

        Case vbCrLf
            Put #File_, , &HA000D
        End Select
    End If

    Rem Go back to the beginning and skip the byte order mark, if any.
    Seek #File_, 1

    Cp = GetCodePoint
    If Cp <> &HFEFF Then UngetChar ChrW$(Cp)
End Sub
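For readers who want to try the workaround outside the Scanner class, here is a stand-alone sketch. It rests on assumptions the post does not state: EnsureTrailingLineBreak is a hypothetical name, the file is assumed to be UTF-16 LE, the file is read with Get instead of the series' GetCodePoint, and a CR+LF pair is always appended.

```vb
Option Explicit

Rem Hypothetical stand-alone variant of the workaround.
Rem Assumes FilePath exists and is encoded as UTF-16 LE.
Public Sub EnsureTrailingLineBreak(ByVal FilePath As String)
    Dim Handle As Integer
    Dim Size As Long
    Dim LastUnit As Integer
    Dim LineBreak As Long

    Rem Put stores a Long in little-endian order, so the low word &H000D (CR)
    Rem lands in the file first and the high word &H000A (LF) second.
    LineBreak = &HA000D

    Handle = FreeFile
    Open FilePath For Binary Access Read Write As #Handle
    Size = LOF(Handle)

    If Size >= 2 Then
        Rem Binary positions are 1-based: the last 2-byte unit starts at Size - 1.
        Get #Handle, Size - 1, LastUnit

        Rem Append only when the file does not already end with LF or CR.
        If LastUnit <> 10 And LastUnit <> 13 Then Put #Handle, Size + 1, LineBreak
    End If

    Close #Handle
End Sub
```

Calling it twice on the same file should change the file at most once, because the second call finds the line break that the first one appended.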

I've thought a little about MayEatSpaces and MustEatSpace and decided to embed them into NextToken.
I also concluded that MustEatLineBreak and MayEatLineBreaks should each be a proper sub or function of their own.
But in doing that, we will no longer report spaces, and Scanner will no longer be useful to our pretty-printer.
So we'll add a Spaces property to the Token class and set it to the number of spaces that came before the token.
We'll use it in PrettyPrint to print the correct number of spaces.
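The idea of folding spaces into the following token can be sketched without the Scanner and Parser classes. Everything here is hypothetical and simplified: MiniToken stands in for the Token class, Tokenize splits on spaces only, and DemoSpaces plays the role of the pretty-printer by rebuilding the text from Spaces and Text.

```vb
Option Explicit

Rem Hypothetical stand-in for the series' Token class.
Private Type MiniToken
    Text As String
    Spaces As Long
End Type


Rem Splits Source on spaces and records how many spaces preceded each word.
Rem Returns the number of tokens stored in Tokens.
Private Function Tokenize(ByVal Source As String, ByRef Tokens() As MiniToken) As Long
    Dim Index As Long
    Dim Count As Long
    Dim Spaces As Long
    Dim Start As Long

    Rem There can never be more tokens than characters.
    ReDim Tokens(0 To Len(Source))
    Index = 1

    Do While Index <= Len(Source)
        If Mid$(Source, Index, 1) = " " Then
            Rem Spaces are not reported; they are only counted.
            Spaces = Spaces + 1
            Index = Index + 1
        Else
            Start = Index

            Do While Index <= Len(Source)
                If Mid$(Source, Index, 1) = " " Then Exit Do
                Index = Index + 1
            Loop

            Tokens(Count).Text = Mid$(Source, Start, Index - Start)
            Tokens(Count).Spaces = Spaces
            Count = Count + 1
            Spaces = 0
        End If
    Loop

    Tokenize = Count
End Function


Public Sub DemoSpaces()
    Dim Tokens() As MiniToken
    Dim Count As Long
    Dim Index As Long
    Dim Rebuilt As String

    Count = Tokenize("Dim   X As  Long", Tokens)

    For Index = 0 To Count - 1
        Rebuilt = Rebuilt & Space$(Tokens(Index).Spaces) & Tokens(Index).Text
    Next

    Rem Prints True: the original spacing survives the round trip.
    Debug.Print Rebuilt = "Dim   X As  Long"
End Sub
```

Trailing spaces are lost in this sketch because no token follows them; that does not matter for the example line.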

Relevant changes in NextToken:

Public Function NextToken() As Token
Dim Upgrade As Boolean
Dim Revoke As Boolean
Dim Token As Token
Dim Done As Boolean

Do
Done = True
(...)
Case tkLineContinuation
Set Token = NextToken()

While IsBreak(Token)
Set Token = NextToken()
Wend

Case tkWhiteSpace
Done = False
End Select

If Upgrade Then
Token.Kind = tkKeyword
If Revoke Then State_ = NoContext
End If
End If

Set LastToken_ = Token
Loop While Not Done

Set NextToken = Token
End Function

MustEatLineBreak: it now accepts comments too, as both ' and Rem comments end with a line break.

Private Function IsBreak(ByVal Token As Token) As Boolean
    IsBreak = Token.Kind = tkSoftLineBreak Or Token.Kind = tkHardLineBreak Or Token.Kind = tkComment
End Function


Public Sub MustEatLineBreak()
    Dim Token As Token

    Set Token = NextToken
    If IsBreak(Token) Then Exit Sub
    Fail Token, "Rule: vbCr | vbLf | vbCrLf | : | '", "line break"
End Sub

MayEatLineBreaks morphed into a SkipLineBreaks function, which returns the first token that is not a line break:

Public Function SkipLineBreaks() As Token
    Dim Token As Token

    Do
        Set Token = NextToken
    Loop While IsBreak(Token)

    Set SkipLineBreaks = Token
End Function

Adding the Spaces property to the Token class:

Class Token
    Public Text As String
    Public Suffix As String
    Public Kind As TokenKind
    Public Line As Long
    Public Column As Long
    Public Spaces As Long

    Private Sub Class_Initialize()
        Text = " "
        Suffix = vbNullChar
    End Sub
End Class

Setting it in NextToken:

Public Function NextToken() As Token
Dim Upgrade As Boolean
Dim Revoke As Boolean
Dim Token As Token
Dim Done As Boolean
Dim Spaces As Long
(...)
Case tkWhiteSpace
Done = False
Spaces = Spaces + 1
End Select
(...)
If Token.Kind <> tkHardLineBreak And Token.Spaces = 0 Then Token.Spaces = Spaces
Set NextToken = Token
End Function

Using it in PrettyPrint:

(...)
Do
Set Token = Parser.NextToken

If Nbsp Then
For Index = 1 To Token.Spaces
Print #HtmlFile, "&nbsp;&nbsp;&nbsp;&nbsp;";
Next
Else
Print #HtmlFile, Space$(Token.Spaces);
End If

Select Case Token.Kind
(...)

Finally, I'm going to merge ClassConstruct and ModuleConstruct classes into one single class, Entity.
The only (structural) difference between a class and a module is that a class can have Events and Implements, while a module cannot.

Class Entity
    Public IsClass As Boolean
    Public Accessibility As Accessibility
    Public Name As Token
End Class


Class SourceFile
    Public Path As String
    Public Entities As New Collection
End Class

Next time we'll start parsing whatever is inside classes and modules.

Andrej Biasic
2020-10-28