Metamorphing Machine I rather be this walking metamorphosis
than having that old formed opinion about everything!

Let's build a transpiler! Part 14

This is the fourteenth post in a series on building a transpiler.
You can find the previous ones here.

Last time I said we would start checking whether the sequence of tokens makes sense or not.
For now, we will deal with a single source file, and inside it we'll allow zero or more classes or modules.
This is the syntax that we will follow:

[[Public | Private] Module name ↵
(...)
End Module]

[[Public | Private] Class name ↵
(...)
End Class]

Tokens inside square brackets are optional. The pipe ("|") means a choice: any one of the tokens it separates may appear. The arrow ("↵") means a line break.
We'll also create classes to represent a source file, a class, and a module:

Public Enum Accessibility
    AccessLocal
    AccessPublic
    AccessPrivate
    AccessFriend
End Enum

Public Class SourceFile
    Public Path As String
    Public Classes As New Collection
    Public Modules As New Collection
End Class

Public Class ModuleConstruct
    Public Accessibility As Accessibility
    Public Name As Token
End Class

Public Class ClassConstruct
    Public Accessibility As Accessibility
    Public Name As Token
End Class

Now, let's not mix things up. The listings above use a Class keyword that VB6 itself does not support.
It is there only to represent the class files (.CLS) we will create.
The Class and Module keywords in the syntax above, however, belong to the code inside the source file that we'll parse.
Now we can start parsing things:

Public Sub Main()
    Dim Source As SourceFile
    Dim Scanner As Scanner
    Dim Parser As Parser
    Dim Token As Token
    Dim Access As Accessibility
    Dim Cls As ClassConstruct
    Dim Mdl As ModuleConstruct
    Dim IsClass As Boolean
    Dim Name As Token

    ' The path of the source file comes from the command line.
    Set Source = New SourceFile
    Source.Path = Command$

    Set Scanner = New Scanner
    Scanner.OpenFile Source.Path

    Set Parser = New Parser

    ' Each iteration parses one class or module.
    Do
        Access = AccessPublic
        GoSub MayEatLineBreaks
        If Token.Kind = tkEndOfStream Then Exit Do

        ' Optional access modifier; Public is the default.
        If Token.Kind = tkKeyword And Token.Text = "Public" Then
            GoSub MustEatSpace

        ElseIf Token.Kind = tkKeyword And Token.Text = "Private" Then
            Access = AccessPrivate
            GoSub MustEatSpace
        End If

        ' The token must not carry a suffix.
        If Token.Suffix <> vbNullChar Then Err.Raise vbObjectError + 13, , "Expected: Public, Private, Class, or Module"

        If Token.Kind = tkKeyword And Token.Text = "Class" Then
            IsClass = True
            Set Cls = New ClassConstruct

        ElseIf Token.Kind = tkKeyword And Token.Text = "Module" Then
            IsClass = False
            Set Mdl = New ModuleConstruct

        Else
            Err.Raise vbObjectError + 13, , "Expected: Public, Private, Class, or Module"
        End If

        GoSub MustEatSpace

        ' The name of the class or module.
        If Token.Kind <> tkEscapedIdentifier And Token.Kind <> tkIdentifier Then
            Err.Raise vbObjectError + 13, , "Expected: identifier"
        End If

        Set Name = Token
        GoSub MustEatLineBreak

        ' No inner code yet: we expect "End Class" or "End Module" right away.
        If Token.Kind <> tkKeyword Or Token.Text <> "End" Then Err.Raise vbObjectError + 13, , "Expected: End"

        GoSub MustEatSpace

        If Token.Kind <> tkKeyword Or Token.Text <> IIf(IsClass, "Class", "Module") Then
            Err.Raise vbObjectError + 13, , "Expected: " & IIf(IsClass, "Class", "Module")
        End If

        ' Save what we parsed.
        If IsClass Then
            Cls.Accessibility = Access
            Set Cls.Name = Name
            Source.Classes.Add Cls
        Else
            Mdl.Accessibility = Access
            Set Mdl.Name = Name
            Source.Modules.Add Mdl
        End If
    Loop

    Exit Sub

' Requires one white space token, then falls through to MayEatSpaces.
MustEatSpace:
    Set Token = Parser.TokenFrom(Scanner)
    If Token.Kind <> tkWhiteSpace Then Err.Raise vbObjectError + 13, , "Expected: white space"

' Leaves the first token that is not white space in Token.
MayEatSpaces:
    Do
        Set Token = Parser.TokenFrom(Scanner)
    Loop While Token.Kind = tkWhiteSpace

    Return

' Requires one line break token, then falls through to MayEatLineBreaks.
MustEatLineBreak:
    Set Token = Parser.TokenFrom(Scanner)
    If Token.Kind <> tkSoftLineBreak And Token.Kind <> tkHardLineBreak Then Err.Raise vbObjectError + 13, , "Expected: line break"

' Leaves the first token that is not a line break in Token.
MayEatLineBreaks:
    Do
        Set Token = Parser.TokenFrom(Scanner)
    Loop While Token.Kind = tkSoftLineBreak Or Token.Kind = tkHardLineBreak

    Return
End Sub

So far, it is a crude start, but it is a start nonetheless.

We check if we get a Public or a Private keyword. If we do, we save it to be used later. If we don't, we default it to Public.
Then we check if there is a Class or a Module keyword. We raise an error if we got something different; otherwise, we set a variable IsClass to remember what we got and instantiate the correct object.
Next, we get the class's or module's name and save it for later.
As we are not dealing with the inner code of classes or modules yet, we just expect to have an End keyword followed by a Class or Module keyword, whichever we got previously.

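By the way, we need to check that each of these tokens really is a keyword, not just that its text matches. Otherwise, we would accept invalid things like "Public [Class] MyClass", where [Class] is an escaped identifier and not the Class keyword.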
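
That double check could be wrapped in a small function. This is only a sketch: IsKeyword is a hypothetical helper that does not appear in the series' code. It relies on nothing but the Token members and the tkKeyword constant already used in Main.

```vb
' Hypothetical helper, not part of the series' code.
' Returns True only when Token is a real keyword whose text is Text.
' An escaped identifier such as [Class] fails the Kind check.
Private Function IsKeyword(ByVal Token As Token, ByVal Text As String) As Boolean
    IsKeyword = (Token.Kind = tkKeyword) And (Token.Text = Text)
End Function
```

With it, a check such as the one for Class could read: If IsKeyword(Token, "Class") Then.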

After we're done, we set the ClassConstruct's or ModuleConstruct's accessibility and name and save it.
Then we start all over again until there are no more tokens to consume.

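Finally, we have four subroutines:
MayEatLineBreaks will discard zero or more line breaks.
MustEatLineBreak will discard one or more line breaks. If there's none to be discarded, we'll raise an error.
The same goes for MayEatSpaces and MustEatSpace, but for spaces instead of line breaks.
Each Must subroutine checks one token and then falls through to its May counterpart, and all four return with Token holding the first token that was not discarded.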

With the code above we can parse a source file like this:

Class MyClass

End Class

Private Module MyModule
End Module


We have two problems, though. First, our error messages are hideous, and we'll need to improve them.
Second, I noticed a bug that we will have to deal with later.
To work around it for now, we need to ensure our source file ends with a line break; otherwise, we will raise an error by mistake.
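
Until the bug is fixed, the workaround could be automated. The following is a sketch under assumptions: EnsureTrailingLineBreak is a hypothetical helper that is not part of the series' code, and it works on a string, so it assumes the source text can be read into memory and written back before Scanner.OpenFile is called.

```vb
' Hypothetical helper, not part of the series' code.
' Returns Text unchanged when it already ends with a line break;
' otherwise returns Text with a CR+LF appended.
Public Function EnsureTrailingLineBreak(ByVal Text As String) As String
    Select Case Right$(Text, 1)
    Case vbCr, vbLf
        EnsureTrailingLineBreak = Text
    Case Else
        ' Also covers empty text, for which Right$ returns "".
        EnsureTrailingLineBreak = Text & vbCrLf
    End Select
End Function
```

This only hides the problem, of course. The real fix still belongs in the code that produces the tokens.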

Next time we'll improve the relationship between Scanner and Parser classes.

Andrej Biasic
2020-10-21