Metamorphing Machine I rather be this walking metamorphosis
than having that old formed opinion about everything!

Let's build a transpiler! Part 6

This is the sixth post in a series on building a transpiler. You can find the previous ones here.

Last time, we still had comments left to deal with.

VB6 has two kinds of comments. One is the apostrophe ('). It comments out everything to its right until the end of the line.
The other one is the Rem keyword. It acts the same as the apostrophe, with one difference: the apostrophe can follow other commands on the same line, while Rem has to start a statement, so it is either the first thing on its line or comes right after a colon.
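
To make the difference concrete, here is a small sketch of both comment styles as they appear in VB6 source. The procedure name CommentKinds and the variable Total are made up for illustration.

```vb
Public Sub CommentKinds()
    Dim Total As Integer

    ' An apostrophe comment on a line of its own.
    Rem A Rem comment on a line of its own.

    Total = 1 ' An apostrophe comment can trail a statement directly.
    Total = Total + 1: Rem After a statement, Rem needs a colon before it.

    ' A comment can also be continued onto the next line _
      by ending the line with a space and an underscore.

    Debug.Print Total
End Sub
```

The last comment matters for the scanner: a space and an underscore right before the line break mean the comment is not over yet.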

This is how we will handle them: we'll create a function, ReadComment, that accumulates every character up to, but not including, a line break.
While we are at it, let's also get rid of the UCS-2 LE BOM if we see one:

Public Sub Main()
    Dim Cp As Integer
    Dim Ch As String * 1
    Dim Token As String
    Dim TokenType As String

    Rem Get an available file number.
    FileHandle_ = FreeFile
    Rem File path for the source code is passed as a command-line argument.
    Open Command$ For Binary As #FileHandle_
    Rem Ensuring we close the file in case we have an error.
    On Error GoTo CloseIt

    Cp = GetCodePoint
    Rem Get rid of BOM if there is one.
    If Cp <> -257 Then UngetChar

    Rem While we do not reach the end of file...
    While Not EOF(FileHandle_)
        Rem ...read a codepoint from it.
        Cp = GetCodePoint
        Ch = ToChar(Cp)

        Select Case Ch
            Case "["
                Token = ReadEscapedIdentifier
                TokenType = "EscapedIdentifier"

            Case """"
                Token = ReadString
                TokenType = "String"

            Case "&"
                Token = ReadAmpersand
                TokenType = "Hexa/Octa/Operator"

            Case "#"
                Token = ReadDateTime
                TokenType = "Date"

            Case "0" To "9"
                Token = ReadNumber(Ch)
                TokenType = "Number"

            Case "+", "-", "*", "/", "\", "^", "=", ".", "!"
                Token = Ch
                TokenType = "Operator"

            Case "<"
                TokenType = "Operator"
                Token = Ch
                Ch = GetChar

                If Ch = ">" Or Ch = "=" Then
                    Token = Token & Ch
                Else
                    UngetChar
                End If

            Case ">"
                TokenType = "Operator"
                Token = Ch

                If GetChar = "=" Then
                    Token = Token & "="
                Else
                    UngetChar
                End If

            Case ":"
                TokenType = "SoftLineBreak"
                Token = Ch

            Case vbLf
                TokenType = "HardLineBreak"
                Token = "LINE-BREAK"

            Case vbCr
                TokenType = "HardLineBreak"
                If GetCodePoint <> 10 Then UngetChar
                Token = "LINE-BREAK"

            Case "'"
                TokenType = "COMMENT"
                Token = ReadComment(Ch)

            Case Else
                If IsSpace(Cp) Then
                    If Not EOF(FileHandle_) Then
                        If GetChar = "_" Then
                            If EOF(FileHandle_) Then
                                UngetChar
                            Else
                                Cp = GetCodePoint
                                If Cp <> 10 And Cp <> 13 Then UngetChar 2
                            End If
                        Else
                            UngetChar
                        End If
                    End If

                    TokenType = "WhiteSpace"
                    Token = "WHITE-SPACE"

                ElseIf IsLetter(Cp) Then
                    Token = ReadIdentifier(Cp)

                    If IsKeyword(Token) Then
                        TokenType = "Keyword"

                        If Token = "Rem" Then
                            TokenType = "COMMENT"
                            Token = ReadComment(Token)
                        End If

                    ElseIf IsOperator(Token) Then
                        TokenType = "Operator"

                    Else
                        TokenType = "Identifier"
                    End If

                Else
                    Rem Tokens we have not dealt with yet.
                End If
        End Select

        Debug.Print TokenType & "->" & Token
    Wend

CloseIt:
    Close #FileHandle_
    Rem This is equivalent to a Throw in a Catch.
    If Err.Number Then Err.Raise Err.Number
End Sub
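
The -257 that Main compares against may look arbitrary. The UCS-2 LE BOM is the byte pair FF FE, and reading those two bytes as one little-endian 16-bit value gives &HFEFF, which is -257 once it is stored in a signed Integer. The sketch below only checks that arithmetic. It assumes GetCodePoint reads two bytes into an Integer, which this post does not show, and the name BomAsInteger is made up.

```vb
Public Sub BomAsInteger()
    Dim LowByte As Long
    Dim HighByte As Long
    Dim Unsigned As Long

    LowByte = &HFF   ' First BOM byte in the file.
    HighByte = &HFE  ' Second BOM byte in the file.

    ' Little-endian: the second byte is the most significant one.
    Unsigned = HighByte * 256 + LowByte   ' 65279, that is, FEFF read as unsigned

    ' Values above 32767 wrap around in a 16-bit signed Integer.
    Debug.Print Unsigned - 65536          ' -257
    Debug.Print &HFEFF                    ' -257, because this literal is an Integer
End Sub
```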


Private Function ReadComment(ByVal Mark As String) As String
    Const MAX_LENGTH = 1013
    Dim Count As Integer
    Dim Cp As Integer
    Dim Ch1 As String * 1
    Dim Ch2 As String * 1
    Dim Ch3 As String * 1
    Dim Buffer As String * MAX_LENGTH

    Rem The comment mark (' or Rem) becomes the start of the token.
    Count = Len(Mark)
    Mid$(Buffer, 1, Count) = Mark

    Do While Not EOF(FileHandle_)
        If Count = MAX_LENGTH Then Fail "Comment too long"
        Rem Ch1 and Ch2 remember the two characters read before Ch3.
        Ch1 = Ch2
        Ch2 = Ch3
        Cp = GetCodePoint
        Ch3 = ToChar(Cp)

        Select Case Ch3
            Case vbCr
                If GetChar <> vbLf Then UngetChar
                GoTo CaseLF

            Case vbLf
CaseLF:
                Rem A space and an underscore before the line break continue the comment.
                If IsSpace(AscW(Ch1)) And Ch2 = "_" Then
                    Ch3 = " "
                    GoTo CaseElse
                End If

                Exit Do

            Case Else
CaseElse:
                Count = Count + 1
                Mid$(Buffer, Count, 1) = Ch3
        End Select
    Loop

    ReadComment = Left$(Buffer, Count)
End Function

We are done scanning comments. Now there are only four tokens left to scan: an opening parenthesis, a closing one, a comma, and a semicolon.
Let's do it:

            Case ","
                Token = Ch
                TokenType = "LIST-SEPARATOR"

            Case ";"
                Token = Ch
                TokenType = "PRINT-SEPARATOR"

            Case "("
                Token = Ch
                TokenType = "LEFT-PAR"

            Case ")"
                Token = Ch
                TokenType = "RIGHT-PAR"

You may have noticed that sometimes we don't know exactly what we have just scanned. Was it an octal number, a hexadecimal one, or a concatenation operator?
We'll fix that next week.
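
For reference, these are the three things an ampersand can start in VB6 source, which is why the scanner can only label the token "Hexa/Octa/Operator" for now. The procedure name AmpersandKinds is made up for illustration.

```vb
Public Sub AmpersandKinds()
    Debug.Print &HFF          ' Hexadecimal literal: prints 255
    Debug.Print &O17          ' Octal literal: prints 15
    Debug.Print "ab" & "cd"   ' Concatenation operator: prints abcd
End Sub
```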

Andrej Biasic
2020-08-19