Let's build a transpiler! Part 6
This is the sixth post in a series on building a transpiler. You can find the previous ones here. Last time we were left to deal with comments.
VB6 has two kinds of comments. One is the apostrophe ('). It comments out everything to its right until the end of the line.
The other one is the Rem keyword. It acts the same as the apostrophe. The difference is that the apostrophe can follow other commands on the same line, while Rem has to start a statement: it is either the first thing on its line or it comes right after a colon.
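To make the difference concrete, here is a small sketch of both comment styles. CommentDemo is a made-up name used only for illustration; it is not part of the transpiler. The last comment relies on line continuation, which ReadComment further down also accounts for: a comment that ends in a space and an underscore carries on to the next line.

```vb
Option Explicit

' Hypothetical demo routine, not part of the series' code.
Public Function CommentDemo() As Integer
    Dim Total As Integer

    ' An apostrophe comment on a line of its own.
    Rem A Rem comment on a line of its own.

    Total = 1 ' An apostrophe comment can directly follow a statement.
    Total = Total + 1: Rem After a statement, Rem needs a colon before it.

    ' This comment ends with a space and an underscore, so _
      this line is still part of the comment.

    CommentDemo = Total
End Function
```

Calling CommentDemo returns 2, because only the two assignments are executed and everything else is a comment.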
This is how we will handle them: We'll create a function ReadComment that will accumulate every character up to a line break but not including it.
While we are at it, let's also get rid of the UCS-2 LE BOM if we see one:
Public Sub Main()
    Dim Cp As Integer
    Dim Ch As String * 1
    Dim Token As String
    Dim TokenType As String
    Rem Get an available file number.
    FileHandle_ = FreeFile
    Rem File path for the source code is passed as a command-line argument.
    Open Command$ For Binary As #FileHandle_
    Rem Ensuring we close the file in case we have an error.
    On Error GoTo CloseIt
    Cp = GetCodePoint
    Rem Get rid of BOM if there is one.
    If Cp <> -257 Then UngetChar
    Rem While we do not reach the end of file...
    While Not EOF(FileHandle_)
        Rem ...read a codepoint from it.
        Cp = GetCodePoint
        Ch = ToChar(Cp)
        Select Case Ch
            Case "["
                Token = ReadEscapedIdentifier
                TokenType = "EscapedIdentifier"
            Case """"
                Token = ReadString
                TokenType = "String"
            Case "&"
                Token = ReadAmpersand
                TokenType = "Hexa/Octa/Operator"
            Case "#"
                Token = ReadDateTime
                TokenType = "Date"
            Case "0" To "9"
                Token = ReadNumber(Ch)
                TokenType = "Number"
            Case "+", "-", "*", "/", "\", "^", "=", ".", "!"
                Token = Ch
                TokenType = "Operator"
            Case "<"
                TokenType = "Operator"
                Token = Ch
                Ch = GetChar
                If Ch = ">" Or Ch = "=" Then
                    Token = Token & Ch
                Else
                    UngetChar
                End If
            Case ">"
                TokenType = "Operator"
                Token = Ch
                If GetChar = "=" Then
                    Token = Token & "="
                Else
                    UngetChar
                End If
            Case ":"
                TokenType = "SoftLineBreak"
                Token = Ch
            Case vbLf
                TokenType = "HardLineBreak"
                Token = "LINE-BREAK"
            Case vbCr
                TokenType = "HardLineBreak"
                If GetCodePoint <> 10 Then UngetChar
                Token = "LINE-BREAK"
            Case "'"
                TokenType = "COMMENT"
                Token = ReadComment(Ch)
            Case Else
                If IsSpace(Cp) Then
                    If Not EOF(FileHandle_) Then
                        If GetChar = "_" Then
                            If EOF(FileHandle_) Then
                                UngetChar
                            Else
                                Cp = GetCodePoint
                                If Cp <> 10 And Cp <> 13 Then UngetChar 2
                            End If
                        Else
                            UngetChar
                        End If
                    End If
                    TokenType = "WhiteSpace"
                    Token = "WHITE-SPACE"
                ElseIf IsLetter(Cp) Then
                    Token = ReadIdentifier(Cp)
                    If IsKeyword(Token) Then
                        TokenType = "Keyword"
                        If Token = "Rem" Then
                            Rem Same token type as the apostrophe case above.
                            TokenType = "COMMENT"
                            Token = ReadComment(Token)
                        End If
                    ElseIf IsOperator(Token) Then
                        TokenType = "Operator"
                    Else
                        TokenType = "Identifier"
                    End If
                Else
                    Rem Tokens we have not dealt with yet.
                End If
        End Select
        Debug.Print TokenType & "->" & Token
    Wend
CloseIt:
    Close #FileHandle_
    Rem This is equivalent to a Throw in a Catch.
    If Err.Number Then Err.Raise Err.Number
End Sub
Private Function ReadComment(ByVal Mark As String) As String
    Const MAX_LENGTH = 1013
    Dim Count As Integer
    Dim Cp As Integer
    Dim Ch1 As String * 1
    Dim Ch2 As String * 1
    Dim Ch3 As String * 1
    Dim Buffer As String * MAX_LENGTH
    Count = Len(Mark)
    Mid$(Buffer, 1, Count) = Mark
    Do While Not EOF(FileHandle_)
        If Count = MAX_LENGTH Then Fail "Comment too long"
        Ch1 = Ch2
        Ch2 = Ch3
        Cp = GetCodePoint
        Ch3 = ToChar(Cp)
        Select Case Ch3
            Case vbCr
                If GetChar <> vbLf Then UngetChar
                GoTo CaseLF
            Case vbLf
CaseLF:         If IsSpace(AscW(Ch1)) And Ch2 = "_" Then
                    Rem Line continuation: store a space instead of the line break.
                    Ch3 = " "
                    GoTo CaseElse
                End If
                Exit Do
            Case Else
CaseElse:       Count = Count + 1
                Mid$(Buffer, Count, 1) = Ch3
        End Select
    Loop
    ReadComment = Left$(Buffer, Count)
End Function
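The BOM test in Main compares against -257, which can look like a magic number. The sketch below shows where it comes from, assuming GetCodePoint hands back each 16-bit code unit as a VB6 Integer (it was written in an earlier post, so that is an assumption here). The byte order mark is U+FEFF, and &HFEFF read as a signed 16-bit Integer is -257. IsBom and BomDemo are hypothetical names, not part of the series' code.

```vb
Option Explicit

' Hypothetical helper: True when a 16-bit code unit is the byte order mark.
Public Function IsBom(ByVal Cp As Integer) As Boolean
    ' &HFEFF without a type suffix is an Integer literal, so it equals -257.
    IsBom = (Cp = &HFEFF)
End Function

Public Sub BomDemo()
    Debug.Print &HFEFF      ' -257, the signed 16-bit view of U+FEFF
    Debug.Print &HFEFF&     ' 65279, the trailing & makes the literal a Long
    Debug.Print IsBom(-257) ' True
End Sub
```

Writing the comparison as Cp <> &HFEFF instead of Cp <> -257 would behave the same and might make the intent easier to spot.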
We are done scanning comments. Now there are only four tokens left to scan: an opening parenthesis, a closing one, a comma, and a semicolon.
Let's do it:
Case ","
    Token = Ch
    TokenType = "LIST-SEPARATOR"
Case ";"
    Token = Ch
    TokenType = "PRINT-SEPARATOR"
Case "("
    Token = Ch
    TokenType = "LEFT-PAR"
Case ")"
    Token = Ch
    TokenType = "RIGHT-PAR"
You may have noticed that sometimes we don't know exactly what we have just scanned. Was it an octal number, a hexadecimal one, or a concatenation operator?
We'll fix that next week.
Andrej Biasic
2020-08-19