Metamorphing Machine I rather be this walking metamorphosis
than having that old formed opinion about everything!

Let's build a transpiler! Part 10

This is the tenth post in a series of building a transpiler.
You can find the previous ones here.

Last time, I said we would turn our Scanner module into a class.
Before we do that, let me tell you that it took me too much time to realize that some people would like to run the code instead of just reading it.
I know that not many of you out there have a VB6 compiler if any. So, we're going to convert the code we have so far into a Visual Basic Script (VBS) file.
(Sorry if you do not use Windows.)

VBS is a sub-set of VB, so we'll have to let go of several things:
Almost all of the items above are boring and mechanical transformations. Let's discuss now the big changes:
Regarding the duplicated code, what we are doing now is "pre-processing" the character we are going to return from the GetCodePoint function.
When reading a character, if it is a white-space - any white-space: tab, null, etc. - we turn it to a regular one (character 32.)
If it is a carriage return, we check if it is followed by a line feed. If it is, we discard the carriage return and return the line feed.
Doing this simplifies our code in several places. We no longer need to check if we have a carriage return or a line feed. It will always be a line feed. The same goes for spaces.
Finally, whenever we get a space, we check the next two characters. If they are an underscore and a line feed, we swallow these two characters and return the space.

This strategy presents a chalenge, though. As we are now "compressing" a white-space + underscore + line-break into a single white-space, our UngetChar won't work anymore.
Due to that we'll have to add a tkLineContinuation constant and return a proper token, so we know how much characters we need to "push back" in UnGetChar.

The final touch was to display the line and column when an error occurs. It helped me a lot when debugging the code.

Here it is:
(Click here if you want to skip the code.)

Option Explicit

Private Const SPAN_STRING = "<span style='color: brown;'>"
Private Const SPAN_KEYWORD = "<span style='color: blue;'>"
Private Const SPAN_COMMENT = "<span style='color: green;'>"

Const tkWhiteSpace = 0
Const tkComment = 1
Const tkIdentifier = 2
Const tkEscapedIdentifier = 3
Const tkKeyword = 4
Const tkIntegerNumber = 5
Const tkFloatNumber = 6
Const tkSciNumber = 7
Const tkOctalNumber = 8
Const tkHexaNumber = 9
Const tkFileHandle = 10
Const tkString = 11
Const tkDateTime = 12
Const tkOperator = 13
Const tkLeftParenthesis = 14
Const tkRightParenthesis = 15
Const tkHardLineBreak = 16
Const tkSoftLineBreak = 17
Const tkLineContinuation = 18
Const tkListSeparator = 19
Const tkPrintSeparator = 20
Const tkDirective = 21
Const tkEndOfStream = 22

Const ForReading = 1
Const ForWriting = 2
Dim vbBack : vbBack = Chr(8)

Main

Public Sub Main()
Dim Nbsp
Dim HtmlFile
Dim Text
Dim FilePath
Dim Token
Dim Scanner
Dim FSO

Rem File path for the source code is passed as a command-line argument.
FilePath = WScript.Arguments(0)

Set Scanner = New Scanner
Scanner.Open FilePath

Rem Output file will have the same name as the input file, but with an .HTML extension.
Set FSO = CreateObject("Scripting.FileSystemObject")
FilePath = FSO.BuildPath(FSO.GetParentFolderName(FilePath), FSO.GetBaseName(FilePath) & ".html")
Set HtmlFile = FSO.OpenTextFile(FilePath, ForWriting, True, True)

Nbsp = True

Do
Set Token = Scanner.GetToken

Select Case Token.Kind
Case tkWhiteSpace
If Nbsp Then
HtmlFile.Write "&nbsp;&nbsp;"
Else
HtmlFile.Write " "
End If

Case tkComment
HtmlFile.Write SPAN_COMMENT
HtmlFile.Write EncodeHtml(Token.Text)
HtmlFile.WriteLine "</span><br>"
Nbsp = True

Case tkIdentifier, tkIntegerNumber, tkFloatNumber, tkSciNumber
HtmlFile.Write Token.Text
Nbsp = False

Case tkEscapedIdentifier
HtmlFile.Write "[" & Token.Text & "]"
Nbsp = False

Case tkKeyword
HtmlFile.Write SPAN_KEYWORD & Token.Text & "</span>"
Nbsp = False

Case tkOctalNumber
HtmlFile.Write "&amp;O" & Token.Text

Case tkHexaNumber
HtmlFile.Write "&amp;H" & UCase(Token.Text)

Case tkFileHandle
HtmlFile.Write "#" & Token.Text

Case tkString
Text = Token.Text
Text = Replace(Token.Text, """", """""")
Text = EncodeHtml(Text)
HtmlFile.Write SPAN_STRING & """" & Text & """</span>"

Case tkDateTime
HtmlFile.Write "#" & Token.Text & "#"

Case tkOperator
If StrComp(Token.Text, Token.Text, vbBinaryCompare) = 0 Then
HtmlFile.Write EncodeHtml(Token.Text)
Else
HtmlFile.Write SPAN_KEYWORD & Token.Text & "</span>"
End If

Case tkLeftParenthesis, tkRightParenthesis, tkSoftLineBreak, tkListSeparator, tkPrintSeparator
HtmlFile.Write Token.Text

Case tkLineContinuation
HtmlFile.Write "&nbsp;_<br>"
Nbsp = True

Case tkHardLineBreak
HtmlFile.WriteLine "<br>"
Nbsp = True

Case tkDirective
HtmlFile.Write "#" & Token.Text
Nbsp = False

Case tkEndOfStream
Exit Do
End Select

If Token.Suffix <> vbNullChar Then HtmlFile.Write Token.Suffix
Loop

HtmlFile.Close
End Sub

Private Function EncodeHtml(ByVal Text)
Text = Replace(Text, "&", "&amp;")
Text = Replace(Text, "<", "&lt;")
Text = Replace(Text, ">", "&gt;")
EncodeHtml = Text
End Function

Class Token
Public Text
Public Suffix
Public Kind
Public Line
Public Column

Private Sub Class_Initialize()
Text = " "
Suffix = vbNullChar
Kind = tkWhiteSpace
End Sub
End Class

Class Scanner
Private File_ 'Source code file
Private UnChars_ 'Buffer for UngetChar
Private RunningLine_
Private RunningColumn_
private FrozenColumn_
Private PreviousColumn_

Private Sub Class_Initialize()
RunningLine_ = 0
RunningColumn_ = 1
Set File_ = Nothing
End Sub

Private Sub Class_Terminate()
If Not File_ Is Nothing Then File_.Close
End Sub

Public Sub Open(ByVal FilePath)
Dim Cp

With CreateObject("Scripting.FileSystemObject")
Set File_ = .OpenTextFile(FilePath, ForReading, False, True)
End With

Cp = GetCodePoint
If Cp <> -257 Then UngetChar Cp
End Sub

Public Function GetToken()
Dim Cp
Dim Ch
Dim Token

If File_.AtEndOfStream Then
Set GetToken = NewToken("", tkEndOfStream, vbNullChar)
Exit Function
End If

FrozenColumn_ = RunningColumn_
Cp = GetCodePoint
Ch = ChrW(Cp)

Select Case Ch
Case "["
Set Token = ReadEscapedIdentifier

Case """"
Set Token = ReadString

Case "&"
Set Token = ReadAmpersand

Case "#"
Set Token = ReadHash

Case "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"
Set Token = ReadNumber(Ch)

Case "+", "-", "*", "/", "\", "^", "=", ".", "!"
Set Token = NewToken(Ch, tkOperator, vbNullChar)

Case "<"
Set Token = NewToken(Ch, tkOperator, vbNullChar)
Ch = GetChar

If Ch = ">" Or Ch = "=" Then
Token.Text = Token.Text & Ch
Else
UngetChar Ch
End If

Case ">"
Set Token = NewToken(Ch, tkOperator, vbNullChar)
Ch = GetChar

If Ch = "=" Then
Token.Text = Token.Text & "="
Else
UngetChar Ch
End If

Case ":"
Set Token = NewToken(Ch, tkSoftLineBreak, vbNullChar)

Case vbLf
Set Token = NewToken(Ch, tkHardLineBreak, vbNullChar)

Case "'"
Set Token = ReadComment(Ch)

Case ","
Set Token = NewToken(Ch, tkListSeparator, vbNullChar)

Case ";"
Set Token = NewToken(Ch, tkPrintSeparator, vbNullChar)

Case "("
Set Token = NewToken(Ch, tkLeftParenthesis, vbNullChar)

Case ")"
Set Token = NewToken(Ch, tkRightParenthesis, vbNullChar)

Case " "
Set Token = NewToken(Ch, tkWhiteSpace, vbNullChar)

Case vbBack
Set Token = NewToken(Ch, tkLineContinuation, vbNullChar)

Case Else
If Not IsLetter(Cp) Then Fail "Invalid token: '" & Ch & "'"

Set Token = ReadIdentifier(Cp)

If IsKeyword(Token.Text) And Token.Suffix = vbNullChar Then
Token.Kind = tkKeyword
If StrComp(Token.Text, "Rem", vbTextCompare) = 0 Then Set Token = ReadComment(Token.Text)

ElseIf IsOperator(Token.Text) And Token.Suffix = vbNullChar Then
Token.Kind = tkOperator
End If
End Select

Set GetToken = Token
End Function

Private Function GetCodePoint()
Dim Ch, Ch2, Ch3, CheckLF

Ch = NextChar
If IsSpace(AscW(Ch)) Then Ch = " "

Select Case Ch
Case " "
Ch2 = NextChar

If Ch2 = "_" Then
Ch3 = NextChar

Select Case Ch3
Case vbCr
CheckLF = True
AdvanceLine
Ch = vbBack

Case vbLf
AdvanceLine
Ch = vbBack

Case Else
UnGetChar Ch3
UnGetChar Ch2
End Select
Else
UnGetChar Ch2
End If

Case vbCr
CheckLF = True
Ch = vbLf
End Select

If CheckLF Then
Ch2 = NextChar
If Ch2 <> vbLf Then UnGetChar Ch2
End IF

If Ch = vbLf Then AdvanceLine
GetCodePoint = AscW(Ch)
End Function

Private Function NextChar()
Dim Result

If UnChars_ = "" Then
If File_.AtEndOfStream Then Fail "Unexpected end of file"
Result = File_.Read(1)
Else
Result = Left(UnChars_, 1)
UnChars_ = Mid(UnChars_, 2)
End IF

RunningColumn_ = RunningColumn_ + 1
NextChar = Result
End Function

Private Function GetChar()
Dim Cp

Cp = GetCodePoint
GetChar = ChrW(Cp)
End Function

Private Sub AdvanceLine()
RunningLine_ = RunningLine_ + 1
PreviousColumn_ = RunningColumn_
RunningColumn_ = 1
End Sub

Private Sub UngetChar(ByVal Ch)
If VarType(Ch) <> vbString Then Ch = ChrW(Ch)

If Ch = vbLf Or Ch = vbBack Then
RunningLine_ = RunningLine_ - 1
RunningColumn_ = PreviousColumn_
End If

RunningColumn_ = RunningColumn_ - 1

If Ch = vbBack Then
RunningColumn_ = RunningColumn_ - 1
Ch = " _" & vbNewLine
End If

UnChars_ = Ch & UnChars_
End Sub

Private Sub Fail(ByVal Msg)
Err.Raise vbObjectError + 13, "Scanner", "(" & RunningLine_ & ", " & FrozenColumn_ & ") " & Msg
End Sub

Private Function ReadIdentifier(ByVal CodePoint)
Const MAX_LENGTH = 255
Dim IsOK
Dim Cp
Dim Count
Dim Ch
Dim Suffix
Dim Buffer

Suffix = vbNullChar
Count = 1
Buffer = ChrW(CodePoint)

Do While Not File_.AtEndOfStream
Cp = GetCodePoint
Ch = ChrW(Cp)

IsOK = Ch = "_"
If Not IsOK Then IsOK = Ch >= "0" And Ch <= "9"
If Not IsOK Then IsOK = IsLetter(Cp)
If Not IsOK Then IsOK = IsSurrogate(Cp)
If Not IsOK Then Exit Do

Buffer = Buffer & Ch
Count = Count + 1
If Count > MAX_LENGTH Then Fail "Identifier too long"
Loop

Select Case Ch
Case "%", "&", "^", "@", "!", "#", "$"
Suffix = Ch

Case Else
UngetChar Ch
End Select

Set ReadIdentifier = NewToken(Buffer, tkIdentifier, Suffix)
End Function

Private Function IsLetter(ByVal CodePoint)
IsLetter = CodePoint >= -32768 And CodePoint <= -24645 Or _
CodePoint >= -24576 And CodePoint <= -23412 Or _
CodePoint >= -22761 And CodePoint <= -22758 Or _
CodePoint >= -22528 And CodePoint <= -22527 Or _
CodePoint >= -22525 And CodePoint <= -22523 Or _
CodePoint >= -22521 And CodePoint <= -22518 Or _
CodePoint >= -22516 And CodePoint <= -22494 Or _
CodePoint >= -22464 And CodePoint <= -22413 Or _
CodePoint >= -21504 And CodePoint <= -10333 Or _
CodePoint >= -1792 And CodePoint <= -1491 Or _
CodePoint >= -1488 And CodePoint <= -1430 Or _
CodePoint >= -1424 And CodePoint <= -1319 Or _
CodePoint >= -1280 And CodePoint <= -1274 Or _
CodePoint >= -1261 And CodePoint <= -1257 Or _
CodePoint = -1251 Or _
CodePoint >= -1249 And CodePoint <= -1240 Or _
CodePoint >= -1238 And CodePoint <= -1226 Or _
CodePoint >= -1224 And CodePoint <= -1220 Or _
CodePoint = -1218 Or _
CodePoint = -1216 Or _
CodePoint = -1215 Or _
CodePoint = -1213 Or _
CodePoint = -1212 Or _
CodePoint >= -1210 And CodePoint <= -1103 Or _
CodePoint = -1069 Or _
CodePoint >= -1068 And CodePoint <= -707 Or _
CodePoint >= -688 And CodePoint <= -625 Or _
CodePoint >= -622 And CodePoint <= -569 Or _
CodePoint >= -528 And CodePoint <= -517 Or _
CodePoint >= -400 And CodePoint <= -396 Or _
CodePoint >= -394 And CodePoint <= -260 Or _
CodePoint >= -223 And CodePoint <= -198 Or _
CodePoint >= -191 And CodePoint <= -166 Or _
CodePoint >= -154 And CodePoint <= -66 Or _
CodePoint >= -62 And CodePoint <= -57 Or _
CodePoint >= -54 And CodePoint <= -49 Or _
CodePoint >= -46 And CodePoint <= -41 Or _
CodePoint >= -38 And CodePoint <= -36 Or _
CodePoint >= 65 And CodePoint <= 90 Or _
CodePoint >= 97 And CodePoint <= 122 Or _
CodePoint = 170 Or _
CodePoint = 181 Or _
CodePoint = 186 Or _
CodePoint >= 192 And CodePoint <= 214 Or _
CodePoint >= 216 And CodePoint <= 246 Or _
CodePoint >= 248 And CodePoint <= 705 Or _
CodePoint >= 710 And CodePoint <= 721 Or _
CodePoint >= 736 And CodePoint <= 740 Or _
CodePoint = 750 Or _
CodePoint >= 890 And CodePoint <= 893 Or _
CodePoint = 902 Or _
CodePoint >= 904 And CodePoint <= 906 Or _
CodePoint = 908 Or _
CodePoint >= 910 And CodePoint <= 929 Or _
CodePoint >= 931 And CodePoint <= 974 Or _
CodePoint >= 976 And CodePoint <= 1013 Or _
CodePoint >= 1015 And CodePoint <= 1153 Or _
CodePoint >= 1162 And CodePoint <= 1299 Or _
CodePoint >= 1329 And CodePoint <= 1366 Or _
CodePoint = 1369 Or _
CodePoint >= 1377 And CodePoint <= 1415 Or _
CodePoint >= 1488 And CodePoint <= 1514 Or _
CodePoint >= 1520 And CodePoint <= 1522 Or _
CodePoint >= 1569 And CodePoint <= 1594 Or _
CodePoint >= 1600 And CodePoint <= 1610 Or _
CodePoint = 1646 Or _
CodePoint = 1647 Or _
CodePoint >= 1649 And CodePoint <= 1747 Or _
CodePoint = 1749 Or _
CodePoint = 1765 Or _
CodePoint = 1766 Or _
CodePoint = 1774 Or _
CodePoint = 1775 Or _
CodePoint >= 1786 And CodePoint <= 1788 Or _
CodePoint = 1791 Or _
CodePoint = 1808 Or _
CodePoint >= 1810 And CodePoint <= 1839 Or _
CodePoint >= 1869 And CodePoint <= 1901 Or _
CodePoint >= 1920 And CodePoint <= 1957 Or _
CodePoint = 1969 Or _
CodePoint >= 1994 And CodePoint <= 2026 Or _
CodePoint = 2036 Or _
CodePoint = 2037 Or _
CodePoint = 2042
End Function

Private Function IsKeyword(ByVal Identifier)
Dim Keyword

For Each Keyword In Array("Any", "As", "Attribute", "Boolean", "ByRef", "ByVal", "Byte", "Call", "Case", "CDecl", "Circle", "Close", "Const", _
"Currency", "Date", "Debug", "Decimal", "Declare", "DefBool", "DefByte", "DefCur", "DefDate", "DefDbl", _
"DefDec", "DefInt", "DefLng", "DefObj", "DefSng", "DefStr", "DefVar", "Dim", "Do", "Double", "Each", _
"ElseIf", "Else", "Empty", "EndIf", "End", "EndIf", "Enum", "Erase", "Event", "Exit", "False", "For", "Friend", _
"Function", "Get", "Global", "GoSub", "GoTo", "If", "Implements", "In", "Input", "Integer", _
"Local", "Lock", "Let", "Like", "Long", "Loop", "LSet", "Len", "Me", "Next", "Nothing", "Null", "On", "Open", "Option", _
"Optional", "ParamArray", "PSet", "Preserve", "Print", "Private", "Public", "Put", "RaiseEvent", _
"ReDim", "Rem", "Resume", "Return", "RSet", "Seek", "Select", "Set", "Scale", "Shared", "Single", _
"Static", "Spc", "Stop", "String", "Sub", "Tab", "Then", "To", "True", "Type", "Unlock", "Until", "Variant", "Wend", _
"While", "With", "WithEvents", "Write")

If StrComp(Identifier, Keyword, vbTextCompare) = 0 Then
IsKeyword = True
Exit For
End If
Next
End Function

Private Function IsOperator(ByVal Identifier)
Dim Keyword

For Each Keyword In Array("AddressOf", "And", "Eqv", "Imp", "Is", "Mod", "New", "Not", "Or", "TypeOf", "Xor")

If StrComp(Identifier, Keyword, vbTextCompare) = 0 Then
IsOperator = True
Exit For
End If
Next
End Function

Private Function ReadEscapedIdentifier()
Const MAX_LENGTH = 255
Dim Cp
Dim Count
Dim Suffix
Dim Buffer

Suffix = vbNullChar

Do While Not File_.AtEndOfStream
Cp = GetCodePoint
If Cp = AscW("]") Then Exit Do
If Cp = 10 Then Fail "Invalid identifier"

Count = Count + 1
If Count > MAX_LENGTH Then Fail "Identifier too long"
Buffer = Buffer & ChrW(Cp)
Loop

If Not File_.AtEndOfStream Then
Suffix = GetChar

Select Case Suffix
Case "%", "&", "^", "@", "!", "#", "$"
Rem OK

Case Else
UngetChar Suffix
Suffix = vbNullChar
End Select
End If

Set ReadEscapedIdentifier = NewToken(Buffer, tkEscapedIdentifier, Suffix)
End Function

Private Function ReadString()
Const MAX_LENGTH = 1013
Dim Count
Dim Ch
Dim Buffer

Do
If Count = MAX_LENGTH Then Fail "String too long"

If File_.AtEndOfStream Then
Ch = vbLf
Else
Ch = GetChar
End If

Select Case Ch
Case """"
If File_.AtEndOfStream Then Exit Do
Ch = GetChar

If Ch = """" Then
Count = Append(Count, Buffer, Ch)
Else
Rem We read too much. Let's put it "back".
UngetChar Ch
Exit Do
End If

Case vbLf
Fail "Unclosed string"

Case Else
Count = Append(Count, Buffer, Ch)
End Select
Loop

Set ReadString = NewToken(Left(Buffer, Count), tkString, vbNullChar)
End Function

Private Function Append(ByRef Count, ByRef Buffer, ByVal Ch)
Buffer = Buffer & Ch
Append = Count + 1
End Function

Private Function ReadInteger(ByVal FirstDigit)
Const MAX_LENGTH = 29
Dim Cp
Dim Count
Dim Ch
Dim Suffix
Dim Buffer

Suffix = vbNullChar

If FirstDigit >= "0" And FirstDigit <= "9" Then
Count = 1
Buffer = FirstDigit
End If

Do Until File_.AtEndOfStream
If Count = MAX_LENGTH Then Fail "Literal too long"
Cp = GetCodePoint
Ch = ChrW(Cp)

Select Case Ch
Case "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"
Count = Count + 1
Buffer = Buffer & Ch

Case "%", "&", "^", "@", "!", "#"
Suffix = Ch
Exit Do

Case Else
UngetChar Ch
Exit Do
End Select
Loop

Set ReadInteger = NewToken(Buffer, tkIntegerNumber, Suffix)
End Function

Private Function ReadFloat(ByVal FirstDigit)
Dim Ch
Dim Result
Dim FracPart

Set Result = ReadInteger(FirstDigit)

If Result.Suffix = vbNullChar Then
If Not File_.AtEndOfStream Then
Ch = GetChar

If Ch = "." Then
Set FracPart = ReadInteger("")
If FracPart.Text = "" Then Fail "Invalid literal"
Result.Text = Result.Text & "." & FracPart.Text
Result.Kind = tkFloatNumber
Result.Suffix = FracPart.Suffix
Else
UngetChar Ch
End If
End If
End If

Set ReadFloat = Result
End Function

Private Function ReadNumber(ByVal FirstDigit)
Dim Ch
Dim Sg
Dim Result
Dim ExpPart

Sg = vbNullChar
Set Result = ReadFloat(FirstDigit)

If Result.Suffix = vbNullChar Then
If Not File_.AtEndOfStream Then
Ch = GetChar

Select Case Ch
Case "e", "E"
If File_.AtEndOfStream Then
UngetChar Ch
Else
Sg = GetChar

If Sg = "-" Or Sg = "+" Then
Ch = ""
Else
Ch = Sg
Sg = "+"
End If

Set ExpPart = ReadInteger(Ch)
If ExpPart.Text = "" Or ExpPart.Suffix <> vbNullChar Then Fail "Invalid literal"
Result.Text = Result.Text & "E" & Sg & ExpPart.Text
Result.Kind = tkSciNumber
End If

Case Else
UngetChar Ch
End Select
End If
End If

Set ReadNumber = Result
End Function

Private Function ReadAmpersand()
Const Msg = "Invalid literal"
Dim Cp
Dim Ch

Cp = GetCodePoint
Ch = ChrW(Cp)

Select Case Ch
Case "o", "O"
Set ReadAmpersand = ReadOctal

Case "h", "H"
Set ReadAmpersand = ReadHexa

Case Else
UngetChar Ch
Set ReadAmpersand = NewToken("&", tkOperator, vbNullChar)
End Select
End Function

Private Function ReadOctal()
Const MAX_LENGTH = 32
Dim Count
Dim Ch
Dim Suffix
Dim Buffer

Suffix = vbNullChar

Do While Not File_.AtEndOfStream
If Count = MAX_LENGTH Then Fail "Literal too long"
Ch = GetChar

Select Case Ch
Case "0", "1", "2", "3", "4", "5", "6", "7"
Count = Count + 1
Buffer = Buffer & Ch

Case "%", "&", "^", "@", "!", "#"
Suffix = Ch
Exit Do

Case Else
Exit Do
End Select
Loop

If Count = 0 Then Fail "Invalid literal"
Set ReadOctal = NewToken(Buffer, tkOctalNumber, Suffix)
End Function

Private Function ReadHexa()
Const MAX_LENGTH = 24
Dim Count
Dim Ch
Dim Suffix
Dim Buffer

Suffix = vbNullChar

Do While Not File_.AtEndOfStream
If Count = MAX_LENGTH Then Fail "Literal too long"
Ch = GetChar

Select Case Ch
Case "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", _
"a", "b", "c", "d", "e", "f", _
"A", "B", "C", "D", "E", "F"
Count = Count + 1
Buffer = Buffer & Ch

Case "%", "&", "^", "@", "!", "#"
Suffix = Ch
Exit Do

Case Else
UnGetChar Ch
Exit Do
End Select
Loop

If Count = 0 Then Fail "Invalid literal"
Set ReadHexa = NewToken(Buffer, tkHexaNumber, Suffix)
End Function

Private Function ReadHash()
Const Msg = "Invalid literal"
Dim Cp
Dim Number
Dim Name
Dim Ch
Dim Token

Rem Let's get the first number.
Set Token = ReadInteger("")

If Token.Text = "" Then
Rem Maybe we have a month name?
Name = ReadMonthName

Select Case UCase(Name)
Case "IF", "ELSEIF", "ELSE", "END", "CONST"
Rem Not a month name, we have a compiler directive instead.
Set ReadHash = NewToken(Name, tkDirective, vbNullChar)
Exit Function

Case ""
Fail Msg

Case Else
Number = ConvertNameToNumber(Name)

If Number = 0 Then
Rem Not a month name, we have a variable file-handle instead.
Rem Hopefully this variable is not named Feb, for instance, otherwise we'll get tricked...
Set ReadHash = NewToken(Name, tkFileHandle, vbNullChar)
Exit Function
End If

Token.Text = CStr(Number)
End Select
End If

Rem Let's get the first separator.
Cp = GetCodePoint
Ch = ChrW(Cp)

If IsLetter(Cp) Or Ch = "," Then
Rem We have a numeric file-handle
UngetChar Ch
Token.Kind = tkFileHandle
Set ReadHash = Token
Exit Function
End If

If Ch = ":" Then
Rem We are reading a time literal.
Name = ReadTime(Token.Text)

Rem Date literal must end with a '#'.
Ch = GetChar
If Ch <> "#" Then Fail Msg

Name = "1899-12-30 " & Name
Set ReadHash = NewToken(Name, tkDateTime, vbNullChar)
Exit Function
End If

Rem We 'll suppose it is a valid separator.
On Error Resume Next
Name = ReadDate(Token.Text, Ch)

If Err.Number Then
Rem It is not a date, but a numeric file handle
On Error GoTo 0
UngetChar Ch
Token.Kind = tkFileHandle
Set ReadHash = Token
Exit Function
End If

On Error GoTo 0
Ch = GetChar

Select Case Ch
Case " "
Rem We may have a date and time literal together.
Set ReadHash = NewToken(ReadTime, tkDateTime, vbNullChar)
If ReadHash.Text = "" Then Fail Msg
ReadHash.Text = Name & " " & ReadHash.Text

Ch = GetChar
If Ch <> "#" Then Fail Msg

Case "#"
Rem Literal does not have a time part. Let's add it.
Set ReadHash = NewToken(Name & " 00:00:00", tkDateTime, vbNullChar)

Case Else
Fail Msg
End Select
End Function

Private Function ReadDate(ByVal FirstNumber, ByVal Separator)
Const Msg = "Invalid literal"
Dim YYYY
Dim MM
Dim DD
Dim N
Dim S
Dim Result
Dim SecondNumber
Dim ThirdNumber
Dim Ch

SecondNumber = ReadInteger("")

If SecondNumber = "" Then Fail Msg

Rem The next separator must match the first one.
Ch = GetChar
If Ch <> Separator Then Fail Msg

ThirdNumber = ReadInteger("")
If ThirdNumber = "" Then Fail Msg

If CInt(FirstNumber) >= 100 And Separator = "-" Then
YYYY = CInt(FirstNumber)
MM = CInt(SecondNumber)
DD = CInt(ThirdNumber)
Else
MM = CInt(FirstNumber)
DD = CInt(SecondNumber)
YYYY = CInt(ThirdNumber)

If YYYY < 100 Then
YYYY = YYYY + 1900
If YYYY < 1950 Then YYYY = YYYY + 100
End If
End If

Rem Validate year.
If YYYY > 9999 Then Fail Msg

Rem Validate month.
If MM < 1 Or MM > 12 Then Fail Msg

Rem Validate day.
Select Case MM
Case 4, 6, 9, 11
If DD > 30 Then Fail Msg

Case 2
If YYYY Mod 4 = 0 And YYYY Mod 100 <> 0 Or YYYY Mod 400 = 0 Then
If DD > 29 Then Fail Msg
Else
If DD > 28 Then Fail Msg
End If

Case Else
If DD > 31 Then Fail Msg
End Select

Rem Put it together in YYYY-MM-DD format.
If YYYY < 1000 Then Result = "0"
If YYYY < 100 Then Result = Result & "0"
If YYYY < 10 Then Result = Result & "0"
Result = Result & CStr(YYYY)
Result = Result & "-"

If MM < 10 Then Result = Result & "0"
Result = Result & CStr(MM)
Result = Result & "-"

If DD < 10 Then Result = Result & "0"
Result = Result & CStr(DD)

ReadDate = Result
End Function

Private Function ReadTime(ByVal FirstNumber)
Const Msg = "Invalid literal"
Dim HH
Dim NN
Dim SS
Dim Number
Dim Ch
Dim Ch2
Dim AP

AP = vbNullChar

HH = CInt(FirstNumber)
Number = ReadInteger("")
If Number = "" Then Err.Raise 0
NN = CInt(Number)

Ch = GetChar

If Ch = ":" Then
Number = ReadInteger("")
If Number = "" Then Err.Raise 0
SS = CInt(Number)
Else
UngetChar Ch
SS = 0
End If

If Not File_.AtEndOfStream Then
Ch = GetChar

If Ch = " " Then
If Not File_.AtEndOfStream Then
Ch = GetChar

If Ch = "a" Or Ch = "A" Then
Ch2 = GetChar

If Ch2 = "m" Or Ch2 = "M" Then
AP = "A"

Else
UnGetChar Ch2
UnGetChar Ch
UngetChar " "
End If

ElseIf Ch = "p" Or Ch = "P" Then
Ch2 = GetChar

If Ch2 = "m" Or Ch2 = "M" Then
AP = "P"
Else
UnGetChar Ch2
UnGetChar Ch
UngetChar " "
End If

Else
UnGetChar Ch
UngetChar " "
End If
End If
Else
UngetChar Ch
End If
End If

Rem Validate hour, minute, and second.
If HH < 0 Or HH > 23 Then Err.Raise 0
If NN < 0 Or NN > 59 Then Err.Raise 0
If SS < 0 Or SS > 59 Then Err.Raise 0

If AP = "A" Then
If HH = 12 Then HH = 0

ElseIf AP = "P" Then
If HH <> 12 Then HH = HH + 12
End If

Rem Put it together in HH:NN:SS format.
Number = CStr(SS)
If SS < 10 Then Number = "0" & Number
Number = ":" & Number

Number = CStr(NN) & Number
If NN < 10 Then Number = "0" & Number

Number = ":" & Number
Number = CStr(HH) & Number
If HH < 10 Then Number = "0" & Number

ReadTime = Number
End Function

Private Function ReadMonthName()
Dim Result
Dim Ch
Dim Prv

Do While Not File_.AtEndOfStream
Prv = Ch
Ch = GetChar

Select Case Ch
Case "#", vbLf, ",", ";", ")", " "
UngetChar Ch
Exit Do

Case "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"
Rem We safely can assume we read two characters more than needed.
UngetChar Ch
UngetChar Prv
Result = Left(Result, Len(Result) - 1)
Exit Do

Case Else
Result = Result & Ch
End Select
Loop

ReadMonthName = Result
End Function

Private Function ConvertNameToNumber(ByVal Name)
Dim Count
Dim Result
Dim MonthName
Dim MonthNames

MonthNames = Array("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")

For Each MonthName In MonthNames
Count = Count + 1

If StrComp(Name, MonthName, vbTextCompare) = 0 Then Result = Count
If Result = 0 Then: If StrComp(Name, Left(MonthName, 3), vbTextCompare) = 0 Then Result = Count
If Result <> 0 Then Exit For
Next

ConvertNameToNumber = Result
End Function

Private Function IsSpace(ByVal CodePoint)
Const NULL_CHAR = 0
Const VERTICAL_TAB = 9
Const EOM = 25
Const WHITE_SPACE = 32
Const NO_BREAK_SPACE = 160
Const OGHAM_SPACE_MARK = &H1680
Const MONGOLIAN_VOWEL_SEPARATOR = &H180E
Const EN_QUAD = &H2000
Const HAIR_SPACE = &H200A
Const NARROW_NO_BREAK_SPACE = &H202F
Const MEDIUM_MATHEMATICAL_SPACE = &H205F
Const IDEOGRAPHIC_SPACE = &H3000

Select Case CodePoint
Case NULL_CHAR, WHITE_SPACE, VERTICAL_TAB, EOM, NO_BREAK_SPACE, OGHAM_SPACE_MARK, MONGOLIAN_VOWEL_SEPARATOR, NARROW_NO_BREAK_SPACE, MEDIUM_MATHEMATICAL_SPACE, IDEOGRAPHIC_SPACE
IsSpace = True

Case Else
IsSpace = CodePoint >= EN_QUAD And CodePoint <= HAIR_SPACE
End Select
End Function

Private Function NewToken(ByVal Text, ByVal Kind, ByVal Suffix)
Set NewToken = New Token

With NewToken
.Text = Text
.Kind = Kind
.Suffix = Suffix
.Line = RunningLine_
.Column = FrozenColumn_
End With
End Function

Private Function ReadComment(ByVal Mark)
Const MAX_LENGTH = 1013
Dim Count
Dim Ch
Dim Buffer

Count = Len(Mark)
Buffer = Mark

Do Until File_.AtEndOfStream
If Count = MAX_LENGTH Then Fail "Comment too long"
Ch = GetChar
If Ch = vbLf Then Exit Do

Count = Count + 1
Buffer = Buffer & Ch
Loop

Set ReadComment = NewToken(Buffer, tkComment, vbNullChar)
End Function

Rem The first (high) surrogate is a 16-bit code value in the range U+D800 to U+DBFF.
Private Function IsHighSurrogate(ByVal Character)
IsHighSurrogate = Character >= -10240 And Character <= -9217 Or Character >= 55296 And Character <= 56319
End Function

Rem The second (low) surrogate is a 16-bit code value in the range U+DC00 to U+DFFF.
Private Function IsLowSurrogate(ByVal Character)
IsLowSurrogate = Character >= -9216 And Character <= -8193 Or Character >= 56320 And Character <= 57343
End Function

Private Function IsSurrogate(ByVal Character)
IsSurrogate = IsLowSurrogate(Character) Or IsHighSurrogate(Character)
End Function
End Class

Now, all you have to do is copy the code above, put it in a file, and make sure its extension is .VBS.
Then, write some VB code in another file, save it as UCS-2 LE (a.k.a UTF-16 LE), and drag and drop it over the .VBS file.
It will create an .HTML file with colored keywords, comments, and strings.
Or you'll get an unhandled error. Ugh!

Next week, we'll take a break from our transpiler journey.

Andrej Biasic
2020-09-16