Let's build a transpiler! Part 10
This is the tenth post in a series of building a transpiler.You can find the previous ones here.
Last time, I said we would turn our Scanner module into a class.
Before we do that, let me tell you that it took me too much time to realize that some people would like to run the code instead of just reading it.
I know that not many of you out there have a VB6 compiler if any. So, we're going to convert the code we have so far into a Visual Basic Script (VBS) file.
(Sorry if you do not use Windows.)
VBS is a sub-set of VB, so we'll have to let go of several things:
- For starters, it does not have type annotations. So every As Integer, As String, etc. has to go away.
- It does not support optional arguments either. Even though there is a hack to have some kind of optional argument support in VBS, I chose not to do it.
So, Optional and the default value has to go away too, and we must now supply all parameters in function calls. - Our ToChar function "stuffed" two bytes of an integer into a string and called it the day. As we no longer have bytes, integers, or strings, we need to substitute it by a native function: ChrW.
- By the way, VBS does not "understand" names that end with a "$", so we must remove it too.
- Another thing we must remove is the name for arguments. They make our code more legible, but VBS does not allow them.
- VBS does not let us have Enums, so we will turn every enumerand into a Const declaration.
- Another annoying thing we miss when converting to VBS is the use of To in a Case. Now we need to spell out every single value (sigh!)
- Something we've been relying on so far is foreign to VBS: Option Compare Text. Every time we compared two or more strings uncarelessly, now we will have to use function StrComp instead.
- As I said before, VBS lacks a fundamental feature that, to me, makes it only a toy language - albeit a very useful one: Error handling. There is no substitute for it unless we start doing strange things. So, no more On Error (...)
- We can't use GoSub in VBS. We'll have to morph its target into proper sub or function. Fortunately, we only have a few of them.
- GoTo is a no game, too. Now we are forced to deal with that duplicated code I mentioned in my previous post. More about it below.
- Finally, for VBS, Mid is only a function, not a statement too. As it was mostly an optimization, we don't lose much using simple string concatenation in its place.
- Command-line arguments are made available to VBS through WScript's Arguments property. We'll replace Command$ for that.
- File I/O in VBS is done through the FileSystemObject. We'll instantiate it to be able to open a file, read from or write to it, and close it.
- We cannot read integers from the source code file as we were doing before, only characters (or text.) While our previous code reads an integer from the file and turns it into a string, now we need to do the opposite: Read a character and turn it into an integer.
- Also, now we cannot rewind our next read position within the file if we read too much. I declared a class-level variable (UnChars_) to temporarily store these characters. Whenever we read too much, we concatenate the character in UnChars_. Inside GetChar function, we check it first. If there's a character sitting there, we grab it and remove it from there.
- We can finally use the Class keyword to declare classes. Yay!
- Our old Main now is adapted into a GetToken function.
- The new Main instantiates a Scanner class, asks it to open our source code file, and calls Scanner's GetToken to... well, get the token we are going to work on.
When reading a character, if it is a white-space - any white-space: tab, null, etc. - we turn it to a regular one (character 32.)
If it is a carriage return, we check if it is followed by a line feed. If it is, we discard the carriage return and return the line feed.
Doing this simplifies our code in several places. We no longer need to check if we have a carriage return or a line feed. It will always be a line feed. The same goes for spaces.
Finally, whenever we get a space, we check the next two characters. If they are an underscore and a line feed, we swallow these two characters and return the space.
This strategy presents a chalenge, though. As we are now "compressing" a white-space + underscore + line-break into a single white-space, our UngetChar won't work anymore.
Due to that we'll have to add a tkLineContinuation constant and return a proper token, so we know how much characters we need to "push back" in UnGetChar.
The final touch was to display the line and column when an error occurs. It helped me a lot when debugging the code.
Here it is:
(Click here if you want to skip the code.)
Option Explicit
Private Const SPAN_STRING = "<span style='color: brown;'>"
Private Const SPAN_KEYWORD = "<span style='color: blue;'>"
Private Const SPAN_COMMENT = "<span style='color: green;'>"
Const tkWhiteSpace = 0
Const tkComment = 1
Const tkIdentifier = 2
Const tkEscapedIdentifier = 3
Const tkKeyword = 4
Const tkIntegerNumber = 5
Const tkFloatNumber = 6
Const tkSciNumber = 7
Const tkOctalNumber = 8
Const tkHexaNumber = 9
Const tkFileHandle = 10
Const tkString = 11
Const tkDateTime = 12
Const tkOperator = 13
Const tkLeftParenthesis = 14
Const tkRightParenthesis = 15
Const tkHardLineBreak = 16
Const tkSoftLineBreak = 17
Const tkLineContinuation = 18
Const tkListSeparator = 19
Const tkPrintSeparator = 20
Const tkDirective = 21
Const tkEndOfStream = 22
Const ForReading = 1
Const ForWriting = 2
Dim vbBack : vbBack = Chr(8)
Main
Public Sub Main()
Dim Nbsp
Dim HtmlFile
Dim Text
Dim FilePath
Dim Token
Dim Scanner
Dim FSO
Rem File path for the source code is passed as a command-line argument.
FilePath = WScript.Arguments(0)
Set Scanner = New Scanner
Scanner.Open FilePath
Rem Output file will have the same name as the input file, but with an .HTML extension.
Set FSO = CreateObject("Scripting.FileSystemObject")
FilePath = FSO.BuildPath(FSO.GetParentFolderName(FilePath), FSO.GetBaseName(FilePath) & ".html")
Set HtmlFile = FSO.OpenTextFile(FilePath, ForWriting, True, True)
Nbsp = True
Do
Set Token = Scanner.GetToken
Select Case Token.Kind
Case tkWhiteSpace
If Nbsp Then
HtmlFile.Write " "
Else
HtmlFile.Write " "
End If
Case tkComment
HtmlFile.Write SPAN_COMMENT
HtmlFile.Write EncodeHtml(Token.Text)
HtmlFile.WriteLine "</span><br>"
Nbsp = True
Case tkIdentifier, tkIntegerNumber, tkFloatNumber, tkSciNumber
HtmlFile.Write Token.Text
Nbsp = False
Case tkEscapedIdentifier
HtmlFile.Write "[" & Token.Text & "]"
Nbsp = False
Case tkKeyword
HtmlFile.Write SPAN_KEYWORD & Token.Text & "</span>"
Nbsp = False
Case tkOctalNumber
HtmlFile.Write "&O" & Token.Text
Case tkHexaNumber
HtmlFile.Write "&H" & UCase(Token.Text)
Case tkFileHandle
HtmlFile.Write "#" & Token.Text
Case tkString
Text = Token.Text
Text = Replace(Token.Text, """", """""")
Text = EncodeHtml(Text)
HtmlFile.Write SPAN_STRING & """" & Text & """</span>"
Case tkDateTime
HtmlFile.Write "#" & Token.Text & "#"
Case tkOperator
If StrComp(Token.Text, Token.Text, vbBinaryCompare) = 0 Then
HtmlFile.Write EncodeHtml(Token.Text)
Else
HtmlFile.Write SPAN_KEYWORD & Token.Text & "</span>"
End If
Case tkLeftParenthesis, tkRightParenthesis, tkSoftLineBreak, tkListSeparator, tkPrintSeparator
HtmlFile.Write Token.Text
Case tkLineContinuation
HtmlFile.Write " _<br>"
Nbsp = True
Case tkHardLineBreak
HtmlFile.WriteLine "<br>"
Nbsp = True
Case tkDirective
HtmlFile.Write "#" & Token.Text
Nbsp = False
Case tkEndOfStream
Exit Do
End Select
If Token.Suffix <> vbNullChar Then HtmlFile.Write Token.Suffix
Loop
HtmlFile.Close
End Sub
Private Function EncodeHtml(ByVal Text)
Text = Replace(Text, "&", "&")
Text = Replace(Text, "<", "<")
Text = Replace(Text, ">", ">")
EncodeHtml = Text
End Function
Class Token
Public Text
Public Suffix
Public Kind
Public Line
Public Column
Private Sub Class_Initialize()
Text = " "
Suffix = vbNullChar
Kind = tkWhiteSpace
End Sub
End Class
Class Scanner
Private File_ 'Source code file
Private UnChars_ 'Buffer for UngetChar
Private RunningLine_
Private RunningColumn_
private FrozenColumn_
Private PreviousColumn_
Private Sub Class_Initialize()
RunningLine_ = 0
RunningColumn_ = 1
Set File_ = Nothing
End Sub
Private Sub Class_Terminate()
If Not File_ Is Nothing Then File_.Close
End Sub
Public Sub Open(ByVal FilePath)
Dim Cp
With CreateObject("Scripting.FileSystemObject")
Set File_ = .OpenTextFile(FilePath, ForReading, False, True)
End With
Cp = GetCodePoint
If Cp <> -257 Then UngetChar Cp
End Sub
Public Function GetToken()
Dim Cp
Dim Ch
Dim Token
If File_.AtEndOfStream Then
Set GetToken = NewToken("", tkEndOfStream, vbNullChar)
Exit Function
End If
FrozenColumn_ = RunningColumn_
Cp = GetCodePoint
Ch = ChrW(Cp)
Select Case Ch
Case "["
Set Token = ReadEscapedIdentifier
Case """"
Set Token = ReadString
Case "&"
Set Token = ReadAmpersand
Case "#"
Set Token = ReadHash
Case "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"
Set Token = ReadNumber(Ch)
Case "+", "-", "*", "/", "\", "^", "=", ".", "!"
Set Token = NewToken(Ch, tkOperator, vbNullChar)
Case "<"
Set Token = NewToken(Ch, tkOperator, vbNullChar)
Ch = GetChar
If Ch = ">" Or Ch = "=" Then
Token.Text = Token.Text & Ch
Else
UngetChar Ch
End If
Case ">"
Set Token = NewToken(Ch, tkOperator, vbNullChar)
Ch = GetChar
If Ch = "=" Then
Token.Text = Token.Text & "="
Else
UngetChar Ch
End If
Case ":"
Set Token = NewToken(Ch, tkSoftLineBreak, vbNullChar)
Case vbLf
Set Token = NewToken(Ch, tkHardLineBreak, vbNullChar)
Case "'"
Set Token = ReadComment(Ch)
Case ","
Set Token = NewToken(Ch, tkListSeparator, vbNullChar)
Case ";"
Set Token = NewToken(Ch, tkPrintSeparator, vbNullChar)
Case "("
Set Token = NewToken(Ch, tkLeftParenthesis, vbNullChar)
Case ")"
Set Token = NewToken(Ch, tkRightParenthesis, vbNullChar)
Case " "
Set Token = NewToken(Ch, tkWhiteSpace, vbNullChar)
Case vbBack
Set Token = NewToken(Ch, tkLineContinuation, vbNullChar)
Case Else
If Not IsLetter(Cp) Then Fail "Invalid token: '" & Ch & "'"
Set Token = ReadIdentifier(Cp)
If IsKeyword(Token.Text) And Token.Suffix = vbNullChar Then
Token.Kind = tkKeyword
If StrComp(Token.Text, "Rem", vbTextCompare) = 0 Then Set Token = ReadComment(Token.Text)
ElseIf IsOperator(Token.Text) And Token.Suffix = vbNullChar Then
Token.Kind = tkOperator
End If
End Select
Set GetToken = Token
End Function
Private Function GetCodePoint()
Dim Ch, Ch2, Ch3, CheckLF
Ch = NextChar
If IsSpace(AscW(Ch)) Then Ch = " "
Select Case Ch
Case " "
Ch2 = NextChar
If Ch2 = "_" Then
Ch3 = NextChar
Select Case Ch3
Case vbCr
CheckLF = True
AdvanceLine
Ch = vbBack
Case vbLf
AdvanceLine
Ch = vbBack
Case Else
UnGetChar Ch3
UnGetChar Ch2
End Select
Else
UnGetChar Ch2
End If
Case vbCr
CheckLF = True
Ch = vbLf
End Select
If CheckLF Then
Ch2 = NextChar
If Ch2 <> vbLf Then UnGetChar Ch2
End IF
If Ch = vbLf Then AdvanceLine
GetCodePoint = AscW(Ch)
End Function
Private Function NextChar()
Dim Result
If UnChars_ = "" Then
If File_.AtEndOfStream Then Fail "Unexpected end of file"
Result = File_.Read(1)
Else
Result = Left(UnChars_, 1)
UnChars_ = Mid(UnChars_, 2)
End IF
RunningColumn_ = RunningColumn_ + 1
NextChar = Result
End Function
Private Function GetChar()
Dim Cp
Cp = GetCodePoint
GetChar = ChrW(Cp)
End Function
Private Sub AdvanceLine()
RunningLine_ = RunningLine_ + 1
PreviousColumn_ = RunningColumn_
RunningColumn_ = 1
End Sub
Private Sub UngetChar(ByVal Ch)
If VarType(Ch) <> vbString Then Ch = ChrW(Ch)
If Ch = vbLf Or Ch = vbBack Then
RunningLine_ = RunningLine_ - 1
RunningColumn_ = PreviousColumn_
End If
RunningColumn_ = RunningColumn_ - 1
If Ch = vbBack Then
RunningColumn_ = RunningColumn_ - 1
Ch = " _" & vbNewLine
End If
UnChars_ = Ch & UnChars_
End Sub
Private Sub Fail(ByVal Msg)
Err.Raise vbObjectError + 13, "Scanner", "(" & RunningLine_ & ", " & FrozenColumn_ & ") " & Msg
End Sub
Private Function ReadIdentifier(ByVal CodePoint)
Const MAX_LENGTH = 255
Dim IsOK
Dim Cp
Dim Count
Dim Ch
Dim Suffix
Dim Buffer
Suffix = vbNullChar
Count = 1
Buffer = ChrW(CodePoint)
Do While Not File_.AtEndOfStream
Cp = GetCodePoint
Ch = ChrW(Cp)
IsOK = Ch = "_"
If Not IsOK Then IsOK = Ch >= "0" And Ch <= "9"
If Not IsOK Then IsOK = IsLetter(Cp)
If Not IsOK Then IsOK = IsSurrogate(Cp)
If Not IsOK Then Exit Do
Buffer = Buffer & Ch
Count = Count + 1
If Count > MAX_LENGTH Then Fail "Identifier too long"
Loop
Select Case Ch
Case "%", "&", "^", "@", "!", "#", "$"
Suffix = Ch
Case Else
UngetChar Ch
End Select
Set ReadIdentifier = NewToken(Buffer, tkIdentifier, Suffix)
End Function
Private Function IsLetter(ByVal CodePoint)
IsLetter = CodePoint >= -32768 And CodePoint <= -24645 Or _
CodePoint >= -24576 And CodePoint <= -23412 Or _
CodePoint >= -22761 And CodePoint <= -22758 Or _
CodePoint >= -22528 And CodePoint <= -22527 Or _
CodePoint >= -22525 And CodePoint <= -22523 Or _
CodePoint >= -22521 And CodePoint <= -22518 Or _
CodePoint >= -22516 And CodePoint <= -22494 Or _
CodePoint >= -22464 And CodePoint <= -22413 Or _
CodePoint >= -21504 And CodePoint <= -10333 Or _
CodePoint >= -1792 And CodePoint <= -1491 Or _
CodePoint >= -1488 And CodePoint <= -1430 Or _
CodePoint >= -1424 And CodePoint <= -1319 Or _
CodePoint >= -1280 And CodePoint <= -1274 Or _
CodePoint >= -1261 And CodePoint <= -1257 Or _
CodePoint = -1251 Or _
CodePoint >= -1249 And CodePoint <= -1240 Or _
CodePoint >= -1238 And CodePoint <= -1226 Or _
CodePoint >= -1224 And CodePoint <= -1220 Or _
CodePoint = -1218 Or _
CodePoint = -1216 Or _
CodePoint = -1215 Or _
CodePoint = -1213 Or _
CodePoint = -1212 Or _
CodePoint >= -1210 And CodePoint <= -1103 Or _
CodePoint = -1069 Or _
CodePoint >= -1068 And CodePoint <= -707 Or _
CodePoint >= -688 And CodePoint <= -625 Or _
CodePoint >= -622 And CodePoint <= -569 Or _
CodePoint >= -528 And CodePoint <= -517 Or _
CodePoint >= -400 And CodePoint <= -396 Or _
CodePoint >= -394 And CodePoint <= -260 Or _
CodePoint >= -223 And CodePoint <= -198 Or _
CodePoint >= -191 And CodePoint <= -166 Or _
CodePoint >= -154 And CodePoint <= -66 Or _
CodePoint >= -62 And CodePoint <= -57 Or _
CodePoint >= -54 And CodePoint <= -49 Or _
CodePoint >= -46 And CodePoint <= -41 Or _
CodePoint >= -38 And CodePoint <= -36 Or _
CodePoint >= 65 And CodePoint <= 90 Or _
CodePoint >= 97 And CodePoint <= 122 Or _
CodePoint = 170 Or _
CodePoint = 181 Or _
CodePoint = 186 Or _
CodePoint >= 192 And CodePoint <= 214 Or _
CodePoint >= 216 And CodePoint <= 246 Or _
CodePoint >= 248 And CodePoint <= 705 Or _
CodePoint >= 710 And CodePoint <= 721 Or _
CodePoint >= 736 And CodePoint <= 740 Or _
CodePoint = 750 Or _
CodePoint >= 890 And CodePoint <= 893 Or _
CodePoint = 902 Or _
CodePoint >= 904 And CodePoint <= 906 Or _
CodePoint = 908 Or _
CodePoint >= 910 And CodePoint <= 929 Or _
CodePoint >= 931 And CodePoint <= 974 Or _
CodePoint >= 976 And CodePoint <= 1013 Or _
CodePoint >= 1015 And CodePoint <= 1153 Or _
CodePoint >= 1162 And CodePoint <= 1299 Or _
CodePoint >= 1329 And CodePoint <= 1366 Or _
CodePoint = 1369 Or _
CodePoint >= 1377 And CodePoint <= 1415 Or _
CodePoint >= 1488 And CodePoint <= 1514 Or _
CodePoint >= 1520 And CodePoint <= 1522 Or _
CodePoint >= 1569 And CodePoint <= 1594 Or _
CodePoint >= 1600 And CodePoint <= 1610 Or _
CodePoint = 1646 Or _
CodePoint = 1647 Or _
CodePoint >= 1649 And CodePoint <= 1747 Or _
CodePoint = 1749 Or _
CodePoint = 1765 Or _
CodePoint = 1766 Or _
CodePoint = 1774 Or _
CodePoint = 1775 Or _
CodePoint >= 1786 And CodePoint <= 1788 Or _
CodePoint = 1791 Or _
CodePoint = 1808 Or _
CodePoint >= 1810 And CodePoint <= 1839 Or _
CodePoint >= 1869 And CodePoint <= 1901 Or _
CodePoint >= 1920 And CodePoint <= 1957 Or _
CodePoint = 1969 Or _
CodePoint >= 1994 And CodePoint <= 2026 Or _
CodePoint = 2036 Or _
CodePoint = 2037 Or _
CodePoint = 2042
End Function
Private Function IsKeyword(ByVal Identifier)
Dim Keyword
For Each Keyword In Array("Any", "As", "Attribute", "Boolean", "ByRef", "ByVal", "Byte", "Call", "Case", "CDecl", "Circle", "Close", "Const", _
"Currency", "Date", "Debug", "Decimal", "Declare", "DefBool", "DefByte", "DefCur", "DefDate", "DefDbl", _
"DefDec", "DefInt", "DefLng", "DefObj", "DefSng", "DefStr", "DefVar", "Dim", "Do", "Double", "Each", _
"ElseIf", "Else", "Empty", "EndIf", "End", "EndIf", "Enum", "Erase", "Event", "Exit", "False", "For", "Friend", _
"Function", "Get", "Global", "GoSub", "GoTo", "If", "Implements", "In", "Input", "Integer", _
"Local", "Lock", "Let", "Like", "Long", "Loop", "LSet", "Len", "Me", "Next", "Nothing", "Null", "On", "Open", "Option", _
"Optional", "ParamArray", "PSet", "Preserve", "Print", "Private", "Public", "Put", "RaiseEvent", _
"ReDim", "Rem", "Resume", "Return", "RSet", "Seek", "Select", "Set", "Scale", "Shared", "Single", _
"Static", "Spc", "Stop", "String", "Sub", "Tab", "Then", "To", "True", "Type", "Unlock", "Until", "Variant", "Wend", _
"While", "With", "WithEvents", "Write")
If StrComp(Identifier, Keyword, vbTextCompare) = 0 Then
IsKeyword = True
Exit For
End If
Next
End Function
Private Function IsOperator(ByVal Identifier)
Dim Keyword
For Each Keyword In Array("AddressOf", "And", "Eqv", "Imp", "Is", "Mod", "New", "Not", "Or", "TypeOf", "Xor")
If StrComp(Identifier, Keyword, vbTextCompare) = 0 Then
IsOperator = True
Exit For
End If
Next
End Function
Private Function ReadEscapedIdentifier()
Const MAX_LENGTH = 255
Dim Cp
Dim Count
Dim Suffix
Dim Buffer
Suffix = vbNullChar
Do While Not File_.AtEndOfStream
Cp = GetCodePoint
If Cp = AscW("]") Then Exit Do
If Cp = 10 Then Fail "Invalid identifier"
Count = Count + 1
If Count > MAX_LENGTH Then Fail "Identifier too long"
Buffer = Buffer & ChrW(Cp)
Loop
If Not File_.AtEndOfStream Then
Suffix = GetChar
Select Case Suffix
Case "%", "&", "^", "@", "!", "#", "$"
Rem OK
Case Else
UngetChar Suffix
Suffix = vbNullChar
End Select
End If
Set ReadEscapedIdentifier = NewToken(Buffer, tkEscapedIdentifier, Suffix)
End Function
Private Function ReadString()
Const MAX_LENGTH = 1013
Dim Count
Dim Ch
Dim Buffer
Do
If Count = MAX_LENGTH Then Fail "String too long"
If File_.AtEndOfStream Then
Ch = vbLf
Else
Ch = GetChar
End If
Select Case Ch
Case """"
If File_.AtEndOfStream Then Exit Do
Ch = GetChar
If Ch = """" Then
Count = Append(Count, Buffer, Ch)
Else
Rem We read too much. Let's put it "back".
UngetChar Ch
Exit Do
End If
Case vbLf
Fail "Unclosed string"
Case Else
Count = Append(Count, Buffer, Ch)
End Select
Loop
Set ReadString = NewToken(Left(Buffer, Count), tkString, vbNullChar)
End Function
Private Function Append(ByRef Count, ByRef Buffer, ByVal Ch)
Buffer = Buffer & Ch
Append = Count + 1
End Function
Private Function ReadInteger(ByVal FirstDigit)
Const MAX_LENGTH = 29
Dim Cp
Dim Count
Dim Ch
Dim Suffix
Dim Buffer
Suffix = vbNullChar
If FirstDigit >= "0" And FirstDigit <= "9" Then
Count = 1
Buffer = FirstDigit
End If
Do Until File_.AtEndOfStream
If Count = MAX_LENGTH Then Fail "Literal too long"
Cp = GetCodePoint
Ch = ChrW(Cp)
Select Case Ch
Case "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"
Count = Count + 1
Buffer = Buffer & Ch
Case "%", "&", "^", "@", "!", "#"
Suffix = Ch
Exit Do
Case Else
UngetChar Ch
Exit Do
End Select
Loop
Set ReadInteger = NewToken(Buffer, tkIntegerNumber, Suffix)
End Function
Private Function ReadFloat(ByVal FirstDigit)
Dim Ch
Dim Result
Dim FracPart
Set Result = ReadInteger(FirstDigit)
If Result.Suffix = vbNullChar Then
If Not File_.AtEndOfStream Then
Ch = GetChar
If Ch = "." Then
Set FracPart = ReadInteger("")
If FracPart.Text = "" Then Fail "Invalid literal"
Result.Text = Result.Text & "." & FracPart.Text
Result.Kind = tkFloatNumber
Result.Suffix = FracPart.Suffix
Else
UngetChar Ch
End If
End If
End If
Set ReadFloat = Result
End Function
Private Function ReadNumber(ByVal FirstDigit)
Dim Ch
Dim Sg
Dim Result
Dim ExpPart
Sg = vbNullChar
Set Result = ReadFloat(FirstDigit)
If Result.Suffix = vbNullChar Then
If Not File_.AtEndOfStream Then
Ch = GetChar
Select Case Ch
Case "e", "E"
If File_.AtEndOfStream Then
UngetChar Ch
Else
Sg = GetChar
If Sg = "-" Or Sg = "+" Then
Ch = ""
Else
Ch = Sg
Sg = "+"
End If
Set ExpPart = ReadInteger(Ch)
If ExpPart.Text = "" Or ExpPart.Suffix <> vbNullChar Then Fail "Invalid literal"
Result.Text = Result.Text & "E" & Sg & ExpPart.Text
Result.Kind = tkSciNumber
End If
Case Else
UngetChar Ch
End Select
End If
End If
Set ReadNumber = Result
End Function
Private Function ReadAmpersand()
Const Msg = "Invalid literal"
Dim Cp
Dim Ch
Cp = GetCodePoint
Ch = ChrW(Cp)
Select Case Ch
Case "o", "O"
Set ReadAmpersand = ReadOctal
Case "h", "H"
Set ReadAmpersand = ReadHexa
Case Else
UngetChar Ch
Set ReadAmpersand = NewToken("&", tkOperator, vbNullChar)
End Select
End Function
Private Function ReadOctal()
Const MAX_LENGTH = 32
Dim Count
Dim Ch
Dim Suffix
Dim Buffer
Suffix = vbNullChar
Do While Not File_.AtEndOfStream
If Count = MAX_LENGTH Then Fail "Literal too long"
Ch = GetChar
Select Case Ch
Case "0", "1", "2", "3", "4", "5", "6", "7"
Count = Count + 1
Buffer = Buffer & Ch
Case "%", "&", "^", "@", "!", "#"
Suffix = Ch
Exit Do
Case Else
Exit Do
End Select
Loop
If Count = 0 Then Fail "Invalid literal"
Set ReadOctal = NewToken(Buffer, tkOctalNumber, Suffix)
End Function
Private Function ReadHexa()
Const MAX_LENGTH = 24
Dim Count
Dim Ch
Dim Suffix
Dim Buffer
Suffix = vbNullChar
Do While Not File_.AtEndOfStream
If Count = MAX_LENGTH Then Fail "Literal too long"
Ch = GetChar
Select Case Ch
Case "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", _
"a", "b", "c", "d", "e", "f", _
"A", "B", "C", "D", "E", "F"
Count = Count + 1
Buffer = Buffer & Ch
Case "%", "&", "^", "@", "!", "#"
Suffix = Ch
Exit Do
Case Else
UnGetChar Ch
Exit Do
End Select
Loop
If Count = 0 Then Fail "Invalid literal"
Set ReadHexa = NewToken(Buffer, tkHexaNumber, Suffix)
End Function
Private Function ReadHash()
Const Msg = "Invalid literal"
Dim Cp
Dim Number
Dim Name
Dim Ch
Dim Token
Rem Let's get the first number.
Set Token = ReadInteger("")
If Token.Text = "" Then
Rem Maybe we have a month name?
Name = ReadMonthName
Select Case UCase(Name)
Case "IF", "ELSEIF", "ELSE", "END", "CONST"
Rem Not a month name, we have a compiler directive instead.
Set ReadHash = NewToken(Name, tkDirective, vbNullChar)
Exit Function
Case ""
Fail Msg
Case Else
Number = ConvertNameToNumber(Name)
If Number = 0 Then
Rem Not a month name, we have a variable file-handle instead.
Rem Hopefully this variable is not named Feb, for instance, otherwise we'll get tricked...
Set ReadHash = NewToken(Name, tkFileHandle, vbNullChar)
Exit Function
End If
Token.Text = CStr(Number)
End Select
End If
Rem Let's get the first separator.
Cp = GetCodePoint
Ch = ChrW(Cp)
If IsLetter(Cp) Or Ch = "," Then
Rem We have a numeric file-handle
UngetChar Ch
Token.Kind = tkFileHandle
Set ReadHash = Token
Exit Function
End If
If Ch = ":" Then
Rem We are reading a time literal.
Name = ReadTime(Token.Text)
Rem Date literal must end with a '#'.
Ch = GetChar
If Ch <> "#" Then Fail Msg
Name = "1899-12-30 " & Name
Set ReadHash = NewToken(Name, tkDateTime, vbNullChar)
Exit Function
End If
Rem We 'll suppose it is a valid separator.
On Error Resume Next
Name = ReadDate(Token.Text, Ch)
If Err.Number Then
Rem It is not a date, but a numeric file handle
On Error GoTo 0
UngetChar Ch
Token.Kind = tkFileHandle
Set ReadHash = Token
Exit Function
End If
On Error GoTo 0
Ch = GetChar
Select Case Ch
Case " "
Rem We may have a date and time literal together.
Set ReadHash = NewToken(ReadTime, tkDateTime, vbNullChar)
If ReadHash.Text = "" Then Fail Msg
ReadHash.Text = Name & " " & ReadHash.Text
Ch = GetChar
If Ch <> "#" Then Fail Msg
Case "#"
Rem Literal does not have a time part. Let's add it.
Set ReadHash = NewToken(Name & " 00:00:00", tkDateTime, vbNullChar)
Case Else
Fail Msg
End Select
End Function
Private Function ReadDate(ByVal FirstNumber, ByVal Separator)
Const Msg = "Invalid literal"
Dim YYYY
Dim MM
Dim DD
Dim N
Dim S
Dim Result
Dim SecondNumber
Dim ThirdNumber
Dim Ch
SecondNumber = ReadInteger("")
If SecondNumber = "" Then Fail Msg
Rem The next separator must match the first one.
Ch = GetChar
If Ch <> Separator Then Fail Msg
ThirdNumber = ReadInteger("")
If ThirdNumber = "" Then Fail Msg
If CInt(FirstNumber) >= 100 And Separator = "-" Then
YYYY = CInt(FirstNumber)
MM = CInt(SecondNumber)
DD = CInt(ThirdNumber)
Else
MM = CInt(FirstNumber)
DD = CInt(SecondNumber)
YYYY = CInt(ThirdNumber)
If YYYY < 100 Then
YYYY = YYYY + 1900
If YYYY < 1950 Then YYYY = YYYY + 100
End If
End If
Rem Validate year.
If YYYY > 9999 Then Fail Msg
Rem Validate month.
If MM < 1 Or MM > 12 Then Fail Msg
Rem Validate day.
Select Case MM
Case 4, 6, 9, 11
If DD > 30 Then Fail Msg
Case 2
If YYYY Mod 4 = 0 And YYYY Mod 100 <> 0 Or YYYY Mod 400 = 0 Then
If DD > 29 Then Fail Msg
Else
If DD > 28 Then Fail Msg
End If
Case Else
If DD > 31 Then Fail Msg
End Select
Rem Put it together in YYYY-MM-DD format.
If YYYY < 1000 Then Result = "0"
If YYYY < 100 Then Result = Result & "0"
If YYYY < 10 Then Result = Result & "0"
Result = Result & CStr(YYYY)
Result = Result & "-"
If MM < 10 Then Result = Result & "0"
Result = Result & CStr(MM)
Result = Result & "-"
If DD < 10 Then Result = Result & "0"
Result = Result & CStr(DD)
ReadDate = Result
End Function
Private Function ReadTime(ByVal FirstNumber)
Const Msg = "Invalid literal"
Dim HH
Dim NN
Dim SS
Dim Number
Dim Ch
Dim Ch2
Dim AP
AP = vbNullChar
HH = CInt(FirstNumber)
Number = ReadInteger("")
If Number = "" Then Err.Raise 0
NN = CInt(Number)
Ch = GetChar
If Ch = ":" Then
Number = ReadInteger("")
If Number = "" Then Err.Raise 0
SS = CInt(Number)
Else
UngetChar Ch
SS = 0
End If
If Not File_.AtEndOfStream Then
Ch = GetChar
If Ch = " " Then
If Not File_.AtEndOfStream Then
Ch = GetChar
If Ch = "a" Or Ch = "A" Then
Ch2 = GetChar
If Ch2 = "m" Or Ch2 = "M" Then
AP = "A"
Else
UnGetChar Ch2
UnGetChar Ch
UngetChar " "
End If
ElseIf Ch = "p" Or Ch = "P" Then
Ch2 = GetChar
If Ch2 = "m" Or Ch2 = "M" Then
AP = "P"
Else
UnGetChar Ch2
UnGetChar Ch
UngetChar " "
End If
Else
UnGetChar Ch
UngetChar " "
End If
End If
Else
UngetChar Ch
End If
End If
Rem Validate hour, minute, and second.
If HH < 0 Or HH > 23 Then Err.Raise 0
If NN < 0 Or NN > 59 Then Err.Raise 0
If SS < 0 Or SS > 59 Then Err.Raise 0
If AP = "A" Then
If HH = 12 Then HH = 0
ElseIf AP = "P" Then
If HH <> 12 Then HH = HH + 12
End If
Rem Put it together in HH:NN:SS format.
Number = CStr(SS)
If SS < 10 Then Number = "0" & Number
Number = ":" & Number
Number = CStr(NN) & Number
If NN < 10 Then Number = "0" & Number
Number = ":" & Number
Number = CStr(HH) & Number
If HH < 10 Then Number = "0" & Number
ReadTime = Number
End Function
Private Function ReadMonthName()
Dim Result
Dim Ch
Dim Prv
Do While Not File_.AtEndOfStream
Prv = Ch
Ch = GetChar
Select Case Ch
Case "#", vbLf, ",", ";", ")", " "
UngetChar Ch
Exit Do
Case "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"
Rem We safely can assume we read two characters more than needed.
UngetChar Ch
UngetChar Prv
Result = Left(Result, Len(Result) - 1)
Exit Do
Case Else
Result = Result & Ch
End Select
Loop
ReadMonthName = Result
End Function
Private Function ConvertNameToNumber(ByVal Name)
Dim Count
Dim Result
Dim MonthName
Dim MonthNames
MonthNames = Array("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")
For Each MonthName In MonthNames
Count = Count + 1
If StrComp(Name, MonthName, vbTextCompare) = 0 Then Result = Count
If Result = 0 Then: If StrComp(Name, Left(MonthName, 3), vbTextCompare) = 0 Then Result = Count
If Result <> 0 Then Exit For
Next
ConvertNameToNumber = Result
End Function
Private Function IsSpace(ByVal CodePoint)
Const NULL_CHAR = 0
Const VERTICAL_TAB = 9
Const EOM = 25
Const WHITE_SPACE = 32
Const NO_BREAK_SPACE = 160
Const OGHAM_SPACE_MARK = &H1680
Const MONGOLIAN_VOWEL_SEPARATOR = &H180E
Const EN_QUAD = &H2000
Const HAIR_SPACE = &H200A
Const NARROW_NO_BREAK_SPACE = &H202F
Const MEDIUM_MATHEMATICAL_SPACE = &H205F
Const IDEOGRAPHIC_SPACE = &H3000
Select Case CodePoint
Case NULL_CHAR, WHITE_SPACE, VERTICAL_TAB, EOM, NO_BREAK_SPACE, OGHAM_SPACE_MARK, MONGOLIAN_VOWEL_SEPARATOR, NARROW_NO_BREAK_SPACE, MEDIUM_MATHEMATICAL_SPACE, IDEOGRAPHIC_SPACE
IsSpace = True
Case Else
IsSpace = CodePoint >= EN_QUAD And CodePoint <= HAIR_SPACE
End Select
End Function
Private Function NewToken(ByVal Text, ByVal Kind, ByVal Suffix)
Set NewToken = New Token
With NewToken
.Text = Text
.Kind = Kind
.Suffix = Suffix
.Line = RunningLine_
.Column = FrozenColumn_
End With
End Function
Private Function ReadComment(ByVal Mark)
Const MAX_LENGTH = 1013
Dim Count
Dim Ch
Dim Buffer
Count = Len(Mark)
Buffer = Mark
Do Until File_.AtEndOfStream
If Count = MAX_LENGTH Then Fail "Comment too long"
Ch = GetChar
If Ch = vbLf Then Exit Do
Count = Count + 1
Buffer = Buffer & Ch
Loop
Set ReadComment = NewToken(Buffer, tkComment, vbNullChar)
End Function
Rem The first (high) surrogate is a 16-bit code value in the range U+D800 to U+DBFF.
Private Function IsHighSurrogate(ByVal Character)
IsHighSurrogate = Character >= -10240 And Character <= -9217 Or Character >= 55296 And Character <= 56319
End Function
Rem The second (low) surrogate is a 16-bit code value in the range U+DC00 to U+DFFF.
Private Function IsLowSurrogate(ByVal Character)
IsLowSurrogate = Character >= -9216 And Character <= -8193 Or Character >= 56320 And Character <= 57343
End Function
Private Function IsSurrogate(ByVal Character)
IsSurrogate = IsLowSurrogate(Character) Or IsHighSurrogate(Character)
End Function
End Class
Now, all you have to do is copy the code above, put it in a file, and make sure its extension is .VBS.
Then, write some VB code in another file, save it as UCS-2 LE (a.k.a UTF-16 LE), and drag and drop it over the .VBS file.
It will create an .HTML file with colored keywords, comments, and strings.
Or you'll get an unhandled error. Ugh!
Next week, we'll take a break from our transpiler journey.
Andrej Biasic
2020-09-16