Let's build a transpiler! Part 12
This is the twelfth post in a series on building a transpiler (and one of the longest, be warned!). You can find the previous ones here.
Last time I said we would deal with contextual keywords. As you may know by now, they are identifiers that have special meaning in certain contexts.
For instance, you can have a Step variable and use it in a For/Next loop like this:
Dim Step As Integer
(...)
For X = 1 To 100 Step Step
The code I came up with is so convoluted that I'll have to present it to you piece by piece.
Let's deal with the easy parts first:
As I said previously, we'll have to tell the "binary" dot operator apart from the "unary" one.
When we see A.B in VB, the dot means B is a member of A. Here A can be a function call or a Property Get returning a class or a type, or a variable or argument of a class or a type.
But if we get A .B instead, it means B is a member of whatever is in a previous With statement, and A is either a sub, a function, or a Declare call.
So, that space between the A and the dot changes things a bit.
By the way, the same happens to the bang operator ("!").
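To make the distinction concrete, here is a small sketch of both forms in plain VB6/VBA. The names DotDemo and Items are made up for illustration; only Collection and Debug.Print are built in.

```vb
Sub DotDemo()
    Dim Items As Collection
    Set Items = New Collection

    ' Binary dot: nothing between "Items" and ".", so Add is a member of Items.
    Items.Add "first"

    With Items
        ' Unary dot: a space comes before ".", so Count belongs to the With object.
        Debug.Print .Count

        ' Both kinds in one statement: "Items.Add" is binary, " .Count" is unary.
        Items.Add .Count
    End With
End Sub
```

In the last statement, the only thing telling the two dots apart is the white space in front of the second one.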
To deal with it, we'll have a Static LastToken variable. After doing all the processing in the current token, we'll set LastToken to it, so next time, when processing a new token and it happens to be a dot or a bang, we can see if it was preceded by a space.
If this is the case, then we'll change that token's text to "~." or "~!" respectively.
Public Function TokenFrom(ByVal Scanner As Scanner) As Token
    Static LastToken As Token
    Dim Token As Token

    Set Token = Scanner.GetToken

    Select Case Token.Kind
        Case tkOperator
            Rem LastToken is Nothing on the very first call, so guard the access.
            If Not LastToken Is Nothing Then
                If LastToken.Kind = tkWhiteSpace Then
                    If Token.Text = "." Then
                        Token.Text = "~."
                    ElseIf Token.Text = "!" Then
                        Token.Text = "~!"
                    End If
                End If
            End If
    End Select

    Set LastToken = Token
    Set TokenFrom = Token
End Function
While we are on the subject of dots and bangs, they also help us fix another glitch in our Scanner: whatever follows a dot or a bang must never be treated as a keyword, even if it is spelled like one.
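As an illustration of why this matters, here is a hedged sketch of ordinary VB6/VBA code in which Write and Close, statement keywords elsewhere, show up as member names. It assumes a Windows machine with the Microsoft Scripting Runtime available; the names KeywordAfterDotDemo, Fso, and Stream, and the file name demo.txt, are made up.

```vb
Sub KeywordAfterDotDemo()
    Dim Fso As Object
    Dim Stream As Object

    ' Late-bound FileSystemObject, so no project reference is needed.
    Set Fso = CreateObject("Scripting.FileSystemObject")
    Set Stream = Fso.CreateTextFile("demo.txt", True)

    Stream.Write "hello"   ' Write is a keyword in "Write #1, ..." but a method name here
    Stream.Close           ' Close is a keyword in "Close #1" but a method name here
End Sub
```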
We'll do something similar to what we have done above: We'll add a static Downgrade variable and anytime we see a bang or a dot, we'll set it to True.
Whenever we get a keyword, we'll check if we have to "downgrade" it to a regular identifier.
Let's adapt the code above to that:
Public Function TokenFrom(ByVal Scanner As Scanner) As Token
    Static LastToken As Token
    Static Downgrade As Boolean
    Dim Token As Token

    Set Token = Scanner.GetToken

    Select Case Token.Kind
        Case tkOperator
            Downgrade = Token.Text = "." Or Token.Text = "!"

            Rem LastToken is Nothing on the very first call, so guard the access.
            If Not LastToken Is Nothing Then
                If LastToken.Kind = tkWhiteSpace Then
                    If Token.Text = "." Then
                        Token.Text = "~."
                    ElseIf Token.Text = "!" Then
                        Token.Text = "~!"
                    End If
                End If
            End If

        Case tkKeyword
            If Downgrade Then
                Downgrade = False
                Token.Kind = tkIdentifier
            End If

        Case Else
            Downgrade = False
    End Select

    Set LastToken = Token
    Set TokenFrom = Token
End Function
So far, so good. Now, let's deal with another kind of ambiguity: String and Date identifiers can be either data types or function calls.
Here is an example:
Dim S As String
Dim D As Date
S = String(50, "@")
D = Date
How can we detect it? I first tried going with "If it is followed by an opening parenthesis, then it is a function call", but got bitten by the following situation:
Function Abcd() As String()
It went wrong. So, we'll have to go with "If there's a preceding As then it is a keyword, otherwise it is a regular identifier."
By now you know the drill: Declare a static variable (WasAs), set it to True when we see an As keyword, and check it when we get a Date or String keyword to decide whether that token will remain a keyword or will be demoted to a regular identifier.
We just need to make sure we set WasAs back to False when what we get is not a Date or String.
Public Function TokenFrom(ByVal Scanner As Scanner) As Token
    Static LastToken As Token
    Static Downgrade As Boolean
    Static WasAs As Boolean
    Dim Token As Token

    Set Token = Scanner.GetToken

    Select Case Token.Kind
        Case tkOperator
            WasAs = False
            Downgrade = Token.Text = "." Or Token.Text = "!"

            Rem LastToken is Nothing on the very first call, so guard the access.
            If Not LastToken Is Nothing Then
                If LastToken.Kind = tkWhiteSpace Then
                    If Token.Text = "." Then
                        Token.Text = "~."
                    ElseIf Token.Text = "!" Then
                        Token.Text = "~!"
                    End If
                End If
            End If

        Case tkKeyword
            If Downgrade Then
                Downgrade = False
                Token.Kind = tkIdentifier
            Else
                Select Case Token.Text
                    Case "As"
                        WasAs = True

                    Case "Date", "String"
                        If Not WasAs Then Token.Kind = tkIdentifier
                End Select
            End If

        Case tkSoftLineBreak, tkHardLineBreak
            WasAs = False

        Case Else
            Downgrade = False
    End Select

    Set LastToken = Token
    Set TokenFrom = Token
End Function
The Declare statement has two contextual keywords, Lib and Alias.
We'll have a static variable (State) that will flag when we reach a Declare token. Then we will change it as we walk along with the statement and meet its contextual keywords.
If State is not set, then we know Lib and Alias are regular identifiers.
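For reference, here is a sketch of the kind of source the Declare states have to cope with. It assumes 32-bit VB6; in 64-bit VBA the PtrSafe word would follow Declare. DeclareDemo is a made-up name, and using Lib and Alias as variable names relies on them being ordinary identifiers outside Declare, as described above.

```vb
' Inside a Declare statement, Lib and Alias act as keywords.
Private Declare Function GetTickCount Lib "kernel32" () As Long
Private Declare Function GetUserName Lib "advapi32.dll" Alias "GetUserNameA" _
    (ByVal lpBuffer As String, ByRef nSize As Long) As Long

Sub DeclareDemo()
    ' Outside a Declare statement, the same words are plain variable names.
    Dim Lib As String
    Dim Alias As String

    Lib = "kernel32"
    Alias = "GetUserNameA"
    Debug.Print Lib, Alias, GetTickCount()
End Sub
```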
Here is the code we'll insert into the TokenFrom function:
Private Enum NarrowContext
    NoContext
    DeclareContext
    DeclareLibContext
    DeclareAliasContext
End Enum

Static State As NarrowContext
Dim Upgrade As Boolean
Dim Revoke As Boolean

Rem Inside "Select Case Token.Text":
    Case "Declare"
        If State = NoContext Then State = DeclareContext

Rem Inside "Select Case State":
    Case DeclareContext
        Upgrade = Token.Text = "PtrSafe"

        If Upgrade Then
            State = DeclareLibContext
        Else
            Upgrade = Token.Text = "Lib"
            If Upgrade Then State = DeclareAliasContext
        End If

    Case DeclareLibContext
        Upgrade = Token.Text = "Lib"
        If Upgrade Then State = DeclareAliasContext

    Case DeclareAliasContext
        Upgrade = Token.Text = "Alias"
        Revoke = True

Rem Below "End Select":
    If Upgrade Then
        Token.Kind = tkKeyword
        If Revoke Then State = NoContext
    End If
As already mentioned, For has the Step contextual keyword:
Private Enum NarrowContext
    NoContext
    DeclareContext
    DeclareLibContext
    DeclareAliasContext
    ForNextContext
    ForToContext
End Enum

Rem Inside "Select Case Token.Text":
    Case "For"
        If State = NoContext Then State = ForNextContext

    Case "To"
        If State = ForNextContext Then State = ForToContext

Rem Inside "Select Case State":
    Case ForToContext
        Upgrade = Token.Text = "Step"
        Revoke = True
The Option statement has some contextual keywords, too:
Private Enum NarrowContext
    NoContext
    DeclareContext
    DeclareLibContext
    DeclareAliasContext
    ForNextContext
    ForToContext
    OptionContext
    OptionCompareContext
End Enum

Rem Inside "Select Case Token.Text":
    Case "Option"
        If State = NoContext Then State = OptionContext

Rem Inside "Select Case State":
    Case OptionContext
        Upgrade = Token.Text = "Base"
        If Not Upgrade Then Upgrade = Token.Text = "Explicit"

        If Not Upgrade Then
            Upgrade = Token.Text = "Compare"
            If Upgrade Then State = OptionCompareContext
        End If

    Case OptionCompareContext
        Upgrade = Token.Text = "Binary"
        If Not Upgrade Then Upgrade = Token.Text = "Text"
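For reference, these are the kinds of source lines the Option states are meant to recognize, next to the same words used as plain identifiers. This is only a sketch: OptionDemo is a made-up name, and it relies on Base and Text being ordinary identifiers outside an Option statement.

```vb
' At the top of a module, Explicit, Base, Compare and Text are keywords.
Option Explicit
Option Base 1
Option Compare Text

Sub OptionDemo()
    ' Anywhere else, the same words are ordinary variable names.
    Dim Base As Long
    Dim Text As String

    Base = 16
    Text = "ff"
    Debug.Print Base, Text
End Sub
```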
Error can be a keyword (On Error ...) or a regular identifier:
Rem Inside "Select Case Token.Text":
    Case "On"
        If State = NoContext Then State = OnContext

Rem Inside "Select Case State":
    Case OnContext
        Upgrade = Token.Text = "Error"
        Revoke = True
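A short sketch of both uses of Error in ordinary VB6/VBA may help here. ErrorDemo and the Failed label are made-up names; Err and the Error function are built in.

```vb
Sub ErrorDemo()
    On Error GoTo Failed           ' Error follows On: it is a keyword here
    Err.Raise 5                    ' force a run-time error to reach the handler
    Exit Sub

Failed:
    Debug.Print Error(Err.Number)  ' Error is the built-in function that returns the message text
End Sub
```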
Line, too, can be a keyword or a regular identifier, but we need to read the next token before we can decide which one it is.
So, we'll have one more static variable (NextToken) to hold the next token, and we'll read it and clear it when needed:
Static NextToken As Token

If NextToken Is Nothing Then
    Set Token = Scanner.GetToken
Else
    Set Token = NextToken
    Set NextToken = Nothing
End If

Rem Inside "Select Case State":
    Case NoContext
        Select Case Token.Text
            Case "Line"
                Set NextToken = Scanner.GetToken
                Upgrade = NextToken.Kind = tkKeyword And NextToken.Text = "Input"
        End Select
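Here is a sketch of the two situations the lookahead has to tell apart. The procedure names, the Buffer variable, and the file name demo.txt are made up; the two uses are kept in separate procedures to keep the example simple.

```vb
Sub LineAsIdentifier()
    Dim Line As String
    Line = "just a variable"         ' no Input after it: Line is an ordinary identifier
End Sub

Sub LineAsKeyword()
    Dim Buffer As String

    ' Write one line so there is something to read back.
    Open "demo.txt" For Output As #1
    Print #1, "hello"
    Close #1

    Open "demo.txt" For Input As #1
    Line Input #1, Buffer            ' Input follows, so Line is a keyword here
    Close #1

    Debug.Print Buffer
End Sub
```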
Let's deal with Name and Reset:
Rem Inside "Select Case Token.Text" that's inside "Case NoContext":
    Case "Name", "Reset"
        Set NextToken = Scanner.GetToken
        Upgrade = Right$(NextToken.Text, 1) <> "="
        If Upgrade Then Upgrade = NextToken.Kind <> tkKeyword Or NextToken.Text <> "As"
        If Upgrade Then Upgrade = NextToken.Kind <> tkOperator
        If Upgrade Then Upgrade = Not IsEndOfContext(NextToken)
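As a reminder of what these two words look like in real code, here is a hedged sketch. The procedure names and file names are made up, and the two uses of Name are kept in separate procedures to keep the example simple.

```vb
Sub NameAsIdentifier()
    Dim Name As String
    Name = "demo.txt"                    ' "=" follows: Name is an ordinary variable
End Sub

Sub NameAndResetAsStatements()
    Open "demo.txt" For Output As #1     ' create a file so there is something to rename
    Reset                                ' Reset statement: closes every file opened with Open

    ' Name fails if the target already exists, so remove any leftover first.
    If Dir("renamed.txt") <> "" Then Kill "renamed.txt"
    Name "demo.txt" As "renamed.txt"     ' Name statement: renames the file
End Sub
```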
And deal with Width:
Rem Inside "Select Case Token.Text" that's inside "Case NoContext":
    Case "Width"
        Set NextToken = Scanner.GetToken
        Upgrade = NextToken.Kind = tkFileHandle
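A small sketch of the two uses of Width, with made-up procedure names and a made-up file name:

```vb
Sub WidthAsIdentifier()
    Dim Width As Long
    Width = 40                       ' no file handle after it: an ordinary variable
End Sub

Sub WidthAsStatement()
    Open "demo.txt" For Output As #1
    Width #1, 40                     ' a file handle follows: this is the Width statement
    Print #1, String(100, "x")       ' output is wrapped at the width set above
    Close #1
End Sub
```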
Now comes the hard part: there's a bunch of non-keywords that act as keywords of sorts in the Open statement.
There are also keywords there, but they serve a different purpose, like Write, for instance.
The Open statement is so complex that I'll draw a diagram to let you know how it must be parsed:
(Thanks to adrian.ancona's post about Graphviz.)
Afraid yet?
Note that red words are contextual keywords, while blue ones are proper keywords.
Based on the diagram above, here are the rules to change state when faced with an Open statement:
- When we get an Open keyword, we change State to [Next Keyword Is For].
- When seeing a For keyword, we change State to [Next Keyword Is Input | Next Identifier Is Append, Binary, Output, or Random].
- If we get an Input keyword, we change State to [Next Keyword Is As or Shared | Next Identifier Is Access].
- If we get an Append, a Binary, an Output, or a Random identifier, we turn it into a keyword and change State to the same as above.
- Next, if we get an Access identifier, we turn it into a keyword and change State to [Next Keyword Is Access/Write | Next Identifier Is Access/Read].
- If we get a Read identifier, we turn it into a keyword and change State to [Next Keyword Is Access/Write, Lock, As, or Shared].
- If we get a Write keyword, see the second line below.
- If we get a Lock keyword or an Access identifier, see the remaining steps below.
- If we get a Write keyword, we change State to [Next Keyword Is Lock, As, or Shared].
- If we get a Lock keyword, we change State to [Next Keyword Is Lock/Write | Next Identifier Is Lock/Read].
- If we get a Read identifier, we turn it into a keyword and change State to [Next Keyword Is Lock/Write or As].
- If we get a Write keyword, we change State to [Next Keyword Is As].
- If we get a Shared keyword, we change State to [Next Keyword Is As].
- If we get an As keyword, we change State to [Next Token Is Filehandle].
- If we get a filehandle, we change State to [Next Identifier Is Len].
- If we get a Len identifier, we turn it into a keyword and change State to NoContext.
- Whenever we get a Then or an Else keyword, or a soft or hard line-break, we change State to NoContext.
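To see the rules above at work, here are a few Open statements that take different paths through those states. This is just a sketch: OpenDemo and demo.dat are made-up names, and the statements are plain VB6/VBA file I/O.

```vb
Sub OpenDemo()
    Dim FileNum As Integer
    FileNum = FreeFile

    ' Shortest path: Open, For, a mode, As, file handle.
    Open "demo.dat" For Output As #FileNum
    Close #FileNum

    ' Mode, then Access Read Write, then As, file handle and Len.
    Open "demo.dat" For Random Access Read Write As #FileNum Len = 64
    Close #FileNum

    ' Mode, then Access Read, then Lock Write, then As.
    Open "demo.dat" For Binary Access Read Lock Write As #FileNum
    Close #FileNum

    ' Input is a proper keyword; Shared takes the place of a Lock clause.
    Open "demo.dat" For Input Shared As #FileNum
    Close #FileNum
End Sub
```

Each statement ends at a line break, which is what sends State back to NoContext.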
Here is the complete code, including the additions for the Open statement:
Private Enum NarrowContext
    NoContext
    DeclareContext
    DeclareLibContext
    DeclareAliasContext
    ForNextContext
    ForToContext
    OptionContext
    OptionCompareContext
    OnContext
    [Next Keyword Is For]
    [Next Keyword Is Input | Next Identifier Is Append, Binary, Output, or Random]
    [Next Keyword Is As or Shared | Next Identifier Is Access]
    [Next Keyword Is Access/Write | Next Identifier Is Access/Read]
    [Next Keyword Is Access/Write, Lock, As, or Shared]
    [Next Keyword Is Lock, As, or Shared]
    [Next Keyword Is Lock/Write | Next Identifier Is Lock/Read]
    [Next Keyword Is Lock/Write or As]
    [Next Keyword Is As]
    [Next Token Is Filehandle]
    [Next Identifier Is Len]
End Enum
Public Function TokenFrom(ByVal Scanner As Scanner) As Token
    Static Downgrade As Boolean
    Static WasAs As Boolean
    Static LastToken As Token
    Static State As NarrowContext
    Static NextToken As Token
    Dim Upgrade As Boolean
    Dim Revoke As Boolean
    Dim Token As Token

    If NextToken Is Nothing Then
        Set Token = Scanner.GetToken
    Else
        Set Token = NextToken
        Set NextToken = Nothing
    End If

    If IsEndOfContext(Token) Then
        State = NoContext
    Else
        Select Case Token.Kind
            Case tkOperator
                WasAs = False
                Downgrade = Token.Text = "." Or Token.Text = "!"

                Rem LastToken is Nothing on the very first call, so guard the access.
                If Not LastToken Is Nothing Then
                    If LastToken.Kind = tkWhiteSpace Then
                        If Token.Text = "." Then
                            Token.Text = "~."
                        ElseIf Token.Text = "!" Then
                            Token.Text = "~!"
                        End If
                    End If
                End If

            Case tkKeyword
                If Downgrade Then
                    Downgrade = False
                    Token.Kind = tkIdentifier
                Else
                    Select Case Token.Text
                        Case "As"
                            WasAs = True

                            Select Case State
                                Case [Next Keyword Is As or Shared | Next Identifier Is Access], _
                                     [Next Keyword Is Access/Write, Lock, As, or Shared], _
                                     [Next Keyword Is Lock, As, or Shared], _
                                     [Next Keyword Is Lock/Write or As], _
                                     [Next Keyword Is As]
                                    State = [Next Token Is Filehandle]
                            End Select

                        Case "Date", "String"
                            If Not WasAs Then Token.Kind = tkIdentifier

                        Case "Declare"
                            If State = NoContext Then State = DeclareContext

                        Case "For"
                            If State = NoContext Then
                                State = ForNextContext
                            ElseIf State = [Next Keyword Is For] Then
                                State = [Next Keyword Is Input | Next Identifier Is Append, Binary, Output, or Random]
                            End If

                        Case "Input"
                            If State = [Next Keyword Is Input | Next Identifier Is Append, Binary, Output, or Random] Then
                                State = [Next Keyword Is As or Shared | Next Identifier Is Access]
                            End If

                        Case "Lock"
                            Select Case State
                                Case [Next Keyword Is Access/Write, Lock, As, or Shared], _
                                     [Next Keyword Is Lock, As, or Shared]
                                    State = [Next Keyword Is Lock/Write | Next Identifier Is Lock/Read]
                            End Select

                        Case "Open"
                            If State = NoContext Then State = [Next Keyword Is For]

                        Case "Option"
                            If State = NoContext Then State = OptionContext

                        Case "On"
                            If State = NoContext Then State = OnContext

                        Case "To"
                            If State = ForNextContext Then State = ForToContext

                        Case "Shared"
                            Rem Shared is allowed wherever a Lock clause could start.
                            Select Case State
                                Case [Next Keyword Is As or Shared | Next Identifier Is Access], _
                                     [Next Keyword Is Access/Write, Lock, As, or Shared], _
                                     [Next Keyword Is Lock, As, or Shared]
                                    State = [Next Keyword Is As]
                            End Select

                        Case "Write"
                            Select Case State
                                Case [Next Keyword Is Access/Write | Next Identifier Is Access/Read], _
                                     [Next Keyword Is Access/Write, Lock, As, or Shared]
                                    State = [Next Keyword Is Lock, As, or Shared]

                                Case [Next Keyword Is Lock/Write | Next Identifier Is Lock/Read], _
                                     [Next Keyword Is Lock/Write or As]
                                    State = [Next Keyword Is As]
                            End Select
                    End Select
                End If

            Case tkIdentifier
                Downgrade = False
                WasAs = False

                Select Case State
                    Case NoContext
                        Select Case Token.Text
                            Case "Line"
                                Set NextToken = Scanner.GetToken
                                Upgrade = NextToken.Kind = tkKeyword And NextToken.Text = "Input"

                            Case "Name"
                                Set NextToken = Scanner.GetToken
                                Upgrade = Right$(NextToken.Text, 1) <> "="

                            Case "Reset"
                                Set NextToken = Scanner.GetToken
                                Upgrade = IsEndOfContext(NextToken)

                            Case "Width"
                                Set NextToken = Scanner.GetToken
                                Upgrade = NextToken.Kind = tkFileHandle
                        End Select

                    Case OptionContext
                        Upgrade = Token.Text = "Base"
                        If Not Upgrade Then Upgrade = Token.Text = "Explicit"

                        If Not Upgrade Then
                            Upgrade = Token.Text = "Compare"
                            If Upgrade Then State = OptionCompareContext
                        End If

                    Case OptionCompareContext
                        Upgrade = Token.Text = "Binary"
                        If Not Upgrade Then Upgrade = Token.Text = "Text"

                    Case DeclareContext
                        Upgrade = Token.Text = "PtrSafe"

                        If Upgrade Then
                            State = DeclareLibContext
                        Else
                            Upgrade = Token.Text = "Lib"
                            If Upgrade Then State = DeclareAliasContext
                        End If

                    Case DeclareLibContext
                        Upgrade = Token.Text = "Lib"
                        If Upgrade Then State = DeclareAliasContext

                    Case DeclareAliasContext
                        Upgrade = Token.Text = "Alias"
                        Revoke = True

                    Case ForToContext
                        Upgrade = Token.Text = "Step"
                        Revoke = True

                    Case OnContext
                        Upgrade = Token.Text = "Error"
                        Revoke = True

                    Case [Next Keyword Is Input | Next Identifier Is Append, Binary, Output, or Random]
                        Upgrade = Token.Text = "Append"
                        If Not Upgrade Then Upgrade = Token.Text = "Binary"
                        If Not Upgrade Then Upgrade = Token.Text = "Output"
                        If Not Upgrade Then Upgrade = Token.Text = "Random"
                        State = [Next Keyword Is As or Shared | Next Identifier Is Access]

                    Case [Next Keyword Is As or Shared | Next Identifier Is Access]
                        Upgrade = Token.Text = "Access"

                        If Upgrade Then
                            State = [Next Keyword Is Access/Write | Next Identifier Is Access/Read]
                        Else
                            Upgrade = Token.Text = "Shared"
                            If Upgrade Then State = [Next Keyword Is As]
                        End If

                    Case [Next Keyword Is Access/Write, Lock, As, or Shared], _
                         [Next Keyword Is Lock, As, or Shared]
                        Upgrade = Token.Text = "Shared"
                        If Upgrade Then State = [Next Keyword Is As]

                    Case [Next Keyword Is Access/Write | Next Identifier Is Access/Read]
                        Upgrade = Token.Text = "Read"
                        If Upgrade Then State = [Next Keyword Is Access/Write, Lock, As, or Shared]

                    Case [Next Keyword Is Lock/Write | Next Identifier Is Lock/Read]
                        Upgrade = Token.Text = "Read"
                        If Upgrade Then State = [Next Keyword Is Lock/Write or As]

                    Case [Next Identifier Is Len]
                        Upgrade = Token.Text = "Len"
                        Revoke = True
                End Select

            Case tkFileHandle
                If State = [Next Token Is Filehandle] Then State = [Next Identifier Is Len]

            Case tkSoftLineBreak, tkHardLineBreak
                WasAs = False
        End Select

        If Upgrade Then
            Token.Kind = tkKeyword
            If Revoke Then State = NoContext
        End If
    End If

    Set LastToken = Token
    Set TokenFrom = Token
End Function
Private Function IsEndOfContext(ByVal Token As Token) As Boolean
    Dim Result As Boolean

    Result = Token.Kind = tkSoftLineBreak
    If Not Result Then Result = Token.Kind = tkHardLineBreak
    If Not Result Then Result = Token.Kind = tkRightParenthesis
    If Not Result Then Result = Token.Kind = tkListSeparator
    If Not Result Then Result = Token.Kind = tkPrintSeparator

    If Not Result And Token.Kind = tkKeyword Then
        Result = Token.Text = "Then"
        If Not Result Then Result = Token.Text = "Else"
    End If

    IsEndOfContext = Result
End Function
Next week we'll polish some things and fix some omissions.
Andrej Biasic
2020-10-07