Past, present, and future
Last time I said we'd take a look at our transpiler's past, present, and future.
Past
In which I lay down what I intended to do but have not done yet, and the shortcomings I've seen so far.
- I think it would be a good idea to collect all Scanner messages and move them to the Messages module too, as I did with Parser's messages.
- Decimal has no type declaration character. As all the good ones were already taken, I've been toying with the idea of using the plus sign ("+") to fill this role.
It would be no worse than MS's choice to use the circumflex sign ("^") - that's the power operator - as the type declaration character for LongLong / LongPtr.
- The algorithm for ReadBin, ReadOctal, and ReadHexa is the same over and over again, changing only the characters that are allowed. I could come up with one-code-to-rule-them-all and parameterize the allowed characters. It would (probably) be more complex, so I'm not sure about the gains, if any.
- So far we have four places where an object's member can be null/Nothing:
  - There's no condition to be checked in an infinite Do/Loop, so DoConstruct's Condition member is left unset.
  - In a Case Is (...), we instantiate a BinaryConstruct with an empty LHS member.
  - Const.DataType may be Nothing sometimes because we cannot infer data types yet.
  - Identifier.Project member.
- Expressionist's GetExpression method will gladly accept ByVal as an operator in any context, in any place, and as many times as provided.
This is not how it is supposed to be used. ByVal as an operator should only appear in Declare calls, and only once, as the sole operator applied to a variable name.
I have yet to go back there and enforce this rule.
- All our source files have been encoded in UTF-16.
I chose it because VB6's strings are encoded in it, but thinking it over now, I suppose we could have allowed UTF-8 too.
We would need to move the I/O parts to a new class, UTF16Reader, and create a UTF8Reader with the same API.
Then we would have to read a flag in the command line arguments to select the right one. It is doable.
- This is a follow-up to a comment of mine in part 23 ("there's a third shortcoming I've not dealt with yet."): Our pretty-printer does not reconstitute the code faithfully. Notably, inline comments (those that come between a backtick and a tick) are lost.
This is because we discard them right after reading them. I plan to add a Comments property to the Token class and fill it with any inline comments found.
I haven't done it yet because it is a lot of trouble for minimal benefit. But I have not given up on the idea yet.
- We have identifiers and escaped identifiers. Escaped identifiers may be "crazy" escaped identifiers - the ones that are only allowed as Enum members.
To check if an escaped identifier is "crazy", we need to parse it - again! We have already parsed it before! This is dumb; what I should have done is introduce a tkCrazyIdentifier and mark crazy identifiers with it right after parsing them. I still intend to do it.
- I have yet to do something about the cases I mentioned in part 9: When we have a float number with an integer type declaration character, we should convert it to an integer.
It would probably be a lot of code, and I'm not ready to do it yet.
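The one-code-to-rule-them-all idea for ReadBin, ReadOctal, and ReadHexa mentioned above could look roughly like the sketch below. It is written in Python rather than VB6 only to keep it short and easy to run. All names (read_digits, BIN_DIGITS, and so on) are made up for illustration, and the real Scanner certainly has to deal with more than this, such as prefixes and type declaration characters.

```python
# Hypothetical sketch: a single digit reader parameterized by the allowed characters.
BIN_DIGITS = frozenset("01")
OCT_DIGITS = frozenset("01234567")
HEX_DIGITS = frozenset("0123456789ABCDEF")


def read_digits(text, pos, allowed):
    """Consume characters of text, starting at pos, while they belong to allowed.

    Returns a (digits, new_pos) tuple. The membership test is done on the
    upper-cased character, so "ff" and "FF" are both accepted as hexadecimal.
    """
    start = pos
    while pos < len(text) and text[pos].upper() in allowed:
        pos += 1
    return text[start:pos], pos


# The three specialized readers collapse into thin wrappers.
def read_bin(text, pos):
    return read_digits(text, pos, BIN_DIGITS)


def read_octal(text, pos):
    return read_digits(text, pos, OCT_DIGITS)


def read_hex(text, pos):
    return read_digits(text, pos, HEX_DIGITS)
```

Seen this way, the extra complexity is mostly one more parameter, but whether that pays off in the actual Scanner is still an open question.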
Present
In which I dump some statistics of our incomplete project.
- So far we have 9187 LOC, including blank lines and comments.
- It takes around 6 seconds to transpile/revert its code on my machine.
- There are 87 classes and 7 modules.
- Twenty-five of these classes are skeletons. They are not complete yet.
- The longest classes are Reverter, with 1072 LOC, Scanner, with 1353 LOC, and Parser, with 2576 LOC.
- The most nested procedure is NextToken, with 13 physical levels (9 logical levels.)
- NextToken has a cyclomatic complexity of 119, if I understood how to count it correctly.
- Parser has 45 methods, but it is not the class with the greatest number of methods (yet?). This distinction goes to Reverter and its 60 methods.
- The longest cluster of blue words (regular and contextual keywords) used is For Binary Access Read Write As.
Let's make it a challenge! Can you come up with a longer cluster of blue words? My longest one comes to ten...
- Our version of Visual Basic has 154 blue words. Thirteen of them are new.
- We're dealing with 41 operators - not counting the 16 compound ones - having 19 precedence levels. That's a lot...
- CallConstruct is the only class to implement both IStmt and IExpression interfaces.
Future
In which I go crazy and start daydreaming about features I would like to bring to light when we're done with transpiling.
- The first of such features would be Select Case Sensitive/Insensitive.
It would have helped in ReadHash. Instead of
Select Case UCase$(Name)
Case UCase$(vIf), UCase$(vElseIf), UCase$(vElse), UCase$(vEnd), UCase$(vConst)
we would have
Select Case Insensitive Name
Case vIf, vElseIf, vElse, vEnd, vConst
- Another variation would be comparing types:
Select Case TypeOf Expr
    Case Is Literal, Is Symbol, Is UnaryExpression
        (...)
    Case Is BinaryExpression
        (...)
    Case Else
        (...)
End Select
Or better yet:
Select Case TypeOf Expr
    Case Is Literal, Is Symbol, Is UnaryExpression
        (...)
    Case Is Bin As BinaryExpression
        If Bin.Operator.Value.IsOperator(opTo) Then
            (...)
    Case Else
        (...)
End Select
Or even better:
Select Case TypeOf Expr
    (...)
    Case Is Bin As BinaryExpression When Bin.Operator.Value.IsOperator(opTo)
        (...)
End Select
- In VB.NET, sometimes I need to do the following:
While exception IsNot Nothing
    (...)
    exception = exception.InnerException
End While
Sometimes I wonder if I would like to do this instead:
While exception IsNot Nothing
    (...)
    exception .= InnerException
End While
After all, that dot is only an operator... right?
(By the way, did you notice we supported both Wend and End While in the previous post's ParseWhile?)
- If you have a ParamArray and need to pass it to another procedure's ParamArray, that second procedure will get it as a single array parameter:
A 1, 2, 3
(...)
Sub A(ParamArray Args())
    B Args
End Sub
Sub B(ParamArray Args())
    Dim Arg As Variant
    For Each Arg In Args
        Debug.Print TypeName(Arg); " "; 'This will iterate only once and print "Variant() "
    Next
End Sub
I'm not sure if it is doable, but I would like to have a Spread operator:
A 1, 2, 3
(...)
Sub A(ParamArray Args())
    B Spread Args
End Sub
Sub B(ParamArray Args())
    Dim Arg As Variant
    For Each Arg In Args
        Debug.Print TypeName(Arg); " "; 'This will iterate three times and print "Integer Integer Integer"
    Next
End Sub
- It is a well-known issue that nested function calls have to be read from the inside out.
Example:
s = LCase$(Hex$(CInt(Trim$(a))))
If we had two new operators, |> (Ceci n'est pas une pipe operator) and -> (left-to-right assignment), we could write the same as
a |> Trim$ |> CInt |> Hex$ |> LCase$ -> s
which is much more readable. Unfortunately, this holds only in simple cases, where each function takes a single parameter.
If a function needs more, I'm not sure I like the syntax below, which uses ">|" as a placeholder.
a |> Trim$ |> Left$(>|, 1) |> UCase$ -> s
- Finally, I find it elegant and compact to return from a procedure using the Return keyword.
In VB6, however, it is already taken and used to return from a GoSub call.
I experimented with several other keyword candidates, like Break, Quit, Yield, and Unflow, but none of them felt "right."
Recently I've been finding that Exit may be the one.
What do you think?
Rem We would be able to replace the following block of code...
If Token.Kind = tkIdentifier Then
    IsProperId = True
    Exit Function
End If
Rem ... with this one:
If Token.Kind = tkIdentifier Then Exit True
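Going back to the |> idea for a moment: its semantics can be emulated in a language that has first-class functions, which is a cheap way to see how the chain would behave before touching the parser. The Python sketch below is only that, an emulation. The names pipe and vb_hex are made up for illustration, str.strip and format are rough stand-ins for Trim$ and Hex$, and a lambda plays the role of the ">|" placeholder.

```python
from functools import reduce


def pipe(value, *steps):
    """Feed value through each step from left to right, like a |> f |> g."""
    return reduce(lambda acc, step: step(acc), steps, value)


def vb_hex(n):
    """Rough stand-in for VB's Hex$: upper-case hexadecimal with no prefix.

    Only meant for non-negative integers in this sketch.
    """
    return format(n, "X")


# Equivalent of: a |> Trim$ |> CInt |> Hex$ |> LCase$ -> s
a = "  255 "
s = pipe(a, str.strip, int, vb_hex, str.lower)
print(s)  # ff

# Equivalent of: a |> Trim$ |> Left$(>|, 1) |> UCase$ -> s
# The lambda parameter takes the place of the ">|" placeholder.
initial = pipe("  hello ", str.strip, lambda x: x[:1], str.upper)
print(initial)  # H
```

The emulation makes the multi-parameter problem visible too: as soon as a step needs more than the piped value, something has to mark where that value goes, be it a lambda here or ">|" in the syntax above.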
Andrej Biasic
2021-04-21