Metamorphing Machine I rather be this walking metamorphosis
than having that old formed opinion about everything!

Let's build a transpiler! Part 3

This is the third post in a series of building a transpiler. You can find the previous ones here.

Last time we were left to scan a date literal, in any of its formats.
How do we do it if we don't know in which format it will be?
There are at least three ways to do it:
Let's try something and improve it as we go:

Private Function ReadDateTime(ByVal FileHandle As Integer) As String
Dim Result As String
Dim Number As String
Dim Cp As Integer
Dim Ch As String * 1

Rem Let's get the first number.
Number = ReadInteger(FileHandle)

If Number = "" Then
Rem Maybe we have a month name?
Rem TODO: Check it.
End If

Rem Let's get the first separator.
If EOF(FileHandle) Then GoTo Fail
GoSub GetChar

Rem It cannot be a letter.
If IsLetter(Cp) Then GoTo Fail

If Ch = ":" Then
Rem We are reading a time literal.
Result = ReadTime(FileHandle, Number)

Rem Date literal must end with a '#'.
If EOF(FileHandle) Then GoTo Fail
GoSub GetChar
If Ch <> "#" Then GoTo Fail

ReadDateTime = "1899-12-30 " & Result
Exit Function
End If

Rem We'll suppose it is a valid separator.
Result = ReadDate(FileHandle, Number, Ch)

GoSub GetChar

Select Case Ch
Case " "
Rem We may have a date and time literal together.
Number = ReadTime(FileHandle)
If Number = "" Then GoTo Fail
Result = Result & " " & Number

If EOF(FileHandle) Then GoTo Fail
GoSub GetChar
If Ch <> "#" Then GoTo Fail

Case "#"
Rem Literal does not have a time part. Let's add it.
Result = Result & " 00:00:00"

Case Else
GoTo Fail
End Select

ReadDateTime = Result
Exit Function

GetChar:
Get #FileHandle, , Cp
Ch = ToChar(Cp)
Return

Fail:
Err.Raise vbObjectError + 13, , "Invalid literal"
End Function

Private Function ReadDate( _
ByVal FileHandle As Integer, _
ByVal FirstNumber As String, _
ByVal Separator As String _
) As String
Dim SecondNumber As String
Dim ThirdNumber As String
Dim Cp As Integer
Dim Ch As String * 1

SecondNumber = ReadInteger(FileHandle)
If SecondNumber = "" Then GoTo Fail

Rem The next separator must match the first one.
If EOF(FileHandle) Then GoTo Fail
Get #FileHandle, , Cp
Ch = ToChar(Cp)
If Ch <> Separator Then GoTo Fail

ThirdNumber = ReadInteger(FileHandle)
If ThirdNumber = "" Then GoTo Fail

Rem TODO: Now we need to figure out which number is which.
Exit Function

Fail:
Err.Raise vbObjectError + 13, , "Invalid literal"
End Function

Private Function ReadTime(ByVal FileHandle As String, Optional ByVal FirstNumber As String) As String
Dim HH As Integer
Dim NN As Integer
Dim SS As Integer
Dim Number As String
Dim Cp As Integer
Dim Ch As String * 1
Dim Pos As Long

On Error GoTo Fail
HH = CInt(FirstNumber)
Number = ReadInteger(FileHandle)
If Number = "" Then GoTo Fail
NN = CInt(Number)

GoSub GetChar

If Ch = ":" Then
Number = ReadInteger(FileHandle)
If Number = "" Then GoTo Fail
SS = CInt(Number)
Else
GoSub UngetChar
End If

Rem TODO: Deal with AM/PM.

Rem Validate hour, minute, and second.
If HH < 0 Or HH > 23 Then GoTo Fail
If NN < 0 Or NN > 59 Then GoTo Fail
If SS < 0 Or SS > 59 Then GoTo Fail

Rem Put it together in HH:NN:SS format.
Number = CStr(SS)
If SS < 10 Then Number = "0" & Number
Number = ":" & Number

Number = CStr(NN) & Number
If NN < 10 Then Number = "0" & Number

Number = ":" & Number
Number = CStr(HH) & Number
If HH < 10 Then Number = "0" & Number

ReadTime = Number
Exit Function

GetChar:
If EOF(FileHandle) Then GoTo Fail
Get #FileHandle, , Cp
Ch = ToChar
Return

UngetChar:
Pos = Seek(FileHandle)
Seek #FileHandle, Pos - 2
Return

Fail:
Err.Raise vbObjectError + 13, , "Invalid literal"
End Function

As you can see, the code is very straightforward, but we have three TODOs. They will complicate the code.
Let's start with the easier one: Deal with AM/PM. The added code is highlighted:

Private Function ReadTime(ByVal FileHandle As String, Optional ByVal FirstNumber As String) As String
Dim HH As Integer
Dim NN As Integer
Dim SS As Integer
Dim Number As String
Dim Cp As Integer
Dim Ch As String * 1
Dim Pos As Long
Dim AP As String * 1

On Error GoTo Fail
HH = CInt(FirstNumber)
Number = ReadInteger(FileHandle)
If Number = "" Then GoTo Fail
NN = CInt(Number)

GoSub GetChar

If Ch = ":" Then
Number = ReadInteger(FileHandle)
If Number = "" Then GoTo Fail
SS = CInt(Number)
Else
SS = "00"
GoSub UngetChar
End If

If Not EOF(FileHandle) Then
GoSub GetChar

If Ch = " " Then
If Not EOF(FileHandle) Then
GoSub GetChar

If Ch = "a" Or Ch = "A" Then
GoSub GetChar

If Ch = "m" Or Ch = "M" Then
AP = "A"

Else
GoSub UngetChar
GoSub UngetChar
GoSub UngetChar
End If

ElseIf Ch = "p" Or Ch = "P" Then
GoSub GetChar

If Ch = "m" Or Ch = "M" Then
AP = "P"
Else
GoSub UngetChar
GoSub UngetChar
GoSub UngetChar
End If

Else
GoSub UngetChar
GoSub UngetChar
End If
End If
Else
GoSub UngetChar
End If
End If

Rem Validate hour, minute, and second.
If HH < 0 Or HH > 23 Then GoTo Fail
If NN < 0 Or NN > 59 Then GoTo Fail
If SS < 0 Or SS > 59 Then GoTo Fail

If AP = "A" Then
If HH = 12 Then HH = 0

ElseIf AP = "P" Then
If HH <> 12 Then HH = HH + 12
End If

Rem Put it together in HH:NN:SS format.
Number = CStr(SS)
If SS < 10 Then Number = "0" & Number
Number = ":" & Number

Number = CStr(NN) & Number
If NN < 10 Then Number = "0" & Number

Number = ":" & Number
Number = CStr(HH) & Number
If HH < 10 Then Number = "0" & Number

ReadTime = Number
Exit Function

GetChar:
If EOF(FileHandle) Then GoTo Fail
Get #FileHandle, , Cp
Ch = ToChar(Cp)
Return

UngetChar:
Pos = Seek(FileHandle)
Seek #FileHandle, Pos - 2
Return

Fail:
Err.Raise vbObjectError + 13, , "Invalid literal"
End Function

Now, let's attack the second TODO, dealing first with the year-month-date format:

Dim YYYY As Integer
Dim MM As Integer
Dim DD As Integer
Dim Result As String

If CInt(FirstNumber) > 100 And Separator = "-" Then
YYYY = CInt(FirstNumber)
MM = CInt(SecondNumber)
DD = CInt(ThirdNumber)
Else
Rem TODO: Still need to figure out what is what.
End If

Rem validate year.
If YYYY > 9999 Then GoTo Fail

Rem Validate month.
If MM < 1 Or MM > 12 Then GoTo Fail

Rem Validate day.
Select Case MM
Case 4, 6, 9, 11
If DD > 30 Then GoTo Fail

Case 2
If YYYY Mod 4 = 0 And YYYY Mod 100 <> 0 Or YYYY Mod 400 = 0 Then
If DD > 29 Then GoTo Fail
Else
If DD > 28 Then GoTo Fail
End If

Case Else
If DD > 31 Then GoTo Fail
End Select

Rem Put it together in YYYY-MM-DD format.
If YYYY < 1000 Then Result = "0"
If YYYY < 100 Then Result = Result & "0"
If YYYY < 10 Then Result = Result & "0"
Result = Result & CStr(YYYY)
Result = Result & "-"

If MM < 10 Then Result = Result & "0"
Result = Result & CStr(MM)
Result = Result & "-"

If DD < 10 Then Result = Result & "0"
Result = Result & CStr(DD)

ReadDate = Result

Not bad, but not good either. We substituted a TODO with code that still has a TODO.
The easiest way to get rid of it is to assume the date is in month/day/year format. So let's do that.

But keep in mind that we may have a two-digit year, instead of a four-digit one.
I'm going with the same solution Windows employs. Maybe you won't agree with my choice.

(...)
Else
MM = CInt(FirstNumber)
DD = CInt(SecondNumber)
YYYY = CInt(ThirdNumber)

If YYYY < 100 Then
YYYY = YYYY + 1900
If YYYY < 1950 Then YYYY = YYYY + 100
End If
End If
(...)

Phew! We're almost there. Now comes the harder part.
We need to read the month name, in full or abbreviated, and convert it to a number.
This will complicate our code even further.
Function to read an arbitrary month name:

Private Function ReadMonthName(ByVal FileHandle As Integer) As String
Dim Cp As Integer
Dim Ch As String * 1
Dim Result As String
Dim Pos As Long

Do While Not EOF(FileHandle)
Get #FileHandle, , Cp
Ch = ToChar(Cp)

Select Case Ch
Case "#", vbCr, vbLf, ",", ";", ")", " "
GoSub UngetChar
Exit Do

Case "0" To "9"
Rem We safely can assume we read two characters more than needed.
GoSub UngetChar
GoSub UngetChar
Result = Left$(Result, Len(Result) - 1)
Exit Do

Case Else
Result = Result & Ch
End Select
Loop

ReadMonthName = Result
Exit Function

UngetChar:
Pos = Seek(FileHandle)
Seek #FileHandle, Pos - 2
Return
End Function

Function to convert a month name to its number:

Private Function ConvertNameToNumber(ByVal Name As String) As Integer
Dim MonthNames As Variant
Dim MonthName As Variant
Dim Result As Integer
Dim Count As Integer

MonthNames = Array("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")

For Each MonthName In MonthNames
Count = Count + 1

If Name = MonthName Then
Result = Count
Exit For
End If
Next

If Result = 0 Then
Count = 0

For Each MonthName In MonthNames
Count = Count + 1

If Name = Left$(MonthName, 3) Then
Result = Count
Exit For
End If
Next
End If

ConvertNameToNumber = Result
End Function

Finally, let's adapt the old functions to use the new ones.

Private Function ReadDateTime(ByVal FileHandle As Integer) As String
Dim Result As String
Dim Number As String
Dim Cp As Integer
Dim Ch As String * 1
Dim N As Integer
Dim S As String

Rem Let's get the first number.
Number = ReadInteger(FileHandle)

If Number = "" Then
Rem Maybe we have a month name?
S = ReadMonthName(FileHandle)
If S = "" Then GoTo Fail

N = ConvertNameToNumber(S)
If N = 0 Then GoTo Fail

Number = CStr(N)
End If

Rem Let's get the first separator.
If EOF(FileHandle) Then GoTo Fail
GoSub GetChar

Rem It cannot be a letter.
If IsLetter(Cp) Then GoTo Fail

If Ch = ":" Then
Rem We are reading a time literal.
Result = ReadTime(FileHandle, Number)

Rem Date literal must end with a '#'.
If EOF(FileHandle) Then GoTo Fail
GoSub GetChar
If Ch <> "#" Then GoTo Fail

ReadDateTime = "1899-12-30 " & Result
Exit Function
End If

Rem We'll suppose it is a valid separator.
Result = ReadDate(FileHandle, Number, Ch)

GoSub GetChar

Select Case Ch
Case " "
Rem We may have a date and time literal together.
Number = ReadTime(FileHandle)
If Number = "" Then GoTo Fail
Result = Result & " " & Number

If EOF(FileHandle) Then GoTo Fail
GoSub GetChar
If Ch <> "#" Then GoTo Fail

Case "#"
Rem Literal does not have a time part. Let's add it.
Result = Result & " 00:00:00"

Case Else
GoTo Fail
End Select

ReadDateTime = Result
Exit Function

GetChar:
Get #FileHandle, , Cp
Ch = ToChar(Cp)
Return

Fail:
Err.Raise vbObjectError + 13, , "Invalid literal"
End Function

Private Function ReadDate( _
ByVal FileHandler As Integer, _
ByVal FirstNumber As String, _
ByVal Separator As String _
) As String
Dim SecondNumber As String
Dim ThirdNumber As String
Dim Cp As Integer
Dim Ch As String * 1
Dim YYYY As Integer
Dim MM As Integer
Dim DD As Integer
Dim Result As String

SecondNumber = ReadInteger(FileHandle)
If SecondNumber = "" Then GoTo Fail

Rem The next separator must match the first one.
If EOF(FileHandle) Then GoTo Fail
Get #FileHandle, , Cp
Ch = ToChar(Cp)
If Ch <> Separator Then GoTo Fail

ThirdNumber = ReadInteger(FileHandle)
If ThirdNumber = "" Then GoTo Fail

If CInt(FirstNumber) >= 100 And Separator = "-" Then
YYYY = CInt(FirstNumber)
MM = CInt(SecondNumber)
DD = CInt(ThirdNumber)
Else
MM = CInt(FirstNumber)
DD = CInt(SecondNumber)
YYYY = CInt(ThirdNumber)

If YYYY < 100 Then
YYYY = YYYY + 1900
If YYYY < 1950 Then YYYY = YYYY + 100
End If
End If

Rem Validate year.
If YYYY > 9999 Then GoTo Fail

Rem Validate month.
If MM < 1 Or MM > 12 Then GoTo Fail

Rem Validate day.
Select Case MM
Case 4, 6, 9, 11
If DD > 30 Then GoTo Fail

Case 2
If YYYY Mod 4 = 0 And YYYY Mod 100 <> 0 Or YYYY Mod 400 = 0 Then
If DD > 29 Then GoTo Fail
Else
If DD > 28 Then GoTo Fail
End If

Case Else
If DD > 31 Then GoTo Fail
End Select

Rem Put it together in YYYY-MM-DD format.
If YYYY < 1000 Then Result = "0"
If YYYY < 100 Then Result = Result & "0"
If YYYY < 10 Then Result = Result & "0"
Result = Result & CStr(YYYY)
Result = Result & "-"

If MM < 10 Then Result = Result & "0"
Result = Result & CStr(MM)
Result = Result & "-"

If DD < 10 Then Result = Result & "0"
Result = Result & CStr(DD)

ReadDate = Result
Exit Function

Fail:
Err.Raise vbObjectError + 13, , "Invalid literal"
End Function

Note: Don't mind the repetitiveness of the GetChar, UngetChar, and Fail. We'll refactor them later.
Also, we cannot immediately infer that we are going to read a date when we first see a "#". It can be a filehandle.
We'll need to refactor these functions too to account for that.

Next week, operators.

Andrej Biasic
2020-07-29