Let's build a transpiler! Part 3
This is the third post in a series of building a transpiler. You can find the previous ones here.Last time we were left to scan a date literal, in any of its formats.
How do we do it if we don't know in which format it will be?
There are at least three ways to do it:
- You can take it as it is always in the same format, like month/day/year
- You can read it and try to figure out what part is the day, what is the month, and what is the year. Try this with 01/02/03...
- You may have a default format that gives you a hint of what are the positions for day, month, and year.
Private Function ReadDateTime(ByVal FileHandle As Integer) As String
Dim Result As String
Dim Number As String
Dim Cp As Integer
Dim Ch As String * 1
Rem Let's get the first number.
Number = ReadInteger(FileHandle)
If Number = "" Then
Rem Maybe we have a month name?
Rem TODO: Check it.
End If
Rem Let's get the first separator.
If EOF(FileHandle) Then GoTo Fail
GoSub GetChar
Rem It cannot be a letter.
If IsLetter(Cp) Then GoTo Fail
If Ch = ":" Then
Rem We are reading a time literal.
Result = ReadTime(FileHandle, Number)
Rem Date literal must end with a '#'.
If EOF(FileHandle) Then GoTo Fail
GoSub GetChar
If Ch <> "#" Then GoTo Fail
ReadDateTime = "1899-12-30 " & Result
Exit Function
End If
Rem We'll suppose it is a valid separator.
Result = ReadDate(FileHandle, Number, Ch)
GoSub GetChar
Select Case Ch
Case " "
Rem We may have a date and time literal together.
Number = ReadTime(FileHandle)
If Number = "" Then GoTo Fail
Result = Result & " " & Number
If EOF(FileHandle) Then GoTo Fail
GoSub GetChar
If Ch <> "#" Then GoTo Fail
Case "#"
Rem Literal does not have a time part. Let's add it.
Result = Result & " 00:00:00"
Case Else
GoTo Fail
End Select
ReadDateTime = Result
Exit Function
GetChar:
Get #FileHandle, , Cp
Ch = ToChar(Cp)
Return
Fail:
Err.Raise vbObjectError + 13, , "Invalid literal"
End Function
Private Function ReadDate( _
ByVal FileHandle As Integer, _
ByVal FirstNumber As String, _
ByVal Separator As String _
) As String
Dim SecondNumber As String
Dim ThirdNumber As String
Dim Cp As Integer
Dim Ch As String * 1
SecondNumber = ReadInteger(FileHandle)
If SecondNumber = "" Then GoTo Fail
Rem The next separator must match the first one.
If EOF(FileHandle) Then GoTo Fail
Get #FileHandle, , Cp
Ch = ToChar(Cp)
If Ch <> Separator Then GoTo Fail
ThirdNumber = ReadInteger(FileHandle)
If ThirdNumber = "" Then GoTo Fail
Rem TODO: Now we need to figure out which number is which.
Exit Function
Fail:
Err.Raise vbObjectError + 13, , "Invalid literal"
End Function
Private Function ReadTime(ByVal FileHandle As String, Optional ByVal FirstNumber As String) As String
Dim HH As Integer
Dim NN As Integer
Dim SS As Integer
Dim Number As String
Dim Cp As Integer
Dim Ch As String * 1
Dim Pos As Long
On Error GoTo Fail
HH = CInt(FirstNumber)
Number = ReadInteger(FileHandle)
If Number = "" Then GoTo Fail
NN = CInt(Number)
GoSub GetChar
If Ch = ":" Then
Number = ReadInteger(FileHandle)
If Number = "" Then GoTo Fail
SS = CInt(Number)
Else
GoSub UngetChar
End If
Rem TODO: Deal with AM/PM.
Rem Validate hour, minute, and second.
If HH < 0 Or HH > 23 Then GoTo Fail
If NN < 0 Or NN > 59 Then GoTo Fail
If SS < 0 Or SS > 59 Then GoTo Fail
Rem Put it together in HH:NN:SS format.
Number = CStr(SS)
If SS < 10 Then Number = "0" & Number
Number = ":" & Number
Number = CStr(NN) & Number
If NN < 10 Then Number = "0" & Number
Number = ":" & Number
Number = CStr(HH) & Number
If HH < 10 Then Number = "0" & Number
ReadTime = Number
Exit Function
GetChar:
If EOF(FileHandle) Then GoTo Fail
Get #FileHandle, , Cp
Ch = ToChar
Return
UngetChar:
Pos = Seek(FileHandle)
Seek #FileHandle, Pos - 2
Return
Fail:
Err.Raise vbObjectError + 13, , "Invalid literal"
End Function
As you can see, the code is very straightforward, but we have three TODOs. They will complicate the code.
Let's start with the easier one: Deal with AM/PM. The added code is highlighted:
Private Function ReadTime(ByVal FileHandle As String, Optional ByVal FirstNumber As String) As String
Dim HH As Integer
Dim NN As Integer
Dim SS As Integer
Dim Number As String
Dim Cp As Integer
Dim Ch As String * 1
Dim Pos As Long
Dim AP As String * 1
On Error GoTo Fail
HH = CInt(FirstNumber)
Number = ReadInteger(FileHandle)
If Number = "" Then GoTo Fail
NN = CInt(Number)
GoSub GetChar
If Ch = ":" Then
Number = ReadInteger(FileHandle)
If Number = "" Then GoTo Fail
SS = CInt(Number)
Else
SS = "00"
GoSub UngetChar
End If
If Not EOF(FileHandle) Then
GoSub GetChar
If Ch = " " Then
If Not EOF(FileHandle) Then
GoSub GetChar
If Ch = "a" Or Ch = "A" Then
GoSub GetChar
If Ch = "m" Or Ch = "M" Then
AP = "A"
Else
GoSub UngetChar
GoSub UngetChar
GoSub UngetChar
End If
ElseIf Ch = "p" Or Ch = "P" Then
GoSub GetChar
If Ch = "m" Or Ch = "M" Then
AP = "P"
Else
GoSub UngetChar
GoSub UngetChar
GoSub UngetChar
End If
Else
GoSub UngetChar
GoSub UngetChar
End If
End If
Else
GoSub UngetChar
End If
End If
Rem Validate hour, minute, and second.
If HH < 0 Or HH > 23 Then GoTo Fail
If NN < 0 Or NN > 59 Then GoTo Fail
If SS < 0 Or SS > 59 Then GoTo Fail
If AP = "A" Then
If HH = 12 Then HH = 0
ElseIf AP = "P" Then
If HH <> 12 Then HH = HH + 12
End If
Rem Put it together in HH:NN:SS format.
Number = CStr(SS)
If SS < 10 Then Number = "0" & Number
Number = ":" & Number
Number = CStr(NN) & Number
If NN < 10 Then Number = "0" & Number
Number = ":" & Number
Number = CStr(HH) & Number
If HH < 10 Then Number = "0" & Number
ReadTime = Number
Exit Function
GetChar:
If EOF(FileHandle) Then GoTo Fail
Get #FileHandle, , Cp
Ch = ToChar(Cp)
Return
UngetChar:
Pos = Seek(FileHandle)
Seek #FileHandle, Pos - 2
Return
Fail:
Err.Raise vbObjectError + 13, , "Invalid literal"
End Function
Now, let's attack the second TODO, dealing first with the year-month-date format:
Dim YYYY As Integer
Dim MM As Integer
Dim DD As Integer
Dim Result As String
If CInt(FirstNumber) > 100 And Separator = "-" Then
YYYY = CInt(FirstNumber)
MM = CInt(SecondNumber)
DD = CInt(ThirdNumber)
Else
Rem TODO: Still need to figure out what is what.
End If
Rem validate year.
If YYYY > 9999 Then GoTo Fail
Rem Validate month.
If MM < 1 Or MM > 12 Then GoTo Fail
Rem Validate day.
Select Case MM
Case 4, 6, 9, 11
If DD > 30 Then GoTo Fail
Case 2
If YYYY Mod 4 = 0 And YYYY Mod 100 <> 0 Or YYYY Mod 400 = 0 Then
If DD > 29 Then GoTo Fail
Else
If DD > 28 Then GoTo Fail
End If
Case Else
If DD > 31 Then GoTo Fail
End Select
Rem Put it together in YYYY-MM-DD format.
If YYYY < 1000 Then Result = "0"
If YYYY < 100 Then Result = Result & "0"
If YYYY < 10 Then Result = Result & "0"
Result = Result & CStr(YYYY)
Result = Result & "-"
If MM < 10 Then Result = Result & "0"
Result = Result & CStr(MM)
Result = Result & "-"
If DD < 10 Then Result = Result & "0"
Result = Result & CStr(DD)
ReadDate = Result
Not bad, but not good either. We substituted a TODO with code that still has a TODO.
The easiest way to get rid of it is to assume the date is in month/day/year format. So let's do that.
But keep in mind that we may have a two-digit year, instead of a four-digit one.
I'm going with the same solution Windows employs. Maybe you won't agree with my choice.
(...)
Else
MM = CInt(FirstNumber)
DD = CInt(SecondNumber)
YYYY = CInt(ThirdNumber)
If YYYY < 100 Then
YYYY = YYYY + 1900
If YYYY < 1950 Then YYYY = YYYY + 100
End If
End If
(...)
Phew! We're almost there. Now comes the harder part.
We need to read the month name, in full or abbreviated, and convert it to a number.
This will complicate our code even further.
Function to read an arbitrary month name:
Private Function ReadMonthName(ByVal FileHandle As Integer) As String
Dim Cp As Integer
Dim Ch As String * 1
Dim Result As String
Dim Pos As Long
Do While Not EOF(FileHandle)
Get #FileHandle, , Cp
Ch = ToChar(Cp)
Select Case Ch
Case "#", vbCr, vbLf, ",", ";", ")", " "
GoSub UngetChar
Exit Do
Case "0" To "9"
Rem We safely can assume we read two characters more than needed.
GoSub UngetChar
GoSub UngetChar
Result = Left$(Result, Len(Result) - 1)
Exit Do
Case Else
Result = Result & Ch
End Select
Loop
ReadMonthName = Result
Exit Function
UngetChar:
Pos = Seek(FileHandle)
Seek #FileHandle, Pos - 2
Return
End Function
Function to convert a month name to its number:
Private Function ConvertNameToNumber(ByVal Name As String) As Integer
Dim MonthNames As Variant
Dim MonthName As Variant
Dim Result As Integer
Dim Count As Integer
MonthNames = Array("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")
For Each MonthName In MonthNames
Count = Count + 1
If Name = MonthName Then
Result = Count
Exit For
End If
Next
If Result = 0 Then
Count = 0
For Each MonthName In MonthNames
Count = Count + 1
If Name = Left$(MonthName, 3) Then
Result = Count
Exit For
End If
Next
End If
ConvertNameToNumber = Result
End Function
Finally, let's adapt the old functions to use the new ones.
Private Function ReadDateTime(ByVal FileHandle As Integer) As String
Dim Result As String
Dim Number As String
Dim Cp As Integer
Dim Ch As String * 1
Dim N As Integer
Dim S As String
Rem Let's get the first number.
Number = ReadInteger(FileHandle)
If Number = "" Then
Rem Maybe we have a month name?
S = ReadMonthName(FileHandle)
If S = "" Then GoTo Fail
N = ConvertNameToNumber(S)
If N = 0 Then GoTo Fail
Number = CStr(N)
End If
Rem Let's get the first separator.
If EOF(FileHandle) Then GoTo Fail
GoSub GetChar
Rem It cannot be a letter.
If IsLetter(Cp) Then GoTo Fail
If Ch = ":" Then
Rem We are reading a time literal.
Result = ReadTime(FileHandle, Number)
Rem Date literal must end with a '#'.
If EOF(FileHandle) Then GoTo Fail
GoSub GetChar
If Ch <> "#" Then GoTo Fail
ReadDateTime = "1899-12-30 " & Result
Exit Function
End If
Rem We'll suppose it is a valid separator.
Result = ReadDate(FileHandle, Number, Ch)
GoSub GetChar
Select Case Ch
Case " "
Rem We may have a date and time literal together.
Number = ReadTime(FileHandle)
If Number = "" Then GoTo Fail
Result = Result & " " & Number
If EOF(FileHandle) Then GoTo Fail
GoSub GetChar
If Ch <> "#" Then GoTo Fail
Case "#"
Rem Literal does not have a time part. Let's add it.
Result = Result & " 00:00:00"
Case Else
GoTo Fail
End Select
ReadDateTime = Result
Exit Function
GetChar:
Get #FileHandle, , Cp
Ch = ToChar(Cp)
Return
Fail:
Err.Raise vbObjectError + 13, , "Invalid literal"
End Function
Private Function ReadDate( _
ByVal FileHandler As Integer, _
ByVal FirstNumber As String, _
ByVal Separator As String _
) As String
Dim SecondNumber As String
Dim ThirdNumber As String
Dim Cp As Integer
Dim Ch As String * 1
Dim YYYY As Integer
Dim MM As Integer
Dim DD As Integer
Dim Result As String
SecondNumber = ReadInteger(FileHandle)
If SecondNumber = "" Then GoTo Fail
Rem The next separator must match the first one.
If EOF(FileHandle) Then GoTo Fail
Get #FileHandle, , Cp
Ch = ToChar(Cp)
If Ch <> Separator Then GoTo Fail
ThirdNumber = ReadInteger(FileHandle)
If ThirdNumber = "" Then GoTo Fail
If CInt(FirstNumber) >= 100 And Separator = "-" Then
YYYY = CInt(FirstNumber)
MM = CInt(SecondNumber)
DD = CInt(ThirdNumber)
Else
MM = CInt(FirstNumber)
DD = CInt(SecondNumber)
YYYY = CInt(ThirdNumber)
If YYYY < 100 Then
YYYY = YYYY + 1900
If YYYY < 1950 Then YYYY = YYYY + 100
End If
End If
Rem Validate year.
If YYYY > 9999 Then GoTo Fail
Rem Validate month.
If MM < 1 Or MM > 12 Then GoTo Fail
Rem Validate day.
Select Case MM
Case 4, 6, 9, 11
If DD > 30 Then GoTo Fail
Case 2
If YYYY Mod 4 = 0 And YYYY Mod 100 <> 0 Or YYYY Mod 400 = 0 Then
If DD > 29 Then GoTo Fail
Else
If DD > 28 Then GoTo Fail
End If
Case Else
If DD > 31 Then GoTo Fail
End Select
Rem Put it together in YYYY-MM-DD format.
If YYYY < 1000 Then Result = "0"
If YYYY < 100 Then Result = Result & "0"
If YYYY < 10 Then Result = Result & "0"
Result = Result & CStr(YYYY)
Result = Result & "-"
If MM < 10 Then Result = Result & "0"
Result = Result & CStr(MM)
Result = Result & "-"
If DD < 10 Then Result = Result & "0"
Result = Result & CStr(DD)
ReadDate = Result
Exit Function
Fail:
Err.Raise vbObjectError + 13, , "Invalid literal"
End Function
Note: Don't mind the repetitiveness of the GetChar, UngetChar, and Fail. We'll refactor them later.
Also, we cannot immediately infer that we are going to read a date when we first see a "#". It can be a filehandle.
We'll need to refactor these functions too to account for that.
Next week, operators.
Andrej Biasic
2020-07-29