Metamorphing Machine I rather be this walking metamorphosis
than having that old formed opinion about everything!

A case of casings

As I said last week, we are going to implement case-changing functions like UCase and LCase.
I've been avoiding it since the beginning because I had no clear picture of how to approach it.
VB6 does not have a way to inline a list of values as its BASIC counterpart with the DATA statement.

Then I remembered about resource files. I could use them and, in case I choose to translate the code to C in the future, I would just use C's array literal.
With that in mind, I've dusted out an old project having my first attempts to tackle upper and lower case letters.
I used it to create a .CSV file with three columns: Lower case codepoint, upper case codepoint, and title case codepoint.

Something like this:

223,83,83
224,192,192
225,193,193
226,194,194
(...)

(It is available to download here.)
With that, I can search for a lower case letter in the first column. If I find it, I have its upper case version in the next column and its title version in the next one.

Then I opened the file in Excel and copied the second column to the fourth column.
The fifth column was filled with the spreadsheet's line number minus one, then I ordered the fourth column together with the fifth column.
Now I can also search for an upper case letter in the fourth column. If I find it, I have its lower case version's index in the next column.

Now, it's to time to convert it to a binary file. Here is the code:

Public Sub Main()
Dim Codes As String
Dim Values() As String

Open "Casing.csv" For Input Access Read As #1
Open "Casing.bin" For Binary Access Write As #2

While Not EOF(1)
Line Input #1, Codes
Values = Split(Codes, ",")
Put #2, , CInt("&H" & Hex$(Values(0)))
Put #2, , CInt("&H" & Hex$(Values(1)))
Put #2, , CInt("&H" & Hex$(Values(2)))
Put #2, , CInt("&H" & Hex$(Values(3)))
Put #2, , CInt("&H" & Hex$(Values(4)))
Wend

Close
End Sub

The next step is to add it as a resource file to our project.
Having done that, this is the module with our first versions of UCase and LCase:

Module StringCentral
Private Declare Function CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
(ByVal Dest As LongPtr, ByVal Src As LongPtr, ByVal Length As Long) As Long
Private CodePoints_() As Integer
Private IsInit_ As Boolean


Private Sub Init()
Dim Bytes() As Byte
Dim Size As Long

IsInit_ = True
Bytes = LoadResData(101, "CUSTOM")
Size = UBound(Bytes) + 1
ReDim CodePoints_(0 To Size \ 2 - 1)
CopyMemory VarPtr(CodePoints_(0)), VarPtr(Bytes(0)), Size
End Sub


Public Function ToUpperChar(ByVal Ch As String) As String
Dim Cp As Integer
Dim Index As Long
Dim Result As String

If Len(Ch) = 0 Then Err.Raise 5
Ch = Left$(Ch, 1)

Select Case Ch
Case "A" To "Z"
Result = Ch

Case "a" To "z"
Cp = AscW(Ch)
Result = ChrW$(Cp - 32)

Case Else
If Not IsInit_ Then Init
Cp = AscW(Ch)
Index = BinarySearch(CodePoints_, Cp, FirstIndex:=0, Step:=5)
If Index = -1 Then Result = Ch Else Result = ChrW$(CodePoints_(Index + 1))
End Select

ToUpperChar = Result
End Function


Public Function ToLowerChar(ByVal Ch As String) As String
Dim Cp As Integer
Dim Index As Long
Dim Result As String

If Len(Ch) = 0 Then Err.Raise 5
Ch = Left$(Ch, 1)

Select Case Ch
Case "A" To "Z"
Cp = AscW(Ch)
Result = ChrW$(Cp + 32)

Case "a" To "z"
Result = Ch

Case Else
If Not IsInit_ Then Init
Cp = AscW(Ch)
Index = BinarySearch(CodePoints_, Cp, FirstIndex:=3, Step:=5)

If Index = 2 Then
Result = Ch
Else
Index = CodePoints_(Index + 1) * 5
Result = ChrW$(CodePoints_(Index))
End If
End Select

ToLowerChar = Result
End Function


Public Function ToTitleChar(ByVal Ch As String) As String
Dim Cp As Integer
Dim Index As Long
Dim Result As String

If Len(Ch) = 0 Then Err.Raise 5
Ch = Left$(Ch, 1)

Select Case Ch
Case "A" To "Z"
Result = Ch

Case "a" To "z"
Cp = AscW(Ch)
Result = ChrW$(Cp - 32)

Case Else
If Not IsInit_ Then Init
Cp = AscW(Ch)
Rem Search for a lower case character.
Index = BinarySearch(CodePoints_, Cp, FirstIndex:=0, Step:=5)

If Index = -1 Then
Rem Search for an upper case character.
Index = BinarySearch(CodePoints_, Cp, FirstIndex:=3, Step:=5)

If Index = 2 Then
Result = Ch
Else
Index = CodePoints_(Index + 1) * 5
Result = ChrW$(CodePoints_(Index + 2))
End If
Else
Result = ChrW$(CodePoints_(Index + 2))
End If
End Select

ToTitleChar = Result
End Function
End Module


Private Function BinarySearch( _
ByRef SourceArray As Variant, _
ByVal Target As Variant, _
Optional ByVal FirstIndex As Integer, _
Optional ByVal Step As Integer = 1 _
) As Long
Dim LeftPoint As Long
Dim RightPoint As Long
Dim MiddlePoint As Long
Dim ResultIndex As Long
ResultIndex = FirstIndex - 1

RightPoint = UBound(SourceArray) - Step + 1 + FirstIndex
LeftPoint = FirstIndex

Do While LeftPoint <= RightPoint
MiddlePoint = (LeftPoint + RightPoint) \ (2 * Step)
MiddlePoint = MiddlePoint * Step + FirstIndex

Select Case SourceArray(MiddlePoint)
Case Is < Target
LeftPoint = MiddlePoint + Step

Case Is > Target
RightPoint = MiddlePoint - Step

Case Else
ResultIndex = MiddlePoint
Exit Do
End Select
Loop

BinarySearch = ResultIndex
End Function
End Module

This was the first draft. What we'll do now is evolve ToUpperChar, ToLowerChar, and ToTitleChar to deal with whole strings.
(We'll move Scanner's IsLetter, IsLowSurrogate, IsHighSurrogate, and IsSurrogate functions to StringCentral module because I think they fit there better.)

Public Function ToUpper(ByVal Text As String) As String
Dim Cp As Integer
Dim Pos As Long
Dim Index As Long
Dim Ch As String * 1
Dim Result As String

Result = Text

For Pos = 1 To Len(Text)
Ch = Mid$(Result, Pos, 1)

Select Case Ch
Case "A" To "Z"
Rem Nothing to do

Case "a" To "z"
Cp = AscW(Ch)
Ch = ChrW$(Cp - 32)

Case Else
If Not IsInit_ Then Init
Cp = AscW(Ch)
Index = BinarySearch(CodePoints_, Cp, FirstIndex:=0, Step:=5)
If Index <> -1 Then Ch = ChrW$(CodePoints_(Index + 1))
End Select

Mid$(Result, Pos, 1) = Ch
Next

ToUpper = Result
End Function


Public Function ToLower(ByVal Text As String) As String
Dim Cp As Integer
Dim Pos As Long
Dim Index As Long
Dim Ch As String * 1
Dim Result As String

Result = Text

For Pos = 1 To Len(Text)
Ch = Mid$(Result, Pos, 1)

Select Case Ch
Case "A" To "Z"
Cp = AscW(Ch)
Ch = ChrW$(Cp + 32)

Case "a" To "z"
Rem Nothing to do

Case Else
If Not IsInit_ Then Init
Cp = AscW(Ch)
Index = BinarySearch(CodePoints_, Cp, FirstIndex:=3, Step:=5)

If Index <> 2 Then
Index = CodePoints_(Index + 1)
If Index <> -1 Then Ch = ChrW$(CodePoints_(Index * 5))
End If
End Select

Mid$(Result, Pos, 1) = Ch
Next

ToLower = Result
End Function


Public Function ToTitle(ByVal Text As String) As String
Dim ToUp As Boolean
Dim Cp As Integer
Dim Pos As Long
Dim Index As Long
Dim Ch As String * 1
Dim Result As String

Result = Text
ToUp = True

For Pos = 1 To Len(Text)
Ch = Mid$(Result, Pos, 1)
Cp = AscW(Ch)

If IsLetter(Cp) Then
If ToUp Then
ToUp = False

Select Case Ch
Case "A" To "Z"
Rem Nothing to do

Case "a" To "z"
Ch = ChrW$(Cp - 32)

Case Else
If Not IsInit_ Then Init
Rem Search for a lower case character.
Index = BinarySearch(CodePoints_, Cp, FirstIndex:=0, Step:=5)

If Index = -1 Then
Rem Search for an upper case character.
Index = BinarySearch(CodePoints_, Cp, FirstIndex:=3, Step:=5)

If Index <> 2 Then
Index = CodePoints_(Index + 1) * 5
Ch = ChrW$(CodePoints_(Index + 2))
End If
Else
Ch = ChrW$(CodePoints_(Index + 2))
End If
End Select
Else
Ch = ToLower(Ch)
End If
Else
ToUp = True
End If

Mid$(Result, Pos, 1) = Ch
Next

ToTitle = Result
End Function

I don't know if I'm covering all cases (no pun intended), but as a not-so-old saying goes
"The fast way to get a correct piece of information is to post a wrong one on the Internet",
I'll just state we are done with them!

Next week we are not going back to our transpiler yet. Let's have some fun first!

Andrej Biasic
2021-01-13