Channel: VBForums

Almost found another UTF-8 codec

This might have been nice... if it worked.

Instead, this classic ASP helper object is worthless to us.

Code:

Option Explicit

'Needs a reference to "oleprn 1.0 Type Library" and three multiline TextBox controls.

Private Function ToHex(ByRef Raw As String) As String
    'Dump every byte of the String's buffer as space-separated hex pairs.
    Const HexDigits As String = "0123456789ABCDEF"
    Dim HexText As String
    Dim I As Long
    Dim RawByte As Byte

    If LenB(Raw) = 0 Then Exit Function 'Avoid Left$(..., -1) on empty input.

    HexText = Space$(LenB(Raw) * 3) 'Two digits plus one separator per byte.
    For I = 1 To LenB(Raw)
        RawByte = AscB(MidB$(Raw, I, 1))
        Mid$(HexText, (I - 1) * 3 + 1, 1) = Mid$(HexDigits, RawByte \ &H10& + 1, 1)
        Mid$(HexText, (I - 1) * 3 + 2, 1) = Mid$(HexDigits, RawByte Mod &H10& + 1, 1)
    Next
    ToHex = Left$(HexText, Len(HexText) - 1) 'Drop the trailing separator.
End Function

Private Sub Form_Load()
    Const CP_UTF8 As Long = 65001
    Dim Unicode As String
    Dim UTF8 As String

    Unicode = "¶¶" & vbCrLf _
            & "abcdefghijklmnopqrstuvwxyz " _
            & "0123456789 " _
            & "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    Text1.Text = Unicode
    With New OLEPRNLib.OleCvt
        UTF8 = .ToUtf8(Unicode)
        Text2.Text = ToHex(UTF8)
        Text3.Text = .ToUnicode(UTF8, CP_UTF8) 'Convert back.
    End With
End Sub

Private Sub Form_Resize()
    If WindowState <> vbMinimized Then
        With Text1
            .Move 0, 0, ScaleWidth, ScaleHeight / 3
            Text2.Move 0, .Height, ScaleWidth, .Height
            Text3.Move 0, Text2.Top + Text2.Height, ScaleWidth, ScaleHeight - 2 * .Height
        End With
    End If
End Sub

Hex dump of the "converted" data:

Code:

C2 00 B6 00 C2 00 B6 00 0D 00 0A 00 61 00 62 00
63 00 64 00 65 00 66 00 67 00 68 00 69 00 6A 00
6B 00 6C 00 6D 00 6E 00 6F 00 70 00 71 00 72 00
73 00 74 00 75 00 76 00 77 00 78 00 79 00 7A 00
20 00 30 00 31 00 32 00 33 00 34 00 35 00 36 00
37 00 38 00 39 00 20 00 41 00 42 00 43 00 44 00
45 00 46 00 47 00 48 00 49 00 4A 00 4B 00 4C 00
4D 00 4E 00 4F 00 50 00 51 00 52 00 53 00 54 00
55 00 56 00 57 00 58 00 59 00 5A 00

The bytes themselves are "correct" UTF-8 (C2 B6 for each ¶), except that it stuffs a NUL after every one: each UTF-8 byte gets its own 16-bit character in the returned String. I guess the problem was that the IIS developers who made this thing didn't really understand BSTRs.

How very odd. So close and yet so wrong.

At least it is consistent, though: converting the weird "stuffed UTF-8" back with its ToUnicode() method works.
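
If someone wanted to use the object anyway, the stuffing could in principle be undone by hand. The following is only a rough sketch, assuming the output always looks like the dump above (one UTF-8 byte in the low half of each 16-bit character). UnstuffUtf8 and StuffUtf8 are hypothetical helper names, not part of oleprn, and I have not tested them against OleCvt beyond what the dump shows.

```vb
Option Explicit

'Hypothetical helpers, not part of oleprn: convert between OleCvt's
'"stuffed" Strings (one UTF-8 byte per 16-bit character) and Byte arrays.

Private Function UnstuffUtf8(ByRef Stuffed As String) As Byte()
    Dim Bytes() As Byte
    Dim I As Long

    If Len(Stuffed) = 0 Then Exit Function 'Returns an unallocated array.
    ReDim Bytes(0 To Len(Stuffed) - 1)
    For I = 1 To Len(Stuffed)
        'Keep the low byte of each character; the high byte is the stuffed NUL.
        Bytes(I - 1) = AscW(Mid$(Stuffed, I, 1)) And &HFF
    Next
    UnstuffUtf8 = Bytes
End Function

Private Function StuffUtf8(ByRef Bytes() As Byte) As String
    Dim Stuffed As String
    Dim I As Long

    'The caller must pass an allocated array.
    Stuffed = Space$(UBound(Bytes) - LBound(Bytes) + 1)
    For I = LBound(Bytes) To UBound(Bytes)
        'Widen each byte to its own 16-bit character, as OleCvt does.
        Mid$(Stuffed, I - LBound(Bytes) + 1, 1) = ChrW$(Bytes(I))
    Next
    StuffUtf8 = Stuffed
End Function
```

With these, UnstuffUtf8(.ToUtf8(Unicode)) should give real UTF-8 bytes that can be written to a file, and StuffUtf8 would put bytes read from a file back into the form that ToUnicode() appears to expect. A manual loop is used rather than StrConv() because StrConv() goes through the ANSI code page and may not pass every byte value through unchanged.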
