Unicode characters in PDF information fields
[This followup was posted to digital-metaphors.public.reportbuilder.general and a copy was sent to the cited author.]
Hello,
whenever I produce a PDF with ReportBuilder (version 15.04) I have to
check the information fields in PDFSettings (Title, Subject, Creator,
etc.) and avoid Unicode characters in them, because they come out
garbled in the final PDF.
The problem is not critical, but end users sometimes need a lot of
imagination to guess the original name of the document or the real
name of the author.
An example:
Author: Čeněk
(Cenek with a caron (hook/wedge) above the C and the second e)
PDF will contain: Èenìk
(The first E and the i carry a grave accent.)
This "effect" is caused by the simplified processing in the function
TppPDFUtils.StrToHex (unit ppPDFUtils), which corrupts all non-ASCII
characters. It reduces every character to a single byte (the code casts
the string to AnsiString) and converts that byte to its hexadecimal
representation, so these "special" characters are misrepresented in the
final document. In Czech, for example, almost every word contains such
a character.
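To illustrate the mechanism, here is a minimal console sketch (not ReportBuilder code). It assumes a Unicode Delphi compiler on Windows and uses code page 1250 as a stand-in for the ANSI code page of a Czech installation; AnsiHex and the program name are made up for this example. On compilers older than XE2 the unit is called SysUtils rather than System.SysUtils.

```pascal
program AnsiBytesDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: hex digits of aText after conversion to the
// given single-byte code page (one byte per character).
function AnsiHex(const aText: string; aCodePage: Integer): string;
var
  lEncoding: TEncoding;
  lBytes: TBytes;
  lByte: Byte;
begin
  Result := '';
  // GetEncoding returns a new instance that the caller has to free.
  lEncoding := TEncoding.GetEncoding(aCodePage);
  try
    lBytes := lEncoding.GetBytes(aText);
  finally
    lEncoding.Free;
  end;
  for lByte in lBytes do
    Result := Result + LowerCase(IntToHex(lByte, 2));
end;

begin
  // #$010C is C with caron, #$011B is e with caron.
  Writeln(AnsiHex(#$010C'en'#$011B'k', 1250)); // prints c8656eec6b
end.
```

C8 and EC are the code page 1250 bytes for the two caron letters, but a PDF viewer reads the same bytes as E and i with a grave accent, which matches the result described above.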
The original code is:
class function TppPDFUtils.StrToHex(aString: String): string;
var
  liIndex: Integer;
begin
  Result := '';
  for liIndex := 1 to Length(aString) do
    Result := LowerCase(Result + IntToHex(Ord(AnsiString(aString)[liIndex]), 2));
end;
My suggestion is to change it so that it switches to UTF-16 when needed:
class function TppPDFUtils.StrToHex(aString: String): string;
var
  liIndex: Integer;
{$ifdef UNICODE}
  Buff1, Buff2: TBytes;
{$endif}
begin
{$ifdef UNICODE}
  // If any character is outside 7-bit ASCII, write the whole string
  // as UTF-16BE preceded by the byte order mark (FE FF).
  for liIndex := 1 to Length(aString) do
    if (Ord(aString[liIndex]) > 127) then
    begin
      Buff1 := TEncoding.BigEndianUnicode.GetPreamble;
      Buff2 := TEncoding.BigEndianUnicode.GetBytes(aString);
      // Two hex digits per byte.
      SetLength(Result, (Length(Buff1) + Length(Buff2)) * 2);
      BinToHex(Buff1, PChar(@Result[1]), Length(Buff1));
      BinToHex(Buff2, PChar(@Result[1 + Length(Buff1) * 2]), Length(Buff2));
      Exit;
    end;
{$endif}
  // Pure ASCII (or a non-Unicode compiler): keep the original behaviour.
  Result := '';
  for liIndex := 1 to Length(aString) do
    Result := LowerCase(Result + IntToHex(Ord(AnsiString(aString)[liIndex]), 2));
end;
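For comparison, a small stand-alone sketch of the output the UTF-16 branch is meant to produce. TextToPdfHex is a hypothetical helper written only for this illustration, not a ReportBuilder routine, and it builds the hex digits with IntToHex instead of BinToHex. It assumes a Unicode Delphi compiler. The PDF specification treats a text string that starts with the bytes FE FF as UTF-16BE, which is why the byte order mark is written first.

```pascal
program Utf16HexDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: byte order mark plus UTF-16BE bytes of aText,
// written as lowercase hex digits.
function TextToPdfHex(const aText: string): string;
var
  lByte: Byte;
begin
  Result := '';
  // Preamble of BigEndianUnicode is the byte order mark FE FF.
  for lByte in TEncoding.BigEndianUnicode.GetPreamble do
    Result := Result + IntToHex(lByte, 2);
  // Two bytes per UTF-16 code unit, high byte first.
  for lByte in TEncoding.BigEndianUnicode.GetBytes(aText) do
    Result := Result + IntToHex(lByte, 2);
  Result := LowerCase(Result);
end;

begin
  // #$010C is C with caron, #$011B is e with caron.
  Writeln(TextToPdfHex(#$010C'en'#$011B'k')); // prints feff010c0065006e011b006b
end.
```

In the Info dictionary such a value would appear as a hex string, for example /Author <feff010c0065006e011b006b>.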
However, this change to StrToHex does not solve the problem for encoded
documents, because a similar method is used in the encoding procedures.
That code has to be corrected as well, but I hope that at least this
small change will help a lot of users from non-English-speaking countries.
--
Best regards,
Igor Gottwald
Comments
Thank you for the information and code. We will research this solution
and add the changes for the next release of ReportBuilder.
Nico Cizik
Digital Metaphors
http://www.digital-metaphors.com