
Unicode characters in PDF information fields

edited October 2014 in General
[This followup was posted to digital-metaphors.public.reportbuilder.general and a copy was sent to the cited author.]

Hello,

whenever I produce a PDF using ReportBuilder (version 15.04), I have to
make sure that the information fields in PDFSettings (Title, Subject,
Creator, etc.) contain no Unicode characters, because they produce an
unpredictable result in the final PDF.

The problem is not a major one, but end users sometimes need a lot of
imagination to guess the original name of the document or the real name
of an author.

An example:
Author: Čeněk
(Cenek with a hook (caron/wedge) above the C and the second e)
The PDF will contain: Èenìk
(the first E and the i carry a grave accent instead)

This "effect" is caused by simplified processing in the function
TppPDFUtils.StrToHex (unit ppPDFUtils), which corrupts all non-ASCII
characters. It reduces every character to a single byte (via an
AnsiString cast) and converts that byte to its hexadecimal
representation, so those "special" characters are misrepresented in the
final document. In Czech, for example, there is such a character in
almost every word.
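
To make the corruption concrete, here is a minimal sketch that reproduces the single-byte output for the name Čeněk. It assumes a Unicode Delphi (2009 or later) and that the system ANSI code page is Windows-1250, which is modelled here with TEncoding.GetEncoding(1250). The helper BytesToHex is hypothetical and not part of ReportBuilder.

```pascal
program AnsiByteDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical helper (not part of ReportBuilder):
// writes each byte as two lower-case hex digits.
function BytesToHex(const aBytes: TBytes): string;
var
  liIndex: Integer;
begin
  Result := '';
  for liIndex := 0 to High(aBytes) do
    Result := Result + LowerCase(IntToHex(aBytes[liIndex], 2));
end;

var
  lEncoding: TEncoding;
  lsName: string;
begin
  // "Cenek" with a caron on the C (U+010C) and on the second e (U+011B)
  lsName := #$010C + 'en' + #$011B + 'k';

  // Assumption: Windows-1250 stands in for the system ANSI code page
  lEncoding := TEncoding.GetEncoding(1250);
  try
    Writeln(BytesToHex(lEncoding.GetBytes(lsName)));  // c8656eec6b
  finally
    lEncoding.Free;
  end;
end.
```

The bytes C8 and EC are correct in Windows-1250, but a PDF text string without a byte order mark is read as PDFDocEncoding, where C8 is È and EC is ì. That matches the "Èenìk" seen in the document properties.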

The original code is:

class function TppPDFUtils.StrToHex(aString: String): string;
var
  liIndex: Integer;
begin

  Result := '';
  for liIndex := 1 to Length(aString) do
    Result := LowerCase(Result + IntToHex(Ord(AnsiString(aString)[liIndex]), 2));

end;


My suggestion is to change it so that it emits UTF-16 when the string
contains non-ASCII characters:

class function TppPDFUtils.StrToHex(aString: String): string;
var
  liIndex: Integer;
{$ifdef UNICODE}
  Buff1, Buff2: TBytes;
{$endif}
begin

{$ifdef UNICODE}
  // If any character is outside 7-bit ASCII, emit the whole string as
  // UTF-16BE, prefixed with the byte order mark (FE FF).
  for liIndex := 1 to Length(aString) do
    if (Ord(aString[liIndex]) > 127) then
    begin
      Buff1 := TEncoding.BigEndianUnicode.GetPreamble;
      Buff2 := TEncoding.BigEndianUnicode.GetBytes(aString);
      // Two hex digits per byte.
      SetLength(Result, (Length(Buff1) + Length(Buff2)) * 2);
      BinToHex(Buff1, PChar(@Result[1]), Length(Buff1));
      BinToHex(Buff2, PChar(@Result[1 + Length(Buff1) * 2]), Length(Buff2));
      Exit;
    end;
{$endif}
  // ASCII-only input (or a non-Unicode compiler): keep the original output.
  Result := '';
  for liIndex := 1 to Length(aString) do
    Result := LowerCase(Result + IntToHex(Ord(AnsiString(aString)[liIndex]), 2));

end;
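
The effect of the suggested change can be checked outside ReportBuilder with a small console program. This is only a sketch of the same idea, not the patched unit: the function name InfoStringToHex is hypothetical, it uses IntToHex per byte instead of BinToHex, and it assumes a Unicode Delphi (2009 or later) with 1-based strings.

```pascal
program Utf16HexDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical stand-alone helper, not part of ReportBuilder.
// ASCII-only input: one byte per character.
// Anything else: UTF-16BE bytes preceded by the FE FF byte order mark.
function InfoStringToHex(const aString: string): string;
var
  liIndex: Integer;
  lbUnicode: Boolean;
  lBytes: TBytes;
begin
  // Look for any character outside 7-bit ASCII
  lbUnicode := False;
  for liIndex := 1 to Length(aString) do
    if Ord(aString[liIndex]) > 127 then
    begin
      lbUnicode := True;
      Break;
    end;

  if lbUnicode then
  begin
    lBytes := TEncoding.BigEndianUnicode.GetBytes(aString);
    Result := 'feff';  // byte order mark, written as hex
  end
  else
  begin
    lBytes := TEncoding.ASCII.GetBytes(aString);
    Result := '';
  end;

  // Two lower-case hex digits per byte
  for liIndex := 0 to High(lBytes) do
    Result := Result + LowerCase(IntToHex(lBytes[liIndex], 2));
end;

begin
  Writeln(InfoStringToHex('Cenek'));                       // 43656e656b
  Writeln(InfoStringToHex(#$010C + 'en' + #$011B + 'k'));  // feff010c0065006e011b006b
end.
```

A PDF reader treats a text string that starts with the bytes FE FF as UTF-16BE, so the second value displays as Čeněk, while plain ASCII strings keep their compact one-byte form.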

However, this change does not solve the problem for encoded documents,
because the encoding procedures use a similar method. That code also has
to be corrected to avoid such problems, but I hope that at least this
small change will help many users from non-English-speaking countries.

--
Best regards,
Igor Gottwald
