Unicode Pipeline

rbuser · July 2013

Subject : Unicode Pipeline

The text file that the Pipeline reads contains a Unicode string.

The string is "-1", meaning minus one.
In hex: 2D 00 31 00 (UTF-16 little-endian)

I have the code

iNum := StrToIntDef(ppPipeline.Fields[0].AsString, 0);

However, the field's content is "" and not "-1".

The Help says the TppTextPipeline can "access data from an ASCII text
file".

How do I get it to access a Unicode text file?

Regards,
Peter Evans

nardmoseley · July 2013

You can use the TextPipeline.Encoding property to assign an encoding.

Example:

uses
  SysUtils; // TEncoding is declared in SysUtils

// TEncoding.Unicode is UTF-16 little-endian
myTextPipeline.Encoding := TEncoding.Unicode;

When no TextPipeline.Encoding is specified, the TextPipeline tries to
auto-detect the encoding by examining the first several bytes of the file -
sometimes referred to as the 'preamble' or the BOM, Byte Order Mark.
(https://en.wikipedia.org/wiki/Byte_order_mark).
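
To illustrate what BOM-based auto-detection looks like, here is a minimal sketch that uses the RTL's TEncoding.GetBufferEncoding. It is not the TextPipeline's actual implementation; the program name and the DetectFileEncoding helper are hypothetical names chosen for this example.

```pascal
program DetectBom;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: returns the encoding implied by the file's BOM,
// or TEncoding.Default (the Windows ANSI code page) when no BOM is found.
function DetectFileEncoding(const FileName: string): TEncoding;
var
  Stream: TFileStream;
  Buffer: TBytes;
  Count: Integer;
  Encoding: TEncoding;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    // A BOM is only a few bytes long, so a small buffer is enough.
    SetLength(Buffer, 4);
    Count := Stream.Read(Buffer[0], Length(Buffer));
    SetLength(Buffer, Count);
  finally
    Stream.Free;
  end;
  // Passing nil asks GetBufferEncoding to detect the encoding from the BOM.
  Encoding := nil;
  TEncoding.GetBufferEncoding(Buffer, Encoding);
  Result := Encoding;
end;

begin
  if ParamCount > 0 then
    Writeln(DetectFileEncoding(ParamStr(1)).ClassName);
end.
```

Run against a UTF-16 file that has no BOM, a check like this falls back to the ANSI default, which is why an explicit Encoding assignment is needed for such files.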

--

Nard Moseley
Digital Metaphors
www.digital-metaphors.com

rbuser · July 2013

On 18/07/2013 12:35 AM, Nard Moseley (Digital Metaphors) wrote:

Thanks for that advice.
I found I had to set the TextPipeline.Encoding and write out the BOM.

It did not work just setting the TextPipeline.Encoding alone.
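
For anyone needing the same workaround, writing the BOM can be sketched as below. This assumes the file is produced by your own Delphi code; the program name and the WriteUtf16WithBom helper are hypothetical and not part of ReportBuilder.

```pascal
program WriteBomFile;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: writes Text as UTF-16 little-endian, preceded by the BOM.
procedure WriteUtf16WithBom(const FileName, Text: string);
var
  Stream: TFileStream;
  Preamble, Data: TBytes;
begin
  Preamble := TEncoding.Unicode.GetPreamble; // FF FE for UTF-16 little-endian
  Data := TEncoding.Unicode.GetBytes(Text);
  Stream := TFileStream.Create(FileName, fmCreate);
  try
    if Length(Preamble) > 0 then
      Stream.WriteBuffer(Preamble[0], Length(Preamble));
    if Length(Data) > 0 then
      Stream.WriteBuffer(Data[0], Length(Data));
  finally
    Stream.Free;
  end;
end;

begin
  // The resulting file holds the bytes FF FE 2D 00 31 00.
  WriteUtf16WithBom('data.txt', '-1');
end.
```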

My suggestions are that you:
1) update the documentation, as it currently states "access data from an
ASCII text file". That is wrong. It also turns away potential customers.
2) make Encoding a published property. I had to manually change many
pipelines throughout my code.
3) make Unicode the default setting. There is no excuse for not doing so in
this day and age.

Regards,
Peter Evans

nardmoseley · July 2013

Thanks for the feedback.

When TextPipeline.Encoding is specified, a BOM should not be required. This
will be fixed for the next maintenance release.

We will also update the documentation. I notice the Encoding property is not
documented.

We are following the Unicode VCL standard, which I believe is well thought
out and, based on customer feedback, has proven to work well. The Delphi
help topic 'Using TEncoding for Unicode Files' explains how it works.
Basically, when a BOM is present it is used; otherwise the TEncoding.Default
encoding is used, which is the Windows ANSI code page.

This works well for handling ANSI text data, which is quite common and for
which there is no such thing as a BOM. Unicode, on the other hand, has
several encodings (UTF-16, UTF-8, etc.). UTF-8, for example, is used
frequently: internet files, OS X, iOS. When Unicode text is written to a
file, a BOM should be used.

On my Windows 7 machine, when I use Notepad to create text files, the
default is ANSI with no BOM. In the Save dialog I can specify an encoding,
in which case Notepad writes a BOM. When Notepad opens a text file, it looks
for a BOM to determine how the text is encoded; otherwise it assumes ANSI.

--
Nard Moseley
Digital Metaphors
www.digital-metaphors.com

rbuser · July 2013

On 18/07/2013 3:09 PM, Peter Evans wrote:

I found the above worked for some basic tests.
Since then I have tested for other situations.

I have had to make further changes. (For non-Unicode text I could write a
one-line text string to a Pipeline, with no CR LF characters at the end.
That approach worked.)

The changes I have now had to make are:
1) also write out the CR LF at the end of the last line.
2) every time I read ppTextPipeline1.Fields[0].AsString, or similar,
I have to strip off any trailing CR LF or other trailing characters.
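
The stripping in step 2 can be wrapped in a small helper, sketched below. FieldToInt and the program name are hypothetical; in real code the raw value would come from ppTextPipeline1.Fields[0].AsString.

```pascal
program TrimField;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: converts a raw field value to an integer after
// removing trailing whitespace and control characters such as CR and LF.
function FieldToInt(const Raw: string; ADefault: Integer): Integer;
begin
  // TrimRight strips every trailing character up to and including a space.
  Result := StrToIntDef(TrimRight(Raw), ADefault);
end;

begin
  // Stand-in for a field value that arrives with a trailing CR LF.
  Writeln(FieldToInt('-1'#13#10, 0));
end.
```

Keeping the trim in one place avoids repeating it at every call site that reads a field.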

Regards,
Peter Evans

nardmoseley · July 2013

If you would like to create a simple example that I can build and run here
in the Delphi debugger, I can research it. Perhaps you can create a couple
of test files - one ANSI, one Unicode. Please send any examples to support@

--
Nard Moseley
Digital Metaphors
www.digital-metaphors.com
