Unicode Pipeline

edited July 2013 in General
Subject : Unicode Pipeline

The text file attached to the Pipeline contains a Unicode string.

The string is "-1", meaning minus one.
In hex: 2D 00 31 00 (UTF-16 LE)

I have the code

iNum := StrToIntDef(ppPipeline.Fields[0].AsString, 0);

However, the field content is "" and not "-1".

The Help says that TppTextPipeline can "access data from an ASCII text
file".

How do I get it to access a Unicode text file?

Regards,
Peter Evans

Comments

  • edited July 2013

    You can use the TextPipeline.Encoding property to assign an encoding.

    Example:

    uses
    SysUtils;

    myTextPipeline.Encoding := TEncoding.Unicode;

    When no TextPipeline.Encoding is specified, the TextPipeline tries to
    auto-detect the encoding by examining the first several bytes of the file -
    sometimes referred to as the 'preamble' or the BOM, Byte Order Mark.
    (https://en.wikipedia.org/wiki/Byte_order_mark).


    --

    Nard Moseley
    Digital Metaphors
    www.digital-metaphors.com

  • edited July 2013
    On 18/07/2013 12:35 AM, Nard Moseley (Digital Metaphors) wrote:


    Thanks for that advice.
    I found I had to set the TextPipeline.Encoding and write out the BOM.

    It did not work just setting the TextPipeline.Encoding alone.
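
    For anyone hitting the same issue, here is a minimal sketch of writing the
    data file with a BOM. It assumes a Unicode Delphi (2009 or later) where
    TEncoding is available in SysUtils. SaveUnicodeWithBom and the file name
    'data.txt' are hypothetical names used only for illustration; they are not
    part of ReportBuilder.

    ```pascal
    program WriteUnicodeWithBom;

    {$APPTYPE CONSOLE}

    uses
      SysUtils, Classes;

    // Hypothetical helper: writes AText as UTF-16 LE, preceded by the
    // byte order mark (FF FE) returned by TEncoding.Unicode.GetPreamble.
    procedure SaveUnicodeWithBom(const AFileName, AText: string);
    var
      Stream: TFileStream;
      Bytes: TBytes;
    begin
      Stream := TFileStream.Create(AFileName, fmCreate);
      try
        // Write the BOM first so readers can auto-detect the encoding.
        Bytes := TEncoding.Unicode.GetPreamble;
        if Length(Bytes) > 0 then
          Stream.WriteBuffer(Bytes[0], Length(Bytes));

        // Then write the text itself, encoded as UTF-16 LE.
        Bytes := TEncoding.Unicode.GetBytes(AText);
        if Length(Bytes) > 0 then
          Stream.WriteBuffer(Bytes[0], Length(Bytes));
      finally
        Stream.Free;
      end;
    end;

    begin
      // 'data.txt' is a placeholder file name; the line ends with CR LF.
      SaveUnicodeWithBom('data.txt', '-1' + sLineBreak);
    end.
    ```

    The text pipeline would then be pointed at that file, with Encoding set to
    TEncoding.Unicode as suggested above.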

    My suggestions are that you:
    1) update the documentation as currently it states "access data from an
    ASCII text file". That is wrong. It also turns away potential customers.
    2) make Encoding a published Property. I had to manually change many
    pipelines throughout my code.
    3) make Unicode the default setting. No excuses for not doing so in this
    day and age.

    Regards,
    Peter Evans
  • edited July 2013
    Thanks for the feedback.

    When TextPipeline.Encoding is specified, a BOM should not be required. This
    will be fixed for the next maintenance release.

    We will also update the documentation. I notice the Encoding property is not
    documented.

    We are following the Unicode VCL standard, which I believe is well thought
    out and has proven to work well based on customer feedback. The Delphi help
    topic 'Using TEncoding for Unicode Files' explains how it works. Basically,
    when a BOM is present, it is used; otherwise the TEncoding.Default encoding
    is used, which is the Windows ANSI code page. This works well for ANSI text
    data, which is quite common and for which no BOM exists. Unicode, on the
    other hand, has several encodings: UTF-16, UTF-8, etc. UTF-8, for example,
    is used frequently (internet files, OSX, iOS). When Unicode text is written
    to a file, a BOM should be used. On my Windows 7 machine, when I use Notepad
    to create text files, the default is ANSI with no BOM. In the Save dialog I
    can specify an encoding, in which case Notepad writes a BOM. When Notepad
    opens a text file, it looks for a BOM to determine how the text is encoded;
    otherwise it assumes ANSI.
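
    The "BOM if present, otherwise TEncoding.Default" rule can be sketched with
    the RTL's TEncoding.GetBufferEncoding. This is only an illustration of the
    VCL behavior described above, not the TextPipeline's actual source;
    DetectFileEncoding and 'data.txt' are hypothetical names.

    ```pascal
    program DetectEncoding;

    {$APPTYPE CONSOLE}

    uses
      SysUtils, Classes;

    // Hypothetical helper: reads the first bytes of a file and lets the RTL
    // pick an encoding from the BOM, or TEncoding.Default when there is none.
    function DetectFileEncoding(const AFileName: string): TEncoding;
    var
      Stream: TFileStream;
      Buffer: TBytes;
      Count: Integer;
      Detected: TEncoding;
    begin
      Stream := TFileStream.Create(AFileName, fmOpenRead or fmShareDenyWrite);
      try
        SetLength(Buffer, 4);
        Count := Stream.Read(Buffer[0], Length(Buffer));
        SetLength(Buffer, Count);
      finally
        Stream.Free;
      end;

      // Passing nil asks GetBufferEncoding to detect the encoding itself.
      Detected := nil;
      TEncoding.GetBufferEncoding(Buffer, Detected);
      Result := Detected;
    end;

    var
      Enc: TEncoding;
    begin
      // 'data.txt' is a placeholder file name.
      Enc := DetectFileEncoding('data.txt');
      if Enc = TEncoding.Unicode then
        Writeln('UTF-16 LE BOM found')
      else if Enc = TEncoding.UTF8 then
        Writeln('UTF-8 BOM found')
      else
        Writeln('No recognized BOM, or another encoding')
    end.
    ```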


    -
    Nard Moseley
    Digital Metaphors
    www.digital-metaphors.com

  • edited July 2013
    On 18/07/2013 3:09 PM, Peter Evans wrote:


    I found the above worked for some basic tests.
    Since then I have tested for other situations.

    I have had to make further changes. (For non-Unicode I could write a
    one-line text string to a Pipeline, with no CR LF characters at the end.
    That approach worked.)

    The changes I have now had to make are:
    1) also write out the CR LF at the end of the last line.
    2) every time I perform ppTextPipeline1.Fields[0].AsString, or similar,
    I have to strip off any trailing CR LF or whatever.
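
    The stripping in step 2 can be sketched as a small wrapper around
    SysUtils.TrimRight, which removes trailing spaces and control characters
    such as CR and LF. FieldToIntDef is a hypothetical helper name; in real
    code the string would come from ppTextPipeline1.Fields[0].AsString.

    ```pascal
    program TrimFieldValue;

    {$APPTYPE CONSOLE}

    uses
      SysUtils;

    // Hypothetical helper: strips trailing CR LF (and other trailing
    // whitespace or control characters) before converting to an integer.
    function FieldToIntDef(const AValue: string; ADefault: Integer): Integer;
    begin
      Result := StrToIntDef(TrimRight(AValue), ADefault);
    end;

    begin
      // Without trimming, the trailing CR LF makes the conversion fail
      // and the default is returned.
      Writeln(StrToIntDef('-1'#13#10, 0));   // prints 0
      Writeln(FieldToIntDef('-1'#13#10, 0)); // prints -1
    end.
    ```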

    Regards,
    Peter Evans
  • edited July 2013
    If you would like to create a simple example that I can build and run here
    in the Delphi debugger, I can research it. Perhaps you can create a couple
    of test files - one ANSI, one Unicode. Please send any examples to support@


    -
    Nard Moseley
    Digital Metaphors
    www.digital-metaphors.com

This discussion has been closed.