XML processing UTF-16 Character Encoding
XML processing UTF-16 Character Encoding 1 year, 1 month ago #2677

  • andyyoung
The XML files I need to process are encoded in the UTF-16 character set.
This is not changeable in the application that is creating them.

When trying to process these files with Advanced ETL Processor Pro, I get an error message:

"Invalid XML Element: malformed tag found (no valid name) at position 2"

This message is erroneous. If I retype *exactly* the same text in UTF-8 format then it works fine.

All my text is standard ASCII, with no complex characters. I do not need to deal with non-Western character sets. Is there any way I can deal with this within AETL Pro? Even a workaround would be great: if it can't deal directly with UTF-16, then a conversion to a single-byte Latin character set such as ISO-8859-1, or to UTF-7, would work fine for me.

Using ETL Pro v 4.2.4.13
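
One possible workaround outside AETL Pro is to re-encode the files before loading them. Below is a minimal Python sketch of that idea, not anything built into the product. The function name `utf16_xml_to_utf8` and the `default_codec` parameter are made up for illustration, and the fallback byte order for files without a byte order mark is an assumption that should be checked against a real file.

```python
import re

def utf16_xml_to_utf8(data: bytes, default_codec: str = "utf-16-le") -> bytes:
    """Re-encode UTF-16 XML bytes as UTF-8 and update the XML declaration."""
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):
        # BOM present: the "utf-16" codec reads the byte order from it and drops it.
        text = data.decode("utf-16")
    else:
        # No BOM: the byte order is an assumption supplied by the caller.
        text = data.decode(default_codec)
    # Rewrite encoding="UTF-16" in the declaration so it matches the new bytes.
    text = re.sub(r'(<\?xml[^>]*?encoding=["\'])[^"\']*', r"\g<1>UTF-8", text, count=1)
    return text.encode("utf-8")

# Typical use (file names are placeholders):
# with open("input.xml", "rb") as src, open("output.xml", "wb") as dst:
#     dst.write(utf16_xml_to_utf8(src.read()))
```

The declaration is rewritten because a parser that sees `encoding="UTF-16"` on UTF-8 bytes may reject the file.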

Re: XML processing UTF-16 Character Encoding 1 year, 1 month ago #2679

  • admin
Andy

Can you zip and post an example of the file here? We will do our best to help you.

Mike

Re: XML processing UTF-16 Character Encoding 1 year, 1 month ago #2680

  • andyyoung
Thank you for the fast response. An example file is zipped and attached.

Re: XML processing UTF-16 Character Encoding 1 year, 1 month ago #2682

  • admin
Andy

Our software currently supports UTF-8 encoding only.
We will add support for UTF-16 in the next release (it will take 1-2 weeks).

However, even once we add support for UTF-16, there is a problem with your file: the byte order mark is missing.
en.wikipedia.org/wiki/Byte_order_mark

If you have a look at the files here, they do have it:
www.w3schools.com/XML/xml_encoding.asp
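
To check a file for a byte order mark, it is enough to look at its first few bytes. A minimal Python sketch follows; the helper name `detect_bom` is illustrative only, and UTF-32 marks are deliberately ignored here, so a UTF-32 little-endian file would be reported as UTF-16.

```python
import codecs

# Byte order marks and the Python codec name that matches each one.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),      # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]

def detect_bom(data: bytes):
    """Return the codec implied by a leading BOM, or None if there is no BOM."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

# Typical use (file name is a placeholder):
# with open("input.xml", "rb") as f:
#     print(detect_bom(f.read(4)))
```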

Mike

Re: XML processing UTF-16 Character Encoding 1 year, 1 month ago #2683

  • andyyoung
Thanks, Mike. Reading further into that page, it looks like the identifier bytes at the start of UTF-format text are optional:

"The Unicode standard states, The UTF-16 encoding scheme may or may not begin with a BOM. However, when there is no BOM, and in the absence of a higher-level protocol, the byte order of the UTF-16 encoding scheme is big-endian."


The first byte is hex 3C, which is ASCII "<", so that makes me think it's correct XML.
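
When there is no byte order mark, the position of that 3C byte gives a hint about the byte order, provided the file really is UTF-16 and starts with "<": the bytes 3C 00 indicate little-endian, while 00 3C indicate big-endian. The XML 1.0 specification describes a similar autodetection approach in its appendix on character encodings. A rough Python sketch, with a made-up function name and looking only at the first two bytes:

```python
def guess_utf16_byte_order(data: bytes):
    """Guess the UTF-16 byte order of BOM-less XML from how "<" is encoded.

    Returns a Python codec name, or None if the first two bytes do not
    look like a UTF-16 "<" (for example single-byte ASCII/UTF-8 input).
    """
    head = data[:2]
    if head == b"<\x00":   # 3C 00: low byte first
        return "utf-16-le"
    if head == b"\x00<":   # 00 3C: high byte first
        return "utf-16-be"
    return None
```

By this check, a file whose very first byte is 3C followed by 00 would be little-endian rather than the big-endian default mentioned in the quote above.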

Thanks for looking into UTF-16 support. It's appreciated that you move so fast on user requests and problems.

Re: XML processing UTF-16 Character Encoding 1 year ago #2684

  • admin
We will investigate it as well.

Mike