Vitalij
on

 

Hi P21 team,

 

Could you please clarify why the ERROR message named in the subject line has been added? I did not find anything about it in the CDISC Define-XML Specification.

 

Thanks,

Vitalij

Forums: Define.xml

Jozef
on January 17, 2020

OD0010 (Missing XML declaration) is a complete OVERINTERPRETATION of both the Define-XML standard and the W3C XML standard itself.
There is no such rule in the XML world.

Of course having the XML declaration is a good habit, but it is not absolutely necessary, i.e. it is optional.
It is nothing more than an INDICATION for the parser (the software that processes the XML), that what is following might be XML, with the "encoding" giving an indication about what encoding might have been used for the characters (UTF-8 is the default anyway).
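The point that the declaration is optional can be checked with any conforming parser. Here is a minimal sketch using Python's standard `xml.etree.ElementTree`; the tiny `ODM`/`Study` document is only an illustrative stand-in, not a real define.xml.

```python
import xml.etree.ElementTree as ET

# Illustrative document WITHOUT an XML declaration (not a real define.xml).
without_declaration = b"<ODM><Study OID='S1'/></ODM>"

# The same document WITH the declaration prepended.
with_declaration = b'<?xml version="1.0" encoding="UTF-8"?>' + without_declaration

# Both parse without error: the declaration is optional for well-formedness.
root_a = ET.fromstring(without_declaration)
root_b = ET.fromstring(with_declaration)

print(root_a.tag, root_b.tag)
```

Whether a given validator should still require the declaration is a separate policy question; this only shows what an XML parser accepts.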

 

See e.g. https://stackoverflow.com/questions/7007427/does-a-valid-xml-file-require-an-xml-declaration/7007781

As the article explains, the confusion may come from a misinterpretation of the word "should" in the English language. "Should" means an expectation in the standards (W3C) world, not an obligation.

I have been teaching XML for 25 years now (I have been a professor in medical informatics), and had hoped that such basic knowledge was generally present among software and standards developers ...

Sergiy
on January 20, 2020

Hi Vitalij, 

Please see CDISC Define-XML v2.0 documentation on page 51:

"5.3. Define-XML Specification Details

5.3.1. XML Header
All XML files must begin with an XML header, so the first line of a define.xml file must be an XML header. The XML header indicates to applications that the remainder of the file is XML and specifies character encoding it uses.
5.3.1.1. Example XML Header

<?xml version="1.0" encoding="UTF-8"?>


This example shows a define.xml using the "UTF-8" character encoding."

Note that the two most common reasons for the OD0010 validation message are:

1. Missing XML declaration (header)

2. Invisible characters in front of it

Kind Regards, 

Sergiy

P.S. Note that the OD0010 rule is a Reject issue for PMDA submissions.
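The two reasons listed above can be told apart by looking at the first bytes of the file. The following is a rough sketch, not the logic the P21 validator actually uses; `classify_start` is a hypothetical helper, and it assumes a UTF-8 (or ASCII-compatible) file.

```python
import codecs

# The declaration starts with "<?xml" followed by whitespace; a plain space is
# assumed here, which also avoids matching "<?xml-stylesheet".
XML_DECL = b"<?xml "

def classify_start(data: bytes) -> str:
    """Describe what, if anything, precedes the XML declaration."""
    if data.startswith(XML_DECL):
        return "declaration at byte 0"
    if data.startswith(codecs.BOM_UTF8 + XML_DECL):
        return "UTF-8 BOM before declaration"
    pos = data.find(XML_DECL)
    if pos == -1:
        return "no declaration"
    return f"{pos} unexpected byte(s) before declaration: {data[:pos]!r}"

print(classify_start(b'\r\n<?xml version="1.0" encoding="UTF-8"?><ODM/>'))
```

For a real file, the bytes could be read with `pathlib.Path("define.xml").read_bytes()` and passed to the same function.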

Lex
on January 20, 2020

I think, technically, and formally looking at the XML specification, Jozef is right. Having said that, I don’t see a good reason to omit the XML declaration. Maybe that is why we required it in the Define-XML specifications 2.0 and also still in 2.1. Or maybe we did not realize that, formally, it is not required.

Anyway, I like the following article, and completely agree with it: https://www.ibm.com/developerworks/library/x-tipdecl/index.html

This is just my personal view.

Best, Lex

Jozef
on January 21, 2020

Thanks Lex! And indeed, a very good article!

So my proposal is to make it a "warning" - it should not cause panic or stop people/machines from parsing the define.xml file.

Invisible characters before the XML declaration: XML is all about characters, but one can indeed have e.g. a carriage return before the XML declaration, which makes the XML "not well-formed". This can however easily be detected by opening the define.xml in an XML editor (not a bad idea ...) and checking that the XML declaration starts on the first line, at the very first column.
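The carriage-return case is easy to reproduce. A small sketch with Python's standard parser (the `is_well_formed` helper and the one-element document are made up for illustration):

```python
import xml.etree.ElementTree as ET

declaration = b'<?xml version="1.0" encoding="UTF-8"?>'
body = b"<ODM/>"  # illustrative one-element document

def is_well_formed(data: bytes) -> bool:
    """Return True if the parser accepts the bytes, False on a parse error."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

# Declaration at the very start: accepted.
clean = is_well_formed(declaration + body)

# A single carriage return before the declaration: rejected.
leading_cr = is_well_formed(b"\r" + declaration + body)

print(clean, leading_cr)
```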

The most serious problem with XML files I have seen in the last 30 years (yes, that's my experience with XML) is that people set encoding="UTF-8" in the header while the real encoding of the characters is a different one (e.g. Latin). My experience is that this often happens when people start from an Excel or Word or similar file.
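A mismatch between the declared and the real encoding can often be caught by simply trying to decode the bytes as UTF-8. This is a minimal sketch with a hypothetical helper `matches_declared_utf8` and a made-up German label; it assumes the real encoding is Latin-1, which is only one of many possibilities.

```python
def matches_declared_utf8(data: bytes) -> bool:
    """Return True if the bytes are valid UTF-8, as the header claims."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

header = b'<?xml version="1.0" encoding="UTF-8"?>'
# Made-up content with non-ASCII characters (o-umlaut and sharp s).
text = "<Description>K\u00f6rpergr\u00f6\u00dfe</Description>"

good = header + text.encode("utf-8")    # header and bytes agree
bad = header + text.encode("latin-1")   # header says UTF-8, bytes are Latin-1

print(matches_declared_utf8(good), matches_declared_utf8(bad))
```

Note the limits of this check: a file that contains only ASCII characters decodes fine either way, and some byte sequences from other encodings happen to be valid UTF-8, so a pass here is not proof that the declaration is right.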

Jennifer
on February 17, 2020

I am getting this message when there is a proper XML declaration that is the first entry in the file and with no leading characters.

<?xml version="1.0" encoding="UTF-8"?>

 

I also ran the same define.xml through v2.2.0 of the define.xml validator and didn't get this message.

thoughts?

 

Jozef
on February 17, 2020

Hi Jennifer,

As always, a good way to find out whether something is really wrong in your define.xml, or whether it is a bug in the software, is to open the define.xml in an XML editor. Every XML editor that I know has a button "check whether valid XML" or "check whether well-formed". Additionally, you can of course always do an XML Schema validation.
If there are any invalid (even hidden) characters, the XML editor will find them and show you where they are.

Jennifer
on February 17, 2020

I was just using MS XML Notepad, no errors detected, and nothing I could see in simple Notepad either.  Out of frustration, I ended up deleting the text and retyping it back in, and now no error from Pinnacle, so I guess there was something "hidden" that the simple editors I used don't display.  Thanks very much for your reply.

 

Jozef
on February 17, 2020

Ok - solved! But of course curious people also want to know why ... ;-)
Does MS XML Notepad still exist? I used it many many years ago ...
If you need some information about good XML editors (at an affordable price), just drop me a mail.
But one can also check XML validity programmatically (Java, C#, SAS, ...). For the latter, Lex can probably tell you more...
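As one possible illustration of a programmatic check (in Python here rather than the languages named above), the sketch below tests well-formedness only and reports where the parser stopped. `check_well_formed` is a hypothetical helper; it does not perform XML Schema validation.

```python
import xml.etree.ElementTree as ET

def check_well_formed(data: bytes):
    """Return (True, None) if well-formed, else (False, description with position)."""
    try:
        ET.fromstring(data)
        return True, None
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        return False, f"line {line}, column {column}: {err}"

# Illustrative input with a mismatched tag.
ok, message = check_well_formed(b"<a><b></a>")
print(ok, message)
```

To run it on an actual file, something like `check_well_formed(pathlib.Path("define.xml").read_bytes())` should work, reading bytes so that the parser sees exactly what is on disk.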

Sergiy
on February 17, 2020

Hi Jennifer, 

This OD0010 problem is a new issue that was found and fixed recently. I am not a Java programmer, so here is my best interpretation:

Most likely, your Define.xml file contains a BOM (Byte Order Mark) character immediately before the XML header. You cannot see this character in an XML or text editor. You need to use a hex editor instead. This is an uncommon but valid scenario, and the BOM is being incorrectly flagged as illegal due to a bug in the Java Stream implementation (https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4508058).
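If no hex editor is at hand, the first bytes can be dumped with a few lines of code. This is a small sketch with a hypothetical helper `leading_bytes_hex`; a UTF-8 BOM shows up as the bytes EF BB BF in front of the `<?xml` characters.

```python
import codecs

def leading_bytes_hex(data: bytes, count: int = 8) -> str:
    """Return the first `count` bytes as space-separated hex (Python 3.8+)."""
    return data[:count].hex(" ")

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)

# Illustrative file content: BOM followed by the XML declaration.
sample = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?>'

print(leading_bytes_hex(sample))  # ef bb bf 3c 3f 78 6d 6c
print(has_utf8_bom(sample))
```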

The P21 validation engine is Java-based and inherits this old Java bug. We had a special procedure to work around it. Recently, when we changed the application packaging process, one of those workarounds, related to the Java libraries used to open and parse XML files, was missed.

A bug fix will be available in the next patch or major release, whichever comes first.

Meanwhile, you can either explain this OD0010 validation message as a bug or remove the BOM character:

  1. open your Define.xml in Notepad or other text editor
  2. select all text and copy it
  3. open a new document in Notepad
  4. paste copied text and save document
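As an alternative to the copy-and-paste steps above, the BOM can be stripped at the byte level. This is a sketch with a hypothetical helper `strip_utf8_bom`; it only handles the UTF-8 BOM and leaves all other bytes untouched.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# Illustrative content: BOM followed by a minimal document.
with_bom = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><ODM/>'
cleaned = strip_utf8_bom(with_bom)

print(cleaned[:5])
```

Applied to a file, this could look like `path.write_bytes(strip_utf8_bom(path.read_bytes()))` with `path = pathlib.Path("define.xml")`; keeping a backup copy first seems prudent.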

Sorry for the inconvenience.

Kind regards, 
Sergiy

 

Jozef
on February 18, 2020

As a Java and XML specialist with over 25 years of experience in both, I must disagree with Sergiy's statement that an (always hidden) BOM character cannot be detected by an XML editor. See e.g. https://www.oxygenxml.com/doc/versions/22.0/ug-editor/topics/preferences-encoding.html.

Also be aware that the bug in Java has been resolved and closed for many years. The link that Sergiy provides states "This bug is not available". I don't know exactly which Java libraries or version the validator uses, but the one blamed as the cause of OD0010 is surely outdated. For parsing in Java, I would strongly recommend Saxon-HE or higher.

Furthermore, I would STRONGLY discourage people from using MS Notepad for editing XML files, as my long experience is that it easily introduces BOM or encoding errors. Use a real XML editor or at least Notepad++. An XML editor such as oXygen, Liquid XML Studio or any other is always the better choice.
Here is a good overview: https://en.wikipedia.org/wiki/Comparison_of_XML_editors.
