nalin
on

 

Hello.

I wanted to start by saying that OpenCDISC is a great tool and does most of the heavy lifting for both SDTM validation and creation of define.xml... Great effort.

Having said that, I have the following questions/comments on the define.xml file:

1. For variables whose controlled terminology (e.g. country) comes from the IG, shouldn't the codelist name appear in the 'Controlled Terminology' column, hyperlinked to the 'Controlled Terminology' section of the define.xml, as it does for sponsor-defined CT? I see that the data is checked against the standard CT (e.g. Country), but the 'Controlled Terminology' column of the variable-level metadata is empty.

2. Shouldn't the keys column be populated in the dataset-level metadata? How can this be done?

3. I believe the 'Description' column of the dataset-level metadata section pulls the dataset labels from the SDTM IG. I was wondering whether it would make sense to pull this information from the dataset metadata instead, and to have a check confirming that the values provided match the IG. I am reviewing some SDTM datasets provided by a vendor, and they don't use the same dataset labels as the IG, so there is a disconnect between the define.xml and the actual data. A few more advantages of using the actual dataset metadata:

    a. Currently, custom/sponsor-defined domains don't have any label in the define.xml.

    b. SUPPXX datasets don't have any labels in the define.xml.
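
To illustrate the kind of check described in point 3, here is a minimal sketch in Python. It is only an illustration, not how OpenCDISC works: the function name `check_dataset_labels`, the `IG_LABELS` table, and the vendor labels are all hypothetical, and a real check would read the labels from the XPT files and the chosen configuration.

```python
# Hypothetical reference labels; a real check would load these from the
# chosen configuration rather than hard-coding them.
IG_LABELS = {
    "DM": "Demographics",
    "AE": "Adverse Events",
}


def check_dataset_labels(actual_labels, reference_labels):
    """Return (dataset, issue) tuples for labels that differ or have no reference."""
    issues = []
    for name, label in sorted(actual_labels.items()):
        expected = reference_labels.get(name)
        if expected is None:
            # Custom domains and SUPPxx datasets have no reference label,
            # so the dataset's own label is the only source available.
            issues.append((name, "no reference label; dataset label is %r" % label))
        elif label.strip() != expected:
            issues.append((name, "label %r differs from reference %r" % (label, expected)))
    return issues


# Hypothetical labels as they might be read from vendor-supplied datasets.
vendor_labels = {
    "DM": "Demographics",
    "AE": "Adverse Event",
    "SUPPAE": "Supplemental Qualifiers for AE",
}

for name, issue in check_dataset_labels(vendor_labels, IG_LABELS):
    print(name, "-", issue)
```

With this approach the define.xml could carry the actual dataset label while mismatches against the IG are still reported.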

Thanks in advance.

Nalin

Forums: General Discussion

Tim
on September 8, 2010

Hi Nalin,

 

First off, thank you for using the OpenCDISC Validator. We're very happy that you've found it useful, and we hope to continue improving it with each release.

 

Now, let me see if I can address some of your points:

  1. Generally, it's expected that sponsors define codelists that are specific to their data. Therefore, the define.xml generator will not automatically create any codelists for you. However, if you fill out codelists.xls and include it during the generation, codelists will be generated for you. Those codelists will be accessible via hyperlink when using the CDISC-provided define.xml stylesheet. I'm sure there's room for improvement here though, so we'll look into how to improve this process in the future.
  2. At the moment you'll need to edit the XML manually to add the value for DomainKeys. There's a spot for it, but this information is not currently collected when generating define.xml, so the generator doesn't know what to fill in.
  3. All of the metadata is pulled from the configuration that you choose as part of the generation, which is based on the SDTM IG. We try to avoid taking metadata from the datasets themselves as much as possible, because that allows us to validate the dataset metadata against define.xml more accurately. When the metadata comes from a verified, independent source, it's possible to see where the dataset metadata is incorrect. If we took the metadata from the datasets themselves, we would not be able to identify discrepancies between the data files and define.xml, since any mistakes would automatically carry over.

    As you've noted, this isn't always a perfect system from the generation side, since it requires manual editing of the define.xml file. However, this independent entry provides the benefits listed above, so we feel that it's a suitable system. Admittedly the SUPPQUALs could be handled better though (since we do know the label data for those variables), and that's something we're working on improving.
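
For the manual DomainKeys edit mentioned in point 2, a small script can save some hand-editing. The following is a rough sketch rather than a supported feature: it assumes the keys are stored in a `def:DomainKeys` attribute on each `ItemGroupDef`, that the file uses the ODM 1.2 and define.xml 1.0 namespace URIs shown, and the helper name `set_domain_keys`, the sample document, and the key list are all made up for illustration. Please check the namespaces against your own file before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs (ODM 1.2 / define.xml 1.0); verify against your file.
ODM_NS = "http://www.cdisc.org/ns/odm/v1.2"
DEF_NS = "http://www.cdisc.org/ns/def/v1.0"
ET.register_namespace("", ODM_NS)
ET.register_namespace("def", DEF_NS)


def set_domain_keys(root, keys_by_dataset):
    """Set def:DomainKeys on every ItemGroupDef whose Name is in keys_by_dataset."""
    updated = []
    for group in root.iter("{%s}ItemGroupDef" % ODM_NS):
        name = group.get("Name")
        if name in keys_by_dataset:
            group.set("{%s}DomainKeys" % DEF_NS, ", ".join(keys_by_dataset[name]))
            updated.append(name)
    return updated


# Hypothetical, heavily trimmed define.xml fragment used only for this example.
SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.2"
     xmlns:def="http://www.cdisc.org/ns/def/v1.0">
  <Study OID="S1"><MetaDataVersion OID="MDV1" Name="Example">
    <ItemGroupDef OID="DM" Name="DM" Repeating="No" def:Label="Demographics"/>
  </MetaDataVersion></Study>
</ODM>"""

root = ET.fromstring(SAMPLE)
updated = set_domain_keys(root, {"DM": ["STUDYID", "USUBJID"]})
print("Updated datasets:", updated)
print(ET.tostring(root, encoding="unicode"))
```

For a real file you would load it with `ET.parse(path)`, call the helper on `tree.getroot()`, and write the result back with `tree.write(...)`, keeping a backup of the original.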

Regards,

Tim

Anthony
on September 30, 2010

Tim,

 

For generating define.xml from XPT, would you kindly also consider allowing users to reuse the dataset label stored within the dataset files, in addition to the current method of (I think) drawing the information from config-sdtm-3.1.2.xml? The former could be the default behavior. (Side note: perhaps the same goes for variable labels, variable order, etc.)

 

Yes, it does seem really useful to have a central control file for entering the metadata necessary to create a fully qualified define.xml, e.g. the relationships between variables, value-level metadata, and codelists; ordering; external dictionaries; etc.

 

Regards,

Anthony
