m Mike
on

 

Hi,

After reading the CRT-DDS and the newly released CDISC Metadata Submission Guidelines (MSG), I'm curious about the CodeList and its overall representation. Here are some questions that I was hoping this forum could assist with.

  1. Extensible CT CodeList: For example, in LB, there will be test parameters that match the CT (e.g. Glucose) and those that will not. As such, the assumption is that the CT for LBTEST is an extensible list, and only the values actually used (both those that match the CT and those that don't) will be represented in the CodeList, rather than all possible values from the CT. Is that correct?
  2. Non-Extensible CT CodeList: Should all values from a non-extensible CT be included in the define.xml's CodeList? For example, if the NY CT is used, but the actual data only contains the value "Y", should "N", "U" and "NA" be included as well?

The MSG and CRT-DDS are a little vague on this, and interpretations can be misleading. I'd like to get some opinions on this, and also to ask whether the OpenCDISC development team would include such a validation cross-check.

Thank you in advance!

Forums: Define.xml

s Sergiy
on March 15, 2012

The define file and aCRFs describe and explain your data and data collection process. Terminology code lists should be used for the design of the data collection process and (maybe) during SDTM data conversion/mapping. A general rule is to include:

I. ONLY values which were used during data collection (e.g., only the actually collected Lab Tests, Units, etc., or the values presented on CRFs)
II. ALL values used during data collection (e.g., all options presented on the CRF, not only the values present in the collected data).

This is a very common issue in submission data. Sponsors and data vendors often create define.xml with code lists that represent the collected data rather than the data collection process. It is much easier to generate such incomplete code lists from the collected data with a simple programming technique (e.g., SAS Proc Freq, SQL Select Distinct, etc.) than to use the different sources of data collection metadata (EDC configurations, CRF design specs, etc.).

For example, the CRF for AE Severity may have Mild, Moderate, and Severe options, while only Mild and Moderate AEs were collected during the study. Some sponsors create define.xml with (Mild;Moderate) values. It's easy to do, but it's wrong. You should include everything presented on the CRF: (Mild;Moderate;Severe).

Regarding your two questions:

1. All LB tests that were collected or pre-specified by the protocol, and only those, should be included in your define.xml codelist.
2. The same as above. In your example, if Y is the only option on the CRF, then Y is the only value in your codelist. If the CRF includes Y and N options, then your codelist should also be (N, Y), regardless of the actually collected data (Y value only).
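
To make the AE Severity example concrete, here is a minimal Python sketch contrasting a data-driven codelist with a design-driven one. The `CRF_OPTIONS` and `COLLECTED` dictionaries and all function names are hypothetical stand-ins for whatever CRF design metadata and collected data a study actually has; this is an illustration of the idea, not any tool's implementation.

```python
# Hypothetical inputs for illustration; neither structure comes from a real EDC export.
CRF_OPTIONS = {"AESEV": ["Mild", "Moderate", "Severe"]}  # options shown on the CRF
COLLECTED = {"AESEV": ["Mild", "Moderate", "Mild"]}      # values found in the data


def codelist_from_data(variable):
    """Data-driven codelist: distinct collected values (the 'Select Distinct' shortcut)."""
    return sorted(set(COLLECTED[variable]))


def codelist_from_design(variable):
    """Design-driven codelist: every option offered on the CRF, in CRF order."""
    return list(CRF_OPTIONS[variable])


def missing_from_data_driven(variable):
    """CRF options that a data-driven codelist would silently drop."""
    observed = set(COLLECTED[variable])
    return [opt for opt in CRF_OPTIONS[variable] if opt not in observed]


print(codelist_from_data("AESEV"))        # ['Mild', 'Moderate']
print(codelist_from_design("AESEV"))      # ['Mild', 'Moderate', 'Severe']
print(missing_from_data_driven("AESEV"))  # ['Severe']
```

The last call shows the gap: "Severe" was offered on the CRF but never collected, so a codelist built only from the data would omit it.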

l Lex
on March 15, 2012

Excellent points you make. I believe the issues indeed arise when the define.xml is seen as an artifact that has to be created from the data. In a well-established end-to-end process, the define.xml would be the rendition of the metadata that has been used to drive the process.
Most of the metadata in the define.xml cannot be derived just from the data, but needs to be actively managed in a consistent way across studies and end-to-end.

 

Disclaimer: The opinions expressed above are my personal thoughts and may not reflect the opinions of my employer (SAS) or CDISC.

m Mike
on March 16, 2012

Sergiy/Lex,

I appreciate the feedback and this makes a lot of sense, albeit quite a cumbersome endeavour. Also, implementing this as a validation check within OpenCDISC would be impossible. I'm sure this question will come up often, especially as sponsors begin to use the newly released MSG. I also wonder how much more in-depth define.xml will become with the future release of ODM 1.3, whenever that happens.

Thank you!

s Sergiy
on March 16, 2012

Yes, the current define.xml standard has limitations which need to be fixed. One issue is that a CodeList is currently a variable-level entity, but it should be a value-level one. E.g., for each --TESTCD value you may have separate codelists for --STRESC, ----RESU, etc., or you may not have any codelists for particular --TESTCD values in the same domain. For example, some EG results (Interpretations) need controlled terminology (as specified in the SDTM IG). However, other EG results can be numeric (QT, PR intervals, Heart Rate, etc.), and those records should not have any pre-defined or descriptive code lists. Some numeric values could actually be categorical data (coded values, scales, etc.), and they need to be described by codelists as well. All those details are expected to be provided in the define.xml file. The current standard is not ready for this. The good news is that a new version is under development.
m Mike
on March 16, 2012

Sergiy,

Yes, I agree with the enumerated value-level specifications. They create a more refined and valid correlation on a micro-level, which one would assume is the primary purpose of the define to begin with.

Thanks

l Lex
on March 16, 2012

Even the current define.xml standard allows you to attach a CodeList to a ValueList item.
Both variable-level and value-level entities get their attributes and sub-elements from the XML ItemDef construct. 

s Sergiy
on March 16, 2012

Lex, it's not clear to me from the define.xml document. How can I create a codelist for VSSTRESU when VSTESTCD='TEMP'? E.g.:

When VSTESTCD='TEMP', the codelist for VSORRESU is 'F', 'C' and the codelist for VSSTRESU is 'C'.
When VSTESTCD='HEIGHT', the codelist for VSORRESU is 'cm', 'in' and the codelist for VSSTRESU is 'cm'.

Is there any good reference or guidance on this topic? Thanks
l Lex
on March 16, 2012

Ok, I see what you are saying.

For define 1.0 we do not have the metadata to explicitly define what you want.
We can attach valuelists to VSORRESU and VSSTRESU, but we would not have the metadata to tell us explicitly which item in those valuelists is associated with which VSTESTCD.
You can do it by convention, but not explicitly. Define 2.0 will allow you to do that.

I do not think there are any references or guidance besides the CRT-DDS spec and the Metadata Submission Guideline.
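
As an illustration of the value-level association being discussed, here is a minimal Python sketch of a lookup keyed by test code and variable, using the TEMP and HEIGHT units from the question above. It is not define.xml syntax; the dictionary layout and the `check_record` helper are hypothetical and only show what "a codelist per VSTESTCD value" would mean when checking a record.

```python
# Value-level codelists keyed by (VSTESTCD, variable).
# The unit values are the ones quoted in this thread; the structure is an assumption.
VALUE_LEVEL_CODELISTS = {
    ("TEMP", "VSORRESU"): {"F", "C"},
    ("TEMP", "VSSTRESU"): {"C"},
    ("HEIGHT", "VSORRESU"): {"cm", "in"},
    ("HEIGHT", "VSSTRESU"): {"cm"},
}


def check_record(record):
    """Return (variable, value) pairs that fall outside the codelist for this test.

    Tests with no value-level codelist defined are not checked at all.
    """
    problems = []
    for variable in ("VSORRESU", "VSSTRESU"):
        allowed = VALUE_LEVEL_CODELISTS.get((record["VSTESTCD"], variable))
        if allowed is not None and record.get(variable) not in allowed:
            problems.append((variable, record.get(variable)))
    return problems


ok = {"VSTESTCD": "TEMP", "VSORRESU": "F", "VSSTRESU": "C"}
bad = {"VSTESTCD": "HEIGHT", "VSORRESU": "cm", "VSSTRESU": "in"}
print(check_record(ok))   # []
print(check_record(bad))  # [('VSSTRESU', 'in')]
```

Keying on the pair rather than on the variable alone is what a single variable-level CodeList cannot express: 'in' is acceptable for VSORRESU on HEIGHT but not for VSSTRESU.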

a Anthony
on March 17, 2012

Further, what about when I have both core and local labs? For the core lab, I will have a consistent LBORRESU for each LBCAT, LBTESTCD pair, so I will have no problem expressing that in the define.

That is not so true for local labs for the same LBCAT, LBTESTCD pair. Yes, that's the reason to derive LBSTRESC, LBSTRESN, and LBSTRESU. But there are no codelists for LBORRESU, because they are not meant to be controlled.

s Sergiy
on March 19, 2012

Good point!

Do you think that the check CT0050 "Value for --ORRESU not found in UNIT controlled terminology codelist" should be removed?

m Mike
on March 19, 2012

All,

This would be part of the UNIT CT, which is extensible in any case. Having it present should not harm the validity of the data; it just needs an explanation justifying its presence.

 

s Sergiy
on March 19, 2012

Anthony's point is that it does not make much scientific sense to provide codelists for non-standardized data. Why bother with something like this? A codelist of all the local lab units used will not add any value; it creates noise in the data and reduces the overall quality of the define.xml content.

d Daniel
on March 22, 2012

I believe it should be removed. Others have stated it here already: for local labs, there's a lot of noise.
