Error and warning in validating define.xml with pinnacle 2.2

March 16, 2017

Hi,

When validating my define.xml with CT version 2016-03-25, I get the following error and warning, which I do not understand:

2 CodedValue, Code  ng/mL, C67306 DD0028  Term/NCI Code mismatch in Codelist 'PK Units of Measure' Terminology Warning

2 NCICode      C85494 DD0033  Unknown NCI Code value for Codelist 'PK Units of Measure' Terminology Error

 

<CodeList OID="CL.PKUNIT" Name="PK Units of Measure" DataType="text">
  <CodeListItem CodedValue="ng/mL">
    <Decode>
      <TranslatedText xml:lang="en">Nanogram per Milliliter </TranslatedText>
    </Decode>
    <Alias Name="C67306" Context="nci:ExtCodeID" />
  </CodeListItem>
  <Alias Name="C85494" Context="nci:ExtCodeID" />
</CodeList>


The PKUNIT codelist has NCI code C85494, and its term ng/mL has NCI code C67306.

What is the problem with my define code? It looks good to me.

 

Thanks,

Anja

 

 


Hi Anja,

There is a new RESTful web service for doing such tests based on CDISC SHARE content. RESTful webservices are meant for machine-machine communication, but can also be used in the browser. These web services are described at: http://xml4pharmaserver.com/WebServices/index.html
In your case, when you submit one of the following:
http://xml4pharmaserver.com:8080/CDISCCTService/rest/CodedValuesFromCodeListNCICode/C85494
or http://xml4pharmaserver.com:8080/CDISCCTService/rest/CodedValuesFromCodeListNCICode/C85494/{codelistversion}   where {codelistversion} is the version (a date)
or http://www.xml4pharmaserver.com:8080/CDISCCTService/rest/CodedValuesFromCodeListName/PKUNIT
or http://www.xml4pharmaserver.com:8080/CDISCCTService/rest/CodedValuesFromCodeListName/PKUNIT/{codelistversion}

you should get a complete list (as XML or JSON) of all the allowed values with their NCI codes for the given codelist. You (or your application) can then check whether a coded value is allowed in the given codelist.

You can of course also use this in your own computer applications (that is what RESTful web services are essentially meant for).
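
For anyone who wants to call the service from a script, here is a minimal Python sketch that builds the request URLs listed above. The helper names (`codelist_url`, `fetch_codelist`) are my own, and passing the CT version as a date string such as 2016-03-25 is an assumption. The layout of the XML/JSON response is not described in this thread, so the sketch returns the raw text and leaves parsing to the caller.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the links in this post.
BASE = "http://xml4pharmaserver.com:8080/CDISCCTService/rest"

# Resource names as they appear in the URLs above.
RESOURCES = {
    "nci": "CodedValuesFromCodeListNCICode",
    "name": "CodedValuesFromCodeListName",
}


def codelist_url(key, by="nci", version=None):
    """Build the lookup URL for a codelist.

    key     -- the codelist NCI code (by="nci") or codelist name (by="name")
    version -- optional CT version; the exact date format is an assumption
    """
    url = f"{BASE}/{RESOURCES[by]}/{quote(key, safe='')}"
    if version is not None:
        url += f"/{quote(version, safe='')}"
    return url


def fetch_codelist(key, by="nci", version=None, timeout=30):
    """Return the raw response body as text (hypothetical helper).

    No parsing is attempted because the response layout is not
    documented in this thread.
    """
    with urlopen(codelist_url(key, by, version), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


print(codelist_url("C85494"))
print(codelist_url("PKUNIT", by="name", version="2016-03-25"))
```

Only the URL construction runs without network access; `fetch_codelist` depends on the service being reachable.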

Hi Anja, 

I assume that your codelist is assigned to the PCSTRESU or PCORRESU variables. If this is true, then your validation message is our fault and may be considered a bug.

In the P21 specs, (PKUNIT) CT is assigned only to the PP domain, while the PC domain uses (UNIT) CT. NCI Code C67306 represents different terms in those two CT codelists:

  • (PKUNIT): “ng/mL”
  • (UNIT): “ug/L”

P21 validates CT against the “expected” standard terminology assigned to each standard variable, rather than being driven by the metadata in the define.xml file. For example, if you refer to (RACE) CT for the DTHFL variable, we ignore this and use (Y) CT (“Y” only) for validation.
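
To make the effect concrete, here is a small Python sketch of a variable-driven check. It is not P21's actual implementation. The lookup tables are illustrative and contain only the C67306 entries quoted in this thread, and the function name and message wording are made up for the example.

```python
# Illustrative terminology: only the two C67306 entries quoted in this thread.
TERMS = {
    "PKUNIT": {"C67306": "ng/mL"},
    "UNIT": {"C67306": "ug/L"},
}

# Hypothetical "expected" codelist per standard variable, reflecting the
# behaviour described above (PC uses UNIT, PP uses PKUNIT).
EXPECTED_CODELIST = {
    "PCORRESU": "UNIT",
    "PCSTRESU": "UNIT",
    "PPORRESU": "PKUNIT",
    "PPSTRESU": "PKUNIT",
}


def check_term(variable, coded_value, nci_code):
    """Check a term against the codelist expected for the variable.

    The codelist referenced in define.xml is deliberately ignored,
    which is the point of the illustration. Returns None when the
    term is consistent, otherwise a message string.
    """
    codelist = EXPECTED_CODELIST[variable]
    expected_value = TERMS[codelist].get(nci_code)
    if expected_value is None:
        return f"Unknown NCI Code {nci_code} for codelist ({codelist})"
    if expected_value != coded_value:
        return (
            f"Term/NCI Code mismatch in ({codelist}): "
            f"{nci_code} is '{expected_value}', not '{coded_value}'"
        )
    return None


# Same term and code, different outcome depending on the variable.
print(check_term("PCSTRESU", "ng/mL", "C67306"))
print(check_term("PPSTRESU", "ng/mL", "C67306"))
```

The same pair (ng/mL, C67306) passes for a PP variable and is flagged for a PC variable, whatever codelist the define.xml itself references.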

Historically, the original (PKUNIT) CT was designed only for the PP domain and did not work for the PC domain. Therefore, after multiple complaints from our users, we started using (UNIT) CT for the PC domain. However, those gaps in (PKUNIT) CT have recently been fixed, so we should use this CT for units in the PC domain.

There are 2 basic options for you to handle this case:

  • Explain this validation message as a P21 bug in the Reviewer’s Guide
  • Use (UNIT) CT for the PC domain. Be prepared for some changes soon

Sorry for the inconvenience.

Regards,

Sergiy

Hi Sergiy,

 

PCSTRESU and PCORRESU have the unit “ng/mL” in the raw data. (UNIT) CT does not have an NCI Code for “ng/mL”. So if we assign (UNIT) CT, do we have to leave the NCI Code blank?

Also, the second-to-last paragraph is a bit unclear to me. Does it mean that we could use (PKUNIT) CT for the PC domain?

 

Thank you.

Regards,
Wick

Hi Wick,

According to recent SDTM IG versions, PCORRESU and PCSTRESU should follow the CDISC CT (UNIT) codelist, while PPORRESU and PPSTRESU should be populated using the (PKUNIT) codelist.

"ug/L" is a standard term for units in the PC domain and should be used for PCORRESU and PCSTRESU instead of "ng/mL" when converting raw data into SDTM format.

Kind Regards,

Sergiy 

Sergiy,

 

Do we have to put the related CDISC Synonym from Controlled Terminology into the Decode Value section under Codelists?

For instance, "ng/mL" in this case might be the Decode Value for "ug/L".

Isn't PCORRESU supposed to contain the original units coming from the raw data, while we describe the standard terms in PCSTRESU?

 

Thanks,

Wick

This is the kind of discussion that makes me so sad ...

If your data was collected in ng/mL, I would also put it like that in --ORRESU: original is original! And by the way, according to my lists, "ng/mL" IS a valid unit from the CDISC codelist (C67306). Please also take into account that the UNIT codelist is extensible, so if your original unit (as collected) is not in the list, just add it as an extended value (def:ExtendedValue="Yes" in define.xml). So if I have collected something in units of "moles of cows", then I would put it in --ORRESU just like that, and add a line to the UNIT codelist like:
<EnumeratedItem CodedValue="moles of cows" def:ExtendedValue="Yes"/>

What I mean is, I personally would NEVER start doing conversions on --ORRES just for the sake of complying with the UNIT codelist. If you do such conversions, you are preprogramming errors, and traceability to the CRF is completely lost. Some of the SDTM team members will, however, disagree with me. For --STRES/--STRESU, of course try to use one of the units that is in the list, but if that is not possible, again, extend the codelist in the define.xml.

And don't let it disturb you if extending the UNIT codelist produces a warning in the validation software. In such a case, just document it in the RG as a false positive (bug). @wick_sran: if you want to use "Decode" for adding a synonym, you can indeed do so; there is no rule about that at all. Remember, however, that "Decode" was never meant for that: you can only have one "Decode" under "CodeListItem", but a coded unit may have several synonyms.
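
If the define.xml is generated by a program, the extended item can be written with the `def` prefix bound to the Define-XML 2.0 namespace. The following is a minimal Python sketch using the standard library. The helper name `add_extended_unit` and the OID "CL.UNIT" are assumptions for illustration, and the elements are left outside the ODM namespace to keep the example short.

```python
import xml.etree.ElementTree as ET

# Define-XML 2.0 namespace, bound to the conventional "def" prefix.
DEF_NS = "http://www.cdisc.org/ns/def/v2.0"
ET.register_namespace("def", DEF_NS)


def add_extended_unit(codelist, unit):
    """Append an EnumeratedItem flagged as a sponsor extension (hypothetical helper)."""
    item = ET.SubElement(codelist, "EnumeratedItem", {"CodedValue": unit})
    # ElementTree addresses namespaced attributes as "{namespace}localname".
    item.set(f"{{{DEF_NS}}}ExtendedValue", "Yes")
    return item


# Hypothetical codelist shell; OID and Name are illustrative only.
codelist = ET.Element(
    "CodeList", {"OID": "CL.UNIT", "Name": "Unit", "DataType": "text"}
)
add_extended_unit(codelist, "moles of cows")

xml_text = ET.tostring(codelist, encoding="unicode")
print(xml_text)
```

In a real define.xml the ODM default namespace and the remaining required attributes would of course be needed as well.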

Again: don't start manipulating collected data just for the sake of compliance with software that produces false positives. AND: the define.xml is YOUR truth about the submission, not that of anyone else.

It is really time that CDISC moves to UCUM (unitsofmeasure.org), then all these discussions would finally come to an end ...

Please note the guidance in SDTM-IG 3.2, LB section, Assumption 7:

The variable, LBORRESU uses the UNIT codelist. This means that sponsors should be submitting a term from the column, “CDISC Submission Value,” in the published Controlled Terminology List that is maintained for CDISC by NCI EVS. When sponsors have units that are not in this column, they should first check to see if their unit is a synonym of an existing unit and submit their lab values using that unit. If this is not the case, then a New-Term Request Form should be submitted.

This is a critical consideration for all UNIT terms, regardless of whether original or standardized, and regardless of domain. Units are just a specifically called-out case that is no different from any other collected data, original or otherwise. I think of it this way: when a variable is subject to CT and a synonym exists in the terminology, the general expectation is that the synonym is to be used. This is true for ALL data.

So for absolute numeric results, whether a laboratory test, weight measurement, etc., the original result value is to be represented as it is, since there is no CT for it.

But this is clearly not true for other data. Consider a data capture system that collects gender as 1=Male, 2=Female. Or MALE/FEMALE, or Male/Female, etc. You are expected to map any such "originally collected" terms to the established synonyms of "M" and "F" for representation in SDTM. Or consider questionnaire data, where QSORRES captures the question text, and the standardized variables (QSSTRESC/QSSTRESN) contain the numeric value.

While units can pose some challenges in terms of identifying the appropriate synonym, or determining if there isn't one, the values are no different than any other "originally" collected values subject to controlled terminology.

 

Regards,

Carlo

Hi Carlo,

Yes, I do know this guidance in the SDTM-IG 3.2, LB section, Assumption 7, but I consider it a stupid rule ...
"ORRESU" means "original units", and if I need to take a "synonym", it means that "original" is "not original" at all anymore. As a reviewer, I would expect that what I see in ORRESU is exactly the same as on the CRF. The rule thus breaks traceability to the CRF.

An argument for this rule has always been that the reviewer would like (according to CDISC) all the data for a specific test to have the same units (or a minimum of different ones), but that is what we have --STRESU for.

The rule also leads to a lot of problems. For example, in Belgium, blood pressure is measured in centimeters mercury column. As the latter is not in CDISC-CT, people need to start doing unit conversions (in this case multiplication by 10), which is error prone, and again breaks traceability to the CRF. Also, electronic health records worldwide use mm[Hg] which is the UCUM notation (www.unitsofmeasure.org) which is THE worldwide standard for units. In such a case, "mm[Hg]" or "cm[Hg]" (Belgium and some other countries) must be replaced by or converted to mmHg, with all consequences and dangers, and loss of traceability.

CDISC (or better, the SDTM team) still refuses to allow UCUM notation for ORRESU and STRESU ("not invented here"). UCUM allows conversions to be automated, e.g. using RESTful web services (see e.g. http://www.xml4pharma.com/UCUM/UCUM_for_CDISC_tutorial.pdf).
I can already see the FDA mandating UCUM (as it recently did for LOINC, and as it already does for SPL), causing panic at CDISC and among its users (as is currently happening for LOINC).

It is high time that CDISC starts understanding that there are other international standards that are considerably better for certain SDTM fields (such as LOINC and UCUM), and allows these to be used (or even mandates their use in the longer term), instead of reinventing the wheel (the CDISC one being a square one ...).

While not disagreeing broadly on the benefit of moving toward more robust solutions that may be out there, several points you make are incorrect and potentially, dangerously misleading when applied to the concept of standards.

  • Regardless of whether a given requirement is sensible or not, advising non-conformance or masking the real issue is counterproductive. Implying that organizations can pick and choose which part of a standard they need to adhere to based on personal preference undercuts the whole point of standards in the first place.
  • Simply saying "original means original" without any supporting reference is in no way compelling.  It is clear from any number of concepts and representations within the SDTM standard that "original" does NOT prohibit the mapping of collected values to established terminology.
  • It is incorrect to claim that mapping values to synonym terms breaks traceability. There are a number of mechanisms that can be used (e.g., SUPPQUAL, Define-XML, SDRG) to show the mapping details.
  • If you feel CDISC is misrepresenting FDA preferences, you have mechanisms available to ask for confirmation directly. In the absence of such efforts you are only casting doubt to try to bolster your opinions.
  • Your whole paragraph regarding blood pressure units is misleading in the extreme. You either do not understand a basic SDTM expectation (the details of which I called out in my previous comment) or you are intentionally misstating the issue. No one should convert the original result value to align to some unit, nor should anyone standardize original units to a single term. Doing EITHER is a violation of SDTM expectations. Using synonyms to adhere to terminology is a mapping exercise, not a conversion one.
  • While considering issues like UCUM is a great topic of how SDTM could or should evolve, it is completely irrelevant to anyone asking "what do I do now?"

Polemic just to "stir the pot" does none of us any good.

Dear Carlo,

As you can understand, I cannot let this stand as is ...
a) I mostly agree with the first statement. The case with "UNIT" is however special as the codelist is extensible, so how do I deal with "mm[Hg]" from my EHR? Do I consider it as a "synonym"? My computer system doesn't know as it is not in the synonyms list published by CDISC. Or do I extend the codelist as I am allowed to and describe that in the define.xml? Or do I make a "new term" request (as stated by the SDTM-IG) for it, and wait many months for a decision (I did and it was turned down).
b) I cannot agree on the second. I understand "original" as "as collected originally". Anything else does not make sense to me. Otherwise I propose that we rename --ORRESU in --MORESU (mapped from original result unit ;-) ). The comparison with "1=male" is not applicable, as the variable (DM.SEX) does not claim to be "original".
c) Putting the really original unit in a SUPPQUAL does not make sense. Suppqual sucks (personal opinion) and SDRG is not machine-readable. I wouldn't know how I would add that information in define.xml except as a comment (not interpretable by machines).
d) FDA preferences. This statement came from the head of the SDTM team (I have it on my e-mail). I also always ask myself "which FDA?" It often depends on who you are talking to at the FDA (personal experience).
e) The example about blood pressure is real. If it was collected in cm[Hg] what should I do? Wait until the CT team accepts it as a new term? And if it was rejected (as is happening)? 6 months lost and no solution? Or extend the codelist and still get a validation error or warning in the validation software used by the FDA (panic in the RA department). I do check submissions of customers and what I usually see is that they then do a conversion.
I would love to see some clear and exact statements from the SDTM team how to deal with such situations.
f) I agree
"Stir the pot": there is no intention from my side to do so. But if no one keeps hammering that we need to make progress, we will not make progress. We have been discussing LOINC and UCUM for more than 8 years now (I have presentations from 2009 on the topic) and have made zero progress. FDA was "seeing the LOINC light" before CDISC, the SDO, did (I am so ashamed about this). CDISC keeps refusing to seriously discuss allowing UCUM (again, I can prove that from mails I received). Questioning existing things is the first step in making progress; otherwise we would still live in caves and go hunting for food.

I think that our major differences can be reduced to this: I would like to see considerably higher efficiency and accuracy (exactness) and inherent traceability by using modern computer methods and standards, whereas (many of) the SDTM team want to keep the direction that was chosen almost 20 years ago (tables, PDF documents). If I compare this with what is happening in healthcare (CDA and FHIR), we at CDISC are really 20 years behind.

But, as this discussion is more and more running away from the original thread, I propose that we move the discussion to e.g. LinkedIn. Maybe on the "SDTM Experts" or on the "CDISC" group?

With best regards,

Jozef

@Jozef,

I agree other forums may be appropriate, but the co-opting of this discussion to other issues began and continues with your posts, posts that cloud the issues of today with arguments about tomorrow.

Regarding:

b) I cannot agree on the second. I understand "original" as "as collected originally". Anything else does not make sense to me. Otherwise I propose that we rename --ORRESU in --MORESU (mapped from original result unit ;-) ). The comparison with "1=male" is not applicable, as the variable (DM.SEX) does not claim to be "original".

While it seems reasonable to interpret "original" this way, it is DIRECTLY contradicted by the IG.  Beyond the explicit language around LBORRESU, see SDTM IG-3.2, Section 4.1.5.1.1 Original and Standardized Results (bolding my own):

When the original measurement or finding is a selection from a defined codelist, in general, the --ORRES and --STRESC variables contain results in decoded format, that is, the textual interpretation of whichever code was selected from the codelist. In some cases where the code values in the codelist are statistically meaningful standardized values or scores, which are defined by sponsors or by valid methodologies such as SF36 questionnaires, the --ORRES variables will contain the decoded format, whereas, the --STRESC variables as well as the --STRESN variables will contain the standardized values or scores.

You may feel that the above was the incorrect approach, or you may argue for a different interpretation, but the bottom line is that CDISC materials must be the basis for defining the standards, period.  

The standard identifies a codelist associated with the original unit variable, and so you are required to process your data as subject to a codelist. I fundamentally disagree with the idea that this is not established in the standard.  Perhaps it could be more clear, or better justified, but these are separate considerations.

That is not to say there are no gray areas and challenges.  I completely agree that the standards are often ambiguous or inconsistent, and reasonable people can disagree on the best approach in a bad situation, but that is not the case with some of the issues you raise, this one specifically in my opinion.

Regarding:

e) The example about blood pressure is real. If it was collected in cm[Hg] what should I do? Wait until the CT team accepts it as a new term? And if it was rejected (as is happening)? 6 months lost and no solution? Or extend the codelist and still get a validation error or warning in the validation software used by the FDA (panic in the RA department). I do check submissions of customers and what I usually see is that they then do a conversion.
I would love to see some clear and exact statements from the SDTM team how to deal with such situations.  

I disagree that this is all that controversial, and am unsure to what extent the industry as a whole finds this challenging.  To my mind, you should:

  1. Identify which version of CDISC terminology is to be used.  More (or most) recent is encouraged, but is not required, and there may be business/operational needs that drive the use of an earlier version.  There is no prohibition either way.
  2. Ensure the term (unit in this case) is not a synonym of an existing term.  
  3. Assuming it is not, represent the term in the data (original unit variable).  There is an open question as to whether it is best to leave it as is "cm[Hg]", or align it to the published CDISC unit term and symbol conventions, so "cmHg", but ultimately, there are no strict conformance considerations in that question, and it is often decided by operational concerns.
  4. While encouraged, it is at a company's discretion as to whether to submit to CDISC terminology team, and doesn't truly impact the process here.  Even if CDISC were to add the term to a future version, you may not decide to upversion, in which case the term is still an extended one.
  5. Whether the codelist is extensible or non-extensible is important, but as you often say, it is also important to represent the truth of your data. If faced with having collected data in a way that violates a non-extensible codelist, it is often best to explain that violation rather than modify the collection concept of the data to adhere to the constraint.  In short, it is already too late, and you need to take responsibility for the way in which the data were collected.
  6. Document any validation findings as appropriate.  It should be understood that messages along the lines of "you have extended an extensible codelist" are largely informational, and should be documented that way in the SDRG.  For example, "The UNIT codelist was extended to accommodate collected terms not represented in the established terms."
  7. If the RA department panics over the presence of check violations, provide them a variety of industry references to show that not all violations are the same, and that it is impossible under the current ruleset to have a completely "clean" report.  That said, it is their call as the sponsor to determine how to proceed.

There are edge conditions, of course, but at a certain point, this needs to be about generally robust processes, and edge conditions can be resolved on a case by case basis.
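
For illustration only, steps 2, 3 and 6 can be sketched in Python roughly as follows. The names (`UnitDecision`, `resolve_unit`) and the toy term and synonym tables are assumptions made for this example and are not actual CDISC CT content. Note that only the unit string is mapped; the numeric result is never touched, which is the mapping-versus-conversion distinction made earlier in the thread.

```python
from dataclasses import dataclass


@dataclass
class UnitDecision:
    submitted_unit: str  # value to place in the unit variable
    extended: bool       # True if the codelist has to be extended in define.xml
    sdrg_note: str       # suggested SDRG text, empty when nothing to document


def resolve_unit(collected, terms, synonyms):
    """Decide how a collected unit is represented (hypothetical helper).

    terms    -- set of submission values in the chosen CT version
    synonyms -- dict mapping a known synonym to its submission value
    """
    # Already an established term: use it as is.
    if collected in terms:
        return UnitDecision(collected, False, "")
    # Step 2: a synonym of an existing term is mapped to that term.
    if collected in synonyms:
        target = synonyms[collected]
        note = f"Collected unit '{collected}' was mapped to '{target}'."
        return UnitDecision(target, False, note)
    # Steps 3 and 6: keep the collected unit, extend the codelist, document it.
    note = (
        "The UNIT codelist was extended to accommodate collected terms "
        "not represented in the established terms."
    )
    return UnitDecision(collected, True, note)


# Toy tables for demonstration only, not real controlled terminology.
toy_terms = {"mmHg", "ug/L"}
toy_synonyms = {"ng/mL": "ug/L"}

print(resolve_unit("ng/mL", toy_terms, toy_synonyms))
print(resolve_unit("cm[Hg]", toy_terms, toy_synonyms))
```

Whether a given collected unit really is a synonym of a published term remains a judgment call that a lookup table cannot settle.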

 

Regards,

Carlo