We're having issues with validation findings about variable label mismatches, and in a few of the cases I'm not sure what's wrong. I'm not sure if this is possible, but is there any way for the P21 report, when it says that "X is not the value we expect it to be" to also report what that expected value is? It's clear that for a particular variable there's a specific string P21 is expecting in the label, and if this expected result were part of the report then troubleshooting and fixing would be a lot more straightforward.
That said, we're having particular trouble with the labels of study day variables that are not in SDTM IG 3.2 but that the FDA expects to be included in datasets.
Our default assumption is that if xxDTC is labeled "Date/Time of ABCD", then xxDY should be "Study Day of ABCD", and that if xxSTDTC is labeled "Start Date/Time of ABCD" then xxSTDY is labeled "Study Day of Start of ABCD".
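The default assumption above is mechanical enough to write down as a rule. Below is a minimal Python sketch of that naming convention only. It adds an "End Date/Time" case by analogy with the SE example in (1). It is not how Pinnacle 21 derives its expected labels, and `study_day_label` is a hypothetical helper.

```python
import re

# Hypothetical helper: encodes the naming convention described in this post,
# NOT a Pinnacle 21 or CDISC rule. Each pattern maps a date/time label to the
# study day label we assume should accompany it.
_PATTERNS = [
    # "Start Date/Time of ABCD" -> "Study Day of Start of ABCD"
    (re.compile(r"^Start Date/Time of (.+)$"), r"Study Day of Start of \1"),
    # "End Date/Time of ABCD" -> "Study Day of End of ABCD" (added by analogy)
    (re.compile(r"^End Date/Time of (.+)$"), r"Study Day of End of \1"),
    # "Date/Time of ABCD" -> "Study Day of ABCD"
    (re.compile(r"^Date/Time of (.+)$"), r"Study Day of \1"),
]


def study_day_label(dtc_label: str) -> str:
    """Derive an --DY/--STDY/--ENDY label from the matching --DTC label."""
    for pattern, template in _PATTERNS:
        if pattern.match(dtc_label):
            return pattern.sub(template, dtc_label)
    raise ValueError(f"Unrecognized date/time label: {dtc_label!r}")


if __name__ == "__main__":
    for label in ("Start Date/Time of Element", "End Date/Time of Element"):
        derived = study_day_label(label)
        print(f"{label!r} -> {derived!r} ({len(derived)} chars)")
```

Applied to "Start Date/Time of Medical History Event", the same rule yields a 43-character label, which is where the convention breaks down in (2).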
(1) One case where this fails is SE -- we have SESTDY labeled "Study Day of Start of Element" and SEENDY labeled "Study Day of End of Element", and these are triggering SD0063A, but it's unclear what label this rule is actually expecting for those fields.
(2) Another case where we hit issues is with start and end dates in MH. Here it's a bit more complicated: the simple conversion from "Start Date/Time" to "Study Day of Start" would result in a label longer than the 40-character limit. "Study Day of Start of Medical History Event" is 43 characters long, and "Study Day of End of Medical History Event" is 41 characters long. We dropped "Event" to get the labels "Study Day of Start of Medical History" and "Study Day of End of Medical History", but those are triggering SD0063A. I can see other ways to shorten the label ("Study Day of Start of MH Event", "Study Day of Start of Med History Event", etc.), but aside from an iterative process of trial and error, I see no way to figure out which of the possible abbreviations is the one Pinnacle 21 is expecting.
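The character counts above are easy to get wrong by hand. A minimal Python check of the candidate labels against the 40-character limit (the maximum label length in SAS Version 5 transport files) is below. Note that it only tells you which candidates fit, not which one Pinnacle 21 expects.

```python
# 40 characters is the maximum variable label length in SAS V5 transport (XPT) files.
MAX_LABEL_LENGTH = 40

CANDIDATES = [
    "Study Day of Start of Medical History Event",
    "Study Day of End of Medical History Event",
    "Study Day of Start of Medical History",
    "Study Day of End of Medical History",
    "Study Day of Start of MH Event",
    "Study Day of Start of Med History Event",
]


def fits(label: str, limit: int = MAX_LABEL_LENGTH) -> bool:
    """Return True if the label is within the length limit."""
    return len(label) <= limit


if __name__ == "__main__":
    for label in CANDIDATES:
        status = "ok" if fits(label) else "too long"
        print(f"{len(label):>3}  {status:<8}  {label}")
```

This reports 43 and 41 characters for the two unabbreviated labels, and 39 for "Study Day of Start of Med History Event".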
Aside from study day variables, we're running into this problem in two other cases:
(3) We're getting an SD0063 warning on TI.IETESTCD with a label of "Inclusion/Exclusion Criterion Short Name"; that's the label used for IE.IETESTCD, and the label for TI.IETESTCD seems to be oddly truncated in SDTM IG 3.2 ("Incl/Excl Criterion Short Name e"). This may be an SDTM IG issue, at least as far as the trailing "e" is concerned.
(4) We're getting a SD0063A notice on TS.TSVAL1 with a label of "Parameter Value 1". We believe this is the correct label for adding TSVALn columns to the TS dataset, but it's not clear what Pinnacle 21 expects the label to be here.
Any suggestions for how to further troubleshoot these issues, and how to avoid them in the future?
Just following up here: the CDISC Wiki's errata page for SDTM IG 3.2 indicates that the correct label for TI.IETESTCD should be identical to the label for IE.IETESTCD.
So it would be good if the labels used for validation were updated to reflect this correction.
"is there any way for the P21 report, when it says that "X is not the value we expect it to be" to also report what that expected value is?"
Go to the ...\components\config folder and open the P21C XML validation specs in Internet Explorer (IE).
I couldn't find "P21C.xml", but I know that all the rules are described in files in the folder components/config.
If I take, e.g., the file "SDTM 3.2.xml" and look for the rule SD0063, I find at line 25835:
<val:Match ID="SD0063" PublisherID="FDAC033" Target="Metadata" oce:IgnoreContext="Yes" Message="SDTM/dataset variable label mismatch" Description="Variable Label in the dataset should match the variable label described in SDTM IG. When creating a new domain Variable Labels could be adjusted as appropriate to properly convey the meaning in the context of the data being submitted." Category="Metadata" Type="Warning" Variable="LABEL" Terms="%Variable.Label%" Delimiter="~" When="VARIABLE == '%Variables$Present$Core:Permissible|Expected|Required%'"/>
In some of the other rule descriptions, I can see that the "Message" part contains "variables" (starting and ending with '%'). I haven't tried this yet, but you could try to insert the expected value into the message itself (using an XML editor or Notepad++), once you have found out what the variable name for it is. You can find examples of such variables all over the file, e.g., in line 25817, where the "Message" part is:
Missing value for %Domain%VAMTU, when %Domain%VAMT is populated
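Before editing a message, it helps to see all of a rule's attributes at once, and line numbers may differ between versions of the file. A minimal Python sketch that looks a rule up by its ID using the standard library's xml.etree.ElementTree is below. The sample string stands in for the real config file, and its namespace URIs are placeholders, not the ones P21 uses. From the attributes, Terms="%Variable.Label%" looks like the reference to the expected label, but that is an inference from the rule text, not documented behavior.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Stand-in for "SDTM 3.2.xml". The namespace URIs below are placeholders; the
# lookup ignores tag names, so it does not depend on the real URIs.
SAMPLE = """<root xmlns:val="urn:example:val" xmlns:oce="urn:example:oce">
  <val:Match ID="SD0063" PublisherID="FDAC033" Target="Metadata"
             oce:IgnoreContext="Yes" Message="SDTM/dataset variable label mismatch"
             Category="Metadata" Type="Warning" Variable="LABEL"
             Terms="%Variable.Label%" Delimiter="~"
             When="VARIABLE == '%Variables$Present$Core:Permissible|Expected|Required%'"/>
</root>"""


def find_rule(root: ET.Element, rule_id: str) -> Optional[ET.Element]:
    """Return the first element whose ID attribute equals rule_id, or None.

    Namespaced tags appear as "{uri}Match", so only the ID attribute is compared.
    """
    for elem in root.iter():
        if elem.get("ID") == rule_id:
            return elem
    return None


if __name__ == "__main__":
    # For the real file (path is an assumption):
    # root = ET.parse("components/config/SDTM 3.2.xml").getroot()
    root = ET.fromstring(SAMPLE)
    rule = find_rule(root, "SD0063")
    if rule is not None:
        for attr in ("Message", "Variable", "Terms", "When"):
            print(f"{attr}: {rule.get(attr)}")
```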
The real problem is that for variables that are not in the SDTM-IG, there is no uniquely defined rule for what the label should be. IMO, labels should be allowed to deviate somewhat so as to give the sponsor the possibility to better explain what the variable is about. See my blog: "SDTM labels: freedom or slavery", at: http://cdiscguru.blogspot.com/2015/12/sdtm-labels-freedom-or-slavery.html.
If I understand correctly, this "exact match" rule has been dropped in SDTM 1.5.
"I couldn't find "P21C.xml", but I know that all the rules are described in files in the folder components/config."
"SDTM 3.2.xml" file you used in your example is P21 xml spec I refer to.
Maybe you can also explain to Dave (user scocca) how he can edit line 25836 so that the expected label is shown in the error/warning message.
You can also treat this as an improvement request: many of my customers also struggle with this rule, costing them many hours (and thus $$$) of trial and error to get the labels right, especially for non-SDTM-IG variables.
OK, so I've figured out that by viewing the configuration file SDTM 3.2.xml in my browser, I can look at the tables of variables for each dataset in there and see which labels are expected. And I can probably figure out how to update our configuration file so that it uses the labels I want. But as a CRO, we're going to have sponsors running validation on their own, so any solution that depends on hand-updating our configuration file is only a partial solution at best.
So, given this context, what I'd like to do is request updates to some of the variable labels that currently appear in the validator, so that when the STDTC/ENDTC label is not the default "Start of Observation", the STDY/ENDY label matches the STDTC/ENDTC label. Suggested changes:
SESTDY to "Study Day of Start of Element"
SEENDY to "Study Day of End of Element"
MHSTDY to "Study Day of Start of Med History Event" (*)
MHENDY to "Study Day of End of Med History Event" (*)
CESTDY to "Study Day of Start of Clinical Event"
CEENDY to "Study Day of End of Clinical Event"
HOSTDY to "Study Day of Start of Healthcare Enc" (*)
HOENDY to "Study Day of End of Healthcare Enc" (*)
LBSTDY to "Study Day of Start of Specimen Coll" (*) (note currently LBSTDY label does not match LBENDY label)
PCENDY to "Study Day of End of Specimen Collection" (note currently PCSTDTC and PCENDTC labels do not match)
Variable names with a (*) require some abbreviation to make the study day label match the associated date/time label and still fit within the 40-character limit.
But, that said, the label for TSVAL1 in the config file is "Parameter Value 1", which is exactly the label that we're seeing rejected. Any idea what might be going on with that test that's causing the false positive?
Extra spaces or other invisible characters?
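If the config label and the dataset label look identical, hidden characters are a plausible culprit. A minimal Python sketch that flags leading/trailing whitespace, doubled spaces, and any non-ASCII or non-printable character (for example, a no-break space copied from a spec document) is below. It assumes you have already pulled the label text out of the dataset or define.xml, and `hidden_character_issues` is a hypothetical helper.

```python
import unicodedata


def hidden_character_issues(label: str) -> list[str]:
    """List whitespace or character problems that are invisible when printed."""
    issues = []
    if label != label.strip():
        issues.append("leading or trailing whitespace")
    if "  " in label:
        issues.append("consecutive spaces")
    for i, ch in enumerate(label):
        if ch == " ":
            continue  # an ordinary ASCII space is fine
        if not ch.isprintable() or ch.isspace() or ord(ch) > 127:
            # Control characters have no Unicode name, so fall back to the code point.
            name = unicodedata.name(ch, f"U+{ord(ch):04X}")
            issues.append(f"position {i}: {name}")
    return issues


if __name__ == "__main__":
    for label in ("Parameter Value 1", "Parameter Value 1\u00a0", "Parameter\tValue 1"):
        print(repr(label), hidden_character_issues(label))
```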
1. Our approach is to use generic labels from SDTM Model for "model permissible" variables.
2. Your suggestion of "Study Day of Start of Specimen Coll" for LBSTDY is questionable. It looks like an invalid use of the LBSTDY variable instead of LBDY in a Findings domain (see SDTM IG #188.8.131.52). This is an example of why the SD0063A check still exists. Manual review of the validation report is expected to confirm correct implementation.
In reply to "Our approach is to use generic labels from SDTM Model" by Sergiy:
The SDTM IG, section 3.1, says "The models also show how a standard variable from a general observation class should be adjusted to meet the specific content needs of a particular domain, including making the label more meaningful, specifying controlled terminology, and creating domain-specific notes and examples. Thus the domain models demonstrate not only how to apply the model for the most common domains, but also give insight on how to apply general model concepts to other domains not yet defined by CDISC."
To me, this reads as a pretty specific statement that the generic labels should NOT be used without modification when there is a clear way to make the label more meaningful. In particular, now that the Study Day variables are FDA expected, it seems obvious that when a DTC label has been customized, the same customization should apply to the matching DY variable.
For example, when SESTDY contains the study day derived from SESTDTC, it makes no sense for SESTDTC to be labeled "...Start of Element" and SESTDY to be labeled "...Start of Observation".
Regarding LBSTDY -- I have not actually used that variable, and I recognize that there are few cases where one should use LBSTDTC and LBSTDY. However, if someone does use LBSTDTC, then (a) the FDA expectation is that they will also use LBSTDY, and (b) it makes sense for the label for LBSTDY to have the same customization as the label for LBSTDTC.
The strange thing is that, until now, I have not been able to find a rule in the SDTM-IG stating that variable labels in submitted datasets should be exactly as in the SDTM-IG. Please correct me if I am wrong. In some cases in the past, labels in the text sections did not even correspond 100% to those in the tables of the IG. So these "rules" have been invented by people who over-interpret the IG, driven by some "validation mania".
Why do we have variable labels? To explain to the reviewer what the variable is about, isn't it? So why fix them so strictly? If there must be a 1:1 relationship with the variable name, there is no need to submit them, as they can easily be retrieved automatically, e.g., by a web service. My opinion is that sponsors should be allowed to adapt the labels to better explain what the variable is about in their study. Also see my blog "SDTM labels: freedom or slavery?" at: http://cdiscguru.blogspot.com/2015/12/sdtm-labels-freedom-or-slavery.html.
Note SDTM-IG Section 3.2.2: Conformance
Conformance with the SDTMIG Domain Models is minimally indicated by:
- Following the complete metadata structure for data domains
- Following SDTMIG domain models wherever applicable
- Using SDTM-specified standard domain names and prefixes where applicable
- Using SDTM-specified standard variable names
- Using SDTM-specified variable labels for all standard domains
- Using SDTM-specified data types for all variables
- Following SDTM-specified controlled terminology and format guidelines for variables, when provided
- Including all collected and relevant derived data in one of the standard domains, special-purpose datasets, or general-observation-class structures
- Including all Required and Expected variables as columns in standard domains, and ensuring that all Required variables are populated
- Ensuring that each record in a dataset includes the appropriate Identifier and Timing variables, as well as a Topic variable
- Conforming to all business rules described in the CDISC Notes column and general and domain-specific assumptions.
and section 2.6 Creating a New Domain, Item 3.g
Adjust the labels of the variables only as appropriate to properly convey the meaning in the context of the data being submitted in the newly created domain. Use title case for all labels (title case means to capitalize the first letter of every word except for articles, prepositions, and conjunctions).
Taken together, you should conform to labels specified for standard variables in standard domains (i.e., those variables specified in the IG or TAUG). Labels for variables NOT specified in the IG or TAUGs (i.e., model permissible variables from the model document, and variables in custom domains) may be adjusted.
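Read this way, the check splits in two: an exact match for variables the IG specifies, and looser checks (length and the Section 2.6 title-case rule) for everything else. Below is a minimal Python sketch of that split, which also reports the expected label next to the actual one, as requested at the top of the thread. `check_labels`, `ig_labels`, and the minor-word list are hypothetical. The IG names the word classes but does not enumerate them, and loading the reference labels from the IG is left out.

```python
from dataclasses import dataclass

MAX_LABEL_LENGTH = 40

# Hypothetical, deliberately short list: the IG names the word classes
# (articles, prepositions, conjunctions) but does not enumerate them.
MINOR_WORDS = {
    "a", "an", "the",
    "and", "or", "nor", "but",
    "of", "in", "on", "at", "to", "by", "for", "from", "with", "per",
}


@dataclass
class LabelFinding:
    variable: str
    actual: str
    expected: str  # empty when there is no IG label to compare against
    message: str


def title_case_problems(label: str) -> list[str]:
    """Flag words that break title case (minor words lowercase, others capitalized)."""
    problems = []
    for i, word in enumerate(label.split()):
        if not word[0].isalpha():
            continue  # skip tokens such as "1" or "--TESTCD"
        # Assumption: the first word is always capitalized, even if it is a minor word.
        minor = i > 0 and word.lower() in MINOR_WORDS
        if minor and word[0].isupper():
            problems.append(f"{word!r} should be lowercase")
        elif not minor and word[0].islower():
            problems.append(f"{word!r} should be capitalized")
    return problems


def check_labels(dataset_labels: dict[str, str], ig_labels: dict[str, str]) -> list[LabelFinding]:
    """Compare dataset labels with IG labels; loosen the rules for non-IG variables."""
    findings = []
    for variable, actual in dataset_labels.items():
        expected = ig_labels.get(variable)
        if expected is not None:
            # Standard variable in a standard domain: the label must match the IG exactly.
            if actual != expected:
                findings.append(LabelFinding(variable, actual, expected, "label differs from IG label"))
            continue
        # Model-permissible or custom variable: the label may be adjusted,
        # but it still has to fit and follow title case.
        if len(actual) > MAX_LABEL_LENGTH:
            findings.append(LabelFinding(variable, actual, "", f"label exceeds {MAX_LABEL_LENGTH} characters"))
        for problem in title_case_problems(actual):
            findings.append(LabelFinding(variable, actual, "", f"title case: {problem}"))
    return findings


if __name__ == "__main__":
    # Illustrative reference label, taken from the errata discussed earlier in the thread.
    ig = {"IETESTCD": "Inclusion/Exclusion Criterion Short Name"}
    dataset = {"IETESTCD": "Incl/Excl Criterion Short Name"}
    for f in check_labels(dataset, ig):
        print(f"{f.variable}: {f.message}; actual={f.actual!r}, expected={f.expected!r}")
```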
This is complicated by errata in the IG and TAUGs (labels too long, etc.). Also note that some labels in the model document that are longer than 40 characters (e.g., the --TESTCD label "Short Name of Measurement, Test or Examination") are obviously intended to be adjusted appropriately.
So while a rule evaluating labels makes sense, there is truth to the charge that evaluating ALL labels is an over-interpretation of conformance to the standard.
See CDISC's published rules, Rule ID CG0303 in particular (Variable Label = IG Label).
The expectations around variable labels in terms of conformance are the subject of ongoing discussion within CDISC, and particular attention is being given to clearing up errata in labels for SDTM-IG 3.3.
I indeed overlooked that in Section 3.2.2...
But I am still of the opinion that sponsors should have some freedom to (slightly) change the labels of variables that are mentioned in the IG, so as to better explain what the variable means in their specific study. I have had customers struggling with this because the "standard" label did not represent their situation well, but they were not allowed to change even a single character of it due to this rule.
I observed that the rule you mention from 3.2.2 is no longer present in the draft SDTM-IG 3.3, and I sincerely hope it is dropped for good and replaced by a sentence stating that sponsors are allowed to slightly change the labels.
Thanks again for correcting me!