Creating Define.xml 2.0 with OpenCDISC – Webinar Q&A

  1. Does the annotated blank CRF page have to be in a specific format, or tagged, in order to populate the Pages when Origin=CRF?
    To pull in annotations, the CRF should use PDF text annotations. Form fields are a less optimal alternative. Use text annotations if you want automatic scanning of CRF page numbers.
  2. How is the Datasets tab populated with the imported XPT datasets? Where does it get the structure and key variables from?
    Structure and key variables are populated from the standard (in the example, SDTM-IG 3.1.2). This pre-populated metadata is only a template and a reference to the standard. The actual dataset structure and key variables vary across studies and are expected to be populated from the study data specifications, rather than from the CDISC standard. This is also a good example of where custom, sponsor-specific standards can be applied to overwrite the metadata from the CDISC examples.
  3. Is the dataset Description populated from the dataset label in the XPT file? If so, when you add a dataset Description for the define.xml, would it still be missing in the XPT file?
    Yes, the dataset description is populated from the XPT file, which is why it was blank in the example shown during the presentation; that represents bad data. If you fixed the XPT file and re-uploaded it, the description would be correct. An easier solution is to enter the description directly in Define.xml Designer, but a data fix is still expected: the later validation of the data against the define.xml file will catch any inconsistency.
  4. Many people use the data type column to populate the attributes of the SAS variables. The SAS attributes are num/char, as in the CDISC SDTM definitions. Using text/integer, etc., in the define.xml causes inconsistency between the SDTM specs and the define.xml. What is the rationale for using integer/text, etc.?
    We populate the data types required by the Define.xml 2.0 specification. These are defined differently from the SAS attributes.
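    For illustration, here is a minimal sketch of what this looks like in Define-XML 2.0 (assuming the standard ODM namespace, with the def: prefix bound to the Define-XML v2.0 namespace; the OID and variable shown are hypothetical). SAS knows only num/char, while Define-XML 2.0 declares a more specific DataType on each ItemDef:

      <!-- Sketch: a SAS numeric variable declared with a Define-XML 2.0
           data type; "integer" is more specific than SAS "num" -->
      <ItemDef OID="IT.DM.AGE" Name="AGE" DataType="integer" Length="2"
               SASFieldName="AGE">
        <Description>
          <TranslatedText xml:lang="en">Age</TranslatedText>
        </Description>
      </ItemDef>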
  5. How did you import the dictionaries and the documents? Did you copy & paste?
    Yes, the copying and pasting of dictionaries and documents simulated copying and pasting from Excel into OpenCDISC Define.xml Designer.
  6. I’m trying to understand the value of the “upload codelists not found in the data” feature. We have always heard that the define.xml should be data-driven. Has it been practice to add additional codelists that are not available in the data? Is this to help expand picklists found on our CRF pages?
    Section 4.1.3.3 in SDTM-IG 3.2 says that controlled terminology, or a link to the controlled terminology, should be included in the define.xml wherever applicable. All values in the permissible value set for the study should be included, whether they are represented in the submitted data or not. Note that a null value should not be included in the permissible value set.
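    As an illustration, a sketch of a Define-XML 2.0 codelist that lists the full permissible value set, including a value never collected (the OID and values are hypothetical):

      <!-- Sketch: all permissible values for the study are listed,
           whether or not they appear in the submitted data -->
      <CodeList OID="CL.NY" Name="No Yes Response" DataType="text">
        <EnumeratedItem CodedValue="N"/>
        <EnumeratedItem CodedValue="Y"/>
        <!-- permissible on the CRF even if never collected -->
        <EnumeratedItem CodedValue="U"/>
      </CodeList>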
  7. Why were the pages populated only for some variables and not all?
    Some variables may not actually be collected on the CRF, but rather be defined by Protocol (e.g., STUDYID), Derived (e.g., --BLFL), Assigned (e.g., AEDECOD), received from external sources (e.g., LBORRES), etc. OpenCDISC Define.xml Designer populates CRF pages only for variables that are actually present as annotations on the CRF.
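    For reference, a sketch of how a CRF origin with a page reference looks in Define-XML 2.0 (the OID, leaf ID, and page number are hypothetical); variables defined by Protocol, Derived, etc. would carry a def:Origin of that type with no page reference:

      <!-- Sketch: Origin=CRF pointing at page 12 of the annotated CRF -->
      <ItemDef OID="IT.AE.AETERM" Name="AETERM" DataType="text" Length="200">
        <Description>
          <TranslatedText xml:lang="en">Reported Term for the Adverse Event</TranslatedText>
        </Description>
        <def:Origin Type="CRF">
          <def:DocumentRef leafID="LF.blankcrf">
            <def:PDFPageRef PageRefs="12" Type="PhysicalRef"/>
          </def:DocumentRef>
        </def:Origin>
      </ItemDef>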
  8. How will OpenCDISC Enterprise work if my study has a study specific dataset?
    Generally, a custom domain is handled the same way as a standard domain; we just cannot populate some fields in the Domains tab. If your organization creates a standard that defines these, they would then be populated for you.
  9. What is a “good” score? Can something less than 100% still be submitted without the risk of being queried by agencies?
    We find that no study is 100%. Some checks are of higher importance than others. OpenCDISC Enterprise allows you to measure your submission’s score in advance, and our service professionals at Pinnacle 21 have experience with what is generally considered good or normal versus below expectations.
  10. I see that the Community tool is not considered GxP and 21 CFR Part 11 compliant. How does this affect its acceptability as a validation/define.xml tool for the FDA?
    FDA recommends using OpenCDISC Validator, per their website. This does not change the fact that sponsors are responsible for following 21 CFR Part 11. Pinnacle 21 provides support packages for OpenCDISC Validator, or you can validate it yourself if you are using the Community version.
  11. Does the FDA accept the define.xml file generated by the OpenCDISC Validator?
    Interesting question, but I need to reframe it: it is the deliverable that matters. The define.xml validation step is what is key to passing the quality checks; how you generate the Define.xml is not as important. We consider the previous OpenCDISC Validator “Generate Define.xml” feature far inferior to our new approach to generating Define.xml.
  12. When you imported CRF page numbers, was this due to specific PDF requirements, i.e., searching for annotation strings? Also, if this is dependent on specific annotation requirements, wouldn’t it make sense to also pull in CT, i.e., all the terms given on the CRF for, e.g., AESER?
    It would be great to pull in the terms from codelists; however, we find these generally are not annotated, and we have not (yet) seen this to be feasible. Another challenge is that, in many cases, values collected on the CRF are not the same terms presented in the SDTM data (e.g., differences in character case, additional mapping, etc.). In general, importing the controlled terminology used from the study specifications is the way to go.
  13. Can you configure the enterprise version to add additional operational columns to the define repository data model?
    Not at this time. We have discussed this in the past and were curious whether people would be interested in it. We were thinking it might be nice to include your SDTM specification metadata with this metadata, but we would like to hear your other ideas. Meanwhile, support for additional operational metadata is limited to 2 extra columns per table.
  14. It is not very clear how to put comments into the variable metadata.
    You can copy/paste your standard comments from Excel specs and then re-use them, or type them directly into the Comments tab and then apply them to Dataset, Variable, and Value Level item comments.
  15. General question: When is the right time to generate define.xml — when the study is ongoing (to meet tight timeline), or at the end of study?
    It depends on your organization’s process (partially the size and scale). For a smaller company, it may be easier to develop it after the fact. But if you have a larger organization and run many trials, we see benefit at that scale in planning Define.xml out at the onset. It can provide efficiencies with your partners, and even enable the exchange of data between you and your partners, establishing a handshake on what is expected by each party.
  16. Let’s say we are creating Define.xml for an ongoing study. How should we refresh CT and VLM through OpenCDISC at the end of the study?
    There are three basic elements to this:
    A) the tool can re-scan new XPT files;
    B) you can import your external specifications for the study CT;
    C) a manual review of the study metadata is still expected.
    This is a complex question with many aspects and may take a conversation with your organization.
  17. How do you handle codelists for lab tests if the lab is a central lab?
    If the data is present in XPT files, we will bring in these codelists. Otherwise, you can import the available items (if you can get them from your CRO, central lab, or EDC vendor) via copy/paste or CSV import.
  18. Is this a web application?
    OpenCDISC Enterprise is a Web-based, hosted, software as a service (SaaS) application. OpenCDISC Community is a desktop application.
  19. What if the aCRF needs an update?
    Fix it and re-import it if needed! Based on our experience, some fixes to the aCRF are expected in most studies; there are always minor issues like a typo or a missing annotation. When working on the define.xml, you actually perform an additional QC of the CRF annotations. This QC process is “semi-automated” and very efficient.
  20. The Study Metadata score was 0, then you ran the validator and it became 100. Did I miss some step prior to the validator step?
    By that point a Define.xml was available, so I re-ran the validation that had previously failed (prior to the demo).
  21. Is there a summary page where all the issues are summarized rather than jumping from tab-to-tab?
    We are planning to produce a report of “what is invalid” upon Define.xml generation, so that you have a consolidated list of warnings and errors in one report. 
  22. Do you have a database interface to the define repository, so e.g. SAS programs can query the define repository directly?
    We could do this for a client — it would not be difficult — but we have not done so yet. No one has yet asked for this.
  23. Are there any special considerations/issues/features for ADaM?
    ADaM is similar to SDTM when creating Define.xml. There are differences, such as domain classes and the use of value level metadata, and these are addressed by the metadata standards for each. Creation of define.xml for SDTM can be highly automated by scanning XPT files and aCRFs and by using standard metadata (both SDTM and CT). ADaM data usually do not have an Origin of CRF, but rather refer to SDTM or ADSL variables as Predecessors. The ADaM standard is not as pre-defined as SDTM, so you need to use your programming specifications rather than rely on standard metadata; this applies to both the data structure and the controlled terminology. ADaM data make much heavier use of Value Level metadata, and ADaM Value Level metadata are study- and analysis-specific. Having good programming specifications is therefore very important. Our tool helps you import your existing metadata for ADaM.
  24. Is there support for ADaM IG versions in OpenCDISC Enterprise?
    ADaM v1.0 is the latest version of ADaM, and we have that standard in the system.
  25. How do you reference the Algorithm document from within variable comments?
    Algorithm documents are assigned to the study on the study properties screen. Documents can then be referenced by methods, and methods are in turn assigned to variables or value level metadata.
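    As a sketch of the resulting chain in Define-XML 2.0 (the OIDs and leaf ID are hypothetical), a method references the algorithm document, and the variable’s ItemRef references the method:

      <!-- Sketch: method pointing at an external algorithm document -->
      <MethodDef OID="MT.ADSL.TRTDUR" Name="Algorithm for TRTDUR"
                 Type="Computation">
        <Description>
          <TranslatedText xml:lang="en">Treatment duration; see algorithm document.</TranslatedText>
        </Description>
        <def:DocumentRef leafID="LF.algorithms"/>
      </MethodDef>

      <!-- The variable's ItemRef points at the method -->
      <ItemRef ItemOID="IT.ADSL.TRTDUR" Mandatory="No"
               MethodOID="MT.ADSL.TRTDUR"/>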
  26. Do we actually want distinct codelist items to be listed for QNAMs for WHODrug terms or for MedDRA LLT, HLT, etc.? The tool seems to have created the codelists.
    You are correct. MedDRA or WHODrug coding in SUPPQUAL domains should have a reference to external dictionaries, rather than an assigned codelist. The tool prepares a draft document; you need to review the automatically pre-populated metadata and correct it if needed. There is also an option to import your existing specs. However, a manual review/QC is still expected.
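    For illustration, a sketch of the external-dictionary reference in Define-XML 2.0 (the OID and version shown are hypothetical):

      <!-- Sketch: reference MedDRA as an external dictionary instead of
           enumerating its terms in a codelist -->
      <CodeList OID="CL.MEDDRA.PT" Name="MedDRA Preferred Term" DataType="text">
        <ExternalCodeList Dictionary="MEDDRA" Version="17.0"/>
      </CodeList>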
  27. Is there no real-time check for codelists? They could be checked in accordance with the version of the SDTM IG.
    Sorry, your question is not clear. Please contact the OpenCDISC or Pinnacle 21 team.
  28. How about ADaM define.xml v2.0 and the first part of Analysis Results Metadata (efficacy tables)?
    The Define.xml v2.0 standard cannot handle Analysis Results Metadata from ADaM v2.1. This is a limitation of the current version of the define.xml standard.
  29. Must the dataset ‘Description’ value match the dataset label attached to the dataset or are we free to make it longer than 40 chars, if needed? Must it match a value given in the CDISC standards, if defined?
    Yes, a dataset “Description” must be the same as the Label in the SAS dataset (XPT), which is limited to 40 characters. It is good practice to use standard SDTM domains for their intended purpose, so an inconsistency in the dataset Description/Label between sponsor data and the CDISC standards is not expected.
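    For illustration, a sketch of where the dataset Description lives in Define-XML 2.0 (the OIDs are hypothetical); its text should equal the 40-character SAS dataset label:

      <!-- Sketch: Description text matches the XPT dataset label -->
      <ItemGroupDef OID="IG.AE" Name="AE" Domain="AE" Repeating="Yes"
                    Purpose="Tabulation" def:Class="EVENTS"
                    def:Structure="One record per adverse event per subject"
                    SASDatasetName="AE">
        <Description>
          <TranslatedText xml:lang="en">Adverse Events</TranslatedText>
        </Description>
        <!-- ItemRefs omitted for brevity -->
      </ItemGroupDef>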
  30. Could the Origin be multiple? For example, the value of LBORRES comes from both CRF and external data, so the origin should be “eDT, CRF”. Does Define-XML v2.0 or OpenCDISC support that?
    When the Origin for a variable is multiple, you need to use Value Level metadata, which allows you to delineate the sources of the values.
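    A sketch under assumed OIDs: at the variable level LBORRES points to a value list, and each value-level item carries its own origin (eDT for the central-lab result, CRF for a result collected on the CRF):

      <!-- Sketch: value-level items for LBORRES, split by where clauses -->
      <def:ValueListDef OID="VL.LB.LBORRES">
        <ItemRef ItemOID="IT.LB.LBORRES.GLUC" OrderNumber="1" Mandatory="No">
          <def:WhereClauseRef WhereClauseOID="WC.LB.LBTESTCD.GLUC"/>
        </ItemRef>
        <ItemRef ItemOID="IT.LB.LBORRES.PREG" OrderNumber="2" Mandatory="No">
          <def:WhereClauseRef WhereClauseOID="WC.LB.LBTESTCD.PREG"/>
        </ItemRef>
      </def:ValueListDef>

      <!-- Central-lab result transferred electronically -->
      <ItemDef OID="IT.LB.LBORRES.GLUC" Name="LBORRES" DataType="text" Length="8">
        <def:Origin Type="eDT"/>
      </ItemDef>

      <!-- Result collected on the CRF -->
      <ItemDef OID="IT.LB.LBORRES.PREG" Name="LBORRES" DataType="text" Length="8">
        <def:Origin Type="CRF"/>
      </ItemDef>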
  31. The MSG standard advises to annotate only unique CRF pages, and include them only. How will you retrieve the page numbers in these conditions?
    Only annotated pages will be pre-populated by CRF scanning.
  32. How do you populate attributes (dataset labels, variable labels, etc.) for ADaM datasets, since not all are pre-defined by CDISC?
    There are two basic approaches to creating a define.xml file:
    1. “Descriptive”, when you already have data. Creating define.xml from data is a data-driven process: everything already assigned in the actual data is populated into a define.xml draft. The basic concept is that define.xml describes the actual data, not the standards; if you have a problem in your data, you need to fix the data, not the define.xml.
    2. “Prescriptive”, when you create define.xml as the specification for future study data. You can use CDISC standards, your company standards, or previous studies to generate a define.xml draft.
  33. If using standards built around SDTM, wouldn’t that help in consistently good define.xml data?
    Building EDC-to-SDTM around standards certainly allows for consistency and efficiency. The process of generating Define.xml is a different process: one must first decide whether to generate Define.xml prescriptively (before the study) or descriptively (after the study), and then build a process around that.
  34. Does the define.pdf have the same hyperlink as the define.xml?
    Define.pdf and Define.xml have basically the same hyperlinks.
  35. How do you maintain the consistency of the content of the SDTM dataset with define.xml?
    Define.xml is used during validation to ensure that the collected data conform to their definition.
  36. Does the community version have the capability to create define.xml 2.0?
    OpenCDISC Community will create define.xml v2.0 only. It also includes a migration process from your define.xml v1.0 to v2.0.
  37. Minor question: I once had a tool to grab pages from an aCRF. It worked for portrait pages but couldn’t pick up landscape ones. Does your tool have that limitation?
    Import blank CRF function does not consider landscape vs. portrait.
  38. I assume that if one page in the blankcrf has a link to another annotated page within the blankcrf, then the tool wouldn’t be able to populate the Origin with the page number from the link. Please confirm. Thanks.
    The Import CRF function does not follow links at this point; it grabs the page number where the annotation exists.
  39. How does your platform compare to the SAS/toolkit?
    The products are not in the same category.  
  40. When you mention versions — define.xml version 1.0 vs. define.xml version 2.0 — are those references to industry/FDA specification versions, or are they your Pinnacle tool versions?
    The references to Define.xml 1.0 and 2.0 were all referencing the Define.xml Standard specifications.
  41. Can the Reviewer Guide documents be named ‘sdrg.pdf’ and ‘adrg.pdf’, as suggested by PhUSE? Does it matter what the file is named as long as the link works?
    The reviewer guide name is flexible and not dictated by the system.
  42. What are the differences between validating the define.xml with OpenCDISC versus another tool such as Oxygen?
    The tools are really not in the same category, so it is hard to compare. In short, Oxygen and similar XML editors validate the XML structure only. OpenCDISC validates the Define.xml structure and content, as well as its consistency with the study datasets.
  43. Apart from OpenCDISC validation, is there any separate validation needed? Some kind of manual checks?
    Sorry, your question is not clear. Could you contact the OpenCDISC or Pinnacle 21 team?
  44. How is variable length handled? Is it good to define it in an Excel spec up-front? Is there checking against actual data values later, with updates made to the define automatically?
    Variable length is determined from the XPT file properties when it is uploaded. For Value Level metadata, Length is defined by the actual maximum value length. E.g., the length of SUPPAE.QVAL may be 200, but the length of SUPPAE.QVAL where QNAM=“AETRTEM” (AE Treatment Emergent Flag) is 1.
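    In Define-XML 2.0 terms, a sketch of the QVAL example above with assumed OIDs:

      <!-- Sketch: variable-level length from the XPT file -->
      <ItemDef OID="IT.SUPPAE.QVAL" Name="QVAL" DataType="text" Length="200"/>

      <!-- Value-level item where QNAM = "AETRTEM": actual max length is 1 -->
      <ItemDef OID="IT.SUPPAE.QVAL.AETRTEM" Name="QVAL" DataType="text" Length="1"/>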
  45. Can short enough comments for “Derived” variables be moved to a Comment instead of always requiring a hyperlink to the Method section?
    No. According to the define.xml v2.0 standard, all “Derived” variables must have a populated Method. This approach ensures consistency in the metadata.
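    In Define-XML 2.0 terms, a sketch with hypothetical OIDs: the derived variable’s ItemRef carries a MethodOID, and the text that might otherwise be a comment goes into the method’s Description:

      <!-- Sketch: Origin=Derived, so the ItemRef must reference a method -->
      <ItemRef ItemOID="IT.DM.AGE" Mandatory="No" MethodOID="MT.DM.AGE"/>

      <MethodDef OID="MT.DM.AGE" Name="Algorithm for AGE" Type="Computation">
        <Description>
          <TranslatedText xml:lang="en">Age in completed years from BRTHDTC to RFSTDTC.</TranslatedText>
        </Description>
      </MethodDef>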
  46. This is great, but, a lot of information to take in at once. Is there a user manual?
    There is a user manual available for OpenCDISC Enterprise clients. We are considering options for the OpenCDISC Community version.
  47. Is there a limitation on handling large data, such as lab data?
    We have loaded large production files. The limit is unknown at this point, but certainly there are limits. Generally, as we did with the Validator product, we address these on an as-needed basis.
  48. Does the system include a check that the key variables must make all observations unique?
    We verify that all variables in the key sequence are variables of the domain, but we do NOT yet check that the key makes the records in the dataset unique. Such checks are on our implementation list; however, they require high-quality define.xml files, and based on our experience the industry is not ready for that yet.
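    For context, a sketch (assumed OIDs) of where the key sequence lives in Define-XML 2.0: the KeySequence attributes on the domain’s ItemRefs declare the natural key whose combination should make each record unique:

      <!-- Sketch: KeySequence declares the variables that should
           uniquely identify each record -->
      <ItemGroupDef OID="IG.AE" Name="AE" Domain="AE" Repeating="Yes"
                    Purpose="Tabulation" def:Class="EVENTS"
                    def:Structure="One record per adverse event per subject">
        <!-- Description (dataset label) omitted for brevity -->
        <ItemRef ItemOID="IT.STUDYID" OrderNumber="1" Mandatory="Yes" KeySequence="1"/>
        <ItemRef ItemOID="IT.AE.USUBJID" OrderNumber="2" Mandatory="Yes" KeySequence="2"/>
        <ItemRef ItemOID="IT.AE.AEDECOD" OrderNumber="3" Mandatory="No" KeySequence="3"/>
        <ItemRef ItemOID="IT.AE.AESTDTC" OrderNumber="4" Mandatory="No" KeySequence="4"/>
      </ItemGroupDef>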
  49. What is the difference between define.xml and our specifications from raw to SDTM?
    While define.xml and SDTM metadata specs share a lot of metadata, there is a large difference between them. Recognizing that many organizations have large specs, often in Excel, defining EDC-to-SDTM conversions, we made Define.xml Designer compatible with Excel in many ways (including cutting and pasting, so that you can import information from your Excel specifications into Define.xml Designer). Specific examples you may be interested in are Datasets, Variables, Methods, Codelists, and Value Level. A typical difference is that your “raw-to-SDTM” specifications provide derivation methods based on EDC variables, rather than SDTM.
  50. You can also remove codelist items, right? Just like in excel.
    Yes, any codelist item (or any other item) can be deleted in the application.
  51. What if the same information on the CRF goes into multiple variables? Do we have to annotate all variables on that CRF page? E.g., the same date goes to RFSTDTC, EXSTDTC, DSSTDTC, SVSTDTC, etc.
    Yes, you need to annotate the same CRF field with the different SDTM variables.
  52. What if the profile picks up the length of a variable as 52, but you actually made the variable length 60? Shouldn’t it include the 60?
    Yes, the variable length should be 60 in this case. The tool generates the variable length value based on the length defined in the SAS XPORT file, so it would be 60 in your example. For Value Level metadata, Length is defined by the actual maximum value length; e.g., the length of SUPPAE.QVAL may be 200, but the length of SUPPAE.QVAL where QNAM=“AETRTEM” (AE Treatment Emergent Flag) is 1.
