The Do's and Don'ts of Define.xml

December 10, 2018

Define.xml is “arguably the most important part of the electronic dataset submission for regulatory review,” according to the FDA’s Technical Conformance Guide. It helps reviewers become familiar with the study data, its origins and derivations, and the sponsor-specific implementation of CDISC standards.

We recently hosted a webinar on the do's and don'ts of define.xml. You can watch the recording or read the summary below.

As the role of define.xml in the review process grows, sponsors should take care that all of the information in it is clear and concise. CDISC standards are open to interpretation, and sponsors often maintain internal standards that build on CDISC concepts and standards. A review team analyzing the data package is unfamiliar not only with the study but also with the sponsor's internal standards, so it needs a define.xml that correctly and clearly describes all origins and derivations.

It can be difficult to decide what content to include in a define.xml file. A file with incomplete or missing information increases the time reviewers need to familiarize themselves with the data package.

Commonly, the define.xml is created just before submission, usually by someone who is very close to the data and derivations. In other cases, it is created while the study is still ongoing, when data and derivations often change as protocol amendments are made. Both cases require careful quality control and review to avoid common mistakes.

Below are Do's and Don'ts to serve as a checklist when creating and preparing your define.xml for submission.

General Do's and Don'ts


Do explain the data – the sole purpose of the define.xml is to explain the data, its origins and derivations, and the sponsor-specific implementation of CDISC standards

Do keep it concise – take time to ensure only relevant information is provided in the define.xml


Don’t make your define.xml too complicated – remember that reviewers who are not familiar with your data or mappings will need to navigate it

Don’t assume the end users of your define.xml have CDISC knowledge – take the time to make sure your define.xml is clear, concise, and consumable for the review team

Don’t assume everyone reading your define.xml has programming knowledge – the define.xml is intended to be both machine and human readable and contains information that different members of the review team may need to reference

Derivations and Comments


Do define all derivations concisely – derivations in the define.xml should be clear and precise enough for a reviewer to replicate them with the same results

Do describe all internal standards – this avoids confusion and gives the review team the information it needs to understand the sponsor's standards during the review


Don’t cut corners and list a derived variable as Origin = Assigned – it is always best to provide the derivation

Don’t have any raw data references in your derivations or comments – the review team does not have access to your raw database

Don’t blindly copy derivations and comments from the mapping spec into the define.xml – specs often contain programming code and raw data references
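
Some of these checks can be partly automated before a human review. The sketch below is a hypothetical Python lint over variable-level metadata exported from your define.xml tooling. The VariableMetadata record, the lint_variable function, and the regular expressions (including the assumption that raw data lives in a SAS library called "raw") are illustrative assumptions to adapt, not part of any standard or product. It flags derived variables with no derivation, variables that carry a derivation but are labeled Origin = Assigned, and derivation or comment text that looks like a raw data reference or program code.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical patterns for text that reviewers cannot use. The "raw" library
# name and the SAS-flavoured patterns are assumptions; adapt them to your setup.
SUSPECT_PATTERNS = {
    # libname-style reference to a raw dataset, e.g. "raw.ae"
    "raw dataset reference": re.compile(r"\braw\.\w+", re.IGNORECASE),
    # SAS DATA or PROC step header, e.g. "data ae2;" or "proc sort;"
    "SAS step": re.compile(r"\b(?:data|proc)\s+\w+\s*;", re.IGNORECASE),
    # assignment statement ending in a semicolon, e.g. "x = y * 2;"
    "programming statement": re.compile(r"\w+\s*=\s*[^;]+;\s*$"),
}


@dataclass
class VariableMetadata:
    """Hypothetical variable-level record exported from define.xml tooling."""
    name: str           # variable name, e.g. "AESTDY"
    origin: str         # origin type, e.g. "CRF", "Derived", "Assigned"
    method: str = ""    # derivation text shown to reviewers
    comment: str = ""   # comment text shown to reviewers


def lint_variable(var: VariableMetadata) -> list[str]:
    """Return human-readable findings for one variable; empty means no flags."""
    findings = []
    has_method = bool(var.method.strip())
    if var.origin == "Derived" and not has_method:
        findings.append(f"{var.name}: Origin is Derived but no derivation is provided")
    if var.origin == "Assigned" and has_method:
        findings.append(
            f"{var.name}: has a derivation but Origin is Assigned; consider Origin = Derived"
        )
    # Scan both reviewer-facing text fields for raw references or code.
    for field_name, text in (("method", var.method), ("comment", var.comment)):
        for label, pattern in SUSPECT_PATTERNS.items():
            if pattern.search(text):
                findings.append(f"{var.name} {field_name}: possible {label}")
    return findings
```

A lint like this only surfaces candidates. Someone who knows the study still has to read each derivation and confirm that a reviewer could reproduce it from the text alone.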

Codelists

Do create a codelist for each variable populated from a list of predefined terms – variables collected via drop-down lists, or that have a limited, predefined set of terms, should have an associated codelist in the define.xml

Do create codelists that describe the data collection process and include all planned terms for the variable – all CRF options should be included in the codelists, not just those present in the data


Don’t have one UNIT codelist for all unit variables/values across the data package – when a reviewer clicks on the EXDOSU codelist, they want to see only the units used in the EX domain, not units from EX, LB, VS, etc.

Don’t create codelists containing all values from CDISC CT when many of them are irrelevant to your data package – similarly, when a reviewer clicks on the EXDOSU codelist, they do not want to see all 500+ units from CDISC CT

Don’t clutter your define.xml with codelists meant for other data packages – including ADaM codelists in an SDTM define.xml, or vice versa, makes the file larger and harder to navigate
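
The codelist rules above can be sketched in a few lines. This minimal Python example assumes you have each variable's planned CRF options available as a list. It builds one variable-specific codelist from the planned terms, rather than from the values observed in the data or from the full CDISC CT codelist, and stops if the data contains a value outside the plan. It emits a simplified ODM CodeList with EnumeratedItem children; it omits the ODM namespace, the CodeListItem/Decode entries, and the NCI code aliases a complete define.xml would typically carry, and the build_codelist function and the "CL.EXDOSU" OID convention are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def build_codelist(oid: str, name: str, planned_terms: list[str],
                   observed_values: list[str]) -> ET.Element:
    """Build a simplified, variable-specific ODM CodeList.

    planned_terms: every option planned for the variable (e.g. all CRF choices),
    whether or not it appears in the data. observed_values: values in the data,
    used only as a cross-check.
    """
    planned = list(dict.fromkeys(planned_terms))  # de-duplicate, keep CRF order
    unexpected = sorted(set(observed_values) - set(planned))
    if unexpected:
        # Data values outside the planned list need to be resolved, not silently added.
        raise ValueError(f"{oid}: values not in the planned terms: {unexpected}")
    codelist = ET.Element("CodeList", OID=oid, Name=name, DataType="text")
    for term in planned:
        ET.SubElement(codelist, "EnumeratedItem", CodedValue=term)
    return codelist


# Hypothetical EXDOSU example: the CRF offers two units and the data uses only one.
# The codelist still lists both, and nothing from LB, VS, or the rest of CDISC CT.
exdosu = build_codelist("CL.EXDOSU", "Unit (EXDOSU)",
                        planned_terms=["mg", "mg/kg"], observed_values=["mg", "mg"])
print(ET.tostring(exdosu, encoding="unicode"))
```

Driving the codelist from the planned terms is what keeps "mg/kg" in the list even though no subject received a weight-based dose, which tells the reviewer how the data was collected rather than only what happened to be recorded.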

Think you’ve finished with your define.xml?


Do review your finalized define.xml with the stylesheet applied – building this step into your internal review process often catches overlooked details

Do create a separate PDF file for long derivations that require formatting – the define.xml standard does not support formatting such as line breaks, numbered lists, or bullet points


Don’t just click the “generate define.xml” button in your software tool – common-sense QC steps go a long way. Spend time reviewing your define.xml in a web browser with the stylesheet applied.
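
Part of that QC can be scripted. The sketch below uses only Python's standard library to check a finished define.xml for an xml-stylesheet processing instruction, for CodeListRef elements that point at codelists that are not defined, and for codelists that nothing references (a common symptom of codelists carried over from another data package). The qc_define function and its messages are hypothetical, and the stylesheet check is a plain text search that assumes a UTF-8 or ASCII file. It complements, and does not replace, reading the file in a browser with the stylesheet applied.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so checks work with or without the ODM namespace."""
    return tag.rsplit("}", 1)[-1]


def qc_define(xml_bytes: bytes) -> list[str]:
    """Return findings for a define.xml document given as raw bytes."""
    findings = []
    # Simple text check for the processing instruction that links the stylesheet.
    if b"<?xml-stylesheet" not in xml_bytes:
        findings.append("No xml-stylesheet processing instruction found")
    root = ET.fromstring(xml_bytes)
    defined, referenced = set(), set()
    for element in root.iter():
        name = local_name(element.tag)
        if name == "CodeList" and element.get("OID"):
            defined.add(element.get("OID"))
        elif name == "CodeListRef" and element.get("CodeListOID"):
            referenced.add(element.get("CodeListOID"))
    # Dangling references: an ItemDef points at a codelist that does not exist.
    for oid in sorted(referenced - defined):
        findings.append(f"CodeListRef points to undefined codelist {oid}")
    # Orphans: codelists no variable uses, e.g. ADaM codelists in an SDTM define.xml.
    for oid in sorted(defined - referenced):
        findings.append(f"Codelist {oid} is never referenced; is it meant for another data package?")
    return findings
```

Assuming the file is on disk, qc_define(Path("define.xml").read_bytes()), with Path imported from pathlib, returns the findings as a list of strings for someone to review.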

If you’re a Pinnacle 21 Enterprise user, you already know the advantages and time-saving steps Define.xml Designer offers to help make your define.xml ready for submission. If you'd like help getting more out of it, please contact our Customer Success team, and we will work with you on using the tool and applying best practices.

If you’re not a Pinnacle 21 Enterprise user, please reach out for a demo and additional details on how the Define.xml Designer can help you get review ready.
