Using Real World Data in Clinical Study Submissions

June 17, 2024

Changing technology in health sciences means more data sources and more study designs than were available in the past. The good news? All this additional data provides greater evidence and insights into the benefits and risks of a medical product.

This sharp increase in the use of non-randomized study designs, electronic health records (EHRs), registries, and other observational studies has placed a spotlight on the need to consider alternatives to the CDISC standards developed 20+ years ago to support marketing applications.

In simple terms, Real World Data (RWD) is the clinical data we generate that provides evidence on the use, potential benefits, and risks of medical products. It is clinical data that is not generated under a protocol. So where does RWD come from? It flows from a variety of sources: electronic health records, medical claims and billing data, product, disease, and population-based registries, mobile devices, digital health technology devices such as continuous glucose monitors (CGMs), and even social media.

In 2019, 75% of approved NDAs included an RWE study. By 2021, that number had expanded to 95%, and given trends in incorporating RWE within the regulatory environment, it is safe to predict that almost all marketing applications will continue to include RWE.

Current submission standards for Real World Data

In 2021, the FDA issued a draft guidance on Data Standards for Drug and Biological Products, which was finalized in December of 2023 and contained guidance for submissions containing RWD. Essentially, this guidance outlines the standards required when submitting RWD and other non-traditional data in support of a marketing application. It requires that data be submitted using the standards documented in the FDA Data Standards Catalog.

This means that, for now, all data must be submitted using CDISC standards, so RWD must be transformed or converted to meet the CDISC standards specified in the guidance.

The FDA acknowledges that the current catalog may not reflect data derived from real world studies and other non-traditional study designs and is considering updates to the current catalog. This presents numerous challenges to sponsors, as CDISC standards were designed for RCT data.

Challenges of submitting Real World Data

Randomized clinical trial (RCT) data is collected under the strict supervision of a protocol; it is monitored and cleaned, and there is an expectation of uniformity for data collected across sites. The biggest challenge is that RWD is not collected under a protocol, which provides a blueprint for how a research study is to be conducted. The protocol documents when, where, and how each assessment is conducted.

Typically, a protocol includes:

  • a data management plan that provides granular details for collecting data
  • a copy of the annotated CRF
  • a description of all the data elements and their attributes
  • edit checks for each data element
  • a data monitoring and data cleaning plan

On the other hand, RWD is collected in a real world setting and comes with none of the above documentation. It is not collected to address a specific research question; rather, it's collected during routine health care, and that means it's not monitored and it's not clean.

Challenge: Data Source/Data Provenance

RCT data comes from sponsor-designed systems and is collected under the supervision of the sponsor or sponsor's agent. Sponsors have control over the entire process, from designing the data collection system to transforming the data into CDISC standards and eventually submitting it.

Typically, RWD is not collected by the sponsor; rather, it is acquired. It's collected in real world settings during routine health care by a variety of clinical staff. This data is often curated by vendors, and this presents a challenge in documenting how the data traveled from source to submission.

Challenge: Submitting patient-level data

Sponsors must be able to submit patient-level data. In a clinical study, this is not an issue because sponsors have control of that data from beginning to end. This may be an issue for RWD because the vendor may hold only aggregate data and might not be able to provide patient-level data.

Challenge: Data use

Many remember RCTs with data collected uniformly across sites, with standard CRF elements and by site personnel trained to conduct an assessment and enter data in a uniform manner. This data was entered into a single EDC or other data collection system controlled by the sponsor. With RWD, there's no expectation that the data is uniform or standardized across health care systems.

Challenge: Harmonization in health care terminology

In health care, different concepts may refer to the same thing, and many similar concepts have different meanings and different controlled terminology. Terminology from some RWD studies must be recoded to meet current required terminology and submission standards. This can make source data verification and traceability a challenge for regulatory reviewers.

Gaps in current submission standards

  • Traceability: It's not clear what traceability is going to be required by regulatory reviewers, but we know that some traceability back to the source is needed. This is a challenge for reviewers as these standards haven't been established.
  • Data elements and domains: Currently, CDISC requires certain domains and data elements contained in those domains when submitting either RCT data or RWD. However, there’s a need for additional domains and concepts found in RWD that are not found in randomized clinical trials. Furthermore, many data elements currently in CDISC really aren’t relevant for RWD.
  • Exposure: Exposure is key data when it comes to a marketing application, but data for exposure exists in multiple places when using RWD. It is essential to determine what variables are needed and what are the core variables to derive exposure and to determine the quality and the certainty of each of these records.
  • Trial Summary dataset: This dataset is required for regulatory submission, so additional parameters relevant for Real-World Studies should be defined. There may be some overlap.
  • Core variables: CDISC defines core variables for RCT study designs, but has not really defined a set of core variables for RWD. This is a vital need that should be addressed sooner rather than later.
  • Terminology: RWD designs contain different terminology that needs to be harmonized for marketing applications.

Recommendations to facilitate regulatory review

We can learn a lot by looking at some previously submitted RWE data, especially from studies the FDA deemed inadequate for regulatory decision making. The following recommendations can be used to avoid common mistakes that lead to RWE studies being rejected or not considered and avoid delays in regulatory review of marketing applications containing RWD.

Early communication with both FDA and vendors

RWD is not collected under a protocol and not collected to address a specific research question, so it's important to communicate with the FDA and assure them that the RWD is fit for purpose, non-biased, appropriately addresses the research question at hand, and that the population selected in the RWD is appropriate to address the study question.

Is your RWD fit for use? Is your study design adequate? Is your rationale for choosing the data source that you did acceptable? Can your data source address the study questions at hand? And of course, is the data reviewable? Notify the FDA that your submission will include RWD and submit whatever study plans or statistical analysis plans to get as much feedback as possible before starting a study.

In addition to communicating early with the FDA, it's also a good idea to communicate early, and often, with vendors.

Determine the study start date in Real World Data

To pass FDA technical rejection criteria, you must submit a valid TS domain with the study start date. This is straightforward for a randomized clinical trial, as it is typically the date the first patient signed informed consent or first patient visit.

With real world data, this can be more complex. You may be using historical data collected five to ten years ago, so what should the study start date be for a retrospective RWD study?

We recommend using an "administrative start date" for RWD studies, which should be determined by the sponsor. This will typically be when the inclusion/exclusion criteria are finalized or the date that the RWD study protocol was finalized.
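As a rough illustration, the administrative start date can be written as a single Trial Summary record. The sketch below assumes the TS parameter code SSTDTC (Study Start Date) and an ISO 8601 date value. The study identifier, the example date, and the helper name `make_study_start_record` are hypothetical, and choosing the date itself remains a sponsor decision.

```python
from datetime import date


def make_study_start_record(study_id: str, admin_start: date, seq: int = 1) -> dict:
    """Build one TS (Trial Summary) row carrying the study start date.

    admin_start is the sponsor-chosen administrative start date, for example
    the date the RWD study protocol was finalized.
    """
    return {
        "STUDYID": study_id,
        "DOMAIN": "TS",
        "TSSEQ": seq,
        "TSPARMCD": "SSTDTC",
        "TSPARM": "Study Start Date",
        "TSVAL": admin_start.isoformat(),  # ISO 8601 date, YYYY-MM-DD
    }


# Hypothetical example: RWD study protocol finalized on 15 April 2023
ts_row = make_study_start_record("RWD-001", date(2023, 4, 15))
print(ts_row["TSPARMCD"], ts_row["TSVAL"])
```

Whichever date you choose, document the rationale in the reviewer's guide so the reviewer can see why it does not match the dates in the historical data.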

Provide data confidence to reviewers

Remember, RWD is collected in real world settings. It is not monitored. It is not always clean.

Give regulatory reviewers confidence in the quality of submitted data by providing multiple sources for the same data. For RCTs, the protocol specifies in great detail who, how, and where assessments are performed. Furthermore, our data is cleaned and monitored as documented in the protocol.

However, RWD is not collected under a protocol, and it's not monitored. As a result, regulatory reviewers need more information about the context of an assessment, e.g. how it's performed, who it's performed by, where it's performed, etc. This will help increase their confidence in the data.

Clean up differences in terminology

Clinical research and health care can use different names for the same concept, the same name for different concepts, and different controlled terminology for each. One major issue is that current submission standards do not contain standard data elements to represent both the source and submission terminology or coding systems.

This makes source data verification and traceability a big challenge for regulatory reviewers. We have two recommendations. First, create new supplemental variables in SDTM to capture the source coding system, the source verbatim term, and the source code. Second, explain the process used to convert the source terminology to submission-compliant terminology.
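One way to picture the supplemental variables is as rows in a SUPP-- (supplemental qualifier) dataset linked to the parent record. The sketch below is only an illustration: the qualifier names `SRCSYS`, `SRCCODE`, and `SRCTERM`, their labels, and the example subject are hypothetical, not CDISC-defined, and QORIG and QEVAL are left out for brevity.

```python
def source_term_supp_rows(study_id, rdomain, usubjid, id_var, id_val,
                          source_system, source_code, source_term):
    """Return SUPP-- rows that keep the source coding details for one parent record."""
    # Hypothetical qualifier names and labels (QNAM is limited to 8 characters).
    qualifiers = [
        ("SRCSYS", "Source Coding System", source_system),
        ("SRCCODE", "Source Code", source_code),
        ("SRCTERM", "Source Verbatim Term", source_term),
    ]
    return [
        {
            "STUDYID": study_id,
            "RDOMAIN": rdomain,        # parent domain, e.g. MH
            "USUBJID": usubjid,
            "IDVAR": id_var,           # variable that identifies the parent record
            "IDVARVAL": str(id_val),
            "QNAM": qnam,
            "QLABEL": qlabel,
            "QVAL": str(value),
        }
        for qnam, qlabel, value in qualifiers
    ]


# Hypothetical EHR diagnosis that was recoded for submission
rows = source_term_supp_rows(
    "RWD-001", "MH", "RWD-001-0001", "MHSEQ", 1,
    source_system="ICD-10-CM",
    source_code="E11.9",
    source_term="T2DM",
)
for row in rows:
    print(row["QNAM"], "=", row["QVAL"])
```

Keeping all three source values next to the recoded term lets a reviewer trace each submitted term back to what was recorded in the source system.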

Devise a Data Management Plan

A data management plan can provide regulatory reviewers with documentation on your approach for transforming source data to CDISC standards. You can describe your approach in a protocol and in a final study report. Documentation should include a data dictionary that documents the definition of every data element used.
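A simple completeness check can help confirm that the data dictionary really covers every data element used. The sketch below is a minimal example under assumed inputs: the element names, the dictionary layout, and the helper `undocumented_elements` are all hypothetical.

```python
def undocumented_elements(dataset_columns, data_dictionary):
    """Return data elements that appear in the data but have no usable definition."""
    return sorted(
        name for name in dataset_columns
        if not data_dictionary.get(name, {}).get("definition", "").strip()
    )


# Hypothetical source elements and dictionary entries
data_dictionary = {
    "hba1c_value": {"definition": "Glycated hemoglobin result", "unit": "%"},
    "encounter_date": {"definition": "Date of the health care encounter"},
    "dx_code": {"definition": ""},  # entry exists but the definition is blank
}
columns = ["hba1c_value", "encounter_date", "dx_code", "rx_fill_date"]

missing = undocumented_elements(columns, data_dictionary)
print(missing)
```

Any element the check returns either needs a definition added or should be dropped from the datasets before submission.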

The reviewer's guide is a suitable place to describe your overall approach, and the define file is where you should provide the technical details of how you mapped the source data to the submission format. It is worth repeating that the bar for traceability for real world data is the same as for randomized clinical data. This is something to keep in mind as you are preparing a submission containing real world data.

In addition to creating a data management plan for real world data, create an annotated CRF for the RWD you're going to submit; it will help reviewers understand the meaning of a given concept and show how various data elements are related to one another, and can be used to show the original and possible values of a data element.

Long term considerations and solutions

Current submission standards are inadequate for representing RWD because CDISC standards were developed for RCTs or interventional studies, and our current standards are built on older technologies such as SAS transport. We've also yet to identify what core variables are needed for RWD submission. Many of the business and validation rules for RCTs do not apply to RWD, and that means RWD must be repackaged and transformed to meet regulatory and submission requirements. And RWD can be more complex than randomized clinical data.

As an industry, we need better data linking solutions, as our current methods are both cumbersome and outdated. We should reexamine the current submission standards and ask some new questions like, "Could CDISC standards adequately represent randomized clinical trials AND RWD AND other observational study designs?"

Submitting RWD presents several challenges for sponsors and the FDA due to gaps in current standards for RWD submissions. However, the bar for assessing RWD is the same as for data from randomized clinical study designs.

Better documentation on data, provenance, and source data elements is essential to facilitate regulatory review. Complex transformations place an unnecessary burden on sponsors and are likely to produce numerous errors. As an industry, we should take a step back and ask, "Are we asking the right questions to solve these challenges?"

Your questions, answered

Q: Do you have any recommendations for checking standards compliance with validation tools such as P21 - it seems all the challenges described could mean that the output from a compliance tool is unmanageable (e.g. volume of data issues, terminology issues, etc)?

A: At this point in time, business rules have not been developed for RWD and most non-interventional designs. Once we develop "profiles" for submitting RWD, we can build business rules for RWD submissions and check for compliance with these rules in P21.

Q: Because RWD is not fully compliant with CDISC rules, what would your advice be to make our P21 checks on RWD more efficient and less time consuming?

A: Many of these non-compliance issues are similar across studies and marketing applications. Develop a list or dictionary of non-compliance issues and explanations that can then be re-used to explain issues in future studies. Also, depending on the data type and source, P21E's Data Exchange module may help get non-CRF data, especially that of external vendors, into a compliant format much faster.
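A reusable dictionary of explanations can be as simple as a lookup keyed by validation rule. The sketch below is illustrative only: the rule IDs, the explanation text, and the issue layout are hypothetical and do not reflect the format of any particular validation report.

```python
# Hypothetical rule IDs and explanations; real IDs come from your validation report.
STANDARD_EXPLANATIONS = {
    "RULE-A": "Visit variables are not populated because RWD has no protocol-defined visits.",
    "RULE-B": "Source terminology is retained in supplemental variables for traceability.",
}


def explain_issues(issues, explanations):
    """Attach a stored explanation to each issue and list rule IDs still needing review."""
    explained, unexplained = [], set()
    for issue in issues:
        text = explanations.get(issue["rule_id"])
        if text is None:
            unexplained.add(issue["rule_id"])
        else:
            explained.append({**issue, "explanation": text})
    return explained, sorted(unexplained)


# Hypothetical issues pulled from a validation report
issues = [
    {"rule_id": "RULE-A", "dataset": "VS"},
    {"rule_id": "RULE-A", "dataset": "LB"},
    {"rule_id": "RULE-C", "dataset": "EX"},
]
explained, needs_review = explain_issues(issues, STANDARD_EXPLANATIONS)
print(len(explained), "issues explained; still to review:", needs_review)
```

Rules that land in the review list get a new explanation written once, which is then added to the dictionary for the next study.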

Q: Has RWD so far been used in Phase 1 and Phase 3 studies for new drugs, or only in post-marketing studies? Any stats on these?

A: Yes, over 90% of approved BLAs/NDAs included a RWE study in 2021.

Q: Any thoughts on how to deal with the challenge of FDA requesting patient-level data when data cannot be shared due to e.g. GDPR, privacy laws?

A: We recommend discussing this with FDA and agreeing on what will be submitted before starting the study, while you share study plans and the SAP with FDA.

Q: Shall we submit RWE study data as a regular format?

A: You should submit RWD using CDISC standards and add as many non-standard variables and domains as needed. Non-standard variables/domains should be documented thoroughly in both the define file and reviewer's guides.

Q: Does FDA require both raw and analytic datasets when submitting RWD datasets?

A: Yes. The requirement is the same as for Randomized Clinical Trials: The raw data should be submitted using the CDISC SDTM standard and the analysis datasets should be submitted using the CDISC ADaM standard.

Q: What types of RWE in the regulatory submission would not require submission of patient-level data? (e.g., RWE that provide contextual information about unmet need or similar but isn't the evidence of efficacy/safety)

A: One example would be the raw data coming from a Digital Health Technology (DHT) device. The raw source data is often too large to submit. However, the raw data can be requested or audited by FDA at a later time.

Q: Are you aware of exceptions made by FDA to CDISC format since the final guidance was published in December 2023?

A: I am not aware of waivers granted since the final guidance was published in December 2023. If you would like to submit published natural history studies as supporting evidence in a format other than CDISC, I recommend discussing this with FDA as early as possible. Try to agree on the format for this data before starting the study while sharing study plans and the SAP with FDA.

Q: Do you have any information about BIMO requirements for RWD and site inspections to original medical record during an inspection?

A: This has not been documented yet in the series of RWE guidances published by FDA.

Q: Is Certara planning to provide validation tools for FHIR or OMOP standards, etc.?

A: Certara will continue to provide validation tools for all study data standards documented in the FDA Data Standards catalog. So, if FHIR or OMOP is added to the catalog, Certara tools will support validation for study data using these standards.

Q: Are all phase I-III studies also considered as RWD, or only phase 4 studies?

A: Yes, RWD is not limited to post-marketing. RWD can be used to support or provide evidence of safety and/or efficacy for Phase I-III studies.

Q: Must the terminology from RWD be recoded to MedDRA / WHO Drug?

A: Yes, that is the current requirement.

Q: On the slide discussing challenges with Exposure data, the suggestion was made to use custom domains. What kind of information would you suggest or envision being stored in a custom domain?

A: Exposure data from EHR records differs from exposure data in RCTs. Exposure is typically collected in four record types: Medication Request, Medication Dispensed, Medication Statement, and Medication Administered. These data can also be supplemented by claims data. The CDISC EX and EC domains can capture some of this data but were not designed to capture data from EHR records. Claims data may be the best example of data that should be captured in a custom domain. Reviewers are also especially interested in the record type the source exposure data came from. This should be submitted as a non-standard variable.
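To make this concrete, here is a minimal sketch of routing exposure records by source record type. It assumes EHR medication records go to EX and claims records go to a custom domain. The custom domain code `XC`, the non-standard variable name `SRCRTYPE`, and the input record layout are hypothetical.

```python
EHR_RECORD_TYPES = {
    "Medication Request",
    "Medication Dispensed",
    "Medication Statement",
    "Medication Administered",
}


def route_exposure_record(record: dict) -> dict:
    """Pick a target domain and keep the source record type with the record."""
    record_type = record["record_type"]
    if record_type in EHR_RECORD_TYPES:
        domain = "EX"
    elif record_type == "Claims":
        domain = "XC"  # hypothetical custom domain for claims data
    else:
        raise ValueError(f"Unknown exposure record type: {record_type!r}")
    return {
        "DOMAIN": domain,
        "USUBJID": record["usubjid"],
        "TREATMENT": record["drug"],   # illustrative field, not an SDTM variable name
        "SRCRTYPE": record_type,       # hypothetical non-standard variable
    }


# Hypothetical source records for one patient
source_records = [
    {"usubjid": "RWD-001-0001", "drug": "METFORMIN", "record_type": "Medication Dispensed"},
    {"usubjid": "RWD-001-0001", "drug": "METFORMIN", "record_type": "Claims"},
]
routed = [route_exposure_record(r) for r in source_records]
for row in routed:
    print(row["DOMAIN"], row["SRCRTYPE"])
```

Carrying the record type on every row lets reviewers judge how certain each exposure record is, since a dispensed or administered medication is stronger evidence of exposure than a request.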

Q: Is the FDA accepting OMOP standards?

A: The FDA does not accept OMOP at this time, only standards documented in the Data Standards Catalog. For now, all clinical data must be submitted using CDISC standards.

About Jeff Abolafia

Jeff is the Director of Product Innovation here at Certara and an innovative life science professional with over 35 years of experience in data standards, statistical computing, and data management in both academic and nonacademic clinical research environments.

He is recognized consistently for leadership in clinical research, regulatory submission strategy, end-to-end standards implementation, team building, and innovative process improvements.

