At XClinical, we see the digital revolution’s potential to radically improve global health care, and we are prepared for it.
But this transformational shift also brings a sizable challenge: how to collect, store, and manage the proliferation of data created by digital tracking and measurement devices and derived from a growing number and variety of disparate sources in an array of languages, platforms, and formats. Real-world data (RWD) and real-world evidence (RWE) have become increasingly important and are having a groundbreaking impact on clinical trials. Now consider the wearable device. As wearables become more affordable and accessible and collect health information in real time, they add to an already vast and varied data stream.
Large amounts of data call for big solutions, and at the forefront is the power of artificial intelligence (AI) and machine learning (ML) technologies to provide scalable processing, data-driven decision-making, and advanced predictive analytics.
But even the most powerful algorithms succeed only when they are given clean, coherent, and highly structured data.
So how do we ensure data is fit for purpose? Here, we look at several of the key issues that lie ahead.
What Are the Data Challenges?
While we already know how to generate and gather data, how do we rigorously assure its quality? If we’re cognizant that even our most powerful AI machines are only as good as the data we provide them, then our first step toward the future is resolving our data deficiencies and challenges, including:
- Data collection disparity: Data is derived from a wide range of sources, from manual transcriptions and electronic health records to digital wearable devices, which means varying formats and conventions are not always transparent or even readily accessible.
- Data gone wild: Data keeps proliferating without standardization in data collection, data formats, nomenclatures, and terminologies.
- Data quality issues: “Bad” data occurs in a myriad of ways during the clinical process, but one significant constant is the human factor. Our errors, like cut-and-paste mistakes, lead to differences in coding and recording information.
- Data cleaning: Ideally, data must undergo a cleaning process in which inaccurate or incomplete fields and records are corrected. In reality, cleaning requires tools, training, and time, which can impact time-to-market.
- Data from RWD and RWE: While RWD and RWE have the potential to transform modern healthcare, pursuing them requires collaborative action to improve data collection, analytical methods, and shared data infrastructure.
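To make the data cleaning step above more concrete, here is a minimal sketch of what a single cleaning rule set might look like. It assumes a record with two hypothetical fields, `sex` and `visit_date`; the lookup table, the accepted date layouts, and the function names are illustrative assumptions, not part of any particular product or standard.

```python
from datetime import datetime

# Hypothetical lookup from free-text entries to one coded value.
SEX_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

# Hypothetical date layouts that might appear in manually entered records.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def normalize_date(value):
    """Return an ISO 8601 date string, or None if no known layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_record(record):
    """Return (cleaned_record, issues) without modifying the input record."""
    cleaned = dict(record)
    issues = []

    # Normalize the sex field; unknown values are flagged, not guessed.
    sex = SEX_CODES.get(str(record.get("sex", "")).strip().lower())
    if sex is None:
        issues.append("sex: missing or unrecognized")
    cleaned["sex"] = sex

    # Normalize the visit date to a single format.
    visit = normalize_date(str(record.get("visit_date", "")))
    if visit is None:
        issues.append("visit_date: missing or unrecognized")
    cleaned["visit_date"] = visit

    return cleaned, issues


cleaned, issues = clean_record({"sex": " Female ", "visit_date": "03/02/2021"})
print(cleaned, issues)
```

The sketch flags values it cannot interpret instead of silently correcting them, so a person can still review the uncertain cases.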
Mapping Legacy Data to New Data Standards
We can improve future data through compliance with standards, but what about the volumes of data that already exist? That is our legacy data, which we can’t leave behind.
To use legacy data, we need to map it to new data standards. Mapping is traditionally a time-consuming, labor-intensive manual process, but with ongoing improvements to AI, we can now develop sophisticated statistical models that may play a role in automating many of these associated tasks.
By contrast, where legacy data is highly structured and already adheres to a set of standards, mapping it to new standards is a more straightforward process.
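As a rough illustration of how part of the mapping task might be automated, the sketch below suggests a standard variable for each legacy field name. It uses simple string similarity from Python's standard library as a stand-in for the statistical models mentioned above. The target variable names are loosely modeled on CDISC SDTM demographics naming, and the labels, the subset chosen, the `suggest_mapping` function, and the 0.6 cutoff are all assumptions made for this example.

```python
import difflib

# Assumed target variables with plain-language labels (illustrative subset).
STANDARD_LABELS = {
    "USUBJID": "unique subject identifier",
    "SEX": "sex",
    "AGE": "age",
    "BRTHDTC": "date of birth",
}


def suggest_mapping(legacy_fields, cutoff=0.6):
    """Suggest a standard variable per legacy field, or None if unsure."""
    mapping = {}
    for field in legacy_fields:
        # Normalize the legacy name so it is comparable to the labels.
        normalized = field.replace("_", " ").lower()
        best_name, best_score = None, 0.0
        for name, label in STANDARD_LABELS.items():
            score = difflib.SequenceMatcher(None, normalized, label).ratio()
            if score > best_score:
                best_name, best_score = name, score
        # Below the cutoff, leave the field for manual review.
        mapping[field] = best_name if best_score >= cutoff else None
    return mapping


print(suggest_mapping(["Date_of_Birth", "sex", "patient_weight_kg"]))
```

Fields that fall below the cutoff are returned as `None`, which keeps a human reviewer in the loop for anything the automated step cannot match with reasonable confidence.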
Where Do We Go Next With Standards?
We face a complex but not insurmountable set of data issues. Let’s consider the banking industry as a viable model. The financial sector must track and execute transactions involving billions of data points from all over the world, across different currencies and systems. It does so instantly, seamlessly, and effortlessly, largely because it enforces and adheres to a set of common data models (CDMs) and application programming interfaces (APIs).
We will have similar success as we take on:
- Industry-wide adoption and commitment to well-defined data standards
- RWD and RWE, our next generation of data
- Smart and powerful tools to share and map data
Data standardization is central to our future and of utmost importance; however, it takes a very large commitment on the part of the clinical research industry. The Clinical Data Interchange Standards Consortium (CDISC) is arguably the most influential standards body that affects data standardization in clinical research. Its goal is to develop and support platform-independent data standards to permit information system interoperability and thereby inform patient care and safety through higher quality medical research.
While creating data standards is a significant first step, it needs to be followed by many more.
The data standards used in different domains need to be harmonized, for example, between EHRs and electronic data capture (EDC) systems.
With broader acceptance of and adherence to standards, we can more effectively create better data, both next-generation and legacy, and build crucial interoperability.