Natural Language Processing for Analysis of Complex Diseases in the Hospital Setting (Lupus Case)
Successful development and commercialization of treatments for heterogeneous conditions with diverse manifestations, such as lupus, require pharmaceutical and biotech companies to efficiently identify and precisely characterize patients in terms of demographics, disease history, lines of treatment, outcomes, etc. However, traditional large data sources fail to provide complete information: they lack clinical granularity, and they remain complex, resource-intensive, and often unsuccessful investments.
An interconnected hospital-specific natural language processing (NLP)-powered platform can successfully identify and characterize patients to provide pharmaceutical and biotech companies with fit-for-purpose real-world data (RWD) and access strategies.
The Lupus Challenge
Lupus is a complex, autoimmune, chronic, and inflammatory disease that mainly affects connective tissues.
The most common form of lupus is systemic lupus erythematosus (SLE), which affects multiple organs. Other common forms include cutaneous lupus erythematosus (CLE), which results in skin rash and lesions, and lupus nephritis (LN), which affects the kidneys. Other manifestations include anemia, arthritis, psychosis, seizures, and thrombocytopenia. Such a diverse set of manifestations translates into numerous patient types in clinical practice, with different disease and treatment pathways, as well as different lead medical specialties caring for patients, most often in the hospital setting. In addition, lupus can affect children at a young age. In children, symptoms are similar to those in adults but can be more severe, with more frequent renal manifestations.
Gaps in Real-World Data
Little has been published on the epidemiology of lupus, overall or by manifestation, and RWD on the burden of the disease and its long-term outcomes in adults and children are even more limited.
The traditional method for identifying patients is to search claims databases and identify target patients via ICD-10-CM codes for lupus. However, this approach lacks granularity and does not allow for confirmation of the diagnosis or for analysis by subtype. Although claims databases can be valuable to assess epidemiology, resource use, and overall economic impact and to generate hypotheses, they fail to provide detailed or complete information on the drugs and biological tests administered and/or prescribed in the patient setting, and lack the information contained in medical and paramedical notes.
Identifying, Classifying, and Characterizing Patients with Lupus REAL in the Hospital Setting with an Interconnected AI/NLP Research Platform
Alira Health and Sancare research teams, in collaboration with the Reims teaching hospital (CHU de Reims), under the direction of Professor Vuiblet, conducted a study of patients with lupus (Lupus REAL) to validate the use of Realli™, a solution that builds on proprietary artificial intelligence (AI) and NLP-powered software for optimization and automation of diagnosis-related group (DRG) coding.1 Realli harnesses near-real-time, interconnected electronic medical records (EMRs) and claims data for clinical research from an expanding representative network of French hospitals and clinics. Leveraging the Realli solution with mixed claims and text string searches of patients’ EMRs between January 2018 and March 2023, our study identified patients with lupus, ascertained their diagnosis, and then classified them into specific, clinically oriented disease subgroups. We then retrieved the history of lupus-specific and other biomarker testing, searched for the presence of lupus disease activity scores and comorbidities in patients’ charts, and extracted biological tests and treatments from the structured datasets. The study was conducted according to applicable regulatory requirements, in accordance with the General Data Protection Regulation (GDPR).
This research contributes to a deeper understanding of lupus patients by uncovering information that is rarely documented in the literature. The study identified 187 adult patients. Patients may have presented with non-mutually exclusive forms of lupus but were segmented into two mutually exclusive groups: lupus erythematosus (LE; L93.X only) and systemic lupus erythematosus (SLE; M32.X alone or in combination with L93.X). SLE patients were then classified into non-mutually exclusive subgroups based on the lupus manifestations identified in text searches, i.e., LN and CLE.
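The segmentation rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the Realli implementation: the function names, the input layout (a set of ICD-10 codes plus free-text notes), and the keyword lists are all assumptions made for the example, and real notes from French hospitals would require French terms and far more robust text matching.

```python
# Hypothetical keyword lists for the text search; the study's actual
# search strings are not published in this article.
SUBGROUP_TERMS = {
    "LN": ("lupus nephritis",),
    "CLE": ("cutaneous lupus",),
}


def primary_group(codes):
    """Assign one mutually exclusive group from a patient's ICD-10 codes.

    SLE: any M32.x code, alone or together with L93.x.
    LE:  L93.x codes only.
    Returns None when no lupus code is present.
    """
    normalized = {code.upper() for code in codes}
    if any(code.startswith("M32") for code in normalized):
        return "SLE"
    if any(code.startswith("L93") for code in normalized):
        return "LE"
    return None


def sle_subgroups(notes):
    """Return the (non-mutually exclusive) manifestations found in free text."""
    text = notes.lower()
    return {
        name
        for name, terms in SUBGROUP_TERMS.items()
        if any(term in text for term in terms)
    }


def classify(codes, notes):
    """Combine the code-based group with text-based subgroups (SLE only)."""
    group = primary_group(codes)
    subgroups = sle_subgroups(notes) if group == "SLE" else set()
    return group, subgroups


if __name__ == "__main__":
    print(classify({"M32.1", "L93.0"}, "Follow-up for lupus nephritis."))
    print(classify({"L93.0"}, "Discoid lesions on the scalp."))
```

Checking M32 before L93 is what keeps the two top-level groups mutually exclusive, while the subgroup step returns a set so that a patient can carry both LN and CLE.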
We retrieved results for at least one biologic marker test in 82% of patients. The Systemic Lupus Erythematosus Disease Activity Index 2000 (SLEDAI-2K) score, a comprehensive tool for evaluating clinical symptoms in SLE patients, was found in 9.6% of patient records, with an average score of 11.7 and a median score of 10.0 ([4.0; 17.0]).
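To illustrate how a score that exists only in free text might be captured and summarized, here is a minimal Python sketch. It is not the Realli method: the regular expression, the function names, and the sample phrasings are assumptions, and the 0 to 105 plausibility check reflects the theoretical range of the SLEDAI-2K. The summary returns quartiles on the assumption that the bracketed range reported above is an interquartile range, which the source does not state.

```python
import re
import statistics

# Hypothetical pattern: "SLEDAI", an optional "-2K" suffix, an optional
# "score" word and separator, then the numeric value.
SLEDAI_PATTERN = re.compile(
    r"SLEDAI(?:[-\s]?2K)?\s*(?:score)?\s*(?:[:=]|of|is)?\s*(\d{1,3})\b",
    re.IGNORECASE,
)

MAX_SLEDAI_2K = 105  # theoretical maximum of the index


def extract_sledai(note):
    """Return the first plausible SLEDAI value mentioned in a note, else None."""
    for match in SLEDAI_PATTERN.finditer(note):
        value = int(match.group(1))
        if 0 <= value <= MAX_SLEDAI_2K:
            return value
    return None


def summarize(scores):
    """Mean, median, and quartiles of the extracted scores (needs >= 2 values)."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "q1": q1,
        "q3": q3,
    }


if __name__ == "__main__":
    notes = [
        "SLEDAI-2K score: 12 at admission.",
        "Stable disease, SLEDAI 2k = 4.",
        "No activity score recorded.",
        "Flare with SLEDAI of 17.",
    ]
    found = [s for s in map(extract_sledai, notes) if s is not None]
    print(f"score found in {len(found)}/{len(notes)} notes")
    print(summarize(found))
```

Notes without a recognizable score simply yield `None`, which is how a low capture rate such as the one reported here would surface in practice.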
The complex nature of lupus is closely associated with a high risk of developing complications and comorbidities. Our research showed the most prevalent conditions were cardiovascular (excluding hypertension, 63% of patients), metabolic (40%), polyarthritis (39%), hypertension (39%), and gastrointestinal (37%). Renal manifestations were identified in 35% of patients, including 92% and 100% in the LN and CLE+LN subgroups, respectively. Overall, 165 patients (88%) suffered from at least one of these conditions.
The Benefit of Realli for Data Capture and Analysis for Pharma and Biotech
Realli has significant potential advantages for the study of complex diseases with multiple symptoms and comorbidities. The granularity of information it retrieves is typically inaccessible through conventional data sources and retrieval methods, which often require integrating numerous disparate data sources. For instance, Realli captures not only the number of patients who were administered a test but also the test results. In addition, disease activity scores, which are not stored in a structured format and are not usually readable in EMRs, can be identified and extracted.
Leveraging the power of Realli, pharmaceutical and biotech companies can fill the existing real-world evidence (RWE) and clinical gaps and can better characterize patient profiles to develop new products and to support their regulatory and market access efforts.
Learn More About NLP and Lupus
If you would like to learn more about NLP for the analysis of complex diseases like lupus, download the poster “Harnessing the potential of natural-language processing and interconnected data streams for complex diseases in the hospital setting: Lupus case study in France” presented at ISPOR Europe 2023.