While AI offers significant potential in life sciences, its implementation comes with several challenges, ranging from the sheer size of medical databases to mandatory regulatory compliance and the ethics of using black-box models in medical decision-making.
Rulex Platform’s eXplainable AI has a profound impact on the implementation of AI in this sensitive sector by producing transparent, human-understandable results. This transparency enables medical experts to understand and explain any predictions made, while supporting ethical data models and results, and adherence to privacy regulations. Simple interpretability is essential for gaining trust and understanding the rationale behind medical decisions, and enables a healthy balance in human-AI collaboration.
Rulex Platform can also easily gather, aggregate and analyse extremely large datasets in any format, and from any source, while integrating with underlying information systems, such as electronic health records, or laboratory information management systems, without causing disruption and upheaval. Results can also be produced in any format required, whether that is an e-mail with urgent results, a tailored spreadsheet saved on a common server, or an interactive dashboard to show colleagues.
Thanks to its inherent explainability and agility in data management, Rulex Platform has been chosen by healthcare and life sciences organizations to leverage medical records, resulting in improved health outcomes, enhanced clinical and operational decision-making, and pioneering research.
Table of contents
- Improving Data Quality in Hospital Discharge Reports
- Tailoring Diagnostic Predictions for Primary Biliary Cholangitis
- Identifying Correlations with XAI to Improve Metabolic Control in Type 2 Diabetes
- Extracting Rules to Diagnose Pleural Mesothelioma
- Extracting a Simplified Gene Expression Signature for Neuroblastoma Prognosis
- Extracting Intelligible Rules in Neuroblastoma Prognosis
- Validating a New Classification for Multiple Osteochondromas Patients
- Predicting Obstructive Sleep Apnea in People with Down Syndrome
- Benchmarking LLM Performance on Standard Biomedical Datasets
1. Improving Data Quality in Hospital Discharge Reports
Health check systems in Italy are overseen by regional and local health authorities, who actively monitor and regulate the quality of healthcare services to ensure their appropriateness. Over time, numerous Italian regions have developed and revised guidelines and operational procedures aimed at scrutinizing hospital discharge reports and medical records.
The significance of accuracy in medical records cannot be overstated, as errors can lead to various repercussions, ranging from minor billing discrepancies to critical issues such as incomplete or incorrect diagnoses, or delays in scheduling surgical interventions.
In collaboration with Deimos, Rulex leveraged its eXplainable Artificial Intelligence (XAI) technologies to automate the scrutiny of coding in hospital discharge forms within the Alto Adige health authority. The primary focus of the study was to assess the feasibility of applying automatic checks, characterized as logical clinical checks, not only to ensure compatibility between sex and diagnosis or age and diagnosis, as traditionally done with formal logical checks, but also to explore the intricate relationships between clinical variables in hospital discharge reports. This approach aimed to automatically identify inconsistencies among diagnoses, surgery, medical procedures, and Diagnosis-Related Groups (DRGs).
The tested methodologies yielded promising results. Validation rules were defined, improving the efficiency of automatic record checks and helping to identify the probable location of errors. The personnel time required for record checking was significantly reduced, and automatic checks were carried out on all surgical hospital discharge records, not only on a test subset. Overall, the approach not only enhanced the precision of existing checks but also introduced a more comprehensive and nuanced evaluation of the relationships within medical records.
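To make the distinction between formal logical checks and logical clinical checks more concrete, the following Python sketch flags records whose diagnosis is incompatible with sex or age, and then applies one cross-variable check between diagnosis and procedure. It is only an illustration: the record fields, the placeholder codes (`DX_...`, `PR_...`) and the rule tables are hypothetical, and do not reproduce the rules extracted in the study or any Rulex Platform API.

```python
from dataclasses import dataclass


@dataclass
class DischargeRecord:
    record_id: str
    sex: str        # "M" or "F"
    age: int        # age in years
    diagnosis: str  # placeholder diagnosis code (hypothetical)
    procedure: str  # placeholder procedure code (hypothetical)


# Hypothetical rule tables, for illustration only.
SEX_RESTRICTED = {"DX_PROSTATE": "M", "DX_PREGNANCY": "F"}
AGE_RANGES = {"DX_PREGNANCY": (12, 55)}
ALLOWED_PROCEDURES = {"DX_APPENDICITIS": {"PR_APPENDECTOMY"}}


def check_record(rec):
    """Return the list of check names that the record fails."""
    issues = []

    # Formal logical check: sex-diagnosis compatibility.
    required_sex = SEX_RESTRICTED.get(rec.diagnosis)
    if required_sex is not None and rec.sex != required_sex:
        issues.append("sex-diagnosis")

    # Formal logical check: age-diagnosis compatibility.
    age_range = AGE_RANGES.get(rec.diagnosis)
    if age_range is not None and not (age_range[0] <= rec.age <= age_range[1]):
        issues.append("age-diagnosis")

    # Clinical check: relationship between diagnosis and procedure.
    allowed = ALLOWED_PROCEDURES.get(rec.diagnosis)
    if allowed is not None and rec.procedure not in allowed:
        issues.append("diagnosis-procedure")

    return issues


records = [
    DischargeRecord("r1", "M", 30, "DX_PREGNANCY", "PR_NONE"),
    DischargeRecord("r2", "F", 40, "DX_APPENDICITIS", "PR_HIP_REPLACEMENT"),
    DischargeRecord("r3", "F", 40, "DX_APPENDICITIS", "PR_APPENDECTOMY"),
]
for rec in records:
    print(rec.record_id, check_record(rec))
```

Returning the names of the failed checks, rather than a single pass/fail flag, mirrors the idea of pointing reviewers to the probable location of an error.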
Related research paper (in Italian):
- Elisabeth Montel, Astrid Richter, Sabine Ladurner, Agata Malizia, Roberta Vanzetta, Damiano Verda, Marco Muselli, Pierluigi Santin, Paolo Vian, Controllo automatico delle variabili cliniche della scheda di dimissione ospedaliera (SDO), mediante l’utilizzo di tecniche di machine learning, Frg Editore, 2021.
2. Tailoring Diagnostic Predictions for Primary Biliary Cholangitis
Precision medicine seeks to customize the diagnosis, monitoring, and management of individuals based on their unique genetic and environmental backgrounds. This undertaking is particularly challenging due to the intricate nature of medical traits and the presence of multiple variants. The complexity is further amplified when addressing rare diseases, where limited historical data poses an additional hurdle.
In collaboration with the medical departments of Milano-Bicocca and Humanitas universities, Rulex conducted a pioneering study to assess the feasibility and precision of predicting the risk of Primary Biliary Cholangitis (PBC) using eXplainable AI (XAI). The focus was on identifying novel patient subgroups, disease sub-phenotyping, and risk stratification.
The XAI algorithm was applied to an extensive international dataset of PBC patients, divided into a training set, with 11,819 subjects, and a validation set, with 1,069 subjects, with a meticulous analysis of key clinical features. The primary outcome was a composite of liver-related death or liver transplantation, assessed through a combination of machine learning and standard survival analysis.
The analysis revealed four distinct patient clusters, each characterized by unique phenotypes and long-term prognoses. These findings represented a pivotal milestone in formulating a targeted treatment approach for PBC. Additionally, they laid the foundation for ongoing efforts in identifying and providing timely treatment for the relatives of patients, confirming the potential of XAI in advancing precision medicine for complex diseases.
Related research paper:
- Alessio Gerussi, Damiano Verda, Davide Paolo Bernasconi, Marco Carbone, Atsumasa Komori, Masanori Abe, Mie Inao, Tadashi Namisaki, Satoshi Mochida, Hitoshi Yoshiji, Gideon Hirschfield, Keith Lindor, Albert Pares, Christophe Corpechot, Nora Cazzagon, Annarosa Floreani, Marco Marzioni, Domenico Alvaro, Umberto Vespasiani-Gentilucci, Laura Cristoferi, Maria Grazia Valsecchi, Marco Muselli, Bettina E Hansen, Atsushi Tanaka, Pietro Invernizzi, Machine learning in primary biliary cholangitis: A novel approach for risk stratification, Wiley, Dec 2021.
3. Identifying Correlations with XAI to Improve Metabolic Control in Type 2 Diabetes
One of the primary goals of diabetologists is to establish effective metabolic control in type 2 diabetes patients, measured through blood levels of HbA1c, without causing weight gain.
The Italian diabetology association used Rulex’s proprietary XAI to extract and rank the factors most closely associated with reducing HbA1c levels. The study involved vast amounts of raw data, including the medical records of 2 million diabetic patients and the data collected from medical visits over a 10-year period, with over 137 variables per patient.
Significant correlations were identified, such as the use of specific receptor agonists, and it was established that HbA1c and weight gain have different determinants. These results led to more efficient care for diabetic patients.
Related research paper:
- Carlo Bruno Giorda, Federico Pisani, Alberto De Micheli, Paola Ponzani, Giuseppina Russo, Giacomo Guaita, Rita Zilich, Nicoletta Musacchio on behalf of the Associazione Medici Diabetologi (AMD) Annals Study Group, Determinants of good metabolic control without weight gain in type 2 diabetes management: a machine learning analysis, BMJ Open Diabetes Research and Care.
4. Extracting Rules to Diagnose Pleural Mesothelioma
Malignant pleural mesothelioma (MPM) is a rare and highly lethal tumor, with its incidence rising rapidly in developed countries due to past asbestos exposure in various environments. Accurate diagnosis of MPM is challenging, as atypical clinical symptoms often lead to misdiagnosis as other malignancies (especially adenocarcinomas) or as benign inflammatory or infectious diseases (BD) causing pleurisies. While cytological examination (CE) can identify malignant cells, a notable false negative rate may occur due to the prevalence of non-neoplastic cells. Additionally, a positive CE result alone may not distinguish MPM from other malignancies.
Various tumor markers (TM) have proven to be valuable complementary tools for MPM diagnosis. Recent studies focused on three tumor markers in pleural effusions: soluble mesothelin-related peptide (SMRP), CYFRA 21-1, and CEA. Their concentrations were analyzed in association with the differential diagnosis of MPM, pleural metastasis from other tumors (MTX), and BD. SMRP demonstrated the best performance in distinguishing MPM from both MTX and BD, while high CYFRA 21-1 values were linked to both MPM and MTX. Conversely, elevated CEA concentrations were primarily observed in patients with MTX. Combining information from the three markers and CE could form a classifier to separate MPM from both MTX and BD.
In this context, the Rulex Logic Learning Machine (LLM) was employed for the differential diagnosis of MPM by identifying straightforward and understandable rules based on CE and TM concentrations. Comparative analyses with other supervised methods, including Decision Trees, K-Nearest Neighbors, and Artificial Neural Networks, revealed that LLM consistently outperformed all competing approaches.
Related research paper:
- S. Parodi, R. Filiberti, P. Marroni, R. Libener, G.P. Ivaldi, M. Mussap, E. Ferrari, C. Manneschi, E. Montani, M. Muselli, Differential diagnosis of pleural mesothelioma using Logic Learning Machine, BMC Bioinformatics 16.S9 (2015): S3.
5. Extracting a Simplified Gene Expression Signature for Neuroblastoma Prognosis
The outcome of cancer patients is, in part, influenced by the gene expression profile of the tumor. In a prior study, a 62-probe set signature (NB-hypo) was identified for detecting tissue hypoxia in neuroblastoma. This signature effectively stratified neuroblastoma patients into good and poor outcome groups. Establishing a prognostic classifier was crucial for grouping patients into risk categories, aiding in the selection of tailored therapeutic approaches.
To enhance the accuracy of predictors and create robust tools for clinical decision support, novel classification and data discretization approaches were explored. In this study, Rulex was employed on gene expression data, specifically using the Attribute Driven Incremental Discretization technique to transform continuous variables into simplified discrete ones. This pre-processing step facilitated rule extraction through the Logic Learning Machine (LLM). The application of LLM yielded 9 rules, primarily based on the relative expression of 11 probe sets. These rules proved highly effective as predictors and were validated independently, confirming the efficacy of the LLM algorithm on microarray data and patient classification.
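The general idea of discretizing continuous expression values and then classifying with if-then rules can be sketched in a few lines of Python. This is not the Attribute Driven Incremental Discretization algorithm or the Logic Learning Machine itself, both of which learn cutoffs and rules from data; here the cutoffs, the rules and the probe names (`probe_A`, `probe_B`) are hand-written, hypothetical values used purely for illustration.

```python
from bisect import bisect_right

# Hypothetical cutoffs per probe set; a real discretizer would learn these.
CUTOFFS = {"probe_A": [5.0, 8.0], "probe_B": [6.5]}

# Hypothetical rules: each maps allowed interval indices per probe to an outcome.
RULES = [
    ({"probe_A": {2}, "probe_B": {1}}, "poor"),
    ({"probe_A": {0, 1}}, "good"),
]


def discretize(value, cutoffs):
    """Return the index of the interval containing value (cutoffs sorted ascending)."""
    return bisect_right(cutoffs, value)


def classify(sample, default="unknown"):
    """Discretize each probe value, then return the outcome of the first matching rule."""
    bins = {probe: discretize(sample[probe], cuts) for probe, cuts in CUTOFFS.items()}
    for conditions, outcome in RULES:
        if all(bins[probe] in allowed for probe, allowed in conditions.items()):
            return outcome
    return default


print(classify({"probe_A": 9.1, "probe_B": 7.0}))
print(classify({"probe_A": 4.0, "probe_B": 7.0}))
```

Because each rule only refers to a handful of discrete intervals, a clinician can read the conditions directly, which is the property the rule-based approach is meant to provide.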
The LLM demonstrated efficiency comparable to Prediction Analysis of Microarray and Support Vector Machine, surpassing other learning algorithms like C4.5. Rulex conducted feature selection, resulting in a new signature (NB-hypo-II) comprising 11 probe sets, identified as the most relevant in predicting outcomes. This comprehensive approach underscores the potential of utilizing LLM in the development of reliable prognostic classifiers for cancer patients.
Related research paper:
- Cangelosi, M. Muselli, S. Parodi, F. Blengio, J. Koster, A. Schramm, A. Garaventa, C. Gambini, L. Varesio, Use of Attribute Driven Incremental Discretization and Logic Learning Machine to build a prognostic classifier for neuroblastoma patients, BMC Bioinformatics 15.S5 (2014): S4.
6. Extracting Intelligible Rules in Neuroblastoma Prognosis
Neuroblastoma, the most common pediatric solid tumor, poses a significant challenge as approximately fifty percent of high-risk patients do not survive treatment. The urgent need for improved stratification strategies led to the exploration of new, more effective approaches. Hypoxia, characterized by low oxygen tension in poorly vascularized tumor areas, is associated with a poor prognosis. This study aimed to develop a prognostic classifier for neuroblastoma patients by integrating existing knowledge of clinical and molecular risk factors with the NB-hypo signature.
The focus was on creating classifiers that produce explicit rules easily applicable in a clinical setting. The Logic Learning Machine, known for its accuracy, seemed promising for achieving the study’s objectives. The algorithm was employed to classify neuroblastoma patients based on key risk factors: age at diagnosis, INSS stage, MYCN amplification, and NB-hypo. The algorithm successfully generated clear classification rules that aligned well with established clinical knowledge.
To enhance stability, an iterative process identified and removed examples causing instability in the rules from the dataset. This refined workflow resulted in a stable classifier highly accurate in predicting outcomes for both good and poor prognosis patients. The classifier’s performance was further validated in an independent dataset. Notably, NB-hypo emerged as a crucial component of the rules, demonstrating a strength comparable to tumor staging. This comprehensive approach showcases the potential of the Logic Learning Machine in developing a robust prognostic classifier for neuroblastoma patients.
Related research paper:
- D. Verda, S. Parodi, E. Ferrari, and M. Muselli, Analyzing gene expression data for pediatric and adult cancer diagnosis using logic learning machine and standard supervised methods, BMC Bioinformatics 20(9) (2019), 390.
7. Validating a New Classification for Multiple Osteochondromas Patients
Multiple osteochondromas (MO), formerly recognized as hereditary multiple exostoses (HME), is an autosomal dominant disorder marked by the development of benign cartilage-capped bone growths known as osteochondromas or exostoses. Despite various clinical classifications proposed, a consensus remains elusive. This study aimed to validate an “easy-to-use” tool, employing a machine learning approach, to categorize MO patients into three classes based on the number of affected bone segments, the presence of skeletal deformities, and/or functional limitations.
The proposed classification, assessed through the Switching Neural Network underlying the Logic Learning Machine technique, demonstrated a highly satisfactory mean accuracy. A comprehensive analysis of 150 variables across 289 MO patients facilitated the identification of ankle valgism, Madelung deformity, and limitations in hip extra-rotation as distinctive features (“tags”) of the three clinical classes. In summary, the proposed classification offers an effective system for characterizing this rare disease, enabling the definition of homogeneous patient cohorts for in-depth investigations into MO pathogenesis.
Related research paper:
- Mordenti, E. Ferrari, E. Pedrini, N. Fabbri, L. Campanacci, M. Muselli, L. Sangiorgi, Validation of a New Hereditary Multiple Exostoses Classification Through Switching Neural Networks, American Journal of Medical Genetics 161 (2013) 556–560 DOI: 10.1002/ajmg.a.35819.
8. Predicting Obstructive Sleep Apnea in People with Down Syndrome
Obstructive sleep apnea (OSA) is notably prevalent in individuals with Down Syndrome (DS), with reported rates ranging from 55% to 97%, a stark contrast to the 1–4% prevalence in the neurotypical pediatric population. However, conventional sleep studies are often uncomfortable, expensive, and poorly tolerated by those with DS.
To address this, a dataset encompassing over 460 observations was compiled for 102 Down syndrome patients. Each patient underwent a polysomnogram, and the dataset included diverse information such as clinical visit findings, parent surveys, wristband oximeter data, urine proteomic analysis, lateral cephalogram results, and 3D digital photos.
Utilizing the Logic Learning Machine (LLM), a predictive model was developed to ascertain the occurrence of obstructive sleep apnea in individuals with Down syndrome. This approach aimed to offer an alternative to uncomfortable and costly tests like polysomnograms.
The LLM classification task successfully identified a predictive model represented by a set of simple rules, with a high negative predictive value of 81.5%. Additionally, the Feature Ranking task identified the most relevant variables, assigning a quantitative score to their importance in the predictive model. This methodology not only facilitates a more comfortable diagnosis for individuals with DS but also provides a streamlined and effective means of identifying obstructive sleep apnea.
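For readers less familiar with the metric, the predictive value for negative cases is the fraction of patients predicted to be free of OSA who truly do not have it. The Python sketch below shows how it is computed from a confusion matrix; the counts are illustrative only and are not taken from the study.

```python
def negative_predictive_value(true_negatives, false_negatives):
    """Share of negative predictions that are correct: TN / (TN + FN)."""
    predicted_negative = true_negatives + false_negatives
    if predicted_negative == 0:
        raise ValueError("no negative predictions to evaluate")
    return true_negatives / predicted_negative


# Illustrative counts only, NOT the study's data.
npv = negative_predictive_value(true_negatives=40, false_negatives=10)
print(f"NPV = {npv:.1%}")
```

A high value matters in this setting because a reliable negative prediction is what could allow a patient to avoid a polysomnogram.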
Related research paper:
- G. Skotko, E.A. Macklin, M. Muselli, L. Voelz, M.E. McDonough, E. Davidson, V. Allareddy et al, A predictive model for obstructive sleep apnea and Down syndrome, American Journal of Medical Genetics Part A 173, no. 4 (2017): 889-896.
9. Benchmarking LLM Performance on Standard Biomedical Datasets
In this study, we employed Rulex’s Logic Learning Machine on three benchmark datasets, each addressing a distinct biomedical problem. These datasets were sourced from the UCI archive, a repository of data used for machine learning benchmarking. The datasets are as follows:
- Diabetes:
  - Objective: Diagnosing diabetes based on the values of 8 variables.
  - Patient Characteristics: All 768 patients considered are females, at least 21 years old, and of Pima Indian heritage.
  - Cases and Controls: Out of the 768 patients, 268 are actual cases of diabetes, while the remaining 500 are controls.
- Heart disease:
  - Objective: Detecting heart disease using a set of 13 input variables related to patient status.
  - Sample Size: The total sample comprises 270 elements, with 120 actual cases of heart disease and 150 controls.
- Donor/acceptor DNA:
  - Objective: Recognizing acceptor and donor sites in primate gene sequences with a length of 60 bases.
  - Dataset Composition: The dataset consists of 3186 sequences categorized into three classes: acceptor, donor, and none.
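When reading accuracy figures on datasets like these, it can help to keep the class balance in mind. The Python sketch below derives, from the case and control counts stated above, the accuracy a trivial classifier would reach by always predicting the larger class. This baseline is an addition for context and is not part of the cited study; the DNA dataset is omitted because its per-class counts are not given here.

```python
# Case/control counts as stated in the dataset descriptions above.
DATASETS = {
    "diabetes": {"cases": 268, "controls": 500},
    "heart disease": {"cases": 120, "controls": 150},
}


def majority_baseline(class_counts):
    """Accuracy obtained by always predicting the most frequent class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total


for name, counts in DATASETS.items():
    total = sum(counts.values())
    print(f"{name}: n={total}, majority-class accuracy={majority_baseline(counts):.3f}")
```

Any supervised method is only informative to the extent that it improves on this figure.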
The performance of the Rulex Logic Learning Machine (LLM) was compared to other supervised methods, including Decision Trees (DT), Artificial Neural Networks (ANN), Logistic Regression (LR), and K-Nearest Neighbor (KNN). The conducted tests revealed that the results obtained by LLM surpassed those of ANN, DT (which generates rules), and KNN. Moreover, LLM’s performance was found to be comparable to that of LR.
Related research paper:
- M. Muselli, Extracting knowledge from biomedical data through Logic Learning Machines and Rulex, EMBnet Journal 18B (2012), 56–58.