Artificial intelligence has many possible applications in the medical research and healthcare sector, but they are sometimes undermined by a widespread problem: the results lack understandable and transparent explanations due to the use of black box algorithms.
This limitation is unacceptable, especially in contexts where automated decisions have a significant impact on people’s lives. Rulex Platform is particularly appreciated since it overcomes this issue. By exploiting eXplainable AI, it provides explanations for every single result or decision, creating a climate of collaboration and trust between medical and research teams.
Read on for some of Rulex’s many success stories in this area:
- Using machine learning techniques to automatically check the clinical variables found in hospital discharge forms
- Machine learning in primary biliary cholangitis: A novel approach for risk stratification
- Identification of the factors involved in achieving good metabolic control without weight gain in type 2 diabetes management
- Benchmarking LLM performances on standard biomedical datasets
- Extraction of rules for pleural mesothelioma diagnosis
- Extraction of a simplified gene expression signature for neuroblastoma prognosis
- Extraction of intelligible rules concerning the prognosis of neuroblastoma
- Validation of a new classification for multiple osteochondromas patients
- Predicting obstructive sleep apnea in people with Down syndrome
Using machine learning techniques to automatically check the clinical variables found in hospital discharge forms
Health check systems in Italy are implemented by regional and local health authorities and involve monitoring and controlling the quality of healthcare and the appropriateness of the services provided. Over the years, many Italian regions have created and updated guidelines and operating procedures for checking hospital discharge reports and medical records.
In an experiment carried out by the Alto Adige Health Authority, artificial intelligence techniques were used to automatically check the coding used in hospital discharge reports. Earlier formal logical checks covered only sex-diagnosis and age-diagnosis compatibility. The study evaluated whether automatic checks (which could be defined as logical clinical checks) could also investigate the existing relationships between clinical variables in hospital discharge reports, in order to automatically identify inconsistencies between diagnoses, surgery, medical procedures and DRGs. The tested methodologies showed interesting results, suggesting that they could be routinely adopted for checking hospital discharge reports.
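The difference between a formal logical check and a logical clinical check can be illustrated with a minimal sketch. It is not the method used in the study: the record layout, the placeholder codes (`DX_A`, `DRG_1`), the `SEX_RESTRICTED` table and the `min_support` parameter are all assumptions made for illustration, and the "learned" check is reduced to counting diagnosis-DRG pairs seen in past records.

```python
from collections import Counter

# Hypothetical past discharge records: (sex, diagnosis code, DRG code).
# The codes are placeholders, not real ICD or DRG codes.
HISTORY = [
    ("F", "DX_A", "DRG_1"),
    ("F", "DX_A", "DRG_1"),
    ("M", "DX_B", "DRG_2"),
    ("F", "DX_B", "DRG_2"),
    ("M", "DX_B", "DRG_2"),
]

# Formal logical check: an assumed table of diagnoses restricted to one sex.
SEX_RESTRICTED = {"DX_A": "F"}


def formal_check(sex, diagnosis):
    """Return True when the diagnosis is compatible with the patient's sex."""
    allowed = SEX_RESTRICTED.get(diagnosis)
    return allowed is None or allowed == sex


def learn_pairs(records):
    """Count how often each (diagnosis, DRG) pair occurs in past records."""
    return Counter((dx, drg) for _, dx, drg in records)


def clinical_check(pair_counts, diagnosis, drg, min_support=1):
    """Return True when the pair was seen at least min_support times."""
    return pair_counts[(diagnosis, drg)] >= min_support


def check_record(record, pair_counts):
    """Return the list of inconsistencies found in one discharge record."""
    sex, dx, drg = record
    issues = []
    if not formal_check(sex, dx):
        issues.append("sex-diagnosis mismatch")
    if not clinical_check(pair_counts, dx, drg):
        issues.append("unusual diagnosis-DRG pair")
    return issues


PAIR_COUNTS = learn_pairs(HISTORY)
```

Calling `check_record(("F", "DX_A", "DRG_2"), PAIR_COUNTS)` flags the record because that diagnosis-DRG pair never occurs in the history, even though the sex-diagnosis check passes.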
Machine learning in primary biliary cholangitis: A novel approach for risk stratification
Background & Aims
Machine learning (ML) provides new approaches for prognostication through the identification of novel subgroups of patients. We explored whether ML could support disease sub-phenotyping and risk stratification in primary biliary cholangitis (PBC).
Method
ML was applied to an international dataset of PBC patients. The dataset was split into a derivation cohort (training set) and a validation cohort (validation set), and key clinical features were analyzed. The outcome was a composite of liver-related death or liver transplantation. ML and standard survival analysis were performed.
Results
The training set was composed of 11,819 subjects, while the validation set was composed of 1,069 subjects. ML identified four clusters of patients characterized by different phenotypes and long-term prognosis. Cluster 1 (n = 3566) included patients with excellent prognosis, whereas Cluster 2 (n = 3966) consisted of individuals with a worse prognosis who differed from Cluster 1 only in having albumin levels around the limit of normal. Cluster 3 (n = 2379) included young patients with florid cholestasis and Cluster 4 (n = 1908) comprised advanced cases. Further sub-analyses on the dynamics of albumin within the normal range revealed that an ursodeoxycholic acid-induced increase of albumin to >1.2 × the lower limit of normal (LLN) is associated with improved transplant-free survival.
Conclusions
Unsupervised ML identified four novel groups of PBC patients with different phenotypes and prognosis and highlighted subtle variations of albumin within the normal range. A therapy-induced increase of albumin to >1.2 × LLN should be considered a treatment goal.
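As a small worked check of the figures above, the sketch below confirms that the four cluster sizes add up to the training set and expresses the albumin goal as a simple comparison. The function name and the LLN value in the usage note are assumptions; the LLN depends on the laboratory and is not given in the text.

```python
# Cluster sizes and training-set size as reported in the Results above.
CLUSTER_SIZES = {1: 3566, 2: 3966, 3: 2379, 4: 1908}
TRAINING_SET_SIZE = 11_819


def albumin_goal_reached(albumin, lower_limit_of_normal, factor=1.2):
    """Return True when albumin exceeds factor x LLN.

    Both values must be in the same unit; the LLN is laboratory-specific.
    """
    return albumin > factor * lower_limit_of_normal


# The four clusters partition the whole derivation cohort.
CLUSTERS_COVER_TRAINING_SET = sum(CLUSTER_SIZES.values()) == TRAINING_SET_SIZE
```

With an assumed LLN of 35 g/L, the goal corresponds to an albumin level above 42 g/L, so `albumin_goal_reached(44.0, 35.0)` holds while `albumin_goal_reached(40.0, 35.0)` does not.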
Identification of the factors involved in achieving good metabolic control without weight gain in type 2 diabetes management
One of the primary goals of any diabetologist is to reduce HbA1c values within the target range without causing weight gain or hypoglycemic episodes.
This is quite a challenging objective: a large body of scientific literature reports that only 40-50% of the diabetic population attains the HbA1c goal alone.
The aim of the analysis is to identify the factors most strongly associated with rapid and effective metabolic control in type 2 diabetes patients. Hypoglycemic episodes were not included in our analysis, since they were not tracked in the records database of AMD (Associazione Medici Diabetologi, the largest professional association for diabetes in Italy) used for this analysis, so the combined goal was limited to two aspects: HbA1c values within the target range and no weight gain. AMD chose Rulex due to its ability to support an augmented decision process by creating a synergy between the domain-specific expertise of the diabetologist and artificial intelligence.
The Data
- EMRs (Electronic Medical Records) of 2 million diabetic patients
- Data collected from medical visits carried out over a 10-year period (137 variables for each patient, such as clinical data, patient records, medication)
- Data size: 2,224,000,000 raw data points
Analysis of 6 Subgroups
The six subgroups were: all patients under 75, all patients over 75, obese patients under 75, obese patients over 75, presence/absence of cardiovascular disease, and presence/absence of kidney disease. The relevant factors associated with patients at risk of not reaching the combined goal (glycemic control, i.e. HbA1c ≤ 7%, and no weight gain) are listed below. Relevant factors are grouped into categories determined by the diabetologists.
BIOLOGICAL FACTORS ASSOCIATED WITH HYPERGLYCEMIA | THRESHOLD VALUE |
FASTING BLOOD SUGAR | < 132 mg/dl |
HbA1c DROPPING SPEED | > 0.38 HbA1c points/year |
DISTANCE FROM TARGET 7% | ≤ 0.02 HbA1c points |
COMORBIDITIES | YES/NO |
NEPHROPATHY | NO |
ALBUMINURIA | NO |
RETINOPATHY | NO |
HEPATOPATHY (LIVER DISEASE) | NO |
CURRENT DIABETES THERAPIES | YES/NO |
INSULIN (ALONE OR COMBINED WITH ORAL HYPOGLYCEMIC AGENTS) | NO |
HISTORY OF PATIENT CARE | THRESHOLD VALUE |
YEARS OF OBSERVATION | < 6 |
MONTHS SINCE INITIATION OF TREATMENT | < 19 |
INTERVAL BETWEEN VISITS (YEARS) | < 0.9 |
SCOREQ | > 29 |
SEX | Male |
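The combined goal and the numeric thresholds in the table can be written down as explicit conditions, which is the kind of readable rule the analysis aims for. The sketch below is only an illustration: the field names are hypothetical, "no weight gain" is assumed to mean a weight change of zero or less, and the code reports which table conditions a record satisfies without claiming anything about how the original analysis combined them.

```python
import operator

# (hypothetical field name, comparison, threshold) for the numeric rows
# of the table above.
CONDITIONS = [
    ("fasting_blood_sugar_mg_dl", operator.lt, 132),
    ("hba1c_drop_points_per_year", operator.gt, 0.38),
    ("years_of_observation", operator.lt, 6),
    ("months_since_treatment_start", operator.lt, 19),
    ("interval_between_visits_years", operator.lt, 0.9),
    ("score_q", operator.gt, 29),
]


def meets_combined_goal(hba1c_percent, weight_change_kg):
    """Combined goal: HbA1c <= 7% and no weight gain (assumed: change <= 0)."""
    return hba1c_percent <= 7.0 and weight_change_kg <= 0


def matched_conditions(patient):
    """Return the names of the table conditions satisfied by a patient record.

    Fields missing from the record are skipped rather than treated as failures.
    """
    return [
        field
        for field, compare, threshold in CONDITIONS
        if field in patient and compare(patient[field], threshold)
    ]


EXAMPLE_PATIENT = {
    "fasting_blood_sugar_mg_dl": 120,
    "hba1c_drop_points_per_year": 0.2,
    "years_of_observation": 4,
}
```

For `EXAMPLE_PATIENT`, `matched_conditions` returns the fasting blood sugar and years of observation conditions, since the HbA1c dropping speed of 0.2 does not exceed 0.38.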
Benchmarking LLM performances on standard biomedical datasets
In this study, Rulex Logic Learning Machine (LLM) was applied to three benchmark datasets regarding different biomedical problems. The datasets are taken from the UCI archive, a collection of data for machine learning benchmarking, and include:
- Diabetes: the problem of diagnosing diabetes from the values of 8 variables. All 768 patients considered are females at least 21 years old of Pima Indian heritage; 268 of them are cases and the remaining 500 are controls.
- Heart: the detection of heart disease from a set of 13 input variables concerning patient status. The total sample of 270 elements is formed by 120 cases and 150 controls.
- DNA: the recognition of acceptor and donor sites in primate gene sequences of length 60 (bases). The dataset consists of 3186 sequences, subdivided into three classes: acceptor, donor, none.
The performance of LLM was compared with that of other supervised methods, namely Decision Trees (DT), Artificial Neural Networks (ANN), Logistic Regression (LR) and K-Nearest Neighbor (KNN). These tests showed that LLM results are better than those of ANN, DT (which also produce rules) and KNN, and are comparable with those of LR.
Extraction of rules for pleural mesothelioma diagnosis
Malignant pleural mesothelioma (MPM) is a rare, highly fatal tumor whose incidence is rapidly increasing in developed countries due to widespread past exposure to asbestos in environmental and occupational settings. The correct diagnosis of MPM is often hampered by the presence of atypical clinical symptoms that may cause misdiagnosis with either other malignancies (especially adenocarcinomas) or benign inflammatory or infectious diseases (BD) causing pleurisies. Cytological examination (CE) may allow malignant cells to be identified, but sometimes a very high proportion of false negatives may be encountered due to the high prevalence of non-neoplastic cells. Moreover, in most cases a positive result from CE alone does not allow MPM to be distinguished from other malignancies. Many tumor markers (TM) have been demonstrated to be useful complementary tools for the diagnosis of MPM. In particular, recent investigations analyzed the concentrations of three tumor markers in pleural effusions, namely the soluble mesothelin-related peptide (SMRP), CYFRA 21-1 and CEA, and their association with a differential diagnosis of MPM, pleural metastasis from other tumors (MTX) and BD. SMRP showed the best performance in separating MPM from both MTX and BD, while high values of CYFRA 21-1 were associated with both MPM and MTX. Conversely, high concentrations of CEA were mainly observed in patients with MTX. Taken together, these results indicate that information from the three markers considered and from CE might be combined to obtain a classifier that separates MPM from both MTX and BD. In this context, LLM has been applied for the differential diagnosis of MPM by identifying simple and intelligible rules based on CE and TM concentrations.
LLM results have been compared with those obtained by other supervised methods, showing that LLM outperforms all the competing approaches (Decision Trees, K-Nearest Neighbors and Artificial Neural Networks).
Extraction of a simplified gene expression signature for neuroblastoma prognosis
A cancer patient’s outcome is written, in part, in the gene expression profile of the tumor. A 62-probe-set signature (NB-hypo) that identifies tissue hypoxia in neuroblastoma had previously been identified and shown to stratify neuroblastoma patients into good and poor outcome. It was important to develop a prognostic classifier to cluster patients into risk groups that benefit from defined therapeutic approaches. Novel classification and data discretization approaches can be instrumental for the generation of accurate predictors and robust tools for clinical decision support. In this paper, Rulex was applied to gene expression data; in particular, the Attribute Driven Incremental Discretization technique for transforming continuous variables into simplified discrete ones was employed as a pre-processing step for rule extraction by means of Logic Learning Machine. The application of LLM produced 9 rules utilizing mainly two conditions of the relative expression of 11 probe sets. These rules were very effective predictors, as shown in an independent validation set, demonstrating the validity of the LLM algorithm applied to microarray data and patient classification. The LLM performed as efficiently as Prediction Analysis of Microarray and Support Vector Machine, and outperformed other learning algorithms such as C4.5. Rulex carried out a feature selection by selecting a new signature (NB-hypo-II) of 11 probe sets that turned out to be the most relevant for prediction.
Extraction of intelligible rules concerning the prognosis of neuroblastoma
Neuroblastoma is the most common pediatric solid tumor. About fifty percent of high-risk patients die despite treatment, making the exploration of new and more effective strategies for improving stratification mandatory. Hypoxia is a condition of low oxygen tension occurring in poorly vascularized areas of the tumor and is associated with poor prognosis. The aim of this study was the development of a prognostic classifier of neuroblastoma patients’ outcome, blending existing knowledge on clinical and molecular risk factors with the prognostic NB-hypo signature. Classifiers outputting explicit rules that can be easily translated into the clinical setting are particularly interesting in this context. Logic Learning Machine exhibits good accuracy and promises to fulfill the aims of the work. This algorithm was used to classify neuroblastoma patients on the basis of the following risk factors: age at diagnosis, INSS stage, MYCN amplification and NB-hypo. The algorithm generated explicit classification rules in good agreement with existing clinical knowledge. Through an iterative procedure, the examples causing instability in the rules were identified and removed from the dataset. This workflow generated a stable classifier that was very accurate in predicting good and poor outcome patients. The good performance of the classifier was validated in an independent dataset. NB-hypo was an important component of the rules, with a strength similar to that of tumor staging.
Validation of a new classification for multiple osteochondromas patients
Multiple osteochondromas (MO), previously known as hereditary multiple exostoses (HME), is an autosomal dominant disease characterized by the formation of several benign cartilage-capped bone growths called osteochondromas or exostoses. Various clinical classifications have been proposed, but a consensus has not been reached. The aim of this study was to validate (using a machine learning approach) an “easy to use” tool to characterize MO patients in three classes according to the number of bone segments affected and the presence of skeletal deformities and/or functional limitations. The proposed classification has been validated (with a highly satisfactory mean accuracy) by analyzing 150 different variables on 289 MO patients through Switching Neural Network, the model underlying the Logic Learning Machine technique. This approach allowed us to identify ankle valgism, Madelung deformity and limitation of hip external rotation as “tags” of the three clinical classes. In conclusion, the proposed classification provides an efficient system to characterize this rare disease and is able to define homogeneous cohorts of patients to investigate MO pathogenesis.
Predicting obstructive sleep apnea in people with Down syndrome
Obstructive sleep apnea (OSA) occurs frequently in people with Down Syndrome (DS) with reported prevalence ranging between 55% and 97%, compared to 1–4% in the neurotypical pediatric population. Sleep studies are often uncomfortable, costly, and poorly tolerated by individuals with DS.
The dataset included more than 460 observations (concerning clinical visit findings, parent survey, wristband oximeter, urine proteomic analysis, lateral cephalogram and 3D digital photos) for 102 patients with Down syndrome, each of whom underwent a polysomnogram to establish whether he or she had obstructive sleep apnea.
Logic Learning Machine was then used to develop a predictive model for determining the occurrence of obstructive sleep apnea in people with Down syndrome, thus avoiding the use of uncomfortable and expensive tests (e.g. polysomnogram).
The LLM classification task identified a predictive model, described by a set of simple rules, showing a high predictive value (81.5%) on negative cases.
The Feature Ranking task made it possible to retrieve the most relevant variables, giving a quantitative score of their importance.
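The predictive value on negative cases is the share of patients predicted to be free of obstructive sleep apnea who are confirmed as such by the polysomnogram. A minimal sketch of that calculation follows; the function name and the labels are made up for illustration and are not the study data.

```python
def negative_predictive_value(y_true, y_pred):
    """Fraction of negative predictions that are truly negative.

    Labels: 1 = obstructive sleep apnea present, 0 = absent.
    """
    pairs = list(zip(y_true, y_pred))
    true_negatives = sum(1 for truth, pred in pairs if truth == 0 and pred == 0)
    false_negatives = sum(1 for truth, pred in pairs if truth == 1 and pred == 0)
    if true_negatives + false_negatives == 0:
        raise ValueError("the model made no negative predictions")
    return true_negatives / (true_negatives + false_negatives)


# Hypothetical labels: polysomnogram result vs. model prediction.
Y_TRUE = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
Y_PRED = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
EXAMPLE_NPV = negative_predictive_value(Y_TRUE, Y_PRED)
```

In this toy example the model makes six negative predictions, five of which are correct, so the value is 5/6. A high value matters here because a reliable negative prediction is what would allow a patient to skip the polysomnogram.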
References
- Elisabeth Montel, Astrid Richter, Sabine Ladurner, Agata Malizia, Roberta Vanzetta, Damiano Verda, Marco Muselli, Pierluigi Santin, Paolo Vian, Controllo automatico delle variabili cliniche della scheda di dimissione ospedaliera (SDO), mediante l’utilizzo di tecniche di machine learning, Frg Editore, 2021.
- Alessio Gerussi, Damiano Verda, Davide Paolo Bernasconi, Marco Carbone, Atsumasa Komori, Masanori Abe, Mie Inao, Tadashi Namisaki, Satoshi Mochida, Hitoshi Yoshiji, Gideon Hirschfield, Keith Lindor, Albert Pares, Christophe Corpechot, Nora Cazzagon, Annarosa Floreani, Marco Marzioni, Domenico Alvaro, Umberto Vespasiani-Gentilucci, Laura Cristoferi, Maria Grazia Valsecchi, Marco Muselli, Bettina E Hansen, Atsushi Tanaka, Pietro Invernizzi, Machine learning in primary biliary cholangitis: A novel approach for risk stratification, Pubmed.gov, Jan 7.
- Carlo Bruno Giorda, Federico Pisani, Alberto De Micheli, Paola Ponzani, Giuseppina Russo, Giacomo Guaita, Rita Zilich, Nicoletta Musacchio on behalf of the Associazione Medici Diabetologi (AMD) Annals Study Group, Determinants of good metabolic control without weight gain in type 2 diabetes management: a machine learning analysis, BMJ Open Diabetes Research and Care.
- G. Skotko, E.A. Macklin, M. Muselli, L. Voelz, M.E. McDonough, E. Davidson, V. Allareddy et al, A predictive model for obstructive sleep apnea and Down syndrome, American journal of medical genetics Part A 173, no. 4 (2017): 889-896.
- Parodi, R. Filiberti, P. Marroni, R. Libener, G.P. Ivaldi, M. Mussap, E. Ferrari, C. Manneschi, E. Montani, M. Muselli, Differential diagnosis of pleural mesothelioma using Logic Learning Machine, BMC Bioinformatics 16.S9 (2015): S3.
- Cangelosi, M. Muselli, S. Parodi, F. Blengio, J. Koster, A. Schramm, A. Garaventa, C. Gambini, L. Varesio, Use of Attribute Driven Incremental Discretization and Logic Learning Machine to build a prognostic classifier for neuroblastoma patients, BMC Bioinformatics 15.S5 (2014): S4.
- Cangelosi, F. Blengio, R. Versteeg, A. Eggert, A. Garaventa, C. Gambini, M. Conte, A. Eva, M. Muselli, L. Varesio, Logic Learning Machine creates explicit and stable rules stratifying neuroblastoma patients, BMC Bioinformatics 14:S12 (2013).
- Mordenti, E. Ferrari, E. Pedrini, N. Fabbri, L. Campanacci, M. Muselli, L. Sangiorgi, Validation of a New Hereditary Multiple Exostoses Classification Through Switching Neural Networks, American Journal of Medical Genetics 161 (2013) 556–560 DOI: 10.1002/ajmg.a.35819.
- Muselli, Extracting knowledge from biomedical data through Logic Learning Machines and Rulex, EMBnet Journal 18B (2012), 56–58.
- Verda, S. Parodi, E. Ferrari, and M. Muselli, Analyzing gene expression data for pediatric and adult cancer diagnosis using logic learning machine and standard supervised methods, BMC bioinformatics 20(9) (2019), 390.