
Approximate-Reasoning Artificial Intelligence Against Biased Training Data

Investigators: Liang Dong, PhD

Summary

Today’s artificial intelligence (AI) systems are not as objective as we might think, and applying AI to healthcare exposes several undesirable issues.  Firstly, the training samples may not be evenly distributed in the data space.  An AI system, for instance, may be trained on millions of images of common disease cases; its ability to detect signs of common disease on scans is then excellent, but it cannot always parse unusual images or scans because it has had less exposure to them during training.  Secondly, many AI systems are trained on potentially biased data that we humans provide.  Much of this data includes subjective human opinions; in healthcare, the training data may reflect doctors’ opinions, and AI tends to replicate the biases and mistakes in human decision-making.  AI systems for healthcare should therefore continue to cooperate with, and be updated by, doctors, who gradually accumulate expertise when assessing treatment options.  Thirdly, prejudices in medical records may be treated as facts and replicated in the AI systems that learn from them.

These issues with decision support systems and their training raise ethical questions for AI in healthcare.  Should patients be preemptively treated based on AI results without a definitive diagnosis?  Should doctors pursue aggressive treatments if AI labels patients at high risk for cancer?  How can unnecessary biopsies and surgical overdiagnosis be avoided?  How can the waste of medical resources be prevented?  If a life-changing decision is being made against someone, they should be entitled to an explanation of the decision, or at least some reason why it was made.

In this project, we intend to address these issues arising from biased training data by creating a new AI methodology that applies approximate reasoning and retraces learned rules to the particular partitions of the training data.  Decision makers in high-stakes fields such as healthcare are much less likely to trust recommendations for which no clear justification is provided, so there is a need for AI systems with transparency and accessible reasoning in clinical decision-making.  What is missing today is an investigation of how the training data distribution affects the decision-making rules.  Such an investigation would provide a better understanding of AI decision-making and could reveal bias in the original training data.  It would also offer an opportunity to improve the AI system with the help of updated doctor heuristics and accumulated domain expertise.
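As a small illustration of what retracing learned rules to their training data could look like, the sketch below assigns each training sample to the rule (data subspace) whose firing strength is largest and summarizes the label distribution behind every rule; a rule backed by few samples, or by samples of mostly one class, points to a sparsely covered or potentially biased region.  The firing-strength matrix, the rule names, and the function name retrace_rules are illustrative placeholders, not the project’s actual interface.

    import numpy as np

    def retrace_rules(firing_strengths, labels, rule_names):
        """Summarize the training-data partition behind each learned rule.

        firing_strengths : (n_samples, n_rules) array of rule activations
        labels           : (n_samples,) array of class labels
        rule_names       : list of n_rules human-readable rule descriptions
        """
        # Hard-assign every sample to the rule (subspace) that fires most strongly.
        dominant = np.argmax(firing_strengths, axis=1)
        for r, name in enumerate(rule_names):
            idx = np.where(dominant == r)[0]
            classes, counts = np.unique(labels[idx], return_counts=True)
            # Few samples, or samples of mostly one class, flag a sparsely
            # covered (and potentially biased) region of the data space.
            summary = dict(zip(classes.tolist(), counts.tolist()))
            print(f"Rule '{name}': {len(idx)} samples, label counts {summary}")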

In our approach, the AI system embeds approximate reasoning, including fuzzy logic and set membership functions, which can handle problems with uncertainty.  The approximate-reasoning neural nodes can extract knowledge from the data in a form that humans can understand.  Our approach generates a partition of the input data space, and an input-output relation rule is formed in each subspace.  The input to the AI system will also include categorical variables, which is crucial for clinical decision support systems.  We use tropical geometry in the neural networks to ensure the differentiability of the approximate reasoning and the categorical variables, which allows end-to-end gradient descent-based optimization of the network parameters.  For example, during training, the set membership functions can be parameterized as smooth curves using tropical geometry.  Throughout the optimization process, the membership functions are gradually updated and converted to piecewise linear functions, which ensures the stability and convergence of gradient descent and yields an interpretable system of approximate reasoning.
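The sketch below is one way such a smooth-to-piecewise-linear parameterization could look, assuming a trapezoidal membership function whose min/max operations (the tropical, i.e. max-plus, part) are replaced by a temperature-controlled log-sum-exp; as the temperature is annealed toward zero, the smooth curve approaches the piecewise-linear trapezoid.  The class name, the annealing schedule, and the toy fitting loop are illustrative assumptions, not the project’s actual implementation.

    import torch

    class SmoothTrapezoid(torch.nn.Module):
        """Trapezoidal fuzzy membership function with learnable breakpoints.

        The exact trapezoid is piecewise linear:
            mu(x) = max(0, min(1, (x - a)/(b - a), (d - x)/(d - c)))
        Here min/max are replaced by a temperature-controlled log-sum-exp so
        the membership value is smooth and differentiable in (a, b, c, d).
        As the temperature tau -> 0, the smooth curve approaches the
        piecewise-linear trapezoid (the tropical limit).
        """

        def __init__(self, a, b, c, d):
            super().__init__()
            self.params = torch.nn.Parameter(
                torch.tensor([a, b, c, d], dtype=torch.float32))

        def forward(self, x, tau):
            a, b, c, d = self.params
            rising = (x - a) / (b - a)      # left shoulder
            falling = (d - x) / (d - c)     # right shoulder
            ones = torch.ones_like(x)
            zeros = torch.zeros_like(x)
            # Smooth minimum of {1, rising, falling}: -tau * logsumexp(-z / tau).
            smooth_min = -tau * torch.logsumexp(
                -torch.stack([ones, rising, falling], dim=0) / tau, dim=0)
            # Smooth maximum of {0, smooth_min}: tau * logsumexp(z / tau).
            return tau * torch.logsumexp(
                torch.stack([zeros, smooth_min], dim=0) / tau, dim=0)

    # Toy usage: fit the membership function to a crisp target set while
    # annealing tau, so the learned curve ends up (almost) piecewise linear.
    torch.manual_seed(0)
    mf = SmoothTrapezoid(0.0, 1.0, 2.0, 3.0)
    opt = torch.optim.Adam(mf.parameters(), lr=0.05)
    x = torch.linspace(-1.0, 4.0, 200)
    target = ((x > 0.5) & (x < 2.5)).float()

    for step in range(300):
        tau = max(0.5 * 0.99 ** step, 0.01)  # anneal toward the piecewise-linear limit
        loss = torch.mean((mf(x, tau) - target) ** 2)
        opt.zero_grad()
        loss.backward()
        opt.step()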