Assessment Of Deep Learning Models For Consumer Safety

Investigator(s): Pablo Rivas, PhD, Tomas Cerny, PhD, and Korn Sooksatra

First, By Mitigating Algorithmic Bias With Variational Autoencoders

Summary

When training data does not equally represent all the groups to which an algorithm will be applied, many forms of bias can be introduced. This is something industry developers want to avoid for the safety of their stakeholders and the reliability of their models. Generally speaking, the probabilistic optimization processes used in machine learning tend to favor performance on the majority of the data. Depending on the distribution of the source from which the data is obtained, samples from groups with a low probability of appearance have little impact on the overall accuracy of the system. In practice, however, the context in which an algorithm is deployed may demand equal treatment of all groups, no matter their probability of appearance. This is critical in social applications such as face recognition. Our work in this area follows the methodology first introduced by Amini et al., who proposed a generic debiasing variational autoencoder (DB-VAE) to mitigate bias during training. They applied the DB-VAE to reduce bias in face recognition on a dataset that is biased with respect to race and gender. In our current research, we apply DB-VAEs to mitigate bias on other datasets and problems. Figure 8 shows the general architecture of our model.
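As a rough illustration of the debiasing idea, the sketch below shows how resampling weights could be derived from the latent codes produced by a trained DB-VAE encoder, following the adaptive resampling rule of Amini et al.: samples whose latent codes fall in low-density regions (under-represented groups) are drawn more often during training. The function name, the histogram bin count, the smoothing constant `alpha`, and the `encode` call in the usage comment are illustrative assumptions, not part of our implementation.

```python
import numpy as np

def debiasing_sample_weights(latents, n_bins=10, alpha=0.01):
    """Approximate DB-VAE adaptive resampling: weight each training
    sample by the inverse of the estimated density of its latent code,
    so that rare (under-represented) samples are selected more often.

    latents : array of shape (n_samples, latent_dim), e.g. the means
              produced by the trained DB-VAE encoder.
    """
    n_samples, n_dims = latents.shape
    log_inv_density = np.zeros(n_samples)
    for d in range(n_dims):
        # Histogram density estimate for this latent dimension.
        hist, edges = np.histogram(latents[:, d], bins=n_bins, density=True)
        # Bin index of every sample along this dimension (0 .. n_bins-1).
        bin_idx = np.digitize(latents[:, d], edges[1:-1])
        # Product over dimensions of 1 / (density + alpha), accumulated
        # in log space for numerical stability.
        log_inv_density += -np.log(hist[bin_idx] + alpha)
    log_inv_density -= log_inv_density.max()      # stabilize the exponent
    weights = np.exp(log_inv_density)
    return weights / weights.sum()                # sampling distribution

# Hypothetical usage (encode() stands in for the trained DB-VAE encoder):
# z = encode(training_images)                     # shape (N, latent_dim)
# p = debiasing_sample_weights(z)
# batch = np.random.choice(len(p), size=64, p=p)  # debiased mini-batch
```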

Second, By Assessing Deep Learning Models’ Sensitivity To Adversarial Attacks

Adversarial attacks are a recently discovered class of test-time attacks that work by inserting adversarial examples. For a classification model, an adversarial example is an input that the target model misclassifies even though a human would classify it correctly. For example, suppose there is a well-trained classifier that can tell whether its input is an image of a bear or of a longhorn. An adversarial example is an image of a bear that is misclassified as a longhorn, or vice versa; Figure 9 illustrates this kind of attack. Such attacks are very dangerous in applications on which human lives depend. For instance, if an autonomous car classifies a stop sign as a speed-limit sign, it can cause a crash. Likewise, in the medical field, if a machine assisting a heart operation misidentifies the lungs as the heart, the patient is put in grave danger. Adversarial attacks also exist for generative models, where an adversarial example is an input that the model regenerates as a different input. For example, consider an autoencoder trained on MNIST, a dataset of images of handwritten digits, that receives an MNIST image and reconstructs an image very close to its input. An adversarial example in this setting is an image of the digit 0 that the autoencoder regenerates as an image of the digit 4, as illustrated in Figure 10, even though a human would expect the reconstruction to resemble the input. One application affected by this threat is an autoencoder-based image compressor: if its input is adversarially perturbed, the decompressed output can be completely different from the original. In short, a classifier deployed for a critical task without accounting for this threat is dangerous to humans, and generative models are vulnerable as well.

Figure 9: Example of an adversarial attack on a classification model. An image of a bear is correctly classified as a bear; after a perturbation is added, the result is classified as a longhorn with high confidence. However, humans still recognize it as a bear, and such an image is called an adversarial example.
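To make the classification-time attack in Figure 9 concrete, the sketch below uses the fast gradient sign method (FGSM), one standard way of crafting adversarial examples, against a placeholder model. The tiny linear classifier, the random input, the label, and the epsilon value are stand-ins for illustration only; they are not the models studied in this project.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Fast Gradient Sign Method: take one small step in the direction
    that increases the classification loss, yielding an input that still
    looks like `x` to a human but can fool the model."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + epsilon * x_adv.grad.sign()  # signed-gradient step
        x_adv = x_adv.clamp(0.0, 1.0)                # stay in valid pixel range
    return x_adv.detach()

# Placeholder stand-in for a trained bear-vs-longhorn classifier.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))
model.eval()

x = torch.rand(1, 3, 32, 32)   # placeholder "bear" image
y = torch.tensor([0])          # its true label
x_adv = fgsm_attack(model, x, y)
print(model(x).argmax(dim=1), model(x_adv).argmax(dim=1))
```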
Figure 10: Example of an adversarial attack on a generative model. An image of a zero is regenerated as a zero; after a perturbation is added, the result is regenerated as an image of a four. However, humans expect the autoencoder to regenerate an image similar to the input.
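A similar idea applies to the generative-model attack in Figure 10: instead of flipping a class label, the attacker searches for a small perturbation whose reconstruction matches a chosen target image. The sketch below is a minimal gradient-based search under assumed ingredients; the toy autoencoder, the random `x` and `target` images, and the step sizes are placeholders, not the actual MNIST models discussed above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def attack_autoencoder(autoencoder, x, target, epsilon=0.1, steps=40, lr=0.01):
    """Search for a small perturbation of `x` whose reconstruction looks
    like `target` instead of `x` (an output-space attack on a generative
    model, in the spirit of Figure 10)."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        recon = autoencoder((x + delta).clamp(0, 1))
        loss = F.mse_loss(recon, target)      # push the output toward `target`
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)   # keep the perturbation small
    return (x + delta).clamp(0, 1).detach()

# Placeholder stand-in for a trained MNIST autoencoder.
autoencoder = nn.Sequential(
    nn.Flatten(), nn.Linear(784, 32), nn.ReLU(),
    nn.Linear(32, 784), nn.Sigmoid(), nn.Unflatten(1, (1, 28, 28)),
)
x = torch.rand(1, 1, 28, 28)       # placeholder for an image of a "0"
target = torch.rand(1, 1, 28, 28)  # placeholder for an image of a "4"
x_adv = attack_autoencoder(autoencoder, x, target)
```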