Automated Data Augmentation for Deflating Data Bias
Overview
Machine learning (ML) has been widely adopted in many areas, and data bias is a major concern: certain
elements of a dataset are more heavily weighted or represented than others. A biased dataset does not accurately represent a
model's use case, resulting in skewed outcomes, low accuracy, and analytical errors. High bias means that a model
leans heavily on incidental features inherited from the dataset while omitting the key features, so it
generalizes poorly. While data augmentation is a promising way to address data bias, it faces several practical challenges:
(i) most existing efforts craft the synthesized dataset manually, which is time-consuming; (ii) the criteria for data
augmentation rely heavily on human expert knowledge; and (iii) prior solutions assume that users already know the bias
exists and can therefore remedy it, which is often not the case in real-world applications. This proposal will develop
effective data augmentation techniques to address the above challenges.
The figure above shows an overview of our proposed data augmentation framework. The framework consists of four tasks:
(i) GAN-based data augmentation, (ii) diffusion-model-based data augmentation, (iii) an ensemble of GAN and diffusion models,
and (iv) acceleration of the framework by boosting algorithms.
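To make the notion of unequal representation concrete, the following is a minimal sketch, not part of the proposed framework, of how imbalance across groups in a labeled dataset might be measured and how many synthetic samples an augmentation step would need to supply. The function names `imbalance_ratio` and `augmentation_budget` and the toy labels are assumptions introduced for illustration only. The sketch also assumes the relevant grouping is already known, which is exactly the assumption that challenge (iii) says often fails in practice.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return per-group counts and the ratio of largest to smallest group size."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must be non-empty")
    ratio = max(counts.values()) / min(counts.values())
    return counts, ratio


def augmentation_budget(labels):
    """Return how many synthetic samples each group needs to match the largest group."""
    counts = Counter(labels)
    target = max(counts.values())
    return {group: target - n for group, n in counts.items()}


# Hypothetical toy dataset: group "a" is heavily over-represented.
labels = ["a"] * 90 + ["b"] * 10
counts, ratio = imbalance_ratio(labels)
budget = augmentation_budget(labels)
print(counts, ratio, budget)
```

In this sketch a generative model (for example a GAN or a diffusion model) would be the component asked to produce the number of samples listed in `budget` for each under-represented group; the sketch itself does no generation.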
Members
Downloads
Stay tuned ...
Publications
Stay tuned ...
Research Sponsors
This project is funded by the Semiconductor Research Corporation (SRC). The views expressed on this site are those of the members of
this project and do not necessarily represent those of the Semiconductor Research Corporation.