Open Access Paper
Interpolation-aware models for train-test consistency in mixup
28 December 2022
Proceedings Volume 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022); 1250628 (2022) https://doi.org/10.1117/12.2661854
Event: International Conference on Computer Science and Communication Technology (ICCSCT 2022), 2022, Beijing, China
Abstract
Mixup is a learning principle that trains a neural network on convex combinations of pairs of examples and their labels. Despite its good performance, mixup has an inherent inconsistency between training and testing: the model is trained on interpolated examples but evaluated on unmixed ones, which makes theoretical understanding difficult and hurts performance in some cases. In this work, we propose λ-mixup to alleviate this inconsistency. Specifically, λ-mixup reformulates the model to take the interpolation coefficient (λ) as an additional input, so that a class of models indexed by λ is learned. At test time, we can select one specific coefficient, or several coefficients for an ensemble, depending on the testing distribution. We theoretically demonstrate that, with enough data and model capacity, λ-mixup can recover the original conditional distribution. Moreover, we conduct image classification experiments on multiple datasets, including CIFAR-10, CIFAR-100 and Tiny-ImageNet, showing that, compared with mixup, λ-mixup exhibits better generalization, calibration, and robustness to adversarial attacks and out-of-distribution transformations.
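The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how λ is fed to the network, so appending λ to the input features of a toy linear classifier is an assumption made here for illustration, as are the names `mixup_pair`, `sample_lambda`, `predict_proba`, and `ensemble_predict` and the Beta(α, α) sampling of λ that is customary for mixup. During training, a λ-aware model would receive the same λ that was used to mix the pair; at test time the caller chooses one or more coefficients.

```python
import math
import random


def mixup_pair(x_i, y_i, x_j, y_j, lam):
    """Convex combination of two examples and of their one-hot labels."""
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix


def sample_lambda(alpha=1.0, rng=random):
    # Assumption: lambda ~ Beta(alpha, alpha), the usual choice in mixup.
    return rng.betavariate(alpha, alpha)


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict_proba(weights, bias, x, lam):
    """Hypothetical lambda-aware linear classifier.

    Assumption: lambda is simply appended to the input features, so each
    row of `weights` has len(x) + 1 entries (one row per class).
    """
    features = list(x) + [lam]
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


def ensemble_predict(weights, bias, x, lams):
    """Average class probabilities over several test-time coefficients."""
    probs = [predict_proba(weights, bias, x, lam) for lam in lams]
    return [sum(col) / len(lams) for col in zip(*probs)]
```

Calling `predict_proba` with a single λ corresponds to selecting one model from the λ-indexed class, while `ensemble_predict` averages several of them. Which coefficients suit a given testing distribution is a question the paper addresses and this sketch does not.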
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Yinhan Hu, Congying Han, and Tiande Guo "Interpolation-aware models for train-test consistency in mixup", Proc. SPIE 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022), 1250628 (28 December 2022); https://doi.org/10.1117/12.2661854
KEYWORDS
Calibration

Data modeling

Electrochemical etching

Image classification

Performance modeling

Neural networks

Image resolution
