Label Noise Detector (LND)

Research Fellow: Jiahen Wei

Advisor: Yang Liu

Overview

A dataset collected from unverified sources often contains numerous noisy labels. Such label noise in real-world datasets encodes wrong correlation patterns and impairs the generalization of deep neural networks (DNNs). LND (Label Noise Detector) is an open-source project that provides efficient ways to detect corruption patterns, i.e., the label noise transition matrix, which characterizes the probability of a training instance being wrongly annotated.
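To make the notion of a noise transition matrix concrete, here is a minimal sketch (not part of LND; all names and the example matrix T are illustrative) in which T[i, j] = P(noisy label = j | clean label = i), and the matrix is recovered empirically from simulated corruptions:

```python
import numpy as np

# Hypothetical 3-class noise transition matrix T, where
# T[i, j] = P(noisy label = j | clean label = i).
T = np.array([
    [0.9, 0.05, 0.05],
    [0.1, 0.8,  0.1 ],
    [0.0, 0.2,  0.8 ],
])

rng = np.random.default_rng(0)
clean = rng.integers(0, 3, size=10_000)

# Corrupt each clean label by sampling its noisy label from row T[clean].
noisy = np.array([rng.choice(3, p=T[y]) for y in clean])

# The empirical transition matrix approaches T as the sample grows.
est = np.zeros((3, 3))
for y, yt in zip(clean, noisy):
    est[y, yt] += 1
est /= est.sum(axis=1, keepdims=True)
print(np.round(est, 2))
```

In practice the clean labels are unknown, which is exactly why estimating T from noisy labels alone (as the tools below do) is non-trivial.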

LND includes:

  1. Detection with High-Order Consensuses: a generally applicable, lightweight tool for fast estimation of the noise transition matrix, relying on first-, second-, and third-order consensus checks among an example's noisy label and those of its two nearest neighbors (2-NN). This flexible tool has the potential to extend to more sophisticated noise settings, including instance-dependent ones.
  2. Detection beyond Images: an information-theoretic approach that down-weights the less informative parts of the features using only noisy labels, for tasks with lower-quality features.
  3. Detection with Similar Features: a new, universally applicable, data-centric, and training-free solution that detects noisy labels using the neighborhood information of features.
  4. Detection with Precarity Effects of Early Memorization Behavior: a model-centric approach that detects corrupted labels in a biased and noisy training dataset by checking the model's prediction confidence within time intervals. Bias and noise can arise both at sampling and at label collection: a dataset often contains numerous sub-populations whose sizes tend to be long-tail distributed, so the tail sub-populations have an exponentially scaled probability of being under-sampled.
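To illustrate the neighborhood idea behind items 1 and 3, the following is a simplified, training-free sketch (not LND's exact algorithm; the function name and the toy two-cluster data are made up for illustration) that flags an example as suspicious when its noisy label disagrees with the majority label among its k nearest neighbors in feature space:

```python
import numpy as np

def knn_label_consensus(features, noisy_labels, k=3):
    """Flag indices whose noisy label loses the majority vote among
    their k nearest neighbors (cosine similarity). A simplified,
    training-free sketch of neighborhood-based noise detection."""
    X = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = X @ X.T
    np.fill_diagonal(sims, -np.inf)  # exclude each point from its own neighborhood
    suspects = []
    for i in range(len(X)):
        nn = np.argsort(sims[i])[-k:]  # indices of the k most similar points
        votes = np.bincount(noisy_labels[nn],
                            minlength=noisy_labels.max() + 1)
        if votes[noisy_labels[i]] < votes.max():
            suspects.append(i)
    return suspects

# Toy data: two tight, well-separated clusters with one corrupted label.
rng = np.random.default_rng(1)
features = np.vstack([
    rng.normal([5.0, 0.0], 0.1, size=(20, 2)),  # cluster with label 0
    rng.normal([0.0, 5.0], 0.1, size=(20, 2)),  # cluster with label 1
])
labels = np.array([0] * 20 + [1] * 20)
labels[3] = 1  # inject one corrupted label
print(knn_label_consensus(features, labels, k=3))  # flags index 3
```

The real tools go further than a majority vote: counting first-, second-, and third-order agreements among neighbors yields enough statistics to estimate the full transition matrix, not just a suspect list.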