Kernel Density Networks
A desirable property of a machine learning model is the ability to know when it does not know, i.e., the ability to make appropriately confident predictions when it encounters out-of-distribution (OOD) inputs that lie far from the training (in-distribution) data. While this property is readily observed in human and animal intelligence, deep neural networks (ReLU-type networks in particular), despite having achieved state-of-the-art performance in many learning tasks, are known to produce overconfident predictions on OOD data. Deep networks partition the input feature space into convex polytopes and learn affine functions over them. Since the outer polytopes at the boundary of the training data extend to infinity, they produce high-confidence predictions for test samples far away from the training data. Most methods proposed to mitigate this problem rely heavily on specific network architectures, loss functions, and training routines that incorporate explicit OOD data. "Kernel Density Networks" (KDNs) overcome this problem by taking an already trained deep network and fitting Gaussian kernels over its polytopes instead of affine functions. KDNs yield class-conditional density estimates, which are used during inference to compute the posterior class probabilities (confidences) and produce the final prediction.
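To make the idea concrete, the following is a minimal sketch, not the paper's exact estimator: it assumes a small trained ReLU network, uses the on/off pattern of the first hidden layer as the polytope identifier, and fits one diagonal Gaussian kernel per (polytope, class) pair. The function names (`activation_pattern`, `fit_kdn`, `kdn_posteriors`) and the diagonal-Gaussian and equal-prior choices are illustrative assumptions.

```python
# Minimal KDN-style sketch (assumptions: first-layer ReLU pattern = polytope,
# diagonal Gaussians, uniform class priors). Illustrative only.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neural_network import MLPClassifier

X, y = make_blobs(n_samples=500, centers=3, cluster_std=1.0, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(16,), activation="relu",
                    max_iter=2000, random_state=0).fit(X, y)

def activation_pattern(net, X):
    """Binary on/off pattern of the first hidden ReLU layer; one polytope per pattern."""
    H = np.maximum(X @ net.coefs_[0] + net.intercepts_[0], 0.0)
    return tuple(map(tuple, (H > 0).astype(int)))

def fit_kdn(net, X, y, reg=1e-3):
    """Fit one diagonal Gaussian kernel per (polytope, class) over the training data."""
    kernels = {}                                   # (pattern, class) -> (mean, var, weight)
    patterns = activation_pattern(net, X)
    n_per_class = np.bincount(y)
    for p in set(patterns):
        in_p = np.array([q == p for q in patterns])
        for c in np.unique(y[in_p]):
            Xpc = X[in_p & (y == c)]
            kernels[(p, c)] = (Xpc.mean(axis=0),
                               Xpc.var(axis=0) + reg,      # regularize tiny polytopes
                               len(Xpc) / n_per_class[c])  # class-conditional mixture weight
    return kernels

def kdn_posteriors(kernels, X, classes, priors=None):
    """Class-conditional densities from the polytope kernels, then Bayes rule."""
    priors = np.ones(len(classes)) / len(classes) if priors is None else priors
    dens = np.zeros((len(X), len(classes)))
    for (p, c), (mu, var, w) in kernels.items():
        logpdf = -0.5 * (((X - mu) ** 2 / var).sum(axis=1)
                         + np.log(2 * np.pi * var).sum())
        dens[:, classes.index(c)] += w * np.exp(logpdf)
    post = dens * priors
    return post / np.maximum(post.sum(axis=1, keepdims=True), 1e-300)

kernels = fit_kdn(net, X, y)
print(kdn_posteriors(kernels, X[:5], classes=[0, 1, 2]))
```

Because the fitted Gaussians decay away from the training data, an input far from every polytope's kernel receives near-zero density for all classes, so the resulting posteriors are no longer pinned at high confidence the way the original affine outputs are.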