Neural Expectation Maximization

Many real-world tasks such as reasoning and physical interaction require identification and manipulation of conceptual entities. A first step towards solving these tasks is the automated discovery of distributed symbol-like representations. In this paper, we explicitly formalize this problem as inference in a spatial mixture model where each component is parametrized by a neural network. Based on the Expectation Maximization framework, we then derive a differentiable clustering method that simultaneously learns how to group and represent individual entities. We evaluate our method on the (sequential) perceptual grouping task and find that it is able to accurately recover the constituent objects. We demonstrate that the learned representations are useful for next-step prediction.
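
As a rough illustration of the grouping step in a spatial mixture model, the following sketch computes soft pixel-to-component assignments (an E-step). It is a minimal sketch under stated assumptions, not the method as implemented in the paper: it assumes Gaussian pixel likelihoods with a fixed `sigma`, uniform mixing weights, and a plain array `psi` standing in for the per-component outputs of the neural networks. The function name `e_step` and all numeric values are hypothetical.

```python
import numpy as np

def e_step(x, psi, sigma=0.25):
    """Soft-assign each pixel to one of K components.

    x:   (D,) observed pixel values
    psi: (K, D) per-component pixel means (assumption: a stand-in array
         for what a neural network would produce per component)
    Returns gamma of shape (K, D); each column sums to 1.
    """
    # Per-pixel log-likelihood under each component (Gaussian with fixed
    # sigma, uniform mixing weights, additive constants dropped).
    log_lik = -0.5 * ((x[None, :] - psi) ** 2) / sigma ** 2
    # Normalize over components with a numerically stable softmax.
    log_lik = log_lik - log_lik.max(axis=0, keepdims=True)
    gamma = np.exp(log_lik)
    return gamma / gamma.sum(axis=0, keepdims=True)

# Toy input: four pixels, two components (one dark, one bright).
x = np.array([0.0, 0.1, 0.9, 1.0])
psi = np.array([[0.0, 0.0, 0.0, 0.0],
                [1.0, 1.0, 1.0, 1.0]])
gamma = e_step(x, psi)
```

Each column of `gamma` is a distribution over components for one pixel, so dark pixels are attributed mostly to the first component and bright pixels to the second. In a full EM loop, these assignments would weight the update of each component's parameters.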