The ml-Codesmell dataset was created by analyzing source code and extracting a large number of source code metrics, together with labels for many code smells. The dataset has been used to train machine learning algorithms that predict code smells.
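The general idea of learning code smells from labelled metrics can be sketched with a very small classifier. This is a minimal, standard-library-only illustration under stated assumptions: the metric names (`loc`, `wmc`), the toy rows, and the helper functions `fit_centroids` and `predict` are hypothetical and are not taken from the ml-Codesmell dataset. The technique shown is a nearest-centroid classifier, chosen only for brevity; it is not necessarily one of the algorithms used with the dataset.

```python
from statistics import mean

# Hypothetical rows: each dict holds a few source code metrics for one class
# plus a boolean code smell label. Names and values are illustrative only
# and are NOT taken from the ml-Codesmell dataset.
TRAIN = [
    {"loc": 120, "wmc": 8, "smell": False},
    {"loc": 95, "wmc": 6, "smell": False},
    {"loc": 1450, "wmc": 62, "smell": True},
    {"loc": 1100, "wmc": 48, "smell": True},
]
FEATURES = ("loc", "wmc")


def fit_centroids(rows):
    """Compute the mean feature vector (centroid) for each label."""
    centroids = {}
    for label in {r["smell"] for r in rows}:
        subset = [r for r in rows if r["smell"] == label]
        centroids[label] = tuple(mean(r[f] for r in subset) for f in FEATURES)
    return centroids


def predict(centroids, row):
    """Return the label whose centroid is nearest in squared Euclidean distance."""
    def dist(centroid):
        return sum((row[f] - c) ** 2 for f, c in zip(FEATURES, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))


centroids = fit_centroids(TRAIN)
print(predict(centroids, {"loc": 1300, "wmc": 55}))  # True
```

Because the metrics are on very different scales, `loc` dominates the distance in this sketch; a realistic pipeline would normalize the features and would more likely rely on an established library and a stronger model than a nearest-centroid rule.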