Borg Traces dataset
John Wilkes.
The clusterdata-2019 trace dataset provides information about eight different Borg cells for the month of May 2019. It includes the following new information:
- CPU usage histograms for each 5-minute period, not just a point sample;
- information about alloc sets (shared resource reservations used by jobs);
- job-parent information for master/worker relationships such as MapReduce jobs.
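To illustrate why a per-period histogram carries more information than a single point sample, here is a minimal sketch in Python. It uses synthetic data only: the function name `summarize_window`, the chosen percentiles, and the one-sample-per-second rate are assumptions made for illustration and do not describe the actual trace schema, which is documented separately.

```python
import random

WINDOW_SECONDS = 300  # one 5-minute period (assumed: one sample per second)


def summarize_window(samples, percentiles=(0, 25, 50, 75, 100)):
    """Summarize a window of usage samples as {percentile: value}.

    Uses a simple nearest-rank lookup on the sorted samples. This is an
    illustrative summary, not the trace's real histogram encoding.
    """
    ordered = sorted(samples)
    n = len(ordered)
    summary = {}
    for p in percentiles:
        idx = min(n - 1, max(0, round(p / 100 * (n - 1))))
        summary[p] = ordered[idx]
    return summary


# Synthetic CPU usage for one window (hypothetical values, fixed seed).
rng = random.Random(0)
samples = [rng.uniform(0.1, 0.9) for _ in range(WINDOW_SECONDS)]

# A point sample keeps one value and discards the rest of the window.
point_sample = samples[-1]

# A histogram-style summary keeps the shape of the distribution.
histogram = summarize_window(samples)

print("point sample:", point_sample)
print("histogram:", histogram)
```

The point sample could land anywhere between the window's minimum and maximum, whereas the summary exposes the spread, which is what makes questions about peak versus typical usage answerable.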
The 2019 traces focus on resource requests and usage, and contain no information about end users, their data, or access patterns to storage systems and other services.
Because of its size (about 2.4 TiB compressed), we are making the trace data available only via Google BigQuery, so that sophisticated analyses can be performed without requiring local resources.
The clusterdata-2019 traces are described in this document: Google cluster-usage traces v3. You can find the download and access instructions there, as well as many more details about what is in the traces and how to interpret them. For additional background information, please refer to the 2015 Borg paper, Large-scale cluster management at Google with Borg.