Artifact for Towards Build Optimization Using Digital Twins
CI Build Digital Twin Framework
Artifact associated with the paper "Towards Build Optimization Using Digital Twins", accepted at the 21st International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE 2025).
Getting started
The CI Build Digital Twin (CBDT) framework application stack includes the following containerized components:
- build-monitor
- build-pulse
- build-data-subscriber
- cbdt-datastore (PostgreSQL)
- cbdt-message-broker (RabbitMQ)
In addition, the artifact includes:
- glbuild: the source code of the GitLab build data integration Python library used in build-pulse
- glbuilds-2024-11-01.sql.zip: the dataset of 1.7 million jobs collected to demonstrate the CBDT framework. We release this dataset to facilitate onboarding and to encourage implementations of CBDTs using the CBDT framework.
Requirements
Please read each component's README.md file for its development requirements.
- Create a GitLab personal access token (PAT)
- Configure a GitLab job events webhook that uses the Reverse API endpoint, as described in build-pulse/README.md
- Generate a secret token for X_GITLAB_TOKEN to protect the webhook receiver endpoint
You may use this command to generate a value for the X_GITLAB_TOKEN environment variable:
openssl rand --base64 16
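Once the .env file exists (Usage step 1), the generated token can be written into it directly. The Bash sketch below is illustrative only: set_env_var is a hypothetical helper that is not part of the artifact, and it assumes .env uses plain KEY=VALUE lines.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the artifact): set KEY=VALUE in an env file,
# replacing any existing assignment of KEY. Assumes plain KEY=VALUE lines.
set_env_var() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  # Copy every line except existing assignments of $key (a missing file is OK).
  grep -v "^${key}=" "$file" > "$tmp" 2>/dev/null || true
  # Append the new assignment, then replace the original file.
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}
```

For example: set_env_var .env X_GITLAB_TOKEN "$(openssl rand --base64 16)". Because mv moves the mktemp file into place, .env ends up readable only by its owner, which is usually what you want for a file holding secrets.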
Usage
1. Create the .env file from .env.example and specify all required environment variables in it.
cp .env.example .env
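Before starting any service, it can help to check that no variable from the template was left empty. The sketch below uses a hypothetical helper that is not part of the artifact; it assumes .env.example declares variables as plain KEY=VALUE lines and treats a key as missing when .env has no non-empty assignment for it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the artifact): print every key declared in
# the template that has no non-empty assignment in the env file.
# Usage: missing_env_vars <template> <env-file>
missing_env_vars() {
  local example=$1 env=$2 key
  while IFS= read -r key; do
    # Missing if no "KEY=<something>" line exists (or the env file is absent).
    grep -Eq "^${key}=.+" "$env" 2>/dev/null || printf '%s\n' "$key"
  done < <(grep -Eo '^[A-Za-z_][A-Za-z0-9_]*=' "$example" | tr -d '=')
}
```

Running missing_env_vars .env.example .env prints the names of unset variables, one per line; no output means every variable has a value.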
2. Start only the datastore service
docker compose up -d datastore
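PostgreSQL may need a few seconds to accept connections after its container starts, so restoring immediately can fail. A minimal Bash sketch of a retry loop follows; wait_until is a hypothetical helper, and the usage example assumes the datastore image ships the standard PostgreSQL pg_isready client (as the official postgres images do).

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    # Do not sleep after the final failed attempt.
    (( i < attempts )) && sleep "$delay"
  done
  return 1
}
```

For example, wait_until 30 2 docker compose exec -T datastore pg_isready polls for roughly a minute before you move on to step 3.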
3. Unzip the glbuilds-2024-11-01.sql.zip file, which contains the 25 GB .sql dump of the CBDT Data Store (PostgreSQL) as of November 1, 2024.
unzip glbuilds-2024-11-01.sql.zip
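The extracted dump alone takes about 25 GB, and the restored database needs additional room wherever Docker stores the datastore's data, so checking free space first can save a failed run. The helper below is hypothetical, and the 60 GB threshold in the example is an illustrative guess, not a measured requirement.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if <dir> has at least <needed_gb> GiB free.
# Usage: has_free_gb <needed_gb> [dir]
has_free_gb() {
  local needed_gb=$1 dir=${2:-.} avail_kb
  # With -P (POSIX format) and -k, column 4 of line 2 is available 1K blocks.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [[ -n $avail_kb ]] || return 1
  (( avail_kb >= needed_gb * 1024 * 1024 ))
}
```

For example: has_free_gb 60 . || echo "less than 60 GB free in this directory"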
4. Restore the dump into the datastore. This may take a while to complete.
cat /path/to/glbuilds-*.sql | ./build-pulse/scripts/data/restore.sh
5. Start the remaining services
docker compose up -d
Open http://localhost:3000 to access the build-monitor UI dashboard.
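To confirm from a script that the dashboard is serving, a one-shot HTTP check with curl is enough. The helper name is hypothetical; it only assumes build-monitor answers plain HTTP requests on port 3000 with a non-error status.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the URL answers with a non-error status.
dashboard_up() {
  local url=${1:-http://localhost:3000}
  # -f: treat HTTP status >= 400 as failure; -s: no progress output;
  # -o /dev/null: discard the page body; --max-time: bound the wait.
  curl -fs -o /dev/null --max-time 5 "$url"
}
```

For example: dashboard_up && echo "build-monitor is up"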