https://lupyuen.github.io/articles/ci4

Continuous Integration Dashboard for Apache NuttX RTOS (Prometheus and Grafana)

* 1 Build Score
* 2 Grafana Dashboard
* 3 Prometheus Metrics
* 4 Ingest the Build Logs
* 5 Ingest from GitHub Actions
* 6 What's Next
* 7 Appendix: All Builds Dashboard
* 8 Appendix: Build History Dashboard

24 Nov 2024

[Pic: Continuous Integration Dashboard for Apache NuttX RTOS (Prometheus and Grafana)]

Last article we spoke about the (Twice) Daily Builds for Apache NuttX RTOS...

* "Optimising the Continuous Integration for Apache NuttX RTOS (GitHub Actions)"

Today we talk about Monitoring the Daily Builds (also the NuttX Build Farm) with our new NuttX Dashboard...

* We created our Dashboard with Grafana (open-source)
* Pulling the Build Data from Prometheus (also open-source)
* Which is populated by Pushgateway (staging database)
* Integrated with our Build Farm and GitHub Actions
* Why do all this? Because we can't afford to run Complete CI Checks on Every Pull Request!
* We expect some breakage, and NuttX Dashboard will help with the fixing

What will NuttX Dashboard tell us?

NuttX Dashboard shows a Snapshot of Failed Builds for the present moment. (Pic above) We may Filter the Builds by Architecture, Board and Config...

[Pic: Filter the Builds by Architecture, Board and Config]

The snapshot includes builds from the (community-hosted) NuttX Build Farm as well as GitHub Actions (twice-daily builds). To see GitHub Actions Only: Click [+] and set User to NuttX...

[Pic: Show GitHub Actions Only]

To see the History of Builds: Click the link for "NuttX Build History". Remember to select the Board and Config. (Pic below)

Sounds Great! What's the URL?

Sorry can't print it here, our dashboard is under attack by WordPress Malware Bots (!). Please head over to NuttX Repo and seek NuttX-Dashboard. (Dog Tea? Organic!)

[Pic: Build History Dashboard]

§1 Build Score

What's this Build Score?
Our NuttX Dashboard needs to know the "Goodiness" of Every NuttX Build (pic above). Whether it's a...

* Total Fail: "undefined reference to atomic_fetch_add_2"
* Warning: "nuttx has a LOAD segment with RWX permission"
* Success: NuttX compiles and links OK

That's why we assign a Build Score for every build...

| Score | Status | Example |
|-------|--------|---------|
| 0.0 | Error | undefined reference to atomic_fetch_add_2 |
| 0.5 | Warning | nuttx has a LOAD segment with RWX permission |
| 0.8 | Unknown | STM32_USE_LEGACY_PINMAP will be deprecated |
| 1.0 | Success | (No Errors and Warnings) |

Which makes it simpler to Colour-Code our Dashboard: Green (Success) / Yellow (Warning) / Red (Error).

[Pic: Build Scores for NuttX Dashboard]

Sounds easy? But we'll catch Multiple Kinds of Errors (in various formats)

* Compile Errors: "return with no value"
* Linker Errors: "undefined reference to atomic_fetch_add_2"
* Config Errors: "modified: sim/configs/rtptools/defconfig"
* Network Errors: "curl 92 HTTP/2 stream 0 was not closed cleanly"
* CI Test Failures: "test_pipe FAILED"

Doesn't the Build Score vary over time?

Yep the Build Score is actually a Time Series Metric! It will have the following dimensions...

* Timestamp: When the NuttX Build was executed (2024-11-24T00:00:00)
* User: Whose PC executed the NuttX Build (nuttxpr)
* Target: NuttX Target that we're building (milkv_duos:nsh)

Which will fold neatly into this URL, as we'll soon see...

    localhost:9091/metrics/job/nuttxpr/instance/milkv_duos:nsh

Where do we store the Build Scores?

Inside a special open-source Time Series Database called Prometheus.

We'll come back to Prometheus, first we study the Dashboard...

[Pic: Grafana Dashboard]

§2 Grafana Dashboard

What's this Grafana?

Grafana is an open-source toolkit for creating Monitoring Dashboards.

Sadly there isn't a "programming language" for coding Grafana. Thus we walk through the steps to create our NuttX Dashboard with Grafana...
    ## Install Grafana on Ubuntu
    ## See https://grafana.com/docs/grafana/latest/setup-grafana/installation/debian/
    sudo apt install grafana
    sudo systemctl start grafana-server

    ## Or macOS
    brew install grafana
    brew services start grafana

    ## Browse to http://localhost:3000
    ## Login as `admin` for username and password

1. Inside Grafana: We create a New Dashboard...

   [Pic: Create a New Dashboard]

2. Add a Visualisation...

   [Pic: Add a Visualisation]

3. Select the Prometheus Data Source (we'll explain why)

   [Pic: Select the Prometheus Data Source]

4. Change the Visualisation to "Table" (top right)

   Choose Build Score as the Metric. Click "Run Queries"...

   [Pic: Change to Table Visualisation]

5. We see a list of Build Scores in the Data Table above.

   But where's the Timestamp, Board and Config?

   That's why we do Transformations > Add Transformation > Labels To Fields

   [Pic: Transform Label To Fields]

6. And the data appears! Timestamp, Board, Config, ...

   [Pic: Build Scores with Timestamp, Board, Config]

7. Hmmm it's the same Board and Config... Just different Timestamps.

   We click Queries > Format: Table > Type: Instant > Refresh

   [Pic: Change to Instant Query]

8. Much better! We see the Build Score at the End of Each Row (to be colourised)

   [Pic: Build Scores]

9. Our NuttX Dashboard is nearly ready. To check our progress: Click Inspect > Panel JSON

   [Pic: Inspect Panel JSON]

10. And compare with our Completed Panel JSON...

    * Panel: Builds with Errors and Warnings
    * Panel: Successful Builds

11. How to get there? Watch the steps...

    * "All Builds Dashboard"
    * "Build History Dashboard"

[Pic: Prometheus Metrics]

§3 Prometheus Metrics

We saw the setup for Grafana Dashboard. What about the Prometheus Metrics?

Remember that our Build Scores are stored inside a special (open-source) Time Series Database called Prometheus.

This is how we install Prometheus...
    ## Install Prometheus on Ubuntu
    sudo apt install prometheus
    sudo systemctl start prometheus

    ## Or macOS
    brew install prometheus
    brew services start prometheus

    ## TODO: Update the Prometheus Config
    ## Edit /etc/prometheus/prometheus.yml (Ubuntu)
    ## Or /opt/homebrew/etc/prometheus.yml (macOS)
    ## Replace by contents of
    ## https://github.com/lupyuen/ingest-nuttx-builds/blob/main/prometheus.yml

    ## Restart Prometheus
    sudo systemctl restart prometheus ## Ubuntu
    brew services restart prometheus  ## macOS

    ## Check that Prometheus is up
    ## http://localhost:9090

Prometheus looks like this...

[Pic: Prometheus User Interface]

Recall that we assign a Build Score for every build...

| Score | Status | Example |
|-------|--------|---------|
| 0.0 | Error | undefined reference to atomic_fetch_add_2 |
| 0.5 | Warning | nuttx has a LOAD segment with RWX permission |
| 0.8 | Unknown | STM32_USE_LEGACY_PINMAP will be deprecated |
| 1.0 | Success | (No Errors and Warnings) |

This is how we Load a Build Score into Prometheus...

    ## Install GoLang
    sudo apt install golang-go ## For Ubuntu
    brew install go            ## For macOS

    ## Install Pushgateway
    git clone https://github.com/prometheus/pushgateway
    cd pushgateway
    go run main.go

    ## Check that Pushgateway is up
    ## http://localhost:9091

    ## Load a Build Score into Pushgateway
    ## Build Score is 0 for User nuttxpr, Target milkv_duos:nsh
    cat <[...]>/tmp/defconfig.txt

    ## Ingest the Build Logs from GitHub Gists: `nuttxpr`
    ## Remove special characters so they don't mess up the terminal.
    git clone https://github.com/lupyuen/ingest-nuttx-builds
    cd ingest-nuttx-builds
    cargo run -- \
      --user nuttxpr \
      --defconfig /tmp/defconfig.txt \
      | tr -d '\033\007'

Which will Identify Errors and Warnings in the logs (main.rs):

    // To Identify Errors and Warnings:
    // We skip the known lines
    if
      line.starts_with("-- ") ||  // "-- Build type:"
      line.starts_with("----------") ||
      line.starts_with("Cleaning") ||
      line.starts_with("Configuring") ||
      line.starts_with("Select") ||
      line.starts_with("Disabling") ||
      line.starts_with("Enabling") ||
      line.starts_with("Building") ||
      line.starts_with("Normalize") ||
      line.starts_with("% Total") ||
      line.starts_with("Dload") ||
      line.starts_with("~/apps") ||
      line.starts_with("~/nuttx") ||
      line.starts_with("find: 'boards/") ||  // "find: 'boards/risc-v/q[0-d]*': No such file or directory"
      line.starts_with("| ^~~~~~~") ||  // `warning "FPU test not built; Only available in the flat build (CONFIG_BUILD_FLAT)"`
      line.contains("FPU test not built") ||
      line.starts_with("a nuttx-export-") ||  // "a nuttx-export-12.7.0/tools/incdir.c"
      line.contains(" PASSED") ||   // CI Test: "test_hello PASSED"
      line.contains(" SKIPPED") ||  // CI Test: "test_mm SKIPPED"
      line.contains("On branch master") ||  // "On branch master"
      line.contains("Your branch is up to date") ||  // "Your branch is up to date with 'origin/master'"
      line.contains("Changes not staged for commit") ||  // "Changes not staged for commit:"
      line.contains("git add ") ||  // "(use "git add <file>..." to update what will be committed)"
      line.contains("git restore ")  // "(use "git restore <file>..."
      //   to discard changes in working directory)"
    { continue; }

    // Skip Downloads: "100 533k 0 533k 0 0 541k 0 --:--:-- --:--:-- --:--:-- 541k100 1646k 0 1646k 0 0 1573k 0 --:--:-- 0:00:01 --:--:-- 17.8M"
    let re = Regex::new(r#"^[0-9]+\s+[0-9]+"#).unwrap();
    let caps = re.captures(line);
    if caps.is_some() { continue; }

Then compute the Build Score (main.rs):

    // Not an error:
    // "test_ltp_interfaces_aio_error_1_1 PASSED"
    // "lua-5.4.0/testes/errors.lua"
    // "nuttx-export-12.7.0/include/libcxx/__system_error"
    let msg_join = msg.join(" ");
    let contains_error = msg_join
      .replace("aio_error", "aio_e_r_r_o_r")
      .replace("errors.lua", "e_r_r_o_r_s.lua")
      .replace("_error", "_e_r_r_o_r")
      .replace("error_", "e_r_r_o_r_")
      .to_lowercase()
      .contains("error");

    // Identify CI Test as Error: "test_helloxx FAILED"
    let contains_error = contains_error ||
      msg_join.contains(" FAILED");

    // Given Board=sim, Config=rtptools
    // Identify defconfig as Error: "modified:...boards/sim/sim/sim/configs/rtptools/defconfig"
    let target_split = target.split(":").collect::<Vec<_>>();
    let board = target_split[0];
    let config = target_split[1];
    let board_config = format!("/{board}/configs/{config}/defconfig");
    let contains_error = contains_error ||
      (
        msg_join.contains(&"modified:") &&
        msg_join.contains(&"boards/") &&
        msg_join.contains(&board_config.as_str())
      );

    // Search for Warnings
    let contains_warning = msg_join
      .to_lowercase()
      .contains("warning");

    // Compute the Build Score based on Error vs Warning
    let build_score =
      if msg.is_empty() { 1.0 }
      else if contains_error { 0.0 }
      else if contains_warning { 0.5 }
      else { 0.8 };

And post the Build Scores to Pushgateway (main.rs):

    // Compose the Pushgateway Metric
    let body = format!(
    r##"
    build_score ... version= ...
    "##);

    // Post to Pushgateway over HTTP
    let client = reqwest::Client::new();
    let pushgateway = format!("http://localhost:9091/metrics/job/{user}/instance/{target}");
    let res = client
      .post(pushgateway)
      .body(body)
      .send()
      .await?;

Why do we need the defconfigs?
    ## Find all defconfig pathnames in NuttX Repo
    git clone https://github.com/apache/nuttx
    find nuttx \
      -name defconfig \
      >/tmp/defconfig.txt

    ## defconfig.txt contains:
    ## boards/risc-v/sg2000/milkv_duos/configs/nsh/defconfig
    ## boards/arm/rp2040/seeed-xiao-rp2040/configs/ws2812/defconfig
    ## boards/xtensa/esp32/esp32-devkitc/configs/knsh/defconfig

Suppose we're ingesting a NuttX Target milkv_duos:nsh.

To identify the Target's Sub-Architecture (sg2000), we search for milkv_duos/.../nsh in the defconfig pathnames (main.rs):

    use std::fs::File;
    use std::io::{BufRead, BufReader};

    // Given a list of all defconfig pathnames:
    // Search for a Target ("milkv_duos:nsh")
    // Return the Sub-Architecture ("sg2000")
    async fn get_sub_arch(defconfig: &str, target: &str) -> Result<String, Box<dyn std::error::Error>> {
      let target_split = target.split(":").collect::<Vec<_>>();
      let board = target_split[0];
      let config = target_split[1];

      // defconfig contains ".../boards/risc-v/sg2000/milkv_duos/configs/nsh/defconfig"
      // Search for "/{board}/configs/{config}/defconfig"
      let search = format!("/{board}/configs/{config}/defconfig");
      let input = File::open(defconfig).unwrap();
      let buffered = BufReader::new(input);
      for line in buffered.lines() {

        // Sub-Architecture appears before "/{board}"
        let line = line.unwrap();
        if let Some(pos) = line.find(&search) {
          let s = &line[0..pos];
          let slash = s.rfind("/").unwrap();
          let subarch = s[slash + 1..].to_string();
          return Ok(subarch);
        }
      }

      // No matching defconfig: Sub-Architecture is unknown
      Ok("unknown".into())
    }

Phew the Errors and Warnings are so complicated!

Yeah our Build Logs appear in all shapes and sizes. We might need to standardise the way we present the logs.

[Pic: Refurbished 12-Core Xeon ThinkStation ($400 / 24 kg!) becomes (hefty) Ubuntu Build Farm for Apache NuttX RTOS. 4 times the throughput of a PC!]

§5 Ingest from GitHub Actions

What about the Build Logs from GitHub Actions?
It gets a little more complicated: we need to download the Build Logs from GitHub Actions.

But before that, we need the GitHub Run ID to identify the Build Job (github.sh):

    ## Fetch the Jobs for the Run ID. Get the Job ID for the Job Name.
    local os=$1    ## "Linux" or "msys2"
    local step=$2  ## "7" or "9"
    local group=$3 ## "arm-01"
    local job_name="$os ($group)"
    local job_id=$(
      curl -L \
        -H "Accept: application/vnd.github+json" \
        -H "Authorization: Bearer $GITHUB_TOKEN" \
        -H "X-GitHub-Api-Version: 2022-11-28" \
        "https://api.github.com/repos/$user/$repo/actions/runs/$run_id/jobs?per_page=100" \
        | jq ".jobs | map(select(.name == \"$job_name\")) | .[].id"
    )

Now we can Download the Run Logs (github.sh):

    ## Download the Run Logs from GitHub
    ## https://docs.github.com/en/rest/actions/workflow-runs?apiVersion=2022-11-28#download-workflow-run-logs
    curl -L \
      --output /tmp/run-log.zip \
      -H "Accept: application/vnd.github+json" \
      -H "Authorization: Bearer $GITHUB_TOKEN" \
      -H "X-GitHub-Api-Version: 2022-11-28" \
      "https://api.github.com/repos/$user/$repo/actions/runs/$run_id/logs"

For Each Target Group: We ingest the Log File (github.sh):

    ## For All Target Groups
    ## TODO: Handle macOS when the warnings have been cleaned up
    for group in \
      arm-01 arm-02 arm-03 arm-04 \
      arm-05 arm-06 arm-07 arm-08 \
      arm-09 arm-10 arm-11 arm-12 \
      arm-13 arm-14 \
      risc-v-01 risc-v-02 risc-v-03 risc-v-04 \
      risc-v-05 risc-v-06 \
      sim-01 sim-02 sim-03 \
      xtensa-01 xtensa-02 \
      arm64-01 x86_64-01 other msys2
    do
      ## Ingest the Log File
      if [[ "$group" == "msys2" ]]; then
        ingest_log "msys2" $msys2_step $group
      else
        ingest_log "Linux" $linux_step $group
      fi
    done

(ingest_log is here)

Which will be ingested like this (github.sh):

    ## Ingest the Log Files from GitHub Actions
    cargo run -- \
      --user $user \
      --repo $repo \
      --defconfig $defconfig \
      --file $pathname \
      --nuttx-hash $nuttx_hash \
      --apps-hash $apps_hash \
      --group $group \
      --run-id $run_id \
      --job-id $job_id \
      --step $step

    ## user=NuttX
    ## repo=nuttx
    ##
    ## defconfig=/tmp/defconfig.txt (from earlier)
    ## pathname=/tmp/ingest-nuttx-builds/ci-arm-01.log
    ## nuttx_hash=7f84a64109f94787d92c2f44465e43fde6f3d28f
    ## apps_hash=d6edbd0cec72cb44ceb9d0f5b932cbd7a2b96288
    ## group=arm-01
    ## run_id=11603561928
    ## job_id=32310817851
    ## step=7

How to run all this?

We ingest the GitHub Logs right after the Twice-Daily Build of NuttX. (00:00 UTC and 12:00 UTC)

Thus it makes sense to bundle the Build and Ingest into One Single Script (build-github-and-ingest.sh):

    ## Build NuttX Mirror Repo and Ingest NuttX Build Logs
    ## from GitHub Actions into Prometheus Pushgateway

    ## TODO: Twice Daily at 00:00 UTC and 12:00 UTC
    ## Go to NuttX Mirror Repo: github.com/NuttX/nuttx
    ## Click Sync Fork > Discard Commits

    ## Start the Linux, macOS and Windows Builds for NuttX
    ## https://github.com/lupyuen/nuttx-release/blob/main/enable-macos-windows.sh
    ~/nuttx-release/enable-macos-windows.sh

    ## Wait for the NuttX Build to start
    sleep 300

    ## Wait for the NuttX Build to complete
    ## Then ingest the GitHub Logs
    ## https://github.com/lupyuen/ingest-nuttx-builds/blob/main/github.sh
    ./github.sh

And that's how we created our Continuous Integration Dashboard for NuttX!

(Please join our Build Farm)

[Pic: Continuous Integration Dashboard for Apache NuttX RTOS (Prometheus and Grafana)]

§6 What's Next

Why are we doing all this?

That's because we can't afford to run Complete CI Checks on Every Pull Request!

We expect some breakage, and NuttX Dashboard will help with the fixing.

What happens when NuttX Dashboard reports a Broken Build?

Right now we scramble to identify the Breaking Commit. And prevent more Broken Commits from piling on.

Yes NuttX Dashboard will tell us the Commit Hashes for the Build History. But the Batched Commits aren't Temporally Precise, and we race against time to inspect and recompile each Past Commit.

Can we automate this?

Yeah someday our NuttX Build Farm shall "Rewind The Build" when something breaks...
Automatically Backtrack the Commits, Compile each Commit and discover the Breaking Commit. (Like this)

Any more stories of NuttX CI?

Next Article: We chat about the updated NuttX Build Farm that runs on macOS for Apple Silicon. (Great news for NuttX Devs on macOS)

Then we study the internals of a Mystifying Bug that concerns PyTest, QEMU RISC-V and expect. (So it will disappear sooner from NuttX Dashboard)

Many Thanks to the awesome NuttX Admins and NuttX Devs! And my GitHub Sponsors, for sticking with me all these years.

* Sponsor me a coffee
* Discuss this article on Hacker News
* My Current Project: "Apache NuttX RTOS for Sophgo SG2000"
* My Other Project: "NuttX for Ox64 BL808"
* Older Project: "NuttX for Star64 JH7110"
* Olderer Project: "NuttX for PinePhone"
* Check out my articles
* RSS Feed

Got a question, comment or suggestion? Create an Issue or submit a Pull Request here...

lupyuen.github.io/src/ci4.md

[Pic: Continuous Integration Dashboard for Apache NuttX RTOS]

§7 Appendix: All Builds Dashboard

Earlier we spoke about creating the NuttX Dashboard (pic above). And we created a Rudimentary Dashboard with Grafana...

* "Grafana Dashboard"

We nearly completed the Panel JSON...

* Panel: Builds with Errors and Warnings
* Panel: Successful Builds

Let's flesh out the remaining bits of our creation.

Before we begin: Check that our Prometheus Data Source is configured to fetch the Build Scores from Prometheus and Pushgateway...

[Pic: Configure our Prometheus Data Source]

(Remember to set prometheus.yml)

Head back to our upcoming dashboard...

1. This is how we Filter by Arch, Sub-Arch, Board, Config, which we defined as Dashboard Variables (see below)

   [Pic: Filter by Arch, Sub-Arch, Board, Config]

2. Why match the Funny Timestamps? Well mistakes were made. We exclude these Timestamps so they won't appear in the dashboard...

   [Pic: We exclude these Timestamps]

3. For Builds with Errors and Warnings: We select Values (Build Scores) <= 0.5...

   [Pic: Select Values (Build Scores) <= 0.5]

4.
We Rename and Reorder the Fields...

   [Pic: Rename the Fields]

5. Set the Timestamp to Lower Case, Config to Upper Case...

   [Pic: Set the Timestamp to Lower Case, Config to Upper Case]

6. Set the Color Scheme to From Thresholds By Value

   Set the Data Links: Title becomes "Show the Build Log", URL becomes "${__data.fields.url}"

   Colour the Values (Build Scores) with the Value Mappings below

   [Pic: Set the Color Scheme and Data Links]

7. And we'll achieve this Completed Panel JSON...

   * Panel: Builds with Errors and Warnings

What about the Successful Builds?

1. Copy the Panel for "Builds with Errors and Warnings"

   Paste into a New Panel: "Successful Builds"

2. Select Values (Build Scores) > 0.5

   [Pic: Select Values (Build Scores) > 0.5]

3. And we'll accomplish this Completed Panel JSON

   * Panel: Successful Builds

And the Highlights Panel at the top?

1. Copy the Panel for "Builds with Errors and Warnings"

   Paste into a New Panel: "Highlights of Errors / Warnings"

2. Change the Visualisation from "Table" to "Stat" (top right)

   [Pic: Change the Visualisation from "Table" to "Stat"]

3. Select Sort by Value (Build Score) and Limit to 8 Items...

   [Pic: Sort by Value and Limit to 8 Items]

4. And we'll get this Completed Panel JSON

   * Panel: Highlights of Errors / Warnings

5. Also check out the Dashboard JSON and Links Panel ("See the NuttX Build History")

   * Links Panel
   * Dashboard JSON

   Which will define the Dashboard Variables...

   [Pic: Dashboard Variables]

Up Next: The NuttX Dashboard for Build History...

[Pic: Build History Dashboard]

§8 Appendix: Build History Dashboard

In the previous section: We created the NuttX Dashboard for Errors, Warnings and Successful Builds.

Now we do the same for Build History Dashboard (pic above)...

1. Copy the Dashboard from the previous section.

   Delete all Panels, except "Builds with Errors and Warnings".

   Edit the Panel.

2. Under Queries: Set Options > Type to Range

   [Pic: Set Type to Range]

3.
Under Transformations: Set Group By to First Severity, First Board, First Config, First Build Log, First Apps Hash, First NuttX Hash

   In Organise Fields By Name: Rename and Reorder the fields as shown below

   Set the Value Mappings below

   [Pic: Organise Fields By Name and Value Mappings]

4. Here are the Panel and Dashboard JSON...

   * Panel: Build History
   * Dashboard: NuttX Build History

Is Grafana really safe for web hosting?

Use this (safer) Grafana Configuration: grafana.ini

* Modified Entries are tagged by "TODO"
* For Ubuntu: Copy to /etc/grafana/grafana.ini
* For macOS: Copy to /opt/homebrew/etc/grafana/grafana.ini

Watch out for the pesky WordPress Malware Bots! This might help (show-log.sh):

    ## Show Logs from Grafana
    ## Keep only the log_file line for our OS (the second assignment overrides the first)
    log_file=/var/log/grafana/grafana.log ## For Ubuntu
    log_file=/opt/homebrew/var/log/grafana/grafana.log ## For macOS

    ## Watch for any suspicious activity
    for (( ; ; )); do
      clear
      tail -f $log_file \
        | grep --line-buffered 'logger=context ' \
        | grep --line-buffered -v ' path=/api/frontend-metrics ' \
        | grep --line-buffered -v ' path=/api/live/ws ' \
        | grep --line-buffered -v ' path=/api/plugins/grafana-lokiexplore-app/settings ' \
        | grep --line-buffered -v ' path=/api/user/auth-tokens/rotate ' \
        | grep --line-buffered -v ' path=/favicon.ico ' \
        | grep --line-buffered -v ' remote_addr=\[::1\] ' \
        | cut -d ' ' -f 9-15 \
        &

      ## Restart the log display every 12 hours, due to Log Rotation
      sleep $(( 12 * 60 * 60 ))
      kill %1
    done
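Prefer Rust over a chain of greps? Here's a minimal sketch of the same filter that show-log.sh applies to each log line: keep the `logger=context` records, drop the routine requests, then keep fields 9 to 15 (like `cut -d ' ' -f 9-15`). The names `filter_log_line` and `NOISE` are hypothetical (they don't appear in any NuttX repo), and we assume the log lines are space-separated, exactly as the script treats them.

```rust
/// Fragments that show-log.sh drops with `grep -v`:
/// routine requests, plus anything from localhost.
/// (Hypothetical name, for illustration only)
const NOISE: [&str; 6] = [
    " path=/api/frontend-metrics ",
    " path=/api/live/ws ",
    " path=/api/plugins/grafana-lokiexplore-app/settings ",
    " path=/api/user/auth-tokens/rotate ",
    " path=/favicon.ico ",
    " remote_addr=[::1] ",
];

/// Return fields 9 to 15 of an interesting log line,
/// or None if the line is filtered out.
/// (Hypothetical helper, for illustration only)
fn filter_log_line(line: &str) -> Option<String> {
    // Keep only the "logger=context " records
    if !line.contains("logger=context ") {
        return None;
    }
    // Drop the routine requests
    if NOISE.iter().any(|noise| line.contains(*noise)) {
        return None;
    }
    // Same as `cut -d ' ' -f 9-15`:
    // Split on single spaces, keep fields 9 to 15
    let fields: Vec<&str> = line.split(' ').skip(8).take(7).collect();
    Some(fields.join(" "))
}
```

To use it, we might pipe `tail -f` into a tiny program that reads stdin line by line, calls `filter_log_line`, and prints whatever comes back as `Some`. Whether that beats the grep chain is a matter of taste: the shell version is shorter, the Rust version is easier to unit-test.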