Medium priority fixes: stability, monitoring, validation, and documentation#2
Closed
Medium priority fixes: stability, monitoring, validation, and documentation#2
Conversation
…r. update to version 0.39.2
…codes. Optimize debug messages. Improve python tester script. Update version number to 0.40.1
…n message 1005 interval to 10 for saving bandwidth. Update version to 0.41.1
Merge branch 'feature/test_github_actions'
…r handling and update baud rates
…, and update connection method
Critical Bug Fixes: - Add mutex protection for NTRIPStatus structs to prevent race conditions between NTRIP task and GPS callback - Fix RTCM buffer overflow handling to discard corrupted data instead of forwarding it - Fix critical task delay bug in GPS UART task (vTaskDelay calculation was evaluating to 0) - Fix millis overflow handling in RTCM timeout check using proper unsigned arithmetic - Add connection state validation to detect and sync client/status mismatches Stability Improvements: - Improve OTA error handling with proper abort on failures and state tracking - Reduce debug log verbosity by removing per-byte RTCM logging - Add proper error messages for RTCM length and CRC failures These fixes address memory safety, task scheduling, and connection reliability issues.
Undefine W6100 library default GPIO pins before setting custom hardware configuration. This eliminates compiler warnings about macro redefinition.
Test Coverage: - RTCM buffer processing: message parsing, CRC validation, overflow protection - Base64 encoding: RFC 4648 test vectors, NTRIP auth format, edge cases - Timing/overflow: millis() overflow handling, 49-day uptime scenarios Tests run on native platform for fast iteration without hardware. Uses Unity test framework built into PlatformIO. Run with: pio test -e native
Test Results: 27/27 tests passing on native platform Test Structure: - Organize tests into PlatformIO convention (test_*/test_main.cpp) - test_base64: 11 tests (RFC 4648 vectors, NTRIP auth, binary data with null bytes) - test_rtcm_buffer: 8 tests (CRC validation, overflow, message parsing) - test_timing: 8 tests (millis overflow, 49-day uptime scenarios) RTCM Buffer Isolation: - Add UNIT_TEST guards to separate pure logic from Arduino dependencies - Mock logging functions when UNIT_TEST defined - Include string.h for memset - Copy rtcmbuffer.h/cpp to test directory for self-contained testing - Expose parse_rtcm_length() and get_rtcm_message_type() in header Base64 Test Improvements: - Add length-based String constructor to properly handle binary data - Test actual null bytes (0x00) in binary data test - Verify exact expected output for binary encoding Configuration: - Set test_build_src = no to avoid compiling Arduino-dependent source files - Tests run independently on native platform without ESP32 hardware Run tests: pio test -e native
Changes: - Add protocolVersion field to NTRIPStatus struct for tracking per-connection protocol - Split ntripVersion setting into ntripVersion1 and ntripVersion2 for independent caster configuration - Update settings read/write to handle separate version settings per caster This allows primary and secondary casters to use different NTRIP protocol versions.
Changes: - Remove global ntripVersion toggle - Add separate ntripVersion1 and ntripVersion2 toggles for each caster - Position version toggle next to each caster's enable switch - Add proper enable/disable handling for version toggles - Update form handling to submit separate version settings - Improve caster header layout with flexbox - Add disabled state styling for switches - Add mobile responsive design This allows independent NTRIP protocol version selection for primary and secondary casters.
Rate limit connection health checks to every 5 seconds instead of every 1ms. The client.connected() call can temporarily return false during write operations, causing false disconnects when checked too frequently.
1. Remove duplicate error logging in checkConnectionHealth() - only return error code and let caller handle logging 2. Improve connection detection - check both connected() and available() to avoid false positives during write operations 3. Initialize lastHealthCheck_ms to current time to prevent immediate health check on first connection attempt This fixes the issue where connections were immediately checked after boot before the 5-second interval was properly established, and removes the double error message output.
Change web UI labels from "RTK2go mountpoint/username/password" to generic "Mountpoint/Username/Password" since the system works with any NTRIP caster, not just RTK2go. Updated labels: - "RTK2go mountpoint" → "Mountpoint" - "RTK2go username" → "Username" - "RTK2go password" → "Password" Updated descriptions: - "RTK2go Mount point" → "NTRIP caster mountpoint" - "Backup RTK2go Mount point" → "Backup NTRIP caster mountpoint"
NTRIP 1.0 protocol uses only password for authentication (SOURCE method), while NTRIP 2.0 uses HTTP Basic Auth with username:password. Changes: - Username field is disabled and cleared when NTRIP 1.0 is selected - Username field is enabled and required when NTRIP 2.0 is selected - State updates on page load and when version toggle changes - Applies to both primary and secondary caster configurations This prevents user confusion about whether username is needed for NTRIP 1.0 connections and ensures form validation is correct.
Run unit tests before building firmware in CI/CD pipeline. Tests will run on the native environment and must pass before firmware is built and released. This ensures: - Base64 encoding is correct (11 tests) - RTCM buffer parsing works properly (8 tests) - Timing/millis overflow handling is correct (8 tests) Total: 27 unit tests must pass for release builds
v0.42.0: NTRIP 2.0 support, connection stability fixes, and unit tests
Replace unsigned long with uint32_t throughout timing tests to ensure consistent behavior across platforms: - Linux: unsigned long is 64-bit - Windows/ESP32: unsigned long is 32-bit This caused overflow tests to fail on Linux (GitHub Actions) because the arithmetic didn't wrap at 32-bit boundaries. Using uint32_t explicitly matches ESP32's millis() behavior. Also add test workflow for PRs and main branch pushes to catch platform-specific issues before release. All 27 tests now pass on both Windows and Linux.
Prevents native environment from attempting to build Arduino source files which causes build failures.
Security fixes: - Remove GET handler for /applySettings to prevent credentials in URL query parameters - Fix XSS vulnerability by replacing innerHTML with textContent in web UI - Add input validation for port numbers (1-65535) - Add input validation for NTRIP version (only 1 or 2) - Add bounds checking for ECEF coordinates (±7,000 km) Quality fixes: - Fix memory leak in UDP logging by cleaning up previous stream before creating new one - Fix NTRIP mutex race condition by holding mutex for entire critical section - Ensure bytesSent counter updates are atomic and consistent
Corrects variable name typo in GPS module (gps.cpp:18, 322-323)
Validate snprintf return value to detect buffer truncation and prevent potential buffer overflow when building NTRIP authentication requests. Add BUFFER_OVERFLOW error type and proper error handling.
Prevent indefinite blocking if DHCP server is unavailable. Device will now fail gracefully after 30 seconds instead of hanging forever.
Add ESP32 Task Watchdog Timer (TWDT) with 30-second timeout to automatically recover from system hangs. Watchdog is reset every loop iteration to confirm normal operation.
Add comprehensive documentation of timeouts, buffer sizes, task stack sizes, and other configuration constants. Replace hardcoded values throughout codebase with named constants from defines.h for better maintainability.
Change loop delay from 1000ms to 100ms to allow for more responsive system shutdown and control while maintaining low CPU usage.
Test port range validation (1-65535), NTRIP version validation (1 or 2), ECEF coordinate bounds checking (±7000km), and username length limits (15 chars). All 14 new tests passing. Total: 41/41 tests pass.
Disable Nagle's algorithm (TCP_NODELAY) and add explicit flush() after each chunked transfer to prevent 7-second gaps and bursts in RTCM data delivery. This ensures consistent 1Hz timing for RTK corrections as required by GNSS receivers. Addresses server-side analysis showing chunked encoding buffering causing position solution degradation.
Problem: UDPStream was sending each character as a separate 1-byte UDP broadcast packet, causing ~3000 packets/second network flooding. Solution: Implemented 512-byte buffer that accumulates characters and sends complete packets only when: - Newline character is encountered (end of log line) - Buffer is full - flush() is called explicitly This reduces UDP traffic from thousands of packets per second to just a few packets per second (one per log line). Impact: Eliminates network congestion and improves overall system performance for UDP broadcast logging.
Problem: RTCM buffer was only 256 bytes but RTCM messages can be up to 1029 bytes (3 header + 1023 payload + 3 CRC), causing constant buffer overflow errors every second. Solution: Increased MAX_BUFFER_LEN from 256 to 1029 bytes to accommodate the maximum RTCM message size as defined in the RTCM 3.x specification. Impact: Eliminates "RTCM buffer overflow - discarding corrupted data" errors and ensures all RTCM messages are processed correctly, regardless of size.
Added detailed comments explaining two complex subsystems: 1. GPS Task Synchronization (gps.cpp): - Documented fast_uart_handle flag purpose and usage - Explained coordination between gpsStatusTask and gps_uart_check_task - Clarified thread-safety concerns with UART access - Added usage comments to enable/disable functions 2. RTCM Buffer State Machine (rtcmbuffer.cpp): - Documented complete RTCM 3.x message format - Explained 5-state processing flow (IDLE → HEADER → PAYLOAD → CRC → RESET) - Clarified 10-bit length encoding across bytes - Documented CRC24 computation and verification process - Added inline comments for each state transition These comments address medium-priority documentation issues identified in code review and make the codebase more maintainable.
Added periodic stack watermark logging (every 5 minutes) to monitor actual stack usage for all FreeRTOS tasks: - gpsStatusTask (4096 bytes) - gps_uart_check_task (10000 bytes) - NTRIPTask (8192 bytes) - WebServerTask (8192 bytes) Stack watermark shows the minimum free stack space ever reached, helping identify if tasks are over/under-provisioned. This data will be used to optimize stack sizes based on actual runtime usage. Lower watermarks (< 500 bytes) indicate potential stack overflow risk. Higher watermarks (> 50% free) suggest oversized allocation.
Added comprehensive client-side validation before form submission: Validation checks: - Port numbers: Must be 1-65535 for enabled casters - Usernames: Maximum 15 characters (NTRIP spec limit) - ECEF coordinates: Within ±7000 km (±700000 cm) bounds User experience improvements: - Validation runs before confirmation dialog - Clear error messages listing all validation failures - Confirmation dialog prevents accidental settings changes - Warns user that device may restart after save This matches server-side validation in settings.cpp and provides immediate feedback to users, reducing invalid submissions and improving overall system reliability.
Replaced hardcoded stack sizes in xTaskCreate() calls with centralized defines from defines.h: Changes: - GPS status task: 4096 bytes (GPS_STATUS_TASK_STACK) - GPS UART task: 10000 bytes (GPS_UART_CHECK_TASK_STACK) - NTRIP task: 8192 bytes (NTRIP_TASK_STACK) - Web server task: 8192 bytes (WEB_SERVER_TASK_STACK) Benefits: - Single source of truth for stack configurations - Easier to tune based on watermark monitoring data - Stack monitoring now uses same constants for accurate reporting - Reduces risk of mismatch between allocation and documentation Updated stack monitoring to dynamically display configured sizes.
Added min/max validation attributes to provide browser-level validation: Port fields (casterPort1, casterPort2): - min="1" max="65535" - Updated help text to show valid range ECEF coordinate fields (ecefX, ecefY, ecefZ): - min="-700000" max="700000" (±7000 km in centimeters) - Updated help text to clarify limits Username fields already had maxlength="15" attribute. Benefits: - Browser provides immediate visual feedback on invalid input - Prevents form submission with out-of-range values - Works alongside JavaScript validation as defense-in-depth - Matches server-side validation in settings.cpp - Improved user experience with native browser validation UI
UART RX buffer was too small (5KB) and polling too slow (1ms), causing data loss at 460800 baud (~46KB/s). This resulted in: - Buffer overflow errors - CRC errors from corrupted/partial RTCM messages - Loss of correction data Changes: - Increase Serial1 RX buffer from 5KB to 8KB (174ms @ 460800 baud) - Remove 1ms delay in uart task - yield CPU only when buffer empty - Add buffer usage monitoring and warnings at 75% capacity - Track peak buffer usage for diagnostics The combination of larger buffer + faster polling should eliminate buffer overflows and resulting CRC errors.
The high-priority GPS UART task (priority 23) was starving lower-priority tasks including loopTask which feeds the watchdog. Using taskYIELD() kept the task scheduled continuously without allowing watchdog feeding. Solution: Always use vTaskDelay(1) instead of taskYIELD() to ensure lower-priority tasks get CPU time. The 8KB buffer provides sufficient margin (~140ms at 460800 baud) even with 1ms delays between iterations. Verified: Buffer usage remains low at 1.5% (120/8192 bytes)
Task name fixes: - Rename "gps_uart_check_task" to "gpsUartTask" (20 chars -> 11 chars) - FreeRTOS has 15 character limit for task names (16 with null terminator) - xTaskGetHandle() was asserting and crashing when looking up long name - Update both task creation and stack monitoring code Log buffer increase: - Increase LOG_SIZE from 20 to 50 entries for web UI - Provides more history for debugging without excessive memory usage - Users can now see more log context in the web interface Tested: System runs stable, no crashes, RTCM data flowing correctly
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR addresses all medium-priority issues identified in the code review, including critical bug fixes, reliability improvements, code quality enhancements, and comprehensive testing.
Critical Bug Fixes (3)
UDP Broadcast Flooding
RTCM Buffer Overflow
TCP Real-Time Streaming
Reliability Improvements (3)
Code Quality (4)
Testing & Monitoring (2)
Web UI Improvements (3)
Files Changed
Test Plan
Breaking Changes
None - all changes are backward compatible.