Monitoring Best Practices

Signal Configuration

Standardize Signal Naming

Use consistent signal names across all systems. When every system uses “battery_voltage” for the same measurement, you can create one trigger that applies uniformly. Inconsistent naming requires duplicate triggers and complicates management.

Good Examples:

battery_voltage (not batt_v, voltage_battery, bat_volt)
motor_temperature (not temp_motor, motor_temp, temperature_motor)
obstacle_distance (not dist_obstacle, obstacle_dist)

Convention Recommendations:

Use lowercase with underscores
Start with the component, then the metric (e.g., battery_voltage, motor_current)
Use full words, not abbreviations
Document your naming convention and enforce it across all systems
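A naming convention is easiest to enforce when it is checked automatically. The Python sketch below tests a signal name against two of the rules above: lowercase words joined by underscores, and no abbreviations. The abbreviation list and the `check_signal_name` function are hypothetical illustrations, not part of any Heex tool. Component-then-metric order is not checked, because that would require a list of your known components.

```python
import re

# Lowercase words (digits allowed after the first letter) joined by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

# Hypothetical list of abbreviations your convention forbids; extend as needed.
FORBIDDEN_ABBREVIATIONS = {"batt", "bat", "v", "volt", "temp", "dist"}


def check_signal_name(name: str) -> list[str]:
    """Return the convention violations for a signal name (empty list if valid)."""
    problems = []
    if not SNAKE_CASE.match(name):
        problems.append("use lowercase words separated by underscores")
    for word in name.lower().split("_"):
        if word in FORBIDDEN_ABBREVIATIONS:
            problems.append(f"'{word}' is an abbreviation; use the full word")
    return problems


for name in ["battery_voltage", "batt_v", "motor_temp", "ObstacleDistance"]:
    print(name, check_signal_name(name) or "ok")
```

A check like this could run in CI against your signal definitions so that names such as bat_volt are rejected before any trigger depends on them.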

Choose Appropriate Signal Types

Match signal types to your data:

NUMBER: Continuous values (voltage, temperature, distance)
STRING: Text states (error messages, statuses)
BOOLEAN: Binary conditions (door_open, emergency_stop)

Using the correct type ensures proper threshold comparisons and data visualization.
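To see why the type matters, consider how a reading is compared with a threshold. In this hypothetical sketch (the `SignalType` enum and `matches` helper are illustrative, not a Heex API), ordering comparisons such as `<` are allowed only on NUMBER signals, while STRING and BOOLEAN signals support equality only. A voltage stored as text would otherwise compare character by character: in Python, `"9.5" < "12.0"` is False.

```python
import operator
from enum import Enum


class SignalType(Enum):
    NUMBER = "number"
    STRING = "string"
    BOOLEAN = "boolean"


ORDERING_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}
EQUALITY_OPS = {"==": operator.eq, "!=": operator.ne}
PYTHON_TYPES = {
    SignalType.NUMBER: (int, float),
    SignalType.STRING: (str,),
    SignalType.BOOLEAN: (bool,),
}


def matches(signal_type: SignalType, value, op: str, threshold) -> bool:
    """Evaluate `value <op> threshold`, refusing comparisons that make no sense for the type."""
    valid = isinstance(value, PYTHON_TYPES[signal_type])
    # bool is a subclass of int in Python, so reject it explicitly for NUMBER signals.
    if signal_type is SignalType.NUMBER and isinstance(value, bool):
        valid = False
    if not valid:
        raise TypeError(f"{value!r} is not a valid {signal_type.value} value")
    if op in EQUALITY_OPS:
        return EQUALITY_OPS[op](value, threshold)
    if op in ORDERING_OPS and signal_type is SignalType.NUMBER:
        return ORDERING_OPS[op](value, threshold)
    raise ValueError(f"operator {op!r} is not supported for {signal_type.value} signals")


print(matches(SignalType.NUMBER, 11.2, "<", 11.5))      # True
print(matches(SignalType.BOOLEAN, True, "==", True))   # True
```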

Trigger Configuration

Start Conservative, Then Tune

Begin with conservative trigger thresholds: err on the side of catching every genuine issue, even if that produces some false positives at first, then tune the thresholds to cut the noise once you have real data.

Initial Configuration Process:

Set thresholds based on manufacturer specifications or safety limits
Deploy to production and monitor for 1-2 weeks
Review event frequency and patterns
Identify false positives (events that didn’t require action)
Identify false negatives (missed conditions discovered through other means)
Adjust thresholds based on actual system behavior

Example - Battery Monitoring:

Week 0: Set low battery threshold at 12.0V (manufacturer spec: 11.0V minimum)
Week 2 Review: 50 events generated, but only 5 required intervention
Adjustment: Lower threshold to 11.5V to reduce false positives while maintaining safety margin
Week 4 Review: 8 events generated, all requiring intervention ✓
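The week 2 and week 4 reviews in this example come down to one number: the share of events that required intervention. A minimal sketch, assuming each reviewed event has already been marked by hand as actionable or not (the `Review` record is hypothetical). False negatives cannot be derived from these counts and need to be tracked separately.

```python
from dataclasses import dataclass


@dataclass
class Review:
    period: str
    events: int       # events the trigger generated during the period
    actionable: int   # events that actually required intervention

    @property
    def action_rate(self) -> float:
        return self.actionable / self.events if self.events else 0.0

    @property
    def false_positives(self) -> int:
        return self.events - self.actionable


# Figures from the battery example above.
for review in [Review("week 2 (12.0V)", 50, 5), Review("week 4 (11.5V)", 8, 8)]:
    print(f"{review.period}: {review.false_positives} false positives, "
          f"action rate {review.action_rate:.0%}")
# week 2 (12.0V): 45 false positives, action rate 10%
# week 4 (11.5V): 0 false positives, action rate 100%
```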

Account for Environmental Variations

System behavior changes with environmental conditions, so adjust your monitoring configuration accordingly.

Seasonal Adjustments:

Winter: Battery voltage drops faster in cold temperatures. Increase low battery thresholds slightly (e.g., from 11.5V to 11.8V) to provide more warning time.
Summer: Higher ambient temperatures mean motors and batteries run hotter. Adjust overheating thresholds to account for seasonal baselines while still catching genuine issues.
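One way to apply seasonal adjustments without editing triggers by hand is to look the threshold up by date. The sketch below uses the example values above (11.8V in winter, 11.5V otherwise). The month-to-season mapping assumes a Northern Hemisphere site and, like the function name, is a hypothetical illustration.

```python
from datetime import date

# Hypothetical season boundary for a Northern Hemisphere site.
WINTER_MONTHS = {12, 1, 2}

# Example thresholds from this guide, in volts.
LOW_BATTERY_THRESHOLD_V = {"winter": 11.8, "default": 11.5}


def low_battery_threshold(day: date) -> float:
    """Return the low-battery threshold (volts) to apply on a given day."""
    season = "winter" if day.month in WINTER_MONTHS else "default"
    return LOW_BATTERY_THRESHOLD_V[season]


print(low_battery_threshold(date(2024, 1, 15)))  # 11.8
print(low_battery_threshold(date(2024, 7, 15)))  # 11.5
```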

Operational Context:

Peak Hours: Systems under heavy load may exhibit different normal ranges
Off-Peak Hours: Idle time thresholds may need different values
Maintenance Windows: Temporarily disable non-critical triggers during scheduled maintenance
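Holding back non-critical triggers during maintenance can be expressed as a check made before a notification goes out. In this sketch the window list, the priority names, and `should_notify` are hypothetical; critical events are never suppressed.

```python
from datetime import datetime

# Hypothetical scheduled maintenance windows as (start, end) pairs, local time.
MAINTENANCE_WINDOWS = [
    (datetime(2024, 12, 3, 22, 0), datetime(2024, 12, 4, 2, 0)),
]


def in_maintenance(now: datetime) -> bool:
    """True if `now` falls inside any scheduled window (start inclusive, end exclusive)."""
    return any(start <= now < end for start, end in MAINTENANCE_WINDOWS)


def should_notify(priority: str, now: datetime) -> bool:
    """Critical events always notify; everything else is held during maintenance."""
    return priority == "critical" or not in_maintenance(now)
```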

Use Multiple Conditions for Precision

Combine multiple signal conditions to create more precise triggers that reduce false positives.

Example - Battery Aging Detection:

Instead of a single condition:

❌ charge_cycles > 800 (too many false positives on healthy batteries)

Use combined conditions:

✓ charge_cycles > 800 AND battery_voltage < 12.0V (indicates genuine degradation)

Example - Performance Issues:

Instead of:

❌ task_duration > 300 seconds (may be normal for complex tasks)

Use:

✓ task_duration > 300 seconds AND error_count > 0 (indicates actual problem)
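A combined trigger is a logical AND over individual checks. The sketch below expresses the two examples above as lists of predicates. The signal names come from this guide; the `Condition` type and `all_match` helper are hypothetical.

```python
from typing import Callable, Mapping

Condition = Callable[[Mapping[str, float]], bool]

battery_aging: list[Condition] = [
    lambda s: s["charge_cycles"] > 800,
    lambda s: s["battery_voltage"] < 12.0,
]

performance_issue: list[Condition] = [
    lambda s: s["task_duration"] > 300,  # seconds
    lambda s: s["error_count"] > 0,
]


def all_match(conditions: list[Condition], signals: Mapping[str, float]) -> bool:
    """A combined trigger fires only when every condition holds."""
    return all(condition(signals) for condition in conditions)


healthy_old_battery = {"charge_cycles": 950, "battery_voltage": 12.6}
degraded_battery = {"charge_cycles": 950, "battery_voltage": 11.7}
print(all_match(battery_aging, healthy_old_battery))  # False
print(all_match(battery_aging, degraded_battery))     # True
```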

Deployment Strategies

Validate Before Wide Rollout

Test new triggers on a small subset of systems before deploying to all.

Gradual Rollout Process:

Phase 1 - Pilot (5-10% of systems):
- Apply new trigger to one zone or system type
- Monitor for 24-48 hours
- Verify trigger behavior matches expectations
Phase 2 - Review:
- Check for false positives (unnecessary events)
- Check for false negatives (missed conditions)
- Adjust thresholds if needed
Phase 3 - Full Deployment:
- Roll out to remaining systems once validated
- Monitor closely for the first week
Phase 4 - Optimize:
- Fine-tune based on fleet-wide patterns

This approach prevents disruption from misconfigured triggers while allowing real-world validation.
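When no single zone or system type gives a representative 5-10% slice, the pilot group can be chosen deterministically instead, so the same systems stay in the pilot across redeployments. A common technique, sketched here as an assumption rather than a Heex feature, hashes each system ID into a bucket from 0 to 99 and includes systems whose bucket is below the pilot percentage. The system IDs and trigger name are hypothetical.

```python
import hashlib


def in_pilot(system_id: str, trigger_name: str, percent: int) -> bool:
    """Stable pseudo-random assignment of a system to a trigger's pilot group."""
    # Salting with the trigger name gives each trigger its own pilot group.
    digest = hashlib.sha256(f"{trigger_name}:{system_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


fleet = [f"amr-{n:03d}" for n in range(200)]  # hypothetical system IDs
pilot = [s for s in fleet if in_pilot(s, "low_battery", 10)]
print(len(pilot), "of", len(fleet), "systems in the pilot")
```

Because membership only depends on the bucket, raising the percentage for Phase 3 keeps every pilot system in the group and adds new ones.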

Use Labels for Targeted Deployment

Organize your systems using labels to enable targeted trigger deployment and analysis.

Organizational Labels (for filtering and analysis):

zone=picking / zone=packing / zone=shipping
model=amr_v2 / model=amr_v3
shift=day / shift=night
environment=indoor / environment=outdoor

Operational Labels (for maintenance tracking):

battery_replaced=2024-06-15
last_maintenance=2024-11-28
firmware_version=2.1.3

Trigger Labels (for notification management):

category=battery_health / category=navigation / category=performance
severity=critical / severity=high / severity=medium
action_required=immediate / action_required=schedule_maintenance

Label Best Practices:

Use consistent key names across all systems
Keep values simple and searchable
Document your labeling convention
Update operational labels as systems change
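Label-based targeting reduces to matching each system's labels against a selector. In this sketch the systems, their labels, and the `select` helper are hypothetical; a selector matches when every key/value pair it lists is present on the system.

```python
systems = {
    "amr-001": {"zone": "picking", "model": "amr_v2", "environment": "indoor"},
    "amr-002": {"zone": "packing", "model": "amr_v3", "environment": "indoor"},
    "amr-003": {"zone": "picking", "model": "amr_v3", "environment": "outdoor"},
}


def select(systems: dict[str, dict[str, str]], selector: dict[str, str]) -> list[str]:
    """Return IDs of systems whose labels contain every key=value in the selector."""
    return sorted(
        system_id
        for system_id, labels in systems.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


print(select(systems, {"zone": "picking"}))                     # ['amr-001', 'amr-003']
print(select(systems, {"zone": "picking", "model": "amr_v3"}))  # ['amr-003']
```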

Predictive Maintenance Strategies

Battery Degradation Monitoring

Instead of waiting for batteries to fail unexpectedly, configure triggers that detect degradation patterns early.

Early Warning Trigger: Monitor battery health indicators (voltage combined with charge cycle count). When a battery has accumulated significant charge cycles AND its voltage is dropping, generate a maintenance notification weeks before actual failure.

Example Configuration:

Condition 1: charge_cycles > 800
Condition 2: battery_voltage < 12.0V
Priority: Medium (not urgent, but needs scheduling)
Recording: Capture 0 seconds before, 10 seconds after (minimal data needed)
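The example configuration above can be captured as one record that bundles the conditions with the priority and recording window. The field names and the `evaluate` helper are hypothetical, chosen only to mirror the listed settings; they are not the Heex trigger schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TriggerConfig:
    name: str
    charge_cycles_above: int
    voltage_below: float
    priority: str
    record_before_s: int
    record_after_s: int


BATTERY_DEGRADATION = TriggerConfig(
    name="battery_degradation",
    charge_cycles_above=800,
    voltage_below=12.0,
    priority="medium",
    record_before_s=0,
    record_after_s=10,
)


def evaluate(cfg: TriggerConfig, t: float, charge_cycles: int, voltage: float) -> Optional[dict]:
    """Return an event with its recording interval if both conditions hold at time t (seconds)."""
    if charge_cycles > cfg.charge_cycles_above and voltage < cfg.voltage_below:
        return {
            "trigger": cfg.name,
            "priority": cfg.priority,
            "record_from": t - cfg.record_before_s,
            "record_to": t + cfg.record_after_s,
        }
    return None


print(evaluate(BATTERY_DEGRADATION, t=1000.0, charge_cycles=850, voltage=11.9))
```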

Maintenance Workflow:

Week 0: Trigger matches degradation pattern → Event generated with medium priority
Week 1: Operations reviews event → Schedules maintenance during next planned downtime
Week 4: Battery replaced during scheduled window, system immediately returns to service
Result: Zero unplanned downtime, optimized battery lifecycle, predictable maintenance costs

Performance Anomaly Detection

Configure triggers to identify systems performing differently from normal baselines.

Outlier Detection Approach:

Establish baseline performance (e.g., average task completion time is 4.2 minutes)
Configure triggers that fire when individual systems exceed acceptable variance (e.g., consistently taking about 5.0 minutes or more per task, roughly 20% slower)
Require the pattern to repeat before the trigger fires (e.g., 3 consecutive slow tasks, not just one)

Example Configuration:

Condition: task_completion_time > (fleet_average * 1.2)
Requires: 3 consecutive matches to avoid false positives from occasional complex tasks
Priority: Medium (requires investigation, not immediate action)
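The 3-consecutive-matches requirement is a small piece of state per system: a counter that increases on each slow task and resets on a normal one. A sketch, assuming task completion times arrive one at a time and the fleet average is supplied from elsewhere (both hypothetical inputs):

```python
class ConsecutiveSlowTaskDetector:
    """Fires when `required` tasks in a row exceed fleet_average * factor."""

    def __init__(self, factor: float = 1.2, required: int = 3):
        self.factor = factor
        self.required = required
        self.streak = 0

    def observe(self, task_minutes: float, fleet_average: float) -> bool:
        if task_minutes > fleet_average * self.factor:
            self.streak += 1
        else:
            self.streak = 0  # a normal task breaks the pattern
        # Fire once per streak, exactly when it reaches the required length.
        return self.streak == self.required


detector = ConsecutiveSlowTaskDetector()
times = [4.1, 5.3, 5.6, 4.0, 5.2, 5.4, 5.8, 6.0]  # minutes; fleet average 4.2
print([detector.observe(t, fleet_average=4.2) for t in times])
# [False, False, False, False, False, False, True, False]
```

Firing only when the streak first reaches the limit avoids a new event for every further slow task in the same episode.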

Investigation Workflow: When performance outlier events occur, review associated recordings to understand what’s different. Common root causes include:

Mechanical wear (wheels, motors, sensors)
Software configuration drift
Environmental factors (floor condition, lighting)
Workload imbalance (assigned more difficult routes)

This targeted investigation approach focuses attention on systems that actually need it, rather than requiring constant manual monitoring.

Component Lifecycle Tracking

Track usage patterns to schedule proactive replacements before end-of-life.

Examples:

Motor Bearings: Track operating hours and vibration levels
Sensors: Monitor calibration drift over time
Belts/Chains: Track tension variations and operating hours

Configuration Pattern: Combine usage metrics with performance indicators to predict component failure:

operating_hours > 5000 AND vibration_level > baseline * 1.5
calibration_error > 5% AND days_since_calibration > 180
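Both patterns above pair a usage counter with a drift measurement. The sketch below evaluates them for one component. The field names, the per-component vibration baseline, and the reading of calibration_error as a percentage are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ComponentStatus:
    operating_hours: float
    vibration_level: float
    vibration_baseline: float     # e.g. measured when the component was new
    calibration_error_pct: float
    days_since_calibration: int


def maintenance_flags(c: ComponentStatus) -> list[str]:
    """Return the lifecycle patterns this component currently matches."""
    flags = []
    if c.operating_hours > 5000 and c.vibration_level > c.vibration_baseline * 1.5:
        flags.append("bearing_wear")
    if c.calibration_error_pct > 5 and c.days_since_calibration > 180:
        flags.append("sensor_recalibration")
    return flags


print(maintenance_flags(ComponentStatus(5400, 3.3, 2.0, 6.2, 200)))
# ['bearing_wear', 'sensor_recalibration']
```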

Event Management

Establish Review Routines

Create a structured process for reviewing and acting on events so that they drive action rather than accumulating as noise.

Daily Reviews:

Review all critical and high-priority events
Respond to immediate issues
Verify resolution of the previous day’s critical events

Weekly Reviews:

Analyze event patterns across systems
Identify recurring problems
Adjust trigger thresholds if needed
Review false positive/negative rates

Monthly Reviews:

Review predictive maintenance queue
Schedule upcoming maintenance interventions
Assess overall fleet health trends
Update trigger configurations based on seasonal changes

Configure Appropriate Notification Channels

Match notification methods to event priority.

Critical Events (require immediate action):

Slack with @mentions
Email to multiple recipients
In-app notifications

High Priority Events (require attention within hours):

Slack notifications
Email to operations team
In-app alerts

Medium/Low Priority Events (review during regular monitoring):

Email notifications
In-app only
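The mapping above fits naturally in a lookup table keyed by priority. Channel names here are plain labels: actual delivery would go through whatever notification integration you have configured, which this hypothetical sketch does not model. It also assumes medium events go by email and in-app while low events stay in-app only.

```python
CHANNELS_BY_PRIORITY = {
    "critical": ["slack_with_mentions", "email_multiple_recipients", "in_app"],
    "high": ["slack", "email_operations_team", "in_app"],
    "medium": ["email", "in_app"],
    "low": ["in_app"],
}


def channels_for(priority: str) -> list[str]:
    """Return notification channels for a priority; unknown priorities are an error."""
    try:
        return CHANNELS_BY_PRIORITY[priority.lower()]
    except KeyError:
        # Failing loudly avoids silently under-notifying a misconfigured trigger.
        raise ValueError(f"unknown priority: {priority!r}") from None
```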

Use Recording Windows Strategically

Configure recording windows to capture relevant context without excessive data storage.

Short Events (battery low, sensor spike):

Record 60 seconds before, 30 seconds after
Captures immediate context without excessive data

Long Events (overheating, stuck detection):

Record 300 seconds (5 minutes) before, 600 seconds (10 minutes) after
Captures progression and aftermath for root cause analysis

Predictive Maintenance (early warnings):

Record 0 seconds before, 10 seconds after
Minimal recording needed since event marks a pattern, not an incident
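Recording windows translate directly into storage. A rough sketch for estimating recorded seconds per day from the three profiles above. The daily event counts are placeholders to replace with your own figures, and actual storage also depends on which signals and sensors are recorded.

```python
# (seconds before, seconds after) for each profile described above.
RECORDING_PROFILES_S = {
    "short": (60, 30),
    "long": (300, 600),
    "predictive": (0, 10),
}


def recorded_seconds(events_per_day: dict[str, int]) -> int:
    """Total recorded seconds per day, ignoring overlap between windows."""
    return sum(
        count * sum(RECORDING_PROFILES_S[profile])
        for profile, count in events_per_day.items()
    )


print(recorded_seconds({"short": 40, "long": 3, "predictive": 5}))
# 6350  (40*90 + 3*900 + 5*10)
```

Because overlapping windows are counted twice, the result is an upper bound on recorded time.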

Continuous Improvement

Monitor Trigger Effectiveness

Track metrics to assess how well your triggers are working.

Key Metrics:

Event Frequency: How often does each trigger fire?
Action Rate: What percentage of events require intervention?
False Positive Rate: Share of events that didn’t require action
False Negative Rate: Share of issues discovered outside the trigger system
Response Time: Time from event generation to resolution

Optimization Signals:

High event frequency + low action rate = Threshold too sensitive
Low event frequency + high false negatives = Threshold not sensitive enough
Inconsistent response times = Priority levels may need adjustment
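The first two optimization signals can be turned into a rough per-trigger check for weekly reviews. The cut-offs below (more than 10 events per week counts as frequent, an action rate under 30% counts as low, any false negative counts as high) are hypothetical starting points, not recommendations from this guide. Response-time consistency is not covered.

```python
def tuning_hint(events_per_week: int, actionable: int, false_negatives: int,
                busy_threshold: int = 10, min_action_rate: float = 0.3) -> str:
    """Classify one trigger's weekly figures using the optimization signals above."""
    action_rate = actionable / events_per_week if events_per_week else 0.0
    # High event frequency + low action rate: the trigger fires on too much noise.
    if events_per_week > busy_threshold and action_rate < min_action_rate:
        return "too_sensitive"
    # Low event frequency + missed issues: the trigger is not catching enough.
    if events_per_week <= busy_threshold and false_negatives > 0:
        return "not_sensitive_enough"
    return "ok"


print(tuning_hint(events_per_week=50, actionable=5, false_negatives=0))  # too_sensitive
```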

Document Configuration Decisions

Maintain documentation explaining your trigger configurations.

What to Document:

Why specific thresholds were chosen
Environmental or operational context
Historical adjustments and their rationale
Known edge cases or limitations

This documentation helps:

New team members understand existing configuration
Troubleshoot unexpected behavior
Make informed adjustments over time
Replicate successful patterns across systems

Learn from Incidents

When issues occur, review whether your monitoring could have caught them earlier.

Post-Incident Review Questions:

Did an existing trigger fire? If so, was it acted upon promptly?
If no trigger fired, could one have been configured to detect this?
Were there warning signs in the data that weren’t monitored?
Would different thresholds or conditions have provided earlier warning?

Use these insights to continuously refine your monitoring strategy.

Common Pitfalls to Avoid

Alert Fatigue

Problem: Too many low-value alerts cause teams to ignore notifications.

Solutions:

Start with fewer, high-confidence triggers
Tune thresholds to reduce false positives
Use appropriate priority levels
Consolidate similar alerts
Regularly review and disable ineffective triggers
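Consolidating similar alerts often means suppressing repeats of the same trigger on the same system within a short window. A minimal sketch, assuming alerts are keyed by (trigger, system) and a 30-minute window; both choices are hypothetical and should match how your team wants to be notified.

```python
from datetime import datetime, timedelta


class AlertConsolidator:
    """Sends the first alert per (trigger, system) key, then holds repeats for `window`."""

    def __init__(self, window: timedelta = timedelta(minutes=30)):
        self.window = window
        self.last_sent: dict[tuple[str, str], datetime] = {}

    def should_send(self, trigger: str, system_id: str, at: datetime) -> bool:
        key = (trigger, system_id)
        last = self.last_sent.get(key)
        if last is not None and at - last < self.window:
            return False  # a similar alert went out recently
        self.last_sent[key] = at
        return True
```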

Over-Monitoring

Problem: Creating triggers for every possible metric creates noise without value.

Solutions:

Focus on conditions that require action
Ask “If this triggers, what would we do?” before creating a trigger
Monitor outcomes that matter, not just metrics
Consolidate related conditions into single triggers

Under-Monitoring

Problem: Critical conditions go undetected because no trigger was configured for them.

Solutions:

Review incidents to identify missed monitoring opportunities
Implement comprehensive coverage for safety-critical systems
Use gradual rollout to test new trigger ideas
Balance with alert fatigue concerns

Static Configuration

Problem: Triggers configured once and never adjusted despite changing conditions.

Solutions:

Schedule regular trigger reviews
Adjust for seasonal variations
Update as systems age or workloads change
Respond to operational feedback

Next Steps

Apply to Your Use Case:

Fleet Monitoring - Apply these practices to fleet management

Learn Related Concepts:

Understanding Events - Deep dive into event lifecycle
Labels Guide - Master label strategies

Reference Documentation:

Trigger Configuration - Create and manage triggers in the UI
Notifications - Configure notification channels
