Signal Configuration
Standardize Signal Naming
Use consistent signal names across all systems. When every system uses “battery_voltage” for the same measurement, you can create one trigger that applies uniformly. Inconsistent naming requires duplicate triggers and complicates management.
Good Examples:
- battery_voltage (not batt_v, voltage_battery, bat_volt)
- motor_temperature (not temp_motor, motor_temp, temperature_motor)
- obstacle_distance (not dist_obstacle, obstacle_dist)
Naming Guidelines:
- Use lowercase with underscores
- Start with the component, then the metric (e.g., battery_voltage, motor_current)
- Use full words, not abbreviations
- Document your naming convention and enforce it across all systems
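One way to enforce a naming convention like this is a small check run before signals are registered. The sketch below is illustrative only: `naming_problems` and the `ABBREVIATIONS` deny-list are hypothetical names, not part of any product API, and the deny-list is simply built from the "not ..." examples above. It cannot verify component-then-metric ordering, so a name such as voltage_battery would still pass and needs human review.

```python
import re

# Hypothetical deny-list taken from the discouraged examples; extend as needed.
ABBREVIATIONS = {"batt", "bat", "volt", "v", "temp", "dist"}

# Lowercase words joined by single underscores, at least two words.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)+")


def naming_problems(name: str) -> list[str]:
    """Return the convention violations found in a signal name (empty if none)."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("use lowercase words joined by underscores")
    short = [word for word in name.lower().split("_") if word in ABBREVIATIONS]
    if short:
        problems.append("spell out abbreviations: " + ", ".join(short))
    return problems
```

For example, `naming_problems("battery_voltage")` returns an empty list, while `naming_problems("motor_temp")` reports the abbreviation.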
Choose Appropriate Signal Types
Match signal types to your data:
- NUMBER: Continuous values (voltage, temperature, distance)
- STRING: Text states (error messages, statuses)
- BOOLEAN: Binary conditions (door_open, emergency_stop)
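If signal values originate in application code, the mapping above can be sketched as a small helper. The function name `signal_type_for` is an assumption made for illustration; only the three type names come from this guide.

```python
from typing import Union

SignalValue = Union[bool, int, float, str]


def signal_type_for(value: SignalValue) -> str:
    """Pick the signal type that matches a Python value."""
    # bool must be checked first: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, (int, float)):
        return "NUMBER"
    if isinstance(value, str):
        return "STRING"
    raise TypeError(f"unsupported signal value: {value!r}")
```

Checking for `bool` before the numeric types avoids reporting a flag such as emergency_stop as a NUMBER.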
Trigger Configuration
Start Conservative, Then Tune
Begin with conservative trigger thresholds that catch genuine issues without generating excessive false positives.
Initial Configuration Process:
- Set thresholds based on manufacturer specifications or safety limits
- Deploy to production and monitor for 1-2 weeks
- Review event frequency and patterns
- Identify false positives (events that didn’t require action)
- Identify false negatives (missed conditions discovered through other means)
- Adjust thresholds based on actual system behavior
Example:
- Week 0: Set low battery threshold at 12.0V (manufacturer spec: 11.0V minimum)
- Week 2 Review: 50 events generated, but only 5 required intervention
- Adjustment: Lower threshold to 11.5V to reduce false positives while maintaining safety margin
- Week 4 Review: 8 events generated, all requiring intervention ✓
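The review numbers in this example reduce to a simple ratio of events that required intervention. A minimal sketch of that arithmetic follows; `action_rate` is a hypothetical helper name, not a platform function.

```python
def action_rate(events_generated: int, events_actioned: int) -> float:
    """Fraction of generated events that required intervention."""
    if events_generated == 0:
        return 0.0
    return events_actioned / events_generated


# Week 2: 50 events, 5 required intervention.
week_2 = action_rate(50, 5)
# Week 4: 8 events, all required intervention.
week_4 = action_rate(8, 8)
```

With the figures above, the action rate moves from 10% at the 12.0V threshold to 100% at 11.5V.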
Account for Environmental Variations
System behavior changes with environmental conditions. Adjust your monitoring configuration accordingly:
Seasonal Adjustments:
- Winter: Battery voltage drops faster in cold temperatures. Increase low battery thresholds slightly (e.g., from 11.5V to 11.8V) to provide more warning time.
- Summer: Higher ambient temperatures mean motors and batteries run hotter. Adjust overheating thresholds to account for seasonal baselines while still catching genuine issues.
Operational Adjustments:
- Peak Hours: Systems under heavy load may exhibit different normal ranges
- Off-Peak Hours: Idle time thresholds may need different values
- Maintenance Windows: Temporarily disable non-critical triggers during scheduled maintenance
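If thresholds are managed from a script, the winter adjustment could be expressed as a date-based lookup. This is a sketch under stated assumptions: the two voltages come from the winter example above, while the set of winter months and the name `low_battery_threshold` are assumptions that depend on your site and climate.

```python
from datetime import date

LOW_BATTERY_DEFAULT_V = 11.5
LOW_BATTERY_WINTER_V = 11.8
# Assumption: northern-hemisphere winter; adjust for your location.
WINTER_MONTHS = {12, 1, 2}


def low_battery_threshold(today: date) -> float:
    """Return the low-battery threshold (in volts) to use on a given day."""
    if today.month in WINTER_MONTHS:
        return LOW_BATTERY_WINTER_V
    return LOW_BATTERY_DEFAULT_V
```

Keeping the seasonal values in one place makes it easier to review them when conditions change.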
Use Multiple Conditions for Precision
Combine multiple signal conditions to create more precise triggers that reduce false positives.
Example - Battery Aging Detection:
Instead of a single condition:
- ❌ charge_cycles > 800 (too many false positives on healthy batteries)
- ✓ charge_cycles > 800 AND battery_voltage < 12.0V (indicates genuine degradation)
Example - Slow Task Detection:
- ❌ task_duration > 300 seconds (may be normal for complex tasks)
- ✓ task_duration > 300 seconds AND error_count > 0 (indicates actual problem)
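The effect of combining conditions can be sketched in a few lines. The `Reading` type and the `battery_aging` function are hypothetical names used only for illustration; the limits are the ones from the battery example above.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    charge_cycles: int
    battery_voltage: float  # volts


def battery_aging(reading: Reading,
                  cycle_limit: int = 800,
                  voltage_limit: float = 12.0) -> bool:
    """True only when both the cycle count and the voltage indicate degradation."""
    return (reading.charge_cycles > cycle_limit
            and reading.battery_voltage < voltage_limit)
```

A battery with 900 cycles that still holds 12.6V does not match, which is exactly the false positive that the single-condition version would generate.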
Deployment Strategies
Validate Before Wide Rollout
Test new triggers on a small subset of systems before deploying to all.
Gradual Rollout Process:
- Phase 1 - Pilot (5-10% of systems):
  - Apply new trigger to one zone or system type
  - Monitor for 24-48 hours
  - Verify trigger behavior matches expectations
- Phase 2 - Review:
  - Check for false positives (unnecessary events)
  - Check for false negatives (missed conditions)
  - Adjust thresholds if needed
- Phase 3 - Full Deployment:
  - Roll out to remaining systems once validated
  - Monitor closely for the first week
- Phase 4 - Optimize:
  - Fine-tune based on fleet-wide patterns
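To choose a pilot group for Phase 1, it can help to check which zones fall inside the 5-10% range. The sketch below assumes you can export a mapping of system name to zone; `pilot_candidates` is a hypothetical helper, not a platform feature.

```python
from collections import Counter


def pilot_candidates(zone_by_system: dict[str, str],
                     min_share: float = 0.05,
                     max_share: float = 0.10) -> list[str]:
    """Return the zones whose share of the fleet fits the pilot range."""
    total = len(zone_by_system)
    counts = Counter(zone_by_system.values())
    return sorted(zone for zone, count in counts.items()
                  if min_share <= count / total <= max_share)


# Illustrative fleet: 12 picking, 6 packing, 2 shipping systems.
fleet = {f"amr-{i:02d}": "picking" for i in range(12)}
fleet.update({f"amr-{i:02d}": "packing" for i in range(12, 18)})
fleet.update({f"amr-{i:02d}": "shipping" for i in range(18, 20)})
```

Here only the shipping zone (2 of 20 systems, 10%) qualifies as a pilot group; the same check works for system types or any other label.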
Use Labels for Targeted Deployment
Organize your systems using labels to enable targeted trigger deployment and analysis.
Organizational Labels (for filtering and analysis):
- zone=picking / zone=packing / zone=shipping
- model=amr_v2 / model=amr_v3
- shift=day / shift=night
- environment=indoor / environment=outdoor
Operational Labels:
- battery_replaced=2024-06-15
- last_maintenance=2024-11-28
- firmware_version=2.1.3
Event Labels:
- category=battery_health / category=navigation / category=performance
- severity=critical / severity=high / severity=medium
- action_required=immediate / action_required=schedule_maintenance
Labeling Guidelines:
- Use consistent key names across all systems
- Keep values simple and searchable
- Document your labeling convention
- Update operational labels as systems change
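Label-based targeting amounts to matching key=value pairs. The following sketch shows the idea with plain dictionaries; `matches`, `select_systems`, and the system names are assumptions for illustration and do not reflect how the platform stores labels.

```python
def matches(labels: dict[str, str], selector: dict[str, str]) -> bool:
    """True when every key=value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select_systems(fleet: dict[str, dict[str, str]],
                   selector: dict[str, str]) -> list[str]:
    """Return the names of systems whose labels satisfy the selector."""
    return [name for name, labels in fleet.items() if matches(labels, selector)]


# Hypothetical fleet using the label keys from this guide.
example_fleet = {
    "amr-01": {"zone": "picking", "model": "amr_v2", "shift": "day"},
    "amr-02": {"zone": "picking", "model": "amr_v3", "shift": "night"},
    "amr-03": {"zone": "shipping", "model": "amr_v3", "shift": "day"},
}
```

For instance, `select_systems(example_fleet, {"zone": "picking", "model": "amr_v3"})` returns only amr-02, which is why consistent key names and simple values matter.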
Predictive Maintenance Strategies
Battery Degradation Monitoring
Instead of waiting for batteries to fail unexpectedly, configure triggers that detect degradation patterns early.
Early Warning Trigger: Monitor battery health indicators (voltage combined with charge cycle count). When a battery has accumulated significant charge cycles AND voltage is dropping, generate a maintenance notification weeks before actual failure.
Example Configuration:
- Condition 1: charge_cycles > 800
- Condition 2: battery_voltage < 12.0V
- Priority: Medium (not urgent, but needs scheduling)
- Recording: Capture 0 seconds before, 10 seconds after (minimal data needed)
Example Timeline:
- Week 0: Trigger matches degradation pattern → Event generated with medium priority
- Week 1: Operations reviews event → Schedules maintenance during next planned downtime
- Week 4: Battery replaced during scheduled window, system immediately returns to service
- Result: Zero unplanned downtime, optimized battery lifecycle, predictable maintenance costs
Performance Anomaly Detection
Configure triggers to identify systems performing differently from normal baselines.
Outlier Detection Approach:
- Establish baseline performance (e.g., average task completion time is 4.2 minutes)
- Configure triggers that fire when individual systems exceed acceptable variance (e.g., consistently taking 5.0+ minutes per task, roughly 20% slower)
- Set the threshold to trigger only after the pattern is confirmed (e.g., 3 consecutive slow tasks, not just one)
Example Configuration:
- Condition: task_completion_time > (fleet_average * 1.2)
- Requires: 3 consecutive matches to avoid false positives from occasional complex tasks
- Priority: Medium (requires investigation, not immediate action)
Possible Causes:
- Mechanical wear (wheels, motors, sensors)
- Software configuration drift
- Environmental factors (floor condition, lighting)
- Workload imbalance (assigned more difficult routes)
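The "3 consecutive matches" rule for slow tasks can be sketched as a small streak counter. The class name `ConsecutiveSlowTaskDetector` is hypothetical; the 1.2 factor and the streak length of 3 come from the example configuration, and times are in minutes.

```python
class ConsecutiveSlowTaskDetector:
    """Fires once a system exceeds the slow-task threshold several times in a row."""

    def __init__(self, fleet_average: float, factor: float = 1.2, required: int = 3):
        self.threshold = fleet_average * factor
        self.required = required
        self.streak = 0

    def observe(self, task_completion_time: float) -> bool:
        """Record one task and report whether the trigger condition is met."""
        if task_completion_time > self.threshold:
            self.streak += 1
        else:
            # A single normal task resets the pattern.
            self.streak = 0
        return self.streak >= self.required


detector = ConsecutiveSlowTaskDetector(fleet_average=4.2)
results = [detector.observe(t) for t in [5.1, 5.2, 4.0, 5.1, 5.3, 5.5]]
```

In this run the first two slow tasks do not fire because the 4.0-minute task resets the streak; only the final task, the third slow one in a row, returns True.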
Component Lifecycle Tracking
Track usage patterns to schedule proactive replacements before end-of-life.
Examples:
- Motor Bearings: Track operating hours and vibration levels
- Sensors: Monitor calibration drift over time
- Belts/Chains: Track tension variations and operating hours
Example Conditions:
- operating_hours > 5000 AND vibration_level > baseline * 1.5
- calibration_error > 5% AND days_since_calibration > 180
Event Management
Establish Review Routines
Create a structured process for reviewing and acting on events to ensure they drive action rather than accumulating as noise.
Daily Reviews:
- Review all critical and high-priority events
- Respond to immediate issues
- Verify resolution of previous day’s critical events
- Analyze event patterns across systems
- Identify recurring problems
- Adjust trigger thresholds if needed
- Review false positive/negative rates
- Review predictive maintenance queue
- Schedule upcoming maintenance interventions
- Assess overall fleet health trends
- Update trigger configurations based on seasonal changes
Configure Appropriate Notification Channels
Match notification methods to event priority:
Critical Events (requires immediate action):
- Slack with @mentions
- Email to multiple recipients
- In-app notifications
- Slack notifications
- Email to operations team
- In-app alerts
- Email notifications
- In-app only
Use Recording Windows Strategically
Configure recording windows to capture relevant context without excessive data storage:
Short Events (battery low, sensor spike):
- Record 60 seconds before, 30 seconds after
- Captures immediate context without excessive data
- Record 300 seconds (5 minutes) before, 600 seconds (10 minutes) after
- Captures progression and aftermath for root cause analysis
- Record 0 seconds before, 10 seconds after
- Minimal recording needed since event marks a pattern, not an incident
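A recording window is just an interval around the event timestamp. The sketch below shows how the before/after values translate into start and end times; `recording_window` is a hypothetical helper and the timestamp is made up for illustration.

```python
from datetime import datetime, timedelta


def recording_window(event_time: datetime,
                     before_s: int,
                     after_s: int) -> tuple[datetime, datetime]:
    """Return the (start, end) of the recording around an event."""
    start = event_time - timedelta(seconds=before_s)
    end = event_time + timedelta(seconds=after_s)
    return start, end


# Short event: 60 seconds before, 30 seconds after.
event = datetime(2025, 1, 15, 12, 0, 0)
start, end = recording_window(event, before_s=60, after_s=30)
```

For the short-event setting this yields 90 seconds of data per event, compared with 900 seconds for a 300/600 window, which is the storage trade-off to weigh per trigger.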
Continuous Improvement
Monitor Trigger Effectiveness
Track metrics to assess how well your triggers are working:
Key Metrics:
- Event Frequency: How often does each trigger fire?
- Action Rate: What percentage of events require intervention?
- False Positive Rate: Events that didn’t require action
- False Negative Rate: Issues discovered outside trigger system
- Response Time: Time from event generation to resolution
Interpreting the Metrics:
- High event frequency + low action rate = Threshold too sensitive
- Low event frequency + high false negatives = Threshold too conservative (not sensitive enough)
- Inconsistent response times = Priority levels may need adjustment
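These rules of thumb can be turned into a rough per-trigger check. The sketch below is an assumption-heavy illustration: `diagnose` is a hypothetical function, and the cutoffs for "high" frequency, "low" action rate, and "low" frequency are placeholders you would need to choose for your own fleet.

```python
def diagnose(events_per_week: float,
             action_rate: float,
             missed_issues: int,
             *,
             busy: float = 20,         # assumed cutoff for "high event frequency"
             low_action: float = 0.25,  # assumed cutoff for "low action rate"
             quiet: float = 2) -> list[str]:  # assumed cutoff for "low event frequency"
    """Apply the threshold rules of thumb to one trigger's weekly metrics."""
    findings = []
    if events_per_week >= busy and action_rate <= low_action:
        findings.append("threshold too sensitive")
    if events_per_week <= quiet and missed_issues > 0:
        findings.append("threshold not sensitive enough")
    return findings
```

A trigger producing 25 events per week with a 10% action rate is flagged as too sensitive, while one that fires once a week and missed three issues is flagged as not sensitive enough.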
Document Configuration Decisions
Maintain documentation explaining your trigger configurations:
What to Document:
- Why specific thresholds were chosen
- Environmental or operational context
- Historical adjustments and their rationale
- Known edge cases or limitations
Documentation Helps:
- New team members understand existing configuration
- Troubleshoot unexpected behavior
- Make informed adjustments over time
- Replicate successful patterns across systems
Learn from Incidents
When issues occur, review whether your monitoring could have caught them earlier:
Post-Incident Review Questions:
- Did an existing trigger fire? If so, was it acted upon promptly?
- If no trigger fired, could one have been configured to detect this?
- Were there warning signs in the data that weren’t monitored?
- Would different thresholds or conditions have provided earlier warning?
Common Pitfalls to Avoid
Alert Fatigue
Problem: Too many low-value alerts cause teams to ignore notifications.
Solutions:
- Start with fewer, high-confidence triggers
- Tune thresholds to reduce false positives
- Use appropriate priority levels
- Consolidate similar alerts
- Regularly review and disable ineffective triggers
Over-Monitoring
Problem: Creating triggers for every possible metric creates noise without value.
Solutions:
- Focus on conditions that require action
- Ask “If this triggers, what would we do?” before creating a trigger
- Monitor outcomes that matter, not just metrics
- Consolidate related conditions into single triggers
Under-Monitoring
Problem: Missing critical conditions because they weren’t configured.
Solutions:
- Review incidents to identify missed monitoring opportunities
- Implement comprehensive coverage for safety-critical systems
- Use gradual rollout to test new trigger ideas
- Balance with alert fatigue concerns
Static Configuration
Problem: Triggers configured once and never adjusted despite changing conditions.
Solutions:
- Schedule regular trigger reviews
- Adjust for seasonal variations
- Update as systems age or workloads change
- Respond to operational feedback
Next Steps
Apply to Your Use Case:
- Fleet Monitoring - Apply these practices to fleet management
- Understanding Events - Deep dive into event lifecycle
- Labels Guide - Master label strategies
- Trigger Configuration - Create and manage triggers in the UI
- Notifications - Configure notification channels