Autoscaling with Hatchet Compute
Hatchet Compute provides automatic scaling capabilities to ensure your workflow workers efficiently handle varying workloads. This guide explains how to configure and use autoscaling features.
Overview
Autoscaling automatically adjusts the number of worker replicas based on utilization metrics. When workload increases, Hatchet scales up your workers to handle the load. When workload decreases, it scales down to optimize resource usage.
Basic Configuration
The basic autoscaling configuration requires two parameters:
- Minimum Replicas: The minimum number of worker replicas that should be running at all times.
- Maximum Replicas: The maximum number of worker replicas that can be created during high load.
You can also enable “Scale to zero during periods of inactivity” to allow the system to scale down to zero replicas when there’s no work to be done, helping to reduce costs.
Advanced Autoscaling Settings
Wait Duration
The wait duration specifies how long to wait between autoscaling events. This prevents rapid scaling changes that could destabilize your system.
Format: Use time units like:
10s
(10 seconds)5m
(5 minutes)1h
(1 hour)
Default: 1m
(1 minute)
Rolling Window Duration
This setting determines the time window used to calculate utilization metrics for scaling decisions. Shorter windows make the system more responsive but potentially more volatile.
Format: Use time units like:
2m
(2 minutes)5m
(5 minutes)1h
(1 hour)
Default: 2m
(2 minutes)
Utilization Scale Up Threshold
This threshold determines when to scale up worker replicas. It’s expressed as a decimal between 0 and 1.
Example: A value of 0.75
means:
- If utilization exceeds 75%, add more replicas
- Scale up occurs in increments defined by the scaling increment
Default: 0.75
Utilization Scale Down Threshold
This threshold determines when to scale down worker replicas. It’s expressed as a decimal between 0 and 1.
Example: A value of 0.25
means:
- If utilization falls below 25%, remove replicas
- Scale down occurs in increments defined by the scaling increment
Default: 0.25
Scaling Increment
The number of replicas to add or remove during each scaling event.
Example: A value of 1
means:
- Add or remove 1 replica at a time
- Higher values enable faster scaling but may be less stable
Default: 1
Best Practices
-
Start Conservative: Begin with moderate thresholds (e.g., 0.75 for scale-up and 0.25 for scale-down) and adjust based on your workload patterns.
-
Tune Wait Duration:
- Shorter durations (e.g., 1m) work well for bursty workloads
- Longer durations (e.g., 5m) are better for stable, predictable loads
-
Rolling Window:
- Shorter windows (2-5m) provide faster response to changes
- Longer windows (10m+) provide more stable scaling behavior
-
Monitor and Adjust:
- Watch your scaling patterns in the Hatchet UI
- Adjust thresholds based on observed behavior
- Consider your cost vs. performance requirements
Monitoring Autoscaling
You can monitor your autoscaling behavior in the Hatchet UI under the Managed Compute section. The UI shows:
- Current number of replicas
- Scaling events history
- Utilization metrics
- Scaling decisions and their triggers
Troubleshooting
If you’re experiencing unexpected scaling behavior:
- Check your utilization metrics in the Hatchet UI
- Verify your threshold settings are appropriate for your workload
- Ensure your wait duration isn’t too short or too long
- Review your scaling events history for patterns
For additional help, join the Hatchet Community or reach out to us here.