Timeouts in Hatchet
Timeouts are an important concept in Hatchet that allow you to control how long a workflow or step is allowed to run before it is considered to have failed. This is useful for ensuring that your workflows don't run indefinitely and consume unnecessary resources. Timeouts in Hatchet are treated as failures, and a step that times out is retried if retries are specified.
There are two types of timeouts in Hatchet:
- Scheduling Timeouts (Default 5m) - the time a step is allowed to wait in the queue before it is cancelled
- Execution Timeouts (Default 60s) - the time a step is allowed to run before it is considered to have failed
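The difference between the two timeouts can be illustrated with a small model. This is only a sketch, not Hatchet's scheduler: the function name classify_run and the outcome strings are hypothetical, and the only values taken from this page are the defaults (5 minutes for scheduling, 60 seconds for execution).

```python
# Hypothetical model of the two timeout types; not part of the Hatchet SDK.
DEFAULT_SCHEDULE_TIMEOUT_SECONDS = 5 * 60  # default scheduling timeout (5m)
DEFAULT_EXECUTION_TIMEOUT_SECONDS = 60     # default execution timeout (60s)


def classify_run(
    queued_seconds: float,
    running_seconds: float,
    schedule_timeout: float = DEFAULT_SCHEDULE_TIMEOUT_SECONDS,
    execution_timeout: float = DEFAULT_EXECUTION_TIMEOUT_SECONDS,
) -> str:
    """Return the outcome of a step given its queue time and run time."""
    # The scheduling timeout applies while the step waits for a worker.
    if queued_seconds > schedule_timeout:
        return "cancelled"
    # The execution timeout applies only once the step is running.
    if running_seconds > execution_timeout:
        return "failed"
    return "succeeded"
```

In this model, a step that waits longer than the scheduling timeout never reaches the execution check, because it was never assigned to a worker.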
Timeout Format
In Hatchet, timeouts are specified using a string in the format <number><unit>, where <number> is an integer and <unit> is one of:
- s for seconds
- m for minutes
- h for hours
- d for days
For example:
- 10s means 10 seconds
- 4m means 4 minutes
- 1h means 1 hour
- 2d means 2 days
If no unit is specified, seconds are assumed.
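To make the format concrete, here is a minimal sketch of how such a string could be converted to seconds. The helper parse_timeout is hypothetical and is not part of the Hatchet SDK; it only follows the rules stated above (an integer, an optional unit from s, m, h, d, and seconds when the unit is omitted).

```python
import re

# Seconds per unit, following the units listed above.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

# An integer followed by an optional single-letter unit.
TIMEOUT_PATTERN = re.compile(r"(\d+)([smhd]?)")


def parse_timeout(value: str) -> int:
    """Convert a timeout string such as "4m" into a number of seconds."""
    match = TIMEOUT_PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"invalid timeout: {value!r}")
    number, unit = match.groups()
    # A bare number has no unit, so it is treated as seconds.
    return int(number) * UNIT_SECONDS[unit or "s"]
```

Under these assumptions, parse_timeout("4m") returns 240 and parse_timeout("45") returns 45, while a string such as "10x" raises ValueError.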
Scheduling Timeouts
To specify a scheduling timeout for an entire workflow, set the schedule_timeout property in the workflow definition:
@hatchet.workflow(schedule_timeout="2m")
class TimeoutWorkflow:
    # ...
This sets a scheduling timeout of 2 minutes for all steps in the workflow. If a step waits longer than 2 minutes to be assigned to a worker, it is cancelled and will not be assigned to a worker.
Step Timeouts
To specify a timeout for an individual step, set the timeout property in the step definition:
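@hatchet.step(timeout="30s")
def timeout(self, context):
    # Assumes `import time` at module level.
    try:
        print("started step2")
        time.sleep(5)
        print("finished step2")
    except Exception as e:
        # Log the error, then re-raise so the step is marked as failed.
        print("caught an exception: " + str(e))
        raise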
This would set a timeout of 30 seconds for this specific step. If the step takes longer than 30 seconds to complete, it will fail and the workflow will be cancelled.
A timeout does not guarantee that the step is stopped immediately. The step is stopped as soon as the worker is able to stop it. See cancellation for more information.
Refreshing Timeouts
In some cases, you may need to extend the timeout for a step while it is running. This can be done using the refreshTimeout function provided by the step context (ctx), which is called as context.refresh_timeout in the Python example below.
For example:
@hatchet.step(timeout="30s")
def timeout(self, context):
    # Assumes `import time` at module level.
    time.sleep(20)
    # Extend the timeout by 15 seconds.
    context.refresh_timeout("15s")
    time.sleep(10)
    return {
        "step1": "step1 results!"
    }
In this example, the step initially has a timeout of 30 seconds. After 20 seconds, context.refresh_timeout is called with an argument of "15s", which extends the timeout by an additional 15 seconds. This allows the step to continue running for a total of 45 seconds (30 seconds initial timeout + 15 seconds refreshed timeout).
The refreshTimeout function can be called multiple times within a step to further extend the timeout as needed.
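The arithmetic of refreshing can be sketched with a small model. The class StepDeadline below is hypothetical and is not how Hatchet tracks timeouts internally; it only assumes, as described above, that each refresh adds its duration to the step's total allowed run time.

```python
from dataclasses import dataclass


@dataclass
class StepDeadline:
    """Hypothetical model of a step's allowed run time, in seconds."""

    timeout_seconds: int

    def refresh(self, extra_seconds: int) -> None:
        # Each refresh adds to the total allowed run time.
        self.timeout_seconds += extra_seconds

    def has_timed_out(self, elapsed_seconds: float) -> bool:
        # The step times out once it has run longer than the total allowance.
        return elapsed_seconds > self.timeout_seconds


deadline = StepDeadline(timeout_seconds=30)  # initial timeout of 30s
deadline.refresh(15)                         # refreshed by 15s
deadline.refresh(15)                         # a second refresh adds 15s more
```

After the first refresh the model allows 45 seconds in total, matching the example above; after the second it allows 60 seconds.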
Use Cases
Timeouts are useful in a variety of scenarios:
- Ensuring workflows don't run indefinitely and consume unnecessary resources
- Failing workflows early if a critical step takes too long
- Keeping workflows responsive by ensuring individual steps complete in a timely manner
- Preventing infinite loops or hung processes from blocking the entire system
For example, if you have a workflow that makes an external API call, you may want to set a timeout to ensure the workflow fails quickly if the API is unresponsive, rather than waiting indefinitely.
By carefully considering timeouts for your workflows and steps, you can build more resilient and responsive systems with Hatchet.