Understanding Steps in Hatchet
In Hatchet, a step is simply a function that satisfies the following signature:
from hatchet_sdk import Context

def my_step(context: Context) -> dict:
    # Perform some operation and return a JSON-serializable result
    output = {"status": "success"}
    return output
This function takes a single argument, context, which is an object that provides access to the workflow's input, as well as methods for interacting with Hatchet (e.g. logging). The function returns a JSON-serializable object, which represents the output of the step.
Often, Hatchet users start by wrapping one large, existing function in a single step and then breaking it down into smaller, more focused steps. This approach allows for better reusability and easier testing.
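For example, an order-processing function might first be wrapped as a single step and later split into focused steps. The sketch below is illustrative: the order fields and helper logic are hypothetical, and only the step signature comes from Hatchet.

from hatchet_sdk import Context

# Before: one large function wrapped as a single step
def process_order(context: Context) -> dict:
    order = context.workflow_input()
    # validation, payment, and notification all happen in one place (omitted)
    return {"processed": True, "order_id": order.get("order_id")}

# After: the same work broken into smaller, independently testable steps
def validate_order(context: Context) -> dict:
    order = context.workflow_input()
    return {"valid": bool(order.get("items"))}

def charge_customer(context: Context) -> dict:
    order = context.workflow_input()
    # call your payment provider here (omitted)
    return {"charged": True, "amount": order.get("total", 0)}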
Best Practices for Defining Independent Steps
A step in Hatchet is a self-sufficient function that encapsulates a specific operation or task. This independence means that while steps can be orchestrated into larger workflows, each step is designed to function effectively in isolation.
Here are some key best practices for defining independent steps in Hatchet:
- Consistent Input and Output Shapes: Each step can accept an input and produce an output. The input to a step is typically a JSON object, allowing for flexibility and ease of integration. The output is also a JSON-serializable object, ensuring compatibility and ease of further processing or aggregation. Wherever possible, use consistent input and output shapes to ensure that steps produce predictable and uniform results.
- Reusability: Due to their self-contained nature, steps can be reused across different workflows or even within the same workflow, maximizing code reuse and reducing redundancy.
- Testing: Steps can be easily tested in isolation, ensuring that they function as expected and produce the desired output. This testing approach simplifies the debugging process and ensures that steps are reliable and robust.
- Logging: Each step can log its operation, input, and output, providing valuable insights into the workflow's execution and enabling effective monitoring and troubleshooting. Logs can be streamed to the Hatchet dashboard through the context.log method (see the sketch after this list).
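As a concrete example of the logging practice above, a step can emit progress messages with context.log as it runs. This is a minimal sketch; the backup logic and message contents are placeholders.

from hatchet_sdk import Context

def backup_database(context: Context) -> dict:
    context.log("starting database backup")  # streamed to the Hatchet dashboard
    # ... perform the backup (omitted) ...
    context.log("backup finished")
    return {"taskStatus": "Completed"}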
The Workflow Input Object
A step in Hatchet accepts a single workflow input, which is typically a JSON object and can be accessed through the context argument:
def my_step(context: Context) -> dict:
    data = context.workflow_input()
    # Perform some operation on the input data
    output = {"received": data}
    return output
This input object can contain any data or parameters required for the step to perform its operation. Because the input is a JSON object, it can be serialized and deserialized without extra work, making steps easy to integrate and combine.
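For instance, a step might read specific fields from the workflow input, falling back to defaults where appropriate. This is a sketch only; the field names (image_url, width) are hypothetical.

from hatchet_sdk import Context

def resize_image(context: Context) -> dict:
    data = context.workflow_input()
    image_url = data["image_url"]    # required field in this workflow's input shape
    width = data.get("width", 1024)  # optional field with a default
    return {"image_url": image_url, "width": width, "resized": True}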
The Return Object
A step in Hatchet returns any JSON-serializable object, which can be a simple value, an array, or a complex object. This flexibility allows steps to encapsulate a wide range of operations, from simple transformations to complex computations.
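One practical consequence is that values which are not natively JSON-serializable, such as datetime objects, should be converted before they are returned. A small sketch, with illustrative field names:

from datetime import datetime, timezone
from hatchet_sdk import Context

def run_backup(context: Context) -> dict:
    finished_at = datetime.now(timezone.utc)
    # datetime objects are not JSON-serializable, so convert them to strings first
    return {
        "taskStatus": "Completed",
        "finishedAt": finished_at.isoformat(),
    }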
"Thin" vs "Full" Payloads in Hatchet
Hatchet can handle inputs and result data in two primary formats: "thin" and "full" data payloads. Full data payloads include the full model data and all (or most) properties. Conversely, thin data payloads provide only essential identifiers (e.g. GUIDs) and possibly minimal change details.
When a new task executes, Hatchet might dispatch either of the following example payloads:
Full data payload:
{
  "type": "task.executed",
  "timestamp": "2022-11-03T20:26:10.344522Z",
  "data": {
    "id": "1f81eb52-5198-4599-803e-771906343485",
    "type": "task",
    "taskName": "Database Backup",
    "taskStatus": "Completed",
    "executionDetails": "Backup completed successfully at 2022-11-03T20:25:10.344522Z",
    "assignedTo": "John Smith",
    "priority": "High"
  }
}
Thin data payload:
{
  "type": "task.executed",
  "timestamp": "2022-11-03T20:26:10.344522Z",
  "data": {
    "id": "1f81eb52-5198-4599-803e-771906343485"
  }
}
It's feasible to adopt a thin payload strategy while still including frequently used or critical fields, such as "taskName", in the thin payload, balancing necessity and efficiency.
The choice between thin and full data payloads hinges on your specific requirements: full data payloads offer immediate, comprehensive context, reducing the need for subsequent data retrieval. Thin data payloads, however, enhance performance and adaptability, especially in scalable, distributed environments like Hatchet.
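As a sketch of the thin-payload pattern, a step that receives only an identifier can re-hydrate the full record itself. The fetch_task helper below is hypothetical and stands in for a lookup against your own datastore or API.

from hatchet_sdk import Context

def fetch_task(task_id: str) -> dict:
    # hypothetical lookup against your own datastore or API
    return {"id": task_id, "taskName": "Database Backup", "taskStatus": "Completed"}

def handle_task_executed(context: Context) -> dict:
    event = context.workflow_input()
    task_id = event["data"]["id"]  # a thin payload carries only the identifier
    task = fetch_task(task_id)     # re-hydrate the full record only when needed
    return {"taskName": task["taskName"], "taskStatus": task["taskStatus"]}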
Payload Size Considerations
While Hatchet allows payloads up to 4 MB, it's advisable to keep payload sizes small to minimize the processing burden on data consumers. For extensive data requirements, consider referencing data through links or URLs within the payload, allowing consumers to access detailed information only as needed. This approach aligns with efficient data handling and consumer-centric design principles in distributed systems.
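For example, rather than embedding a large artifact in a step's output, a step can upload it elsewhere and return only a reference. The storage URL below is a placeholder:

from hatchet_sdk import Context

def export_report(context: Context) -> dict:
    # Upload the full report to object storage (omitted) and return only a reference,
    # so downstream consumers fetch the detailed data on demand.
    report_url = "https://storage.example.com/reports/1f81eb52.json"
    return {"reportUrl": report_url, "sizeBytes": 10_485_760}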
What's Next: Composing Steps into Workflows
While each step in Hatchet stands on its own, steps can be composed into larger workflows. In the next section, we'll explore how to combine these independent steps into declarative workflow definitions, where Hatchet seamlessly manages their ordering and execution, enabling you to orchestrate complex processes with ease.