I was inspired by this question on the Plotly community forum to talk about sequential callbacks and, more broadly, callback design with or without allow_duplicate.
Keeping stored information in sync across callbacks is also a problem I have run into a few times when developing Dash applications. So I want to discuss a few solutions here, and when to use each of them.
Let’s dive in 🙂
The problem
Imagine you have a dcc.Store("user_data") that stores something like {"first_name": "John", "last_name": "Doe", "age": 50, ...}. This Store is set via multiple inputs: first_name, last_name, age, … It is a large dictionary and you might have multiple callbacks updating it.
To update this user_data from many inputs, you can set allow_duplicate=True. That way, you are not restricted to only one callback for this output. You just need to get the full user_data back each time (as a State), modify it, and return it.
The code would be something like:
from dash import Input, Output, State, callback
from dash.exceptions import PreventUpdate


@callback(
    Output("user_data", "data", allow_duplicate=True),
    Input("first_name", "value"),
    State("user_data", "data"),
    prevent_initial_call=True,
)
def update_first_name(first_name, user_data):
    # Just update user_data with the first name
    if first_name:
        user_data["first_name"] = first_name
        return user_data
    raise PreventUpdate


@callback(
    Output("user_data", "data", allow_duplicate=True),
    Input("last_name", "value"),
    State("user_data", "data"),
    prevent_initial_call=True,
)
def update_last_name(last_name, user_data):
    # Just update user_data with the last name
    if last_name:
        user_data["last_name"] = last_name
        return user_data
    raise PreventUpdate


# ... other callbacks
For the moment, everything is fine. But a problem starts appearing if you trigger all inputs at the same time, e.g. to set initial values:
# A callback to reproduce the desynchronization problem:
# it sets the first name and last name inputs at the same time
@callback(
    Output("first_name", "value"),
    Output("last_name", "value"),
    Input("load_data", "n_clicks"),
)
def load_data(n_clicks):
    if n_clicks:
        return "Mickael", "Jackson"
    raise PreventUpdate
A desynchronization problem appears between the inputs and the user_data memory store: the inputs have the updated values, but the store does not.
You can try it below (or click here):
- If you enter “john” and “doe”, everything gets updated correctly.
- Then click on “load data”. The user_data should contain first_name = “Mickael” and last_name = “Jackson”, but instead a discrepancy appears.
Why this problem appears
Both callbacks run at the same time. If update_first_name is triggered at the same time as update_last_name, the State value of user_data that update_last_name receives does not contain the updated first name. It also works the other way around, which explains why sometimes the first name is missing and sometimes the last name: it depends on which callback is fired first.
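The lost update can be reproduced outside Dash with a few lines of plain Python. This is only a sketch of the behavior described above, not Dash's actual scheduling: the run_in_parallel helper is hypothetical and simply hands every callback the same snapshot of the store, then writes the results back one after another.

```python
import copy


def update_first_name(first_name, user_data):
    user_data["first_name"] = first_name
    return user_data


def update_last_name(last_name, user_data):
    user_data["last_name"] = last_name
    return user_data


def run_in_parallel(store, calls):
    """Hypothetical stand-in for the renderer: every callback gets a copy
    of the store as it was *before* any of them returned."""
    snapshot = copy.deepcopy(store)
    for func, value in calls:
        # Each result overwrites the whole store: the last writer wins
        store = func(value, copy.deepcopy(snapshot))
    return store


store = {"first_name": "John", "last_name": "Doe"}
store = run_in_parallel(
    store,
    [(update_first_name, "Mickael"), (update_last_name, "Jackson")],
)
print(store)  # {'first_name': 'John', 'last_name': 'Jackson'}
```

The new first name is lost because the second result was computed from the stale snapshot and then replaced the whole store.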
Using Input("user_data", "data") instead of State("user_data", "data") might be possible in some cases, but you would end up with some weird behavior due to the circular pattern. If you have 5 or 10 callbacks following the same scheme, your app will basically blow up because of all the callbacks triggered in all directions.
Here is a video example with 4 inputs:
Notice how many callbacks are triggered by a single change in one input (1 HTTP request = 1 callback)! Because of the circular scheme, callbacks get triggered unnecessarily.
The solution could be to ensure that callbacks are executed sequentially: first update_first_name, and then update_last_name with the updated value of user_data. That is what the solution “seems to be”, but the real solution is to think about the problem differently.
But let’s see why.
Solution 1: chaining callbacks
How do we make callbacks sequential?
Chaining is the simplest form of sequential callbacks. In that case, you just need to connect one of the outputs of the first callback as an input for the second callback (A -> o -> B -> o).
But as I said previously, we cannot set user_data as an input for the second callback (because of circular dependencies).
Instead, we can use an intermediary dcc.Store that is only used as a trigger. I think a timestamp is a good fit for this purpose:
import time


@callback(
    Output("user_data", "data", allow_duplicate=True),
    Output("intermediary_timestamp", "data"),
    Input("first_name", "value"),
    State("user_data", "data"),
    prevent_initial_call=True,
)
def update_first_name(first_name, user_data):
    # Update user_data with the first name and emit a trigger timestamp
    if first_name:
        user_data["first_name"] = first_name
        return user_data, time.time()
    raise PreventUpdate


@callback(
    Output("user_data", "data", allow_duplicate=True),
    Input("last_name", "value"),
    Input("intermediary_timestamp", "data"),
    State("user_data", "data"),
    prevent_initial_call=True,
)
def update_last_name(last_name, timestamp, user_data):
    # Just update user_data with the last name
    # (timestamp is only used as a trigger)
    if last_name:
        user_data["last_name"] = last_name
        return user_data
    raise PreventUpdate
In this example, the update_first_name callback returns the user_data and a timestamp. The dash-renderer knows that it has to wait before triggering update_last_name, because intermediary_timestamp must be ready before executing it.
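To make the ordering concrete, here is a plain-Python walk-through of the chained version. It is a simplified model, not the dash-renderer itself: the two steps are run by hand in the order the chain imposes, and the store is an ordinary dictionary.

```python
import time


def update_first_name(first_name, user_data):
    user_data["first_name"] = first_name
    # The timestamp is only a trigger for the next callback
    return user_data, time.time()


def update_last_name(last_name, timestamp, user_data):
    user_data["last_name"] = last_name
    return user_data


store = {"first_name": "John", "last_name": "Doe"}

# Step 1: the first callback runs alone and its result is written to the store
store, timestamp = update_first_name("Mickael", dict(store))

# Step 2: the second callback starts only once the timestamp exists,
# so the state it reads already contains the new first name
store = update_last_name("Jackson", timestamp, dict(store))

print(store)  # {'first_name': 'Mickael', 'last_name': 'Jackson'}
```

Because the second step reads the store after the first step has written to it, no update is lost.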
Pro-tip: we could have used the modified_timestamp property of our user_data dcc.Store component as an input, i.e. replacing Input("intermediary_timestamp", "data") with Input("user_data", "modified_timestamp"). That works too, and we don’t even need the intermediary dcc.Store 🙂
Solution 2: rewriting callbacks
But wait wait wait… Do we really need to chain callbacks?
Often the best option is to handle the processing of the two callbacks inside the same callback.
@callback(
    Output("user_data", "data"),
    Input("first_name", "value"),
    Input("last_name", "value"),
    # .. other possible inputs
    State("user_data", "data"),
)
def update_user_data(first_name, last_name, user_data):
    # Update user_data with every value that is set
    if first_name:
        user_data["first_name"] = first_name
    if last_name:
        user_data["last_name"] = last_name
    # .. other possible values
    return user_data
And that solves the synchronization problem.
However, sometimes you don’t want to use the value of an input if it wasn’t really triggered.
Fortunately, there is a way to filter which input was really triggered, using ctx.triggered_prop_ids (documentation link). Let’s see an example:
import time

from dash import ctx


def slow_processing(user_data):
    # Simulate some long processing, e.g. requesting a database
    time.sleep(4)
    return "Some information"


@callback(
    Output("user_data", "data"),
    Input("first_name", "value"),
    Input("last_name", "value"),
    Input("info_button", "n_clicks"),
    # .. other possible inputs
    State("user_data", "data"),
)
def update_user_data(first_name, last_name, n_clicks, user_data):
    # Identify which inputs were triggered
    first_name_triggered = "first_name" in ctx.triggered_prop_ids.values()
    last_name_triggered = "last_name" in ctx.triggered_prop_ids.values()
    button_triggered = "info_button" in ctx.triggered_prop_ids.values()

    # Update user_data accordingly
    if first_name_triggered and first_name:
        user_data["first_name"] = first_name
    if last_name_triggered and last_name:
        user_data["last_name"] = last_name
    if button_triggered and n_clicks:
        # A piece of information that takes time to retrieve:
        # you want to compute it only if the button was *really* triggered
        info = slow_processing(user_data)
        user_data["info"] = info
    # .. other possible inputs being handled
    return user_data
The good thing is that this solution is scalable: we can easily keep adding inputs and keys to our user_data. And we get rid of allow_duplicate=True, which is useful but also leads to bad callback design. I’ll come back to this later.
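As the number of inputs grows, the per-input if statements can be replaced by a lookup table. The sketch below is one possible way to do it, not something taken from the example app: FIELD_BY_INPUT and apply_triggered are hypothetical names, and inside a real callback triggered_ids would be ctx.triggered_prop_ids.values().

```python
# Hypothetical mapping: component id -> key in user_data
FIELD_BY_INPUT = {
    "first_name": "first_name",
    "last_name": "last_name",
    "age": "age",
}


def apply_triggered(user_data, triggered_ids, values):
    """Copy into user_data only the values whose input was triggered.

    values maps component ids to their current value.
    """
    updated = dict(user_data or {})
    for component_id in triggered_ids:
        key = FIELD_BY_INPUT.get(component_id)
        value = values.get(component_id)
        if key and value:
            updated[key] = value
    return updated


current = {"first_name": "John", "last_name": "Doe"}
values = {"first_name": "Mickael", "last_name": "Jackson", "age": 50}

# Only last_name was triggered, so first_name and age are left alone
print(apply_triggered(current, ["last_name"], values))
# {'first_name': 'John', 'last_name': 'Jackson'}
```

Adding a new field then only means adding one entry to the mapping and one Input to the callback.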
Solution 3: using intermediary stores
Part of the problem here is the use of one memory store for multiple pieces of information.
So instead of trying to store everything inside the same store, we could split the user_data dict into as many stores as required: user_first_name, user_last_name, user_age, user_info, etc.
Then, we would have to merge all this information into one callback:
app.layout = [
    # ...
    dcc.Store(id="user_first_name"),
    dcc.Store(id="user_last_name"),
    dcc.Store(id="user_age"),
    dcc.Store(id="user_info"),
    # ... other values
]


@callback(
    Output("user_first_name", "data"),
    Input("first_name", "value"),
)
def update_first_name(first_name):
    if first_name:
        return first_name
    raise PreventUpdate


@callback(
    Output("user_last_name", "data"),
    Input("last_name", "value"),
)
def update_last_name(last_name):
    if last_name:
        return last_name
    raise PreventUpdate


@callback(
    Output("user_age", "data"),
    Input("age", "value"),
)
def update_age(age):
    if age:
        return age
    raise PreventUpdate


@callback(
    Output("user_info", "data"),
    Input("info_button", "n_clicks"),
    State("user_data", "data"),
)
def update_info(n_clicks, user_data):
    if n_clicks:
        info = slow_processing(user_data)
        return info
    raise PreventUpdate


# ... other callbacks

# then we update the user_data:
@callback(
    Output("user_data", "data"),
    Input("user_first_name", "data"),
    Input("user_last_name", "data"),
    Input("user_age", "data"),
    Input("user_info", "data"),
    prevent_initial_call=True,
)
def update_user_data(first_name, last_name, age, info):
    return {
        "first_name": first_name,
        "last_name": last_name,
        "age": age,
        "info": info,
        # ... other properties and values?
    }
If all callbacks are triggered at the same time, update_user_data will be the last one to run, because all of its input values must be ready before it can run. That’s a good way to make callbacks sequential.
The good thing about this solution is that we can get rid of allow_duplicate=True too. The bad thing is that it scales poorly: we need to add as many stores and callbacks as there are keys to update in user_data.
As you can see, the initial problem was made possible by allow_duplicate=True. If it wasn’t an option, we might have used this solution in the first place. And even if it’s verbose, it’s a good, simple solution. allow_duplicate=True is a handy option in some cases, but it often leads to bad callback design.
Pro-tip: 99% of the time you don’t need allow_duplicate=True. Try to avoid it as much as possible. It can create a callback mess.
Solution 4: using Partial Update
The above solutions rely on the fact that we get user_data as a State.
But what if user_data is very large? Dash callbacks are HTTP requests, so a large input or state results in a large HTTP request. Depending on the user’s bandwidth, it can take time.
Instead, we could use Patch() (documentation link) to only update one key at a time, not the whole user_data dictionary.
Let’s see the code:
from dash import Patch


@callback(
    Output("user_data", "data", allow_duplicate=True),
    Input("first_name", "value"),
    prevent_initial_call=True,
)
def update_first_name(first_name):
    # Just update user_data with the first name
    if first_name:
        user_data = Patch()
        user_data["first_name"] = first_name
        return user_data
    raise PreventUpdate


@callback(
    Output("user_data", "data", allow_duplicate=True),
    Input("last_name", "value"),
    prevent_initial_call=True,
)
def update_last_name(last_name):
    # Just update user_data with the last name
    if last_name:
        user_data = Patch()
        user_data["last_name"] = last_name
        return user_data
    raise PreventUpdate
And that’s it. :-) So, how does it work?
- In the other solutions, Dash sends the whole user_data to be modified in a callback, gets the new full value back, and updates its storage.
- With Patch(), Dash just sends the input and gets back only the piece of information that was modified. It then updates the full object in its storage with this new piece of information.
- Unless two callbacks modify the same piece of information (come on, you want problems), this solution does not require sequential execution.
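The merge step can be pictured with plain dictionaries. This is a simplified model, not Dash's implementation: each callback returns only the key it changed, and a hypothetical apply_patch helper plays the role of the merge into the stored object.

```python
def update_first_name(first_name):
    # Only the modified key is returned, never the full dictionary
    return {"first_name": first_name}


def update_last_name(last_name):
    return {"last_name": last_name}


def apply_patch(store, patch):
    """Hypothetical stand-in for the merge done on the stored object."""
    merged = dict(store)
    merged.update(patch)
    return merged


store = {"first_name": "John", "last_name": "Doe", "age": 50}
patches = [update_first_name("Mickael"), update_last_name("Jackson")]

# The patches touch different keys, so the order does not matter
forward = store
for patch in patches:
    forward = apply_patch(forward, patch)

backward = store
for patch in reversed(patches):
    backward = apply_patch(backward, patch)

print(forward == backward)  # True
print(forward)  # {'first_name': 'Mickael', 'last_name': 'Jackson', 'age': 50}
```

Neither callback ever reads the store, so there is no stale snapshot to overwrite the other callback's change.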
It’s maybe the most powerful solution, especially if user_data is very large. It can be combined with solutions n°1, n°2 and n°3. We still use allow_duplicate, but it is not a problem anymore.
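To get a feel for the difference in payload size, we can compare the JSON size of a full dictionary with the size of a single modified key. This is only a rough order-of-magnitude sketch under made-up assumptions: the 10,000 extra fields are hypothetical, and the real wire format of a Patch is not a plain dictionary like the one used here.

```python
import json

# Hypothetical large store: 10,000 extra keys next to the name fields
user_data = {"first_name": "John", "last_name": "Doe"}
user_data.update({f"field_{i}": "x" * 20 for i in range(10_000)})

# Without Patch, the whole dictionary travels in the request (as State)
# and again in the response (as the new Output value)
full_payload = len(json.dumps(user_data))

# With Patch, only the modified key needs to be described
patch_payload = len(json.dumps({"first_name": "Mickael"}))

print(full_payload > 100 * patch_payload)  # True
```

The larger user_data gets, the wider this gap becomes, since the patch size stays the same.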
Conclusion
So what’s the best solution? As always, it depends on your use case.
But here are the key takeaways:
- If you can, try to avoid using allow_duplicate=True and have one callback update one output.
- If you can, try to merge multiple callbacks into one callback. It’s more efficient, and you will de facto solve the synchronization problem.
- If you have large data, Patch() is probably the best solution.
I hope this article helped you learn about different approaches and how things can get complex with Dash callbacks. Feel free to ask questions or join the discussion here.
Happy coding! ⭐