The dark side of allow_duplicate & making callbacks sequential

I was inspired by this question on the Plotly community forum to talk about sequential callbacks and, more broadly, callback design with or without allow_duplicate.

Keeping stored information in sync when several callbacks update it is also a problem I have run into a few times while developing Dash applications. So here I want to discuss a few solutions and when to use each of them.

Let’s dive in 🙂

The problem

Imagine you have a dcc.Store("user_data") that stores something like {"first_name": "John", "last_name": "Doe", "age": 50, ...}. This Store is set via multiple inputs: first_name, last_name, age, … It is a large dictionary, and you might have multiple callbacks updating it.

To update user_data from many inputs, you can set allow_duplicate=True on the Output. That way, you are not restricted to a single callback for this output. You just need to get the full user_data each time (as a State), modify it, and return it.

The code would be something like:

from dash import callback, Input, Output, State
from dash.exceptions import PreventUpdate

@callback(
    Output("user_data", "data", allow_duplicate=True),
    Input("first_name", "value"),
    State("user_data", "data"),
    prevent_initial_call=True,
)
def update_first_name(first_name, user_data):

    # Just update user_data with the first name
    if first_name: 
        user_data["first_name"] = first_name
        return user_data

    raise PreventUpdate

@callback(
    Output("user_data", "data", allow_duplicate=True),
    Input("last_name", "value"),
    State("user_data", "data"),
    prevent_initial_call=True,
)
def update_last_name(last_name, user_data):

    # Just update user_data with the last name
    if last_name: 
        user_data["last_name"] = last_name
        return user_data

    raise PreventUpdate

# ... other callbacks

For the moment, everything is fine. But a problem starts appearing if you trigger all inputs at the same time, e.g. to set initial values:

# A callback to simulate the desynchronization problem,
# as it sets the first name and last name inputs at the same time
@callback(
    Output("first_name", "value"),
    Output("last_name", "value"),
    Input("load_data", "n_clicks"),
)
def load_data(n_clicks):
    if n_clicks:
        return "Mickael", "Jackson"
    raise PreventUpdate

A desynchronization appears between the inputs and the user_data store: the inputs show the new values, but user_data does not.

You can try it yourself:

  • If you enter “John” and “Doe”, everything is updated correctly.
  • Then click “load data”. user_data should contain first_name = “Mickael” and last_name = “Jackson”, but instead a discrepancy appears.

Why this problem appears

Both callbacks run at the same time. When update_first_name and update_last_name are triggered together, they both receive the same State value of user_data, so the user_data seen by update_last_name does not contain the new first name. The same is true the other way around. Since each callback returns the full dictionary, whichever response is applied last overwrites the other one. That explains why sometimes the first name is missing and sometimes the last name: it depends on which callback’s response comes back last.
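To make the race concrete, here is a plain-Python model of it, with no Dash involved. It is a simplification, not how dash-renderer is implemented: it assumes both callbacks receive the same snapshot of user_data and that each full-dictionary response replaces the store when it arrives. The callback names mirror the ones above; simulate_parallel and simulate_sequential are hypothetical helpers.

```python
import copy


# Same shape as the callbacks above: take the full user_data (the State),
# change one key, and return the full dictionary.
def update_first_name(first_name, user_data):
    user_data = copy.deepcopy(user_data)  # each request works on its own copy
    user_data["first_name"] = first_name
    return user_data


def update_last_name(last_name, user_data):
    user_data = copy.deepcopy(user_data)
    user_data["last_name"] = last_name
    return user_data


def simulate_parallel(store, arrival_order):
    """Both callbacks fire from the same change, so both get the SAME snapshot.
    Each response replaces the whole store, so the last response wins."""
    snapshot = copy.deepcopy(store)
    responses = {
        "first": update_first_name("Mickael", snapshot),
        "last": update_last_name("Jackson", snapshot),
    }
    for name in arrival_order:
        store = responses[name]
    return store


def simulate_sequential(store):
    """What we want: the second callback sees the first callback's result."""
    store = update_first_name("Mickael", store)
    store = update_last_name("Jackson", store)
    return store


initial = {"first_name": "John", "last_name": "Doe"}
print(simulate_parallel(initial, ["first", "last"]))
print(simulate_parallel(initial, ["last", "first"]))
print(simulate_sequential(initial))
```

With arrival order ["first", "last"] the store keeps “John” as the first name; with ["last", "first"] it keeps “Doe” as the last name. Only the sequential version ends with both “Mickael” and “Jackson”.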

And using Input("user_data", "data") instead of State("user_data", "data") might work in some cases, but you would end up with weird behavior due to the circular pattern. If you have 5 or 10 callbacks following the same scheme, your app will basically blow up because of all the callbacks triggered in all directions.

Here is a video example with 4 inputs:

Illustration: 4 inputs with circular callbacks. Each input modification generates a lot of callback requests.

Notice how many callbacks are triggered by a single change in one input (1 HTTP request = 1 callback)! Because of the circular scheme, callbacks get triggered unnecessarily.

One fix seems to be making the callbacks run sequentially: first update_first_name, then update_last_name with the updated value of user_data. That is what the problem “seems” to call for, but the real solution is to think about the problem differently.

But let’s see why.

Solution 1: chaining callbacks

How do we make callbacks sequential?

Chaining is the simplest form of sequential callbacks. In that case, you just need to connect one of the outputs of the first callback as an input for the second callback (A -> o -> B -> o).

But as I said previously, we cannot set user_data as an input for the second callback (because of circular dependencies).

Instead, we can use an intermediary dcc.Store that is only used as a trigger. A timestamp is a good fit for this purpose:

import time

@callback(
    Output("user_data", "data", allow_duplicate=True),
    Output("intermediary_timestamp", "data"),
    Input("first_name", "value"),
    State("user_data", "data"),
    prevent_initial_call=True,
)
def update_first_name(first_name, user_data):

    # Just update user_data with the first name
    if first_name: 
        user_data["first_name"] = first_name
        return user_data, time.time()

    raise PreventUpdate

@callback(
    Output("user_data", "data", allow_duplicate=True),
    Input("last_name", "value"),
    Input("intermediary_timestamp", "data"),
    State("user_data", "data"),
    prevent_initial_call=True,
)
def update_last_name(last_name, timestamp, user_data):

    # Just update user_data with the last name
    if last_name: 
        user_data["last_name"] = last_name
        return user_data

    raise PreventUpdate

In this example, the update_first_name callback will return the user_data and a timestamp. The dash-renderer will know that it has to wait before triggering update_last_name because intermediary_timestamp must be ready before executing it.
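A toy model of why chaining fixes the order: if a callback lists another callback’s output among its inputs, any valid execution order must run that producer first. The sketch below (plain Python 3.9+, using the standard graphlib module) builds that dependency graph for the two callbacks above and runs them in topological order. It is a simplification for illustration only; dash-renderer’s real scheduling is more involved. INPUTS, PRODUCERS and the helper functions are hypothetical.

```python
from graphlib import TopologicalSorter


# Hypothetical stand-ins for the two chained callbacks: they receive the new
# input values and the current store, and return the new store.
def update_first_name(values, user_data):
    return {**user_data, "first_name": values["first_name"]}


def update_last_name(values, user_data):
    return {**user_data, "last_name": values["last_name"]}


CALLBACKS = {"update_first_name": update_first_name, "update_last_name": update_last_name}

# The inputs of each callback, and which callback writes each intermediate prop.
INPUTS = {
    "update_first_name": ["first_name.value"],
    "update_last_name": ["last_name.value", "intermediary_timestamp.data"],
}
PRODUCERS = {"intermediary_timestamp.data": "update_first_name"}


def execution_order(inputs, producers):
    # A callback depends on every callback that produces one of its inputs.
    graph = {
        name: {producers[i] for i in ins if i in producers}
        for name, ins in inputs.items()
    }
    return list(TopologicalSorter(graph).static_order())


def run_chain(values, user_data):
    # Each callback works on the result of the previous one.
    for name in execution_order(INPUTS, PRODUCERS):
        user_data = CALLBACKS[name](values, user_data)
    return user_data


order = execution_order(INPUTS, PRODUCERS)
result = run_chain({"first_name": "Mickael", "last_name": "Jackson"},
                   {"first_name": "John", "last_name": "Doe"})
print(order, result)
```

Here execution_order returns ["update_first_name", "update_last_name"], and run_chain ends with both new names, because update_last_name now works on the dictionary returned by update_first_name.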

Pro-tip: we could have used the modified_timestamp property of our user_data dcc.Store as an input, that is, replacing Input("intermediary_timestamp", "data") with Input("user_data", "modified_timestamp"). That works too, and we don’t even need the intermediary dcc.Store 🙂

Solution 2: rewriting callbacks

But wait wait wait… Do we really need to chain callbacks?
Often, the best option is to do the processing of both callbacks inside a single callback.

@callback(
    Output("user_data", "data"),
    Input("first_name", "value"),
    Input("last_name", "value"),
    # .. other possible inputs
    State("user_data", "data"),
)
def update_user_data(first_name, last_name, user_data):
    # The Store may be empty on the first call
    user_data = user_data or {}

    # Update user_data with all the values at once
    if first_name: 
        user_data["first_name"] = first_name
    if last_name:
        user_data["last_name"] = last_name
    # .. other possible values

    return user_data

And that solves the synchronization problem.

However, you may not want to use an input’s value if that input was not actually triggered.

Fortunately, there is a way to find out which inputs were really triggered, using ctx.triggered_prop_ids (see the Dash documentation). Let’s see an example:

import time

from dash import callback, ctx, Input, Output, State


def slow_processing(user_data):
    # Simulate some long processing, e.g. requesting a database
    time.sleep(4)
    return "Some information"


@callback(
    Output("user_data", "data"),
    Input("first_name", "value"),
    Input("last_name", "value"),
    Input("info_button", "n_clicks"),
    # .. other possible inputs
    State("user_data", "data"),
)
def update_user_data(first_name, last_name, n_clicks, user_data):
    # The Store may be empty on the first call
    user_data = user_data or {}

    # Identify which inputs were triggered
    first_name_triggered = "first_name" in ctx.triggered_prop_ids.values()
    last_name_triggered = "last_name" in ctx.triggered_prop_ids.values()
    button_triggered = "info_button" in ctx.triggered_prop_ids.values()

    # Update user_data accordingly
    if first_name_triggered and first_name: 
        user_data["first_name"] = first_name
    if last_name_triggered and last_name:
        user_data["last_name"] = last_name

    if button_triggered and n_clicks:
        # An information that takes time to retrieve
        # You want to compute it only if button was *really* triggered
        info = slow_processing(user_data)  
        user_data["info"] = info

    # .. other possible inputs being handled

    return user_data

The good thing is that this solution is scalable: we can keep adding inputs and keys to our user_data easily. And we get rid of allow_duplicate=True, which is useful but also leads to “bad callback design”. I’ll come back to this later.
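One way to keep this merged callback easy to test is to move its body into a plain function that receives the triggered ids as an argument. The sketch below assumes the shape of ctx.triggered_prop_ids described in the Dash docs: a dict mapping "component_id.prop" to the component id, e.g. {"first_name.value": "first_name"}. apply_user_updates and fake_slow_processing are hypothetical names, not part of Dash.

```python
def apply_user_updates(triggered_prop_ids, first_name, last_name, n_clicks,
                       user_data, slow_processing):
    """Body of update_user_data, pulled out of the callback.

    triggered_prop_ids is assumed to look like ctx.triggered_prop_ids:
    {"component_id.prop": "component_id", ...}
    """
    triggered_ids = list(triggered_prop_ids.values())
    user_data = dict(user_data or {})  # copy, and tolerate an empty store

    if "first_name" in triggered_ids and first_name:
        user_data["first_name"] = first_name
    if "last_name" in triggered_ids and last_name:
        user_data["last_name"] = last_name
    if "info_button" in triggered_ids and n_clicks:
        # Expensive work only when the button was really clicked
        user_data["info"] = slow_processing(user_data)
    return user_data


calls = []


def fake_slow_processing(user_data):
    # Hypothetical fast stand-in for slow_processing that records its calls
    calls.append(dict(user_data))
    return "Some information"


# Only first_name was triggered: last_name's value is ignored, no slow call
print(apply_user_updates({"first_name.value": "first_name"},
                         "Mickael", "Jackson", 3, {}, fake_slow_processing))
```

Inside the callback, you would then return apply_user_updates(ctx.triggered_prop_ids, first_name, last_name, n_clicks, user_data, slow_processing). Passing slow_processing as a parameter lets a test swap in a fast fake and check that it only runs when info_button was actually triggered.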

Solution 3: using intermediary stores

Part of the problem here is that we use one memory store for multiple pieces of information.

So instead of trying to store everything inside the same store, we could split the user_data dict into as many stores as required: user_first_name, user_last_name, user_age, user_info, etc.

Then, we merge all these stores back into user_data with one callback:


app.layout = [
    # ...
    dcc.Store(id="user_first_name"),
    dcc.Store(id="user_last_name"),
    dcc.Store(id="user_age"),
    dcc.Store(id="user_info"),
    # ... other values
]

@callback(
    Output("user_first_name", "data"),
    Input("first_name", "value")
)
def update_first_name(first_name):
    if first_name: 
        return first_name
    raise PreventUpdate

@callback(
    Output("user_last_name", "data"),
    Input("last_name", "value")
)
def update_last_name(last_name):
    if last_name: 
        return last_name
    raise PreventUpdate

@callback(
    Output("user_age", "data"),
    Input("age", "value")
)
def update_age(age):
    if age: 
        return age
    raise PreventUpdate

@callback(
    Output("user_info", "data"),
    Input("info_button", "n_clicks"),
    State("user_data", "data")
)
def update_info(n_clicks, user_data):
    if n_clicks:
        info = slow_processing(user_data) 
        return info
    raise PreventUpdate

# ... other callbacks 

# then we update the user_data:
@callback(
    Output("user_data", "data"),
    Input("user_first_name", "data"),
    Input("user_last_name", "data"),
    Input("user_age", "data"),
    Input("user_info", "data"),
    prevent_initial_call=True,
)
def update_user_data(first_name, last_name, age, info):
    return {
        "first_name": first_name,
        "last_name": last_name,
        "age": age,
        "info": info,
        # ... other properties and values?
    }

If all callbacks are triggered at the same time, update_user_data will be the last one triggered, and all input values will be filled before it can run. That’s a good way to make callbacks sequential.

The good thing about this solution is that we get rid of allow_duplicate=True too. The bad thing is that it scales poorly: we need as many stores and callbacks as there are keys to update in user_data.

As you can see, the initial problem was only possible because of allow_duplicate=True. Without that option, we might have used this solution in the first place. And even if it is verbose, it is a good, simple solution. allow_duplicate=True is a convenient option in some cases, but it often leads to bad callback design.

Pro-tip: 99% of the time you don’t need allow_duplicate=True. Try to avoid it as much as possible: it can create a callback mess.
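Why splitting the stores removes the race: each small store has exactly one writer, and update_user_data rebuilds user_data from all of them instead of doing a read-modify-write on a shared dictionary. The plain-Python model below (no Dash; merge_stores and merged_after are hypothetical helpers) applies the same updates in every possible order and checks that the merged result never changes.

```python
from itertools import permutations


def merge_stores(stores):
    # Rebuild user_data from the per-key stores, like update_user_data above
    return {
        "first_name": stores.get("user_first_name"),
        "last_name": stores.get("user_last_name"),
        "age": stores.get("user_age"),
        "info": stores.get("user_info"),
    }


# Hypothetical updates that all happen "at the same time"
UPDATES = [("user_first_name", "Mickael"), ("user_last_name", "Jackson"), ("user_age", 50)]


def merged_after(order):
    stores = {}
    for store_id, value in order:
        stores[store_id] = value  # one store, one writer: no read-modify-write
    return merge_stores(stores)


results = [merged_after(order) for order in permutations(UPDATES)]
print(results[0])
```

All six orderings produce {"first_name": "Mickael", "last_name": "Jackson", "age": 50, "info": None}.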

Solution 4: using Partial Update

The solutions above rely on getting user_data as a State.
But what if user_data is very large? Dash callbacks are HTTP requests, so a large input or state results in a large HTTP request. Depending on the user’s bandwidth, this can take time.

Instead, we can use Patch() (see the Partial Property Updates page of the Dash documentation) to update only one key at a time, not the whole user_data dictionary.

Let’s see the code:

from dash import Patch

@callback(
    Output("user_data", "data", allow_duplicate=True),
    Input("first_name", "value"),
    prevent_initial_call=True,
)
def update_first_name(first_name):

    # Just update user_data with the first name
    if first_name: 
        user_data = Patch()
        user_data["first_name"] = first_name
        return user_data

    raise PreventUpdate

@callback(
    Output("user_data", "data", allow_duplicate=True),
    Input("last_name", "value"),
    prevent_initial_call=True,
)
def update_last_name(last_name):

    # Just update user_data with the last name
    if last_name: 
        user_data = Patch()
        user_data["last_name"] = last_name
        return user_data

    raise PreventUpdate

And that’s it 🙂 So, how does it work?

  • In the other solutions, Dash sends the whole user_data to the callback, gets the full new value back, and replaces the stored value with it.
  • With Patch(), Dash sends only the input and gets back only the piece of information that was modified. It then applies that change to the full object in its storage.
  • Unless two callbacks modify the same piece of information (come on, you’re asking for trouble), this solution does not require sequential execution.
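A plain-Python model of the idea behind Patch(), not its actual implementation: each callback returns only the change, and the change is applied to the current value of the store when it arrives. The model only handles top-level key assignment (the real Patch also supports operations such as append and delete), and the large history key is hypothetical data standing in for a big user_data.

```python
import json
from itertools import permutations


def apply_patch(store, patch):
    # Simplified model: only top-level key assignment, like
    # p = Patch(); p["first_name"] = "Mickael"
    updated = dict(store)
    updated.update(patch)
    return updated


# Current value of user_data, with a hypothetical large "history" key
store = {"first_name": "John", "last_name": "Doe", "age": 50,
         "history": list(range(10_000))}
patches = [{"first_name": "Mickael"}, {"last_name": "Jackson"}]


def final_states():
    states = []
    for order in permutations(patches):
        current = store
        for patch in order:  # each patch is applied to the CURRENT value
            current = apply_patch(current, patch)
        states.append(current)
    return states


full_size = len(json.dumps(store))        # rough proxy for what a State sends
patch_size = len(json.dumps(patches[0]))  # rough proxy for what a patch returns
print([(s["first_name"], s["last_name"]) for s in final_states()], full_size, patch_size)
```

Whatever the order of the two patches, both names end up in the store, and the serialized change is far smaller than the full dictionary. The JSON sizes are only a rough proxy for what actually goes over the wire.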

It is perhaps the most powerful solution, especially if user_data is very large. It can be combined with solutions n°1, n°2 and n°3. We still use allow_duplicate, but it is no longer a problem.

Conclusion

So what’s the best solution? As always, it depends on your use case.

But here are the key takeaways:

  • If you can, try to avoid using allow_duplicate=True and have one callback update one output
  • If you can, try to merge multiple callbacks into one callback. It’s more efficient, and you will de facto solve the synchronization problem
  • If you have large data, Patch() is probably the best solution.

I hope this article helped you learn about different approaches and how things can get complex with Dash callbacks. Feel free to ask questions or join the discussion here.

Happy coding! ⭐