WordPress Background Jobs Need a Lock and Persistent State

Why long-running WordPress jobs need an atomic start lock, durable run state with a run ID, and recovery that survives worker crashes without duplicate imports.

Jakub Czechowski

May 22, 2026

WordPress plugin developers often reach for set_transient() when a job must not run twice. Set a key, check it on entry, delete it on exit. The code looks like a lock and may behave like one for months. Then a cron request overlaps with an admin click, two imports start, and stock or pricing data drifts.

The failure comes from treating two problems as one. A long-running job needs atomic start admission so that two requests cannot start it concurrently. It also needs persistent run state so that a later request can see work that continues across Action Scheduler processes. A MySQL advisory lock can provide the first property. A record in wp_options can provide the second. The important part is not merely having both, but changing the persistent state while the atomic lock is held.

Iteration One: A Transient Is Cache State, Not a Lock

The common first attempt looks like this:

if ( get_transient( 'plugin_import_lock' ) ) {
    return;
}

set_transient( 'plugin_import_lock', 1, 2 * HOUR_IN_SECONDS );
start_import();

This is a check-then-act race. Request A and request B can both read “no transient” before either writes the key. Both then continue.

Wrapping those calls in a helper does not change the guarantee. The gap may become smaller, but it still exists. Correctness cannot depend on one PHP request reaching its write a few milliseconds before another.
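The interleaving described above can be replayed deterministically. The sketch below is illustrative only: simulate_interleaved_starts() is a hypothetical function, and a plain array stands in for transient storage so the ordering of reads and writes is explicit.

```php
<?php
/**
 * Replays one unlucky interleaving of two requests that both run the
 * check-then-set pattern. $store stands in for transient storage.
 *
 * @return string[] Names of the requests that went on to start an import.
 */
function simulate_interleaved_starts(): array {
    $store   = [];
    $started = [];

    // Both requests run the check before either one writes.
    $a_sees_lock = isset( $store['plugin_import_lock'] );
    $b_sees_lock = isset( $store['plugin_import_lock'] );

    // Each request then acts on what it saw earlier, not on the current state.
    if ( ! $a_sees_lock ) {
        $store['plugin_import_lock'] = 1;
        $started[]                   = 'A';
    }

    if ( ! $b_sees_lock ) {
        $store['plugin_import_lock'] = 1;
        $started[]                   = 'B';
    }

    return $started;
}
```

Both requests end up in the returned list. Nothing in the pattern ties the read to the write, so narrowing the gap between them only makes this ordering rarer.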

There is a second mismatch: WordPress defines transients as cache entries. They may live in wp_options or an external object cache, and they may disappear before their nominal expiration. That flexibility is useful for cached data. It is the wrong contract for authoritative job state.

Some plugins also delete the transient when the first batch finishes while later batches still run. The lock disappears; the job does not. A second start then looks allowed even though work is still in flight.

Iteration Two: GET_LOCK() Is Atomic but Session-Scoped

A MySQL advisory lock is a meaningful upgrade:

SELECT GET_LOCK('plugin_import_start', 0);

The database serializes access to the lock name. Two sessions cannot hold it at the same time, so it can protect the short section that decides whether a new run may start.

It cannot represent the whole import. MySQL releases a named lock when it is explicitly released or when its session ends. In a normal WordPress request, the start handler acquires the lock, schedules Action Scheduler work, returns a response, and closes the database connection. The batches that continue for the next twenty or sixty minutes run in other processes and sessions.

That behavior is correct for a mutex. The mistake is extending its scope in our mental model from one database session to one logical operation.

A second admin request arriving twenty minutes later will find the advisory lock free. Without persistent state, it can enqueue another import against the same data.

Iteration Three: Lock, Check, Record, Enqueue

The safe sequence is:

1. Acquire the advisory lock.
2. Read the persistent run state after acquiring the lock.
3. Refuse to start if a healthy run is already active.
4. Write a new run record with a unique run ID.
5. Enqueue the first background action with that run ID.
6. Release the advisory lock in a finally block.

The order matters. Checking the run state before GET_LOCK() can be a fast path, but it cannot be the authoritative check. A request may read “idle,” pause, and acquire the lock after another request has already recorded a run. The state must be checked again inside the critical section.

Assuming plugin_run_is_active() returns true only when status is running and heartbeat_at falls within the project’s stale threshold, a compact implementation looks like this:

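function plugin_start_import() {
    global $wpdb;

    // Timeout 0: fail immediately instead of waiting behind a competing start request.
    $lock_name = $wpdb->prefix . 'plugin_import_start';
    $got_lock  = (int) $wpdb->get_var(
        $wpdb->prepare( 'SELECT GET_LOCK(%s, 0)', $lock_name )
    );

    // GET_LOCK() returns 1 on success, 0 on timeout, and NULL on error.
    if ( 1 !== $got_lock ) {
        return new WP_Error( 'import_busy', 'Another start request is in progress.' );
    }

    try {
        // Authoritative check: the state is read while the lock is held.
        $state = get_option( 'plugin_import_state', [] );

        if ( plugin_run_is_active( $state ) ) {
            return new WP_Error( 'import_running', 'Import already running.' );
        }

        // Record the new run before any background work is scheduled.
        $run_id = wp_generate_uuid4();
        $state  = [
            'run_id'       => $run_id,
            'status'       => 'running',
            'started_at'   => time(),
            'heartbeat_at' => time(),
        ];

        // Third argument false: do not autoload this option on every request.
        update_option( 'plugin_import_state', $state, false );

        // Read the state back to confirm this run is the one on record.
        $written = get_option( 'plugin_import_state', [] );
        if ( ( $written['run_id'] ?? '' ) !== $run_id ) {
            return new WP_Error( 'state_write_failed', 'Could not record import state.' );
        }

        // Every batch carries the run ID so it can prove ownership later.
        $action_id = as_enqueue_async_action(
            'plugin_import_batch',
            [ 'run_id' => $run_id ],
            'plugin-import'
        );

        // Roll the state back if nothing was scheduled to do the work.
        if ( 0 === $action_id ) {
            delete_option( 'plugin_import_state' );
            return new WP_Error( 'enqueue_failed', 'Could not enqueue import.' );
        }

        return $run_id;
    } finally {
        // Runs on every return path above, so the lock is never leaked.
        $wpdb->get_var(
            $wpdb->prepare( 'SELECT RELEASE_LOCK(%s)', $lock_name )
        );
    }
}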
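The plugin_run_is_active() helper is assumed rather than shown. A minimal sketch that matches the description above could look like the following. The PLUGIN_IMPORT_STALE_AFTER constant and its 900-second value are hypothetical placeholders, and the optional $now parameter is added here only to make the function testable.

```php
<?php
// Hypothetical threshold in seconds; derive the real value from observed batch durations.
const PLUGIN_IMPORT_STALE_AFTER = 900;

/**
 * Returns true only for a run marked "running" whose heartbeat is recent.
 *
 * @param mixed    $state Value read from the option; may be empty or malformed.
 * @param int|null $now   Current Unix time; defaults to time().
 */
function plugin_run_is_active( $state, ?int $now = null ): bool {
    // Missing, malformed, or non-running state never counts as active.
    if ( ! is_array( $state ) || 'running' !== ( $state['status'] ?? '' ) ) {
        return false;
    }

    $now       = $now ?? time();
    $heartbeat = (int) ( $state['heartbeat_at'] ?? 0 );

    // Active only while the last heartbeat is younger than the threshold.
    return ( $now - $heartbeat ) < PLUGIN_IMPORT_STALE_AFTER;
}
```

With this definition a stale run no longer blocks a new start. A stricter variant could keep returning true for any running status and require an explicit recovery step before a replacement run is allowed.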

The advisory lock has one narrow responsibility: serialize the transition from idle to running. The option has another: describe the logical run after the request and database session have ended. Calling both of them “locks” obscures the distinction.

Action Scheduler also offers a $unique argument on as_enqueue_async_action(). When true, it skips scheduling if another pending or running action shares the same hook and group. That can catch some duplicate enqueue attempts, but it does not track a multi-batch run, expose heartbeat state, or support admin recovery. It is a queue guard, not run ownership.

In production code, make the lock name application-specific and keep it within MySQL’s 64-character limit for lock names. Verify that every contender reaches the same database server; an advisory lock is server-wide, not magically global across independent database primaries. On read/write split setups, read run state from the same connection that holds GET_LOCK(), not from a lagging replica. If options are cached in Redis or Memcached, either bypass the cache for authoritative state or store runs in a dedicated table.
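One way to keep lock names both application-specific and short is to fall back to a hash when the combined name grows too long. The helper below is a sketch: plugin_lock_name() and the 'plugin_' namespace are hypothetical, and the 64-character ceiling assumes MySQL 5.7 or later.

```php
<?php
/**
 * Builds an advisory lock name of at most 64 characters.
 *
 * @param string $table_prefix For example the value of $wpdb->prefix.
 * @param string $purpose      Short label such as 'import_start'.
 */
function plugin_lock_name( string $table_prefix, string $purpose ): string {
    $name = $table_prefix . 'plugin_' . $purpose;

    if ( strlen( $name ) <= 64 ) {
        return $name;
    }

    // sha1() yields 40 hex characters, so the result is always 47 characters long.
    return 'plugin_' . sha1( $name );
}
```

Hashing the full name rather than truncating it keeps two long names with a shared beginning from collapsing into the same lock.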

Persistent State Needs Ownership and Recovery

A boolean such as plugin_import_in_progress = 1 is better than no state, but it is not enough for robust recovery. If a worker dies, the flag may remain forever. If an admin clears it and starts a new run, a delayed worker from the old run may later finish and clear the new run’s state.

A run ID prevents that ownership bug. Every scheduled batch carries the ID, and every state mutation verifies that the stored ID still matches. An old worker may report an error, but it cannot complete or unlock a newer run. Batch handlers should also be idempotent: Action Scheduler may retry a failed batch, and a retry must not apply partial work twice.
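The ownership check can be isolated in a small pure function that every state mutation passes through. This is a sketch under assumptions: plugin_update_owned_state() is a hypothetical name, and the state array uses the run_id and status keys from the start handler.

```php
<?php
/**
 * Applies $changes to $state only if $run_id still owns a running import.
 *
 * @return array|null Updated state, or null when the caller no longer owns the run.
 */
function plugin_update_owned_state( array $state, string $run_id, array $changes ): ?array {
    $owns_run = ( $state['run_id'] ?? '' ) === $run_id
        && 'running' === ( $state['status'] ?? '' );

    if ( ! $owns_run ) {
        return null; // Late worker from an older run: log it, write nothing.
    }

    unset( $changes['run_id'] ); // Ownership never changes through this path.

    return array_merge( $state, $changes );
}
```

A batch handler would read the option, pass in the run ID it was scheduled with, and call update_option() only when the result is not null. The read-modify-write sequence is not atomic by itself, so running it under the same advisory lock as the start handler is one way to close that gap.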

The state should also include enough timing information to distinguish slow work from abandoned work:

started_at records the beginning of the logical operation.
heartbeat_at is refreshed by each successful batch.
status distinguishes running, failed, completed, and possibly cancelling.
An admin recovery action can mark a stale run as failed before allowing a replacement.

Do not silently treat every old timestamp as permission to start again. A batch may be slow rather than dead. Define the stale threshold from observed batch duration, expose the state in the admin UI, and log who forced recovery.
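A forced recovery can be written as an explicit state transition that refuses to act on a run whose heartbeat is still recent. The function below is a sketch: plugin_force_fail_stale_run() and the failed_at and recovered_by keys are hypothetical additions to the state record.

```php
<?php
/**
 * Marks a stale running import as failed and records who forced it.
 *
 * @return array|null New state, or null when recovery is not permitted.
 */
function plugin_force_fail_stale_run( array $state, int $now, int $stale_after, int $user_id ): ?array {
    if ( 'running' !== ( $state['status'] ?? '' ) ) {
        return null; // Nothing to recover.
    }

    $age = $now - (int) ( $state['heartbeat_at'] ?? 0 );

    if ( $age < $stale_after ) {
        return null; // Heartbeat is recent: the run may be slow, not dead.
    }

    // Keep the old run_id so the record still identifies the abandoned run.
    $state['status']       = 'failed';
    $state['failed_at']    = $now;
    $state['recovered_by'] = $user_id;

    return $state;
}
```

An admin action could call this with get_current_user_id() and a threshold derived from observed batch durations, persist the result, and write a log entry. Because the status is no longer running, a late worker that checks for a matching running run before writing will be rejected.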

This is the same principle behind quality gates that enforce a rule instead of merely documenting it. “Do not click twice” is a checklist item. An atomic state transition is an executable constraint.

The Pattern Generalises Beyond WordPress

Any system with two timescales has the same design problem: millisecond-level contention at admission and minute-level visibility throughout execution. A short-lived mutex solves the first. Durable state solves the second.

Redis, a dedicated jobs table, or a queue with uniqueness guarantees may combine these responsibilities differently. The architecture still needs to answer the same questions:

Who owns the current run?
Which operation changes the system from idle to running atomically?
How do later workers prove they belong to that run?
How is abandoned state detected and recovered?

The implementation may use one storage system or several. The guarantees remain separate.

What to Check When Reviewing Plugin Background Jobs

When a WordPress plugin schedules long-running work, treat a get_transient() followed by set_transient() as a concurrency warning. Treat a standalone GET_LOCK() as incomplete if the job outlives the request that acquired it.

Then inspect the transition, not just the ingredients. The persistent state must be read and written while the atomic guard is held. Workers must carry a run identity. Completion and recovery must only modify state they still own.

That is the difference between code that appears to have a lock and a background workflow that remains correct under cron overlap, retries, worker crashes, and impatient admin clicks. The same failure mode shows up in WooCommerce product imports that look successful while stock and prices drift. Encode those conditions before the first batch starts, not in a cleanup script after duplicate data has landed.