Implementing a Task Queue in SQL






Introduction

Software systems often periodically execute collections of similar or identical tasks. Whether it’s computing daily account analytics, or running background computations asynchronously like GitHub Actions, it’s common to structure and complete this computation using a queue of tasks and the concept of a generic “worker” which executes the task. In the naive approach, a single worker sequentially executes each task, avoiding locks or other coordination.

Two downsides to the single worker approach are:

  • Tasks that are ready to execute are delayed until prior tasks complete (head-of-line blocking).
  • Independent tasks aren’t run in parallel (even though they logically could be),
    so the overall wall-clock execution time is longer than necessary.

The alternative to the single worker is to use a pool of workers, each pulling a task from the queue when it is ready to execute one. In exchange for the reduced queueing delay and the reduced overall wall-clock execution time, the programmer must manage the complexity of assigning and executing tasks. Some cloud services, such as Amazon Simple Queue Service (SQS), offer a managed abstraction for queueing tasks and assigning them to workers. However, they can be difficult to debug or impose undesirable properties, such as SQS’s default policy of at least once delivery (rather than exactly once). Lastly, it might be the case that you just don’t want the third-party dependency!

This article describes how to implement a task queue in SQL with the following properties:

  • A task is assigned to exactly 1 worker (or 0 workers) at a time.
  • Once completed, a task is never assigned again.
  • When a task reaches a configured maximum execution time without completing, it will be assigned again to a worker.

Let’s jump in!



Design

A queue provides some elasticity in your system when your message (or, in our case, task) volume varies or is unpredictable. It also allows you to smooth a computational workload that would otherwise be bursty, by using a fixed set of resources to consume the queue and process the messages.

Let’s consider some concrete examples to inform the design of our SQL task queue:

  • Daily Analytics – your analytics application displays usage metrics broken down by day.
    Since you want these analytics available to your users by 8AM local time, you queue up one analytics job for each account every night starting at 12:01AM.

  • Scheduled Reminders – your online game relies on in-app ads and upgrade notifications to drive revenue, but the logic for deciding which content to trigger for the user is dynamic. So, you queue up all the notification content for each account and, in real time, consume and filter the desired content.

  • Real-Time Events – your financial application receives millions of hits to its API throughout the day. In order to keep your endpoints fast, you queue events immediately after receiving them and process them later.



Exactly once delivery

The distinguishing property in any queueing use case is whether or not the task is idempotent. That is, can the same task be executed multiple times without adverse effects? If so, then the queue can deliver the same message to the worker pool more than once, and the internal queueing coordination and locking complexity is reduced. Of course, if the tasks aren’t idempotent, or you simply don’t want to waste the duplicative compute capacity, the worker itself can use a lock to ensure a message is only processed once in an at least once queueing system.

For our use case, we’re interested in exactly once delivery of each message (i.e., exactly once execution of a task). Given that the at least once policy still requires a lock to achieve exactly once behavior, we’ll use the atomicity of SQL writes as the lock in our queue.



Data model

So far, we’ve been alluding to a task as a single data object, but in practice, we only really need the task’s identifier in order to coordinate its lock and owner. Any worker receiving a task identifier for execution could read the task’s information from another table. As such, we could use two tables: one for the task assignment and one for the task information, potentially saving bandwidth on database operations.

For simplicity, we’ll consider a single table with all of the relevant information stored in a details field.
Consider the following data model definition for a Task:

    create table tasks (
        id serial primary key,
        details text,
        status text, -- One of: queued, executing, completed
        operator_id text,
        expiry timestamp,
        created timestamp,
        last_updated timestamp
    );

The lifecycle of a task in our model is the following:

  • a task starts in the queued state when it is written to the table,
  • a task enters the executing state when a worker is assigned the task and it begins execution, and
  • a worker updates a task to the completed state when it finishes execution.
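
As a minimal sketch of the model and the first lifecycle step, the following uses Python’s built-in sqlite3 module with an in-memory database. SQLite is an assumption made so the example runs anywhere: it has no serial type, so integer primary key stands in for it, and timestamps are stored as epoch seconds. The check constraint and the enqueue helper are hypothetical additions for illustration, not part of the design described above.

```python
import sqlite3
import time

SCHEMA = """
create table tasks (
    id integer primary key,          -- stands in for serial
    details text,
    status text not null default 'queued'
        check (status in ('queued', 'executing', 'completed')),
    operator_id text,
    expiry real,                     -- epoch seconds; NULL unless executing
    created real not null,
    last_updated real not null
)
"""


def enqueue(conn, details, now=None):
    """Write a new task in the queued state and return its id."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "insert into tasks (details, created, last_updated) values (?, ?, ?)",
        (details, now, now),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
task_id = enqueue(conn, "compute analytics for account 42")
status = conn.execute(
    "select status from tasks where id = ?", (task_id,)
).fetchone()[0]
```

The status default means a writer cannot accidentally create a task in any state other than queued, which matches the first lifecycle step.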



Implementation

We use a Manager process to assign tasks to workers, and we run two instances of the Manager to improve fault tolerance and maintain availability during deployments. Managers repeatedly query the database for queued tasks and assign them to workers. Additionally, Managers look for tasks that have reached their maximum execution time (i.e., “timed out”), and reassign those tasks to new workers.

A worker can be a local thread on the same machine, or a remote machine that can accept work. The Manager doesn’t really care as long as it can assume a reliable communication channel to the worker. (Obviously, the Manager might perform additional jobs such as checking worker liveness or reporting, but we omit those details here.)



Database queries

The database queries form the backbone of our SQL queue and focus on two areas: identifying unfinished tasks and completing tasks. The Manager (or another process) is responsible for queueing new tasks by writing them to the table. Then, the Manager queries the table periodically to identify tasks that are ready to be executed.



Find executable tasks:

    select
        *
    from tasks t
    where t.status = 'queued' or (t.status = 'executing' and t.expiry <= NOW())

Our Manager randomizes the returned tasks to reduce contention across Manager instances when selecting a task. Then, the Manager attempts to lock a task so that it can assign the task to a worker.
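
The selection step might look like the sketch below, again assuming SQLite through Python’s sqlite3 module, with timestamps as epoch seconds passed in as parameters (SQLite has no NOW()). The find_executable helper and its rng parameter are hypothetical names. The table here carries only the columns this query needs.

```python
import random
import sqlite3

FIND_EXECUTABLE = """
select id, last_updated
from tasks
where status = 'queued' or (status = 'executing' and expiry <= ?)
"""


def find_executable(conn, now, rng=random):
    """Return (id, last_updated) pairs for runnable tasks in random order."""
    rows = conn.execute(FIND_EXECUTABLE, (now,)).fetchall()
    rng.shuffle(rows)  # each Manager instance tries candidates in a different order
    return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tasks (id integer primary key, status text,"
    " expiry real, last_updated real)"
)
conn.executemany(
    "insert into tasks (id, status, expiry, last_updated) values (?, ?, ?, ?)",
    [
        (1, "queued", None, 100.0),
        (2, "executing", 150.0, 110.0),  # timed out when now = 200
        (3, "executing", 500.0, 120.0),  # still within its deadline
        (4, "completed", None, 130.0),
    ],
)
candidates = find_executable(conn, now=200.0)
```

Returning last_updated alongside the id matters: the Manager needs the value it read in order to attempt the lock afterwards.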



Attempt to lock a task

    -- Succeeds only if the row is unchanged since it was read.
    update tasks t
        set status = 'executing',
            operator_id = $operator_id,
            expiry = $expiry,
            last_updated = NOW()
    where t.id = $id and t.last_updated = $last_updated

The Manager calculates the expiry value as the current time plus the maximum amount of time it expects a worker to take to execute the task, plus some buffer. If the query returns with 0 rows updated, the Manager did not obtain the lock. Otherwise, if the query returns with a value of 1, the Manager successfully locked the task and can now assign it to a worker for execution.
If the worker fails to execute the task or takes too long to execute it, the task will be selected again by the first query to find executable tasks.

This code uses the last_updated column as an indicator for whether a row has been written since it was last read. Thus, this optimistic concurrency control mechanism will fail if a row could be written or updated without updating the last_updated column. In general, the resolution of the last_updated timestamp must be finer than the system’s shortest read+write update; otherwise two writes can land on the same timestamp value and the second one goes undetected.
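
To illustrate the optimistic lock, here is a small sketch in which two Managers read the same row and then race to lock it. It assumes SQLite via Python’s sqlite3 module and epoch-second timestamps. The try_lock helper, the manager-a and manager-b identifiers, and the 30-second buffer are hypothetical.

```python
import sqlite3

LOCK_TASK = """
update tasks
   set status = 'executing', operator_id = ?, expiry = ?, last_updated = ?
 where id = ? and last_updated = ?
"""


def try_lock(conn, task_id, seen_last_updated, operator_id, now,
             max_runtime, buffer=30.0):
    """Return True if this caller won the lock, False if the row changed."""
    expiry = now + max_runtime + buffer
    cur = conn.execute(
        LOCK_TASK, (operator_id, expiry, now, task_id, seen_last_updated)
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows updated means another writer got there first


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tasks (id integer primary key, status text, operator_id text,"
    " expiry real, last_updated real)"
)
conn.execute("insert into tasks values (1, 'queued', NULL, NULL, 100.0)")

# Both Managers read last_updated = 100.0 before either one writes.
first = try_lock(conn, 1, 100.0, "manager-a", now=200.0, max_runtime=60.0)
second = try_lock(conn, 1, 100.0, "manager-b", now=200.5, max_runtime=60.0)
row = conn.execute(
    "select status, operator_id, expiry from tasks where id = 1"
).fetchone()
```

The first update changes last_updated, so the second update’s where clause no longer matches and it affects no rows. The database’s atomic single-row write is the only coordination the two Managers need.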

The worker will then read the task from the table by its id and use the details to execute the task. You may wish to have your worker update the task row to specify its own operator_id to aid in debugging once it begins executing the task. Regardless, when the worker completes execution of the task, it updates the row to indicate that it’s complete.



Update a completed task

    -- The status value matches the data model: queued, executing, completed.
    update tasks t
        set status = 'completed',
            operator_id = NULL,
            expiry = NULL,
            last_updated = NOW()
    where t.id = $id and t.last_updated = $last_updated

This approach has workers marking tasks as complete, but if you wanted to consolidate writes to this table within the Manager only, you could have the Manager look for tasks that satisfy some external property, such as the presence of a log file or a row in another table, before marking tasks as complete.
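
A sketch of the completion write, under the same assumptions as before (SQLite via Python’s sqlite3, epoch-second timestamps, a hypothetical complete helper). The same last_updated guard protects against a worker that finishes after its task has already been reassigned.

```python
import sqlite3

COMPLETE_TASK = """
update tasks
   set status = 'completed', operator_id = NULL, expiry = NULL, last_updated = ?
 where id = ? and last_updated = ?
"""


def complete(conn, task_id, seen_last_updated, now):
    """Mark a task completed; False means the row changed since it was read."""
    cur = conn.execute(COMPLETE_TASK, (now, task_id, seen_last_updated))
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tasks (id integer primary key, status text, operator_id text,"
    " expiry real, last_updated real)"
)
conn.execute("insert into tasks values (1, 'executing', 'manager-a', 290.0, 200.0)")

done = complete(conn, 1, seen_last_updated=200.0, now=250.0)
# A second attempt with the stale timestamp matches no rows and changes nothing.
stale = complete(conn, 1, seen_last_updated=200.0, now=260.0)
final = conn.execute(
    "select status, operator_id, expiry, last_updated from tasks"
).fetchone()
```

A False result tells the worker that it no longer owns the task, which is a useful signal to log when debugging timeouts that are set too tightly.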



Discussion

A major benefit to this approach is that it has no external or third-party dependencies. For any system that already relies on a relational database, this approach incurs no additional infrastructure.

A second major advantage of this approach is that it automatically produces a historical record or log of all tasks that were executed. You may wish to trim the tasks table depending on how quickly it grows in size, but broadly speaking, the audit log it provides is very helpful for debugging problems.

Finally, this approach is both scalable as you add more workers, and fault tolerant to worker and Manager failures. With only three database queries, the Manager or worker can fail at any point, and the task will eventually be retried. To provide further protection against a duplicate execution, you could introduce an external lock or state management to track a worker’s progress in executing a task. In a similar vein, you may wish to add a column for retries of a task.
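
One way the retries column could work is sketched below. This is an assumption about how such a column might be used, not something the design above specifies: a hypothetical attempts column is incremented by the lock query, and the lock is refused once a hypothetical max_attempts cap is reached. As before, the sketch uses SQLite through Python’s sqlite3 module.

```python
import sqlite3

LOCK_WITH_ATTEMPTS = """
update tasks
   set status = 'executing', operator_id = ?, expiry = ?, last_updated = ?,
       attempts = attempts + 1
 where id = ? and last_updated = ? and attempts < ?
"""


def try_lock(conn, task_id, seen_last_updated, operator_id, now, expiry,
             max_attempts=3):
    """Lock the task unless it has already used up its attempts."""
    cur = conn.execute(
        LOCK_WITH_ATTEMPTS,
        (operator_id, expiry, now, task_id, seen_last_updated, max_attempts),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tasks (id integer primary key, status text, operator_id text,"
    " expiry real, last_updated real, attempts integer not null default 0)"
)
# A task that has already been attempted twice and has timed out.
conn.execute("insert into tasks values (1, 'executing', 'manager-a', 150.0, 100.0, 2)")

third = try_lock(conn, 1, 100.0, "manager-b", now=200.0, expiry=260.0)
fourth = try_lock(conn, 1, 200.0, "manager-a", now=300.0, expiry=360.0)
attempts = conn.execute("select attempts from tasks where id = 1").fetchone()[0]
```

With a cap like this, the query that finds executable tasks would also need to skip exhausted tasks, or move them to a separate failed status, so that Managers do not keep selecting rows they can never lock.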

The biggest downside or risk to this approach of queueing is that this setup couples the throughput/volume of your system processing to the read/write capacity of your database. If your database is “small” and has an otherwise low volume of writes, but your work queue has high volume/throughput requirements, you may end up in a situation where you have to scale your database solely because of the requirements of the work queue. And, of course, if you don’t proactively monitor the growth of your volume, the contention caused by workers reading and writing in this table could negatively impact the rest of your database operations and your system as a whole.

A second drawback to this approach is that you are managing the complexity yourself. With Amazon SQS you are outsourcing all implementation details to a third party. But with this approach, you need to make sure that the queries and table indices are correct. Similarly, and related to the first downside, a self-built approach like this doesn’t give you the warm-and-fuzzy feeling you might get from the published reliability or throughput guarantees of a managed service.
Still, over time, the operational maturity will increase your confidence.



Conclusion

The simplicity of a SQL queue makes it attractive as an alternative to a managed, third-party queue abstraction. In this article, we presented a minimal implementation of one such SQL-based task queue. If you like this article and want to discuss more, you can reach us at [email protected]. We’d love to hear from you!


