Task Queues and Python RQ

A task queue lets an application avoid waiting for a resource-intensive task to finish. Instead, the task is scheduled (queued) to be processed later: a separate worker process picks it up and runs it in the background. This is very useful when the outcome of the task does not matter in the current context but the task must be executed anyway.

An example of such a scenario is when a user registers on a website and a confirmation email is sent to them. If we send the email during the HTTP request, the user has to wait for the communication with the SMTP server (or an external email service provider) to finish, which may take a long time if those services are under load. Combine this with another task, like sending an SMS, and it gets worse. Now imagine a scenario where, whenever a user posts a new message, an email must be sent to all of their friends. It will take ages if the user has hundreds of friends, and much longer with thousands. Other resource-intensive tasks include generating a report, fetching an HTTP resource, converting a file from one format to another, and image or video processing.

Another benefit and use case for task queues is distributing the load of processing to multiple nodes. Task queues use message queues to schedule tasks. Messages will be picked up by the worker process which listens and consumes messages (tasks) from this message queue. However, there is one rule that should always apply for task queues to be useful: the time to put the task into the queue should be less than the time needed to execute the task.
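The producer/worker pattern described above can be sketched in-process with Python's standard library. This is only an illustration: a real task queue replaces `queue.Queue` with a message broker such as Redis or RabbitMQ so workers can run in other processes or on other machines.

```python
import queue
import threading

# A minimal in-process sketch of the producer/worker pattern.
tasks = queue.Queue()
results = []

def worker():
    while True:
        task = tasks.get()        # blocks until a task is available
        if task is None:          # sentinel value: shut the worker down
            tasks.task_done()
            break
        results.append(task * 2)  # stand-in for real, slow work
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()

for n in range(5):
    tasks.put(n)                  # enqueueing is cheap for the producer
tasks.put(None)
tasks.join()                      # wait until the worker drains the queue
t.join()
print(results)                    # [0, 2, 4, 6, 8]
```

Note that the producer returns from `tasks.put` almost instantly, which is exactly the rule above: enqueueing must be cheaper than executing.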

There are frameworks and libraries for implementing task queues in different languages, and the producer and consumer do not even have to be written in the same language, since message passing decouples the two. Some examples of task queues:

  1. Celery, written in python, “Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.”
  2. Python RQ, also written in python, “RQ (Redis Queue) is a simple Python library for queueing jobs and processing them in the background with workers. It is backed by Redis and it is designed to have a low barrier to entry. It should be integrated in your web stack easily.”
  3. Resque, written in Ruby, backed by Redis.
  4. Beanstalk: written in C, Beanstalk is “a simple, fast work queue, originally designed for reducing the latency of page views in high-volume web applications by running time-consuming tasks asynchronously.”
  5. Backburner: written in ruby, Backburner is “a beanstalkd-powered job queue that can handle a very high volume of jobs.”
  6. Huey: another Python task queue using Redis, with crontab-like periodic execution and retrying of failed tasks.
  7. Taskmaster: “A simple distributed queue designed for handling one-off tasks with large sets of tasks”.

You can always roll your own solution; for example, you can use RabbitMQ and any AMQP client (pika for Python, the RabbitMQ Java client, Bunny for Ruby, php-amqplib for PHP and the .NET client for C#) to implement a task queue. Here is a basic example of how to use RabbitMQ as a task queue from different languages.
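For the Python side, such a RabbitMQ-backed task queue might look like the sketch below using pika. The queue name and connection parameters are assumptions, and a RabbitMQ broker must be running locally for it to work.

```python
def enqueue(body, queue="task_queue"):
    """Publish a task message to RabbitMQ (assumes a local broker)."""
    import pika  # imported here so the sketch reads without pika installed
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue=queue, durable=True)  # survive broker restarts
    channel.basic_publish(
        exchange="",
        routing_key=queue,
        body=body,
        properties=pika.BasicProperties(delivery_mode=2),  # persistent message
    )
    connection.close()

def work(callback, queue="task_queue"):
    """Consume tasks one at a time, acking only after the work is done."""
    import pika
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue=queue, durable=True)
    channel.basic_qos(prefetch_count=1)  # don't hand a busy worker more tasks

    def on_message(ch, method, properties, body):
        callback(body)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue=queue, on_message_callback=on_message)
    channel.start_consuming()
```

A consumer in Ruby or Java would follow the same shape with Bunny or the RabbitMQ Java client, since the wire protocol (AMQP) is language-agnostic.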

Python RQ and django-rq

In this post, we will quickly go through an example of processing a task in the background using Python RQ. Python RQ is very simple to use compared to something like Celery, since “it is designed to have a low barrier to entry”. Django-RQ “is a simple app that allows you to configure your queues in django's settings.py and easily use them in your project.”

First you should install django-rq:

    pip install django-rq

Add it to INSTALLED_APPS in settings.py:

    INSTALLED_APPS = (
        # other apps
        'django_rq',
    )

Configure the queues in settings.py:

    RQ_QUEUES = {
        'default': {
            'HOST': 'localhost',
            'PORT': 6379,
            'DB': 0,
        },
    }

Obviously, you need to install and run Redis to use RQ.

Now suppose we want to push a file to a user's Dropbox folder upon some user action. All we need to do is decorate the function with the @job decorator:

from django_rq import job  
from dropbox import client, session

@job
def push_to_dropbox(file_path, file_name, client_id, app_secret, token, token_secret):  
    # authenticate against Dropbox with the user's stored OAuth credentials
    sess = session.DropboxSession(client_id, app_secret)
    sess.set_token(token, token_secret)
    dclient = client.DropboxClient(sess)
    # upload the local file to the root of the user's Dropbox
    with open(file_path, "rb") as from_file:
        dclient.put_file("/" + file_name, from_file)

Now, in your django view, all you need to do is to schedule the task using .delay:

def push_to_my_dropbox(request):  
    # some stuff
    push_to_dropbox.delay(file_path, file_name, client_id, app_secret, token, token_secret)
    # render template

Once you call .delay, a message is stored in Redis and you can return immediately to the user without making them wait for the Dropbox API call to finish. The message will be consumed and processed by one of the RQ workers. Now where are my workers?!
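The handle that .delay returns is an RQ Job, so the scheduled task can also be inspected later. The helper below is a hypothetical illustration, not part of django-rq itself; the id, is_finished and result attributes come from RQ's Job API.

```python
# Hypothetical helper: summarize the state of an RQ job handle.
def job_summary(job):
    return {
        "id": job.id,                 # unique job id stored in Redis
        "finished": job.is_finished,  # True once a worker has processed it
        "result": job.result,         # return value, None until finished
    }
```

For example, a status view could call this on a job fetched by id to tell the user whether their Dropbox upload has completed yet.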

To start a worker, all we need to do is to run the rqworker custom management command:

    python manage.py rqworker default

Abdulaziz AlMalki
