Faster Reads With Materialized Views
This Blog aims to give you a gist of Views in PostgreSQL and how to integrate in Django.
PostgreSQL Views are quite handy, think of Views as wrapping(abstracting) your complex queries and assigning a name to them.
They are several benifits of using Views, here are few
Views hide the complexity
if you have a query that requires aggregating or joining multiple tables and has complex logic, you can code all that logic into a view and then retreive data from it as if it were a normal table.
Views could be used for security
imagine you have a table with sensitive data in certain columns/rows and you want to give access to a user without exposing that data, views allow us to wrap only the required data and grant permission to that view instead of underlying table.
Like all the SQL databases postgres has two kinds of views
- Views
- Materialized Views
Here are few differences between Views and Materialized Views
Views | Materialized Views |
---|---|
Views are not stored in the disk. | Materiliazed Views are stored on the disk. |
Whenever we read data from views, data is read from the underlying table. | Since materialized views has its own table, reading from materialized views will read from the table directly, but not from the underlying table. |
Let us try to implement Materialized Views. Consider a following scenario of a Blogging site. A typical Blogging site models would contain
- blogs table
- categories table
- tags table
where the relationship between them is quite straightforward, blogs has foreign key to categories and has a many to many relationship with tags. The equivalent Django Models would look like this.
class Tags(models.Model):
name = models.CharField(max_length=200, unique=True)
class Meta:
verbose_name = "Tag"
verbose_name_plural = "Tags"
db_table = "tags"
def __str__(self):
return self.name
class Category(models.Model):
name = models.CharField(max_length=200, unique=True)
class Meta:
verbose_name = "Category"
verbose_name_plural = "Categories"
db_table = "categories"
def __str__(self):
return self.name
class Blog(models.Model):
title = models.CharField(max_length=200, unique=True)
content = models.TextField()
created_at = models.DateTimeField(auto_now=True)
is_published = models.BooleanField(default=True)
category = models.ForeignKey(Category, on_delete=models.SET_NULL, null=True)
tags = models.ManyToManyField(Tags)
class Meta:
verbose_name = "Blog"
verbose_name_plural = "Blogs"
db_table = "blogs"
def __str__(self):
return self.title
Let’s assume a situation where we have to query all the three tables and get data from all the three tables, for us to achieve this we’ll have to join all the tables, the resulting django query would look like this
queryset = Blog.objects.select_related("category").prefetch_related("tags").all()
# the resulting SQL query generated by Django ORM is
SELECT "blogs"."id",
"blogs"."title",
"blogs"."content",
"blogs"."created_at",
"blogs"."is_published",
"blogs"."category_id",
"categories"."id",
"categories"."name"
FROM "blogs"
LEFT OUTER JOIN "categories"
ON ("blogs"."category_id" = "categories"."id")
LIMIT 21
SELECT ("blogs_tags"."blog_id") AS "_prefetch_related_val_blog_id",
"tags"."id",
"tags"."name"
FROM "tags"
INNER JOIN "blogs_tags"
ON ("tags"."id" = "blogs_tags"."tags_id")
WHERE "blogs_tags"."blog_id" IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21)
Notice that Django actually performs two queries here, we can do better by writing our own query :)
SELECT b.id,
b.title AS blog_title,
b.created_at AS blog_created_at,
b.is_published AS blog_is_published,
b.category_id AS blog_category_id,
b.content AS blog_content,
c.id AS category_id,
c.name AS category_name,
bt.blog_id AS blog_tag_blog_id,
count(t.id) AS tag_count,
jsonb_agg(jsonb_build_object('tag_id', t.id, 'tag_name', t.name)) AS tag_json
FROM blogs b
LEFT JOIN categories c ON b.category_id = c.id
LEFT JOIN blogs_tags bt ON b.id = bt.blog_id
LEFT JOIN tags t ON bt.tags_id = t.id
GROUP BY b.id,
c.id,
bt.blog_id
ORDER BY b.id;
As you can see the query is quite complex, we’ll need to run the query every time when we need to access the data, thankfully views provide a convenient way to make it simple.
Let’s see how we can create a Materialized Views
CREATE materialized VIEW mv_blogs_with_categories_and_tags_combined AS
SELECT b.id,
b.title AS blog_title,
b.content AS blog_content,
b.created_at AS blog_created_at,
b.is_published AS blog_is_published,
c.id AS category_id,
c.name AS category_name,
count(t.id) AS tag_count,
jsonb_agg(jsonb_build_object('tag_id', t.id, 'tag_name', t.name)) AS tags
FROM blogs b
LEFT JOIN categories c ON b.category_id = c.id
LEFT JOIN blogs_tags bt ON b.id = bt.blog_id
LEFT JOIN tags t ON bt.tags_id = t.id
GROUP BY b.id,
c.id,
bt.blog_id
ORDER BY b.id WITH NO DATA;
--- create a unique index, this is needed when we try to refresh the materialized view concurrently
CREATE UNIQUE INDEX ON mv_blogs_with_categories_and_tags_combined (id);
--- refresh the materialized view, to populate the data into it.
REFRESH MATERIALIZED VIEW mv_blogs_with_categories_and_tags_combined;
--- refresh the materialized view concurrently, without locking the table
--- when we refresh the view without concurrently , a lock is acquired on the table
--- to prevent this we use concurrently
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_blogs_with_categories_and_tags_combined;
In the below screenshots you can see the execution times, when querying the materialized view table vs performing the complex query.
Execution Time: 25.874 ms
Execution Time: 0.155 ms
Speed Up ~ 160x
All the code is available here in this repo https://github.com/sreevardhanreddi/django-materialized-views
Integration With Django
Now that the SQL is done, let us go through integrating it with Django
Create an empty migrations file using this command
python manage.py makemigrations --name materialized_blogs pg_app --empty
This file that is generated is used to create the migrations using Django’s migrate command and it like show below
# Generated by Django 3.2.6 on 2021-08-12 06:59
from django.db import migrations
from django.db.migrations.operations.special import RunSQL
class Migration(migrations.Migration):
dependencies = [
("pg_app", "0001_initial"),
]
operations = [
migrations.RunSQL(
"""
CREATE materialized VIEW mv_blogs_with_categories_and_tags_combined AS
SELECT b.id,
b.title AS blog_title,
b.content AS blog_content,
b.created_at AS blog_created_at,
b.is_published AS blog_is_published,
c.id AS category_id,
c.name AS category_name,
count(t.id) AS tag_count,
jsonb_agg(jsonb_build_object('tag_id', t.id, 'tag_name', t.name)) AS tags
FROM blogs b
LEFT JOIN categories c ON b.category_id = c.id
LEFT JOIN blogs_tags bt ON b.id = bt.blog_id
LEFT JOIN tags t ON bt.tags_id = t.id
GROUP BY b.id,
c.id,
bt.blog_id
ORDER BY b.id WITH NO DATA;
""",
"""
DROP MATERIALIZED VIEW IF EXISTS mv_blogs_with_categories_and_tags_combined;
""",
),
migrations.RunSQL(
"""
REFRESH MATERIALIZED VIEW mv_blogs_with_categories_and_tags_combined;
""",
"""
REFRESH MATERIALIZED VIEW mv_blogs_with_categories_and_tags_combined;
""",
),
migrations.RunSQL(
"""
CREATE UNIQUE INDEX idx_mv_blogs_with_categories_and_tags_combined_id ON mv_blogs_with_categories_and_tags_combined (id);
""",
"""
DROP INDEX IF EXISTS idx_mv_blogs_with_categories_and_tags_combined_id
""",
),
]
# RunSQL allows you to write code for migrating forwards and backwards
# i.e, applying migrations and unapplying them. here
# the first string in RunSQL is the forward SQL, the second is the reverse SQL
Once you have added the SQL commands, run python manage.py migrate
this will apply the migrations to the database.
After this create model, which reflects the materialized view
class BlogsWithCategoriesAndTagsCombined(models.Model):
id = models.IntegerField(primary_key=True)
blog_title = models.CharField(max_length=200, unique=True)
blog_content = models.TextField()
blog_created_at = models.DateTimeField()
blog_is_published = models.BooleanField()
category_id = models.IntegerField()
category_name = models.CharField(max_length=200)
tag_count = models.IntegerField()
tags = models.JSONField()
class Meta:
managed = False
db_table = "mv_blogs_with_categories_and_tags_combined"
def __str__(self):
return self.blog_title
Notice the meta class here, it has managed = False
, this tells django to not create the table for this model in the database.
Once this is done, querying it is as simple as
queryset = BlogsWithCategoriesAndTagsCombined.objects.all()
Since materialized views rely on the tables for the data, when a table updates, we’ll have to manually refresh the materialized views, this can be done inside a cron job using django-crontab or celery.