Django: Migrations and Data Seeding
In this blog, I’ll be talking about how Django handles database migrations and data seeding. I’ll try to explain both as briefly as I can and focus more on the examples. Well then, let’s get started!
Database Migrations
Migrations are Django’s way of propagating changes you make to your models (adding a field, deleting a model, etc.) into your database schema. They’re designed to be mostly automatic, but you’ll need to know when to make migrations, when to run them, and the common problems you might run into.
The two main commands we’ll use for migrations are:
- makemigrations, which is responsible for creating new migrations based on the changes you have made to your models.
- migrate, which is responsible for applying and unapplying migrations.
You should think of migrations as a version control system for your database schema. makemigrations is responsible for packaging up your model changes into individual migration files (analogous to commits), and migrate is responsible for applying those to your database.
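To make the version-control analogy concrete, here’s a toy model in plain Python of what migrate does conceptually: every migration lists the migrations it depends on, the database remembers which ones are already applied, and migrate applies the rest in dependency order. This is a simplified sketch, not Django’s actual code (the real logic lives in django.db.migrations), and pending_migrations and the 0002_example name are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Toy model only, not Django's implementation. Keys are (app, migration)
# pairs; values are the migrations each one depends on, like the
# `dependencies` list inside a migration file. "0002_example" is a
# made-up name.
GRAPH = {
    ("monitoring", "0001_initial"): set(),
    ("booking", "0001_initial"): {("monitoring", "0001_initial")},
    ("booking", "0002_example"): {("booking", "0001_initial")},
}


def pending_migrations(graph, applied):
    """Return the unapplied migrations, dependencies first."""
    ordered = TopologicalSorter(graph).static_order()
    return [migration for migration in ordered if migration not in applied]


if __name__ == "__main__":
    # Pretend only the monitoring app has been migrated so far.
    already_applied = {("monitoring", "0001_initial")}
    for app, name in pending_migrations(GRAPH, already_applied):
        print(f"Applying {app}.{name}")
```

Running it prints Applying booking.0001_initial followed by Applying booking.0002_example: the booking migrations come after the monitoring migration they depend on, and anything already applied is skipped.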
Whether you are using PostgreSQL, MySQL, SQLite, or another supported database, the steps to migrate your models are the same. For more information about migrations, you can check out Django’s Migrations documentation. Well, enough theory crafting, let’s get straight to an example.
Example of Django Database Migrations
Let’s say that we want to create a Booking model that depends on the User and Proyek models:
# booking/models.py
from django.db import models

from monitoring.models import Proyek
from account.models import User


class Booking(models.Model):
    nama_kegiatan = models.CharField(max_length=30)
    waktu_booking = models.TimeField()
    tanggal_booking = models.DateField()
    proyek = models.ForeignKey(Proyek, on_delete=models.PROTECT)
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    deskripsi = models.TextField(null=True)
Then, we need to run python manage.py makemigrations to tell Django that we’ve made changes to the models. Django will create a migration file in the app’s migrations folder (in this case booking/migrations). The file is named 0001_initial.py since it is the first migration for the booking app. Here are the file contents:
# booking/migrations/0001_initial.py
# Generated by Django 3.1.1 on 2021-06-05 08:18

from django.conf import settings
from django.db import migrations, models
import django.db.models.deletion


class Migration(migrations.Migration):

    initial = True

    dependencies = [
        ('monitoring', '0001_initial'),
        migrations.swappable_dependency(settings.AUTH_USER_MODEL),
    ]

    operations = [
        migrations.CreateModel(
            name='Booking',
            fields=[
                ('id', models.AutoField(auto_created=True, ..)),
                ('nama_kegiatan', models.CharField(max_length=30)),
                ('waktu_booking', models.TimeField()),
                ('tanggal_booking', models.DateField()),
                ('deskripsi', models.TextField(null=True)),
                ('proyek', models.ForeignKey(...)),
                ('user', models.ForeignKey(...)),
            ],
        ),
    ]
After makemigrations, we can tell Django to apply every migration that hasn’t been applied to the database yet. We do this with the migrate command:
python manage.py migrate
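How does migrate know what’s already applied? It records every applied migration as a row in a django_migrations table in your database. python manage.py showmigrations is the usual way to inspect that state, but as a minimal sketch, assuming the default SQLite backend with its db.sqlite3 file, you can also read the table with Python’s built-in sqlite3 module. Both helper functions are hypothetical.

```python
import sqlite3
from contextlib import closing


def applied_migrations(conn, app=None):
    """Return (app, name) pairs from Django's django_migrations table, oldest first."""
    query = "SELECT app, name FROM django_migrations"
    params = ()
    if app is not None:
        query += " WHERE app = ?"
        params = (app,)
    query += " ORDER BY id"  # ids grow as migrations are applied
    return conn.execute(query, params).fetchall()


def print_applied(db_path="db.sqlite3", app=None):
    """Print applied migrations; db.sqlite3 is the default name in a new project's settings."""
    with closing(sqlite3.connect(db_path)) as conn:
        for app_label, name in applied_migrations(conn, app):
            print(app_label, name)
```

For example, calling print_applied(app="booking") from the project root would list the booking migrations in the order they were applied. Passing the connection into applied_migrations keeps the query logic independent of where the database lives.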
What if you want to change something in the model? Let’s say we want to change the max_length of nama_kegiatan from 30 to 50:
# booking/models.py
...

class Booking(models.Model):
    nama_kegiatan = models.CharField(max_length=50)
    ...
Now, if we run makemigrations again, Django will create another migration file with the prefix 0002_:
# booking/migrations/0002_....py
# Generated by Django 3.1.1 on 2021-06-07 16:54

from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ('booking', '0001_initial'),
    ]

    operations = [
        migrations.AlterField(
            model_name='booking',
            name='nama_kegiatan',
            field=models.CharField(max_length=50),
        ),
    ]
Instead of overwriting the initial migration file, Django creates a new one that depends on 0001_initial. Notice that because we changed an argument of an existing field (in this case max_length), the new migration uses the AlterField operation instead of the CreateModel operation used in 0001_initial.py. This works much like Git: the chain of migration files records how the model was initially created and every change we’ve made to it since.
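Where does AlterField come from? makemigrations compares the model state rebuilt from your existing migration files with your current models.py and emits an operation for each difference. The toy function below illustrates that comparison with plain dictionaries. It is a simplified sketch, not Django’s autodetector (django.db.migrations.autodetector), and the snapshot format is an assumption made for this example.

```python
def diff_models(old, new):
    """Compare {model: {field: definition}} snapshots and list the operations needed.

    A toy stand-in for Django's autodetector: it only knows about
    CreateModel, DeleteModel, AddField, RemoveField and AlterField.
    """
    operations = []
    for model in new.keys() - old.keys():
        operations.append(("CreateModel", model))
    for model in old.keys() - new.keys():
        operations.append(("DeleteModel", model))
    for model in new.keys() & old.keys():
        old_fields, new_fields = old[model], new[model]
        for field in new_fields.keys() - old_fields.keys():
            operations.append(("AddField", model, field))
        for field in old_fields.keys() - new_fields.keys():
            operations.append(("RemoveField", model, field))
        for field in new_fields.keys() & old_fields.keys():
            if new_fields[field] != old_fields[field]:
                operations.append(("AlterField", model, field))
    return sorted(operations)  # sets are unordered, so sort for stable output


# The max_length change from this section, as two snapshots.
before = {"booking": {"nama_kegiatan": ("CharField", {"max_length": 30})}}
after = {"booking": {"nama_kegiatan": ("CharField", {"max_length": 50})}}
print(diff_models({}, after))      # [('CreateModel', 'booking')]
print(diff_models(before, after))  # [('AlterField', 'booking', 'nama_kegiatan')]
```

An empty "before" snapshot yields CreateModel, as in 0001_initial.py, while changing one argument of an existing field yields AlterField, as in the 0002 migration.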
Data Seeding
What is data seeding?
Data seeding is the act of pre-populating your database with hard-coded data. The process is usually automated and executed during the initial setup of an application or whenever the database is completely empty. Data seeding is really handy when you have multiple models and migrate them frequently. It’s also handy when developing in a team, and it helps with testing too.
Why do we need it?
Every team member has a different local database. Now, imagine that your feature depends on a feature that another team member implemented: you would need to create instances of that feature’s models manually. If a model only has one field, that’s no problem. But in most cases, models have many fields, so creating even one instance by hand takes a lot of effort, especially when you’re testing your feature. This is where data seeding comes in!
Fixtures
Before we go to the example, there’s one term I need to explain: fixture. Fixtures are collections of data that Django can read and load into its database. Fixtures can also be created from data that already exists in the database. So, in essence, fixtures are a way for Django to export data from and import data into the database. Fixtures can be written in JSON, XML, or YAML (YAML requires PyYAML to be installed). Here’s an example of a JSON fixture file, taken from the official documentation:
[
    {
        "model": "myapp.person",
        "pk": 1,
        "fields": {
            "first_name": "John",
            "last_name": "Lennon"
        }
    },
    {
        "model": "myapp.person",
        "pk": 2,
        "fields": {
            "first_name": "Paul",
            "last_name": "McCartney"
        }
    }
]
A JSON fixture file is an array of JSON objects. Each object has three keys:
- model identifies the model as app_label.model_name (for example, myapp.person)
- pk is the value of the object’s primary key
- fields maps each field name to its value
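Because a JSON fixture is plain JSON with this shape, you can sanity-check one with Python’s json module before running loaddata. A minimal sketch; check_fixture is a hypothetical helper that only checks the three keys described above, not whether the models or field names actually exist.

```python
import json


def check_fixture(text):
    """Return a list of problems found in a JSON fixture string (empty if it looks OK)."""
    try:
        entries = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(entries, list):
        return ["top level must be a JSON array"]

    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not an object")
            continue
        missing = {"model", "pk", "fields"} - entry.keys()
        if missing:
            problems.append(f"entry {i}: missing {sorted(missing)}")
        model = entry.get("model", "")
        if not isinstance(model, str) or model.count(".") != 1:
            problems.append(f"entry {i}: model should look like 'app_label.model_name'")
        if "fields" in entry and not isinstance(entry["fields"], dict):
            problems.append(f"entry {i}: fields must be an object")
    return problems
```

For example, a fixture with a trailing comma after its last object is reported as invalid JSON, because JSON (unlike Python) doesn’t allow trailing commas.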
Enough theory crafting, let’s get straight to the example.
Example of data seeding in our project
According to Django’s Providing initial data for models documentation, we can seed data into the database by using the command python manage.py loaddata <fixture-file>.
So here’s an example of a fixture file in my project, seed/0001_User.json:
[
    {
        "model": "account.user",
        "pk": 1,
        "fields": {
            "email": "user1@user.com"
        }
    }
]
You can also create more user instances from the Django shell. Here is an example:
python manage.py shell
>>> from account.models import User
>>> User(pk=2, email="user2@user.com").save()
Then, you can simply use the dumpdata command to update the fixture file. Here is an example:
python manage.py dumpdata account.user --indent 4 > seed/0001_User.json
The seed/0001_User.json file generated by the dumpdata command will look like this:
[
{
    "model": "account.user",
    "pk": 1,
    "fields": {
        "email": "user1@user.com"
    }
},
{
    "model": "account.user",
    "pk": 2,
    "fields": {
        "email": "user2@user.com"
    }
}
]
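Writing entries by hand gets tedious as a fixture grows. As an alternative to the shell-plus-dumpdata round trip, a short script using the standard json module can generate the same structure directly. This is a sketch under assumptions: build_user_fixture and write_fixture are hypothetical helpers, the userN@user.com pattern is made up, and it assumes account.user only needs the email field, as in the example above.

```python
import json


def build_user_fixture(count, start_pk=1):
    """Build account.user fixture entries with sequential pks and emails."""
    return [
        {
            "model": "account.user",
            "pk": pk,
            "fields": {"email": f"user{pk}@user.com"},
        }
        for pk in range(start_pk, start_pk + count)
    ]


def write_fixture(entries, path):
    """Write entries to path as indented JSON; json.dump never emits trailing commas."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=4)
        f.write("\n")
```

Calling write_fixture(build_user_fixture(2), "seed/0001_User.json") produces the same two users as the dumpdata output above. The indentation differs slightly from dumpdata’s, but loaddata only needs valid JSON.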
Then, whenever we reset the database, we can simply run:
python manage.py loaddata seed/0001_User.json
Django will create the instances in the database for us, so we don’t have to create them manually. We can also load all fixture files at once, for example:
python manage.py loaddata seed/*.json
In the example above, the user model only has one field. Problems arise when a model needs many fields to be filled in before an instance can be created. Tools exist to help with this, and the one we’re going to use is django_seed.
django_seed
In short, django_seed creates instances of the models in the app you specify, filled with fake data. You can then create a fixture file from them with the dumpdata command. You can check out django_seed’s guide in its documentation. For installation, simply run pip install django-seed in your virtual environment (or globally). Then, add django_seed to INSTALLED_APPS in your settings.py.
How to create a fixture file using django_seed
First, we use django_seed to create instances in the database. Let’s say we want to make two Proyek instances. Since the Proyek model is in the monitoring app, we use the command below (note that it seeds two instances of every model in the monitoring app):
python manage.py seed monitoring --number 2
Then run dumpdata to make a fixture of the Proyek instances:
python manage.py dumpdata monitoring.proyek --indent 4 > seed/0002_Project.json
The data seeded by django_seed is random, so it’s different every time. That’s why we dump the seeded data into a fixture: every other developer then gets the same data as we do. Even though django_seed makes data seeding easier, most of the data it generates doesn’t resemble real data. For example, the status field of each Proyek is supposed to be one of berjalan, selesai, or belum dimulai, whereas django_seed fills it with a random string. This is why I recommend django_seed only when you don’t need the data to resemble real data.
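If you do need realistic values, one option is to generate the fixture yourself and draw each value only from the allowed set. The sketch below does that for the status field. proyek_fixture is a hypothetical helper; monitoring.proyek and the three status values come from this post, while any other fields Proyek requires are unknown here and would still have to be added before running loaddata.

```python
import json
import random

# Allowed values for Proyek.status, as described above.
STATUSES = ["berjalan", "selesai", "belum dimulai"]


def proyek_fixture(count, seed=0):
    """Return monitoring.proyek fixture entries whose status is always a valid value.

    Only the status field is filled in; other required Proyek fields are
    not known here and must be added by hand.
    """
    rng = random.Random(seed)  # fixed seed, so re-running gives the same data
    return [
        {
            "model": "monitoring.proyek",
            "pk": pk,
            "fields": {"status": rng.choice(STATUSES)},
        }
        for pk in range(1, count + 1)
    ]


print(json.dumps(proyek_fixture(2), indent=4))
```

Seeding the random generator keeps the output reproducible, but committing the generated file to seed/ is still the simplest way to make sure the whole team loads identical data.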
Automatic Seeding after Migration
Every time I migrate, I usually destroy the previous database to avoid inconsistent data. So, I made a simple bash script that runs migrate and then seeds all the data in the seed/ directory:
#!/bin/bash
# migrate_and_load.sh
set -e  # stop at the first failing command, so data is never loaded after a failed migrate
echo "Migrating..."
python manage.py migrate
echo "Load data..."
python manage.py loaddata seed/*.json
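The script relies on bash expanding seed/*.json into a list of files. If some teammates work on Windows without bash, a rough Python equivalent can expand the pattern itself with glob and stop at the first failing command. This is a sketch under the same assumptions as the script (it runs from the directory containing manage.py, and fixtures live in seed/); build_commands and main are hypothetical helpers.

```python
import glob
import os
import subprocess
import sys


def build_commands(seed_dir="seed"):
    """Return the manage.py commands the bash script runs, with the glob expanded."""
    fixtures = sorted(glob.glob(os.path.join(seed_dir, "*.json")))
    commands = [[sys.executable, "manage.py", "migrate"]]
    if fixtures:  # loaddata needs at least one fixture, so skip it if seed/ is empty
        commands.append([sys.executable, "manage.py", "loaddata", *fixtures])
    return commands


def main():
    """Run each command in order; check=True raises on the first non-zero exit code."""
    for command in build_commands():
        print("Running:", " ".join(command[1:]))
        subprocess.run(command, check=True)
```

Saved next to manage.py (for example as a hypothetical migrate_and_load.py ending with a call to main()), it runs the same two steps as the bash script. Using sys.executable runs manage.py with the same interpreter as the script, so it picks up your virtual environment.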
Final Thoughts
Learning Django migrations is important, since I’m sure some of us use them without knowing what they actually do. Migrations are also closely related to data seeding, and the examples above show how useful seeding is, especially when developing in a team. In cases where the models aren’t too complex or each model has high coupling, I personally think seeding isn’t needed. Nonetheless, migrations and data seeding are both useful knowledge and provide benefits when used correctly.