Migrating data into your Django project

A successfully applied technique to migrate data into a Django project

Sometimes we have an existing legacy database and need to migrate its data into our Django application. In this post I’ll share a technique that we applied successfully for this.

On a big project, our client had an existing application backed by a MySQL DB. Our objective was to develop a new, more modern and feature-rich version of his tool based on Django 1.5. At a certain stage of development, the client asked us to migrate some of the current users’ data into the new system so we could move to a beta-testing phase.

The method we applied not only let us migrate dozens of users to the new system, but also let us keep running migrations as the application continued to evolve.

General description

We based our work on two powerful Django features:

  1. Multiple databases and
  2. Integrating Django with a legacy database

So, the general procedure would be:

  1. Add the legacy database to your project.

  2. Create a legacy app.
    • Automatically generate the models.
    • Set up a DB router.

  3. Write your migration script.

Let’s describe each step in more detail:

1. A legacy database

We assume here that you have access to the legacy DB. In our particular case, before each migration our client gave us a MySQL dump of the legacy DB, so every time we created a fresh legacydb database on our own DB server and imported the dump into it.

However, it doesn’t matter how you access the legacy DB, as long as Django can reach it. Following the Multiple databases approach, edit the project’s settings.py and add the legacy database, for example:

DATABASES = {
    'default': {
        'NAME': 'projectdb',
        'ENGINE': 'django.db.backends.mysql',
        'USER': 'some_user',
        'PASSWORD': '123',
    },
    'legacy': {
        'NAME': 'legacydb',
        'ENGINE': 'django.db.backends.mysql',
        'USER': 'other_user',
        'PASSWORD': '456',
    },
}

Depending on your migration goals, these settings can go either in the project’s standard settings.py file or in a separate settings file used only when running occasional migrations.
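One way to support both options is to build DATABASES in a small function and add the 'legacy' alias only when its connection details are present in the environment, which also keeps credentials out of the repository. This is a minimal sketch, not the setup we used: the build_databases helper and the DEFAULT_DB_* / LEGACY_DB_* environment variable names are hypothetical, and it assumes the same MySQL backend as above.

```python
import os

# Sketch for settings.py. build_databases() and the *_DB_* environment
# variable names are hypothetical, chosen only for illustration.
MYSQL = 'django.db.backends.mysql'


def build_databases(environ=os.environ):
    """Return a DATABASES dict, adding 'legacy' only if LEGACY_DB_NAME is set."""
    databases = {
        'default': {
            'ENGINE': MYSQL,
            'NAME': environ.get('DEFAULT_DB_NAME', 'projectdb'),
            'USER': environ.get('DEFAULT_DB_USER', 'some_user'),
            'PASSWORD': environ.get('DEFAULT_DB_PASSWORD', ''),
        },
    }
    legacy_name = environ.get('LEGACY_DB_NAME')
    if legacy_name:
        # Only configured while running a migration against a fresh dump.
        databases['legacy'] = {
            'ENGINE': MYSQL,
            'NAME': legacy_name,
            'USER': environ.get('LEGACY_DB_USER', ''),
            'PASSWORD': environ.get('LEGACY_DB_PASSWORD', ''),
        }
    return databases


DATABASES = build_databases()
```

With this in place, a command such as LEGACY_DB_NAME=legacydb python manage.py shell would start a session that sees both databases, while everyday runs only see 'default'. Keep in mind that the legacy models can only be queried when the 'legacy' alias exists.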

2. A legacy app

The general idea here is to start a new app that represents your legacy data. All the work (other than the settings) happens within this app, so you can keep it in a separate branch, isolating the migration feature while development continues normally.

inspectdb

Now, the key to this step is to follow the Integrating Django with a legacy database document: the inspectdb management command can generate the models.py file automatically!

$ mkdir apps/legacy
$ python manage.py startapp legacy apps/legacy/
$ python manage.py inspectdb --database=legacy > apps/legacy/models.py

However, as the documentation says:

This feature is meant as a shortcut, not as definitive model generation. After you run it, you’ll want to look over the generated models yourself to make customizations.

In our particular case, it worked like a charm and only cosmetic modifications were needed!

Database router

Next, provide a database router: Django’s mechanism for deciding which database each query or save on a given model should use.

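Django’s default routing scheme ensures that if a database isn’t specified, all queries fall back to the default database. In our case, we make sure that models from the legacy app are read from their own DB and that syncdb never touches it. A router can only state preferences, not block writes, so the most reliable way to keep the legacy DB read-only is to connect to it with a read-only DB user. An example router: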

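# apps/legacy/router.py
#
# Route all read operations on legacy models to the 'legacy' DB and keep
# syncdb away from it. Returning None means "no opinion": Django then asks
# the next router or falls back to its default choice.


class LegacyRouter(object):

    def db_for_read(self, model, **hints):
        """Read models of the 'legacy' app from the 'legacy' DB."""
        if model._meta.app_label == 'legacy':
            return 'legacy'
        return None

    def db_for_write(self, model, **hints):
        """Express no preference for writes.

        A router cannot veto a write: any falsy value only means "no
        opinion", and saving an instance read from 'legacy' falls back to
        that instance's own DB. Use a read-only DB user for 'legacy'.
        """
        return None

    def allow_relation(self, obj1, obj2, **hints):
        """Forbid relations between legacy and non-legacy objects."""
        obj1_is_legacy = (obj1._meta.app_label == 'legacy')
        obj2_is_legacy = (obj2._meta.app_label == 'legacy')
        return obj1_is_legacy == obj2_is_legacy

    def allow_syncdb(self, db, model):
        """Never create tables in 'legacy', nor legacy tables anywhere.

        (Django 1.7 replaced allow_syncdb with allow_migrate.)
        """
        return db != 'legacy' and model._meta.app_label != 'legacy'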

Finally, to use the router you’ll need to add it to your settings.py file.

DATABASE_ROUTERS = ['apps.legacy.router.LegacyRouter']

Now you are ready to access your legacy data using Django’s ORM. Open the shell, import your legacy models and play around!

For a more detailed example of this technique applied, check this other blog post. It is based on Django 1.3 but still useful.

3. Your migration script

At this point you can reach the legacy data through Django’s ORM, so it is time to write the actual migration script. There is no magic and little automation here: you know your data model and (hopefully) the legacy DB structure, and it is up to you to create your system’s model instances and their relations.
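One way to keep that work manageable is to separate the translation of each legacy record from the ORM calls that save it, so the translation can be unit-tested like any other code. The sketch below assumes a hypothetical legacy user table with login, email, fullname and status columns; those column names, the map_legacy_user helper and the 'A' = active convention are illustrative, while the returned keys match fields of Django’s built-in django.contrib.auth User model.

```python
def map_legacy_user(row):
    """Translate one legacy user row (a dict) into User field values.

    The legacy column names ('login', 'email', 'fullname', 'status') and the
    'A' = active status code are hypothetical.
    """
    full_name = (row.get('fullname') or '').strip()
    # Split on the first space: 'Jane Mary Doe' -> 'Jane', 'Mary Doe'
    first_name, _, last_name = full_name.partition(' ')
    return {
        'username': row['login'].strip(),
        'email': (row.get('email') or '').strip(),
        'first_name': first_name,
        'last_name': last_name.strip(),
        'is_active': row.get('status') == 'A',
    }
```

In the migration script, each row of a values() queryset on the (hypothetical) inspectdb-generated LegacyUser model could then become User(**map_legacy_user(row)), followed by set_unusable_password() and save(). Because the mapping never touches the database, odd legacy values (empty names, NULL emails) can be covered by ordinary unit tests.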

In our case, we wrote an export.py script that we run manually from the command line whenever needed.

It’s a really good idea to perform the whole migration inside a single transaction. Otherwise, any error while the script runs leaves you with a partial (and possibly inconsistent) migration, and you would need complex logic to resume it. In Django 1.5, the @transaction.commit_on_success decorator achieves the desired effect (Django 1.6 and later provide transaction.atomic instead). As a helpful side effect, a single commit is also faster.
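The all-or-nothing behaviour is easy to see in isolation. The sketch below deliberately swaps the Django ORM for Python’s built-in sqlite3 module so it runs without a project: using the connection as a context manager commits when the block succeeds and rolls back when it raises, which is the role @transaction.commit_on_success (transaction.atomic from Django 1.6 on) plays around the real export script. The users table and the migrate_rows helper are hypothetical.

```python
import sqlite3


def migrate_rows(conn, rows):
    """Insert every row or none of them.

    `with conn:` commits on success and rolls back if any statement raises,
    so a failure halfway through leaves no partial migration behind.
    """
    with conn:
        for row in rows:
            conn.execute('INSERT INTO users (email) VALUES (?)', (row['email'],))


if __name__ == '__main__':
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE users (email TEXT NOT NULL UNIQUE)')

    migrate_rows(conn, [{'email': 'a@example.com'}, {'email': 'b@example.com'}])

    try:
        # The duplicate makes the third INSERT fail...
        migrate_rows(conn, [{'email': 'c@example.com'},
                            {'email': 'd@example.com'},
                            {'email': 'a@example.com'}])
    except sqlite3.IntegrityError:
        pass

    # ...so 'c' and 'd' are rolled back along with it.
    print(conn.execute('SELECT COUNT(*) FROM users').fetchone()[0])  # prints 2
```

Rerunning the script after fixing the offending record then starts from a clean state, with no resume logic needed.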

Conclusions

As a general data-migration technique for Django applications, it has several advantages:

  • it can migrate large amounts of data,
  • it works with immature or changing data models,
  • it relies on standard Django features (ORM, Multiple databases),
  • the project’s testing infrastructure can be used normally,
  • it can be used for one-off migration scripts as well as for continuous migrations,
  • it can be applied to multiple, heterogeneous data sources.
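On the last point, the single-app LegacyRouter generalizes naturally when there are several legacy sources, each inspected into its own app and declared as its own DATABASES alias. The sketch below assumes hypothetical app labels ('legacy', 'legacy_crm') that share their names with their aliases, and keeps the Django 1.5 allow_syncdb hook (Django 1.7 replaced it with allow_migrate). Since it only reads _meta.app_label, its decisions can be checked with simple stand-ins instead of a database.

```python
from types import SimpleNamespace


class MultiLegacyRouter(object):
    """Send reads of each legacy app to its own alias; keep syncdb off them.

    The app labels and aliases in LEGACY_APPS are hypothetical examples.
    """
    LEGACY_APPS = {
        'legacy': 'legacy',          # app label -> DATABASES alias
        'legacy_crm': 'legacy_crm',
    }

    def _alias_for(self, model_or_obj):
        # Model classes and instances both expose _meta.app_label.
        return self.LEGACY_APPS.get(model_or_obj._meta.app_label)

    def db_for_read(self, model, **hints):
        return self._alias_for(model)  # None means "no opinion"

    def db_for_write(self, model, **hints):
        return None  # routers cannot veto writes; use read-only DB users

    def allow_relation(self, obj1, obj2, **hints):
        # Only relate objects from the same source (or both non-legacy).
        return self._alias_for(obj1) == self._alias_for(obj2)

    def allow_syncdb(self, db, model):
        # Never create tables in a legacy DB, nor legacy tables anywhere.
        legacy_aliases = set(self.LEGACY_APPS.values())
        return db not in legacy_aliases and self._alias_for(model) is None


def fake_model(app_label):
    """Stand-in exposing only _meta.app_label, enough to exercise the router."""
    return SimpleNamespace(_meta=SimpleNamespace(app_label=app_label))


if __name__ == '__main__':
    router = MultiLegacyRouter()
    print(router.db_for_read(fake_model('legacy_crm')))  # legacy_crm
    print(router.db_for_read(fake_model('accounts')))    # None
    print(router.allow_relation(fake_model('legacy'), fake_model('accounts')))  # False
```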

On the other hand, as usual, it is no silver bullet. One of the main problems is that the complexity of the task grows with the difference between the two DB models: since the actual data manipulation must be programmed by hand, very different data models can mean a lot of work.

So, as stated at the beginning of the post, the method allowed us to successfully migrate a considerable amount of data into our system while adapting to changes as the application continued its development.

