Zero Downtime Migrations With Rails and Mysql 5.6+

Sat Apr 18, 2020
~700 Words

As the size of your tables increases the risk of downtime also increases as MySQL will by default apply a full table lock for many typical DDL operations.

If you find yourself in the position where adding a column to a table will cause downtime for your application then you may avoid transformation of your schema in general, which can stop a major part of the system design from evolving.

The effect of this reminds me of the concept of Gumption Traps, from the book Zen and the Art of Motorcycle Maintenance, where any task or idea which would involve such a risky operation remains undone because it comes with this operational burden that saps the enthusiasm of those who would undertake it.

Below are some notes which, at least for Rails and MysQL 5.6+, cover some straightforward ways to perform typical DDL operations online, i.e. without the tables being locked or otherwise causing the tables to be copied in full.

The documentation states that:

Typically, you do not need to do anything special to enable online DDL. By default, MySQL performs the operation in place, as permitted, with as little locking as possible.

In my experience it seems that we need to specify the algorithm and lock level explicitly in most cases, but your mileage may vary.

Online DDL: Explicit specification of algorithm and lock level

To avoid accidentally making the table unavailable for reads, writes, or both, specify a clause on the ALTER TABLE statement such as LOCK=NONE (permit reads and writes) or LOCK=SHARED (permit reads). The operation halts immediately if the requested level of concurrency is not available.

To avoid tying up the server with an ALTER TABLE operation that copies the table, include ALGORITHM=INPLACE. The statement halts immediately if it cannot use the in-place mechanism.

In using this functionality we will need to consult the documentation to check that we won’t be subject to the various caveats. Operations which have been tried and tested are below.

Caveat

From the docs on the limitations of InnoDB regarding online migrations:

Tables created before MySQL 5.6 that include temporal columns (DATE, DATETIME or TIMESTAMP) and have not been rebuilt using ALGORITHM=COPY do not support ALGORITHM=INPLACE. In this case, an ALTER TABLE … ALGORITHM=INPLACE operation returns the following error…

As all ActiveRecord tables include timestamp columns by default, there is a need to run one alter table statement using COPY mode before we can use INPLACE on each of them in future.

Operations

Add a null column

execute 'ALTER TABLE `survey_responses` ADD `invite_at` datetime, ALGORITHM=INPLACE, LOCK=NONE;'

Optionally set a default, as a separate operation

change_column_default :users, :a_column, "default_value"

Drop a column

First deploy a change to ignore the column to remove

class SurveyResponse < ApplicationRecord
  self.ignored_columns = ['invite_at']
end

Then in a separate deploy remove the column

execute 'ALTER TABLE `survey_responses` DROP COLUMN `invite_at`, ALGORITHM=INPLACE, LOCK=NONE;'

Adding/removing a foreign key constraint

The INPLACE algorithm is only useable if referential integrity checking is disabled, otherwise COPY is used. MySQL documentation.

Removing an FK uses INPLACE.

  def up
    ActiveRecord::Base.connection.disable_referential_integrity do 
      add_foreign_key :applicant_stage_changes, :applicants
    end
  end

  def down
    remove_foreign_key :applicant_stage_changes, :applicants
  end

Making a column not null

This requires setting the sql_mode for the session to either STRICT_ALL_TABLES or STRICT_TRANS_TABLES in order to work. This is done using SET SESSION.

  def up
    execute "SET SESSION sql_mode = 'STRICT_ALL_TABLES';"
    execute "ALTER TABLE applicant_stage_changes MODIFY COLUMN campaign_id INTEGER NOT NULL, ALGORITHM=INPLACE, LOCK=NONE;"
  end

  def down
    execute "ALTER TABLE applicant_stage_changes MODIFY COLUMN campaign_id INTEGER NULL, ALGORITHM=INPLACE, LOCK=NONE;"
  end

Renaming a column

ALTER TABLE tbl CHANGE old_col_name new_col_name data_type, ALGORITHM=INPLACE, LOCK=NONE;

To permit concurrent DML, keep the same data type and only change the column name.

When you keep the same data type and [NOT] NULL attribute, only changing the column name, the operation can always be performed online.

You can also rename a column that is part of a foreign key constraint. The foreign key definition is automatically updated to use the new column name. Renaming a column participating in a foreign key only works with ALGORITHM=INPLACE. If you use the ALGORITHM=COPY clause, or some other condition causes the command to use ALGORITHM=COPY behind the scenes, the ALTER TABLE statement fails.