I’m sure we have seen this happen: A programmer forgets to add a status = 1 condition/check to an SQL query and the app disintegrates.

Status fields can be useful in some ways especially if the range of values are large, e.g. to track order status. But the most common use I’ve seen is data retention, i.e. keeping a deleted record. The status field is an ancient way of archiving a record, and this design exists in almost every application I’ve seen, because back in those days schools told us to normalize tables. Some companies even make it a mandatory design requirement to have this field in every table — even in many-to-many join tables.

We should avoid designing tables with record deletion status fields. Why? There’s many reasons why, and the common accidental omission of the status check is one of those reasons. There are also different data types being used, sometimes highly inefficient for indexing — for example INT, a 32-bit signed integer is very commonly used for storing values 0 and 1. The ambiguity is also a common problem: What does 0, 1 and 2 stand for? Where are these defined? In constants, or cross-referenced to another table? The status check is also an additional condition that the database needs index/filter. What’s worse is that sometimes the check is done at the model layer, while other times the check is done at the controller layer… or maybe both… or maybe none.

So what’s a better way of dealing with record archival? If there’s really no reason to keep the record, delete it.

If the record should be kept for some reason, archive it into another table, a technique known as partitioning. It would also be more efficient to search a table containing only records you need to query.

Partitioning also allows you to tier by storage performance, e.g. place “live” tables on high performance storage, and “archive” tables on slower storage devices. You can also tweak indexing to cater to different search patterns on both sets of data.

Archiving data is very easy, and is not much more complicated than a delete: If the two tables are identical (in column definition), simply use INSERT INTO … SELECT, then DELETE.

If you are worried about deletes being I/O intensive, then consider batching the deletes or having a cache. This will depend on how your application is designed.

(Side note: MySQL 5.1 has a new partition feature that can be used to automagically partition tables, but that is more for performance purposes. This article is discussing a design issue to remove common errors in programming.)

Edit: Updated Oct 19. Friend asked about status fields that may have a large range of values, such as transaction statuses. These are OK to remain as fields. I was primarily targeting record archival.