{"id":2309,"date":"2014-10-14T16:04:47","date_gmt":"2014-10-14T08:04:47","guid":{"rendered":"http:\/\/tzlee.com\/blog\/?p=2309"},"modified":"2014-12-09T00:01:58","modified_gmt":"2014-12-08T16:01:58","slug":"why-status-fields-must-die","status":"publish","type":"post","link":"https:\/\/tzlee.com\/blog\/2014\/10\/why-status-fields-must-die\/","title":{"rendered":"Why status fields must die"},"content":{"rendered":"<p>I&#8217;m sure we have seen this happen: A programmer forgets\u00a0to add a <strong>status = 1<\/strong> condition\/check to an SQL query and the app disintegrates.<\/p>\n<p>Status fields can be useful in some ways especially if the range of values are large, e.g. to track order status. But the most common use I&#8217;ve seen is <strong>data retention<\/strong>, <em>i.e. keeping a deleted record<\/em>. The status field is an ancient way of archiving a\u00a0record, and this design exists in almost every application I&#8217;ve seen, because back in those days schools told us to <a href=\"http:\/\/en.wikipedia.org\/wiki\/Database_normalization\">normalize tables<\/a>. Some companies even make it a mandatory design requirement to have this field in every table &#8212; even in <em>many-to-many<\/em> join tables.<\/p>\n<p>We should avoid\u00a0designing tables with record deletion\u00a0status fields. Why? There&#8217;s many reasons why, and the common accidental omission of the status check is one of those reasons. There are also different data types being used, sometimes highly inefficient for indexing &#8212; for example\u00a0<strong>INT<\/strong>, a <em>32-bit signed integer<\/em> is very commonly used for storing values 0 and 1. The ambiguity is also a common problem: What does 0, 1 and 2 stand for? Where are these defined? In constants, or cross-referenced to another table? The status check\u00a0is also an additional condition that the database needs index\/filter. What&#8217;s worse is that sometimes the check is done at the model layer, while other times the check is done at the controller layer&#8230; or maybe both&#8230; or maybe none.<\/p>\n<p>So what&#8217;s a better way of dealing with record archival? If there&#8217;s really no reason to keep the record, <strong>delete it<\/strong>.<\/p>\n<p>If the record should be kept for some reason, <strong>archive it into another table<\/strong>, a technique known as <em>partitioning<\/em>. It would also be more efficient to search a table containing only records you need to query.<\/p>\n<p>Partitioning also allows you to tier by storage performance, e.g. place &#8220;live&#8221; tables on high performance storage, and &#8220;archive&#8221; tables on slower storage devices. You can also tweak indexing to cater to different search patterns on both sets of data.<\/p>\n<p>Archiving data is very easy, and is not much more complicated than a delete: If the two tables are identical (in column definition), simply use INSERT INTO &#8230; SELECT, then DELETE.<\/p>\n<p>If you are worried about deletes being I\/O intensive, then consider batching the deletes or having a cache. This will depend on how your application is designed.<\/p>\n<p>(Side note: <a href=\"http:\/\/dev.mysql.com\/doc\/refman\/5.1\/en\/partitioning-overview.html\">MySQL 5.1 has a new partition feature<\/a> that can be used to automagically partition tables, but that is more for performance purposes. This article is discussing a design issue to remove common errors in programming.)<\/p>\n<p><em>Edit: Updated Oct 19. Friend asked about status fields that may have a large range of values, such as transaction statuses. These are OK to remain as fields. I was primarily targeting record archival.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;m sure we have seen this happen: A programmer forgets\u00a0to add a status = 1 condition\/check to an SQL query and the app disintegrates. Status fields can be useful in some ways especially if the range of values are large,&#8230; <a class=\"more-link\" href=\"https:\/\/tzlee.com\/blog\/2014\/10\/why-status-fields-must-die\/\">Continue Reading &rarr;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[8],"tags":[461,465,462,463,464],"class_list":["post-2309","post","type-post","status-publish","format-standard","hentry","category-tech","tag-database","tag-mysql","tag-partioning","tag-record-status","tag-status-flag"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/tzlee.com\/blog\/wp-json\/wp\/v2\/posts\/2309","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tzlee.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tzlee.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tzlee.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/tzlee.com\/blog\/wp-json\/wp\/v2\/comments?post=2309"}],"version-history":[{"count":11,"href":"https:\/\/tzlee.com\/blog\/wp-json\/wp\/v2\/posts\/2309\/revisions"}],"predecessor-version":[{"id":2320,"href":"https:\/\/tzlee.com\/blog\/wp-json\/wp\/v2\/posts\/2309\/revisions\/2320"}],"wp:attachment":[{"href":"https:\/\/tzlee.com\/blog\/wp-json\/wp\/v2\/media?parent=2309"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tzlee.com\/blog\/wp-json\/wp\/v2\/categories?post=2309"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tzlee.com\/blog\/wp-json\/wp\/v2\/tags?post=2309"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}