Tip for optimizing MySQL data types

This is a tip that I’ve kept forgetting to write down so here it is:

During a system’s life cycle, requirements change and components are refactored. This includes databases as well, and particularly as data grows. Decisions and assumptions are made at the beginning of a system’s life cycle that may or may not hold up over years of operation and it’s good practice to continually analyze how well the initial design is working.

When doing analysis in support of refactoring database schemas in MySQL, I’ve found this little bit of SQL to be invaluable.



(I suggest giving it a try on a small table with few rows.)

PROCEDURE ANALYSE interrogates the values in a table, shows the smallest and largest values and suggests a type for each column. While the results frequently indicate that an ENUM type is the most appropriate you can add arguments to the ANALYSE procedure to get more rational suggestions. However, even with no arguments the results can be useful.

For example, you might see that the max value of a column actually is smaller than the type that it’s using. I’ve frequently seen INT(11) columns that would work fine as a MEDIUMINT or even TINYINT. Or your might find that an ENUM type is better since the distribution of values is small in a VARCHAR column. (The benefit of an ENUM is that the data is stored as an integer rather than the string value so its’ footprint on the disk can be significantly smaller).

Anyway, while it’s not a panacea, PROCECURE ANALYSE() is another helpful tool.

Leave a Reply