Merging legacy MySQL databases with conflicting unique IDs

When upgrading the structure of an old database with production data, it's easy to find resources online showing how to take a MySQL dump and import it into a new database with an improved structure. Heck, even if the database has some oddities, it's usually straightforward.

But what if you have this scenario: a production database where most tables have an auto-increment column used as a unique ID, and most of those unique IDs are used in other tables as foreign keys. In addition, the tool using this database structure has been deployed to multiple locations. You've been tasked with upgrading the tool so that data from all of the deployed instances can be exported and merged together, so one tool can have ALL of the data from ALL of the disparate instances without conflict.

Yea... exactly.

Obviously my first thought involved changes to the application itself. But this would mean months and months of rearchitecting how the tool functions and interacts with its database. Given that this didn't need to be a "perfect" solution, with the focus instead on something that "just works", I turned my eye to patching the database. But as I ran down that rabbit hole, I found that there were few examples of people facing this issue online, and thus had to figure out my own solution.

In the end, it was rather simple.

file: update.sql

```sql
USE DB_NAME;

-- Drop and re-add each foreign key so it has ON UPDATE CASCADE
ALTER TABLE table_1 DROP FOREIGN KEY fk_1;
ALTER TABLE table_1
  ADD CONSTRAINT fk_1 FOREIGN KEY (fk_1_col_name)
  REFERENCES table_2 (col_name) ON UPDATE CASCADE;

ALTER TABLE table_1 DROP FOREIGN KEY fk_2;
ALTER TABLE table_1
  ADD CONSTRAINT fk_2 FOREIGN KEY (fk_2_col_name)
  REFERENCES table_2 (col_name) ON UPDATE CASCADE;

-- Shift the unique IDs out of the conflicting range
UPDATE table_1 SET col_name = col_name + BIG_NUMBER;
UPDATE table_2 SET col_name = col_name + BIG_NUMBER;

-- Move the auto-increment starting point to match
ALTER TABLE table_1 AUTO_INCREMENT = BIG_NUMBER;
ALTER TABLE table_2 AUTO_INCREMENT = BIG_NUMBER;
```

You'll notice that at first, all we are doing is dropping and re-adding foreign keys.
This is because the production database I was working with didn't have CASCADE enabled, which is important for the next step. After we have the proper foreign key settings, we then make our desired update. In this case, we start by altering the columns with the auto-incremented unique IDs. Before, they had row values of 1, 2, 3, 4, etc. After this upgrade, they'll be something like 10001, 10002, 10003, etc. Enabling CASCADE ensures that the change is reflected in all tables that reference the column we alter. Finally, we want to change the AUTO_INCREMENT starting point. If we don't, the next row added to the database will start back at 1 instead of being something like 10053.

Some Finer Points:

Going back to the original problem, let's briefly look at the maintainability of these changes. For the niche scenario above, let's say you have 3 separate database installs and each table has exactly 1000 rows. You could leave the first database alone (let the unique IDs be 1, 2, 3...1000) and then, in the 2nd database, use 1000 as your "BIG_NUMBER" so the IDs won't conflict (database two will have 1001, 1002, 1003...2000). We can then do the same for the 3rd database and have it start at 2000. Obviously, when dealing with multiple tables in a database, some will have more rows than others. You don't really need to be exact, and could use a different "BIG_NUMBER" for each table that meets your needs.

If going with this scenario, you could skip the final step in the script and not change AUTO_INCREMENT. It is pretty smart and will fill in any gaps you leave. For my situation, I wasn't going to be able to permanently merge the databases. We would just be doing periodic dumps into a central database. In this case, I set my "BIG_NUMBER"s to something large like 100000 and made SURE my AUTO_INCREMENTs matched.
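When picking a per-table "BIG_NUMBER", it helps to know each table's current high-water mark. One way (a sketch; `DB_NAME` is a placeholder matching the script above) is to read the AUTO_INCREMENT counters straight out of information_schema:

```sql
-- List the next auto-increment value for every table in the schema,
-- so you can choose offsets safely above the largest existing IDs.
SELECT TABLE_NAME, AUTO_INCREMENT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'DB_NAME'
ORDER BY AUTO_INCREMENT DESC;
```

Whatever the largest counter is, round your offsets up well past it so every table in that install fits under the next database's range.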
This will allow the databases to remain separate; however, it adds the risk that if any of your tables climb above that big number, you will have a conflict in your merged data. (I don't have this issue with my implementation since 1) I picked big numbers large enough that we should never reach them, 2) if we did reach row counts that high, the tool would have other problems before then, and 3) the database is being rearchitected anyway, so in a few months this will be replaced with something better.)

Reflection/Lessons Learned:

This tactic is only really useful if you have an old production database that you can't just wipe away and start over from scratch with. The biggest lesson for me is that, when you know you are going to be building something more than a simple one-off tool, avoid using unique ID columns as a crutch where you can. By this I mean, don't use unique IDs for anything more than avoiding conflicts within the table structure itself. And if you can't, try to avoid using them as your means of querying the database or as the primary identifier for the end user. You never know what you might need to do to that data in the future, and having a unique ID that is important and can't be changed really locks down your options.
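As a safeguard for the periodic-dump approach, it's worth periodically checking how close each table is to the next range. A minimal sketch, reusing the placeholder names from the script above and the 100000 offset mentioned earlier:

```sql
-- Compare the largest current ID against the ceiling of this install's range;
-- shrinking headroom means ID ranges are about to collide on the next merge.
SELECT MAX(col_name)          AS highest_id,
       100000 - MAX(col_name) AS headroom
FROM table_2;
```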

Written by: Edward Romano

I dabble in, and occasionally obsess over, technology and problems that bug me