Tuesday, March 3, 2015

MySQL Efficiently Generating Millions of Random Registration Codes

Problem:

While working with a company, I decided to look into how they were generating their registration codes which were used to allow users to pre-purchase access to their platform. The programming behind this was written in PHP and utilized MySQL to store the codes.

The database structure was set up to allow for each code to be unique. The PHP code would handle each code generated one-at-a-time. First, it would generate a random number-letter combination to a set length and then it would query the database to see if such a code existed. If the query returned no results, it would then insert the new code into the table. If a match existed, it would discard the generated code and try a new one.

As you can see generating many codes using this process would be highly inefficient. This meant that we would actually performs up to two queries for each code - so for 100,000 codes, we would likely be 200,000+ queries against our database.


Solution:

Without really changing too much, we can optimize this process to running maybe 5 queries for the same 100,000 codes (and prevent an unlikely race condition when this table grew super large). First we have to think about what MySQL can do that would really save us with mass queries. MySQL can actually process a large group of inserts all at one time. The problem is that we need to then consider the duplicate codes. To handle these, we simply ignore the duplicates using MySQL INSERT IGNORE INTO ... But we still need to generate all 100,000 codes still, right? No problem, we can ask mysql how many records actually got inserted into the database, then use our original number (100,000) and subtract the successful ones. This tells us how many we still need to generate and insert into the database. To handle this we wrap the process in a loop. It looks something like this:

<?php

/**
 * Generates codes and inserts them into DB, retrying any fails
 *
 * @param int $code_count
 */
 public function add_code_batch($code_count)
 {
    $pending_codes = $code_count;
    while ($pending_codes){
        $mass_insert = array();
        for($i = 0; $i < $pending_codes; $i++){
            $code = $this->generate_code(); // @TODO: Add this
            $mass_insert[] = "('{$code}')";
        }
        // try to mass insert them all
        $this->db->query("INSERT IGNORE INTO `codes` (code) VALUES ".implode(',', $mass_insert));
        $pending_codes = $code_count - $this->db->affected_rows();
    }

}

?>

Keep in mind the above code may need to be modified to fit your framework and code structure (like including batch IDs or additional information). As you can see, we have drastically improved from the previous situation.

I'm not saying that this can't be optimized more or that it is the best possible solution, but it is a step forward.

Happy coding.


Possible things to explore for better performance:

  • Generating codes through SQL
  • How intense the code generation process is
  • Cost of storing codes in memory between generation and transaction


No comments: