

128GB USB3 flash drives with the StackOverflow database that we use in our training classes

I still use SQL Server 2005 to create the dump because I want as many folks as possible to be able to use it. You can attach this database to a SQL 2005, 2008, 2008R2, 2012, or 2014 instance and it's immediately usable. (This is the same reason I don't make the database smaller with table compression – that's an Enterprise Edition feature, and not everybody can use it.) Keep in mind, though, that it attaches at a 2005 or similar compatibility level. If you want 2014's new cardinality estimator, you'll need to set the database's compatibility level to 2014 (level 120) after you attach it.

Here's how I built the torrent:

In our AWS lab, we have an m4.large (2 cores, 8GB RAM) VM with SQL Server 2005. We use it for testing behaviors – even though 2005 isn't supported anymore, sometimes it's helpful to hop in and see how things used to work.

I downloaded the Stack Exchange data dump on that 2005 VM. It's a little confusing because the page lists an upload date, but that's just the first date the file was published. The top update date (Ma…) is the current version you'll get if you use the download links at the top right of the page.

To make the import run faster, I shut the VM down, changed its instance type to the largest supported m4 – an M4 Deca Extra Large (m4.10xlarge) with 40 cores and 160GB RAM for $4.91/hour – and booted it back up. (Don't forget to revisit your SQL Server's max memory, MAXDOP, and TempDB settings when you make changes like this.)

I created an empty StackOverflow database, then fired up the Stack Overflow Data Dump Importer (SODDI), an open source tool that reads the XML data dump files and batch-inserts their rows into a SQL Server database.
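
Bumping the compatibility level after the attach is a one-line `ALTER DATABASE ... SET COMPATIBILITY_LEVEL` statement. Here's a small Python sketch that builds that statement from the version numbers mentioned above; the helper name `compat_level_statement` and the `COMPAT_LEVELS` table are my own illustration, not part of any tool, and you'd still run the resulting T-SQL yourself (SSMS, sqlcmd, whatever you like).

```python
# Native compatibility level for each SQL Server version mentioned above.
# 2008 and 2008 R2 share level 100.
COMPAT_LEVELS = {
    "2005": 90,
    "2008": 100,
    "2008R2": 100,
    "2012": 110,
    "2014": 120,
}


def compat_level_statement(database: str, version: str) -> str:
    """Hypothetical helper: return T-SQL that sets `database` to `version`'s compat level."""
    try:
        level = COMPAT_LEVELS[version]
    except KeyError:
        raise ValueError(f"unknown SQL Server version: {version!r}") from None
    # Double any closing brackets so the name is safe inside [ ] delimiters.
    quoted = "[" + database.replace("]", "]]") + "]"
    return f"ALTER DATABASE {quoted} SET COMPATIBILITY_LEVEL = {level};"


if __name__ == "__main__":
    # Prints: ALTER DATABASE [StackOverflow] SET COMPATIBILITY_LEVEL = 120;
    print(compat_level_statement("StackOverflow", "2014"))
```

Generating the statement rather than hard-coding 120 just makes it harder to fat-finger the level if you're attaching to a 2012 box instead.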

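SODDI handles the import for you, but the general idea (stream a huge XML file row by row and hand the database batches instead of single rows) is easy to sketch. This is not SODDI's code; it's a minimal Python illustration that assumes each dump file looks like `<posts><row Id="1" ... /></posts>`, with one attribute per column. The function name `iter_row_batches` is hypothetical, and the actual insert step (your driver's bulk insert of each batch) is left out.

```python
import io
import xml.etree.ElementTree as ET


def iter_row_batches(source, batch_size=1000):
    """Stream <row .../> elements from a dump file, yielding lists of attribute dicts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the root element's start tag
    batch = []
    for event, elem in context:
        if event != "end" or elem.tag != "row":
            continue
        batch.append(dict(elem.attrib))  # each attribute becomes a column value
        root.clear()  # drop already-processed rows so memory stays flat on huge files
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


if __name__ == "__main__":
    sample = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Score="10" />
  <row Id="2" PostTypeId="2" Score="3" />
  <row Id="3" PostTypeId="2" Score="0" />
</posts>"""
    for batch in iter_row_batches(io.BytesIO(sample), batch_size=2):
        print([row["Id"] for row in batch])  # ['1', '2'] then ['3']
```

Batching is also why throwing 40 cores and 160GB of RAM at the VM helps: the server spends its time on large inserts instead of per-row round trips.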