Well, it looks like I have finally found a way to manage massive DBF tables (500MB+), and MySQL provides the answer, even if I do have to make a couple of hacks to get it to work. For those of you unaware, this is a problem that has plagued me for a while now.
For starters, MySQL does not support DBF natively; there is no way to import a dBase-format table directly into MySQL. That is unfortunate, given that DBF is one of the oldest database formats around and used to be one of the most common. So, short of purchasing proprietary software that handles the import directly, I need another way.
MySQL imports comma-separated (*.csv) and tab-separated (*.tsv) files through the LOAD DATA INFILE command (technically, any delimiter is acceptable as long as it is used consistently and lets each column be identified). So, I can export the file as tab-separated values and import the data as follows:
LOAD DATA [LOCAL] INFILE '/home/aaron/somedata.tsv' INTO TABLE some_table FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';
LOCAL (without the brackets) is optional and only needed when administering a remote MySQL server: with LOCAL, the data is loaded from your client machine; otherwise MySQL looks for the file on the server. Also, MySQL defaults to tab-separated fields and Unix line feeds, so FIELDS TERMINATED BY '\t' and LINES TERMINATED BY '\n' aren't strictly needed if that is what you have. If you are working on a Windows machine, your file could be saved in DOS format with carriage returns and line feeds, in which case LINES TERMINATED BY '\r\n' would be needed instead. Check the documentation for further info. Of course, the table structure needs to be defined before loading the data.
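One gotcha if you end up generating the tab-separated file yourself: a stray tab or newline inside a field will throw off every column after it. MySQL's LOAD DATA defaults treat backslash as the escape character and \N as NULL, so fields should be escaped on the way out. A quick sketch of that escaping in Python (the function names and sample data here are my own placeholders, not from my actual tables):

```python
# Write rows as a tab-separated file that LOAD DATA INFILE can read with
# its default settings. Backslash is MySQL's escape character, so embedded
# backslashes, tabs, and newlines must be escaped; NULL is written as \N.

def escape_field(value):
    """Escape one field for MySQL's default LOAD DATA INFILE settings."""
    if value is None:
        return "\\N"                   # MySQL reads \N as NULL
    s = str(value)
    s = s.replace("\\", "\\\\")        # escape the escape character first
    s = s.replace("\t", "\\t")         # an embedded tab would split the field
    s = s.replace("\n", "\\n")         # an embedded newline would split the row
    return s

def write_tsv(rows, path):
    """Write an iterable of rows (lists of values) as an escaped .tsv file."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        for row in rows:
            f.write("\t".join(escape_field(v) for v in row) + "\n")
```

With that escaping in place, the LOAD DATA statement above works as-is, with no extra FIELDS ESCAPED BY clause needed.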
So, now I have my data loaded.
I should mention that loading data with 2 million rows and 250 fields (each field holding ~30 characters of data, with ~200 of the fields populated per row on average) actually imported fairly quickly. My database server is a Pentium 4 2.8 GHz with 1 GB of DDR2 RAM, a 533 MHz FSB, and a 100 GB SATA II HDD (I got an upgrade). It took about 10 minutes to export the DBF to *.tsv (we had to break up the table, or Paradox would not do it) and 2 minutes to import the data into MySQL. When I get the data I need, I can then export my result set to *.csv in under a minute. Needless to say, much much MUCH faster than SQL Server 2000 (45-90 minutes to import the DBF directly and another 20-40 minutes to export it to *.csv, not counting the queries and populating the data).
The great thing about MySQL is the sheer speed. This is important, because we get very busy at work, and I don’t have time to sit and wait. You may remember an earlier post (link above) where I mentioned that SQL Server 2000 took 3-4 hours from start to finish to get done what I need. Well, with MySQL the whole process from start to finish takes less than 1 hour. Exactly what I am looking for.
There is only one hiccup, and it is a fairly large one. When running some generic queries, like SELECT * FROM table, on a table that size, it can take anywhere from 20 minutes to over an hour, which is expected. But when I run such a query against a server on a Linux machine, it completely crashes X, locks up the machine, and the query obviously fails, which isn't expected. The only thing I can think of is allocating more disk space to the swap partition. Has anyone else noticed this issue, and is there a fix? It has to be a size issue. Other than that, MySQL rocks!
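In the meantime, one workaround I may try (just a sketch of my own, untested on the full table): rather than one giant SELECT * that the client has to buffer all at once, page through the table in fixed-size chunks with LIMIT so memory stays bounded. The table name and chunk size below are placeholders:

```python
# Sketch: instead of one huge SELECT, generate a series of LIMIT queries
# that cover the table in fixed-size chunks, so the client never has to
# hold more than one chunk of rows in memory at a time.
# Table name, row count, and chunk size are placeholders.

def chunked_queries(table, total_rows, chunk_size=100_000):
    """Yield SELECT statements covering the table in chunk_size pieces."""
    for offset in range(0, total_rows, chunk_size):
        # MySQL's classic "LIMIT offset, row_count" form
        yield f"SELECT * FROM {table} LIMIT {offset}, {chunk_size}"
```

The mysql command-line client also has a --quick option that streams rows instead of caching the whole result set, which may be worth trying for dumps like this.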
Now, here is the great thing. MySQL has a slew of GUI tools available for Windows, Linux, Mac, etc. to make this process as painless as possible. For example, rather than using the command line to execute my queries, there is the MySQL Query Browser for that very need. It can connect to multiple databases and servers (MySQL and others) simultaneously, it has great documentation built right in, and it is easy to use (see screenshot above). There are MySQL Administrator, MySQL Workbench (Windows only), MySQL Query Browser, and MySQL Migration Toolkit (Windows and Linux only), all available for download and completely open source.