Database Best Practices
Importing
Importing
Basics of Importing
The following is a basic overview of the import process. Please see the subsequent example for commands that are to be used. Complete documentation for the LOAD DATA command to be run within MariaDB may be found at [https://mariadb.com/kb/en/library/load-data-infile/].
The general process is as follows:
- (1) Move your data to the appropriate import folder
- (2) Within MariaDB, create the database table that will hold the imported data
- (3) Within MariaDB, import the data
- (4) Validate the import
- (5) Remove your data from the import folder
In order to (2) create the database table, you will need to have an understanding of your data. You will need to know the name of all columns, as well as each column’s data type (integer, numeric with decimals, string of characters, etc) and each column’s maximum width. As an example, if one of the columns in your data is US phone numbers of the format 6174953292, then you may opt to use int(10). This tells us that all entries will be integers with up to 10 digits. However, if you suspect some entries have dashes such as 617-495-3292, then you will need to use char(12) which stores the data as a string of characters, up to 12 characters in length.
Please feel free to contact RCS at research@hbs.edu with any questions you may have regarding data import.
Example of Importing
We will use the following to illustrate importing:
- MariaDB username = jharvard
- MariaDB database = jharvard_database
- MariaDB database table = table_import
- Import filename = SampleData.txt
- Data:
Column_1 |
Column_2 |
Column_3 |
---|---|---|
25 |
Harvard Way |
Boston, MA |
- which as a file appears as:
- Line 1: Column_1|Column_2|Column_3
- Line 2: 25|Harvard Way|Boston, MA
-
- Before importing, we will need to create our database table. Creating a table includes specifying the maximum size of each column. We will keep this example simple by specifying each column be char and have a maximum length of 20. Please note that you can modify your table at a later time, for instance if you need to example a column from char(20) to char(30).
To begin, prepare the import folder as described above in the section Prevent Others From Accessing Your Data
mkdir /export/mdb_external/import/jharvard
chmod 700 /export/mdb_external/import/jharvard
Move your data to this import folder
mv SampleData.txt /export/mdb_external/import/jharvard
! Note that mv moves the file, as opposed to cp which copies the file!
cp SampleData.txt /export/mdb_external/import/jharvard
Log into MariaDB
mysql -h HOSTNAME -u jharvard -p
Within MariaDB, create table that will hold imported data
use jharvard_database;
create table table_import (Column_1 char(20), Column_2 char(20), Column_3 char(20));
Within MariaDB, import data
load data local infile ‘/export/mdb_external/import/jharvard/SampleData.txt’ into table table_import fields terminated by ‘|’ lines terminated by ‘\n’ ignore 1 lines;
Please note our command has 4 sections:
- (1) load data local infile ‘/export/mdb_external/import/jharvard/SampleData.txt’
- specify file to import
- (2) into table table_import
- specify table that will hold the imported data
- (3) fields terminated by ‘|’ lines terminated by ‘\n’
- specify delimiters (Click here for more information)
- (4) ignore 1 lines
- include this only if your file includes column header information
Official documentation for this command may be found at [https://mariadb.com/kb/en/mariadb/load-data-infile/].
We can check do a preliminary check on the first 10 rows of our data via
-
select * from table_import limit 10;
Log out of MariaDB
>exit;
Delete import folder
rm -rf /export/mdb_external/import/jharvard