Skip to Main Content
HBS Home
  • About
  • Academic Programs
  • Alumni
  • Faculty & Research
  • Baker Library
  • Giving
  • Harvard Business Review
  • Initiatives
  • News
  • Recruit
  • Map / Directions
Research Computing Services
  • Online Requests
  • FAQ
  • Blog
  • Contact Us
  • About Us
  • Faculty Projects
  • Training
  • Compute Cluster & Data Storage
  • Data Practices
  • Help
  • …→
  • Harvard Business School→
  • Research Computing Services→
  • Data Practices
    • Data Practices
    • Best Practices
    • Data Tips & Toolkits
    • Database Best Practices
    • Transferring Data
    →
  • Database Best Practices
    • Database Best Practices
    • Delimiters
    • Exporting
    • Import & Export of Text Files
    • Importing
    →

Data Practices

Data Practices

  • Best Practices
  • Data Tips & Toolkits
  • Database Best Practices
  • Transferring Data

Database Best Practices

Database Best Practices

  • Best Practices
  • Data Tips & Toolkits
  • Database Best Practices
    • Delimiters
    • Exporting
    • Import & Export of Text Files
    • Importing
  • Transferring Data
2ms
Please consider the following before you start building your MariaDB database.

What are your goals for the database?

Keep in mind for what purposes your database will be used. Once built, will the database be static or will someone be updating it? Note any potential security restrictions, including who will have access to the data and editing permissions.

Planning your database

Knowing your data well is a crucial first step. Ensure that you are familiar with datatypes, value ranges, missing data, row counts, and designated primary keys. Before importing, determine whether cleanup is required. If it is, save interim files as you clean in case later on you need to modify any changes. You may decide that the delimiter is appropriate for loading the data (see Delimiters). When you are designing your database, your table architecture should reflect a balance between the efficiency of relational data and the usability of your database for analysis.

Fully document the preparation and building of the database

Draft up a design plan. You may find it helpful to create an entity-relationship diagram. As you construct your database, keep track of the steps taken in cleaning and preparing raw data for importation. Column naming is also an important part of documentation. Names should be descriptive and usable as well as consistent. Avoid using spaces and quoted identifiers in names. Consider how the names will be used in queries.

Data definition language

When choosing the appropriate data types, pay special attention to the following:
  • type, length and sign of numeric columns
  • features of fixed or variable length character, text, and binary columns
  • row enumeration and auto-increment requirements
  • dates and time columns
  • text field usability
    • how will non-categorical text fields be used?
    • is it more efficient to create a separate (linked) table for large text fields so that text may be analyzed differently?
    • try to avoid string literals

Refer to the CREATE TABLE statement for additional details.

Loading data

During your data load, determine whether you need to turn off logging and make sure to analyze any warnings. It is highly recommended that you first do test loads of tables. These import tests are very useful for assessing the appropriateness of data type choices. They are also excellent timesavers, especially if the full tables will be large, and they may uncover any unforeseen data problems.

Analysis of your data using SQL

Implement index key columns to maximize efficiency during joins and sorts. It is suggested that you query your data and build analysis tables in your ‘personal’ database rather than in your research (production) database. Consider if it will save time to use temporary tables when doing complex joins. While your query is running, use SHOW EXPLAIN to fine-tune the SQL. Use SELECT * and ORDER BY only if necessary, as it is more efficient to sort in a front-end program. Data analysis can be run via SQL scripts in the client, on the command line, or through code written in R, Python, SAS, and other applications.

Resources

Database Building

  • MariaDB Knowledge Base
  • Conceptual Design and the Entity-Relationship Diagram
  • Table Statements
  • Data Definition

SQL

  • Basic SQL Statements
  • Getting Data from MariaDB
  • Useful MariaDB Queries
  • Joining Tables
  • Operators
  • Built-in Functions
  • Data Manipulation: SELECT, INSERT, UPDATE, DELETE
ǁ
Campus Map
Research Computing Services (RCS) 
Harvard Business School
Baker Library, B90, 25 Harvard Way
Boston, MA 02163
Phone: 617.495.6100
Email: research@hbs.edu
→Map & Directions
→More Contact Information
→Terms Of Service
  • Make a Gift
  • Site Map
  • Jobs
  • Harvard University
  • Trademarks
  • Policies
  • Accessibility
  • Digital Accessibility
Copyright © President & Fellows of Harvard College