Creating A Lego Database
If you spend time around lego sets you will notice that there is a structure between sets, bricks, and themes that is an obvious fit for a relational database system. In this post we will provision a SQLite3 database with Python to hold set and theme information. This will expand on the Counting Lego Bricks post with the end goal of pumping lego data from the Rebrickable API into our local database.
SQLite3 is a fast and lightweight SQL database engine that lives on a hard drive. It does not have the overhead of traditional client-server database engines making it easy to use for small projects. We will interact with SQLite3 using the Python API included in the standard library. The efficiency of SQLite3 combined with the utility of Python provides a good solution for creating and maintaining a lego database.
Before we can start pumping lego data into the database we need to initialize a SQLite3 data store, create methods to operate on the database, and define the structure of tables that hold data. These operations are effectively SQL statements held in strings and passed to the SQLite3 engine via the Python API. The following actions for provisioning a database can be recorded in a Python module and reused as the pipeline grows:
- create connection
- define sets table
- define themes table
- create tables
- close connection
Installing SQLite3
Before we can provision the lego databse we need to install SQLite on a personal linux machine. Fortunately installing SQLite3 is a simple task.
First check if SQLite3 is installed by running which
in a terminal:
$ which sqlite3
/usr/bin/sqlite3
If the output returns a path to the SQLite3 binary proceed. If not run the following apt install
command:
$ sudo apt install sqlite3
Confirm that SQLite3 is installed with by checking the version
:
$ sqlite3 --version
3.37.2 2022-01-06 13:25:41 872ba256cbf61d9290b571c0e6d82a20c224ca3ad82971edc46b29818d5dalt1
Now we can begin building our SQLite3 Python module!
Creating a connection
One of the greatest advantages of SQLite3 is that the entire database is contained in one file. To initialize a database pass a file name with the .db
extension to sqlite3.connect()
. If the database with the passed name is not is not found a new database will be implictly generated. Calling sqlite3.connect()
returns a connection object that is used to run other actions against the database.
Becuase we want to be able to initialize a database and connect on demand the operation is wrapped in the connect_db()
function. This function takes a database string as an arguement, returns a database connection, and fails gracefully if something goes awry:
import sqlite3
def connect_db(db_file):
try:
conn = sqlite3.connect(db_file)
print("Connection Created")
except sqlite3.Error as e:
raise e
return conn
Define Themes Table
Now we need to define the structure of the lego themes table using the following SQL CREATE TABLE
statement:
create_themes_sql = """
CREATE TABLE IF NOT EXISTS themes (
theme_id TEXT PRIMARY KEY,
theme_name TEXT NOT NULL
);
"""
This statment checks first if the themes
table exists, creates one if it doesn’t, and then declares the constraints for two fields. The theme_id
is an text field and primary key representing the relational link to other tables in the database. The theme_name
is a text field that will only accept non null entries. In practical terms the theme_id
identifies each unique lego theme and the theme_name
gives the full descriptive name.
Define Sets Table
Now we will use a slightly more complicated CREATE TABLE
statement to define the structure of the lego sets table:
create_sets_sql = """
CREATE TABLE IF NOT EXISTS sets (
set_num TEXT PRIMARY KEY,
set_name TEXT NOT NULL,
set_year INT NOT NULL,
theme_id INT NOT NULL,
num_parts INT NOT NULL,
FOREIGN KEY (theme_id) REFERENCES themes (theme_id)
);
"""
Following the same pattern as above, this statement checks if the sets
table exists, creates if required, and then declares the constraints for five fields. The set_num
is a text field and primary key for the table. The set_name
is a non null text field. The set_year
, theme_id
, and num_parts
are all non null integer fields. The FOREIGN KEY
statement links the theme_id
in the sets
table to primary theme_id
key in the themes
table. Again in practical terms the set_num
identifies each lego set, the set_name
gives the official set name, the set_year
list the year the set was introduced, the num_parts
gives the number of bricks in the set. The theme_id
gives the identifier for the theme the set belongs to and can be used to reference theme data held in the themes
table.
Creating the Tables
Note that both of the above SQL statements are wrapped in triple quotes and assigned to the variables create_theme_sql
and create_sets_sql
. While they contain valid SQL statements, these variables are simply python strings. In order to create the tables the statements need to be passed to the database, evaluated by the SQLite3 engine, and executed. We will use the create_table()
function to carry out this tranasction for both the themes and sets tables:
def create_table(conn, table_sql):
try:
c = conn.cursor()
c.execute(table_sql)
print("Table Created")
except sqlite3.Error as e:
raise e
This function accepts a database connection and a CREATE TABLE
string as arguements. The cursor()
method is called on the connection to generate a cursor object. The cursor is the primary tool for executing SQL statements against a database and capturing the returned response if applicable. The CREATE TABLE
statement is now passed to the cursor’s execute()
method to be evaluated and the table is initialzed. This operation is wrapped in a try/except
block to responsibly handle errors.
Closing the connection
After intitializing the database by opening a connection and then creating tables we need to close the connection. Closing the database connection will explicitly commit the table creation transactions and protect the integrity of the database. It is always a good practice to close a SQLite3 database connection with the close()
method. In our case the connection closure is wrapped in the close_db()
function and includes a bit of error handling:
def close_db(conn):
try:
conn.close()
print("Connection Closed")
except sqlite3.Error as e:
raise e
Full Script
The database initialization and table creation should be performed in the following sequence:
- create connection
- define sets table
- define themes table
- create tables
- close connection
Because we chose to wrap the operations in functions the provisioning sequence can be organized in a main()
function at the end of the script. The main()
function can in turn be fired by running python -m lego-db.py
in a terminal. If all goes as planned print statements detailing each step will print in quick succession. The full script is included below, with the main()
function and corresponding __name__
check.
At this point we have provisioned the lego database and are ready to continue building the lego data pipeline.
# lego-db.py
import sqlite3
def connect_db(db_file):
try:
conn = sqlite3.connect(db_file)
print("Connection Created")
except sqlite3.Error as e:
raise e
return conn
def close_db(conn):
try:
conn.close()
print("Connection Closed")
except sqlite3.Error as e:
raise e
def create_table(conn, table_sql):
try:
c = conn.cursor()
c.execute(table_sql)
print("Table Created")
except sqlite3.Error as e:
raise e
def main():
db_file = "lego.db"
create_themes_sql = """
CREATE TABLE IF NOT EXISTS themes (
theme_id INT PRIMARY KEY,
theme_name TEXT NOT NULL
);
"""
create_sets_sql = """
CREATE TABLE IF NOT EXISTS sets (
set_num TEXT PRIMARY KEY,
set_name TEXT NOT NULL,
set_year INT NOT NULL,
theme_id INT NOT NULL,
num_parts INT NOT NULL,
FOREIGN KEY (theme_id) REFERENCES themes (theme_id)
);
"""
conn = connect_db(db_file)
create_table(conn, create_themes_sql)
create_table(conn, create_sets_sql)
close_db(conn)
if __name__ == "__main__":
main()