Counting Lego Bricks
Lego is a big deal around the house and we maintain a running list of all sets acquired over the years. It is pretty easy to track the total number of sets in the collection (currently 62) by just counting up set numbers. From this knowledge a deeper a question was asked:
We know how many Lego sets we have, but how many bricks are in the collection?
With sets on display and large boxes of bricks counting by hand was not an option. Enter the Rebrickable Lego Database. This database contains data on every set, brick, and minifig released by Lego ever. The best part is that the database is accessible via an API. So with a bit of Python we can responsibly interact with the API and count the number of bricks in the collection.
Below is a simple workflow for requesting set data and counting bricks:
- Acquire a Rebrickable API token
- Set up a Rebrickable
GET
request - Request data for a list of sets
- Convert response into a
dataframe
- Count bricks and report results
Acquire API token
Rebrickable requires a unique token to access the Lego database API. A token can can be aquired by:
- Make a Rebrickable account here and login
- Navigate to your account settings and select API
- Copy your 32 character key to an
.env
file - Place the
.env
file in the same directory aslego_api.py
The .env
should have the following format:
KEY=<copy 32 character key here>
API keys are sensitive information and should be obscured. Using an .env
file will keep the key out of source code and secure access to the Rebrickable API.
Now the key can be accessed in a Python script as an environment variable using the os
and dotenv
modules:
# lego_api.py
import os
from dotenv import load_dotenv
# load api key
load_dotenv()
KEY = os.getenv("KEY")
# set headers
headers = {
"Accept": "application/json",
"Authorization": "key " + KEY
}
Once the key is loaded and assigned to KEY
it can be used to construct a request header required to to authenticate with the API. Documentation on dotenv
can be found at the package repository and os
can be referenced in the standard library.
Set up GET request
Now that the API key is set we can construct a GET
request with the requests
package to pull set data from Rebrickable. One of the quirks of the interface is that data can only be returned for one set at a time. So a unique GET
request needs to be made for each set in the collection.
A helper function is used to build the unique request for a given set and return the data:
# lego-api.py
import requests
# helper to run request
def get_sets(set_id):
url = f"https://rebrickable.com/api/v3/lego/sets/{set_id}"
req = requests.get(url=url, headers=headers)
time.sleep(1.1)
return req.json()
The set id is passed as an arguement and then formatted into url string representing the body of the request. Next the request is executed with the url string and authorization header. The response is saved to a variable. Now the function sleeps for 1.1 seconds to avoid abusing the API. The data is extracted from the response as json and returned.
In Python json is effectively treated as a dictionary. The returned set data has the following form:
{
"set_num": "31062-1",
"name": "Robo Explorer",
"year": 2017,
"theme_id": 672,
"num_parts": 205,
"set_img_url": "https://cdn.rebrickable.com/media/sets/31062-1/9935.jpg",
"set_url": "https://rebrickable.com/sets/31062-1/robo-explorer/",
"last_modified_dt": "2018-11-29T23:47:51.919445Z"
}
There are several bits of interesting information returned in the set data blob. The num_parts
key wil be directly used to answer our question.
Request set data
Now that the get_sets()
helper function is in place requests can be efficiently executed for every set number in the collection.
With the help of pandas
and a list comprehension set ids are loaded into a list and get_sets()
is executed for every set in the collection:
# lego-api.py
import pandas
# load set ids
set_ids = pandas.read_csv("./input_data/sets.csv")["set_num"].tolist()
# request data for each set
set_data = [get_sets(set_id) for set_id in set_ids]
The output assigned to set_data
is a list of dictionaries. Every dictionary has the same keys and values unique to the set.
Convert to dataframe and count bricks
Now the accumulated set_data
can be converted to a dataframe and the num_parts
column can be summed to obtain a total brick count:
# lego-api.py
# convert request data to df
set_df = pandas.DataFrame(set_data)
# count sets
sets = len(set_ids)
# count bricks
bricks = set_df["num_parts"].sum()
# report
print(f"Total bricks in {sets} set collection: {bricks}")
The result is:
>>> Total bricks in 62 set collection: 23653
That is a lot of bricks.
The full script is given below. This is not the most rigourous approach, but it gets the job done. If this was production grade code the request response codes would be checked, errors would be handled, and there would be unit tests.
The next question is what to do with the blob of data acquired from the Rebrickbable API. One option is dropping set data for the collection into a SQLite
database. From that point it is possible to set up a pipeline to request set data and insert into the database for each new set added to the collection.
That may be a post for another time.
Full script
# lego-api.py
import os
import requests
import pandas
from dotenv import load_dotenv
# load api key
load_dotenv()
KEY = os.getenv("KEY")
# set headers
headers = {
"Accept": "application/json",
"Authorization": "key " + KEY
}
# helper to run request
def get_sets(set_id):
url = f"https://rebrickable.com/api/v3/lego/sets/{set_id}"
req = requests.get(url=url, headers=headers)
time.sleep(1.1)
return req.json()
# load set ids
set_ids = pandas.read_csv("./input_data/sets.csv")["set_num"].tolist()
# request data for each set
set_data = [get_sets(set_id) for set_id in set_ids]
# convert request data to df
set_df = pandas.DataFrame(set_data)
# count sets
sets = len(set_ids)
# count bricks
bricks = set_df["num_parts"].sum()
# report
print(f"Total bricks in {sets} set collection: {bricks}")