This project was completed as a practice exercise through DataCamp's virtual environment. The project explores patterns in London’s public transport usage using SQL queries on journey data from Transport for London (TfL).
Using SQL operations such as SUM()
, ROUND()
, GROUP BY
, and ORDER BY
, we analyzed journey volume trends across multiple transport types.
The dataset was sourced from DataCamp and reflects public transport journey data maintained by Transport for London (TfL). This includes usage statistics for various methods of transport such as the London Underground, Emirates Airline, and DLR (Docklands Light Railway).
The goal was to explore:
transport types
were most popular overall.Emirates Airline
.lowest usage
of Underground & DLR combined.Photo by bedneyimages on Freepik
TFL.JOURNEYS
Column | Description | Data Type |
---|---|---|
MONTH | Month in number format, e.g., 1 equals January | INTEGER |
YEAR | Year | INTEGER |
DAYS | Number of days in the given month | INTEGER |
REPORT_DATE | Date that the data was reported | DATE |
JOURNEY_TYPE | Method of transport used | VARCHAR |
JOURNEYS_MILLIONS | Millions of journeys, measured in decimals | FLOAT |
-- most_popular_transport_types
SELECT JOURNEY_TYPE,
SUM(JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS
FROM TFL.JOURNEYS
GROUP BY JOURNEY_TYPE
ORDER BY TOTAL_JOURNEYS_MILLIONS DESC;
This query calculates the total number of journeys made by each transport type by summing up the JOURNEYS_MILLIONS
column.
Results are grouped by JOURNEY_TYPE
and sorted in descending order to identify the most popular transport modes in London.
-- emirates_airline_popularity
SELECT MONTH,
YEAR,
ROUND(SUM(JOURNEYS_MILLIONS), 2) AS ROUNDED_JOURNEYS_MILLIONS
FROM TFL.JOURNEYS
WHERE JOURNEY_TYPE = 'Emirates Airline' AND JOURNEYS_MILLIONS IS NOT NULL
GROUP BY MONTH, YEAR
ORDER BY ROUNDED_JOURNEYS_MILLIONS DESC
LIMIT 5;
This query retrieves the top 5 month-year combinations where the Emirates Airline had the highest number of journeys.
The query uses ROUND()
to format the values and GROUP BY
on MONTH
and YEAR
.
-- least_popular_years_tube
SELECT YEAR,
JOURNEY_TYPE,
SUM(JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS
FROM TFL.JOURNEYS
WHERE JOURNEY_TYPE LIKE '%Underground%' OR JOURNEY_TYPE LIKE '%DLR%'
AND JOURNEYS_MILLIONS IS NOT NULL
GROUP BY YEAR, JOURNEY_TYPE
ORDER BY TOTAL_JOURNEYS_MILLIONS ASC
LIMIT 5;
This query identifies the five lowest-performing years in terms of journey volume for transport types containing Underground
or DLR
.
SUM()
is used to aggregate journey volume by year and type.
Thanks to DataCamp for providing the dataset and learning environment for this SQL project.