Exploring London's Travel Network – SQL Project (DataCamp Practice)

Summary:


This project was completed as a practice exercise through DataCamp's virtual environment. The project explores patterns in London’s public transport usage using SQL queries on journey data from Transport for London (TfL). Using SQL operations such as SUM(), ROUND(), GROUP BY, and ORDER BY, we analyzed journey volume trends across multiple transport types.

🔍 Project Overview:



The goal was to explore:

  • Which transport types were most popular overall.
  • Which months were the most used for the Emirates Airline.
  • Which years had the lowest usage of Underground & DLR combined.

Photo by bedneyimages on Freepik

Table Overview: TFL.JOURNEYS

ColumnDescriptionData Type
MONTHMonth in number format, e.g., 1 equals JanuaryINTEGER
YEARYearINTEGER
DAYSNumber of days in the given monthINTEGER
REPORT_DATEDate that the data was reportedDATE
JOURNEY_TYPEMethod of transport usedVARCHAR
JOURNEYS_MILLIONSMillions of journeys, measured in decimalsFLOAT

SQL Queries:


                        -- most_popular_transport_types
                        SELECT JOURNEY_TYPE,
                            SUM(JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS
                        FROM TFL.JOURNEYS
                        GROUP BY JOURNEY_TYPE
                        ORDER BY TOTAL_JOURNEYS_MILLIONS DESC;
                    

This query calculates the total number of journeys made by each transport type by summing up the JOURNEYS_MILLIONS column. Results are grouped by JOURNEY_TYPE and sorted in descending order to identify the most popular transport modes in London.


                        -- emirates_airline_popularity
                        SELECT MONTH,
                            YEAR,
                            ROUND(SUM(JOURNEYS_MILLIONS), 2) AS ROUNDED_JOURNEYS_MILLIONS
                        FROM TFL.JOURNEYS
                        WHERE JOURNEY_TYPE = 'Emirates Airline' AND JOURNEYS_MILLIONS IS NOT NULL
                        GROUP BY MONTH, YEAR
                        ORDER BY ROUNDED_JOURNEYS_MILLIONS DESC
                        LIMIT 5;
                    

This query retrieves the top 5 month-year combinations where the Emirates Airline had the highest number of journeys. The query uses ROUND() to format the values and GROUP BY on MONTH and YEAR.


                        -- least_popular_years_tube
                        SELECT YEAR,
                            JOURNEY_TYPE,
                            SUM(JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS
                        FROM TFL.JOURNEYS
                        WHERE JOURNEY_TYPE LIKE '%Underground%' OR JOURNEY_TYPE LIKE '%DLR%'
                        AND JOURNEYS_MILLIONS IS NOT NULL
                        GROUP BY YEAR, JOURNEY_TYPE
                        ORDER BY TOTAL_JOURNEYS_MILLIONS ASC
                        LIMIT 5;
                    

This query identifies the five lowest-performing years in terms of journey volume for transport types containing Underground or DLR. SUM() is used to aggregate journey volume by year and type.

Key Finding:

  • London's buses remain the most widely used mode of transport overall.
  • Emirates Airline experienced peaks in journey numbers in the warmer months, suggesting tourist traffic.
  • Underground & DLR usage saw the lowest volumes in early 2000s and during pandemic-related disruptions.

Thanks to DataCamp for providing the dataset and learning environment for this SQL project.