Cyclistic Bicycle Case Study

Introduction


As part of my Google Data Analytics Certificate, I completed a case study to represent all the new skills I learned throughout the course. In this case study, I was given a business task from a fictional company: Cyclistic, a bike-share company in Chicago.


The Scenario


The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. Therefore, the team needs to design marketing strategies aimed at converting casual riders into annual members. In order to do that, the marketing analyst team needs to first better understand how annual members and casual riders differ.


The Data


While the Cyclistic company is fictional, the data used has been made available by a real organization, Motivate International Inc., under this license. Therefore, I downloaded all of the 2020 trip data and used it to study trends. Besides the typical errors in the data, i.e. typos, misinformation, etc., the biggest issue with this public data set is data-privacy issues prohibit using riders' personally identifiable information. Meaning, I have no way to determine where the casual riders live (if they are locals or tourists), or how many times a specific person has used Cyclistic as a casual rider.

The Process

After importing and merging the data into Google's BigQuery console, I used SQL to clean, manipulate, and analyze the data.

Cleaning & Manipulation


It was important to import each dataset individually by month due to their large file size. Once imported into BigQuery, I used the UNION ALL clause to merge three data tables into one table for each quarter of the year. For example, the tables for April, May, and June were joined to create the second fiscal quarter. After the data was grouped by seasons of the year, I was ready to clean the data.


First, I created a “ride_length” column to calculate the length of each trip for each user. Calculated the length of each ride in seconds using the DATETIME_DIFF function.

Example: DATETIME_DIFF(ended_at, started_at, second) AS ride_length


This revealed that there were errors in the data. Ten percent of all trips came back with a negative length of time. The error was simply that the start time and end time were reversed. Therefore, I isolated the negative trips and deleted them from each quarter.

Example: DELETE FROM Q4_Negative_Trips WHERE ride_length <= 0


Ensured that all terminology was the same throughout the sheets and created synchronicity within the data. For example, changing subscriber to member and customer to casual. Next, I created another column to indicate the day of the week each trip takes place using the EXTRACT function.

Example: SELECT extract(DAYOFWEEK FROM started_at) AS day_of_week (1 indicates Sunday and 7 indicates Saturday)


Finally, I deleted extraneous columns throughout each quarter of data. Only ride_id, rideable_type, started_at, ended_at, ride_length, day_of_week, start_station_name, end_station_name, and member_casual remained.

Summary of Analysis


The SQL query:

SELECT

member_casual,

AVG(ride_length) AS avg_ride_length,

MIN(ride_length) AS min_ride_length,

MAX(ride_length) AS max_ride_length,

APPROX_TOP_COUNT(day_of_week, 1) [OFFSET (0)].value AS mode_day_of_week,

APPROX_TOP_COUNT(month, 1) [OFFSET (0)].value AS mode_month

FROM `eighth-anvil-337821.Bicycle_Case_Study.All_Trips_2020_Clean`

GROUP BY member_casual

Summary Analysis

Visualizations

Conclusion

From this analysis, it can be concluded that casual Cyclistic riders use the bikes most frequently during the summer months and weekends. However, while the casual riders use the bikes less often throughout the entire year, when they use the bikes the rides are longer than annual member rides. Annual members use the bikes much more throughout the week but for shorter trips.

Moving forward, it would be beneficial to retrieve more data on the casual and annual members. Specifically, where each rider calls home. From the data we have now, it is possible that annual members are locals from Chicago who use the bikes to commute and do daily errands; while the casual riders appear to be tourists probably from out of town. Therefore, the marketing team needs to target more local bike riders in the Chicago area to become annual members. Perhaps by creating a points/rewards system for annual members, this will attract more members. Points could be calculated depending on how many rides a biker takes per week or how many miles a rider bikes per week. Then the points can add up to special rewards that interest the bikers.

Bicycle Case Study