Uber Technologies, Inc., commonly known as Uber, is an American multinational ride-hailing company offering services that include peer-to-peer ridesharing, ride service hailing, food delivery (Uber Eats), and a micromobility system with electric bikes and scooters. The company is based in San Francisco and has operations in over 785 metropolitan areas worldwide. Its platforms can be accessed via its websites and mobile apps.
There are more than 75 million active Uber riders across the world, and Uber is available in more than 80 countries. Uber has completed more than 5 billion rides to date, and over 3 million people drive for it. In the United States, Uber fulfills 40 million rides per month. The average Uber driver earns $364 per month.
This project is mainly a data visualization exercise that will guide you through using matplotlib's ggplot style to understand the data. The style is built for making professional-looking plots quickly with minimal code.
We will be working with the Uber drives dataset. The dataset is available on Kaggle, or you can download the copy used in this project from here: UBER Dataset. This dataset contains 7 columns and 1156 entries.
We can perform our project analysis in four steps.
Let us start our project.
Import relevant libraries:
import pandas as pd
import numpy as np
import datetime
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
matplotlib.style.use('ggplot')
import calendar
Read the dataset and store it in a variable:
dataset=pd.read_csv('Uber_data.csv')
Let's see the first few elements of the dataset:
dataset.head()
Let us now see the last few elements of the dataset:
dataset.tail()
The concise summary of our DataFrame is:
dataset.info() #Nearly 1156 entries, ~63.3+ KB
Remove the unnecessary last row (in this dataset it is a totals row, not a trip):
dataset=dataset[:-1]
Show all the null values per column:
dataset.isnull().sum()
Representing these null values in a heatmap:
sns.heatmap(dataset.isnull(),yticklabels=False,cmap="viridis")
Drop/Remove the null values from the dataset and represent it in a heatmap:
dataset=dataset.dropna()
sns.heatmap(dataset.isnull(),yticklabels=False,cmap="viridis")
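Dropping every row that contains a null can discard a large share of the data when one column is sparsely filled. The following is a minimal sketch on a toy frame (the values are hypothetical; only the column names follow the dataset) that compares `dropna()` with filling the missing purpose with a placeholder label. The `'Unknown'` label is an assumption made for illustration, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; column names mirror the Uber drives dataset
toy = pd.DataFrame({
    'CATEGORY*': ['Business', 'Business', 'Personal', 'Business'],
    'MILES*': [5.1, 4.8, 7.5, 2.0],
    'PURPOSE*': ['Meeting', np.nan, np.nan, 'Customer Visit'],
})

null_counts = toy.isnull().sum()               # nulls per column
dropped = toy.dropna()                         # removes every row with a null
filled = toy.fillna({'PURPOSE*': 'Unknown'})   # keeps all rows, labels the gap

print(null_counts)
print(len(toy), len(dropped), len(filled))
```

Which option fits depends on the question being asked: plots of miles or hours do not need a purpose, while the purpose plots do.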
Convert the start and end date columns from strings to datetime values:
dataset['START_DATE*'] = pd.to_datetime(dataset['START_DATE*'], format="%m/%d/%Y %H:%M")
dataset['END_DATE*'] = pd.to_datetime(dataset['END_DATE*'], format="%m/%d/%Y %H:%M")
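To see what this format string does, here is a small sketch on three sample strings (the sample values are made up for illustration). It also passes `errors='coerce'`, which is not used in the tutorial code, so that a value that does not match the format becomes `NaT` instead of raising an error.

```python
import pandas as pd

# Sample strings in the same month/day/year hour:minute layout as the dataset
raw = pd.Series(['1/1/2016 21:11', '12/31/2016 1:07', 'Totals'])

# errors='coerce' turns unparseable values into NaT rather than raising
parsed = pd.to_datetime(raw, format="%m/%d/%Y %H:%M", errors='coerce')

print(parsed)
```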
Extract the hour, day, day of week, month and weekday name from the start date of each trip:
hour=[]
day=[]
dayofweek=[]
month=[]
weekday=[]
for x in dataset['START_DATE*']:
    hour.append(x.hour)
    day.append(x.day)
    dayofweek.append(x.dayofweek)
    month.append(x.month)
    # map the numeric day of week (0 = Monday) to its name
    weekday.append(calendar.day_name[dayofweek[-1]])
dataset['HOUR']=hour
dataset['DAY']=day
dataset['DAY_OF_WEEK']=dayofweek
dataset['MONTH']=month
dataset['WEEKDAY']=weekday
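The same columns can be derived without an explicit loop by using the pandas `.dt` accessor on a datetime column. This is an alternative sketch on a two-row toy frame (the `trips` name and its values are illustrative), not the approach used in the tutorial code.

```python
import pandas as pd

# Two hypothetical trips; the column name follows the dataset
trips = pd.DataFrame({
    'START_DATE*': pd.to_datetime(['2016-01-01 21:11', '2016-02-06 08:30'])
})

start = trips['START_DATE*']
trips['HOUR'] = start.dt.hour
trips['DAY'] = start.dt.day
trips['DAY_OF_WEEK'] = start.dt.dayofweek   # 0 = Monday
trips['MONTH'] = start.dt.month
trips['WEEKDAY'] = start.dt.day_name()

print(trips)
```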
Find the travelling time in minutes using the method below:
time=[]
dataset['TRAVEL_TIME']=dataset['END_DATE*']-dataset['START_DATE*']
for i in dataset['TRAVEL_TIME']:
    # total_seconds() includes whole days; .seconds alone would drop them
    time.append(i.total_seconds()/60)
dataset['TRAVEL_TIME']=time
dataset.head()
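A timedelta exposes both a `seconds` component and a `total_seconds()` method, and the two differ as soon as a duration exceeds one day. The sketch below, on two made-up trips, shows the difference and computes the minutes in a vectorized way.

```python
import pandas as pd

# Hypothetical start and end times; the second trip spans more than a day
start = pd.Series(pd.to_datetime(['2016-01-01 21:11', '2016-01-01 23:50']))
end = pd.Series(pd.to_datetime(['2016-01-01 21:17', '2016-01-03 00:20']))

delta = end - start
minutes_total = delta.dt.total_seconds() / 60   # full duration in minutes
minutes_component = delta.dt.seconds / 60       # ignores the whole-day part

print(minutes_total.tolist())
print(minutes_component.tolist())
```

For trips that all finish within a day the two results agree, so the distinction only matters for unusually long records.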
Calculate the average speed in miles per hour:
# convert the travel time from minutes to hours
dataset['TRAVEL_TIME']=dataset['TRAVEL_TIME']/60
dataset['SPEED']=dataset['MILES*']/dataset['TRAVEL_TIME']
dataset.head()
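If a trip has identical start and end times, its travel time is zero and the division yields an infinite speed, which then distorts summary statistics. One possible guard, sketched here on toy values, is to treat a zero duration as missing before dividing. Whether such rows exist in the dataset is not stated in this tutorial, so take this as a precaution rather than a required step.

```python
import numpy as np
import pandas as pd

# Hypothetical trips; TRAVEL_TIME is in hours, as in the step above
trips = pd.DataFrame({
    'MILES*': [5.0, 3.0, 10.0],
    'TRAVEL_TIME': [0.25, 0.0, 0.5],
})

# Replace zero durations with NaN so the speed becomes NaN instead of inf
hours = trips['TRAVEL_TIME'].replace(0, np.nan)
trips['SPEED'] = trips['MILES*'] / hours

print(trips)
```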
Describe the dataset:
dataset.describe()
Make sure the date columns have the datetime type:
def convert_time(column_name):
    # parse the column's own values; values that are already datetimes are left unchanged
    dataset[column_name] = pd.to_datetime(dataset[column_name], format="%m/%d/%Y %H:%M")
for column_name in ['START_DATE*','END_DATE*']:
    convert_time(column_name)
Plot the trip categories. The resulting visualization shows that most trips fall in the business category rather than the personal category.
x = dataset['CATEGORY*'].value_counts().plot(kind='bar')
Histogram for miles travelled:
dataset['MILES*'].plot.hist()
Plot the purpose of the trips. The resulting graph shows that meetings were the most common purpose.
dataset['PURPOSE*'].value_counts().plot(kind='bar',figsize=(10,5),color='blue')
Plot the trips per hour of the day:
dataset['HOUR'].value_counts().plot(kind='bar',figsize=(10,5),color='green')
Plot the trips per day of the week. Most of the trips are on Friday.
dataset['WEEKDAY'].value_counts().plot(kind='bar',color='green')
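`value_counts()` sorts by frequency, so the bars appear from most to least common rather than from Monday to Sunday. If calendar order is preferred, the counts can be reindexed before plotting. The sketch below uses a made-up list of weekday names and stops short of plotting so that it runs without a display.

```python
import calendar
import pandas as pd

# Hypothetical weekday labels standing in for dataset['WEEKDAY']
weekdays = pd.Series(['Friday', 'Monday', 'Friday', 'Sunday', 'Monday', 'Friday'])

# Reindex to Monday..Sunday; days with no trips get a count of 0
counts = weekdays.value_counts().reindex(list(calendar.day_name), fill_value=0)

print(counts)
```

Calling `counts.plot(kind='bar')` on the result draws the bars in calendar order. For numeric columns such as `HOUR`, `value_counts().sort_index()` gives the analogous ordering.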
Plot the trips per day in a month:
dataset['DAY'].value_counts().plot(kind='bar',figsize=(15,5),color='green')
Plot the trips carried out in each month:
dataset['MONTH'].value_counts().plot(kind='bar',figsize=(10,5),color='green')
Plot the starting point of the trips. Most of them start from Cary.
dataset['START*'].value_counts().plot(kind='bar',figsize=(25,5),color='red')
Compare the trip purposes by their average miles, hour, day of week, month, travel time and speed:
dataset.groupby('PURPOSE*').mean(numeric_only=True).plot(kind='bar',figsize=(15,5))
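In recent pandas releases, calling `mean()` on a grouped frame that still contains text columns raises an error, so it helps to restrict the aggregation to numeric columns. The sketch below shows this on a toy frame with hypothetical values.

```python
import pandas as pd

# Hypothetical trips mixing text and numeric columns
trips = pd.DataFrame({
    'PURPOSE*': ['Meeting', 'Meeting', 'Errand/Supplies'],
    'START*': ['Cary', 'Morrisville', 'Cary'],
    'MILES*': [10.0, 20.0, 4.0],
    'SPEED': [30.0, 40.0, 20.0],
})

# numeric_only=True skips the text column START* instead of failing on it
summary = trips.groupby('PURPOSE*').mean(numeric_only=True)

print(summary)
```

Because the averaged columns have very different scales (miles, hours, speed), plotting a few of them at a time may be easier to read than a single combined bar chart.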
In this project, we aimed to expose the interesting insights that can be derived from a detailed analysis of the dataset. The aim was to visualize Uber ridership by plotting the trip data. Several types of graphical representation were used: histograms, bar plots and heatmaps. We made use of matplotlib's ggplot style, together with pandas and seaborn, to produce these visualizations.