Challenge: Corrected Metrics Across Taxi Types
The average trip duration across taxi types looks suspicious. Every taxi type has an average trip duration of more than 1 hour (most of them more than 2 hours), while the average distance is less than 10 km. That's extremely slow!
Let's correct for this, assuming that some noisy data remain in the dataset.
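To make "extremely slow" concrete, here is a rough implied-speed check. It uses only the bounds quoted above (under 10 km, over 1 or 2 hours), not values computed from the dataset, and the helper name implied_speed_kmh is made up for this illustration.

```python
def implied_speed_kmh(distance_km, duration_hours):
    """Speed implied by an average distance and an average duration."""
    return distance_km / duration_hours

# Bounds quoted in the text, not values computed from the dataset
max_avg_distance_km = 10

for hours in (1, 2):
    speed = implied_speed_kmh(max_avg_distance_km, hours)
    print(f"{hours} h average duration -> under {speed:.1f} km/h")
```

A ratio of averages is only a rough indicator, but speeds this low are close to walking pace, which suggests that a few very long durations are inflating the averages.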
Task
- Within the print function, calculate the proportion of long trips (with a duration of at least 3 hours). Remember that the duration column is measured in seconds.
- Calculate the average trip distance (dist_meters) and trip duration (duration) for each taxi type (vendor_id column) for trips with a duration of less than 3 hours.
Solution
# Load libraries
import pandas as pd
from datetime import timedelta
# Load the dataset and parse the duration column as timedeltas
url = 'https://drive.google.com/uc?id=1pQCA5C4Yvm86rjUIneefI31LNfoywtrU'
df = pd.read_csv(url, parse_dates = ['pickup_datetime', 'dropoff_datetime', 'dropoff_calculated'])
df["duration"] = pd.to_timedelta(df["duration"])
# Formatting helpers: meters -> 'x.xx km', seconds -> Timedelta
avg_m = lambda x: str(round(x/1000, 2)) + ' km'
avg_dur = lambda x: pd.to_timedelta(round(x, 0), unit = "s")
# Convert duration from timedelta to seconds
df['duration'] = df['duration'].dt.total_seconds()
# Task 1 - calculate proportion of long trips (hours >= 3)
prop = round(len(df[df['duration']//3600 >= 3])/len(df) * 100, 2)
print(f"{prop}% of observations have a trip duration of at least 3 hours.")
# Task 2 - calculate the average stats for filtered dataset (numeric columns only)
print(df[df['duration']//3600 < 3].groupby('vendor_id').mean(numeric_only = True).agg({'dist_meters': avg_m, 'duration': avg_dur}))
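The two formatting helpers in the solution can be tried in isolation to see what they return. The sketch below uses named functions, format_km and format_duration, as stand-ins for avg_m and avg_dur. Those names and the sample inputs are made up for illustration.

```python
import pandas as pd

def format_km(meters):
    """Convert meters to a string in kilometers, rounded to 2 decimals."""
    return str(round(meters / 1000, 2)) + " km"

def format_duration(seconds):
    """Round to whole seconds and convert to a pandas Timedelta."""
    return pd.to_timedelta(round(seconds), unit="s")

# Made-up sample values, not taken from the dataset
print(format_km(3456.7))
print(format_duration(754.6))
```

Returning a Timedelta rather than raw seconds makes the averaged durations readable as hours, minutes and seconds.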
Starter code (fill in the blanks marked ___)
# Load libraries
import pandas as pd
from datetime import timedelta
# Load the dataset and parse the duration column as timedeltas
url = 'https://drive.google.com/uc?id=1pQCA5C4Yvm86rjUIneefI31LNfoywtrU'
df = pd.read_csv(url, parse_dates = ['pickup_datetime', 'dropoff_datetime', 'dropoff_calculated'])
df["duration"] = pd.to_timedelta(df["duration"])
# Formatting helpers: meters -> 'x.xx km', seconds -> Timedelta
avg_m = lambda x: str(round(x/1000, 2)) + ' km'
avg_dur = lambda x: pd.to_timedelta(round(x, 0), unit = "s")
# Convert duration from timedelta to seconds
df['duration'] = df['duration'].dt.total_seconds()
# Task 1 - calculate proportion of long trips (hours >= 3)
prop = round(len(df[df['___']__3600 ___ 3])/len(df) * 100, 2)
print(f"{prop}% of observations have a trip duration of at least 3 hours.")
# Task 2 - calculate the average stats for filtered dataset (numeric columns only)
print(df[df['___']__3600 < 3].___('___').mean(numeric_only = True).___({'dist_meters': ___, '___': ___}))