Challenge: Average Metrics Across Taxi Types
Great! So far, we have cleaned the dataset of abnormally long rides and of rides whose end time preceded their start time. As we found, these errors came from mixing up the 12-hour and 24-hour time formats.
Let's try to find some interesting insights in this dataset.
Task
- Apply the .total_seconds() method to the duration column using map and a lambda function.
- Group observations by taxi type (the vendor_id column). Then select the dist_meters and duration columns and calculate the mean. Finally, apply the function avg_m to dist_meters and avg_dur to duration. Both functions are defined in the code.
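The first step can be sketched on a small example. The toy DataFrame below is hypothetical and only stands in for the course dataset; it shows how map with a lambda turns each Timedelta into a number of seconds, next to the vectorized .dt accessor that should give the same values.

```python
import pandas as pd

# Hypothetical toy data, not the course dataset
toy = pd.DataFrame({"duration": ["0 days 00:10:00", "0 days 00:01:30", "0 days 01:00:00"]})
toy["duration"] = pd.to_timedelta(toy["duration"])

# Element-wise conversion: each Timedelta becomes a float number of seconds
seconds_via_map = toy["duration"].map(lambda x: x.total_seconds())

# Vectorized equivalent using the .dt accessor
seconds_via_dt = toy["duration"].dt.total_seconds()

print(seconds_via_map.tolist())
print(seconds_via_dt.tolist())
```

The task asks for the map and lambda form; the .dt version is shown only for comparison.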
Solution
# Load libraries
import pandas as pd
from datetime import timedelta
# Loading dataset, creating duration column
url = 'https://drive.google.com/uc?id=1pQCA5C4Yvm86rjUIneefI31LNfoywtrU'
df = pd.read_csv(url, parse_dates = ['pickup_datetime', 'dropoff_datetime', 'dropoff_calculated'])
df["duration"] = pd.to_timedelta(df["duration"])
# Defining functions
avg_m = lambda x: str(round(x/1000, 2)) + ' km'
avg_dur = lambda x: pd.to_timedelta(round(x, 0), unit = "s")
# Task 1 - apply the total_seconds method to the duration column
df['duration'] = df['duration'].map(lambda x: x.total_seconds())
# Task 2 - calculate average distance and duration across taxi types
print(df.groupby('vendor_id')[['dist_meters', 'duration']].mean().agg({'dist_meters': avg_m, 'duration': avg_dur}))
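To see what the grouped result looks like without downloading the dataset, here is a minimal sketch on hypothetical toy data. It reuses the avg_m and avg_dur helpers from the exercise but applies them with Series.map on each averaged column, which is a swapped-in alternative to the agg call above, not the course's own method. The unit is written as lowercase "s", since recent pandas releases appear to warn about the uppercase form.

```python
import pandas as pd

# Hypothetical toy data standing in for the taxi dataset
toy = pd.DataFrame({
    "vendor_id": [1, 1, 2, 2],
    "dist_meters": [1000, 3000, 2000, 6000],
    "duration": [600.0, 1200.0, 300.0, 900.0],  # already in seconds
})

# Same helpers as in the exercise, with a lowercase time unit
avg_m = lambda x: str(round(x / 1000, 2)) + " km"
avg_dur = lambda x: pd.to_timedelta(round(x, 0), unit="s")

# Mean distance and duration per taxi type
means = toy.groupby("vendor_id")[["dist_meters", "duration"]].mean()

# Format each averaged column element by element
summary = pd.DataFrame({
    "dist_meters": means["dist_meters"].map(avg_m),
    "duration": means["duration"].map(avg_dur),
})
print(summary)
```

For these toy rows, vendor 1 averages 2.0 km and 15 minutes, and vendor 2 averages 4.0 km and 10 minutes.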
Section 4. Chapter 8
# Load libraries
import pandas as pd
from datetime import timedelta
# Loading dataset, creating duration column
url = 'https://drive.google.com/uc?id=1pQCA5C4Yvm86rjUIneefI31LNfoywtrU'
df = pd.read_csv(url, parse_dates = ['pickup_datetime', 'dropoff_datetime', 'dropoff_calculated'])
df["duration"] = pd.to_timedelta(df["duration"])
# Defining functions
avg_m = lambda x: str(round(x/1000, 2)) + ' km'
avg_dur = lambda x: pd.to_timedelta(round(x, 0), unit = "s")
# Task 1 - apply the total_seconds method to the duration column
df['duration'] = df['___'].map(lambda x: x.___())
# Task 2 - calculate average distance and duration across taxi types
print(df.___('vendor_id')[['___', '___']].___().agg({'___': ___, 'duration': ___}))