Learn Challenge: Average Metrics Across Taxi Types | Working with Dates and Times in pandas
Dealing with Dates and Times in Python

Challenge: Average Metrics Across Taxi Types

Great! So far, we have cleaned the dataset of abnormally long rides and of rides whose end time preceded the start time. As we found, these records appeared because the 12-hour and 24-hour time formats were mixed up.

Let's try to find out some interesting insights from this dataset.

Task


  1. Apply the .total_seconds() method to each value of the duration column using map with a lambda function.
  2. Group the observations by taxi type (the vendor_id column). Then select the dist_meters and duration columns and calculate their means. Finally, apply the function avg_m to dist_meters and avg_dur to duration. Both functions are defined in the code.
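
Step 1 can be tried on a small example first. The sketch below uses a hypothetical toy DataFrame (not the taxi dataset) to show what map with a lambda does to a timedelta column, and compares it with the vectorised .dt.total_seconds() accessor, which should give the same numbers.

```python
import pandas as pd

# Hypothetical toy data standing in for the taxi dataset
toy = pd.DataFrame({
    "vendor_id": ["A", "A", "B"],
    "duration": pd.to_timedelta(["00:10:00", "00:20:00", "00:05:30"]),
})

# Element-wise conversion: each value is a pd.Timedelta,
# so .total_seconds() returns its length in seconds as a float
seconds_via_map = toy["duration"].map(lambda x: x.total_seconds())

# Vectorised equivalent using the .dt accessor
seconds_via_dt = toy["duration"].dt.total_seconds()

print(seconds_via_map.tolist())
print(seconds_via_dt.tolist())
```

Both approaches turn the column into plain floats, which is what allows the mean to be rounded and converted back to a timedelta later.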

Solution

# Load libraries
import pandas as pd
from datetime import timedelta

# Loading dataset, creating duration column
url = 'https://drive.google.com/uc?id=1pQCA5C4Yvm86rjUIneefI31LNfoywtrU'
df = pd.read_csv(url, parse_dates = ['pickup_datetime', 'dropoff_datetime', 'dropoff_calculated'])
df["duration"] = pd.to_timedelta(df["duration"])

# Defining functions
avg_m = lambda x: str(round(x/1000, 2)) + ' km'
avg_dur = lambda x: pd.to_timedelta(round(x, 0), unit = "s")

# Task 1 - apply the total_seconds method to the duration column
df['duration'] = df['duration'].map(lambda x: x.total_seconds())

# Task 2 - calculate average distance and duration across taxi types
print(df.groupby('vendor_id')[['dist_meters', 'duration']].mean().agg({'dist_meters': avg_m, 'duration': avg_dur}))
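
The solution relies on .agg calling avg_m and avg_dur on each group mean. As a sketch of an alternative that formats each column explicitly, the example below uses a hypothetical toy DataFrame with made-up values, where duration is already in seconds. The names toy, means, and formatted are illustrative only.

```python
import pandas as pd

# Hypothetical toy data; duration is already expressed in seconds
toy = pd.DataFrame({
    "vendor_id": ["A", "A", "B"],
    "dist_meters": [1000.0, 3000.0, 2500.0],
    "duration": [600.0, 1200.0, 330.0],
})

# Mean distance and duration per taxi type
means = toy.groupby("vendor_id")[["dist_meters", "duration"]].mean()

# Format each column on its own: kilometres as text, seconds as timedeltas
formatted = pd.DataFrame({
    "dist_meters": means["dist_meters"].map(
        lambda m: str(round(float(m) / 1000, 2)) + " km"
    ),
    "duration": pd.to_timedelta(means["duration"].round(0), unit="s"),
})

print(formatted)
```

Formatting column by column keeps the duration column as a real timedelta type, so it can still be compared or sorted after display.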


Section 4. Chapter 8
# Load libraries
import pandas as pd
from datetime import timedelta

# Loading dataset, creating duration column
url = 'https://drive.google.com/uc?id=1pQCA5C4Yvm86rjUIneefI31LNfoywtrU'
df = pd.read_csv(url, parse_dates = ['pickup_datetime', 'dropoff_datetime', 'dropoff_calculated'])
df["duration"] = pd.to_timedelta(df["duration"])

# Defining functions
avg_m = lambda x: str(round(x/1000, 2)) + ' km'
avg_dur = lambda x: pd.to_timedelta(round(x, 0), unit = "s")

# Task 1 - apply the total_seconds method to the duration column
df['duration'] = df['___'].map(lambda x: x.___())

# Task 2 - calculate average distance and duration across taxi types
print(df.___('vendor_id')[['___', '___']].___().agg({'___': ___, 'duration': ___}))
