Challenge: Corrected Metrics Across Taxi Types | Working with Dates and Times in pandas
Dealing with Dates and Times in Python


The average trip durations across taxi types look suspicious. Every taxi type has an average trip duration above 1 hour (most are even above 2 hours), while the average distance is under 10 km. That implies average speeds below 10 km/h, which is extremely slow for a taxi!
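The "extremely slow" verdict is simple arithmetic: average speed is distance divided by time. A minimal sketch using the bounds quoted above (10 km in 1 hour, 10 km in 2 hours) rather than the dataset's actual averages; the helper name `implied_speed_kmh` is hypothetical.

```python
def implied_speed_kmh(dist_meters: float, duration_seconds: float) -> float:
    """Average speed in km/h from a distance in meters and a duration in seconds."""
    return (dist_meters / 1000) / (duration_seconds / 3600)

# Bounds from the text: under 10 km covered in over 1 hour / over 2 hours
print(implied_speed_kmh(10_000, 3600))      # 10.0
print(implied_speed_kmh(10_000, 2 * 3600))  # 5.0
```

Because the real averages are under 10 km and over 1 hour, the actual speeds are lower still; for the types averaging over 2 hours that is roughly walking pace.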

Let's correct this by assuming that not all noisy data were removed: some records with implausibly long durations are still in the dataset and inflate the averages.
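Why a few leftover records matter: the mean is not robust to outliers, so a handful of implausibly long trips can pull the average far above a typical trip. A toy sketch with made-up durations (not taken from the taxi dataset):

```python
import pandas as pd

# Hypothetical durations in seconds: four ordinary trips plus one
# leftover noisy record of 24 hours (86,400 s)
durations = pd.Series([1200, 1800, 2400, 1800, 86_400])

print(durations.mean() / 3600)                        # 5.2 (hours, outlier included)
print(durations[durations < 3 * 3600].mean() / 3600)  # 0.5 (hours, trips under 3 h only)
```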

Task

  1. Calculate the proportion of long trips (trips lasting at least 3 hours) as a percentage, so it can be printed in the message. Remember that the duration column is measured in seconds.
  2. Calculate the average trip distance (dist_meters) and trip duration (duration) for each taxi type (vendor_id column), using only trips shorter than 3 hours.
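The two steps above can be sketched on a tiny made-up DataFrame that mimics the challenge's columns. This sketch uses `mask.mean()` in place of `len(filtered) / len(df)`, and compares against `3 * 3600` seconds instead of floor-dividing by 3600; both forms select the same rows. The data and the `THREE_HOURS` constant are illustrative assumptions.

```python
import pandas as pd

# Hypothetical mini-dataset with the columns this challenge uses;
# duration starts as a timedelta, as it does after pd.to_timedelta
df = pd.DataFrame({
    "vendor_id": [1, 1, 2, 2],
    "dist_meters": [2000, 4000, 6000, 50000],
    "duration": pd.to_timedelta(["00:10:00", "00:20:00", "00:30:00", "05:00:00"]),
})

# Convert timedeltas to seconds with the vectorized .dt accessor
df["duration"] = df["duration"].dt.total_seconds()

THREE_HOURS = 3 * 3600  # threshold expressed in seconds

# Task 1: percentage of trips lasting at least 3 hours.
# The mean of a boolean mask equals the fraction of True values.
prop = round((df["duration"] >= THREE_HOURS).mean() * 100, 2)
print(prop)  # 25.0

# Task 2: per-vendor averages computed on trips shorter than 3 hours only
short = df[df["duration"] < THREE_HOURS]
stats = short.groupby("vendor_id")[["dist_meters", "duration"]].mean()
print(stats)
```

Selecting the two numeric columns before `.mean()` keeps non-numeric columns out of the aggregation.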

Solution

# Load libraries
import pandas as pd

# Load the dataset and convert the duration column to timedelta
url = 'https://drive.google.com/uc?id=1pQCA5C4Yvm86rjUIneefI31LNfoywtrU'
df = pd.read_csv(url, parse_dates = ['pickup_datetime', 'dropoff_datetime', 'dropoff_calculated'])
df["duration"] = pd.to_timedelta(df["duration"])

# Defining formatting functions and converting duration to seconds
# avg_m: meters -> 'X.XX km' string; avg_dur: seconds -> rounded Timedelta
avg_m = lambda x: str(round(x/1000, 2)) + ' km'
avg_dur = lambda x: pd.to_timedelta(round(x, 0), unit = "s")
df['duration'] = df['duration'].dt.total_seconds()

# Task 1 - calculate proportion of long trips (hours >= 3)
prop = round(len(df[df['duration']//3600 >= 3])/len(df) * 100, 2)
print(f"{prop}% of observations have a trip duration of 3 hours or more.")

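# Task 2 - calculate the average stats for trips shorter than 3 hours
# (numeric_only=True skips non-numeric columns, which would otherwise raise a TypeError in pandas 2.x)
print(df[df['duration']//3600 < 3].groupby('vendor_id').mean(numeric_only=True).agg({'dist_meters': avg_m, 'duration': avg_dur}))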
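Passing the formatting lambdas `avg_m` and `avg_dur` to `.agg()` relies on pandas applying them to each value rather than to the whole column, a pattern newer pandas releases warn about. A sketch of an alternative that computes the means first and then formats each column explicitly with `Series.map`, shown on a small made-up table rather than the real dataset:

```python
import pandas as pd

# Hypothetical per-trip data: vendor_id, distance in meters, duration in seconds
trips = pd.DataFrame({
    "vendor_id": [1, 1, 2],
    "dist_meters": [2000.0, 4000.0, 6000.0],
    "duration": [600.0, 1200.0, 1800.0],
})

# Same formatting helpers as in the solution ("s" = seconds)
avg_m = lambda x: str(round(x / 1000, 2)) + " km"
avg_dur = lambda x: pd.to_timedelta(round(x, 0), unit="s")

# 1) Aggregate the numeric columns only
stats = trips.groupby("vendor_id")[["dist_meters", "duration"]].mean()

# 2) Format each aggregated column explicitly, value by value
stats["dist_meters"] = stats["dist_meters"].map(avg_m)
stats["duration"] = stats["duration"].map(avg_dur)
print(stats)
```

Separating aggregation from formatting keeps the numeric means available for further calculations before they are turned into display strings.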

Section 4. Chapter 9
# Load libraries
import pandas as pd

# Load the dataset and convert the duration column to timedelta
url = 'https://drive.google.com/uc?id=1pQCA5C4Yvm86rjUIneefI31LNfoywtrU'
df = pd.read_csv(url, parse_dates = ['pickup_datetime', 'dropoff_datetime', 'dropoff_calculated'])
df["duration"] = pd.to_timedelta(df["duration"])

# Defining formatting functions and converting duration to seconds
# avg_m: meters -> 'X.XX km' string; avg_dur: seconds -> rounded Timedelta
avg_m = lambda x: str(round(x/1000, 2)) + ' km'
avg_dur = lambda x: pd.to_timedelta(round(x, 0), unit = "s")
df['duration'] = df['duration'].dt.total_seconds()

# Task 1 - calculate proportion of long trips (hours >= 3)
prop = round(len(df[df['___']__3600 ___ 3])/len(df) * 100, 2)
print(f"{prop}% of observations have a trip duration of 3 hours or more.")

# Task 2 - calculate the average stats for trips shorter than 3 hours
print(df[df['___']__3600 < 3].___('___').mean(numeric_only=True).___({'dist_meters': ___, '___': ___}))
