Challenge: Fixing the Issues | Working with Dates and Times in pandas
Dealing with Dates and Times in Python

Challenge: Fixing the Issues

In the last chapter you saw that only two rides with negative durations had different minutes in the two columns. If you paid attention to the seconds, though, you may have noticed that those two rides fell on a minute boundary (59 seconds and 00 seconds, respectively). This means that all the inconsistencies can be interpreted as mix-ups between the 12-hour and 24-hour formats.
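To make the 12-hour/24-hour mix-up concrete, here is a minimal sketch with a single made-up ride. The timestamps are hypothetical and are not taken from the course dataset; they only illustrate how a drop-off hour written as 00 instead of 12 produces a negative duration, and how adding 12 hours repairs it.

```python
import pandas as pd

# Hypothetical ride: picked up at 11:55, dropped off at 12:10,
# but the drop-off hour was recorded as 00 instead of 12.
pickup = pd.Timestamp("2017-01-01 11:55:00")
dropoff_recorded = pd.Timestamp("2017-01-01 00:10:00")

# The recorded drop-off precedes the pickup, so the duration is negative.
duration_recorded = dropoff_recorded - pickup
print(duration_recorded)

# Shifting the recorded drop-off by 12 hours gives a plausible 15-minute ride.
dropoff_fixed = dropoff_recorded + pd.Timedelta(hours=12)
duration_fixed = dropoff_fixed - pickup
print(duration_fixed)
```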

Now that we have found the real cause of the issue, we can fix it! Recall one of the ways to replace values in a DataFrame based on a condition: the .where method.

df['col_name'].where(~(condition), inplace = True, other = values_to_replace)

With this approach, every value in col_name for which (condition) is True is replaced with the corresponding value from values_to_replace; all other values are left unchanged.
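The behaviour of .where can be checked on a small toy Series. The sketch below uses made-up numbers and assigns the result back instead of passing inplace = True. That choice is an assumption on our part: depending on the pandas version (in particular with Copy-on-Write enabled), an in-place call on a column selected as df['col_name'] may act on a temporary object and leave df unchanged, so the assignment form is the more portable one.

```python
import pandas as pd

# Toy data (hypothetical values): negative entries play the role of "bad" rows.
s = pd.Series([5, -3, 8, -1])
condition = s < 0

# .where keeps values where its first argument is True, so the condition is
# negated: rows where `condition` holds are taken from `other` instead.
fixed = s.where(~condition, other=s + 10)
print(fixed.tolist())

# Series.mask is the mirror image: it replaces where the condition is True,
# so no negation is needed.
same = s.mask(condition, other=s + 10)
print(fixed.equals(same))
```

For a DataFrame column the same idea reads df['col_name'] = df['col_name'].where(~(condition), other = values_to_replace).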

Task


  1. For all trips with a negative duration, add 12 hours to the dropoff_datetime column.
  2. Recalculate the duration column.
  3. Print the first 5 rows of the updated df.

Solution

# Load libraries
import pandas as pd
from datetime import timedelta

# Load the dataset and convert the duration column to timedelta
url = 'https://drive.google.com/uc?id=1YV5bKobzYxVAWyB7VlxNH6dmfP4tHBui'
df = pd.read_csv(url, parse_dates = ['pickup_datetime', 'dropoff_datetime', 'dropoff_calculated'])
df["duration"] = pd.to_timedelta(df["duration"])

# Task 1: add 12 hours to dropoff_datetime for rides with negative duration
df['dropoff_datetime'].where(~(df.duration < timedelta(0)), other = df.dropoff_datetime + timedelta(hours = 12), inplace = True)

# Task 2: recalculate duration column
df['duration'] = df['dropoff_datetime'] - df['pickup_datetime']

# Task 3: inspect the first 5 rows that still have a negative duration
print(df[df['duration'] < timedelta(0)][["pickup_datetime", "dropoff_datetime", "trip_duration", "dropoff_calculated"]].head())
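As a sanity check, the three steps can be replayed on a tiny hand-made table and the number of remaining negative durations counted. This is only a sketch: the DataFrame toy and its values are hypothetical, and it assigns the result of .where back to the column rather than using inplace = True.

```python
import pandas as pd
from datetime import timedelta

# Hypothetical miniature of the rides table; the values are made up.
toy = pd.DataFrame({
    "pickup_datetime": pd.to_datetime(["2017-01-01 11:55:00", "2017-01-01 09:00:00"]),
    "dropoff_datetime": pd.to_datetime(["2017-01-01 00:10:00", "2017-01-01 09:20:00"]),
})
toy["duration"] = toy["dropoff_datetime"] - toy["pickup_datetime"]

# Step 1: shift the drop-off by 12 hours only where the duration is negative.
negative = toy["duration"] < timedelta(0)
toy["dropoff_datetime"] = toy["dropoff_datetime"].where(
    ~negative, other=toy["dropoff_datetime"] + timedelta(hours=12)
)

# Step 2: recompute the duration from the corrected column.
toy["duration"] = toy["dropoff_datetime"] - toy["pickup_datetime"]

# Step 3: count the rows that are still negative; none are expected here.
remaining = int((toy["duration"] < timedelta(0)).sum())
print(remaining)
print(toy.head())
```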


Section 4. Chapter 7
# Load libraries
import pandas as pd
from datetime import timedelta

# Load the dataset and convert the duration column to timedelta
url = 'https://drive.google.com/uc?id=1YV5bKobzYxVAWyB7VlxNH6dmfP4tHBui'
df = pd.read_csv(url, parse_dates = ['pickup_datetime', 'dropoff_datetime', 'dropoff_calculated'])
df["duration"] = pd.to_timedelta(df["duration"])

# Task 1: add 12 hours to dropoff_datetime for rides with negative duration
df['___'].___(~(___ < timedelta(0)), other = ___ + timedelta(___ = ___), inplace = True)

# Task 2: recalculate duration column
df['duration'] = df['___'] - df['___']

# Task 3: inspect the first 5 rows that still have a negative duration
print(df[df['duration'] < timedelta(0)][["pickup_datetime", "dropoff_datetime", "trip_duration", "dropoff_calculated"]].___())
