Learn Challenge: Fixing the Issues | Working with Dates and Times in pandas
Dealing with Dates and Times in Python

Challenge: Fixing the Issues

In the last chapter you saw that only two rides with negative durations had different minute values in the two columns. If you pay attention to the seconds, though, you will notice that in those rides one timestamp falls at the very end of a minute (59 seconds) and the other at the very start of the next one (00 seconds). This means that all the inconsistencies can be interpreted as misuses of the 12-hour and 24-hour formats.
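To make the 12-hour/24-hour mix-up concrete, here is a minimal sketch. The timestamps are made up for illustration and are not taken from the course dataset: a ride is picked up at 13:05, and its 13:20 drop-off is assumed to have been recorded as 01:20.

```python
import pandas as pd

# Hypothetical timestamps, invented for illustration only
pickup = pd.Timestamp("2020-01-01 13:05:00")
dropoff_recorded = pd.Timestamp("2020-01-01 01:20:00")  # 1:20 PM stored as 01:20

# The recorded drop-off precedes the pickup, so the duration is negative
duration = dropoff_recorded - pickup
print(duration < pd.Timedelta(0))  # True

# Shifting the drop-off by 12 hours restores a plausible duration
fixed_duration = (dropoff_recorded + pd.Timedelta(hours=12)) - pickup
print(fixed_duration)  # 0 days 00:15:00
```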

Now that we have found the real cause of the issue, we can fix it! Recall one of the ways to replace values in a DataFrame based on a condition: the .where method.

df['col_name'].where(~(condition), inplace = True, other = values_to_replace)

With this approach, every value in col_name for which (condition) is True is replaced with values_to_replace; all other values are kept as they are.
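As a small illustration of how the mask works, the sketch below applies .where to a toy Series (the values are invented for this example). Note that .where keeps the values where the mask is True, which is why the condition is negated with ~.

```python
import pandas as pd

# Toy data, invented for illustration
s = pd.Series([5, -3, 8, -1])

# Condition marking the values we want to replace
condition = s < 0

# Keep values where the mask (~condition) is True; replace the rest with `other`
result = s.where(~condition, other=0)
print(result.tolist())  # [5, 0, 8, 0]
```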

Task


  1. For all trips with a negative duration, add 12 hours to the dropoff_datetime column.
  2. Recalculate the duration column.
  3. Print the first 5 rows of the updated df.

Solution

# Load libraries
import pandas as pd
from datetime import timedelta

# Load the dataset and convert the duration column to timedelta
url = 'https://drive.google.com/uc?id=1YV5bKobzYxVAWyB7VlxNH6dmfP4tHBui'
df = pd.read_csv(url, parse_dates = ['pickup_datetime', 'dropoff_datetime', 'dropoff_calculated'])
df["duration"] = pd.to_timedelta(df["duration"])

# Task 1: add 12 hours to dropoff_datetime for rides with negative duration
df['dropoff_datetime'].where(~(df.duration < timedelta(0)), other = df.dropoff_datetime + timedelta(hours = 12), inplace = True)

# Task 2: recalculate duration column
df['duration'] = df['dropoff_datetime'] - df['pickup_datetime']

# Task 3: inspect the first 5 rows that still have a negative duration
print(df[df['duration'] < timedelta(0)][["pickup_datetime", "dropoff_datetime", "trip_duration", "dropoff_calculated"]].head())
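Depending on the pandas version, calling an in-place method on a selected column (as in `df['col'].where(..., inplace = True)`) may emit a chained-assignment warning or leave df unchanged. A possible alternative, sketched below on a toy DataFrame with invented values rather than the course dataset, is to assign the result of .where back to the column. The column names mirror those used in the solution; everything else is an assumption made for illustration.

```python
import pandas as pd
from datetime import timedelta

# Toy rides, invented for illustration (the second drop-off is 12 hours early)
df = pd.DataFrame({
    "pickup_datetime": pd.to_datetime(["2020-01-01 09:00:00", "2020-01-01 13:05:00"]),
    "dropoff_datetime": pd.to_datetime(["2020-01-01 09:30:00", "2020-01-01 01:20:00"]),
})
df["duration"] = df["dropoff_datetime"] - df["pickup_datetime"]

# Same replacement as Task 1, but assigning the result back to the column
negative = df["duration"] < timedelta(0)
df["dropoff_datetime"] = df["dropoff_datetime"].where(
    ~negative, other=df["dropoff_datetime"] + timedelta(hours=12)
)

# Recalculate and confirm that no negative durations remain
df["duration"] = df["dropoff_datetime"] - df["pickup_datetime"]
print((df["duration"] < timedelta(0)).sum())  # 0
```

The first ride is left untouched because its duration was already positive; only the row matching the condition is shifted.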


Section 4. Chapter 7
# Load libraries
import pandas as pd
from datetime import timedelta

# Load the dataset and convert the duration column to timedelta
url = 'https://drive.google.com/uc?id=1YV5bKobzYxVAWyB7VlxNH6dmfP4tHBui'
df = pd.read_csv(url, parse_dates = ['pickup_datetime', 'dropoff_datetime', 'dropoff_calculated'])
df["duration"] = pd.to_timedelta(df["duration"])

# Task 1: add 12 hours to dropoff_datetime for rides with negative duration
df['___'].___(~(___ < timedelta(0)), other = ___ + timedelta(___ = ___), inplace = True)

# Task 2: recalculate duration column
df['duration'] = df['___'] - df['___']

# Task 3: inspect the first 5 rows that still have a negative duration
print(df[df['duration'] < timedelta(0)][["pickup_datetime", "dropoff_datetime", "trip_duration", "dropoff_calculated"]].___())
