Challenge: Is This a Common Issue? | Working with Dates and Times in pandas
Dealing with Dates and Times in Python

Challenge: Is This a Common Issue?

In the previous chapter, we found that the negative durations were caused by mixing up the 12-hour and 24-hour time formats. We printed the first 10 rows and saw that in each of these rides, dropoff_datetime and dropoff_calculated have the same minute and second (accurate to 1 second), while the hours differ by 12.
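
To make the symptom concrete, here is a minimal sketch with a single made-up ride. The timestamps are hypothetical and not taken from the course dataset; the sketch assumes the faulty record stores an afternoon dropoff as if it were a morning time, which is one way a 12-hour/24-hour mix-up could look.

```python
from datetime import datetime, timedelta

# Hypothetical ride: picked up at 13:40:00 and lasting 25 minutes 30 seconds
pickup = datetime(2020, 1, 1, 13, 40, 0)
dropoff_calculated = pickup + timedelta(minutes=25, seconds=30)  # 14:05:30

# Assumed faulty record: the same moment stored as 02:05:30 (AM/PM information lost)
dropoff_recorded = datetime.strptime("2020-01-01 02:05:30", "%Y-%m-%d %H:%M:%S")

# The recorded dropoff is earlier than the pickup, so the duration is negative
duration = dropoff_recorded - pickup
# The gap between the calculated and recorded dropoff is exactly 12 hours
hour_gap = dropoff_calculated - dropoff_recorded

print(duration < timedelta(0))
print(hour_gap)
print(dropoff_recorded.minute == dropoff_calculated.minute)
```

In this sketch the minute and second of the two dropoff values match, and only the hour is shifted, which is the pattern the challenge below checks for across all negative-duration rides.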

Let's continue our investigation!

Task


  1. Filter the observations in the df dataframe to only those with a negative duration. Save the result in the df_neg variable.
  2. Iterate over the rows of df_neg. If the minute in dropoff_datetime and dropoff_calculated is not the same, print the row.
  3. Within the same for loop, count the number of rows with an hour in dropoff_datetime greater than or equal to 12.

Solution

# Load libraries
import pandas as pd
from datetime import timedelta

# Loading dataset and creating duration column
url = 'https://drive.google.com/uc?id=1YV5bKobzYxVAWyB7VlxNH6dmfP4tHBui'
df = pd.read_csv(url, parse_dates = ['pickup_datetime', 'dropoff_datetime', 'dropoff_calculated'])
df["duration"] = pd.to_timedelta(df["duration"])

# Task 1 - filter to only rides with negative durations
df_neg = df[df["duration"] < timedelta(0)]

# Task 2 - iterate over df_neg rows to find inconsistencies
count = 0
for i, row in df_neg.iterrows():
    # Compare minutes of dropoff_datetime and dropoff_calculated
    if row["dropoff_datetime"].minute != row["dropoff_calculated"].minute:
        # Print these two columns
        print(row[["dropoff_datetime", "dropoff_calculated"]])
    # Task 3 - count number of rows having hour greater-equal than 12
    if row["dropoff_datetime"].hour >= 12:
        count += 1

print(f"There are {count} rows in df_neg having hour greater-equal than 12.")
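
The same checks can also be written without an explicit loop by using the pandas .dt accessor. The sketch below is only an illustration of that alternative, not the course's solution; the three rows are made-up stand-ins for df_neg, and mismatched_rows is a name introduced here.

```python
import pandas as pd

# Hypothetical sample standing in for df_neg (not the course dataset)
df_neg = pd.DataFrame({
    "dropoff_datetime": pd.to_datetime([
        "2020-01-01 02:05:30",
        "2020-01-01 13:10:00",
        "2020-01-01 03:15:45",
    ]),
    "dropoff_calculated": pd.to_datetime([
        "2020-01-01 14:05:30",
        "2020-01-02 01:10:00",
        "2020-01-01 15:20:45",
    ]),
})

# Boolean mask: rows where the minute differs between the two columns
minute_mismatch = (
    df_neg["dropoff_datetime"].dt.minute != df_neg["dropoff_calculated"].dt.minute
)
mismatched_rows = df_neg.loc[minute_mismatch, ["dropoff_datetime", "dropoff_calculated"]]

# Count rows whose recorded dropoff hour is 12 or later
count = int((df_neg["dropoff_datetime"].dt.hour >= 12).sum())

print(mismatched_rows)
print(f"There are {count} rows in df_neg having hour greater-equal than 12.")
```

On a large dataframe the vectorized form is usually faster than iterrows, while the loop is easier to follow when practicing Timestamp attributes such as .minute and .hour.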


Section 4. Chapter 6
# Load libraries
import pandas as pd
from datetime import timedelta

# Loading dataset and creating duration column
url = 'https://drive.google.com/uc?id=1YV5bKobzYxVAWyB7VlxNH6dmfP4tHBui'
df = pd.read_csv(url, parse_dates = ['pickup_datetime', 'dropoff_datetime', 'dropoff_calculated'])
df["duration"] = pd.to_timedelta(df["duration"])

# Task 1 - filter to only rides with negative durations
df_neg = df[___["___"] < ___(___)]

# Task 2 - iterate over df_neg rows to find inconsistencies
count = 0
for i, row in df_neg.___():
    # Compare minutes of dropoff_datetime and dropoff_calculated
    if row["___"].___ != row["___"].minute:
        # Print these two columns
        print(___[["dropoff_datetime", "dropoff_calculated"]])
    # Task 3 - count number of rows having hour greater-equal than 12
    if row["___"].___ >= ___:
        count ___

print(f"There are {count} rows in df_neg having hour greater-equal than 12.")
