Challenge: Is This a Common Issue?
In the previous chapter, we found that the negative durations were caused by mixing up the 12-hour and 24-hour time formats. We printed the first 10 rows and saw that in all of these rides dropoff_datetime and dropoff_calculated have the same minute and second (accurate to 1 second), but the hours differ by 12.
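To make the 12-hour shift concrete, here is a minimal sketch of how such a mix-up could arise. The timestamp string and the variable names (raw, correct, faulty) are made up for illustration and are not taken from the dataset; the sketch only assumes that a time written in 12-hour notation loses its AM/PM marker and is then read as a 24-hour time.

```python
from datetime import datetime

# Hypothetical drop-off time written in 12-hour notation (not from the dataset).
raw = "2016-01-05 03:15:20 PM"

# Correct parse: %I reads a 12-hour clock and %p reads the AM/PM marker.
correct = datetime.strptime(raw, "%Y-%m-%d %I:%M:%S %p")

# Faulty parse: the AM/PM marker is dropped and the hour is read as 24-hour.
faulty = datetime.strptime(raw[:-3], "%Y-%m-%d %H:%M:%S")

print(correct)           # 2016-01-05 15:15:20
print(faulty)            # 2016-01-05 03:15:20
print(correct - faulty)  # 12:00:00
```

Minutes and seconds are identical in both results and only the hour moves by 12, which is the same pattern seen in the first 10 rows.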
Let's continue our investigation!
Task
- Filter the observations in the df dataframe to keep only those with a negative duration. Save the result in the df_neg variable.
- Iterate over the rows of df_neg. If the minute in dropoff_datetime and dropoff_calculated is not the same, print this row.
- Within the same for loop, count the number of rows whose hour in dropoff_datetime is greater than or equal to 12.
Solution
# Load libraries
import pandas as pd
from datetime import timedelta

# Loading dataset and creating duration column
url = 'https://drive.google.com/uc?id=1YV5bKobzYxVAWyB7VlxNH6dmfP4tHBui'
df = pd.read_csv(url, parse_dates=['pickup_datetime', 'dropoff_datetime', 'dropoff_calculated'])
df["duration"] = pd.to_timedelta(df["duration"])

# Task 1 - filter to only rides with negative durations
df_neg = df[df["duration"] < timedelta(0)]

# Task 2 - iterate over df_neg rows to find inconsistencies
count = 0
for i, row in df_neg.iterrows():
    # Compare minutes of dropoff_datetime and dropoff_calculated
    if row["dropoff_datetime"].minute != row["dropoff_calculated"].minute:
        # Print these two columns
        print(row[["dropoff_datetime", "dropoff_calculated"]])
    # Task 3 - count number of rows having hour greater-equal than 12
    if row["dropoff_datetime"].hour >= 12:
        count += 1

print(f"There are {count} rows in df_neg having hour greater-equal than 12.")
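The same two checks can also be sketched without an explicit loop, using the pandas .dt accessor. This is only an illustrative alternative, not the course's solution: the sample dataframe and the names minute_mismatch and late_count are hypothetical, and the timestamps are made up so that the block runs without downloading the dataset.

```python
import pandas as pd

# Hypothetical stand-in for df_neg: three rides with made-up timestamps.
sample = pd.DataFrame({
    "dropoff_datetime": pd.to_datetime(
        ["2016-01-05 03:15:20", "2016-01-05 14:40:02", "2016-01-06 01:07:59"]
    ),
    "dropoff_calculated": pd.to_datetime(
        ["2016-01-05 15:15:20", "2016-01-05 02:40:02", "2016-01-06 13:09:59"]
    ),
})

# Rows whose minutes disagree between the two drop-off columns.
minute_mismatch = sample[
    sample["dropoff_datetime"].dt.minute != sample["dropoff_calculated"].dt.minute
]

# Number of rows whose recorded drop-off hour is 12 or later.
late_count = int((sample["dropoff_datetime"].dt.hour >= 12).sum())

print(minute_mismatch)
print(late_count)  # 1
```

With the real data, replacing sample with df_neg should give the same rows and the same count as the loop, since both versions apply identical comparisons to each row.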
# Starter code: replace each ___ to complete the task
# Load libraries
import pandas as pd
from datetime import timedelta

# Loading dataset and creating duration column
url = 'https://drive.google.com/uc?id=1YV5bKobzYxVAWyB7VlxNH6dmfP4tHBui'
df = pd.read_csv(url, parse_dates=['pickup_datetime', 'dropoff_datetime', 'dropoff_calculated'])
df["duration"] = pd.to_timedelta(df["duration"])

# Task 1 - filter to only rides with negative durations
df_neg = df[___["___"] < ___(___)]

# Task 2 - iterate over df_neg rows to find inconsistencies
count = 0
for i, row in df_neg.___():
    # Compare minutes of dropoff_datetime and dropoff_calculated
    if row["___"].___ != row["___"].minute:
        # Print these two columns
        print(___[["dropoff_datetime", "dropoff_calculated"]])
    # Task 3 - count number of rows having hour greater-equal than 12
    if row["___"].___ >= ___:
        count ___

print(f"There are {count} rows in df_neg having hour greater-equal than 12.")