Data Science Interview Challenge

Challenge 1: Probabilities and Distributions

Two foundational concepts in statistics are probabilities and distributions. Much of statistical theory and practice is built on them.

Probability is a measure of uncertainty. It quantifies how likely an event or outcome is, and it always lies between 0 and 1 inclusive; larger values mean the event is more likely.
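
To make the 0-to-1 scale concrete, here is a small sketch (not part of the original challenge) that computes the probability of rolling an even number with a fair six-sided die in two ways: exactly, by counting equally likely outcomes, and approximately, as the relative frequency over many simulated rolls. The die example, the number of rolls, and the seed are illustrative assumptions.

```python
import random
from fractions import Fraction

# Exact probability by counting equally likely outcomes of one fair die.
outcomes = [1, 2, 3, 4, 5, 6]
even = [o for o in outcomes if o % 2 == 0]
p_exact = Fraction(len(even), len(outcomes))  # 3/6 reduces to 1/2

# Estimate the same probability as a long-run relative frequency.
rng = random.Random(0)  # fixed seed so the run is reproducible
n_rolls = 100_000
hits = sum(1 for _ in range(n_rolls) if rng.choice(outcomes) % 2 == 0)
p_estimate = hits / n_rolls

print(p_exact)     # → 1/2
print(p_estimate)  # close to 0.5; the exact digits depend on the seed
```

As the number of rolls grows, the relative frequency settles ever closer to the exact value.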

Distributions, on the other hand, describe all possible values of a random variable together with how probability is spread across those values. They capture the behavior of data, whether that is a series of coin tosses, the heights of people in a population, or the time a bus takes to arrive. There are two main categories of distributions:

  1. Discrete Distributions: The possible outcomes are distinct, countable values, either a finite set or a countably infinite one such as 0, 1, 2, and so on. An example is the Binomial distribution, which can model the number of heads obtained in a fixed number of coin tosses.

  2. Continuous Distributions: The outcomes can take any value within a given range, so the probability of any single exact value is zero and probabilities are assigned to intervals instead. The Normal (Gaussian) distribution is a classic example: a symmetric, bell-shaped distribution for data that cluster around a central mean value.
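
The two categories can be illustrated with scipy.stats, which the solution below also relies on. This sketch is an illustration rather than part of the challenge: it assumes 10 fair coin tosses for the Binomial case and a standard normal (mean 0, standard deviation 1) for the continuous case. The discrete case uses a probability mass function for exact counts, while the continuous case uses the cumulative distribution function to get the probability of an interval.

```python
from scipy.stats import binom, norm

# Discrete: heads in 10 fair coin tosses follows Binomial(n=10, p=0.5).
p_five_heads = binom.pmf(5, n=10, p=0.5)    # P(X = 5) = C(10, 5) / 2**10
p_at_most_two = binom.cdf(2, n=10, p=0.5)   # P(X <= 2) = (1 + 10 + 45) / 2**10

# Continuous: P(Z = z) is 0 for any single z, so we take the area
# under the density over an interval, computed as a difference of CDFs.
p_within_one_sd = norm.cdf(1) - norm.cdf(-1)  # P(-1 <= Z <= 1)

print(f"P(5 heads)    = {p_five_heads:.4f}")     # 0.2461
print(f"P(<= 2 heads) = {p_at_most_two:.4f}")    # 0.0547
print(f"P(|Z| <= 1)   = {p_within_one_sd:.4f}")  # 0.6827
```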

Here's the dataset used in this chapter: Seaborn's built-in tips dataset, which records restaurant bills and tips. Feel free to explore it before tackling the task.

import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
data = sns.load_dataset('tips')

# Preview the first five rows (display() is only defined in Jupyter/IPython, so print() is used)
print(data.head())

# Visualize the distribution of 'total_bill'
sns.displot(data['total_bill'])
plt.title('Distribution of Total Bill')
plt.show()
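
It helps to know what describe() reports before computing it for total_bill. The sketch below uses a small, hypothetical Series of bill amounts (not values from the tips dataset) with one large bill to create a long right tail. It shows that the summary contains the count, mean, sample standard deviation, minimum, quartiles, and maximum, and that a mean above the median together with a positive skewness points to right skew. The same calls work unchanged on data['total_bill'].

```python
import pandas as pd

# Hypothetical bill amounts, NOT taken from the tips dataset.
bills = pd.Series([10.0, 12.0, 14.0, 15.0, 16.0, 18.0, 20.0, 45.0])

# count, mean, std (sample, ddof=1), min, 25%, 50%, 75%, max
summary = bills.describe()
print(summary)

# A mean above the median and a positive skewness both suggest a right tail.
print("mean - median:", bills.mean() - bills.median())
print("skewness:", bills.skew())
```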
Task

Using Seaborn's tips dataset, you will:

  1. Compute key descriptive statistics for the total_bill column to summarize its central tendency and spread.
  2. Use a Q-Q plot to visualize how closely the total_bill data follow a normal distribution.
  3. Use the Shapiro-Wilk test to assess formally whether total_bill is normally distributed.
  4. Determine the probability that a randomly selected bill from the dataset is more than $20.
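
Steps 2 and 3 both ask whether the data look normal. The sketch below, which uses synthetic data rather than the tips dataset, shows what a normal Q-Q plot is built from: the sorted observations paired with standard-normal quantiles. If the sample is normal, those points fall near a straight line, so their correlation is close to 1. It also runs scipy.stats.shapiro on a normal sample and on a right-skewed (exponential) sample. The seed, sample sizes, and the qq_points helper are illustrative assumptions; the helper uses (i - 0.5) / n plotting positions, while scipy.stats.probplot computes its positions with a slightly different formula.

```python
import numpy as np
from scipy.stats import norm, shapiro

rng = np.random.default_rng(42)
normal_sample = rng.normal(loc=20, scale=5, size=500)
skewed_sample = rng.exponential(scale=20, size=500)  # long right tail


def qq_points(sample):
    """Hypothetical helper: (theoretical, observed) pairs for a normal Q-Q plot.

    Uses (i - 0.5) / n plotting positions, a common textbook choice.
    """
    observed = np.sort(sample)
    n = len(observed)
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = norm.ppf(probs)  # standard-normal quantiles
    return theoretical, observed


for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    theoretical, observed = qq_points(sample)
    # Points lying on a straight line give a correlation near 1.
    r = np.corrcoef(theoretical, observed)[0, 1]
    stat, p = shapiro(sample)
    print(f"{name}: Q-Q correlation={r:.3f}, W={stat:.3f}, p={p:.2g}")
```

Plotting theoretical against observed with matplotlib would reproduce the shape of a Q-Q plot; the skewed sample bends away from the line in its upper tail.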

Solution

import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import shapiro, probplot

# Load the dataset
data = sns.load_dataset('tips')

# 1. Compute descriptive statistics for the 'total_bill' column
print(data['total_bill'].describe())

# 2. Assess normality visually with a normal Q-Q plot
#    (points close to the reference line suggest normality)
probplot(data['total_bill'], dist='norm', plot=plt)
plt.title('Q-Q Plot of Total Bill')
plt.show()

# 3. Assess normality formally with the Shapiro-Wilk test
#    (null hypothesis: the sample comes from a normal distribution)
stat, p = shapiro(data['total_bill'])
alpha = 0.05
if p > alpha:
    # Failing to reject H0 is not proof of normality, only a lack of evidence against it
    print(f"No evidence against normality at alpha={alpha} (W={stat:.3f}, p={p:.3g}).")
else:
    print(f"The data do not appear to be normally distributed (W={stat:.3f}, p={p:.3g}).")

# 4. Compute the probability that a randomly chosen bill is more than $20
#    as the proportion of bills above $20 (the mean of a boolean mask)
prob = (data['total_bill'] > 20).mean()
print(f"The probability that a randomly chosen bill is more than $20 is {prob:.2%}.")
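
The step 4 answer is an empirical proportion, which makes no assumption about the shape of the distribution. A normal model offers a second estimate: fit a normal distribution with the sample mean and standard deviation and take the area above the threshold. The sketch below compares the two; the helper name tail_probabilities and the synthetic lognormal (right-skewed) sample are hypothetical, and this comparison is an illustration rather than part of the original solution.

```python
import numpy as np
from scipy.stats import norm


def tail_probabilities(values, threshold):
    """Hypothetical helper: estimate P(X > threshold) two ways.

    empirical: fraction of observations above the threshold.
    normal:    survival function (1 - CDF) of a normal distribution
               fitted with the sample mean and sample standard deviation.
    """
    values = np.asarray(values, dtype=float)
    empirical = float(np.mean(values > threshold))
    mu, sigma = values.mean(), values.std(ddof=1)
    normal = float(norm.sf(threshold, loc=mu, scale=sigma))
    return empirical, normal


# Synthetic right-skewed sample, NOT taken from the tips dataset.
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=3.0, sigma=0.4, size=1_000)
emp, approx = tail_probabilities(sample, threshold=20)
print(f"empirical={emp:.3f}, normal approximation={approx:.3f}")
```

Called as tail_probabilities(data['total_bill'], 20), the first value equals the proportion computed in step 4. When the Shapiro-Wilk test rejects normality, a gap between the two numbers is a reminder that the normal model is only an approximation.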
