Introduction to PySpark

Setting Up a SparkSession


Every PySpark application starts with a SparkSession. It is the single entry point for reading data, running SQL, and configuring Spark behavior. Before you can work with any DataFrame or RDD, you need one.

Creating a SparkSession

from pyspark.sql import SparkSession
import urllib.request

# Downloading the dataset
urllib.request.urlretrieve(
    "https://content-media-cdn.codefinity.com/courses/aa80ac56-0d50-49e8-9231-2c2374cd3e9d/flights.csv",
    "flights.csv"
)

spark = SparkSession.builder \
    .appName("FlightsAnalysis") \
    .master("local[*]") \
    .config("spark.sql.shuffle.partitions", "4") \
    .getOrCreate()

What each builder call does:

  • appName: a human-readable name shown in the Spark UI and logs;
  • master("local[*]"): run Spark locally with as many worker threads as there are logical CPU cores. On a real cluster, this would be the cluster's master URL;
  • config("spark.sql.shuffle.partitions", "4"): reduces the default 200 shuffle partitions to 4, which is more appropriate for small local datasets;
  • getOrCreate(): returns an existing session if one is already running, or creates a new one.

Loading the Flights Dataset

Once the session is created, you can load data immediately:

# Loading the flights dataset
flights_df = spark.read.csv(
    "flights.csv",
    header=True,
    inferSchema=True
)

# Printing the schema to verify column types
flights_df.printSchema()

# Previewing the first 5 rows
flights_df.show(5)

inferSchema=True tells Spark to scan the file and detect column types automatically. On large files this costs an extra pass over the data; if performance matters, define the schema explicitly.

Stopping the Session

When you are done, release resources:

spark.stop()

In a notebook environment you typically leave the session running across cells. In a standalone script, always stop it at the end.

Section 1. Chapter 3

Запитати АІ

expand

Запитати АІ

ChatGPT

Запитайте про що завгодно або спробуйте одне із запропонованих запитань, щоб почати наш чат

Секція 1. Розділ 3
some-alt