Introduction to PySpark

Setting Up a SparkSession


Every PySpark application starts with a SparkSession. It is the single entry point for reading data, running SQL, and configuring Spark behavior. Before you can work with any DataFrame you need one, and RDD work goes through the same session via its sparkContext attribute.

Creating a SparkSession

    from pyspark.sql import SparkSession
    import urllib.request

    # Downloading the dataset
    urllib.request.urlretrieve(
        "https://content-media-cdn.codefinity.com/courses/aa80ac56-0d50-49e8-9231-2c2374cd3e9d/flights.csv",
        "flights.csv"
    )

    spark = SparkSession.builder \
        .appName("FlightsAnalysis") \
        .master("local[*]") \
        .config("spark.sql.shuffle.partitions", "4") \
        .getOrCreate()

What each builder call does:

  • appName: a human-readable name shown in the Spark UI and logs;
  • master("local[*]"): runs Spark locally with one worker thread per available CPU core. On a real cluster this would be the cluster manager's URL, such as spark://host:7077, or a name such as yarn;
  • config("spark.sql.shuffle.partitions", "4"): reduces the default 200 shuffle partitions to 4, which is more appropriate for small local datasets;
  • getOrCreate(): returns an existing session if one is already running, or creates a new one.

Loading the Flights Dataset

Once the session is created, you can load data immediately:

    # Loading the flights dataset
    flights_df = spark.read.csv(
        "flights.csv",
        header=True,
        inferSchema=True
    )

    # Printing the schema to verify column types
    flights_df.printSchema()

    # Previewing the first 5 rows
    flights_df.show(5)

inferSchema=True tells Spark to scan the file and detect column types automatically. Without it, every CSV column is read as a string. For large files, inference adds an extra pass over the data, so if performance matters, define the schema explicitly.

Stopping the Session

When you are done, release resources:

    spark.stop()

In a notebook environment you typically leave the session running across cells. In a standalone script, always stop it at the end.


What does getOrCreate do?
