Merging Data
Merging data from multiple sources is a common task in data wrangling, especially when you need to enrich your primary dataset with additional insights. In this chapter, you'll learn how to join games_df with spy_insights_df using the app_id column as the key. Polars provides flexible and efficient join operations, making it straightforward to combine datasets while controlling how unmatched rows are handled. The two most common join types you'll use are the left join and the inner join.
A left join returns all rows from the left DataFrame (games_df) and attaches the matching columns from the right DataFrame (spy_insights_df). Where a row has no match, the right-side columns are filled with null values. An inner join returns only the rows whose app_id appears in both DataFrames, discarding unmatched rows from either side.
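To make the difference concrete, here is a minimal plain-Python sketch of the two semantics. The helper names left_join and inner_join are hypothetical and exist only for illustration; this is not how Polars implements joins, and it assumes each app_id appears at most once in the right-hand table.

```python
# Illustrative only: a plain-Python model of left vs. inner join semantics.
# Assumes each app_id appears at most once in the right-hand table.

games = [
    {"app_id": 1, "game_name": "Space Quest"},
    {"app_id": 2, "game_name": "Jungle Run"},
    {"app_id": 3, "game_name": "Mystery Manor"},
    {"app_id": 4, "game_name": "Puzzle Island"},
]
insights = [
    {"app_id": 2, "insight": "High engagement"},
    {"app_id": 3, "insight": "Trending"},
    {"app_id": 5, "insight": "Low installs"},
]


def left_join(left, right, key, right_col):
    """Keep every left row; fill right_col with None when there is no match."""
    lookup = {row[key]: row[right_col] for row in right}
    return [{**row, right_col: lookup.get(row[key])} for row in left]


def inner_join(left, right, key, right_col):
    """Keep only the left rows whose key also appears on the right."""
    lookup = {row[key]: row[right_col] for row in right}
    return [
        {**row, right_col: lookup[row[key]]}
        for row in left
        if row[key] in lookup
    ]


left_rows = left_join(games, insights, "app_id", "insight")
inner_rows = inner_join(games, insights, "app_id", "insight")

print(len(left_rows))   # 4 rows: app_id 1 and 4 carry insight=None
print(len(inner_rows))  # 2 rows: only app_id 2 and 3 survive
```

Note that app_id 5 appears in neither result: it exists only on the right side, so the left join has no left row to attach it to and the inner join finds no match for it.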
```python
import polars as pl

# Sample games_df
games_df = pl.DataFrame({
    "app_id": [1, 2, 3, 4],
    "game_name": ["Space Quest", "Jungle Run", "Mystery Manor", "Puzzle Island"]
})

# Sample spy_insights_df
spy_insights_df = pl.DataFrame({
    "app_id": [2, 3, 5],
    "insight": ["High engagement", "Trending", "Low installs"]
})

# Left join: all rows from games_df, matched data from spy_insights_df
left_joined = games_df.join(spy_insights_df, on="app_id", how="left")
print("Left Join Result:")
print(left_joined)

# Inner join: only rows with matching app_id in both DataFrames
inner_joined = games_df.join(spy_insights_df, on="app_id", how="inner")
print("\nInner Join Result:")
print(inner_joined)
```