Pandas Demystified: Unveiling the Power of Data Manipulation
DataFrames
Let's start with the basics. What exactly is a DataFrame?
A pandas DataFrame is a two-dimensional, size-mutable, tabular data structure with rows and columns. It is similar to a spreadsheet, an SQL table, or the data.frame in R. A DataFrame is a collection of Series, each of which is a one-dimensional labeled array.
You can think of a DataFrame as a group of Series objects that share an index (the row labels), with each Series stored under its own column name. For example:
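A minimal sketch of such a DataFrame, built from a dictionary. The column names and values are hypothetical and chosen only for illustration; the last two lines check that a single column is a Series and that it shares the DataFrame's row index.

```python
import pandas as pd

# Hypothetical data: three columns, three rows.
data = {
    "name": ["Alice", "Bob", "Carol"],
    "age": [25, 32, 47],
    "city": ["Rome", "Paris", "Berlin"],
}
df = pd.DataFrame(data)
print(df)

# Each column is a Series, and every column shares the same row index.
print(type(df["age"]))
print(df["age"].index.equals(df.index))
```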
The code above produces a pandas DataFrame with exactly 3 columns and 3 rows.
Note that the first number for every row corresponds to the index. What if we need to access a cell in a specific position?
In pandas, loc and iloc are two different ways to access rows and columns of a DataFrame. Both are attributes of the DataFrame object. They are used with square brackets (df.loc[...], df.iloc[...]) rather than called like functions, and they let you both read and modify the data in the DataFrame.
The main difference between loc and iloc is that loc uses label-based indexing (row and column labels), while iloc uses integer position-based indexing (positions counted from 0).
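The difference is easiest to see when the row labels are not plain integers. The sketch below assumes a small DataFrame with hypothetical string labels "a", "b", "c", and reads the same cell once by label and once by position.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [25, 32, 47], "city": ["Rome", "Paris", "Berlin"]},
    index=["a", "b", "c"],  # hypothetical string labels for the rows
)

# loc selects by label: row labelled "b", column labelled "city".
by_label = df.loc["b", "city"]

# iloc selects by integer position: second row, second column.
by_position = df.iloc[1, 1]

print(by_label, by_position)  # both refer to the same cell

# Both indexers can also be used for assignment.
df.loc["c", "age"] = 48
```

With the default integer index (0, 1, 2, ...), row labels and row positions coincide, so df.loc[1] and df.iloc[1] return the same row.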
There are many other ways to create a DataFrame, such as from a list of dictionaries, from a NumPy array, or by loading data from a file. Pandas is a very powerful library for working with tabular data in Python.
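A short sketch of these three routes. The data and column names are hypothetical, and an in-memory text buffer stands in for a real CSV file so the example runs without touching the disk.

```python
import io

import numpy as np
import pandas as pd

# From a list of dictionaries: each dictionary becomes one row.
records = [{"x": 1, "y": 2}, {"x": 3, "y": 4}]
df_records = pd.DataFrame(records)

# From a NumPy array: column names are supplied separately.
array = np.array([[1, 2], [3, 4]])
df_array = pd.DataFrame(array, columns=["x", "y"])

# From a file: a StringIO buffer stands in for a CSV file on disk here.
csv_text = io.StringIO("x,y\n1,2\n3,4\n")
df_csv = pd.read_csv(csv_text)

print(df_records)
print(df_array)
print(df_csv)
```

To read a real file, pass its path to pd.read_csv in place of the buffer.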
We have already spent so many words on this. Let's practice these concepts!
Task
- Import the pandas library (as pd);
- Create a new pandas DataFrame from a dictionary;
- Inspect the type;
- Print the df;
- Access an element in a specific location (2 rows, 2 columns - note that indexing starts from 0).
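One possible way to work through these steps is sketched below. The dictionary contents are hypothetical, and the sketch assumes that "2 rows, 2 columns" means row position 2 and column position 2, that is, the third row and third column when counting from 0.

```python
import pandas as pd

# Hypothetical data; any dictionary of equal-length lists works.
data = {
    "product": ["pen", "notebook", "eraser"],
    "price": [1.5, 3.0, 0.8],
    "stock": [120, 45, 300],
}
df = pd.DataFrame(data)

# Inspect the type, then print the DataFrame itself.
print(type(df))
print(df)

# Assumption: position (2, 2) is the third row and third column,
# because iloc positions start at 0.
cell = df.iloc[2, 2]
print(cell)
```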