Pandas Demystified: Unveiling the Power of Data Manipulation
Let's start from the basics. What exactly is a DataFrame?
A Pandas DataFrame is a two-dimensional, size-mutable, tabular data structure with rows and columns. It is similar to a spreadsheet, an SQL table, or the data.frame in R. A DataFrame is a collection of Series, each of which is a one-dimensional labeled array.
You can think of a DataFrame as a group of Series objects that share the same index (the row labels), with each Series stored under its own column name. For example:
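As a minimal sketch, a DataFrame with three columns and three rows can be built from a dictionary of lists. The column names and values here are illustrative assumptions, not data from a real source:

```python
import pandas as pd

# Hypothetical data: each dictionary key becomes a column name,
# and each list becomes the values of that column.
data = {
    "name": ["Alice", "Bob", "Carol"],
    "age": [25, 32, 47],
    "city": ["Rome", "Paris", "Berlin"],
}
df = pd.DataFrame(data)

print(df)               # the leftmost value printed for each row is its index label
print(df.shape)         # (rows, columns)
print(type(df["age"]))  # selecting a single column returns a Series
```

Since no index is passed, pandas assigns the default integer labels 0, 1, 2 to the rows.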
The code above produces a pandas DataFrame with exactly 3 columns and 3 rows.
Note that the first number for every row corresponds to the index. What if we need to access a cell in a specific position?
loc and iloc are two different ways to access rows and columns of a DataFrame. Both are attributes of the DataFrame object, used with square brackets rather than called like functions, and they let you read and modify the data in the DataFrame.
The main difference between loc and iloc is that loc uses label-based indexing, while iloc uses integer-based indexing.
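The difference is easiest to see when the row labels are not integers. The sketch below assumes a small hypothetical DataFrame with string labels "a", "b", "c" and reads the same cell both ways:

```python
import pandas as pd

# Hypothetical data with string row labels, so labels and positions differ.
df = pd.DataFrame(
    {"age": [25, 32, 47], "city": ["Rome", "Paris", "Berlin"]},
    index=["a", "b", "c"],
)

by_label = df.loc["b", "city"]  # row labeled "b", column labeled "city"
by_position = df.iloc[1, 1]     # second row, second column (zero-based)
print(by_label, by_position)

# Slices differ too: loc includes the end label, iloc excludes the end position.
label_slice = df.loc["a":"b"]   # rows "a" and "b"
position_slice = df.iloc[0:1]   # only the first row
print(label_slice)
print(position_slice)
```

With the default integer index the two can look interchangeable, which is why an explicit string index is used here.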
There are many other ways to create a DataFrame, such as from a list of dictionaries, from a NumPy array, or by loading data from a file. Pandas is a very powerful library for working with tabular data in Python.
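A short sketch of these three routes follows. The column names and values are made up for illustration, and an in-memory text buffer stands in for a CSV file on disk so the example runs without any external file:

```python
import io

import numpy as np
import pandas as pd

# From a list of dictionaries: each dictionary describes one row.
records = [{"x": 1, "y": 2}, {"x": 3, "y": 4}]
df_records = pd.DataFrame(records)

# From a NumPy array: column names are supplied separately.
array = np.array([[1, 2], [3, 4]])
df_array = pd.DataFrame(array, columns=["x", "y"])

# From a file: read_csv accepts a path or a file-like object.
csv_buffer = io.StringIO("x,y\n1,2\n3,4\n")
df_csv = pd.read_csv(csv_buffer)

print(df_records)
print(df_array)
print(df_csv)
```

All three DataFrames hold the same values under the same column names.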
We have spent enough words on this. Let's practice these concepts!
- Import the pandas library;
- Create a new pandas DataFrame from a dictionary;
- Inspect the type of the resulting object;
- Print the DataFrame;
- Access an element in a specific location (row 2, column 2; note that indexing starts from 0).
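One possible way to work through these steps is sketched below. The dictionary contents are hypothetical, and the location is read as zero-based position (2, 2), which is one interpretation of the last step:

```python
# Step 1: import the library.
import pandas as pd

# Step 2: create a DataFrame from a dictionary (hypothetical values).
data = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}
df = pd.DataFrame(data)

# Step 3: inspect the type of the object.
print(type(df))

# Step 4: print the DataFrame.
print(df)

# Step 5: access the element at position (2, 2); positions start from 0,
# so this is the third row and the third column.
element = df.iloc[2, 2]
print(element)
```

Because iloc works on positions, the same call would return the same cell even if the rows had non-integer labels.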
Was everything clear?