Pandas First Steps
CSV Files
pandas is a go-to library for data analysis and manipulation, and one of its key features is the ability to read and write various file types, including CSV files.
A CSV (Comma-Separated Values) file is a plain text file used to store tabular data, where each row represents a record, and columns are separated by commas.
A CSV file can contain the following data:
- Numbers: integer or decimal values (e.g., 42, 3.14);
- Text: strings or categorical data (e.g., John, Active);
- Dates/Times: timestamps (e.g., 2023-12-30);
- Booleans: logical values (True, False).
Each row must have the same number of columns, and the first row often contains column headers.
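To illustrate the data types listed above, here is a minimal sketch that parses a small CSV held in memory. The sample rows and column names are made up for illustration and are not part of the course dataset. The `parse_dates` argument is used here because pandas does not convert the date column to datetimes unless asked to.

```python
import io

import pandas as pd

# Hypothetical sample data, invented for this illustration
csv_text = """id,score,name,joined,active
42,3.14,John,2023-12-30,True
7,2.72,Anna,2024-01-15,False
"""

# io.StringIO lets read_csv() treat an in-memory string like a file
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["joined"])

print(df)
print(df.dtypes)  # one inferred type per column
```

The first row is used as the column headers, and each remaining row becomes one record in the DataFrame.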
Functions like read_csv() and to_csv() come in handy for dealing with CSV data.
The key parameters of read_csv() are as follows:
- filepath_or_buffer: path to the CSV file (string or URL);
- sep: delimiter (default is a comma, ',');
- header: row number to use as the column headers (default is the first row);
- names: list of column names to use;
- usecols: columns to read (subset of columns).
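The parameters above can be combined in a single call. The following is a small sketch, assuming a semicolon-delimited file with no header row. The data is a hypothetical in-memory sample, not the course dataset.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited data without a header row
raw = "Thailand;Asia;Bangkok\nLatvia;Europe;Riga\n"

df = pd.read_csv(
    io.StringIO(raw),                            # any path, URL, or file-like object
    sep=";",                                     # values are separated by semicolons
    header=None,                                 # the data has no header row
    names=["country", "continent", "capital"],   # so column names are supplied here
    usecols=["country", "capital"],              # keep only these two columns
)

print(df)
```

Passing header=None together with names keeps the first line as data. Without header=None, read_csv() would treat the first record as column headers.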
# Loading the CSV into a `DataFrame`
import pandas as pd

salary_data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a43d24b6-df61-4e11-9c90-5b36552b3437/Salary+Dataset.csv')
print(salary_data)
Note
Make sure that the dataset link is wrapped in quotation marks.
The key parameters of to_csv() are as follows:
- path_or_buf: file path or object where the CSV should be written;
- sep: delimiter for separating values (default is a comma, ',');
- columns: subset of columns to write (default is all columns);
- header: whether to include column names as the header (default is True);
- index: whether to write row indices to the file (default is True).
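As a rough illustration of these parameters, the sketch below writes a subset of columns with a custom delimiter and without the row index. It relies on the fact that to_csv() returns the CSV text as a string when no path is given, so no file is created. The small DataFrame is an assumed sample.

```python
import pandas as pd

# Small sample DataFrame, assumed for illustration
countries = pd.DataFrame({
    "country": ["Malta", "Sweden"],
    "continent": ["Europe", "Europe"],
    "capital": ["Valletta", "Stockholm"],
})

# With no path_or_buf, to_csv() returns the CSV content as a string
csv_text = countries.to_csv(
    sep=";",                          # use semicolons instead of commas
    columns=["country", "capital"],   # write only these columns
    index=False,                      # leave out the row index
)

print(csv_text)
```

Setting index=False is a common choice when the index is just the default 0, 1, 2, ... numbering and carries no information.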
import pandas as pd

countries_data = {
    'country': ['Thailand', 'Philippines', 'Monaco', 'Malta', 'Sweden', 'Paraguay', 'Latvia'],
    'continent': ['Asia', 'Asia', 'Europe', 'Europe', 'Europe', 'South America', 'Europe'],
    'capital': ['Bangkok', 'Manila', 'Monaco', 'Valletta', 'Stockholm', 'Asuncion', 'Riga'],
}
countries = pd.DataFrame(countries_data)
countries.to_csv('countries.csv')
print('Done')
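Because index defaults to True, a file written this way has an extra, unnamed first column holding the row indices. The sketch below shows what happens when such a file is read back. It uses a temporary directory and a shortened, assumed sample so that nothing is left on disk.

```python
import os
import tempfile

import pandas as pd

# Shortened sample, assumed for illustration
countries = pd.DataFrame({
    "country": ["Monaco", "Paraguay"],
    "capital": ["Monaco", "Asuncion"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "countries.csv")
    countries.to_csv(path)  # index=True by default

    # Read back as-is: the saved index shows up as a regular column
    with_index_column = pd.read_csv(path)

    # Read back treating the first column as the index
    restored = pd.read_csv(path, index_col=0)

print(list(with_index_column.columns))
print(list(restored.columns))
```

Either passing index_col=0 when reading or index=False when writing avoids the extra column.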
- Read the CSV file into a DataFrame.
- Display the contents on your screen.