Lesson 1
Loading and Viewing Data in Pandas
Introduction

Hello and welcome to our journey into data analysis with Python and pandas. Today we'll discover pandas DataFrames and learn how to load and view data.

Pandas, a fantastic Python library, simplifies data manipulation and analysis. Our focus today is DataFrames — the go-to structure in pandas for data handling.

We will create DataFrames from Python lists and dictionaries, explore their contents, and combine them. Let's begin!

Installing and Importing pandas

Installing and importing the pandas library is like getting our recipe book ready before we start cooking. In our CodeSignal kitchen, pandas comes pre-installed. To open the book, we just need to import pandas into our script. It's as simple as:

Python
import pandas as pd  # Pandas successfully imported

This line sets a short alias, pd, for pandas so we don't have to write out pandas each time we use it.

Introduction to DataFrames

In pandas, a DataFrame is a two-dimensional table with labeled rows and columns; if the DataFrame is the table, the data are the dishes on it. Creating a DataFrame from a list or a dictionary is a snap with pandas. Here's how to build one from a list:

Python
import pandas as pd

# From a list
data_list = ['apple', 'banana', 'cherry']
df_list = pd.DataFrame(data_list, columns=['Fruit'])
print(df_list)
# Output:
#     Fruit
# 0   apple
# 1  banana
# 2  cherry
Creating from Dictionary

And here is how to create a DataFrame from a dictionary, where each key becomes a column name:

Python
import pandas as pd

# From a dictionary
data_dict = {'Fruit': ['apple', 'banana', 'cherry'], 'Count': [10, 20, 15]}
df_dict = pd.DataFrame(data_dict)
print(df_dict)
# Output:
#     Fruit  Count
# 0   apple     10
# 1  banana     20
# 2  cherry     15
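
Since this lesson is about loading data, here is a minimal sketch of reading tabular text with pd.read_csv. The CSV content and the names csv_text and df_csv are made up for illustration. The text is held in memory with io.StringIO so the example runs without any file on disk; with a real file you would pass its path instead.

```python
import io

import pandas as pd

# Hypothetical CSV content, kept in a string so no file is required
csv_text = """Fruit,Count
apple,10
banana,20
cherry,15
"""

# pd.read_csv accepts a file path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
```

The first line of the CSV text is used as the column names, and the remaining lines become rows.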
Viewing Data in a DataFrame: Head and Tail

Now that we have our data in a DataFrame, how do we look at it and understand it? Pandas provides us with methods like head(), tail(), and info(). Here's how to use them:

Python
# First 5 rows
print(df_dict.head())  # Prints the first 5 rows of df_dict

# Last 5 rows
print(df_dict.tail())  # Prints the last 5 rows of df_dict

In our case, we have just three rows in the dataframe, so both head() and tail() will simply output the whole dataframe. However, for real data with lots of rows, they are quite useful!
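
To see head() and tail() actually trim the output, here is a small sketch on a longer DataFrame. The DataFrame df_numbers and its Number column are hypothetical and exist only for this illustration. Both methods accept an optional number of rows and default to 5.

```python
import pandas as pd

# Hypothetical DataFrame with 10 rows, built only for this illustration
df_numbers = pd.DataFrame({'Number': range(1, 11)})

print(df_numbers.head())   # default: first 5 rows (numbers 1 to 5)
print(df_numbers.tail(3))  # last 3 rows (numbers 8, 9, 10)

first_two = df_numbers.head(2)  # first 2 rows only
print(first_two)
```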

Viewing Data in a DataFrame: Overview

Let's take a look at the DataFrame's overview using the info() method:

Python
# Overview (info() prints its report itself and returns None, so print() is not needed)
df_dict.info()
# Output (a memory usage line follows; exact details can vary by pandas version):
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 3 entries, 0 to 2
# Data columns (total 2 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   Fruit   3 non-null      object
#  1   Count   3 non-null      int64
# dtypes: int64(1), object(1)

As you can see, the overview lists each column's name, its number of non-null (present) values, and its data type.
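
The non-null count becomes more informative when some values are missing. The sketch below assumes a hypothetical DataFrame, df_missing, with one missing Count value. It captures the info() report in a string through the buf parameter and uses count() to get the same non-null counts as numbers.

```python
import io

import pandas as pd

# Hypothetical DataFrame with one missing value in 'Count'
df_missing = pd.DataFrame({
    'Fruit': ['apple', 'banana', 'cherry'],
    'Count': [10, None, 15],
})

# info() writes its report to the given buffer and returns None
buffer = io.StringIO()
result = df_missing.info(buf=buffer)
overview = buffer.getvalue()
print(overview)

# count() returns the number of non-null values per column
non_null = df_missing.count()
print(non_null)
```

Here the Count column reports 2 non-null values out of 3 entries, which is a quick way to spot missing data.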

Concatenating DataFrames: `pd.concat`

Sometimes you need to combine multiple DataFrames into a single one. The pd.concat function does this, and it works most naturally when the DataFrames share the same set of columns. Here's a simple example:

Python
import pandas as pd

# Creating first DataFrame
data1 = {'Fruit': ['apple', 'banana'], 'Count': [10, 20]}
df1 = pd.DataFrame(data1)

# Creating second DataFrame
data2 = {'Fruit': ['cherry', 'date'], 'Count': [15, 25]}
df2 = pd.DataFrame(data2)

# Concatenating DataFrames
df_combined = pd.concat([df1, df2])
print(df_combined)
# Output:
#     Fruit  Count
# 0   apple     10
# 1  banana     20
# 0  cherry     15
# 1    date     25

In this example, pd.concat takes a list of DataFrames as its argument and combines them along their rows by default. Notice that the indices are preserved. If you want to ignore the original indices and create a new continuous index, you can pass the argument ignore_index=True to pd.concat:

Python
df_combined = pd.concat([df1, df2], ignore_index=True)
print(df_combined)
# Output:
#     Fruit  Count
# 0   apple     10
# 1  banana     20
# 2  cherry     15
# 3    date     25

This way, the resulting DataFrame will have a new set of sequential indices.
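
The examples above use DataFrames with identical columns. As a side note, here is a sketch of what pd.concat does by default when the columns differ: it keeps every column and fills the gaps with NaN. The DataFrame df3 and its Price column are hypothetical, added only for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Fruit': ['apple', 'banana'], 'Count': [10, 20]})

# Hypothetical DataFrame that has 'Price' instead of 'Count'
df3 = pd.DataFrame({'Fruit': ['cherry'], 'Price': [0.5]})

# Columns missing from one DataFrame are filled with NaN in the result
df_mixed = pd.concat([df1, df3], ignore_index=True)
print(df_mixed)
```

Because of the NaN values, it is usually worth checking the result with info() after concatenating DataFrames whose columns do not match.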

Series

In a DataFrame, each column is a Series object. A Series in pandas is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). It is essentially a list of values with an associated label (index) for each value. Here is an example:

Python
import pandas as pd

# Create a Series from a list
fruits = ['apple', 'banana', 'cherry']
series = pd.Series(fruits)

print(series)
# Output:
# 0     apple
# 1    banana
# 2    cherry
# dtype: object

Whenever you work with a single column of a DataFrame, you are working with a Series. Series objects have their own set of methods, but many of them overlap with DataFrame methods. For example, a Series also has methods such as head, tail, and describe.
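
To make the link between columns and Series concrete, here is a short sketch that selects a column with square brackets and calls a few of the shared methods on it. It rebuilds the fruit DataFrame from earlier in the lesson; the variable names fruit_column and count_summary are chosen for this illustration.

```python
import pandas as pd

df_dict = pd.DataFrame({
    'Fruit': ['apple', 'banana', 'cherry'],
    'Count': [10, 20, 15],
})

# Selecting one column by name returns a Series
fruit_column = df_dict['Fruit']
print(type(fruit_column))

# Series support head() just like DataFrames
print(fruit_column.head(2))

# describe() summarizes a numeric Series (count, mean, min, max, ...)
count_summary = df_dict['Count'].describe()
print(count_summary)
```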

Lesson Summary and Practice

Well done! You've learned how to load data into a pandas DataFrame and view the data. It's a solid start to data analysis.

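In this lesson, we've covered what pandas and a DataFrame are, how to create DataFrames from lists and dictionaries, how to view them with head(), tail(), and info(), how to combine them with pd.concat, and how each column is a Series.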

Remember, practice makes perfect, so look forward to reinforcing your newfound skills in the upcoming practice exercises. Stick with it, and you'll build a strong foundation to excel in data analysis. Happy coding!
