Introduction
What is Pandas?
Pandas is an open-source data analysis and manipulation library for Python.
What are the primary data structures in Pandas?
The primary data structures in Pandas are:
1. Series (1-dimensional labeled array)
2. DataFrame (2-dimensional labeled data structure with columns of potentially different types)
What are the key features of Pandas?
The key features of Pandas include:
1. Handling missing data
2. Data merging and joining
3. Data reshaping and pivoting
4. Data cleaning and preprocessing
5. Data analysis and grouping
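A few of the features listed above can be sketched in a handful of lines. This is only an illustration: the tickers, sectors, and prices are made-up values, and the variable names (prices, sectors, merged) are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: one closing price is missing (None becomes NaN).
prices = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "BBB"],
    "close": [101.0, None, 50.0, 52.0],
})
sectors = pd.DataFrame({
    "ticker": ["AAA", "BBB"],
    "sector": ["Tech", "Energy"],
})

# Handling missing data: count the missing closing prices.
missing_count = int(prices["close"].isna().sum())

# Merging and joining: attach the sector to each price row via the shared "ticker" column.
merged = prices.merge(sectors, on="ticker")

# Analysis and grouping: mean close per sector (NaN values are skipped by default).
mean_by_sector = merged.groupby("sector")["close"].mean()

print(missing_count)
print(merged)
print(mean_by_sector)
```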
What is a Series in Pandas?
A Series is a one-dimensional labeled array of values that can be of any data type, including strings, integers, and floats.
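As a small sketch of the point about data types, the snippet below builds three Series and inspects the dtype of each. The variable names are illustrative, and the exact dtype reported for text can differ between pandas versions, so it is printed rather than assumed.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

# Three Series holding integers, floats, and strings respectively.
int_series = pd.Series([100, 105, 103])
float_series = pd.Series([100.5, 105.25, 103.0])
text_series = pd.Series(["Open", "High", "Close"])

# Each Series carries a single dtype describing its values.
for name, series in [("int", int_series), ("float", float_series), ("text", text_series)]:
    print(name, series.dtype)

# Helper predicates avoid hard-coding a specific dtype name.
print(is_integer_dtype(int_series))
print(is_float_dtype(float_series))
```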
How do I create a Series in Pandas?
You can create a Series in Pandas by passing a list, array, or dictionary of values to the pd.Series() constructor.
# Create a series using dict
# Step 01: import the pandas library and assign it the alias pd.
import pandas as pd
# Step 02: Define a dictionary x_stock_dict containing stock price data.
x_stock_dict = {"Open": 100, "High": 105, "Close": 103, "Low": 99}
# Step 03: Create a Pandas Series using the pd.Series() constructor.
pd.Series(x_stock_dict)
Dictionary to Series Conversion:
When passing a dictionary to pd.Series(), Pandas automatically:
- Uses the dictionary keys as the Series index.
- Uses the dictionary values as the Series values.
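The key-to-index mapping described above can be checked directly. The following is a minimal sketch that reuses the stock values from the earlier example; the variable name stock is hypothetical.

```python
import pandas as pd

# Same stock data as the dictionary example above.
stock = pd.Series({"Open": 100, "High": 105, "Close": 103, "Low": 99})

# The dictionary keys become the index labels, in insertion order.
print(list(stock.index))

# The dictionary values become the Series values.
print(stock.tolist())

# A value can then be looked up by its label.
print(stock["High"])
```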
# Create a series using list
# Step 01: import the pandas library and assign it the alias pd.
import pandas as pd
# Step 02: Define a list x_stock_prices containing four stock price values.
x_stock_prices = [100, 105, 103, 99]
# Step 03: Create a Pandas Series using the pd.Series() constructor.
pd.Series(x_stock_prices, index=["Open", "High", "Close", "Low"])
Series Constructor Arguments:
- x_stock_prices: The list of values to be used as the Series data.
- index=["Open", "High", "Close", "Low"]: A list of labels to be used as the Series index.
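To illustrate what the index argument changes, the sketch below builds the same values with and without it. When index is omitted, pandas falls back to integer labels starting at 0. The names unlabeled and labeled are illustrative only.

```python
import pandas as pd

values = [100, 105, 103, 99]

# Without an index argument, the labels default to 0, 1, 2, ...
unlabeled = pd.Series(values)

# With an index argument, each value gets the corresponding label.
labeled = pd.Series(values, index=["Open", "High", "Close", "Low"])

print(list(unlabeled.index))
print(unlabeled.loc[2])
print(labeled.loc["Close"])
```

Both lookups return the same underlying value; only the label used to reach it differs.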
What is a DataFrame in Pandas?
A DataFrame is a two-dimensional labeled data structure with columns of potentially different types, similar to an Excel spreadsheet or a table in a relational database.
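As a rough illustration of columns with different types, the snippet below builds a small DataFrame with a text column, a float column, and an integer column. The tickers and figures are made-up sample values, and df_mixed is a hypothetical name.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

# Hypothetical sample data with one column per type.
df_mixed = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],   # text
    "close": [101.5, 50.25, 75.0],     # floats
    "volume": [1000, 2500, 1800],      # integers
})

# Each column keeps its own dtype.
print(df_mixed.dtypes)

# shape is (rows, columns), reflecting the two dimensions.
print(df_mixed.shape)
```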
How do I create a DataFrame in Pandas?
You can create a DataFrame in Pandas by passing a dictionary of columns (such as Series or lists), a list of dictionaries, or a list of lists to the pd.DataFrame() constructor.
# Create from dict of Series
import pandas as pd
x_stocks_data = {
"Open": pd.Series([99, 100, 101], index=["2024-09-14", "2024-09-15", "2024-09-16"]),
"High": pd.Series([102, 103, 104], index=["2024-09-14", "2024-09-15", "2024-09-16"]),
"Close": pd.Series([101, 102, 103], index=["2024-09-14", "2024-09-15", "2024-09-16"]),
"Low": pd.Series([98, 99, 100], index=["2024-09-14", "2024-09-15", "2024-09-16"]),
}
df = pd.DataFrame(x_stocks_data)
print(df)
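In the example above every Series shares the same index. As a sketch of what happens when they do not, the snippet below uses two Series with only partly overlapping dates: the DataFrame index becomes the union of the labels, and missing entries are filled with NaN. The names open_prices, close_prices, and df_aligned are hypothetical.

```python
import pandas as pd

# Two Series whose date labels only partly overlap.
open_prices = pd.Series([99, 100], index=["2024-09-14", "2024-09-15"])
close_prices = pd.Series([102, 103], index=["2024-09-15", "2024-09-16"])

# Values are aligned by label; dates missing from one Series get NaN.
df_aligned = pd.DataFrame({"Open": open_prices, "Close": close_prices})

print(df_aligned)
print(df_aligned.isna().sum())
```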
# Create from a list of dicts
import pandas as pd
x_stocks_data = [
{"Open": 99, "High": 102, "Close": 101, "Low": 98},
{"Open": 100, "High": 103, "Close": 102, "Low": 99},
{"Open": 101, "High": 104, "Close": 103, "Low": 100},
]
df = pd.DataFrame(x_stocks_data)
print(df)
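A list of dicts carries no row labels, so the resulting DataFrame gets a default integer index. The sketch below shows that default and, as one possible option, passes the same dates used in the earlier example through the index argument. The names rows, dates, df_default, and df_dated are illustrative.

```python
import pandas as pd

rows = [
    {"Open": 99, "High": 102, "Close": 101, "Low": 98},
    {"Open": 100, "High": 103, "Close": 102, "Low": 99},
    {"Open": 101, "High": 104, "Close": 103, "Low": 100},
]
dates = ["2024-09-14", "2024-09-15", "2024-09-16"]

# Default: rows are labeled 0, 1, 2 and the dict keys become the columns.
df_default = pd.DataFrame(rows)

# Optional: supply row labels explicitly through the index argument.
df_dated = pd.DataFrame(rows, index=dates)

print(list(df_default.index))
print(df_dated.loc["2024-09-15", "Close"])
```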