Introduction

What is Pandas?

Pandas is an open-source data analysis and manipulation library for Python.

What are the primary data structures in Pandas?

The primary data structures in Pandas are:

  1. Series (1-dimensional labeled array)
  2. DataFrame (2-dimensional labeled data structure with columns of potentially different types)

What are the key features of Pandas?

The key features of Pandas include:

  1. Handling missing data
  2. Data merging and joining
  3. Data reshaping and pivoting
  4. Data cleaning and preprocessing
  5. Data analysis and grouping
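A few of these features can be sketched in a handful of lines. The example below is only an illustration: the tickers, sectors, and prices are made-up sample data, and filling a gap with the column mean is just one of several reasonable ways to handle a missing value.

```python
import pandas as pd

# Hypothetical sample data (not from the text above).
prices = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "BBB"],
    "close": [10.0, None, 20.0, 22.0],  # None becomes NaN (a missing value)
})
sectors = pd.DataFrame({
    "ticker": ["AAA", "BBB"],
    "sector": ["Tech", "Energy"],
})

# Handling missing data: fill the gap in "close" with the column mean.
filled = prices.fillna({"close": prices["close"].mean()})

# Merging and joining: attach each ticker's sector.
merged = filled.merge(sectors, on="ticker", how="left")

# Grouping and analysis: average close per sector.
mean_close = merged.groupby("sector")["close"].mean()
print(mean_close)
```

Each step returns a new object rather than modifying `prices` in place, which makes it easy to inspect intermediate results.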

What is a Series in Pandas?

A Series is a one-dimensional labeled array of values that can be of any data type, including strings, integers, and floats.
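As a rough illustration of that definition, the sketch below builds three small Series from made-up values and inspects their data types and labels. When no index is given, Pandas assigns default integer labels starting at 0.

```python
import pandas as pd

# Illustrative values only.
ints = pd.Series([1, 2, 3])
floats = pd.Series([1.5, 2.5])
strings = pd.Series(["a", "b"])

# Each Series carries a single dtype describing its values.
print(ints.dtype, floats.dtype, strings.dtype)

# The labels live in the index; here they are the defaults 0, 1, 2.
print(list(ints.index))
```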

How do I create a Series in Pandas?

You can create a Series in Pandas by passing a list, array, or dictionary of values to the pd.Series() constructor.

    # Create a Series using a dict

    # Step 01: Import the pandas library and assign it the alias pd.
    import pandas as pd

    # Step 02: Define a dictionary x_stock_dict containing stock price data.
    x_stock_dict = {"Open": 100, "High": 105, "Close": 103, "Low": 99}

    # Step 03: Create a Pandas Series using the pd.Series() constructor.
    pd.Series(x_stock_dict)

Dictionary to Series Conversion:

When passing a dictionary to pd.Series(), Pandas automatically:

  • Uses the dictionary keys as the Series index.
  • Uses the dictionary values as the Series values.
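The key-to-index mapping can be checked directly. The sketch below reuses the stock dictionary from the example and also shows what happens when an explicit index is passed alongside a dictionary: labels are looked up in the dictionary, and a label that is not a key (the hypothetical "Volume" here) gets a missing value.

```python
import pandas as pd

x_stock_dict = {"Open": 100, "High": 105, "Close": 103, "Low": 99}
s = pd.Series(x_stock_dict)

# Keys became the index, values became the data.
print(list(s.index))
print(s["High"])

# With an explicit index, Pandas selects by key; "Volume" is a
# hypothetical label with no matching key, so its value is NaN.
subset = pd.Series(x_stock_dict, index=["Low", "Open", "Volume"])
print(subset)
```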

    # Create a series using list

    # Step 01: import the pandas library and assign it the alias pd.
    import pandas as pd

    # Step 02: Define a list x_stock_prices containing four stock price values.
    x_stock_prices = [100, 105, 103, 99]

    # Step 03: Create a Pandas Series using the pd.Series() constructor.
    pd.Series(x_stock_prices, index=["Open", "High", "Close", "Low"])

Series Constructor Arguments:

  • x_stock_prices: The list of values to be used as the Series data.
  • index=["Open", "High", "Close", "Low"]: A list of labels to be used as the Series index.
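To see what the index argument buys you, the following sketch builds the same Series and reads values back by label and by position. It also shows, as a caution, that the index must have the same length as the data; the two-label index in the last step is deliberately wrong.

```python
import pandas as pd

x_stock_prices = [100, 105, 103, 99]
s = pd.Series(x_stock_prices, index=["Open", "High", "Close", "Low"])

# Access by label and by integer position.
close_price = s["Close"]
first_price = s.iloc[0]
print(close_price, first_price)

# A mismatched index length is rejected with a ValueError.
try:
    pd.Series(x_stock_prices, index=["Open", "High"])
    length_error = None
except ValueError as exc:
    length_error = exc
    print("ValueError:", exc)
```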

What is a DataFrame in Pandas?

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types, similar to an Excel spreadsheet or a table in a relational database.
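The "columns of potentially different types" part is easiest to see by inspecting a small frame. The sketch below uses made-up tickers and numbers; each column keeps its own dtype while sharing one row index.

```python
import pandas as pd

# Hypothetical sample data with one text, one float, and one integer column.
df = pd.DataFrame({
    "ticker": ["AAA", "BBB"],
    "close": [101.5, 99.0],
    "volume": [1200, 900],
})

print(df.dtypes)  # one dtype per column
print(df.shape)   # (number of rows, number of columns)
```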

How do I create a DataFrame in Pandas?

You can create a DataFrame in Pandas by passing a dictionary of column values, a list of dictionaries, or a list of lists to the pd.DataFrame() constructor.

    # Create from dict of Series
    import pandas as pd

    x_stocks_data = {
        "Open": pd.Series([99, 100, 101], index=["2024-09-14", "2024-09-15", "2024-09-16"]),
        "High": pd.Series([102, 103, 104], index=["2024-09-14", "2024-09-15", "2024-09-16"]),
        "Close": pd.Series([101, 102, 103], index=["2024-09-14", "2024-09-15", "2024-09-16"]),
        "Low": pd.Series([98, 99, 100], index=["2024-09-14", "2024-09-15", "2024-09-16"]),
    }
    df = pd.DataFrame(x_stocks_data)
    print(df)

    # Create from a list of dicts
    import pandas as pd

    x_stocks_data = [
        {"Open": 99, "High": 102, "Close": 101, "Low": 98},
        {"Open": 100, "High": 103, "Close": 102, "Low": 99},
        {"Open": 101, "High": 104, "Close": 103, "Low": 100},
    ]

    df = pd.DataFrame(x_stocks_data)
    print(df)

    # Using loc (label-based; the labels here are the default integer index 0, 1, 2)
    print("\nAccessing data using loc:")
    print(df.loc[0, 'Open'])  # Accessing a single value
    print(df.loc[[0, 2], 'Open'])  # Accessing multiple rows
    print(df.loc[0, ['Open', 'High']])  # Accessing multiple columns
    print(df.loc[0:2, 'Open':'High'])  # Accessing a range of rows and columns (both endpoints included)

    # Using iloc (position-based)
    print("\nAccessing data using iloc:")
    print(df.iloc[0, 0])  # Accessing a single value
    print(df.iloc[[0, 2], 0])  # Accessing multiple rows
    print(df.iloc[0, [0, 1]])  # Accessing multiple columns
    print(df.iloc[0:2, 0:2])  # Accessing a range of rows and columns (stop position excluded)
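With a default integer index, loc and iloc can look interchangeable. The difference shows up once the row labels are not positions. The sketch below assumes a date-labeled frame built from the same figures used earlier: loc slices by label and includes both endpoints, while iloc slices by position and excludes the stop.

```python
import pandas as pd

df = pd.DataFrame(
    {"Open": [99, 100, 101], "High": [102, 103, 104]},
    index=["2024-09-14", "2024-09-15", "2024-09-16"],
)

# Label-based slice: both endpoint labels are included.
by_label = df.loc["2024-09-14":"2024-09-16", "Open"]

# Position-based slice: position 2 (the stop) is excluded.
by_position = df.iloc[0:2, 0]

print(len(by_label), len(by_position))

# The same cell, addressed by label and by position.
same_cell = (df.loc["2024-09-15", "High"], df.iloc[1, 1])
print(same_cell)

# An integer is not a valid row label for this frame, so loc raises KeyError.
try:
    df.loc[0]
    label_error = None
except KeyError as exc:
    label_error = exc
    print("KeyError:", exc)
```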