indexing

What is the difference between loc and iloc in Pandas?

loc is used for label-based indexing, while iloc is used for position-based indexing.

In Pandas, loc and iloc are the two main indexers for accessing and modifying data in DataFrames. They differ in how they identify the rows and columns to select:

loc (Label-based indexing)

  • loc is primarily label-based: you select rows and columns by their index label(s) and column name(s).
  • Slices include both the start and the end label.

iloc (Integer-based indexing)

  • iloc is primarily integer-based: you select rows and columns by their integer position, counting from 0.
  • Slices exclude the end position, as in standard Python slicing.

    import pandas as pd

    # Create a DataFrame
    data = {
        'Open': [99, 100, 101],
        'High': [102, 103, 104],
        'Close': [101, 102, 103],
        'Low': [98, 99, 100]
    }
    df = pd.DataFrame(data)
    print(df)
    # Using loc
    print("\nAccessing data using loc:")
    print(df.loc[0, 'Open'])  # Accessing a single value
    print(df.loc[[0, 2], 'Open'])  # Accessing multiple rows
    print(df.loc[0, ['Open', 'High']])  # Accessing multiple columns
    print(df.loc[0:2, 'Open':'High'])  # Label slice: rows 0 through 2 and columns 'Open' through 'High', both ends included
    # Using iloc
    print("\nAccessing data using iloc:")
    print(df.iloc[0, 0])  # Accessing a single value
    print(df.iloc[[0, 2], 0])  # Accessing multiple rows
    print(df.iloc[0, [0, 1]])  # Accessing multiple columns
    print(df.iloc[0:2, 0:2])  # Position slice: rows 0 and 1, columns 0 and 1 (stop position 2 is excluded)
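
With the default 0, 1, 2 index, labels and positions happen to coincide, which can hide the difference. The following is a minimal sketch, assuming a hypothetical DataFrame named prices whose index holds made-up date strings, that shows the inclusive label slice next to the exclusive position slice:

```python
import pandas as pd

# Hypothetical data: similar prices, but indexed by made-up date strings
# instead of the default 0, 1, 2 integer index.
prices = pd.DataFrame(
    {"Open": [99, 100, 101], "Close": [101, 102, 103]},
    index=["2024-01-02", "2024-01-03", "2024-01-04"],
)

# loc slices by label and includes both endpoints: two rows here.
by_label = prices.loc["2024-01-02":"2024-01-03", "Open"]

# iloc slices by position and excludes the stop position: one row here.
by_position = prices.iloc[0:1, 0]

print(by_label.tolist())     # [99, 100]
print(by_position.tolist())  # [99]
```

Here loc needs the actual labels, while iloc keeps working with 0-based positions regardless of what the index contains.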

How do you select a subset of columns from a DataFrame?

You can select a subset of columns from a DataFrame by passing a list of column names to the [] operator, or by using the loc or iloc indexers.

Select a single column using its column name

    selected_by_column = df['Open']
    print(selected_by_column)


Select columns using column names

    selected_by_columns  = df[['Open', 'High']]
    print(selected_by_columns)
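
One detail worth checking when choosing between the two forms above is the return type. As a small sketch, using a hypothetical DataFrame named frame, a single label returns a Series, while a list of labels returns a DataFrame even when the list has only one entry:

```python
import pandas as pd

# Hypothetical two-column frame, used only for this illustration.
frame = pd.DataFrame({"Open": [99, 100, 101], "High": [102, 103, 104]})

one_column = frame["Open"]   # single label -> Series
as_frame = frame[["Open"]]   # list with one label -> DataFrame

print(type(one_column).__name__)  # Series
print(type(as_frame).__name__)    # DataFrame
print(as_frame.shape)             # (3, 1)
```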

Select columns by label using loc

    selected_columns_loc = df.loc[:, ['Open', 'High']]

    print(selected_columns_loc)

Select columns using iloc

    selected_columns_iloc = df.iloc[:, [0, 1]]

    print(selected_columns_iloc)

Select columns whose names match a condition

    # filter(like='H') keeps the labels that contain the substring 'H' (case-sensitive).
    # axis=1 applies the filter to column labels rather than row labels.
    selected_columns_filter = df.filter(like='H', axis=1)

    print(selected_columns_filter)
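
filter() only looks at labels. As an alternative sketch (not part of the original answer), a boolean mask passed to loc can express a condition on either the column names or the column values. The DataFrame named quotes and the threshold 101 are assumptions chosen for illustration:

```python
import pandas as pd

# Hypothetical frame, redefined here so the example is self-contained.
quotes = pd.DataFrame({
    "Open": [99, 100, 101],
    "High": [102, 103, 104],
    "Close": [101, 102, 103],
    "Low": [98, 99, 100],
})

# Condition on the names: True for each column whose name starts with "H".
name_mask = quotes.columns.str.startswith("H")
selected_by_name = quotes.loc[:, name_mask]

# Condition on the values: keep columns whose maximum exceeds 101.
selected_by_value = quotes.loc[:, quotes.max() > 101]

print(list(selected_by_name.columns))   # ['High']
print(list(selected_by_value.columns))  # ['High', 'Close']
```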