indexing
What is the difference between loc and iloc in Pandas?
loc is used for label-based indexing, while iloc is used for position-based indexing.
In Pandas, loc and iloc are two powerful indexing methods used to access and manipulate data in DataFrames. The primary difference between them lies in how they access data:
loc (Label-based indexing)
- loc is primarily label-based: you select rows and columns by their index label(s) and column name(s).
- Slices include the end label, so df.loc[0:2] returns the rows labelled 0, 1, and 2.
iloc (Integer-based indexing)
- iloc is primarily integer-based: you select rows and columns by their integer position, counting from 0.
- Slices exclude the end position, so df.iloc[0:2] returns only the rows at positions 0 and 1.
import pandas as pd
# Create a DataFrame
data = {
    'Open': [99, 100, 101],
    'High': [102, 103, 104],
    'Close': [101, 102, 103],
    'Low': [98, 99, 100]
}
df = pd.DataFrame(data)
print(df)
# Using loc
print("\nAccessing data using loc:")
print(df.loc[0, 'Open']) # Accessing a single value
print(df.loc[[0, 2], 'Open']) # Accessing multiple rows
print(df.loc[0, ['Open', 'High']]) # Accessing multiple columns
print(df.loc[0:2, 'Open':'High']) # Accessing a range of rows and columns (end labels included)
# Using iloc
print("\nAccessing data using iloc:")
print(df.iloc[0, 0]) # Accessing a single value
print(df.iloc[[0, 2], 0]) # Accessing multiple rows
print(df.iloc[0, [0, 1]]) # Accessing multiple columns
print(df.iloc[0:2, 0:2]) # Accessing a range of rows and columns (end positions excluded)
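The DataFrame above uses the default integer index, so its row labels and row positions happen to coincide. The following is a minimal sketch of how the two methods diverge once the index holds other labels. The `prices` DataFrame and its day-name index are hypothetical and are not part of the example above.

```python
import pandas as pd

# Hypothetical data: string row labels instead of the default 0, 1, 2
prices = pd.DataFrame(
    {"Open": [99, 100, 101], "Close": [101, 102, 103]},
    index=["mon", "tue", "wed"],
)

# loc slices by label and includes both endpoints
by_label = prices.loc["mon":"tue", "Open"]

# iloc slices by position and excludes the stop position
by_position = prices.iloc[0:1, 0]

print(by_label.tolist())     # [99, 100]
print(by_position.tolist())  # [99]
```

With string labels, an integer such as 0 is no longer a valid row label, so loc needs the label ("mon") while iloc still accepts the position (0).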
How do you select a subset of columns from a DataFrame?
You can select a subset of columns from a DataFrame by passing a list of column names to the [] operator, or by using the loc or iloc methods.
# Select a single column using its column name (returns a Series)
selected_by_column = df['Open']
print(selected_by_column)
# Select multiple columns using a list of column names (returns a DataFrame)
selected_by_columns = df[['Open', 'High']]
print(selected_by_columns)
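The answer also mentions loc and iloc for column selection. As a rough sketch, assuming the same four-column DataFrame as in the earlier example, the equivalent selections might look like this (the variable names `by_name` and `by_position` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Open": [99, 100, 101],
    "High": [102, 103, 104],
    "Close": [101, 102, 103],
    "Low": [98, 99, 100],
})

# loc: keep all rows (:), choose columns by name
by_name = df.loc[:, ["Open", "High"]]

# iloc: keep all rows (:), choose columns by integer position
by_position = df.iloc[:, [0, 1]]

print(by_name.equals(by_position))  # True
```

Selecting by name is usually the safer choice when the column order may change, whereas selecting by position is convenient when the names are unknown or generated.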