Python Tutorial (13) Pandas rename column

Introduction to Pandas in Python

Pandas is a powerful open-source data manipulation and analysis library for Python. It provides easy-to-use data structures, such as Series and DataFrame, along with a variety of functions to perform efficient data manipulation and analysis tasks.

Developed by Wes McKinney, Pandas is particularly well-suited for working with structured data and is widely used in data science, statistics, and finance.

Key Features of Pandas:

DataFrame and Series:

DataFrame: A two-dimensional, tabular data structure with labeled axes (rows and columns).

Series: A one-dimensional labeled array, similar to a column in a DataFrame.

Data Cleaning:

Pandas simplifies tasks like handling missing data, filtering, and filling in gaps, making data cleaning more efficient.

Data Manipulation:

Easily manipulate and transform data using functions like grouping, merging, and reshaping.

Data Analysis:

Perform various statistical and mathematical operations on the data, such as mean, median, and standard deviation.

Data Input/Output:

Read and write data in various formats, including CSV, Excel, SQL databases, and more.

Time Series Data:

Pandas provides robust support for time-series data, making it a valuable tool for analyzing time-stamped data.

Integration with NumPy:

Built on top of NumPy, Pandas seamlessly integrates with the broader Python data science ecosystem.

Easy Plotting:

Integrated plotting functionality using Matplotlib, allowing for quick data visualization.
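The relationship between the two core structures can be sketched as follows. This is a minimal illustration with made-up names and values: each column of a DataFrame is itself a Series that shares the DataFrame's row labels.

```python
import pandas as pd

# A DataFrame is a labeled, two-dimensional table (illustrative data)
people = pd.DataFrame(
    {"Age": [25, 30, 35], "City": ["New York", "San Francisco", "Los Angeles"]},
    index=["Alice", "Bob", "Charlie"],
)

# Selecting one column returns a Series with the same row labels
age_column = people["Age"]

print(type(age_column).__name__)
print(age_column["Bob"])
```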

Getting Started with Pandas:

To use Pandas, you typically start by importing the library:

import pandas as pd

Creating a DataFrame:

# Creating a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)

Reading Data:

# Reading data from a CSV file
df = pd.read_csv('example.csv')

Basic Operations:

# Displaying the first few rows of the DataFrame
print(df.head())

# Getting summary statistics
print(df.describe())

# Selecting columns
print(df['Name'])

# Filtering data
print(df[df['Age'] > 30])

Pandas is an essential tool for anyone working with data in Python. Its versatility, combined with a vast community and extensive documentation, makes it a go-to library for data manipulation and analysis tasks.

Whether you’re cleaning messy data, analyzing trends, or preparing data for machine learning, Pandas is a valuable asset in the Python data science toolkit.

What Is Pandas? An Example

Example: Analyzing Student Data with Pandas

# Import Pandas
import pandas as pd

# Sample student data
data = {
    'StudentID': [1, 2, 3, 4, 5],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [20, 21, 20, 22, 21],
    'Grade': [85, 90, 78, 92, 88]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Display the DataFrame
print("Original DataFrame:")
print(df)
print("\n")

# Basic Operations
# Display the first few rows of the DataFrame
print("First 3 Rows:")
print(df.head(3))
print("\n")

# Get summary statistics
print("Summary Statistics:")
print(df.describe())
print("\n")

# Selecting a specific column
print("Names of Students:")
print(df['Name'])
print("\n")

# Filtering data based on a condition
print("Students with Grade >= 90:")
print(df[df['Grade'] >= 90])
print("\n")

# Adding a new column
df['GradeCategory'] = df['Grade'].apply(lambda x: 'Excellent' if x >= 90 else 'Good')
print("DataFrame with Grade Categories:")
print(df)
print("\n")

# Grouping data by Age and calculating the average Grade
average_grade_by_age = df.groupby('Age')['Grade'].mean()
print("Average Grade by Age:")
print(average_grade_by_age)
print("\n")

# Writing the DataFrame to a CSV file
df.to_csv('student_data.csv', index=False)
print("DataFrame saved to 'student_data.csv'")

This example covers several basic Pandas operations:

Creating a DataFrame: We create a DataFrame from a dictionary containing student information.

Basic Operations:

Displaying the first few rows.

Getting summary statistics.

Data Selection and Filtering:

Selecting a specific column (‘Name’).

Filtering students with a grade greater than or equal to 90.

Adding a New Column: We create a new column ‘GradeCategory’ based on the ‘Grade’ column.

Grouping and Aggregation: We group the data by ‘Age’ and calculate the average ‘Grade’ for each age group.

Writing to a CSV File: The final DataFrame is saved to a CSV file (‘student_data.csv’).

This is just a simple example, but it demonstrates how Pandas can be used to manipulate, analyze, and save data efficiently in Python.
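The grouping step can also compute several statistics in one call. Here is a minimal sketch, assuming the same ages and grades as the student data above (recreated so the block runs on its own); the variable name grade_stats is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [20, 21, 20, 22, 21],
    "Grade": [85, 90, 78, 92, 88],
})

# For each age, compute the mean grade and the number of students
grade_stats = df.groupby("Age")["Grade"].agg(["mean", "count"])
print(grade_stats)
```

The result is a DataFrame indexed by Age with one column per statistic.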

How to use Pandas in Python with Example

Using Pandas in Python involves several key steps, such as creating DataFrames, manipulating data, and performing analysis. Here’s a step-by-step guide with examples:

Step 1: Import Pandas

import pandas as pd

Step 2: Create a DataFrame

You can create a DataFrame from various data sources like lists, dictionaries, CSV files, Excel files, databases, etc.

Example: Creating a DataFrame from a Dictionary of Lists

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [20, 21, 20, 22, 21],
    'Grade': [85, 90, 78, 92, 88]
}
df = pd.DataFrame(data)

Step 3: Display the DataFrame

print(df)

Step 4: Basic DataFrame Operations

# Display the first few rows
print(df.head())

# Get summary statistics
print(df.describe())

# Select a specific column
print(df['Name'])

# Filter data based on a condition
print(df[df['Grade'] >= 90])

Step 5: Manipulate Data

# Add a new column
df['GradeCategory'] = df['Grade'].apply(lambda x: 'Excellent' if x >= 90 else 'Good')

# Drop a column
df = df.drop('Grade', axis=1)

# Sort the DataFrame by a column
df = df.sort_values(by='Age')

Step 6: Grouping and Aggregation

# Group by 'Age' and count the students in each group
# ('Grade' was dropped in Step 5, so an average grade can no longer be computed here)
students_by_age = df.groupby('Age')['GradeCategory'].count()
print(students_by_age)

Step 7: Save DataFrame to a File

# Save DataFrame to CSV file
df.to_csv('student_data.csv', index=False)

Complete Example:

# Step 1: Import Pandas
import pandas as pd

# Step 2: Create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [20, 21, 20, 22, 21],
    'Grade': [85, 90, 78, 92, 88]
}
df = pd.DataFrame(data)

# Step 3: Display the DataFrame
print("Original DataFrame:")
print(df)
print("\n")

# Step 4: Basic DataFrame Operations
print("First 3 Rows:")
print(df.head(3))
print("\n")

# Step 5: Manipulate Data
df['GradeCategory'] = df['Grade'].apply(lambda x: 'Excellent' if x >= 90 else 'Good')
df = df.drop('Grade', axis=1)
df = df.sort_values(by='Age')

# Step 6: Grouping and Aggregation
# 'Grade' was dropped in Step 5, so count the students per age instead of averaging grades
students_by_age = df.groupby('Age')['GradeCategory'].count()
print("Number of Students by Age:")
print(students_by_age)
print("\n")

# Step 7: Save DataFrame to a File
df.to_csv('student_data.csv', index=False)
print("DataFrame saved to 'student_data.csv'")

This example covers the basic steps of using Pandas, from creating a DataFrame to performing operations and saving the result. You can adapt these steps to your specific data and analysis needs.

Where Pandas Can Be Used in Python, with Examples

Pandas is a versatile library for data manipulation and analysis in Python. It finds application in various domains. Here are some examples of where Pandas can be used:

Data Cleaning and Preprocessing:

Example: Handling missing data, removing duplicates, and converting data types.

import pandas as pd

# Read data from a CSV file
data = pd.read_csv('data.csv')

# Handle missing values
data = data.fillna(0)

# Remove duplicates
data = data.drop_duplicates()

# Convert data types
data['Date'] = pd.to_datetime(data['Date'])
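The same cleaning steps can be tried without a file on disk. The sketch below assumes a small made-up CSV held in a string, with hypothetical columns Date and Amount. It fills missing values only in the numeric column, so a missing date would not be silently replaced by 0.

```python
import io

import pandas as pd

# Illustrative CSV text: one duplicated row and one missing Amount
csv_text = """Date,Amount
2024-01-01,10
2024-01-01,10
2024-01-02,
"""

data = pd.read_csv(io.StringIO(csv_text))

# Remove exact duplicate rows
data = data.drop_duplicates()

# Fill missing values in the numeric column only
data["Amount"] = data["Amount"].fillna(0)

# Convert the Date strings to datetime values
data["Date"] = pd.to_datetime(data["Date"])

print(data)
```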

Data Exploration and Analysis:

Example: Analyzing sales data, calculating statistics, and generating insights.

import pandas as pd

# Read data from a CSV file
sales_data = pd.read_csv('sales_data.csv')

# Calculate total sales per product
total_sales = sales_data.groupby('Product')['Amount'].sum()

# Visualize sales trends
sales_data.plot(x='Date', y='Amount', kind='line')

Time Series Analysis:

Example: Analyzing stock prices, weather data, or any time-stamped data.

import pandas as pd

# Read time series data
stock_prices = pd.read_csv('stock_prices.csv', parse_dates=['Date'], index_col='Date')

# Resample data to monthly frequency
# ('M' means month-end; pandas 2.2 and later prefer the alias 'ME')
monthly_prices = stock_prices['Close'].resample('M').mean()

# Plotting
monthly_prices.plot(kind='bar')
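Resampling can be checked on generated data before pointing it at a real price file. This sketch assumes a made-up daily Close series and uses the month-start alias 'MS', which is one of several frequency aliases resample() accepts.

```python
import pandas as pd

# 60 consecutive days starting 2024-01-01 (covers January and February 2024)
dates = pd.date_range("2024-01-01", periods=60, freq="D")
stock_prices = pd.DataFrame({"Close": range(60)}, index=dates)

# Average the daily values within each calendar month
monthly_prices = stock_prices["Close"].resample("MS").mean()
print(monthly_prices)
```

Each row of the result is labeled with the first day of its month.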

Data Merging and Joining:

Example: Combining data from multiple sources.

import pandas as pd

# Read data from two CSV files
orders = pd.read_csv('orders.csv')
customers = pd.read_csv('customers.csv')

# Merge data on the 'customer_id' column
merged_data = pd.merge(orders, customers, on='customer_id')
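By default, merge() keeps only the rows whose key appears in both DataFrames (an inner join). The sketch below uses small made-up tables to show how the how parameter changes that behavior.

```python
import pandas as pd

# Illustrative data: order 3 belongs to a customer that is not in the customers table
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [10, 20, 30]})
customers = pd.DataFrame({"customer_id": [10, 20], "name": ["Alice", "Bob"]})

# Inner join (the default): unmatched orders are dropped
inner = pd.merge(orders, customers, on="customer_id")

# Left join: every order is kept, and missing customer fields become NaN
left = pd.merge(orders, customers, on="customer_id", how="left")

print(inner)
print(left)
```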

Data Input/Output:

Example: Reading and writing data in different formats like CSV, Excel, SQL, etc.

import pandas as pd

# Read data from an Excel file
data_excel = pd.read_excel('data.xlsx')

# Write data to a CSV file
data_excel.to_csv('data_output.csv', index=False)

Machine Learning Data Preparation:

Example: Getting data ready for machine learning models.

import pandas as pd
from sklearn.model_selection import train_test_split

# Read data from a CSV file
data = pd.read_csv('machine_learning_data.csv')

# Split data into training and testing sets
train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)

Web Scraping and Data Collection:

Example: Collecting data from websites and organizing it.

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Web scraping example
url = 'https://example.com'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Extract the text and target of every link that has an href attribute.
# Building one row per link keeps both columns the same length.
links = soup.find_all('a', href=True)
data = pd.DataFrame({'Title': [link.get_text(strip=True) for link in links],
                     'Link': [link['href'] for link in links]})

These examples showcase the versatility of Pandas across various tasks, making it an indispensable tool for data scientists, analysts, and developers working with structured data in Python.

Pandas Tips and Tricks

Read Data Efficiently:

Use the nrows and usecols parameters of read_csv to read only the first N rows or only selected columns.

import pandas as pd
data = pd.read_csv('data.csv', nrows=100, usecols=['Column1', 'Column2'])
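The effect of both parameters is easy to see on a tiny in-memory CSV. The column names and values below are made up for illustration.

```python
import io

import pandas as pd

csv_text = "Column1,Column2,Column3\n1,a,x\n2,b,y\n3,c,z\n"

# Read only the first two rows and only two of the three columns
data = pd.read_csv(io.StringIO(csv_text), nrows=2, usecols=["Column1", "Column2"])
print(data)
```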

Handling Missing Data:

Use isnull() or notnull() to identify missing values.

Use dropna() or fillna() to handle missing values.

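df.isnull().sum()  # number of missing values per column
df.dropna()        # drop rows that contain missing values
df.fillna(0)       # replace missing values with a chosen value, here 0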

Filtering Data:

Use boolean indexing for filtering rows based on conditions.

df[df['Column'] > 10]

Applying Functions:

Use apply() to apply a function along the axis of a DataFrame or Series.

df['Column'].apply(lambda x: x * 2)

Grouping Data:

Use groupby() for grouping data based on one or more columns.

df.groupby('Category')['Value'].mean()

Merging and Joining:

Use merge() for combining DataFrames based on a key.

merged_df = pd.merge(df1, df2, on='key_column')

Reshaping Data:

Use pivot() to reshape data from long to wide format, and melt() to go from wide back to long.

pivoted_df = df.pivot(index='Date', columns='Category', values='Value')
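A round trip through both functions shows how they relate. This is a sketch on made-up data; the names long_df, wide_df, and back_to_long are only illustrative.

```python
import pandas as pd

# Long format: one row per (Date, Category) pair
long_df = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "Category": ["A", "B", "A", "B"],
    "Value": [1, 2, 3, 4],
})

# Wide format: one row per Date, one column per Category
wide_df = long_df.pivot(index="Date", columns="Category", values="Value")

# melt() reverses the reshape, turning the category columns back into rows
back_to_long = wide_df.reset_index().melt(
    id_vars="Date", var_name="Category", value_name="Value"
)

print(wide_df)
print(back_to_long)
```

Note that pivot() raises an error if an index/column pair occurs more than once; pivot_table() with an aggregation function is the usual alternative in that case.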

Datetime Operations:

Convert strings to datetime objects using pd.to_datetime().

Extract date components using dt accessor.

df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.year
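The dt accessor exposes more than the year. A short sketch on two made-up dates:

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2024-01-15", "2024-03-02"]})
df["Date"] = pd.to_datetime(df["Date"])

# Extract individual components of each date
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Weekday"] = df["Date"].dt.day_name()

print(df)
```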

Working with Text Data:

Use str accessor for various string operations.

df['Name'].str.lower()

Saving Data:

Use to_csv(), to_excel(), or other to_* methods to save data.

df.to_csv('output.csv', index=False)

Memory Optimization:

Use appropriate data types (e.g., category for categorical data) to save memory.

df['Category'] = df['Category'].astype('category')
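The saving can be measured directly. The sketch below assumes a made-up column with only three distinct values repeated many times, which is the situation where the category type tends to help most.

```python
import pandas as pd

# 3,000 rows but only three distinct strings
df = pd.DataFrame({"Category": ["red", "green", "blue"] * 1000})

# deep=True counts the memory used by the string values themselves
before = df["Category"].memory_usage(deep=True)

df["Category"] = df["Category"].astype("category")
after = df["Category"].memory_usage(deep=True)

print(f"before: {before} bytes, after: {after} bytes")
```

The exact numbers depend on the pandas version and platform, so none are quoted here.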

Method Chaining:

Combine multiple operations in a single expression using method chaining.

mean_by_category = pd.read_csv('data.csv').dropna().groupby('Category')['Value'].mean()

Avoiding SettingWithCopyWarning:

Use copy() to explicitly create a copy of a DataFrame if needed.

subset = df[df['Column'] > 10].copy()
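The difference between working on a copy and changing the original can be sketched as follows, using made-up values. Changes to the copy stay in the copy, and .loc is the usual way to modify the original DataFrame through a condition.

```python
import pandas as pd

df = pd.DataFrame({"Column": [5, 15, 25]})

# Work on an independent copy of the filtered rows
subset = df[df["Column"] > 10].copy()
subset["Doubled"] = subset["Column"] * 2

# To change the original instead, assign through .loc
df.loc[df["Column"] > 10, "Column"] = 0

print(subset)
print(df)
```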

Handling Categorical Data:

Use pd.Categorical for efficient handling of categorical data.

df['Category'] = pd.Categorical(df['Category'])

Check Memory Usage:

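Use info() to get information about the DataFrame, including memory usage. Pass memory_usage='deep' for a more accurate figure when the DataFrame has string (object) columns.

df.info(memory_usage='deep')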

These tips and tricks should help you work more efficiently with Pandas and make your data manipulation and analysis tasks more streamlined.

How to Rename Columns in Pandas, with an Example

You can rename columns in a Pandas DataFrame using the rename() method. Here’s an example:

import pandas as pd

# Sample DataFrame
data = {
    'OldName1': [1, 2, 3],
    'OldName2': ['A', 'B', 'C']
}
df = pd.DataFrame(data)

# Display the original DataFrame
print("Original DataFrame:")
print(df)
print("\n")

# Rename columns
df.rename(columns={'OldName1': 'NewName1', 'OldName2': 'NewName2'}, inplace=True)

# Display the DataFrame after renaming columns
print("DataFrame after renaming columns:")
print(df)

In this example:

We create a DataFrame with columns named ‘OldName1’ and ‘OldName2’.

We use the rename() method to rename these columns to ‘NewName1’ and ‘NewName2’ respectively.

The columns parameter in the rename() method is a dictionary where keys are the old column names, and values are the new column names.

The inplace=True parameter modifies the original DataFrame in place. If inplace is set to False (which is the default), a new DataFrame with the updated column names will be returned, and the original DataFrame will remain unchanged.

After running this code, the DataFrame will have columns named ‘NewName1’ and ‘NewName2’:

DataFrame after renaming columns:
   NewName1 NewName2
0         1        A
1         2        B
2         3        C

This is a basic example, and you can adapt the rename() method to your specific needs, such as renaming only some columns, using a function to generate new column names, or combining renaming with other operations.
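Those variations can be sketched as follows. The names renamed, upper, and replaced are only illustrative. Each call leaves inplace at its default, so it returns a new DataFrame and the original keeps its column names.

```python
import pandas as pd

df = pd.DataFrame({"OldName1": [1, 2, 3], "OldName2": ["A", "B", "C"]})

# Rename only one column; columns not in the dictionary are left alone
renamed = df.rename(columns={"OldName1": "NewName1"})

# Pass a function instead of a dictionary to transform every column name
upper = df.rename(columns=str.upper)
replaced = df.rename(columns=lambda name: name.replace("Old", "New"))

print(list(renamed.columns))
print(list(upper.columns))
print(list(replaced.columns))
print(list(df.columns))  # the original is unchanged
```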
