Effortless Data Merging in Pandas: Left & Right Joins with Real-World Use Cases
Efficiently merge datasets, handle missing data, and improve data analysis using Left and Right Joins in Pandas.
Merging datasets is a crucial step in data analysis and data engineering. Often, we need to combine information from multiple tables while ensuring that no critical data is lost.
In this article, we will explore:
- Left Joins and Right Joins in Pandas
- How they work, their key differences, and when to use them
- Best practices for handling missing records and improving data quality
By the end of this guide, you will be able to merge datasets efficiently and apply real-world data quality checks using Pandas’ .merge()
function.
Why This Module is Important?
Merging datasets correctly ensures data consistency and accuracy for reporting, visualization, and machine learning.
Here’s why Left and Right Joins are essential:
✅ Retaining Parent or Child Records — Left Joins keep all records from the left dataset, while Right Joins retain all records from the right dataset.
✅ Identifying Missing or Orphaned Data — Helps in finding unmatched records and improving data quality.
✅ Handling Missing Data Efficiently — Managing NaN values ensures complete and reliable datasets.
✅ Generating Meaningful Reports — Enables analysis of inactive sales reps, orphaned sales transactions, and other key business insights.
📌 This guide will provide a solid foundation for working with relational datasets in Pandas.
Understanding Left and Right Joins in Pandas
Left Join (how=’left’)
- Includes all rows from the left table and only the matching rows from the right table.
- Unmatched rows from the right table are filled with NaN values.
- Example Use Case: Getting a complete list of sales reps, including those who haven’t made any sales.
Right Join (how=’right’)
- Includes all rows from the right table and only the matching rows from the left table.
- Unmatched rows from the left table are filled with NaN values.
- Example Use Case: Getting a full list of sales transactions, including those without an assigned sales rep.
You can download the datasets from the following GitHub link: GitHub Datasets
Step-by-Step Implementation in Pandas
First, let’s create two sample DataFrames to demonstrate Left and Right Joins.
import pandas as pd
# Sales Representatives Table (Parent)
sales_reps = pd.DataFrame({
"rep_id": [1, 2, 3],
"name": ["Alice", "Bob", "Charlie"],
"region": ["North", "South", "West"]
})
# Sales Data Table (Child)
sales = pd.DataFrame({
"sale_id": [101, 102, 103, 104],
"sale_rep_id": [1, 2, None, 4], # Includes a missing and an unmatched ID
"sale_amount": [500, 1000, 750, 1200]
})
Observations:
- Charlie (rep_id=3) has no sales records.
- One sale record (sale_id=104) has an unknown sales rep (rep_id=4).
Performing a Left Join
import pandas as pd
# Performing a LEFT JOIN
left_join = pd.merge(
sales_reps,
sales,
left_on="rep_id",
right_on="sale_rep_id",
how="left"
)
Expected Output:
💡 Key Observations:
✅ All sales reps are included, even if they don’t have sales.
✅ Charlie (rep_id=3) appears, but the sale_id
and sale_amount
fields are NaN.
Performing a Right Join
import pandas as pd
# Performing a RIGHT JOIN
right_join = pd.merge(
sales_reps,
sales,
left_on="rep_id",
right_on="sale_rep_id",
how="right"
)
Expected Output:
💡 Key Observations:
✅ All sales transactions are included, even if there’s no corresponding sales rep.
✅ Sale ID 104 (sales_rep_id=4) is present, but the name
and region
fields are NaN.
Practical Use Cases
1️⃣ Identifying Sales Reps Without Sales (Inactive Reps)
To find sales reps who haven’t made any sales, we can filter for NaN values in sale_id:
inactive_reps = left_join[left_join["sale_id"].isnull()]
print(inactive_reps)
Expected Output:
2️⃣ Detecting Orphaned Sales Transactions
To find sales transactions that don’t have an assigned sales rep, we can filter for NaN values in rep_id:
orphaned_sales = right_join[right_join["rep_id"].isnull()]
print(orphaned_sales)
Expected Output:
Best Practices for Left and Right Joins
✅ Choose the Right Join Type:
- Use Left Join when the parent dataset (e.g., sales reps) is more important.
- Use Right Join when the child dataset (e.g., sales transactions) is more important.
✅ Inspect NaN Values:
- Check for missing values using
.isnull()
to identify unmatched rows.
✅ Handle Missing Data Efficiently:
Ensures that missing values do not impact downstream analysis.
- Replace NaN values with
"Unknown"
where necessary:
right_join.fillna({"name": "Unknown", "region": "Unknown"}, inplace=True)
Expected Output:
🔗 Practice Assignment
💡 Want to practice? Attempt the Left and Right Joins in Pandas Assignment
👉 Click here.
What’s Next?
Now that we have explored Left and Right Joins, we will move on to Full Outer Joins in Pandas. We’ll see how to combine datasets while keeping all records from both sources, filling in missing values where necessary.
Click 👉 [Here] to Enroll in the Python for Beginners: Learn Python with Hands-on Projects. It only costs $10 and you can reach out to us for $10 Coupon.
Conclusion
In this article, we covered:
✅ How Left and Right Joins work and when to use them.
✅ The difference between retaining unmatched rows from left or right datasets.
✅ How to handle missing values (NaN
) effectively.
✅ Real-world use cases such as finding inactive reps and orphaned sales.
By mastering these techniques, you’ll be able to confidently merge and integrate data for data analysis, reporting, and machine learning workflows.
📢 Engage With Us!
👉 Follow Durga Gadiraju for more insightful articles on Python Pandas and data handling.
🔁 Share this article with your peers and help them master data import techniques.
💬 We welcome your feedback and questions — let’s make this a collaborative learning experience!
✨ Stay informed. Stay ahead.✨