Brian Achaye
Brian Achaye

Data Scientist

Data Analyst

ODK/Kobo Toolbox Expert

BI Engineer

Data Solutions Consultant

Brian Achaye

Data Scientist

Data Analyst

ODK/Kobo Toolbox Expert

BI Engineer

Data Solutions Consultant

Articles

Table Joins Explained With SQL: How to Combine Data Like a Pro

October 1, 2023 Data Analysis, Python, SQL
Table Joins Explained With SQL: How to Combine Data Like a Pro

As a data analyst, I quickly learned that real-world data is messy and scattered across multiple tables. Customer info lives in one table, transactions in another, and product details in a third. To make sense of it all, you need table joins—the backbone of relational data analysis.

In this guide, I’ll break down what joins are, the different types (with visuals!), and when to use each—complete with SQL and Excel examples.

1. What Are Table Joins?

A join combines rows from two or more tables based on a related column (a key).

Why Joins Matter:

  • Avoid data duplication (normalized databases store data in separate tables).
  • Answer business questions like:
    • “What products did each customer buy?”
    • “Which regions have the highest sales for a given category?”

2. Types of Joins (With Visuals & Use Cases)

A. INNER JOIN (The Default Join)

What it does: Returns only matching rows from both tables.

Inner Join Diagram

When to use: When you only want records with matches in both tables.

Example (SQL):

SELECT orders.order_id, customers.customer_name  
FROM orders  
INNER JOIN customers ON orders.customer_id = customers.customer_id;  

Real-world use:

  • “Show me all orders with valid customer records.”

B. LEFT JOIN (Keep All Records from the First Table)

What it does: Returns all rows from the left table + matching rows from the right table (or NULL if no match).

Left Join Diagram

When to use: When you want all records from the primary table, even if some are missing in the secondary table.

Example (SQL):

SELECT customers.customer_name, orders.order_id  
FROM customers  
LEFT JOIN orders ON customers.customer_id = orders.customer_id;  

Real-world use:

  • “List all customers, even if they haven’t placed an order yet.”

C. RIGHT JOIN (Keep All Records from the Second Table)

What it does: The opposite of a LEFT JOIN—returns all rows from the right table + matching rows from the left.

Right Join Diagram

When to use: Rare (most analysts just swap tables and use LEFT JOIN).

Example (SQL):

SELECT orders.order_id, customers.customer_name  
FROM orders  
RIGHT JOIN customers ON orders.customer_id = customers.customer_id;  

Real-world use:

  • “Find all products, even those never ordered.”

D. FULL OUTER JOIN (Keep All Records from Both Tables)

What it does: Returns all rows from both tables, with NULL where no match exists.

Full Outer Join Diagram

When to use: When you need a complete merge (e.g., reconciling two datasets).

Example (SQL):

SELECT customers.customer_name, orders.order_id  
FROM customers  
FULL OUTER JOIN orders ON customers.customer_id = orders.customer_id;  

Real-world use:

  • “Show all customers and all orders, even if they don’t match.”

E. CROSS JOIN (Every Possible Combination)

What it does: Returns the Cartesian product (all possible row combinations).

When to use: Rare—mostly for generating test data or scenarios.

Example (SQL):

SELECT shirts.color, pants.style  
FROM shirts  
CROSS JOIN pants;  

Real-world use:

  • “Generate all possible product-color combinations.”

3. Joins in the Real World (Beyond SQL)

A. Joins in Excel (Power Query / VLOOKUP)

  • VLOOKUP = LEFT JOIN (but less flexible).
  • Power Query Merge = Full join support (like SQL).

B. Joins in Python (Pandas)

# Inner Join  
pd.merge(df1, df2, on='key', how='inner')  

# Left Join  
pd.merge(df1, df2, on='key', how='left')  

C. Joins in Tableau / Power BI

  • Handled automatically if relationships are defined in the data model.

4. Common Join Pitfalls & How to Avoid Them

🚨 Problem: Duplicate rows after joining.
Fix: Ensure join keys are unique or aggregate first.

🚨 Problem: Slow performance on large tables.
Fix: Use indexed columns for joins.

🚨 Problem: NULL values breaking matches.
Fix: Use COALESCE() or clean data beforehand.

5. Cheat Sheet: Which Join to Use?

Join TypeWhen to UseSQL Syntax
INNER JOINOnly matching recordsFROM A INNER JOIN B ON A.x = B.y
LEFT JOINAll left table + matching right recordsFROM A LEFT JOIN B ON A.x = B.y
RIGHT JOINAll right table + matching left recordsFROM A RIGHT JOIN B ON A.x = B.y
FULL JOINAll records from both tablesFROM A FULL JOIN B ON A.x = B.y
CROSS JOINAll possible combinationsFROM A CROSS JOIN B

Final Thoughts

Mastering joins is essential for any data analyst. Start with INNER and LEFT joins (they cover 90% of cases), then explore advanced scenarios.

Pro Tip: Always validate join results with a row count check!

Further Learning

Got a tricky join scenario? Ask me in the comments—I’ll help! 🚀

Related Posts
Write a comment