Retail Sales Analysis

SQL Concepts & Features

  1. Aggregation (SUM, AVG, COUNT)
  2. Conditional aggregation (CASE WHEN)
  3. Grouping (GROUP BY)
  4. Formatting (CAST, CONCAT)
  5. Common Table Expressions (CTEs) for advanced calculations like percentage contributions

Project Details

A retail company wanted to gain insights into its sales data to optimize inventory management, marketing strategies, and customer engagement. The dataset contained transaction-level information including sales, customer demographics, product categories, and timestamps. However, it required data cleaning and exploration before meaningful analysis could be conducted.

==> The main objectives of the project were to:

  1. Clean and validate the sales dataset to ensure accuracy.

  2. Analyze sales patterns by product category, customer demographics, and time of day.

  3. Identify top-performing products and customers to inform strategic decisions.

  4. Provide actionable insights for inventory management, promotions, and staffing optimization.

==> To achieve these objectives, the following steps were undertaken using SQL:

1. Data Cleaning:

  • Checked for null values in all critical columns (transactions_id, sale_date, sale_time, customer_id, gender, age, category, quantity, price_per_unit, cogs, total_sale).

  • Ensured dataset consistency for accurate analysis.

2. Data Exploration:

  • Calculated total number of sales and total number of unique customers.

  • Identified distinct product categories.

3. Business Analysis & Insights:

  • Retrieved sales for a specific date and analyzed high-quantity transactions in a particular category.

  • Calculated total sales per category and average age of customers per category.

  • Identified transactions exceeding 1000 in total sales to spot high-value transactions.

  • Counted transactions by gender and category.

  • Determined the top 5 customers based on total sales.

  • Counted unique customers per category to understand reach.

4. Time-of-Day Analysis:

  • Classified transactions into morning (<12), afternoon (12–17), and evening (>17) shifts.

  • Counted the number of transactions in each time segment to identify peak sales periods.

the following outcomes were generated

  • Data Cleaning: Ensured a clean, consistent dataset for analysis.

  • Sales Insights:

    • Identified top-selling product categories and peak revenue periods.

    • Discovered high-value customers and transactions.

  • Customer Insights: Provided demographic insights for targeted marketing campaigns.

  • Operational Insights: Determined peak hours for staffing and promotions.

  • Strategic Outcomes: Enabled informed decisions on inventory planning, marketing focus, and resource allocation, leading to better revenue optimization and customer satisfaction.

Skills & Tools Demonstrated:

 

  • SQL Proficiency: Aggregation, conditional logic, filtering, grouping, and ranking.

  • Data Cleaning: Handling nulls and inconsistencies.

  • Analytical Thinking: Translating raw data into actionable business insights.

  • Business Acumen: Identifying KPIs and metrics relevant to sales, customers, and operations.

Insights and Tasks

Task 1: Data Cleaning & Exploration

Objective:
Ensure the sales dataset is clean, complete, and well-understood to provide a reliable foundation for further analysis.

Description:

  • Data Cleaning: Checked all columns (transactions_id, sale_date, sale_time, customer_id, gender, age, category, quantity, price_per_unit, cogs, total_sale) for null values to identify missing or incomplete records.

  • Data Exploration:

    • Counted the total number of sales to understand dataset size.

    • Counted the total number of unique customers to gauge customer reach.

    • Retrieved all distinct product categories to understand product diversity.

Purpose & Outcome:

  • Purpose: Ensure data quality and gain a preliminary understanding of sales patterns, customer base, and product variety.

  • Outcome:

    • Confirmed dataset completeness (no critical null values).

    • Established key metrics such as total sales, customer count, and product categories.

    • Created a solid foundation for deeper business analysis and insights.

SQL Query:

Task 2: Sales Analysis by Date, Category, and Total Sales

Objective:
Analyze sales transactions to gain insights on daily sales, high-volume transactions, and category-wise revenue, helping the business make informed operational and marketing decisions.

Description:

  • Sales by Specific Date: Retrieved all transactions for 2022-11-05 to analyze daily sales patterns and identify trends on specific dates.

  • High-Volume Transactions in a Category: Filtered transactions in the Clothing category where the quantity sold exceeded 4 in November 2022. This helped in understanding customer buying behavior and identifying potential bulk purchases.

  • Category-Wise Total Sales: Calculated total sales for each product category by multiplying price_per_unit with quantity and aggregating per category. This provided a clear picture of the most profitable product lines.

Purpose & Outcome:

  • Purpose: Identify key revenue drivers, peak transaction dates, and high-demand products to support inventory management and marketing strategy.

  • Outcome:

    • Gained visibility into daily sales trends for targeted analysis.

    • Identified high-volume transactions to support promotional planning.

    • Determined top-performing categories, providing actionable insights for stocking and prioritization.

Task 3: Customer Demographics and Category-Wise Sales Analysis

Objective:
To analyze sales distribution across categories, understand customer demographics, and evaluate purchasing behavior based on gender and unique customer participation.

Description:
This task focused on uncovering key business insights related to sales performance, customer engagement, and demographic trends using SQL queries.

  1. Category-Wise Total Sales:

    • Calculated total revenue per category to identify top-performing product segments and understand which categories drive the most sales.

  2. Average Age of Beauty Category Buyers:

    • Determined the average age of customers purchasing beauty products to identify the target demographic for marketing and product positioning.

  3. Unique Customer Count per Category:

    • Counted distinct customers for each category to measure customer reach, loyalty, and product popularity across different customer groups.

  4. Gender and Category-Wise Transaction Distribution:

     
    • Analyzed the number of transactions by gender for each category to understand gender-based purchasing trends and preferences.

Purpose & Outcome:

  • Purpose:

    • To evaluate category performance through sales metrics.

    • To analyze customer demographics and behavior patterns.

    • To identify opportunities for targeted marketing based on gender and age.

  • Outcome:

    • Identified top-selling product categories that contribute most to revenue.

    • Determined that specific age groups are more inclined toward certain product types like beauty and fashion.

    • Discovered gender-based purchasing trends, enabling more personalized product recommendations and campaigns.

    • Provided actionable insights for customer segmentation, marketing optimization, and category-level strategy development.

Task 4: High-Value Transactions and Top Customer Analysis

Objective:
Identify high-value transactions and top-performing customers to support strategic decision-making related to customer loyalty, revenue concentration, and targeted marketing initiatives.

Description:
This task aimed to uncover which transactions and customers contribute most to overall business revenue. Two key SQL analyses were performed:

  1. High-Value Transactions:

    • Retrieved all transactions with a total sale greater than 1000, highlighting large or premium purchases.

    • These transactions were further analyzed to identify patterns in product categories, timing, and customer types.

  2. Top 5 Customers by Total Sales:

    • Aggregated total sales by customer and ranked them in descending order.

    • Identified the top 5 customers responsible for the highest total sales, providing insights into the company’s most valuable customer base.

Purpose & Outcome:

  • Purpose:

    • To recognize high-value transactions that significantly impact total revenue.

    • To identify and understand the purchasing behavior of top customers for loyalty and retention programs.

  • Outcome:

    • Revealed that a small subset of customers contributes a disproportionately large share of total sales.

    • Enabled the business to focus on high-value customer retention through personalized offers and engagement strategies.

    • Provided a foundation for VIP customer programs, targeted marketing, and premium product recommendations.

    • Enhanced overall customer lifetime value (CLV) understanding and revenue optimization potential.

Task 5: Time-of-Day Sales Analysis (Shift-Based Analysis)

Objective:
To analyze the distribution of sales across different times of the day—Morning, Afternoon, and Evening—to understand customer purchasing patterns and optimize store operations, staffing, and promotions accordingly.

Description:
This task focused on segmenting sales transactions into time-based shifts using SQL’s CASE logic and then counting the number of orders in each time segment.

  1. Creating Time-Based Shifts:

    • Classified each transaction into a time segment based on the hour of sale_time:

      • Morning: Before 12 PM

      • Afternoon: Between 12 PM and 5 PM

      • Evening: After 5 PM

  2. Aggregating Sales by Time Segment:

    • Counted total transactions (NumberOfOrders) per shift to identify which part of the day experienced the highest sales activity.

Purpose & Outcome:

  • Purpose:

    • To understand when customers are most active throughout the day.

    • To support staffing optimization, inventory management, and marketing campaigns based on sales patterns.

  • Outcome:

    • Identified peak sales hours, such as increased transactions during afternoon or evening shifts.

    • Provided valuable insights for operational efficiency, such as adjusting employee schedules and restocking times.

    • Enabled data-driven planning for time-sensitive promotions and customer engagement campaigns.

    • Strengthened overall business intelligence by connecting temporal sales trends with customer behavior.

Conclusion

The Retail Sales Data Analysis Project successfully utilized SQL to transform raw transactional data into meaningful business intelligence. Through systematic data cleaning, exploration, and analysis, the project provided a clear, data-driven understanding of customer behavior, sales performance, and operational efficiency.

The project began by ensuring data integrity and consistency, eliminating null values and validating critical columns to establish a reliable foundation for analysis. This was followed by a series of targeted analytical tasks that uncovered key business insights:

  • Sales and Revenue Insights: Identified total and category-wise sales performance, highlighting which product categories generated the highest revenue.

  • Customer Behavior Analysis: Analyzed customer demographics (age, gender, and purchasing category), revealing which customer groups contributed most to sales.

  • High-Value Transactions and Top Customers: Discovered that a small segment of customers accounted for a significant portion of overall sales—critical for loyalty and retention strategies.

  • Time-of-Day Sales Trends: Identified sales distribution across morning, afternoon, and evening shifts, enabling optimization of staffing, promotions, and operational planning.

Key Outcomes:

  • Delivered a comprehensive overview of sales dynamics, including performance by product category, time of day, and customer profile.

  • Enhanced understanding of revenue concentration and customer segmentation, leading to more effective marketing and retention strategies.

  • Provided actionable insights for business decision-making, such as inventory allocation, staff scheduling, and targeted promotional campaigns.

  • Demonstrated proficiency in SQL-based data cleaning, aggregation, and analytical reporting, reflecting strong technical and business acumen.

Final Summary:

This project highlights the critical role of data analytics in the retail industry. By leveraging SQL for structured exploration and insight generation, the analysis not only illuminated patterns in sales and customer behavior but also delivered tangible strategies for growth and efficiency. The project reinforced the importance of data quality, analytical precision, and actionable interpretation, forming a complete data-to-decision workflow that empowers smarter business outcomes.