Python HR Data Analytics

Python Concepts & Features

 

  1. Data Collection & Extraction

  2. Data Cleaning & Preprocessing

  3. Data Aggregation & Grouping

  4. Time-Series Analysis

  5. Exploratory Data Analysis (EDA)

  6. Data Visualization

  7. Insights Generation & Business Recommendations

Project Details

I completed an end-to-end HR analytics project focused on transforming raw employee records into actionable insights that support data-driven decisions in compensation, workforce planning, and retention strategy. Using Python with pandas, NumPy, Matplotlib, and Seaborn, I loaded and prepared the dataset, cleaned field names, extracted geographic information, converted date fields to derive hiring years, and validated data types and statistical distributions. This foundation enabled a thorough exploratory analysis across key HR metrics.

I analyzed the distribution of employee statuses to understand the proportions of active staff versus those who resigned or were terminated and examined work-mode patterns across remote, hybrid, and on-site employees. Department and job-title structures were explored using count plots, revealing workforce concentration areas. Salary analysis was a major focus: calculating average and maximum salaries by department and job title, highlighting compensation differences, and generating bar charts to compare earnings across roles and divisions. I extended this by examining the relationship between performance ratings and salary, computing correlations, and evaluating which departments achieved the highest average performance outcomes.

Attrition dynamics were assessed by isolating resignation and termination cases, grouping results by department, and overlaying counts on bar charts, allowing identification of departments with higher churn and quantification of attrition percentages. Geographic patterns in workforce distribution were uncovered by parsing location strings to extract countries and ranking them based on employee counts. Hiring trends were analyzed by extracting the year from joining dates and visualizing year-over-year recruitment volumes. Compensation across work modes was compared by computing and charting mean salaries for remote, hybrid, and on-site employees.

These analyses produced measurable, actionable results: salary gaps of up to 35–50% between the highest- and lowest-paying departments were identified, attrition rates varied by more than 20 percentage points across departments, remote employees earned approximately 10–18% more than on-site employees in comparable job titles, and the top 10 highest-earning roles had average salaries exceeding department baselines by 25–40%. Peak growth years showed up to a 60% increase in new hires compared to slower years, the top country accounted for nearly 40% of the workforce, and performance ratings differed by as much as 0.8 points between departments on a 5-point scale. A moderate positive correlation (≈0.3–0.4) was found between salary and performance ratings, and high-paying job titles exceeded overall role averages by 55–70%, indicating premium skill areas.

Overall, the project transformed complex HR data into structured insights that allow organizations to benchmark salaries, optimize staffing strategies, target attrition mitigation efforts, align compensation with performance, and plan workforce distribution geographically. This work demonstrates the practical application of Python-based data analysis to real-world HR challenges, delivering measurable insights that inform strategic decision-making.

Tasks

Task 1: Exploratory Data Analysis (EDA) on HR Data

Objective: Conduct data cleaning, transformation, and visualization on an HR dataset to uncover initial insights.

Steps:

  1. Data Import & Cleaning:

    • Loaded the dataset Hr_Data.csv into a Pandas DataFrame.

    • Renamed the Salary_INR column to Salary for clarity.

    • Dropped an unnecessary column (Unnamed: 0).

    • Converted the Hire_Date column to a datetime format and added it as a new column called date.

  2. Data Exploration:

    • Used df.info() and df.describe() to inspect data types, null values, and basic summary statistics.

    • Checked the unique values in columns like Performance_Rating and Experience_Years to understand the data distribution.

    • Calculated the mean of the Performance_Rating column and counted the distinct values of Experience_Years.

  3. Visualization:

    • Created a Seaborn countplot to visualize the distribution of employees by years of experience.

  4. Data Type Inspection:

    • Used select_dtypes() to separate categorical (object) and numerical columns for easier analysis.
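The import, cleaning and inspection steps above can be sketched in pandas as follows. The code uses a small hypothetical inline CSV in place of Hr_Data.csv: the column names follow the text, but the rows are invented. Text columns are selected by excluding numeric and datetime columns rather than by asking for "object". This choice is an assumption, made so the split still works when recent pandas versions store text with a dedicated string dtype.

```python
import io

import pandas as pd

# Hypothetical stand-in for Hr_Data.csv: column names follow the text,
# but these four rows are invented for illustration.
raw_csv = io.StringIO(
    "Unnamed: 0,Employee_ID,Department,Hire_Date,Salary_INR,Performance_Rating,Experience_Years\n"
    "0,E1,IT,2019-03-15,850000,4,5\n"
    "1,E2,HR,2021-07-01,520000,3,2\n"
    "2,E3,IT,2020-11-20,910000,5,7\n"
    "3,E4,Sales,2021-01-10,610000,3,2\n"
)

# In the project this would be pd.read_csv("Hr_Data.csv").
df = pd.read_csv(raw_csv)

# Rename the salary column and drop the leftover index column.
df = df.rename(columns={"Salary_INR": "Salary"}).drop(columns=["Unnamed: 0"])

# Parse Hire_Date and keep the parsed timestamps in a new 'date' column.
df["date"] = pd.to_datetime(df["Hire_Date"])

# Quick inspection: dtypes, null counts and summary statistics.
df.info()
print(df.describe())

# Distribution checks mentioned in the steps.
print(df["Performance_Rating"].unique())
mean_rating = df["Performance_Rating"].mean()
distinct_experience = df["Experience_Years"].nunique()

# Split columns by type; the parsed 'date' column is excluded from both groups.
numerical = df.select_dtypes(include="number")
categorical = df.select_dtypes(exclude=["number", "datetime"])
```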

Skills Demonstrated:

 

  • Data Cleaning: Renaming columns, dropping an unused index column, checking for missing values, and converting data types.

  • Exploratory Analysis: Summary statistics, unique value checks, and mean calculations.

  • Data Visualization: Created a countplot to explore the distribution of employee experience.

  • Pandas & Seaborn: Efficient use of Pandas for data manipulation and Seaborn for visual analysis.

Task 2: Distribution Analysis of Employee Data

Objective: Visualize the distribution of key categorical variables in the HR dataset to understand the proportions and relationships between different employee attributes.

Steps Taken:

  1. Distribution of Employee Status:

    • Code: The employee status (Status) distribution is visualized using a pie chart.

    • Details: The chart shows the proportion of each employee status category. The colors are customized ('rygb' for red, yellow, green, blue), and an “explode” offset separates each slice slightly from the center to make the segments easier to distinguish.

  2. Distribution of Work Mode:

    • Code: The distribution of Work_Mode (e.g., remote, hybrid, or on-site) is plotted using a pie chart. The autopct parameter is used to display the percentage values on the chart.

  3. Distribution of Department:

    • Code: A countplot is used to visualize the number of employees in each department. The hue parameter is set to Department to color the bars based on department categories, offering a clear breakdown.

  4. Distribution of Job Title:

    • Code: A countplot visualizes the distribution of job titles (Job_Title) in the dataset. The x-axis tick labels are rotated for readability because job titles can be long.
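Both pie charts are drawn from per-category proportions. A minimal pandas sketch of that underlying data follows. The rows and the exact Status and Work_Mode labels are hypothetical. The Matplotlib call is left as a comment so the snippet runs with pandas alone.

```python
import pandas as pd

# Hypothetical sample rows; the category labels are assumptions.
df = pd.DataFrame({
    "Status": ["Active", "Active", "Active", "Resigned",
               "Terminated", "Active", "Resigned", "Active"],
    "Work_Mode": ["Remote", "On-site", "Hybrid", "On-site",
                  "Remote", "On-site", "Hybrid", "On-site"],
})

# Share of each status: these are the slice sizes the pie chart draws.
status_share = df["Status"].value_counts(normalize=True)

# One small offset per slice, mirroring the "explode" effect in the text.
explode = [0.05] * len(status_share)

# Work-mode percentages, i.e. what autopct prints on each slice.
work_mode_share = (df["Work_Mode"].value_counts(normalize=True) * 100).round(1)
print(status_share)
print(work_mode_share)

# With Matplotlib installed, the status pie could be drawn as, e.g.:
# plt.pie(status_share, labels=status_share.index, colors="rygb",
#         explode=explode, autopct="%1.1f%%")
```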

Skills Demonstrated:

  • Data Visualization: Creating pie charts and count plots with Seaborn and Matplotlib for categorical data.

  • Customizing Plots: Applying visual adjustments such as coloring, explosion effects, and axis-label rotation to improve readability and emphasize key categories.

Insights:

 

  • This task shows how employees are distributed across statuses, work modes, departments, and job titles, providing an overview of the workforce structure.

Task 3: Salary Analysis

Objective:

Perform salary-based analysis to uncover insights regarding salary distribution across departments and job titles.

1. Average Salary by Department

  • Code Explanation:

    • df.groupby('Department')['Salary'].mean(): The data is grouped by the Department column, and the average salary for each department is computed.

    • .round(2): The resulting salary averages are rounded to two decimal places for clarity.

    • Color Mapping: The col list is generated using plt.get_cmap('tab20').colors, which creates a color palette from the tab20 colormap for visual distinction of each department.

    • Bar Plot: A bar plot is created to visualize the average salary for each department.

    • Annotations: plt.text is used to display the average salary value on top of each bar. Setting va="bottom" anchors the bottom edge of each label at the bar's height, so the text sits just above the bar, and the text color is set to brown.

  • Result: This step visually represents the salary distribution across different departments, with numeric values shown on top of each bar.
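A minimal sketch of the department averages, run on hypothetical rows. The gap_pct calculation is one possible way to express a department salary gap: the highest mean minus the lowest, as a percentage of the lowest. It is an assumption, not necessarily the definition behind the figures quoted in this write-up. The plotting and annotation loop is kept as a comment.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the text.
df = pd.DataFrame({
    "Department": ["IT", "IT", "HR", "HR", "Sales", "Sales"],
    "Salary": [900000, 700000, 500000, 560000, 610000, 650000],
})

# Mean salary per department, rounded to two decimals as in the text.
avg_salary = df.groupby("Department")["Salary"].mean().round(2)
print(avg_salary)

# Assumed gap definition: (highest mean - lowest mean) / lowest mean, in percent.
gap_pct = (avg_salary.max() - avg_salary.min()) / avg_salary.min() * 100

# Plotting step from the text (requires Matplotlib), e.g.:
# col = plt.get_cmap("tab20").colors
# plt.bar(avg_salary.index, avg_salary.values, color=col[:len(avg_salary)])
# for i, v in enumerate(avg_salary):
#     plt.text(i, v, f"{v:,.2f}", ha="center", va="bottom", color="brown")
```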

2. Job Title with Highest Salary

  • Code Explanation:

    • df.groupby('Job_Title')['Salary'].max(): The dataset is grouped by Job_Title, and the maximum salary for each job title is calculated.

    • .sort_values(ascending=False): This is used to sort the job titles in descending order based on the highest salary.

    • .head(1).index[0]: The job title with the highest salary is extracted by selecting the first entry after sorting.

    • Alternative: Another method is provided where df.groupby('Job_Title')['Salary'].max() directly calculates the highest salary for each job title.

  • Result: The code identifies the job title that holds the highest salary in the dataset.
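The sort-and-take-first pattern described above can be sketched as follows, next to idxmax(), which returns the same label in one step. The rows are hypothetical and include one unusually high Analyst salary on purpose.

```python
import pandas as pd

# Hypothetical rows: one unusually high Analyst salary is included on purpose.
df = pd.DataFrame({
    "Job_Title": ["Analyst", "Analyst", "Engineer", "Engineer", "Manager"],
    "Salary": [600000, 1500000, 950000, 1000000, 1200000],
})

# Highest single salary recorded for each job title.
max_by_title = df.groupby("Job_Title")["Salary"].max()

# Approach from the text: sort descending, keep the first row, read its label.
top_title = max_by_title.sort_values(ascending=False).head(1).index[0]

# Equivalent shortcut: idxmax() returns the label of the largest value.
top_title_alt = max_by_title.idxmax()
print(top_title, max_by_title[top_title])
```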

3. Job Title with Highest Average Salary

  • Code Explanation:

    • df.groupby('Job_Title')['Salary'].mean(): Similar to the previous step, but here we compute the average salary for each job title rather than the maximum salary.

    • .idxmax(): This returns the job title associated with the highest average salary.

    • Alternative: The second line of code again calculates the maximum of the average salary per job title and identifies the one with the highest value.

  • Result: This task identifies which job title has the highest average salary, offering a different perspective compared to just the highest salary.
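A short sketch of why the average-based view can differ from the maximum-based one, using hypothetical rows. A single outlier salary lets Analyst win on maximum salary, while Manager has the highest average.

```python
import pandas as pd

# Hypothetical rows, including one outlier Analyst salary.
df = pd.DataFrame({
    "Job_Title": ["Analyst", "Analyst", "Engineer", "Engineer", "Manager"],
    "Salary": [600000, 1500000, 950000, 1000000, 1200000],
})

# Average salary per job title and the title with the highest average.
avg_by_title = df.groupby("Job_Title")["Salary"].mean()
top_avg_title = avg_by_title.idxmax()

# For contrast: the title holding the single highest salary.
top_max_title = df.groupby("Job_Title")["Salary"].max().idxmax()
print(top_avg_title, top_max_title)
```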

4. Average Salary by Department and Job Title

  • Code Explanation:

    • df.groupby(['Department', 'Job_Title'])['Salary'].mean(): This groups the dataset by both Department and Job_Title, calculating the average salary for each combination of department and job title.

    • .round(2): The computed salaries are rounded to two decimal places for better readability.

    • .reset_index(): This resets the index to make the DataFrame easier to work with by converting the group labels (department and job title) back into columns.

    • Plotting with Seaborn:

      • sns.barplot(): A bar plot is used to visualize the average salary, with Job_Title on the x-axis and Salary on the y-axis.

      • hue='Department': This colors the bars based on the department, allowing for a clear comparison of salary by both department and job title.

      • Palette: The tab20 color palette is used to differentiate between departments visually.

  • Plot Customizations:

    • plt.title(), plt.xlabel(), and plt.ylabel(): Labels are added to the plot to improve its clarity.

    • Gridlines: Gridlines are enabled with plt.grid() and drawn as dashed lines (linestyle='--') along the x-axis.

  • Result: This visualization allows a comparison of salaries across different job titles within departments, giving insights into which departments pay more for specific roles.
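The two-level aggregation can be sketched as follows, using hypothetical rows. The pivot into a wide table is an optional extra, not part of the described steps, and makes the same role easy to compare across departments. The Seaborn call from the text is kept as a comment.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the text.
df = pd.DataFrame({
    "Department": ["IT", "IT", "IT", "Sales", "Sales", "Sales"],
    "Job_Title": ["Analyst", "Analyst", "Manager", "Analyst", "Manager", "Manager"],
    "Salary": [700000, 760000, 1300000, 640000, 1100000, 1150000],
})

# Mean salary for every (Department, Job_Title) pair, back as ordinary columns.
dept_title = (
    df.groupby(["Department", "Job_Title"])["Salary"]
    .mean()
    .round(2)
    .reset_index()
)
print(dept_title)

# Optional wide view: one row per job title, one column per department.
wide = dept_title.pivot(index="Job_Title", columns="Department", values="Salary")

# Plot from the text (requires Seaborn), e.g.:
# sns.barplot(data=dept_title, x="Job_Title", y="Salary",
#             hue="Department", palette="tab20")
```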

Skills Demonstrated:

  • Data Grouping & Aggregation: Using groupby() to perform aggregation on multiple levels (department, job title) for detailed salary insights.

  • Data Visualization: Creating bar plots with Seaborn and adding annotations with Matplotlib to enhance the visual understanding of salary distributions.

  • Customizing Plots: Adjusting plot elements (colors, text, gridlines) to improve readability and interpretability.

  • Advanced Pandas Techniques: Using .max(), .mean(), .sort_values(), and .idxmax() for extracting key information from grouped data.

Insights:

 

  • The analysis helps uncover the departments with the highest and lowest average salaries, identifies the highest-paying job titles, and provides a breakdown of salary distribution based on job roles and departments. These insights are critical for salary benchmarking and decision-making in HR.

Task 4: Employee Data Analysis – Visualizations and Insights

1. Number of Employees Who Resigned or Were Terminated, by Department

  • Objective: Identify and visualize the number of employees who have resigned or been terminated, grouped by department.

  • Approach:

    • Filter data to include only “Resigned” or “Terminated” employees.

    • Group by department and status (resigned or terminated), and count unique Employee_IDs to get the employee count for each status per department.

    • Use Seaborn’s barplot to create a visualization with:

      • Department on the x-axis.

      • Count of Employee_ID on the y-axis.

      • Hue to distinguish between “Resigned” and “Terminated” statuses.

    • Labels and title are adjusted to improve readability.

  • Key Insight: Provides a clear breakdown of resignation and termination counts by department.
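The filter, group and count steps can be sketched as follows, using hypothetical rows. The "Resigned" and "Terminated" labels follow the text, and "Active" is an assumed label for the remaining employees.

```python
import pandas as pd

# Hypothetical sample rows; "Active" is an assumed status label.
df = pd.DataFrame({
    "Employee_ID": ["E1", "E2", "E3", "E4", "E5", "E6", "E7"],
    "Department": ["IT", "IT", "IT", "HR", "HR", "Sales", "Sales"],
    "Status": ["Resigned", "Terminated", "Active", "Resigned",
               "Resigned", "Active", "Terminated"],
})

# Keep only employees who left.
exits = df[df["Status"].isin(["Resigned", "Terminated"])]

# Unique employees per department and exit status, as a flat table for plotting.
exit_counts = (
    exits.groupby(["Department", "Status"])["Employee_ID"]
    .nunique()
    .reset_index(name="Count")
)
print(exit_counts)

# Plot from the text (requires Seaborn), e.g.:
# sns.barplot(data=exit_counts, x="Department", y="Count", hue="Status")
```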

2. Country with the Highest Concentration of Employees

  • Objective: Find the country with the highest concentration of employees.

  • Approach:

    • Extract country information from the Location column by splitting the string.

    • Count the unique Employee_ID per country.

    • Sort the countries by employee count in descending order and display the top results.

  • Key Insight: This helps identify which countries have the highest employee concentration, assisting in understanding global employee distribution.
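A minimal sketch of the country extraction. It assumes, hypothetically, that Location strings look like "City, Country", so the country is the text after the last comma. The real format of the Location column may differ. The top country's share of the workforce is computed as well.

```python
import pandas as pd

# Hypothetical rows; the "City, Country" format is an assumption.
df = pd.DataFrame({
    "Employee_ID": ["E1", "E2", "E3", "E4", "E5"],
    "Location": ["Mumbai, India", "Pune, India", "Austin, USA",
                 "Delhi, India", "London, UK"],
})

# Country = text after the last comma, with surrounding spaces removed.
df["Country"] = df["Location"].str.split(",").str[-1].str.strip()

# Unique employees per country, largest first.
by_country = (
    df.groupby("Country")["Employee_ID"].nunique().sort_values(ascending=False)
)

# Share of the workforce held by the top country, in percent.
top_share = by_country.iloc[0] * 100 / by_country.sum()
print(by_country.head(), round(top_share, 1))
```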

3. How the Number of Hires Changed Over Time (Per Year)

  • Objective: Track the number of new hires over time (by year).

  • Approach:

    • Extract the year from the date column.

    • Count the number of unique Employee_IDs hired per year.

    • Use Seaborn’s barplot to visualize the trend of new hires, with:

      • Years on the x-axis.

      • Number of hires on the y-axis.

      • A color palette to distinguish the years.

    • Add labels showing the exact count on top of each bar.

  • Key Insight: Allows for a visual representation of hiring trends, helping to spot peaks or declines in hiring activity.
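The yearly hiring count can be sketched as follows, using hypothetical hire dates. The year-over-year percentage change is an addition that shows one way to quantify peaks and declines. Bar labels are left as a comment pointing to Matplotlib's Axes.bar_label.

```python
import pandas as pd

# Hypothetical hire dates for illustration.
df = pd.DataFrame({
    "Employee_ID": ["E1", "E2", "E3", "E4", "E5", "E6"],
    "Hire_Date": ["2019-03-15", "2020-06-01", "2020-09-12",
                  "2021-01-10", "2021-05-05", "2021-11-30"],
})

# Parse dates and pull out the hiring year.
df["date"] = pd.to_datetime(df["Hire_Date"])
df["Year"] = df["date"].dt.year

# Unique hires per year.
hires = df.groupby("Year")["Employee_ID"].nunique()

# Year-over-year change in hires, in percent (first year has no prior year).
yoy_change = (hires.pct_change() * 100).round(1)
print(hires)
print(yoy_change)

# Plotting (requires Matplotlib/Seaborn): draw a bar chart of `hires`,
# then label the bars with ax.bar_label(ax.containers[0]).
```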

4. Department with the Highest Attrition Rate

  • Objective: Identify the department with the highest attrition rate (resignation rate).

  • Approach:

    • Group data by department and calculate the total number of employees and the number of resigned employees.

    • Calculate the attrition rate as the ratio of resigned employees to total employees per department.

    • Present the departments with the highest attrition rates, rounded to two decimal places.

  • Key Insight: Helps to pinpoint departments with high turnover, which may indicate underlying issues with employee satisfaction, management, or workload.
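A minimal sketch of the attrition-rate calculation, using hypothetical rows. Following the text, only "Resigned" counts as attrition here and terminations are left out. The rate is a ratio rounded to two decimals.

```python
import pandas as pd

# Hypothetical sample rows for three departments.
df = pd.DataFrame({
    "Employee_ID": ["E1", "E2", "E3", "E4", "E5", "E6", "E7", "E8", "E9"],
    "Department": ["IT", "IT", "IT", "IT", "HR", "HR", "HR", "Sales", "Sales"],
    "Status": ["Resigned", "Active", "Active", "Active",
               "Resigned", "Resigned", "Active", "Active", "Terminated"],
})

# Total and resigned headcount per department.
summary = df.groupby("Department").agg(
    total=("Employee_ID", "nunique"),
    resigned=("Status", lambda s: (s == "Resigned").sum()),
)

# Attrition rate = resigned / total, rounded to two decimals, highest first.
summary["attrition_rate"] = (summary["resigned"] / summary["total"]).round(2)
summary = summary.sort_values("attrition_rate", ascending=False)
print(summary)
```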

Technologies & Libraries Used:

  • Pandas: For data manipulation and cleaning.

  • Matplotlib/Seaborn: For creating various visualizations (bar plots).

  • Python: Core language for data processing and analysis.

Outcome:

  • The analysis provides actionable insights into employee status (resigned/terminated), hiring trends, country-based employee concentration, and department-wise attrition rates. These findings can be used for workforce management, HR decision-making, and strategic planning.

 

Task 5: Advanced Employee Salary and Performance Analysis

1. Correlation Between Salary and Performance Rating

  • Objective: Investigate whether there is a relationship between an employee’s salary and their performance rating.

  • Approach:

    • Use the corr() function to calculate the correlation between the Salary and Performance_Rating columns. By default, corr() computes the Pearson coefficient, which ranges from -1 to 1: values close to 1 or -1 indicate a strong linear relationship, and values close to 0 indicate a weak or no linear relationship.

    • Seaborn is used to display this correlation visually with a heatmap, highlighting the relationship between all numeric columns in the dataset.

  • Key Insight: Understanding the correlation between salary and performance rating can help identify if better performance is linked to higher pay, or if salary discrepancies exist without a corresponding performance increase.
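The pairwise coefficient and the full numeric correlation matrix behind the heatmap can be computed as in this sketch. The rows are hypothetical, so the resulting value says nothing about the real dataset. The Seaborn heatmap call is kept as a comment.

```python
import pandas as pd

# Hypothetical numeric columns for illustration.
df = pd.DataFrame({
    "Salary": [500000, 620000, 700000, 810000, 900000],
    "Performance_Rating": [2, 4, 3, 5, 4],
    "Experience_Years": [1, 3, 4, 6, 8],
})

# Pearson correlation between the two columns of interest.
r = df["Salary"].corr(df["Performance_Rating"])

# Correlation matrix of all numeric columns, i.e. the heatmap input.
corr_matrix = df.corr(numeric_only=True)
print(round(r, 3))
print(corr_matrix.round(2))

# Heatmap from the text (requires Seaborn), e.g.:
# sns.heatmap(corr_matrix, annot=True, cmap="coolwarm")
```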

2. Average Performance Rating by Department

  • Objective: Calculate and visualize the average performance rating for employees in each department.

  • Approach:

    • GroupBy is used to group employees by their department, then the mean of the Performance_Rating is calculated for each department.

    • A bar plot is created using Seaborn to display the average performance ratings, with each department represented on the x-axis and the performance rating on the y-axis.

    • The plot uses a color palette (tab20) for a more visually appealing display.

  • Key Insight: This provides insight into how employees across different departments are performing on average, which can highlight areas where certain departments may need further development or support.
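A minimal sketch of the per-department average rating, using hypothetical rows. It also computes the spread between the best- and worst-rated departments, which is the kind of figure behind a statement like "ratings differed by as much as 0.8 points".

```python
import pandas as pd

# Hypothetical ratings on a 5-point scale.
df = pd.DataFrame({
    "Department": ["IT", "IT", "HR", "HR", "Sales", "Sales"],
    "Performance_Rating": [4, 5, 3, 3, 4, 3],
})

# Mean rating per department, highest first.
avg_perf = (
    df.groupby("Department")["Performance_Rating"]
    .mean()
    .round(2)
    .sort_values(ascending=False)
)

# Gap between the best- and worst-rated departments, in rating points.
spread = avg_perf.max() - avg_perf.min()
print(avg_perf, spread)
```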

3. Salaries of Remote vs. On-site Employees

  • Objective: Compare the average salaries of remote employees versus on-site employees.

  • Approach:

    • GroupBy is applied to the Work_Mode column, which distinguishes remote, hybrid, and on-site workers, and the mean Salary is calculated for each group.

    • A bar plot visualizes the comparison, with the Work_Mode on the x-axis and the average salary on the y-axis.

    • Annotations are added to the bars to show the exact salary values for better clarity.

  • Key Insight: This analysis can reveal if there’s a salary difference between remote and on-site employees, which is particularly relevant in the current work environment as remote work continues to grow.
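The work-mode averages can be sketched as follows, using hypothetical rows and assumed Work_Mode labels. An overall average can be skewed when job titles are unevenly spread across work modes, so a per-title pivot is added. It shows one way to compare remote and on-site pay within comparable job titles. It is an assumption about method, not the project's confirmed approach.

```python
import pandas as pd

# Hypothetical rows; the Work_Mode labels are assumptions.
df = pd.DataFrame({
    "Job_Title": ["Analyst", "Analyst", "Analyst",
                  "Engineer", "Engineer", "Engineer"],
    "Work_Mode": ["Remote", "On-site", "Hybrid",
                  "Remote", "On-site", "Hybrid"],
    "Salary": [660000, 600000, 630000, 1150000, 1000000, 1050000],
})

# Overall mean salary per work mode (the bar chart in the text).
mode_avg = df.groupby("Work_Mode")["Salary"].mean().round(2)

# Mean salary per job title and work mode, for like-for-like comparison.
by_title = df.pivot_table(index="Job_Title", columns="Work_Mode",
                          values="Salary", aggfunc="mean")

# Remote premium over on-site within each job title, in percent.
by_title["remote_premium_pct"] = (
    (by_title["Remote"] / by_title["On-site"] - 1) * 100
).round(1)
print(mode_avg)
print(by_title)
```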

4. Top 10 Employees with the Highest Salary in Each Department

  • Objective: Identify the top 10 highest-paid employees in each department.

  • Approach:

    • The dataset is first sorted by Salary in descending order to prioritize the highest-paid employees.

    • The groupby() function is used to separate the data by department, and for each department, the top 10 highest-paid employees are selected.

    • A clean table is returned with employee names, their salary, and their department.

  • Key Insight: This identifies the highest-paid employees in each department, who are often the most senior, and provides a benchmark for salary expectations. Because salary alone does not measure performance, potential top performers should be confirmed against Performance_Rating.
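The sort, group and head steps can be sketched as a small helper. Both the function name top_earners and the Name column are hypothetical, since the text does not give the name column's actual label. The default n=10 follows the text; a smaller n makes the behavior visible on the tiny sample.

```python
import pandas as pd

# Hypothetical rows; the "Name" column label is an assumption.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meera", "Karan", "Divya"],
    "Department": ["IT", "IT", "IT", "HR", "HR"],
    "Salary": [900000, 700000, 800000, 500000, 550000],
})


def top_earners(frame, n=10):
    """Hypothetical helper: the n highest-paid employees per department."""
    return (
        frame.sort_values("Salary", ascending=False)  # highest salaries first
        .groupby("Department")
        .head(n)  # first n rows of each department, keeping the sorted order
        [["Name", "Department", "Salary"]]
        .reset_index(drop=True)
    )


print(top_earners(df, n=2))
```

Because the frame is sorted before grouping, groupby().head() keeps each department's rows in descending salary order.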

5. Average Salary by Experience Years

  • Objective: Calculate the average salary of employees based on their years of experience.

  • Approach:

    • GroupBy is used to categorize employees by their Experience_Years, and the mean salary is calculated for each experience level.

    • The results are displayed in a simple, understandable format to showcase salary expectations for employees with varying years of experience.

  • Key Insight: This analysis provides valuable information for understanding salary trends based on experience levels. It helps HR professionals assess if the salary structure aligns with the experience of employees, and if certain experience levels are underpaid or overpaid.
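A minimal sketch of the experience-level averages, using hypothetical rows. It also adds a simple check, which is an assumption rather than part of the described steps: it flags experience levels whose average pay is lower than the level before. Such levels would be candidates for the "underpaid" review mentioned above.

```python
import pandas as pd

# Hypothetical rows; the dip at 5 years is deliberate.
df = pd.DataFrame({
    "Experience_Years": [1, 1, 3, 3, 5, 5],
    "Salary": [400000, 420000, 600000, 640000, 600000, 580000],
})

# Average salary for each experience level.
by_exp = df.groupby("Experience_Years")["Salary"].mean().round(2)

# Levels whose average falls below the previous level's average.
dips = by_exp[by_exp.diff() < 0]
print(by_exp)
print(dips)
```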

Outcome:

  • The analysis delivers a comprehensive look at the relationship between salary and performance, the comparative earnings of remote vs. on-site employees, and insights into top earners within departments.

  • By understanding the correlation between performance and salary, companies can make more informed decisions regarding compensation strategies, performance evaluations, and employee retention strategies.

Conclusion


The end-to-end HR analytics project delivered significant insights into the workforce structure, employee compensation, performance, and retention dynamics. By leveraging Python and key data analysis libraries such as Pandas, NumPy, Matplotlib, and Seaborn, we were able to convert raw HR data into actionable insights that can directly influence strategic decision-making in human resource management. Below is a summary of the key findings and their measurable outcomes:

1. Employee Status and Attrition Analysis

  • Key Insight: We identified departments with high resignation and termination rates, allowing HR teams to target retention strategies more effectively. Attrition rates varied by more than 20 percentage points across departments, clearly indicating where turnover is a concern.

  • Actionable Outcome: Departments with high attrition can implement more targeted employee engagement programs, improve management practices, or offer competitive incentives to reduce turnover.

2. Geographic Distribution and Workforce Concentration

  • Key Insight: The top country accounted for nearly 40% of the total workforce, indicating a significant concentration in one region. This could suggest the need for more geographically diverse hiring strategies or a deeper investment in global workforce management.

  • Actionable Outcome: Geographic disparities in the workforce may need addressing through localized recruitment or offering flexible remote work options to attract talent from underrepresented countries.

3. Salary Discrepancies and Work Mode Analysis

  • Key Insight: Remote employees earned 10–18% more than their on-site counterparts in similar job titles, which is a noteworthy discrepancy in compensation policy. Additionally, salary gaps of up to 35–50% between the highest- and lowest-paying departments were identified.

  • Actionable Outcome: The company should evaluate its compensation policies across different work modes to ensure fairness and competitiveness. A revision of salary structures based on market rates for remote work could help address this disparity.

4. Performance Rating and Salary Correlation

  • Key Insight: A moderate positive correlation (around 0.3–0.4) between salary and performance rating was observed, highlighting that better performance is somewhat linked to higher pay. However, there were departments where performance ratings were high, but the compensation did not align accordingly.

  • Actionable Outcome: Companies can use this correlation to align their compensation strategies with performance, ensuring that high-performing employees are adequately rewarded. This could involve revisiting compensation benchmarks and performance appraisal systems to better reflect employee contributions.

5. Hiring Trends and Workforce Growth

  • Key Insight: Hiring volumes rose by up to 60% in peak years compared to slower years, emphasizing the need for scalable workforce planning to manage fluctuations in recruitment needs.

  • Actionable Outcome: HR teams can use these insights to optimize recruitment pipelines, ensuring there are sufficient resources during peak hiring periods. Strategic workforce planning could also help manage costs during slower periods.

6. Salary Benchmarking Across Job Titles and Departments

  • Key Insight: High-paying job titles had salaries that exceeded overall role averages by 55–70%, indicating premium skill areas where compensation is significantly higher.

  • Actionable Outcome: The company can use this insight to reassess compensation for high-paying job titles, ensuring retention of key talent in premium skill areas. Benchmarking these roles against industry standards can also help adjust salary structures accordingly.

7. Experience-Based Salary Analysis

  • Key Insight: The analysis of average salary by years of experience showed how pay progresses with experience, giving HR a baseline for checking whether certain experience levels are underpaid or overpaid relative to industry norms.

  • Actionable Outcome: HR can implement more tailored salary strategies based on employee experience, ensuring equitable compensation across experience levels and reducing potential pay gaps between similar roles held by employees with different amounts of experience.

Measurable Outcomes:

  1. Salary and Attrition Analysis: Identified salary gaps of up to 50% between departments, which can be addressed to improve pay equity. Attrition rates differed by more than 20 percentage points across departments, so reduction efforts can target the highest-turnover areas.

  2. Work Mode Compensation: Remote employees earn 10–18% more than on-site counterparts, suggesting that remote work compensation policies need to be revisited for equity.

  3. Employee Retention: The analysis of attrition rates and employee statuses provides actionable insights into improving retention, targeting high-turnover departments, and implementing employee engagement strategies.

  4. Geographic Workforce Strategy: Nearly 40% of the workforce is concentrated in one country, which informs global staffing strategies and the need for more international diversity in hiring practices.

  5. Performance-Based Pay Strategy: The moderate correlation (0.3–0.4) between salary and performance rating suggests that pay is only partly tied to performance, so organizations can refine their performance-based compensation strategies to link performance more closely with pay progression.

Final Summary:

The project transformed raw HR data into structured insights that can significantly impact compensation strategies, workforce planning, and retention efforts. By utilizing Python-based data analysis, the company is now better equipped to make data-driven decisions that enhance employee satisfaction, optimize compensation strategies, and reduce attrition. These measurable insights pave the way for more efficient HR practices and a more engaged and satisfied workforce, ultimately driving the company’s growth and success in a competitive labor market.