Data Collection & Extraction
Data Cleaning & Preprocessing
Data Aggregation & Grouping
Time-Series Analysis
Exploratory Data Analysis (EDA)
Data Visualization
Key Performance Indicators (KPIs)
Insights Generation & Business Recommendations
The organization needed a more efficient and dependable way to evaluate loan activity because manual reporting made it difficult to track operational performance and understand monthly changes in lending operations. Increasing application volumes highlighted the need for a system that could compute key metrics with accuracy and consistency.
The objective was to build a Python-based analytical solution that could automatically process loan data, calculate essential operational metrics, and generate clear, technically accurate visual reports. Core requirements included month-to-date loan application counts, month-to-date funded amounts, total funded amounts, total payments received, and customer-level financial ratios such as interest rate averages and debt-to-income levels.
To meet these needs, a modular processing pipeline was developed using pandas, NumPy, Matplotlib, and Seaborn. The system performs date parsing and dynamic filtering based on the latest issue date to compute month-to-date metrics. It aggregates funded amounts and payments, derives numerical averages for interest and DTI, and classifies loans into performing and non-performing categories using conditional filtering. Visual components generate monthly trend charts for funded amounts, payments received, and loan applications, as well as a state-level breakdown of funded amounts. Each figure uses categorical ordering, annotated values, and consistent styling to ensure readability and reproducibility.
As a result, the organization now has an automated and technically reliable reporting workflow that reduces manual effort, ensures consistent calculations across operational teams, and delivers timely insights into lending activity. The system also provides a solid foundation for future enhancements such as automated scheduling or integration with real-time data sources.
As part of the initial phase of the analysis, I began by calculating various Key Performance Indicators (KPIs) related to loan applications and funded amounts. This process was executed within a Jupyter Notebook environment, which allows for interactive code execution, real-time analysis, and the visualization of results, making it an essential tool for financial analysis and reporting.
The goal of this task was to calculate and display a series of KPIs that provide valuable insights into loan performance, focusing on metrics such as loan applications, total funded amounts, average interest rates, and the health of loans (good vs. bad loans). Understanding these KPIs is crucial for assessing the company’s current lending performance and identifying areas for improvement.
The following key performance metrics were calculated from the dataset:
MTD Total Loan Application: The total number of loan applications received month-to-date.
Total Funded Amount: The total dollar amount of loans funded, converted to millions for easier comprehension.
MTD Total Funded Amount: The total funded amount for the current month-to-date.
Average Interest Rate: The average interest rate across all loans, expressed as a percentage.
Average Debt-to-Income Ratio (DTI): The average debt-to-income ratio, which is important for assessing borrower financial health.
Good Loan Metrics: The percentage of loans marked as “Fully Paid” or “Current” (good loans).
Bad Loan Metrics: The percentage of loans marked as “Charged Off” (bad loans).
These KPIs are essential for understanding loan performance and guiding decision-making, such as adjusting lending policies or assessing risk.
In this phase, I ensured that the data used for KPI calculation was clean and well-structured. I performed a quick review of the dataset, focusing on:
Data Type Verification: Ensured fields like issue_date were correctly set as datetime objects to filter by month and year. Other fields, such as loan amounts and interest rates, were confirmed to be numerical for aggregation and calculation purposes.
Missing Values: Checked for missing data in key columns like loan_amount, int_rate, and dti. Any missing or null values would have needed to be handled to avoid skewing the calculations.
Memory Considerations: Given the size of the dataset, I estimated memory usage to ensure efficient processing, especially when calculating aggregate KPIs for a large volume of loans.
This step was essential to ensure that the KPIs were calculated using clean, accurate data, which directly impacts the reliability of the metrics.
The purpose of this KPI tracking phase was to:
Measure loan performance across different time periods (e.g., month-to-date loan applications, funded amounts).
Monitor the financial health of the loan portfolio by tracking good and bad loans.
Provide business insights that can inform lending strategies and risk management (e.g., identifying high interest rates or a high percentage of bad loans).
Validate data accuracy and completeness by checking for null values and ensuring that all necessary columns were properly formatted for calculation.
Ensure scalability and efficiency in processing large datasets for real-time KPI tracking.
The KPI calculations provided a clear snapshot of the loan performance and trends. The key outcomes included:
Loan application trends: Monthly loan applications and total funded amounts were calculated and tracked.
Interest rate and DTI insights: The average interest rate and DTI were calculated to provide an understanding of borrower financial health and lending conditions.
Loan quality tracking: Good and bad loan metrics were calculated to assess the overall health of the loan portfolio.
These insights laid the foundation for further analysis, allowing for deeper dives into loan performance and potentially guiding decision-making related to lending strategies, risk management, or identifying areas for improvement.
In this phase, I created visualizations to display the monthly trends for Total Funded Amount and Total Received Amount (total payments received). The goal of these charts is to provide an overview of how loan disbursements and repayments have evolved over time, which is crucial for understanding loan activity trends and financial health.
Objective: The chart shows how the total loan amount funded varies month by month, providing insights into the seasonality of loan issuance.
Steps:
First, the dataset is grouped by month using the issue_date column, and the total funded amount (loan_amount) for each month is calculated.
The loan_amount values are converted to millions for better readability.
The months are ordered in the natural sequence from January to December, ensuring that the visualization follows the correct timeline.
A fill-between plot is used to display the area under the trend line, creating a smooth curve showing the changes in the total funded amount.
Monthly totals are annotated on the plot to highlight exact values for each month, helping viewers quickly understand the financial trends.
The plot also includes labels for the x-axis (representing the months) and y-axis (representing the total funded amount in millions).
Chart Customization:
The x-axis is rotated to make the month names legible.
A dashed line style ('--') and brown color are used to enhance the aesthetics of the chart.
Objective: This chart tracks the total amount received in payments month-by-month, helping to visualize the repayment trends and financial inflows.
Steps:
Similar to the total funded amount chart, this one groups the data by month, but instead of loan_amount, it aggregates the total_payment column (representing payments made).
The data is again converted to millions to ensure the figures are easily readable.
A fill-between plot is applied with a light green color, displaying the trend of payments over time.
Annotations on the chart show the exact payment values for each month.
The x-axis and y-axis labels are customized to display the year and the total payments in millions.
Chart Customization:
The months are displayed on the x-axis, and similar formatting is applied as in the first chart for consistency.
These visualizations provide essential insights into the following:
Trends in Loan Disbursements: Understanding when loans are typically issued can help identify seasonal patterns in borrowing behavior.
Payment Trends: Monitoring payment trends is critical for assessing the repayment behavior of borrowers and the company’s cash flow.
Comparison of Funded vs. Received Amounts: Comparing the monthly trends of loan funding and repayments can provide insights into the company’s financial health, especially if there is a large gap between these two metrics, indicating potential future collection risks.
The charts give a clear, visual representation of loan issuance and repayment patterns throughout the year, offering an easy-to-interpret comparison of these two key financial metrics.
This step helps in decision-making, such as adjusting lending strategies or assessing the risk of delayed repayments.
In this phase, I created visualizations to track monthly loan applications and perform a regional analysis by state on the total funded amounts. These charts are designed to offer insights into temporal trends and geographic patterns in the loan dataset.
Objective: This visualization tracks the total number of loan applications submitted each month, helping to identify seasonal patterns and fluctuations in loan demand.
Steps:
First, the dataset is grouped by month using the issue_date column, and the count of loan applications (i.e., the number of loan ids) for each month is calculated.
The months are ordered sequentially (January to December) to ensure the plot follows the natural flow of time.
A fill-between plot is used to fill the area beneath the trend line with a grey color, providing a visual representation of the fluctuation in loan applications.
The actual count of loan applications is annotated on the chart for each month to show exact values.
The x-axis displays the months of the year, while the y-axis shows the total number of loan applications.
Chart Customization:
The months are rotated 90 degrees on the x-axis for readability.
A dashed grid line style is used to improve visual clarity.
The x-axis is labeled “month (for 2021)”, and the y-axis is labeled “Total Loan Applications”.
Objective: This chart shows the total funded loan amount by state, providing insights into geographic trends in loan distribution and identifying states with higher or lower loan funding amounts.
Steps:
The dataset is grouped by address_state, and the sum of loan_amount for each state is calculated, then converted to thousands (by dividing by 1000) for easier interpretation.
A bar plot is generated to represent the total funded amount by state, with bars colored in grey to visually distinguish the different states.
For each bar, the height (representing the total funded amount) is annotated at the top of the bar to show the exact value for each state.
The x-axis is labeled “State” and the y-axis is labeled “Funded Amount” (in thousands).
Chart Customization:
The bar plot includes annotations placed at the center of each bar to indicate the exact funded amount for each state.
These visualizations offer valuable insights, including:
Loan Application Trends: The monthly trend chart helps to identify peak months for loan applications, which can inform marketing and lending strategies based on periods of higher demand.
Geographic Insights: The regional analysis by state highlights where the most loans are being funded, allowing the company to assess regional performance, understand market penetration, and optimize strategies for states with lower loan funding.
Comparing Temporal and Regional Trends: By analyzing both the monthly application trends and regional funding patterns, you can gain a deeper understanding of the broader financial landscape and identify opportunities for growth or risk mitigation.
The Monthly Loan Applications trend provides insights into seasonality and demand fluctuations, while the Regional Analysis by State offers a geographic breakdown of loan activity.
These visualizations help stakeholders understand key metrics, enabling better decision-making for future funding and marketing strategies.
Task 4 – Visualizing Regional Trends in Loan Amounts and Applications
In this phase, I created two types of visualizations: one tracking the total loan amount received by state and another showing the total loan applications by state. These visualizations were designed to offer insights into regional patterns and trends in the dataset.
Objective:
The first visualization shows the total amount of funds received by state, helping to understand where the most funds have been distributed geographically.
Steps:
The dataset is grouped by state (address_state) and the total amount received (total_payment) is summed up for each state. The sum is divided by 1000 to convert the values into thousands for easier interpretation.
A bar plot is created, where each bar represents a state and the height of the bar corresponds to the total funded amount for that state.
Annotations are added at the top of each bar to display the exact total funded amount in thousands.
The x-axis is labeled as “State” and the y-axis shows the “Total Received Amount” (in thousands).
Chart Customization:
The x-axis labels are rotated 90 degrees for better readability.
A grey color is used for the bars to keep the chart neutral and focused on the values.
The plt.xticks(rotation=90) ensures the state names are readable.
Insights:
This chart helps to identify which states have received the highest and lowest loan amounts, allowing the company to adjust funding strategies based on state performance.
Objective:
The second visualization tracks the total number of loan applications by state, providing insights into where the demand for loans is highest.
Steps:
The dataset is grouped by state (address_state) and the count of loan applications (id.count()) is calculated for each state.
A bar plot is generated, with each bar representing a state and the height of the bar showing the total number of loan applications for that state.
Annotations are added to the center of each bar to display the exact number of loan applications.
The x-axis is labeled as “State” and the y-axis is labeled “Total Loan Application.”
Chart Customization:
The x-axis labels are rotated 90 degrees to ensure the state names fit neatly.
The bars are colored grey to maintain visual consistency with the previous plot.
Insights:
This chart helps to visualize where the highest number of loan applications are coming from, providing insights into regions with higher interest or need for loans.
Loan Funding Trends: The regional analysis of total loan amounts shows where the most funding is allocated, which could indicate regions of stronger market presence or demand.
Loan Application Trends: By examining the total number of loan applications, stakeholders can understand where demand is increasing and potentially target underfunded areas for marketing or expansion.
Comparing Geographic Trends: These visualizations combined give a comprehensive view of how loan applications and funded amounts align across states, enabling more informed decision-making in loan distribution strategies.
The visualizations provide actionable insights into both the volume of loan applications and the amount of loan funding distributed by state. These trends help the company understand regional market dynamics and optimize funding strategies accordingly.
In this phase, I created multiple visualizations to better understand loan data from different perspectives, such as by loan term, employee length, and loan purpose. These visualizations provide detailed insights into the distribution of loan amounts, applications, and funded amounts across various categories.
Objective:
This visualization tracks the total funded amount for each loan term, helping to understand how loan funding is distributed based on the term of the loan.
Steps:
The dataset is grouped by loan term (term) and the total loan amount (total_payment) is summed up for each term.
A pie chart is generated to show the proportion of total funded amounts by term.
Labels are formatted to display the percentage share of each term.
The pie chart’s wedge properties are customized for better aesthetics.
Chart Customization:
The labels on the pie chart display the percentage of each loan term’s funded amount.
The chart title is set to “Total Funded Amount by Term” for clear context.
Insights:
This chart helps identify which loan term (e.g., short-term vs long-term) contributes more to the total loan funding. It’s useful for understanding how the length of loan terms impacts the overall loan portfolio.
Objective:
This visualization shows the total amount received for each loan term, offering insights into how the received loan amounts vary by loan term.
Steps:
The dataset is grouped by loan term (term), and the total received amount (total_payment) is summed for each term.
A pie chart is used to visualize the total received amount by loan term, with percentages displayed on the chart.
A color map (tab20) is applied to differentiate the slices.
Chart Customization:
Pie chart labels show the percentage of the total received amount for each loan term.
The chart title is “Total Received Amount by Term,” providing a direct reference.
Insights:
This pie chart allows you to assess how much of the total received amount is associated with each loan term, helping to understand regional funding distribution and term-specific impacts.
Objective:
The third chart tracks the total number of loan applications submitted for each loan term, providing insights into demand based on loan term.
Steps:
The dataset is grouped by loan term (term), and the number of loan applications (id.count()) is calculated for each term.
A pie chart is generated to represent the total loan applications by term, with each slice showing the relative proportion.
Chart Customization:
The pie chart labels show the percentage of total loan applications by term.
The title of the chart is “Total Loan Application by Term.”
Insights:
This chart highlights the distribution of loan applications across different loan terms, helping to identify which loan term has the highest demand and might require more attention or resources.
Objective:
This visualization provides a breakdown of the total funded amount by the employee length category, offering insights into funding trends based on employee tenure.
Steps:
The dataset is grouped by employee length (emp_length), and the total loan amount (loan_amount) is summed up for each category.
A bar chart is created to show the total funded amount by employee length, with values displayed on top of each bar.
The total amount is scaled to millions for easier interpretation.
Chart Customization:
Labels are added to each bar, showing the exact funded amount in millions.
The x-axis is labeled “Employee Length” and the y-axis is labeled “Total Funded Amount (in Millions).”
Insights:
This chart helps assess the distribution of loan amounts across different employee length categories, helping identify trends or patterns in employee tenure and loan funding.
Objective:
This visualization shows the total funded amount based on loan purpose, providing insights into which loan purposes are receiving the most funding.
Steps:
The dataset is grouped by loan purpose (purpose), and the total funded amount (loan_amount) is summed for each purpose.
A bar chart is created to visualize the total funded amount for each loan purpose, with values annotated at the center of each bar.
Chart Customization:
Labels are added to each bar to show the funded amount.
The chart is labeled with “Purpose” on the x-axis and “Total Funded Amount (in Millions)” on the y-axis.
Insights:
This bar chart provides insights into which loan purposes are receiving the most funding, helping to identify key areas of loan activity (e.g., debt consolidation, home improvement, etc.).
Objective:
The goal of this visualization is to analyze the total loan amount distributed across different home ownership categories (e.g., homeowners vs renters). This insight helps understand how the loan amounts vary between those who own homes and those who do not.
Steps:
The dataset is grouped by home ownership status (home_ownership), and the total loan amount (loan_amount) is summed up for each category.
The summed loan amounts are then converted to millions for better readability.
A bar chart is generated where each bar represents a different home ownership status, and the height of the bar indicates the total loan amount for each group.
Chart Customization:
The bars are color-coded using a color palette (tab20), which is automatically applied to the bars.
The exact funded amount is annotated on top of each bar to display the value in millions.
The x-axis is labeled as “Home Ownership,” and the y-axis is labeled as “Total Funded Amount (in Millions).”
Insights:
This chart helps identify if there is a significant difference in loan funding between homeowners and non-homeowners. Understanding this distribution is valuable for evaluating the potential market for loan products based on home ownership status.
Loan Term Analysis: These visualizations help track how funding, applications, and received amounts are distributed across loan terms. This can inform strategic decisions for marketing or product offerings.
Employee Length Impact: Understanding how employee length impacts loan amounts can reveal lending patterns based on job stability or tenure.
Loan Purpose Insights: The loan purpose chart provides valuable data on which types of loans are most common, offering insights into the broader needs of the loan applicants.
The visualizations help stakeholders understand trends in loan distribution based on terms, employee length, and loan purposes. These insights can guide business decisions around loan offerings, marketing strategies, and targeting specific customer groups.
This project successfully achieved its goal of developing an automated, Python-based analytical solution to evaluate loan activity and generate meaningful insights for the organization. By leveraging powerful data processing and visualization tools such as pandas, NumPy, Matplotlib, and Seaborn, we were able to create a comprehensive pipeline that not only calculates essential key performance indicators (KPIs) but also generates clear, visually compelling reports that track critical loan metrics over time.
The creation of monthly trends for loan applications, funded amounts, and payments received, combined with the regional breakdown of loan activity, has provided the organization with a deeper understanding of both the temporal and geographic dynamics of its lending operations. These visualizations highlight key patterns, such as seasonal fluctuations in loan disbursements and repayment behaviors, as well as regional performance disparities that can guide strategic decision-making for targeted marketing and funding.
Moreover, the project’s focus on loan term, employee length, loan purpose, and home ownership as factors influencing loan amounts and applications has offered a nuanced understanding of the various demographic and behavioral aspects that affect loan activity. Insights into loan terms and home ownership reveal how lending strategies can be tailored to better serve different customer segments, while understanding the impact of employee length on loan amounts opens opportunities to refine credit risk assessments.
The automation of KPI calculation and report generation has significantly reduced the manual effort involved in generating loan performance reports, providing the organization with consistent, real-time insights that enhance operational efficiency. By ensuring data accuracy, scalability, and flexibility, the system is poised to integrate with future enhancements such as real-time data feeds or automated scheduling, ensuring the company remains agile in responding to evolving market conditions.
Ultimately, this project not only empowers stakeholders with actionable insights to improve lending strategies and risk management but also lays the groundwork for future data-driven initiatives that can enhance decision-making, improve customer targeting, and optimize loan offerings across diverse market segments.
Through this project, the organization now possesses a robust, reliable, and scalable analytics solution capable of delivering continuous, actionable insights into loan activity, helping the business to stay ahead of industry trends and respond proactively to changing financial dynamics.
Emmanuel Olorunda
Copyright © 2025 | All Rights Reserved
Every project starts with a chat, Emmanuel is keen to conversation and open to discussing your project. he will pull in the right expertise when needed