In today’s highly competitive business environment, informed decision-making is essential for long-term success. Employee retention and job satisfaction are critical issues for any organization as they directly impact productivity, morale, and profitability.
With the aim of reducing employee turnover and enhancing job satisfaction, ABC Corporation has hired us to undertake a powerful data analysis and A/B testing project. Our mission is to identify key factors influencing job satisfaction and, ultimately, employee retention.
In this project, we will present the results of our exploratory data analysis, design an A/B experiment to test critical hypotheses, and analyze the results to provide ABC Corporation with valuable insights informing their strategic decisions.
Scenario
Company Overview
ABC Corporation, founded in 1980 in California, is a technology consultancy specializing in providing artificial intelligence (AI) and machine learning solutions to businesses across various sectors. The company’s primary focus is on automating and optimizing business processes through cutting-edge technologies.
The company distinguishes itself by having a multidisciplinary team comprising experts in UX/UI, marketing, analysts and other relevant fields. This diversity allows for a unique synergy between specialized technical knowledge and varied perspectives, enabling them to deliver customized solutions tailored to the individual needs of each client.
The latest project undertaken by the company involved the optimization of personnel selection processes. They developed an intelligent selection platform where employees can automatically analyze CVs of potential candidates, identify their key skills, and finally classify candidates based on their suitability for specific roles. Additionally, they created a recommendation system to suggest the best candidates to recruiters.
Data Source
The data has been collected internally by the company and shared with us in a CSV file.
Structure
The file comprises 1614 rows of information about each employee of the company, including those who left, and 41 columns with the following information:
- Age: The age of the employee.
- Attrition: Indicates whether the employee has left the company.
- BusinessTravel: Describes the frequency of work-related travel for the employee (e.g., “Travel_Rarely” for rarely).
- DailyRate: The daily rate of the employee.
- Department: The department in which the employee works (e.g., “Research & Development,” “Sales,” etc.).
- DistanceFromHome: The distance from the employee’s home to their workplace.
- Education: The education level of the employee (usually on a scale from 1 to 5).
- EducationField: The field of education of the employee.
- EmployeeCount: A counter usually set to 1 and used to count employees.
- EmployeeNumber: A unique identification number for the employee.
- EnvironmentSatisfaction: The employee’s satisfaction level regarding their work environment, with values ranging from 1 to 4, where 4 is the highest level of satisfaction.
- Gender: The gender of the employee, where 0 corresponds to “male” and 1 corresponds to “female”.
- HourlyRate: The hourly rate of the employee.
- JobInvolvement: The level of involvement of the employee in their work.
- JobLevel: The hierarchical level of the employee in the company.
- JobRole: The role or job position of the employee.
- JobSatisfaction: The employee’s satisfaction level with their job.
- MaritalStatus: The marital status of the employee (e.g., “Single,” “Married,” etc.).
- MonthlyIncome: The monthly income of the employee.
- MonthlyRate: The monthly rate of the employee.
- NumCompaniesWorked: The number of companies the employee has worked for.
- Over18: Indicates whether the employee is over 18 years old.
- OverTime: Indicates whether the employee works overtime.
- PercentSalaryHike: The percentage of salary increase for the employee.
- PerformanceRating: The performance rating of the employee.
- RelationshipSatisfaction: The level of satisfaction in the employee’s interpersonal relationships.
- StandardHours: The standard working hours.
- StockOptionLevel: The level of stock options for the employee.
- TotalWorkingYears: The total number of years of work experience for the employee.
- TrainingTimesLastYear: The number of times the employee received training in the last year.
- WorkLifeBalance: The balance between work and personal life for the employee.
- YearsAtCompany: The number of years the employee has worked in the current company.
- YearsInCurrentRole: The number of years the employee has been in their current role.
- YearsSinceLastPromotion: The number of years since the employee’s last promotion.
- YearsWithCurrManager: The number of years the employee has been under the supervision of the current manager.
- SameAsMonthlyIncome: The monthly income of the employee.
- DateBirth: The year of birth of the employee (considering that the data was collected in 2023).
- Salary: The salary of the employees.
- RoleDepartament: The department and role of the employee.
- NumberChildren: The number of children of the employees.
- RemoteWork: Whether the employee can work remotely or not.
Phase 1: Exploratory Data Analysis (EDA)
Before conducting the project, A/B testing and formulating hypotheses, it is crucial to better understand the dataset and its characteristics. A detailed exploratory analysis of the dataset was conducted to familiarize ourselves with the information and understand what data we had.
Phase 2: Data Transformation
In this phase, we focused on refining and enhancing the quality of the extracted data through a series of meticulous transformations. The key steps undertaken include:
- Column Standardization:
- Transformed all column names to lowercase and snake case, ensuring uniformity and simplifying further data handling processes.
- Data Type Uniformity:
- Standardized values across columns to ensure consistent data types. For instance, mapping binary values (0 and 1) in the ‘Gender’ column to ‘Male’ and ‘Female’ or converting responses in the ‘remotework’ column to ‘Yes’ or ‘No’ for clarity.
- Typographical Error Correction:
- Addressed typographical errors in categorical columns to maintain accuracy and integrity in the dataset.
- Duplicate Removal:
- Implemented measures to identify and eliminate duplicate records, enhancing data integrity.
- Column-Specific Data Type Conversion:
- Modified data types of specific columns to align with the analysis requirements and improve overall consistency.
- Null Value Handling:
- Evaluated and managed null values using various techniques. Some were replaced with the mode, others with “Unknown” while others were imputed using the k-nearest neighbors (KNN) algorithm, ensuring a comprehensive approach to missing data.
- Inconsistent Value Resolution:
- Identified and addressed inconsistent values, such as negative distances in the ‘DistanceFromHome’ column, to enhance the reliability of the data.
These transformations were implemented using Python functions tailored to the unique characteristics of the dataset. The result is a refined and standardized dataset, laying a solid foundation for subsequent analytical processes.
Phase 3: Database Design and Data Insertion
In this pivotal phase, the objective is to meticulously design the database structure, create the database and insert the refined data. The key components of this phase include:
- Database Structure Design:
- Carefully designed the database structure by identifying essential tables and establishing their relationships. This involved defining primary and foreign keys to ensure data integrity and facilitate efficient querying.
- Data Transformation for Database Insertion:
- Transformed the processed data into a format suitable for database insertion. This included modifying the CSV data into a list of tuples, aligning it with the structure of the designed database tables.
- Database Creation:
- Utilized Python and SQL tools to create the database, establishing connections to facilitate smooth data flow between the application and the database. This step ensures a robust foundation for storing and retrieving information.
- Table Creation and Key Definitions:
- Implemented the designed structure by creating tables within the database. Defined primary and foreign keys to establish relationships and maintain data consistency.
- Insertion of Transformed Data:
- Executed the insertion of meticulously transformed employee data into the respective tables of the database. This step completes the integration of the cleaned and standardized data into the newly created database.
This comprehensive approach not only ensures the establishment of a well-structured database but also guarantees the successful insertion of data, setting the stage for subsequent data analysis and retrieval processes.
With the completion of this ETL process, data has seamlessly transitioned from its source through meticulous transformations, culminating in its insertion into a well-structured database. This organized flow not only ensures the integrity and reliability of the data but also sets the stage for robust analytical processes and insights.




Phase 4: Data Analysis
- Data Analysis with Visualizations in PythonIn this section, the goal is to conduct a detailed analysis of the data by leveraging Python visualizations, enhancing the interpretation of patterns and trends in job satisfaction and employee turnover.
- Utilize Python libraries such as Matplotlib and Seaborn to create a variety of visualizations, in order to better observe differences.
- Analyze visualizations to explore potential relationships between job satisfaction levels and employee turnover. Look for patterns, trends and potential insights that may inform the subsequent A/B testing phase.
- A/B Testing for Job Satisfaction and Employee TurnoverIn this crucial phase, the primary goal is to investigate the relationship between job satisfaction and employee turnover, guided by the hypothesis that lower job satisfaction levels are significantly associated with a higher likelihood of employee departure. The structured steps for this analysis are as follows:
- Group Division: Categorize employees into two distinct groups – Group A “Satisfied” and Group B “Not Satisfied,” based on predetermined criteria (satisfaction rating >= 3).
- Turnover Rate Calculation: Independently calculate the employee turnover rate for each group, providing insights into departure patterns among satisfied and not satisfied employees.
- Statistical Analysis: Perform a robust statistical analysis using the chi-square test to determine if there is a statistically significant difference in turnover rates between Group A and Group B.
- Results Analysis: Thoroughly analyze the obtained results, focusing on patterns, trends, and statistical significance. Quantify the magnitude of the observed relationship using statistical measures such as the chi-square test, providing a numerical understanding of the substantial impact of job satisfaction on employee turnover.
This meticulous approach ensures not only a thorough exploration of the relationships but also places a strong emphasis on statistically significant findings, reinforcing the reliability and importance of the results in informing decision-making processes.
Phase 5: Reporting and Dashboard Creation with Power BI
In this reporting phase, the objective is to convey the findings effectively through visualizations and dashboards created using Power BI
- Data Preparation: Ensure that the data is formatted appropriately for Power BI, addressing any necessary transformations.
- Power BI Dashboard Design: Create a dashboard that encapsulates key insights from the A/B testing and visual analysis, including relevant visualizations to present the data in a compelling and easily understandable manner.
- Insightful Storytelling: Craft a narrative within the dashboard, guiding viewers through the story of the analysis. Clearly communicate the significance of the A/B testing results and visual findings.
- Sharing and Distribution: Share the dashboard with relevant stakeholders, ensuring accessibility and ease of understanding. Provide any necessary documentation to aid interpretation.
Conclusions
The key findings of the study were:
- There is overall positive job satisfaction.
- There is a significant relationship between job dissatisfaction and quit rate. .
- The quit rate is even higher when salaries are low. .
- Improve working conditions and pay, particularly for those employees who want to be retained. .