🚲BIKE BUYER’S ANALYSIS WITH EXCEL

Hello, and welcome to my blog post titled “🚲 Bike Buyer’s Analysis With Excel”. In this post, I will show you how to use Excel to analyze the data from a bike store and find out what factors influence the customers’ decision to buy a bike. I will also share some insights and recommendations based on the
analysis. Let’s get started!

The Dataset

The data set I used for this analysis is from a fictional bike store called Adventure Works. The data contains information about 1,026 customers who visited the store in 2023, such as their demographic characteristics, purchase history, and commute distance. You can download the data set from [here].

The Analysis Step

The analysis consists of three steps:

• Summarize the key statistics for each variable and how they affect the dependent variable, which is whether the customer purchased a bike or not.

• Examine the correlation between each pair of variables in the data set.

• Describe the distribution of the dependent variable and each categorical variable in the data set.

STEP 1: REGRESSION ANALYSIS

To summarize the key statistics for each variable and how they affect the dependent variable, I used Excel’s Data Analysis Tool Pak to run a logistic regression. The logistic regression is a statistical method that can model the relationship between a binary outcome variable (such as purchasing a bike) and one or more explanatory variables (such as income, age, gender, etc.). The output of the logistic regression is shown in the table below:

Variable	Coefficient	Standard Error	t-Statistic	p-Value	Confidence Interval
Home Owner	0.012	0.034	0.363	0.717	[-0.055, 0.080]
Marital Status	-0.120	0.033	-3.692	0.000	[-0.184, -0.056]
Gender	0.009	0.030	0.298	0.766	[-0.050, 0.068]
Income	0.000003	0.000001	5.205	0.000	[0.000002, 0.000004]
Cars	-0.109	0.015	-7.223	0.000	[-0.139, -0.080]
Children	-0.021	0.011	-1.838	0.066	[-0.043, 0.001]
Age	-0.002	0.002	-0.988	0.323	[-0.005, 0.002]

 The table shows the coefficient, standard error, t-statistic, p-value, and confidence interval for each independent variable in the regression model.

 The coefficient represents the change in the log-odds of purchasing a bike for a one-unit increase in the corresponding variable, holding all other variables constant.

 The standard error measures the variability of the coefficient estimate.

 The t-statistic is the ratio of the coefficient to the standard error.

 The p-value is the probability of obtaining a coefficient as extreme as or more extreme than the observed one under the null hypothesis that the coefficient is zero.

 The confidence interval is the range of values that contains the true coefficient with a certain level of confidence (usually 95%).

 To interpret the results of the logistic regression, we can look at the sign, magnitude, and significance of each coefficient.

 The sign indicates whether the variable has a positive or negative effect on the outcome variable.

 The magnitude indicates how strong or weak the effect is.

 The significance indicates how likely or unlikely it is that the effect is due to chance.

 A variable is considered to be statistically significant if its p-value is less than a predetermined threshold (usually 0.05).

Based on these criteria, we can see that:

• Home owner status has a positive but insignificant effect on purchasing a bike.

• Marital status has a negative and significant effect on purchasing a bike.

• Gender has a positive but insignificant effect on purchasing a bike.

• Income has a positive and significant effect on purchasing a bike.

• Number of cars has a negative and significant effect on purchasing a bike.

• Number of children has a negative but insignificant effect on purchasing a bike.

• Age has a negative but insignificant effect on purchasing a bike.

These results suggest that customers who are single, have higher income, and own fewer cars are more likely to buy a bike from Adventure Works.

STEP 2: CORRELATION ANALYSIS

The next step of the analysis is to examine the correlation between each pair of variables in the data set. Correlation measures the strength and direction of the linear relationship between two variables.

A correlation coefficient ranges from -1 to +1, where:

 -1 indicates a perfect negative linear relationship

 +1 indicates a perfect positive linear relationship

 0 indicates no linear relationship

To calculate the correlation coefficients, I used Excel’s Tool pack correlation function and created a correlation matrix as shown below:

	Home Owner	Marital Status	Gender	Income	Cars	Children	Age	Purchased Bike
Home Owner	1.00
Marital Status	0.32	1.00
Gender	0.01	0.09	1.00
Income	0.02	0.08	0.06	1.00
Cars	-0.08	-0.02	0.06	0.42	1.00
Children	0.16	0.15	0.00	0.26	0.26	1.00
Age	0.11	0.21	0.00	0.17	0.18	0.53	1.00
Purchased Bike	-0.02	-0.11	-0.01	0.04	-0.19	-0.12	-0.11	1.00

The correlation matrix shows the correlation coefficient for each pair of variables in the data set.
A high positive correlation indicates that the two variables tend to increase or decrease together.
A high negative correlation indicates that the two variables tend to move in opposite directions.
A low or zero correlation indicates that the two variables have no or weak linear relationship.
To interpret the results of the correlation analysis, we can look at the magnitude and sign of each coefficient.
A correlation coefficient is considered to be high if it is greater than or equal to 0.5 or less than or equal to -0.5.
A correlation coefficient is considered to be low if it is between -0.3 and 0.3.

Based on these criteria, we can see that:

Home owner status has a high positive correlation with marital status and a low positive correlation with children and age.
Marital status has a high positive correlation with age and a low negative correlation with purchasing a bike.
Gender has a low positive correlation with marital status and income and a low negative correlation with purchasing a bike.
Income has a high positive correlation with cars and a low positive correlation with children and purchasing a bike.
Cars has a high negative correlation with purchasing a bike and a low negative correlation with home owner status.
Children has a high positive correlation with age and a low negative correlation with purchasing a bike.
Age has a high positive correlation with marital status and children and a low negative correlation with purchasing a bike.

These results suggest that there are some multicollinearity issues in the data set, which means that some of the independent variables are highly correlated with each other.

Multicollinearity can cause problems for the logistic regression model, such as inflating the standard errors of the coefficients, reducing the reliability of the estimates, and making it difficult to assess the individual impact of each variable.

To address this issue, we can either remove some of the correlated variables from the model, or use other methods such as ridge regression or principal component analysis to reduce the dimensionality of the data.

STEP 3: DESCRIPTIVE ANALYSIS

The final step of the analysis is to describe the distribution of the dependent variable and each categorical variable in the data set.

To do this, I used Excel’s PivotTable and PivotChart tools to create frequency tables and charts for each variable.

The frequency table shows the number and percentage of customers in each category or level of the variable.

The frequency chart shows the same information in a graphical form, such as a bar chart or a pie chart.

THE DESCRIPTIVE STATISTICS FOR EACH VARIABLE ARE SHOWN BELOW:

Purchased Bike	Frequency	Percentage
Yes	495	48%
No	531	52%
Total	1026	100%

The descriptive statistics show the distribution of customers across different categories or levels of each variable.

CATEGORY	CATEGORY LEVEL	FREQUENCY	% PURCHASED
MARITAL STATUS	Married	236	43%
	Single	259	54%
GENDER	Female	243	49%
	Male	252	48%
EDUCATION	Bachelors	169	54%
	Graduate Degree	95	54%
	High School	82	45%
	Partial College	127	46%
	Partial High School	22	28%
OCCUPATION	Clerical	95	51%
	Management	73	42%
	Manual	59	47%
	Professional	150	54%
	Skilled Manual	118	46%
COMMUTE DISTANCE	0-1 Miles	207	55%
	1-2 Miles	83	47%
	2-5 Miles	95	59%
	5-10 Miles	77	39%
	10+ Miles	33	29%
REGION	Europe	156	49%
	North America	220	43%
	Pacific	119	59%
HOME OWNER	No	166	50%
	Yes	365	52%

To interpret the results of the descriptive analysis, we can compare the percentage of customers who purchased a bike in each category or level with the overall percentage of customers who purchased a bike (48%).

A higher percentage indicates that customers in that category or level are more likely to buy a bike than average.

A lower percentage indicates that customers in that category or level are less likely to buy a bike than average.

Based on these criteria, we can see that:

Customers who are single, have a bachelor’s or graduate degree, work as clerical or professional, commute less than five miles, live in the Pacific region, and are not home owners are more likely to buy a bike than average.
Customers who are married, have partial high school education, work as management or skilled manual, commute more than five miles, live in North America, and are home owners are less likely to buy a bike than average.

THE INSIGHTS AND RECOMMENDATIONS

Based on the analysis, we can draw some insights and recommendations for the bike store:

• The store should target customers who are single, have higher income, and own fewer cars, as they are more likely to buy a bike.

• The store should offer discounts or incentives for customers who are married, have lower income, and own more cars, as they are less likely to buy a bike.

• The store should promote the benefits of biking for health, environment, and fun, as these factors may appeal to customers who are older, have children, or live far from work.

• The store should expand its market to the Pacific region, as it has the highest percentage of bike buyers among the regions.

• The store should not rely on home owner status, gender, or education as predictors of bike buying behavior, as they have no significant effect on the outcome.

CONCLUSION

In this post, I showed you how to use Excel to analyze the data from a bike store and find out what factors influence the customers’ decision to buy a bike. I also shared some insights and recommendations based on the analysis. I hope you found this post useful and interesting. If you have any questions or feedback, please leave a comment below. Thank you for reading! 😊

Search This Blog

Maristat

🚲Bike Buying Behavior: Excel Analysis and Insights

🚲BIKE BUYER’S ANALYSIS WITH EXCEL

The Dataset

The Analysis Step

STEP 1: REGRESSION ANALYSIS

Based on these criteria, we can see that:

STEP 2: CORRELATION ANALYSIS

Based on these criteria, we can see that:

STEP 3: DESCRIPTIVE ANALYSIS

THE DESCRIPTIVE STATISTICS FOR EACH VARIABLE ARE SHOWN BELOW:

Based on these criteria, we can see that:

THE INSIGHTS AND RECOMMENDATIONS

CONCLUSION

Comments

Post a Comment

Popular posts from this blog

Whimsical: Turning App Ideas into Visual Designs without Coding

Economic and Demographic Trends in Nigeria (1960-2022) Power BI

My Graduation from I4Gdatacamp 2023