🚲Bike Buying Behavior: Excel Analysis and Insights

 🚲BIKE BUYER’S ANALYSIS WITH EXCEL

Hello, and welcome to my blog post titled “🚲 Bike Buyer’s Analysis With Excel”. In this post, I will show you how to use Excel to analyze the data from a bike store and find out what factors influence the customers’ decision to buy a bike. I will also share some insights and recommendations based on the
analysis. Let’s get started!

The Dataset

The data set I used for this analysis is from a fictional bike store called Adventure Works. The data contains information about 1,026 customers who visited the store in 2023, such as their demographic characteristics, purchase history, and commute distance. You can download the data set from [here].

The Analysis Step

The analysis consists of three steps:

Summarize the key statistics for each variable and how they affect the dependent variable, which is whether the customer purchased a bike or not.

Examine the correlation between each pair of variables in the data set.

Describe the distribution of the dependent variable and each categorical variable in the data set.

STEP 1: REGRESSION ANALYSIS

To summarize the key statistics for each variable and how they affect the dependent variable, I used Excel’s Data Analysis Tool Pak to run a logistic regression. The logistic regression is a statistical method that can model the relationship between a binary outcome variable (such as purchasing a bike) and one or more explanatory variables (such as income, age, gender, etc.). The output of the logistic regression is shown in the table below:

Variable

Coefficient

Standard Error

t-Statistic

p-Value

Confidence Interval

Home Owner

0.012

0.034

0.363

0.717

[-0.055, 0.080]

Marital Status

-0.120

0.033

-3.692

0.000

[-0.184, -0.056]

Gender

0.009

0.030

0.298

0.766

[-0.050, 0.068]

Income

0.000003

0.000001

5.205

0.000

[0.000002, 0.000004]

Cars

-0.109

0.015

-7.223

0.000

[-0.139, -0.080]

Children

-0.021

0.011

-1.838

0.066

[-0.043, 0.001]

Age

-0.002

0.002

-0.988

0.323

[-0.005, 0.002]

The table shows the coefficient, standard error, t-statistic, p-value, and confidence interval for each independent variable in the regression model.

The coefficient represents the change in the log-odds of purchasing a bike for a one-unit increase in the corresponding variable, holding all other variables constant.

The standard error measures the variability of the coefficient estimate.

The t-statistic is the ratio of the coefficient to the standard error.

The p-value is the probability of obtaining a coefficient as extreme as or more extreme than the observed one under the null hypothesis that the coefficient is zero.

The confidence interval is the range of values that contains the true coefficient with a certain level of confidence (usually 95%).

To interpret the results of the logistic regression, we can look at the sign, magnitude, and significance of each coefficient.

The sign indicates whether the variable has a positive or negative effect on the outcome variable.

The magnitude indicates how strong or weak the effect is.

The significance indicates how likely or unlikely it is that the effect is due to chance.

A variable is considered to be statistically significant if its p-value is less than a predetermined threshold (usually 0.05).



Based on these criteria, we can see that:

Home owner status has a positive but insignificant effect on purchasing a bike.

Marital status has a negative and significant effect on purchasing a bike.

Gender has a positive but insignificant effect on purchasing a bike.

Income has a positive and significant effect on purchasing a bike.

Number of cars has a negative and significant effect on purchasing a bike.

Number of children has a negative but insignificant effect on purchasing a bike.

Age has a negative but insignificant effect on purchasing a bike.

These results suggest that customers who are single, have higher income, and own fewer cars are more likely to buy a bike from Adventure Works.

STEP 2: CORRELATION ANALYSIS

The next step of the analysis is to examine the correlation between each pair of variables in the data set. Correlation measures the strength and direction of the linear relationship between two variables.

A correlation coefficient ranges from -1 to +1, where:

-1 indicates a perfect negative linear relationship

+1 indicates a perfect positive linear relationship

0 indicates no linear relationship

To calculate the correlation coefficients, I used Excel’s Tool pack correlation function and created a correlation matrix as shown below:

 

 

Home Owner

Marital Status

Gender

Income

Cars

Children

Age

Purchased Bike

Home Owner

1.00

Marital Status

0.32

1.00

Gender

0.01

0.09

1.00

Income

0.02

0.08

0.06

1.00

Cars

-0.08

-0.02

0.06

0.42

1.00

Children

0.16

0.15

0.00

0.26

0.26

1.00

Age

0.11

0.21

0.00

0.17

0.18

0.53

1.00

Purchased Bike

-0.02

-0.11

-0.01

0.04

-0.19

-0.12

-0.11

1.00


  • The correlation matrix shows the correlation coefficient for each pair of variables in the data set.
  • A high positive correlation indicates that the two variables tend to increase or decrease together.
  • A high negative correlation indicates that the two variables tend to move in opposite directions.
  • A low or zero correlation indicates that the two variables have no or weak linear relationship.
  • To interpret the results of the correlation analysis, we can look at the magnitude and sign of each coefficient.
  • A correlation coefficient is considered to be high if it is greater than or equal to 0.5 or less than or equal to -0.5.
  • A correlation coefficient is considered to be low if it is between -0.3 and 0.3.

Based on these criteria, we can see that:

  • Home owner status has a high positive correlation with marital status and a low positive correlation with children and age.
  • Marital status has a high positive correlation with age and a low negative correlation with purchasing a bike.
  • Gender has a low positive correlation with marital status and income and a low negative correlation with purchasing a bike.
  • Income has a high positive correlation with cars and a low positive correlation with children and purchasing a bike.
  • Cars has a high negative correlation with purchasing a bike and a low negative correlation with home owner status.
  • Children has a high positive correlation with age and a low negative correlation with purchasing a bike.
  • Age has a high positive correlation with marital status and children and a low negative correlation with purchasing a bike.

These results suggest that there are some multicollinearity issues in the data set, which means that some of the independent variables are highly correlated with each other.

Multicollinearity can cause problems for the logistic regression model, such as inflating the standard errors of the coefficients, reducing the reliability of the estimates, and making it difficult to assess the individual impact of each variable.

To address this issue, we can either remove some of the correlated variables from the model, or use other methods such as ridge regression or principal component analysis to reduce the dimensionality of the data.

STEP 3: DESCRIPTIVE ANALYSIS

The final step of the analysis is to describe the distribution of the dependent variable and each categorical variable in the data set.

To do this, I used Excel’s PivotTable and PivotChart tools to create frequency tables and charts for each variable.

The frequency table shows the number and percentage of customers in each category or level of the variable.

The frequency chart shows the same information in a graphical form, such as a bar chart or a pie chart.

THE DESCRIPTIVE STATISTICS FOR EACH VARIABLE ARE SHOWN BELOW:


Purchased Bike

Frequency

Percentage

Yes

495

48%

No

531

52%

Total

1026

100%




'image.png' failed to upload. TransportError: There was an error during the transport or processing of this request. Error code = 7, Path = /_/BloggerUi/data/batchexecute : Unknown HTTP error in underlying XHR (HTTP Status: 0) (XHR Error Code: 6) (XHR Error Message: ' [0]')


 The descriptive statistics show the distribution of customers across different categories or levels of each variable.

CATEGORY

CATEGORY LEVEL

FREQUENCY

% PURCHASED

MARITAL STATUS

Married

236

43%

Single

259

54%

GENDER

Female

243

49%

Male

252

48%

EDUCATION

Bachelors

169

54%

Graduate Degree

95

54%

High School

82

45%

Partial College

127

46%

Partial High School

22

28%

OCCUPATION

Clerical

95

51%

Management

73

42%

Manual

59

47%

Professional

150

54%

Skilled Manual

118

46%

COMMUTE DISTANCE

0-1 Miles

207

55%

1-2 Miles

83

47%

2-5 Miles

95

59%

5-10 Miles

77

39%

10+ Miles

33

29%

REGION

Europe

156

49%

North America

220

43%

Pacific

119

59%

HOME OWNER

No

166

50%

Yes

365

52%

To interpret the results of the descriptive analysis, we can compare the percentage of customers who purchased a bike in each category or level with the overall percentage of customers who purchased a bike (48%).

A higher percentage indicates that customers in that category or level are more likely to buy a bike than average.

A lower percentage indicates that customers in that category or level are less likely to buy a bike than average.

 



Based on these criteria, we can see that:

  • Customers who are single, have a bachelor’s or graduate degree, work as clerical or professional, commute less than five miles, live in the Pacific region, and are not home owners are more likely to buy a bike than average.
  • Customers who are married, have partial high school education, work as management or skilled manual, commute more than five miles, live in North America, and are home owners are less likely to buy a bike than average.

THE INSIGHTS AND RECOMMENDATIONS

Based on the analysis, we can draw some insights and recommendations for the bike store:

The store should target customers who are single, have higher income, and own fewer cars, as they         are more likely to buy a bike.

The store should offer discounts or incentives for customers who are married, have lower income,         and own more cars, as they are less likely to buy a bike.

The store should promote the benefits of biking for health, environment, and fun, as these factors            may appeal to customers who are older, have children, or live far from work.

The store should expand its market to the Pacific region, as it has the highest percentage of bike             buyers among the regions.

The store should not rely on home owner status, gender, or education as predictors of bike buying             behavior, as they have no significant effect on the outcome.

CONCLUSION

In this post, I showed you how to use Excel to analyze the data from a bike store and find out what factors influence the customers’ decision to buy a bike. I also shared some insights and recommendations based on the analysis. I hope you found this post useful and interesting. If you have any questions or feedback, please leave a comment below. Thank you for reading! 😊


Comments

Popular posts from this blog

Whimsical: Turning App Ideas into Visual Designs without Coding

Economic and Demographic Trends in Nigeria (1960-2022) Power BI

My Graduation from I4Gdatacamp 2023