While these slides are drawn up and presented by Sean van der Merwe of the UFS Statistical Consultation Unit, they are inspired by slides of Dudu Ndlovu, previously with the UFS Faculty of Economic and Management Sciences.
The more detailed your scale, the better. A 4-level Likert scale is going to hurt you in the long run; rather ask respondents to rate their agreement on a scale of 0 to 10 instead.
Read, read, listen, watch, discuss, and read some more.
Research questions often fall into one of these categories
Important factors to consider in research design include
One of the most powerful conclusions that can come from research is when you can say that A causes B. Causal inference is easily conveyed and generally actionable, so highly desired. Unfortunately, it is very difficult to get right.
To be able to say that A causes B you must show that:
If you don’t have both of these conditions then don’t use the word ‘causes’.
We can create the conditions for causal inference by finding groups that are statistically identical in every way that is related to the outcome, except for the treatment. We look at the change in the groups over time and then study the differences in changes between groups.
Example: you have an online storefront and are considering two layouts. You can implement both layouts and then each time a new person comes to the website they get one of the layouts at random. After a month you see which layout resulted in more purchases on average. Since the assignment is completely random, the two groups will tend to be similar, and you can confidently say that the difference in purchases is due to the layout because it logically can’t be anything else.
This is known as A/B testing and is statistically similar to a medical trial.
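As a rough sketch of how such an A/B comparison might be checked at the end of the month, the Python snippet below (with entirely made-up visitor and purchase counts) compares the two purchase rates with a standard two-by-two contingency test; the figures and variable names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical month of traffic: each visitor was shown layout A or B at random
purchases_A, visitors_A = 130, 2400
purchases_B, visitors_B = 172, 2450

# 2x2 table of (purchased, did not purchase) for each layout
table = np.array([[purchases_A, visitors_A - purchases_A],
                  [purchases_B, visitors_B - purchases_B]])

chi2, p_value, _, _ = chi2_contingency(table)
print(f"Layout A purchase rate: {purchases_A / visitors_A:.3f}")
print(f"Layout B purchase rate: {purchases_B / visitors_B:.3f}")
print(f"p-value for 'the layouts perform the same': {p_value:.4f}")
```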
Counter example: you also have a physical storefront and need to downscale one of the channels. You can’t just look at purchases/profit total or purchases/profit per customer, because there are many critical systematic differences between customers that walk into a physical store and customers that buy online.
To answer this question you must first seek to understand the differences between the groups and their motivations, say via questionnaires, interviews, or focus groups.
Then you must selectively match people that are similar in background and ask the questions, “What would this person have done had they used the other channel? Would they have spent more or less? Would they have generated more profit?” This is incredibly difficult to do right and requires a very large sample.
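A heavily simplified sketch of the matching idea, assuming hypothetical customer records with just an age and a spend value per channel; real matching would use many more background variables, a much larger sample, and far more care.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical customer records: age and monthly spend for each channel
online_age, online_spend = rng.normal(30, 8, 200), rng.normal(500, 120, 200)
store_age, store_spend = rng.normal(45, 10, 300), rng.normal(650, 150, 300)

# For each online customer, find the in-store customer closest in age (nearest-neighbour match)
gaps = []
for age, spend in zip(online_age, online_spend):
    match = int(np.argmin(np.abs(store_age - age)))
    gaps.append(spend - store_spend[match])   # spend gap within the matched pair

print(f"Average spend gap (online minus matched in-store): {np.mean(gaps):.1f}")
```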
Another school of thought says that if you can accurately predict the results of your actions then you don’t need to fully understand the details of the processes. While technically true, it is also the easiest road to massive mistakes.
To properly make useful predictions you must:
While it would be nice to always know exactly what causes what and thus make definitive and unquestionable business decisions with known outcomes, in practice we need to make do with what is practical. Getting data from people can lead to useful insights, but we must remember that we are getting perceptions and opinions, not a true reflection of reality.
Regardless of anything else about the study, if you want to do inference then you must have a representative sample.
Very rarely is research purely deductive: where you look at a specific thing in its entirety from all angles and draw conclusions about that thing alone. Examples include mathematical theorems, or looking at the accounting history of a specific company only and making deductions about that company’s history only (say that they paid too little tax).
Almost all research is inferential, where you look at a sample (small picture) and based on that you say something about the population (big picture). This includes talking about the future based on the past, or talking to a few people and extrapolating to many people.
If there are systematic differences between your sample and your population then your inferences will be wrong, no matter how much effort you put into everything else you do.
Throughout your analysis you must remember that you are making judgements based on the perceptions of people such as those who respond to your survey. The survey will only be presented to a specific group of people, and we should not try to extend the results beyond that group without seriously considering and accounting for any systematic differences between that group and any broader group.
If you are only talking to people in Bloemfontein, then be very careful not to make claims about people in South Africa in general; stick to talking about people in Bloemfontein.
If you are going to hang out at a mall and talk to passers-by then that is called convenience sampling. This is not a simple random sample.
You can improve your chances of getting a representative sample by taking pro-active steps. For example, make sure you ask different types of people, move to different parts of the mall, and go to at least 3 malls in different suburbs with different demographics.
The best you can do is explain all the variation you can, report what is left, and hope it stays stable for a while.
Nominal questions are those with no ordering in the categories. Demographic questions are often nominal.
Ordinal questions are those with ordered categories. These are often things that have an underlying scale but are better to measure in steps.
Stick to 1 or 2 scales for a survey and don't interleave them.
Ask yourself, “Does a neutral level on my scale make sense for this question?”
Use mutually exclusive and exhaustive categories
The following may seem like a good way to get lots of information from “one” question:
Do you participate in sports?
1 = No; 2 = Yes
If No, Go to Question 3
If Yes, choose all sports that apply from the following: Football; Volleyball; Basketball; Soccer; Swimming; Other (Specify_________)
The simplest way to summarise information is counting. If we count things in categories, then we get frequencies. If we divide by the total count then we get relative frequencies (proportions).
Suppose we asked 20 students on campus how far they want to study:
Bachelors | Honours | Masters | PhD |
---|---|---|---|
6 | 10 | 3 | 1 |
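As a small illustration, the snippet below turns the 20 hypothetical responses behind the table above into frequencies and relative frequencies.

```python
from collections import Counter

# The 20 hypothetical campus responses summarised in the table above
responses = ["Bachelors"] * 6 + ["Honours"] * 10 + ["Masters"] * 3 + ["PhD"] * 1

counts = Counter(responses)                                 # frequencies
total = sum(counts.values())
proportions = {k: v / total for k, v in counts.items()}     # relative frequencies

print(counts)       # Counter({'Honours': 10, 'Bachelors': 6, 'Masters': 3, 'PhD': 1})
print(proportions)  # {'Bachelors': 0.3, 'Honours': 0.5, 'Masters': 0.15, 'PhD': 0.05}
```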
Suppose we wanted to look at the interaction of two responses. We could count every combination of categories.
Gender | Bachelors | Honours | Masters | PhD |
---|---|---|---|---|
Female | 2 | 6 | 2 | 0 |
Male | 4 | 4 | 1 | 1 |
If we want proportions then we need to ask whether we want them as a proportion of the row totals, the column totals, or the grand total.
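One way to build the two-way table and each of those proportion options is pandas' crosstab; the raw gender and level lists below are hypothetical but match the counts above.

```python
import pandas as pd

# Hypothetical raw responses matching the two-way table above
gender = ["Female"] * 10 + ["Male"] * 10
level = (["Bachelors"] * 2 + ["Honours"] * 6 + ["Masters"] * 2 +
         ["Bachelors"] * 4 + ["Honours"] * 4 + ["Masters"] * 1 + ["PhD"] * 1)

counts = pd.crosstab(gender, level)                          # joint frequencies
row_props = pd.crosstab(gender, level, normalize="index")    # proportions of each row total
col_props = pd.crosstab(gender, level, normalize="columns")  # proportions of each column total
all_props = pd.crosstab(gender, level, normalize="all")      # proportions of the grand total
print(counts, row_props, col_props, all_props, sep="\n\n")
```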
When we measure something directly, such as rainfall, then we can summarise it with descriptive statistics. Typical examples include the mean, median, standard deviation, minimum, maximum, and quartiles.
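A quick sketch of those summaries for a hypothetical year of monthly rainfall measurements:

```python
import numpy as np

# Hypothetical monthly rainfall in millimetres
rainfall = np.array([12.4, 30.1, 55.0, 80.3, 18.7, 5.2, 0.0, 2.1, 22.8, 64.5, 95.2, 41.0])

print("mean:     ", rainfall.mean())
print("median:   ", np.median(rainfall))
print("std dev:  ", rainfall.std(ddof=1))                  # sample standard deviation
print("min / max:", rainfall.min(), rainfall.max())
print("quartiles:", np.percentile(rainfall, [25, 50, 75]))
```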
Plots allow us to put a lot of information in a small space and allow the reader to draw their own conclusions.
Box plots are a great example. For every category they show the median (middle line), quartiles (box), robust range (whiskers), and extreme values (dots).
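A minimal matplotlib sketch of such a box plot, using made-up 0-to-10 scores for three groups:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical scores for three groups
scores = [rng.normal(6, 1.5, 40), rng.normal(7, 1.0, 40), rng.normal(5, 2.0, 40)]

plt.boxplot(scores)                                       # median, quartiles, whiskers, outliers
plt.xticks([1, 2, 3], ["Group A", "Group B", "Group C"])
plt.ylabel("Score (0 to 10)")
plt.show()
```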
A 95% confidence interval for the average says
If we were to repeat the experiment a large number of times then 95% of the time the true population average would be in the interval generated.
A 95% credibility interval says
Based on the 1 sample we observed and our prior knowledge, we are 95% certain that the population average is in this interval
A 95% prediction interval says
95% of the population values are predicted to be inside this interval
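As a rough sketch, the snippet below computes a 95% confidence interval for the average and a 95% prediction interval for a new value, assuming roughly normal data; the ratings are made up.

```python
import numpy as np
from scipy import stats

x = np.array([6.1, 7.3, 5.8, 6.9, 7.0, 6.4, 5.5, 7.8, 6.6, 6.2])   # hypothetical ratings

mean = x.mean()
se = stats.sem(x)                                                    # standard error of the mean
ci = stats.t.interval(0.95, df=len(x) - 1, loc=mean, scale=se)
pi = stats.t.interval(0.95, df=len(x) - 1, loc=mean,
                      scale=x.std(ddof=1) * np.sqrt(1 + 1 / len(x)))
print(f"95% confidence interval for the average: {ci[0]:.2f} to {ci[1]:.2f}")
print(f"95% prediction interval for a new value: {pi[0]:.2f} to {pi[1]:.2f}")
```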
Suppose you talk to people before and after a workshop and ask them their opinion of something on a scale of 0 to 10
A normal distribution means that your numbers are roughly symmetric and bell shaped.
If this holds then the t-test works at its best and tells you about how your sample average relates to the population average.
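A sketch of the before/after comparison as a paired t-test, with hypothetical 0-to-10 opinions from the same ten people at both times:

```python
from scipy import stats

before = [4, 6, 5, 3, 7, 5, 6, 4, 5, 6]   # hypothetical opinions before the workshop
after  = [6, 7, 6, 5, 8, 6, 7, 5, 7, 7]   # the same people afterwards

t_stat, p_value = stats.ttest_rel(after, before)   # paired t-test on the differences
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
```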
Suppose we map a Likert scale question as follows:
And get this summary:
Count | Average | StdDev | Question |
---|---|---|---|
441 | 3.841 | 1.01 | I am comfortable with my reading speed |
If people were answering this question at random then we would expect the average to be 3. If we take the standard deviation \((s=1.01)\) and divide by the square root of the number of observations \((\sqrt{441}=21)\) then we get the standard error \((SE\approx 0.05)\). We see that the average is about 17 standard errors away from 3, which is extremely far into the tail of the distribution (p-value\(\approx 0\)), and we conclude that the average really isn’t 3.
And since the average is above 3, practically, people such as those surveyed are significantly comfortable with their reading speed.
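The same arithmetic written out as a sketch of a one-sample t-test against the neutral value of 3, working only from the summary figures in the table:

```python
import numpy as np
from scipy import stats

n, mean, sd = 441, 3.841, 1.01            # summary from the table above
neutral = 3                               # expected average if people answered at random

se = sd / np.sqrt(n)                      # standard error, about 0.048
t_stat = (mean - neutral) / se            # about 17 standard errors above 3
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)   # two-sided p-value, essentially 0
print(f"SE = {se:.3f}, t = {t_stat:.1f}, p-value = {p_value:.1e}")
```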
What if you have more than two sets of numbers?
If your data is badly behaved, skewed, unusual, or just doesn’t really meet the assumptions above then people will recommend using a non-parametric alternative.
However, non-parametric tests have different null hypotheses, and are not directly interchangeable.
For comparing a single sample to a fixed value we might use a Wilcoxon signed-rank test. This is not testing the average at all, instead it tests the more general question of whether the values tend to be close to or far away from the null value.
For comparing two samples we can use the Mann-Whitney U test. This asks the question of whether one set of numbers is systematically larger or systematically smaller than another.
Because these questions are more general, and make fewer assumptions, they are also more robust. However, they can have less power to find differences.
More advanced tests also have non-parametric alternatives, e.g. Kruskal-Wallis instead of ANOVA.
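A quick sketch of these alternatives in scipy, on hypothetical data; note that each answers its own question rather than being a drop-in replacement for the t-test or ANOVA.

```python
from scipy import stats

x = [6, 7, 6, 9, 4, 8, 6, 7]    # hypothetical sample 1
y = [3, 5, 4, 6, 2, 5, 4, 4]    # hypothetical sample 2
z = [6, 8, 7, 9, 7, 8, 6, 9]    # hypothetical sample 3

print(stats.wilcoxon([v - 5 for v in x]))   # one sample compared to the fixed value 5
print(stats.mannwhitneyu(x, y))             # is x systematically larger or smaller than y?
print(stats.kruskal(x, y, z))               # Kruskal-Wallis for more than two groups
```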
Correlation measures the tendency for things to move together.
A positive correlation says that when one thing is above average the other will tend to also be above average (below average \(\leftrightarrow\) below average). Think of temperature and watermelon consumption say.
A negative correlation says that things tend to move against each other.
Correlation does not imply causation.
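A small sketch of a correlation coefficient, with made-up temperature and sales figures:

```python
import numpy as np
from scipy import stats

temperature = np.array([18, 22, 25, 28, 31, 33, 35, 29, 24, 20])   # hypothetical daily maximums
watermelons = np.array([12, 15, 18, 24, 30, 33, 36, 26, 17, 13])   # hypothetical units sold

r, p_value = stats.pearsonr(temperature, watermelons)
print(f"correlation r = {r:.2f}, p-value = {p_value:.4f}")          # positive: hotter days, more watermelon
```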
Regression is when we try to explain the variation in one variable by the variation in other variables, e.g. regional office profit could be explained by office morale, staff turnover, training programmes, controlled for region population, regional per capita income, etc.
Regression is the basis for both prediction and forecasting (not the same thing).
While it is the most powerful tool in statistics, it is also the most complex. Regression experts regularly mess it up.
If you want to do a regression in a scientific or experimental setting then contact a statistician.
If you want to do a regression in a business setting, hire a data scientist.
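As a minimal sketch of the kind of regression described above, the snippet below fits office profit on two hypothetical explanatory variables with ordinary least squares; all names, data, and coefficients are invented for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 60                                                    # hypothetical regional offices
morale = rng.uniform(3, 9, n)                             # office morale score
turnover = rng.uniform(0.05, 0.40, n)                     # annual staff turnover rate
profit = 200 + 30 * morale - 250 * turnover + rng.normal(0, 40, n)   # made-up relationship

X = sm.add_constant(np.column_stack([morale, turnover]))  # intercept + explanatory variables
fit = sm.OLS(profit, X).fit()
print(fit.summary())                                      # coefficients, intervals, fit statistics
```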
Thank you for your time and attention.
I really hope I could inspire some people, or at least broaden minds as to what is possible.
If you want to do more fancy statistics, like process control, machine learning, business analytics, …, then please talk to an expert. But be careful of big promises. A good statistician will focus on helping you stop getting some things wrong; they will not single-handedly double profits or turn the business around. While statistics can do amazing things, it is not a magic wand: if your data sucks, so will your results.
2023/08/23 - Quantitative Research