While these slides are drawn up and presented by Sean van der Merwe of the UFS Statistical Consultation Unit, they are inspired by slides of Dudu Ndlovu, previously with the UFS Faculty of Economic and Management Sciences.
The more detailed your scale, the better. A 4-level Likert scale is going to hurt you in the long run; rather ask respondents to rate their agreement on a scale of 0 to 10 instead.
Read, read, listen, watch, discuss, and read some more.
Research questions often fall into one of these categories
Important factors to consider in research design include
One of the most powerful conclusions that can come from research is when you can say that A causes B. Causal inference is easily conveyed and generally actionable, so highly desired. Unfortunately, it is very difficult to get right.
To be able to say that A causes B you must show that:
If you don’t have both of these conditions then don’t use the word ‘causes’.
We can create the conditions for causal inference by finding groups that are statistically identical in every way that is related to the outcome, except for the treatment. We look at the change in the groups over time and then study the differences in changes between groups.
Example: you have an online storefront and are considering two layouts. You can implement both layouts and then each time a new person comes to the website they get one of the layouts at random. After a month you see which layout resulted in more purchases on average. Since the assignment is completely random, the two groups will tend to be similar, and you can confidently say that the difference in purchases is due to the layout because it logically can’t be anything else.
This is known as A/B testing and is statistically similar to a medical trial.
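As a rough sketch of how such an A/B comparison might be checked at the end of the month, the Python snippet below (with entirely made-up visitor and purchase counts) compares the two purchase rates with a standard two-by-two contingency test; the figures and variable names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical month of traffic: each visitor was shown layout A or B at random
purchases_A, visitors_A = 130, 2400
purchases_B, visitors_B = 172, 2450

# 2x2 table of (purchased, did not purchase) for each layout
table = np.array([[purchases_A, visitors_A - purchases_A],
                  [purchases_B, visitors_B - purchases_B]])

chi2, p_value, _, _ = chi2_contingency(table)
print(f"Layout A purchase rate: {purchases_A / visitors_A:.3f}")
print(f"Layout B purchase rate: {purchases_B / visitors_B:.3f}")
print(f"p-value for 'the layouts perform the same': {p_value:.4f}")
```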
Counter example: you also have a physical storefront and need to downscale one of the channels. You can’t just look at purchases/profit total or purchases/profit per customer, because there are many critical systematic differences between customers that walk into a physical store and customers that buy online.
To answer this question you must first seek to understand the differences between the groups and their motivations, say via questionnaires, interviews, or focus groups.
Then you must selectively match people that are similar in background and ask the questions, “What would this person have done had they used the other channel? Would they have spent more or less? Would they have generated more profit?” This is incredibly difficult to do right and requires a very large sample.
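A heavily simplified sketch of the matching idea, assuming hypothetical customer records with just an age and a spend value per channel; real matching would use many more background variables, a much larger sample, and far more care.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical customer records: age and monthly spend for each channel
online_age, online_spend = rng.normal(30, 8, 200), rng.normal(500, 120, 200)
store_age, store_spend = rng.normal(45, 10, 300), rng.normal(650, 150, 300)

# For each online customer, find the in-store customer closest in age (nearest-neighbour match)
gaps = []
for age, spend in zip(online_age, online_spend):
    match = int(np.argmin(np.abs(store_age - age)))
    gaps.append(spend - store_spend[match])   # spend gap within the matched pair

print(f"Average spend gap (online minus matched in-store): {np.mean(gaps):.1f}")
```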
Another school of thought says that if you can accurately predict the results of your actions then you don’t need to fully understand the details of the processes. While technically true, it is also the easiest road to massive mistakes.
To properly make useful predictions you must:
While it would be nice to always know exactly what causes what and thus make definitive and unquestionable business decisions with known outcomes, in practice we need to make do with what is practical. Getting data from people can lead to useful insights, but we must remember that we are getting perceptions and opinions, not a true reflection of reality.
Regardless of anything else about the study, if you want to do inference then you must have a representative sample.
Very rarely is research purely deductive: where you look at a specific thing in its entirety from all angles and draw conclusions about that thing alone. Examples include mathematical theorems, or looking at the accounting history of a specific company only and making deductions about that company’s history only (say that they paid too little tax).
Almost all research is inferential, where you look at a sample (small picture) and based on that you say something about the population (big picture). This includes talking about the future based on the past, or talking to a few people and extrapolating to many people.
If there are systematic differences between your sample and your population then your inferences will be wrong, no matter how much effort you put into everything else you do.
Throughout your analysis you must remember that you are making judgements based on the perceptions of people such as those who respond to your survey. The survey will only be presented to a specific group of people, and we should not try to extend the results beyond that group without seriously considering and accounting for any systematic differences between that group and any broader group.
If you are only talking to people in Bloemfontein, then be very careful not to make claims about people in South Africa in general; stick to talking about people in Bloemfontein.
If you are going to hang out at a mall and talk to passers-by then that is called convenience sampling. This is not a simple random sample.
You can improve your chances of getting a representative sample by taking pro-active steps. For example, make sure you ask different types of people, move to different parts of the mall, and go to at least 3 malls in different suburbs with different demographics.
The best you can do is explain all the variation you can, report what is left, and hope it stays stable for a while.
Nominal questions are those with no ordering in the categories. Demographic questions are often nominal.
Ordinal questions are those with ordered categories. These are often things that have an underlying scale but are better to measure in steps.
Stick to 1 or 2 scales for a survey and don't interleave them.
Ask yourself, “Does a neutral level on my scale make sense for this question?”
Use mutually exclusive and exhaustive categories
The following may seem like a good way to get lots of information from “one” question:
Do you participate in sports?
1 = No; 2 = Yes
If No, Go to Question 3
If Yes, choose all sports that apply from the following: Football; Volleyball; Basketball; Soccer; Swimming; Other (Specify_________)
The simplest way to summarise information is counting. If we count things in categories, then we get frequencies. If we divide by the total count then we get relative frequencies (proportions).
Suppose we asked 20 students on campus how far they want to study:
Bachelors | Honours | Masters | PhD |
---|---|---|---|
6 | 10 | 3 | 1 |
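As a small illustration, the snippet below turns the 20 hypothetical responses behind the table above into frequencies and relative frequencies.

```python
from collections import Counter

# The 20 hypothetical campus responses summarised in the table above
responses = ["Bachelors"] * 6 + ["Honours"] * 10 + ["Masters"] * 3 + ["PhD"] * 1

counts = Counter(responses)                                 # frequencies
total = sum(counts.values())
proportions = {k: v / total for k, v in counts.items()}     # relative frequencies

print(counts)       # Counter({'Honours': 10, 'Bachelors': 6, 'Masters': 3, 'PhD': 1})
print(proportions)  # {'Bachelors': 0.3, 'Honours': 0.5, 'Masters': 0.15, 'PhD': 0.05}
```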
Suppose we wanted to look at the interaction of two responses. We could count every combination of categories.
Gender | Bachelors | Honours | Masters | PhD |
---|---|---|---|---|
Female | 2 | 6 | 2 | 0 |
Male | 4 | 4 | 1 | 1 |
If we want proportions then we need to ask whether we want them as a proportion of the row totals, the column totals, or the grand total.
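One way to build the two-way table and each of those proportion options is pandas' crosstab; the raw gender and level lists below are hypothetical but match the counts above.

```python
import pandas as pd

# Hypothetical raw responses matching the two-way table above
gender = ["Female"] * 10 + ["Male"] * 10
level = (["Bachelors"] * 2 + ["Honours"] * 6 + ["Masters"] * 2 +
         ["Bachelors"] * 4 + ["Honours"] * 4 + ["Masters"] * 1 + ["PhD"] * 1)

counts = pd.crosstab(gender, level)                          # joint frequencies
row_props = pd.crosstab(gender, level, normalize="index")    # proportions of each row total
col_props = pd.crosstab(gender, level, normalize="columns")  # proportions of each column total
all_props = pd.crosstab(gender, level, normalize="all")      # proportions of the grand total
print(counts, row_props, col_props, all_props, sep="\n\n")
```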
When we measure something directly, such as rainfall, then we can summarise it with descriptive statistics. Typical examples include the mean, median, standard deviation, minimum, maximum, and quartiles.
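A quick sketch of those summaries for a hypothetical year of monthly rainfall measurements:

```python
import numpy as np

# Hypothetical monthly rainfall in millimetres
rainfall = np.array([12.4, 30.1, 55.0, 80.3, 18.7, 5.2, 0.0, 2.1, 22.8, 64.5, 95.2, 41.0])

print("mean:     ", rainfall.mean())
print("median:   ", np.median(rainfall))
print("std dev:  ", rainfall.std(ddof=1))                  # sample standard deviation
print("min / max:", rainfall.min(), rainfall.max())
print("quartiles:", np.percentile(rainfall, [25, 50, 75]))
```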
Plots allow us to put a lot of information in a small space and allow the reader to draw their own conclusions.
Box plots are a great example. For every category they show the median (middle line), quartiles (box), robust range (whiskers), and extreme values (dots).
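A minimal matplotlib sketch of such a box plot, using made-up 0-to-10 scores for three groups:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical scores for three groups
scores = [rng.normal(6, 1.5, 40), rng.normal(7, 1.0, 40), rng.normal(5, 2.0, 40)]

plt.boxplot(scores)                                       # median, quartiles, whiskers, outliers
plt.xticks([1, 2, 3], ["Group A", "Group B", "Group C"])
plt.ylabel("Score (0 to 10)")
plt.show()
```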
A 95% confidence interval for the average says
If we were to repeat the experiment a large number of times then 95% of the time the true population average would be in the interval generated.
A 95% credibility interval says
Based on the 1 sample we observed and our prior knowledge, we are 95% certain that the population average is in this interval
A 95% prediction interval says
95% of the population values are predicted to be inside this interval
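As a rough sketch, the snippet below computes a 95% confidence interval for the average and a 95% prediction interval for a new value, assuming roughly normal data; the ratings are made up.

```python
import numpy as np
from scipy import stats

x = np.array([6.1, 7.3, 5.8, 6.9, 7.0, 6.4, 5.5, 7.8, 6.6, 6.2])   # hypothetical ratings

mean = x.mean()
se = stats.sem(x)                                                    # standard error of the mean
ci = stats.t.interval(0.95, df=len(x) - 1, loc=mean, scale=se)
pi = stats.t.interval(0.95, df=len(x) - 1, loc=mean,
                      scale=x.std(ddof=1) * np.sqrt(1 + 1 / len(x)))
print(f"95% confidence interval for the average: {ci[0]:.2f} to {ci[1]:.2f}")
print(f"95% prediction interval for a new value: {pi[0]:.2f} to {pi[1]:.2f}")
```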
Suppose you talk to people before and after a workshop and ask them their opinion of something on a scale of 0 to 10
A normal distribution means that your numbers are roughly symmetric and bell shaped.
If this holds then the t-test works at its best and tells you about how your sample average relates to the population average.
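A sketch of the before/after comparison as a paired t-test, with hypothetical 0-to-10 opinions from the same ten people at both times:

```python
from scipy import stats

before = [4, 6, 5, 3, 7, 5, 6, 4, 5, 6]   # hypothetical opinions before the workshop
after  = [6, 7, 6, 5, 8, 6, 7, 5, 7, 7]   # the same people afterwards

t_stat, p_value = stats.ttest_rel(after, before)   # paired t-test on the differences
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
```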
Suppose we map a Likert scale question as follows:
And get this summary:
Count | Average | StdDev | Question |
---|---|---|---|
441 | 3.841 | 1.01 | I am comfortable with my reading speed |
If people were answering this question at random then we would expect the average to be 3. If we take the standard deviation \((s=1.01)\) and divide by the square root of the number of observations \((\sqrt{441}=21)\) then we get the standard error \((SE\approx 0.05)\). We see that the average is about 17 standard errors away from 3, which is extremely far into the tail of the distribution (p-value\(\approx 0\)), and we conclude that the average really isn’t 3.
And since the average is above 3, practically, people such as those surveyed are significantly comfortable with their reading speed.
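The same arithmetic written out as a sketch of a one-sample t-test against the neutral value of 3, working only from the summary figures in the table:

```python
import numpy as np
from scipy import stats

n, mean, sd = 441, 3.841, 1.01            # summary from the table above
neutral = 3                               # expected average if people answered at random

se = sd / np.sqrt(n)                      # standard error, about 0.048
t_stat = (mean - neutral) / se            # about 17 standard errors above 3
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)   # two-sided p-value, essentially 0
print(f"SE = {se:.3f}, t = {t_stat:.1f}, p-value = {p_value:.1e}")
```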
What if you have more than two sets of numbers?
If your data is badly behaved, skewed, unusual, or just doesn’t really meet the assumptions above then people will recommend using a non-parametric alternative.
However, non-parametric tests have different null hypotheses, and are not directly interchangeable.
For comparing a single sample to a fixed value we might use a Wilcoxon signed-rank test. This is not testing the average at all, instead it tests the more general question of whether the values tend to be close to or far away from the null value.
For comparing two samples we can use the Mann-Whitney U test. This asks the question of whether one set of numbers is systematically larger or systematically smaller than another.
Because these questions are more general, and make fewer assumptions, they are also more robust. However, they can have less power to find differences.
More advanced tests also have non-parametric alternatives, e.g. Kruskal-Wallis instead of ANOVA.
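A quick sketch of these alternatives in scipy, on hypothetical data; note that each answers its own question rather than being a drop-in replacement for the t-test or ANOVA.

```python
from scipy import stats

x = [6, 7, 6, 9, 4, 8, 6, 7]    # hypothetical sample 1
y = [3, 5, 4, 6, 2, 5, 4, 4]    # hypothetical sample 2
z = [6, 8, 7, 9, 7, 8, 6, 9]    # hypothetical sample 3

print(stats.wilcoxon([v - 5 for v in x]))   # one sample compared to the fixed value 5
print(stats.mannwhitneyu(x, y))             # is x systematically larger or smaller than y?
print(stats.kruskal(x, y, z))               # Kruskal-Wallis for more than two groups
```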
Correlation measures the tendency for things to move together.
A positive correlation says that when one thing is above average the other will tend to also be above average (below average \(\leftrightarrow\) below average). Think of temperature and watermelon consumption say.
A negative correlation says that things tend to move against each other.
Correlation does not imply causation.
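A small sketch of a correlation coefficient, with made-up temperature and sales figures:

```python
import numpy as np
from scipy import stats

temperature = np.array([18, 22, 25, 28, 31, 33, 35, 29, 24, 20])   # hypothetical daily maximums
watermelons = np.array([12, 15, 18, 24, 30, 33, 36, 26, 17, 13])   # hypothetical units sold

r, p_value = stats.pearsonr(temperature, watermelons)
print(f"correlation r = {r:.2f}, p-value = {p_value:.4f}")          # positive: hotter days, more watermelon
```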
Regression is when we try to explain the variation in one variable by the variation in other variables, e.g. regional office profit could be explained by office morale, staff turnover, training programmes, controlled for region population, regional per capita income, etc.
Regression is the basis for both prediction and forecasting (not the same thing).
While it is the most powerful tool in statistics, it is also the most complex. Regression experts regularly mess it up.
If you want to do a regression in a scientific or experimental setting then contact a statistician.
If you want to do a regression in a business setting, hire a data scientist.
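As a minimal sketch of the kind of regression described above, the snippet below fits office profit on two hypothetical explanatory variables with ordinary least squares; all names, data, and coefficients are invented for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 60                                                    # hypothetical regional offices
morale = rng.uniform(3, 9, n)                             # office morale score
turnover = rng.uniform(0.05, 0.40, n)                     # annual staff turnover rate
profit = 200 + 30 * morale - 250 * turnover + rng.normal(0, 40, n)   # made-up relationship

X = sm.add_constant(np.column_stack([morale, turnover]))  # intercept + explanatory variables
fit = sm.OLS(profit, X).fit()
print(fit.summary())                                      # coefficients, intervals, fit statistics
```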
Thank you for your time and attention.
I really hope I could inspire some people, or at least broaden minds as to what is possible.
If you want to do more fancy statistics, like process control, machine learning, business analytics, …, then please talk to an expert. But be careful of big promises. A good statistician will focus on helping you stop getting some things wrong; they will not single-handedly double profits or turn the business around. While statistics can do amazing things, it is not a magic wand: if your data sucks, so will your results.
2023/08/23 - Quantitative Research