Use of AI tools in Statistics

Sean van der Merwe

Introduction

Topics

  • AI tools are not new, but are suddenly popular
  • Types of AI tools and how to use them
  • Uses of AI for lecturers
  • Uses of AI for students
  • Abuses of AI by students
    • And how to spot them

Is the above order in order?

History

  • AI is not new; it has been in use for decades
    • Think of robotics and gaming
    • Google revolutionised search by bringing in statistics, and gradually making it smarter
    • WinBUGS (early 2000s), then OpenBUGS, JAGS, and now Stan are all Bayesian AI tools
    • Tools for writing English essays have been around for years
    • Wolfram Alpha is an amazing AI tool for mathematics, from the developers of Mathematica
  • Late 2022 saw the public release of ChatGPT (based on GPT-3.5)
    • First public tool with general-purpose usefulness and the ability to understand plain English
    • This is when the media took notice

My (Sean’s) thoughts

My views on the AI tools we’re about to discuss are that …

  • They summarise complicated topics in plain language
  • They explain simple things clearly
  • So they are like Wikipedia page generators
    • They make something like a Wikipedia page on the fly based on what you ask
  • BUT without any accuracy checks!
    • They have no concept of right or wrong, correct or incorrect
      • They are language models trained on general internet text, so they have the same biases that the internet has

Chatbots

  • ChatGPT 3.5 is free and useful
    • Superficial and not cutting edge, with limits on use
  • ChatGPT 4 is $20/month and has all the best plugins
    • Still superficial unless you use the right plugins and the topic is popular
  • New Bing AI
    • Requires Edge browser but no login (private Microsoft account recommended for better performance)
    • Based on GPT-4 (the model behind ChatGPT 4), but with a different interface
    • Most superficial responses, but still super useful because it is super flexible
  • Google Bard
    • Main competitor to ChatGPT (whose developer, OpenAI, is primarily backed by Microsoft)

Note that ChatGPT was not trained on academic papers, so expecting academic answers will cause you a lot of pain.

Research tools

The tools below are far less flexible, but far more rigorous, and based on peer reviewed papers:

  • Elicit
    • Free signup, single step to use
    • Loosely based on the older GPT-3 model, but is not a chatbot
    • Finds and (briefly) summarises the top papers matching your search
  • Consensus
    • Same as above
    • Has three summary modes:
      • First it just presents the top papers to you neatly (same as Elicit)
      • Then it has a button to generate a consensus summary of these papers
      • If you pay, the consensus summary is generated using GPT-4 instead of the older models

Examples

Uses for lecturers

  • These tools are fantastic for creating assessments
    • One of my assignments this semester was created entirely by ChatGPT from 1 short prompt
    • As a follow up I asked it for a rubric and then used that rubric as-is (with acknowledgement)
    • Use them to generate word problems; they are great at that!
      • But don’t ask them to solve statistical problems
      • These tools are bad at math, statistics, and probability, as those require precision
  • They are good at translating though
    • That includes translating into LaTeX 😀
      • Example: ask it to give you a formula for standard deviation in LaTeX notation
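
As an illustration of the LaTeX example above, this is the kind of output such a request might produce. It is a sketch that assumes the sample standard deviation (with the n - 1 denominator) is wanted; the symbols s, n, x_i, and \bar{x} are the usual textbook choices, not something specified in these slides. Check any formula a chatbot returns against a trusted source before using it.

```latex
s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2}
```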

More lecturer uses

  • They are great at answering basic student questions about the work
    • Copy the question into Bing, copy the answer back to the student
    • Example: a student asks what a line of code means, so you paste the line into the chatbot and ask it to explain it
  • Use it to draft a difficult email
    • Or a motivating Blackboard post to get students excited about a topic

In summary, they are good at expanding and contracting text

But don’t give them personal information

Student uses

  • Students around the world are using these bots as cheap alternatives to tutors
  • They ask the bots to explain the notes to them
    • Summarising the parts that are too long, and expanding the parts that are too short
  • The bots can give easy introductions to topics
  • They can even quiz the student
    • Ask the bot to give you a multiple choice question on something, wait for your response, and assess your response

Student abuses

  • I’ve had students submit assignments where almost everything is just copy-pasted from ChatGPT
    • But, in my view, ChatGPT is not the problem, copy-pasting is
    • Most students used it as a starting point, and then reworked the text as they saw fit
      • Like Wikipedia, ChatGPT is a good starting point and a terrible endpoint
  • ChatGPT can hallucinate and talk nonsense
    • It can also make up fake references

Spotting issues

  • The most obvious sign is when a student just does a copy-paste:
    • If students have questions in their answers,
    • or words that are a clear response to a prompt,
    • or end with a statement leading to more information that doesn’t come
  • Another clue is if the language style does not match that of the student in general
    • ChatGPT language is generally better than that of our students
  • Turnitin has an AI detection tool that tries to pick up AI use
    • If it says 100% then you know there’s a problem
    • If it says 0% then it’s probably all good
    • Anything in between means the student probably used a little bit of AI text but didn’t rely on it (usually also fine)
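
The copy-paste clues in the first bullet above could be sketched as a simple text check. This is a minimal illustration only: the function name `copy_paste_clues` and both phrase lists are hypothetical, made up for this example, and are not a validated detector. A flagged answer still needs to be read by a person.

```python
# Hypothetical tell-tale phrases; illustrative only, not a validated list.
PROMPT_RESPONSE_PHRASES = (
    "as an ai language model",
    "certainly! here is",
    "sure, here is",
    "i hope this helps",
)
DANGLING_ENDINGS = (
    "let me know if",
    "would you like me to",
)


def copy_paste_clues(answer: str) -> list[str]:
    """Return human-readable clues found in a submitted answer."""
    clues = []
    text = answer.strip()
    lowered = text.lower()

    # Clue 1: the answer itself contains a question.
    if "?" in text:
        clues.append("contains a question")

    # Clue 2: wording that reads as a reply to a prompt.
    if any(phrase in lowered for phrase in PROMPT_RESPONSE_PHRASES):
        clues.append("reads like a response to a prompt")

    # Clue 3: the last line leads to more information that never comes.
    last_line = lowered.splitlines()[-1] if lowered else ""
    if last_line.endswith(":") or any(p in last_line for p in DANGLING_ENDINGS):
        clues.append("ends by promising more information")

    return clues
```

For example, calling `copy_paste_clues` on an answer that opens with "Certainly! Here is a summary" and closes with "Would you like me to explain further?" reports all three clues, while a plain statement reports none.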

Conclusion

Curiosity counts

  • The most valuable skill in statistics is curiosity. It is not just a personality trait, it can be learned, cultivated, and taught. Always being willing to learn new things will get you ahead more than almost anything else
  • I love learning new statistical ideas and approaches, and I really love learning new tools
    • The more tools in your (organised) toolbox the faster and better you can deal with issues
    • Learning new tools is easier than ever with AI chat being available to help you learn fast
    • But I’m careful to always remember that it is the task that matters, not the tool

The end

Thank you for your time and attention.

This presentation was created using the Reveal.js format in Quarto, using the RStudio IDE. Font and line colours follow UFS branding, and the background image is by DALL-E via Bing AI.