CodeAid: A classroom deployment of an LLM-based programming assistant
5/19/2024
The CodeAid interface. It shows the student's code, an area to ask questions, and buttons to ask the AI for specific types of help.
This post was co-written with Majeed Kazemitabaar, who led this
project. Majeed is a PhD student in CS at the University of Toronto
who has been researching the educational impact and utility of LLMs
in computing education. We summarize our recent CHI'24 paper,
"CodeAid: Evaluating a Classroom Deployment of an LLM-based
Programming Assistant that Balances Student and Educator Needs". See
the paper for more details.
---------------------------------------------------------------------
LLM-powered tools like ChatGPT can assist students who need help in programming classes by explaining code and coding concepts, generating fixed versions of incorrect code, providing examples, suggesting areas of improvement, and even writing entire code solutions.
However, the productivity-driven and direct nature of the AI's responses is concerning in educational settings. Many instructors prohibit their use in introductory programming classes to avoid academic integrity issues and student over-reliance on AI.
In this research, we explored the design and evaluation of a
"pedagogical" LLM-powered coding assistant to scale up instructional
support in educational settings.
We iteratively designed a programming assistant, CodeAid, that provides help to students without revealing code solutions. It was developed by Majeed as a web app that uses GPT-3.5 to power an assortment of AI features. We then deployed CodeAid in a 700-student C programming course as an optional resource, similar to office hours and Q&A forums, for the entire 12-week semester.
Overall, we collected data from the following sources:
* Weekly surveys about using CodeAid vs. other non-AI resources
* CodeAid's log data
* Interviews with 22 students about why and how they used CodeAid
* Final survey asking students to compare CodeAid with ChatGPT
* Interviews with 8 computing educators about CodeAid, their ethical and pedagogical considerations, and comparison with other resources
During the deployment, 372 students used CodeAid and asked 8,000 queries. We thematically analyzed 1,750 of the queries and CodeAid's responses to understand students' usage patterns and types of queries (RQ1), and CodeAid's response quality in terms of correctness, helpfulness, and directness (RQ2). Furthermore, we qualitatively analyzed the interview and survey data to understand the perspectives of students (RQ3) and educators (RQ4) about CodeAid.
CodeAid's features
CodeAid was developed with five main features that were iteratively
updated during the deployment based on student feedback:
* General Question
* Question from Code
* Help Write Code
* Explain Code
* Help Fix Code
The illustration below shows these features in action:
CodeAid allows students to ask five types of coding questions: General Question, Question from Code, Explain Code, Help Fix Code, and Help Write Code. In response, CodeAid uses LLMs to generate pedagogical answers that do not contain direct code solutions. When asked general questions or to generate code, it provides a natural language response with interactive pseudo-code that lets students hover over each line to understand what it does. Responses also include relevant function documentation, retrieved from an educator-approved database to ensure factual accuracy. When asked to help fix provided incorrect code, CodeAid does not display the fixed code. Instead, it highlights incorrect parts of the student's code with suggested fixes.
Below are some of the unique properties of CodeAid:
* Interactive Pseudo-Code: Instead of generating code, CodeAid generates interactive pseudo-code. Students can hover over each line to see a detailed explanation of that line (a minimal sketch follows this list).
* Relevant Function Documentation: Not everything needs to be AI-generated. CodeAid uses Retrieval-Augmented Generation (RAG) to display official, instructor-verified documentation for functions relevant to students' queries. This was designed to save time and to give students code examples for learning how to use those functions (a toy retrieval sketch also follows the list).
* Suggested Follow-Up Questions: CodeAid also generates several suggested follow-up questions for students to ask after each response.
* Annotating Incorrect Code: When using Help Fix Code, CodeAid does not display the fixed code. Instead, it highlights incorrect parts of the student's code with suggested fixes.
* Interactive Explain Code: Instead of just displaying a high-level explanation of the entire code in a paragraph, CodeAid renders an interactive component in which students can hover over each line of the provided code to understand its purpose and implementation.
* Stream Rendering of Interactive Components: CodeAid renders interactive components while the response is being streamed from the LLM, enabling a more responsive experience with less delay (see the first sketch below).
Results
From our 12-week deployment, surveys, and interviews, we aim to
answer our four research questions.
RQ1: Students' Usage Patterns and Types of Queries
First, let's look into the high-level statistics of students' usage
of CodeAid:
* Of the 300 students who consented to the use of their data, 160 used CodeAid fewer than 10 times, whereas 62 used it more than 30 times.
* On average, women used CodeAid significantly more frequently than men (33.8 vs. 18.4 queries) while representing only 30% of the entire class.
A chart showing daily usage of CodeAid over time. There are spikes at
each assignment and exam due date. Peak usage was 400 questions asked
by 50 users in one day.
The thematic analysis revealed four types of queries submitted to CodeAid:
1. Asking Programming Questions (36%)
+ Code and conceptual clarification queries (70%) about the
programming language, its syntax, its memory management, and
operations.
+ Function-specific queries (15%) about the behavior,
arguments, and return types of specific functions.
+ Code execution probe queries (15%) in which students used
CodeAid similar to a compiler to verify execution or evaluate
the output of their code on particular inputs.
2. Debugging Code (32%)
+ Buggy code resolution queries (68%) that focused on fixing
their incorrect code based on a provided behavior.
+ Problem source identification queries (23%) in which students
asked CodeAid to identify the source of the errors in their
code.
+ Error message interpretation queries (9%) asking CodeAid to better explain an error the student was receiving.
3. Writing Code (24%)
+ High-level coding guidance queries (56%) in which students asked "how-to" questions about a specific coding task.
+ Direct code solution queries (44%) where students asked CodeAid to generate the direct solution (by copying the task description of their assignment).
4. Explaining Code (6%): like explaining the starter code provided
in their assignments.
RQ2: CodeAid's Response Quality
The thematic analysis showed that about 80% of the responses were technically correct. The General Question, Explain Code, and Help Write Code features responded correctly about 90% of the time, while Help Fix Code and Question from Code were correct about 60% of the time.
In terms of not revealing direct solutions, CodeAid almost never
revealed direct code. Instead, it generated:
* Natural language responses (43%)
* High-level response with pseudo-code of generic example code
(16%)
* Pseudo-code of a specifically requested task (6%)
* Suggested fixes for minor syntax errors (16%)
* Suggested fixes for semantic issues (8%)
RQ3: Students' Perspectives and Concerns
Based on the student interviews and surveys:
* Students appreciated CodeAid's 24/7 availability and being "a
private space to ask questions without being judged".
* Students also liked CodeAid's contextual assistance which
provided a faster way to access relevant knowledge, allowed
students to phrase questions however they wanted, and produced
responses that were relevant to their class.
* In terms of the directness of responses, some students indicated that they wanted CodeAid to produce less direct responses, like hints. Interestingly, some students regulated themselves, avoiding the features that produced more direct responses.
* In terms of trust, some students trusted CodeAid while others found that "it can lie to you and still sound confident." Some students trusted CodeAid simply because it was part of the course and the instructor endorsed it.
* When asked about their reasons for not using CodeAid, students mentioned a lack of need, a preference for existing tools, wanting to solve problems by themselves, or a lack of trust in AI.
* Comparing CodeAid with ChatGPT: even though using ChatGPT was prohibited, students reported using it slightly more than CodeAid. They preferred its easier interface and larger context window for asking about longer code snippets. However, some students did not like ChatGPT since it did too much of the work for them.
RQ4: Educators' Perspectives and Concerns
* Overall, most educators liked the integration of pseudo-code with line-by-line explanations, since it provides structure, reduces cognitive load, and is preferable to the exact code solutions readily found on the internet. However, some educators were concerned that when the algorithm itself is what students need to learn, revealing its pseudo-code would harm their learning.
* Most educators wanted to keep students away from ChatGPT. They would rather encourage students to use CodeAid instead, and even suggested integrating code editors with code execution right inside CodeAid.
* Educators wanted the ability to customize CodeAid with course topics, or to activate/deactivate the display of pseudo-code.
* Lastly, most educators wanted a monitoring dashboard to see a summary of asked questions and generated responses, as well as to reflect on their own instruction. Other educators, however, cautioned that students should not feel like they are being watched.
Design considerations for future educational AI assistants
We synthesized our findings into four major design considerations for
future educational AI assistants, positioned within four main stages
of a student's help-seeking process.
A chart of the four design considerations with additional trade-offs.
* Exploiting AI's Unique Benefits: for deciding between AI and non-AI assistance (like debuggers and documentation).
* Designing the AI Querying Interface: once the user decides to use AI assistance, how should students formulate questions and provide context? Particularly in terms of problem identification, query formulation, and context provision.
* Balancing the Directness of AI Responses: once the user asks a question, how direct should the AI's response be?
* Supporting Trust, Transparency, and Control: upon receiving a response, how can students evaluate it and, if necessary, steer the AI towards a better response?
---------------------------------------------------------------------
There is a long way to go before we understand how best to use AI in the classroom to support both instructors and students. Maybe one day it will provide just the right information at just the right time to keep students optimally engaged and learning, while identifying opportunities for the instructor to intervene.
Special thanks to the other co-authors of this work: Runlong Ye,
Xiaoning Wang, Paul Denny, Michelle Craig, and Tovi Grossman.
CodeAid is open source. The full details of the design and evaluation
are in our paper, CodeAid: Evaluating a Classroom Deployment of an
LLM-based Programming Assistant that Balances Student and Educator
Needs. You might also be interested in:
* Learning to code with and without AI
* Exploring 50 user interfaces for AI code suggestions
* The pain points of building a copilot
* The pain points of teaching computer science