[HN Gopher] CodeAid: A classroom deployment of an LLM-based codi...
___________________________________________________________________
CodeAid: A classroom deployment of an LLM-based coding assistant
Author : jermaustin1
Score : 38 points
Date   : 2024-06-07 16:02 UTC (1 day ago)
(HTM) web link (austinhenley.com)
(TXT) w3m dump (austinhenley.com)
| majeedkazemi wrote:
| I'm the lead author of this paper. Feel free to ask me anything!
| - MK
| newzisforsukas wrote:
| What do you think about the ethical implications of using
| unreliable agents as educators?
| majeedkazemi wrote:
| The same goes for human TAs, who are used extensively in
| undergrad introductory programming classes. They can also be
| unreliable in many cases.
|
| 1. Provide students with the tools and knowledge to
| critically verify responses, whether they come from an
| educator or an AI agent. 2. Build more transparent AI agents
| that show how reliable they are on different types of
| queries. Our deployment showed that the Help Fix Code feature
| was less reliable, while the other features were
| significantly better.
|
| But totally agree that we should be discussing the ethical
| implications much more.
| lispisok wrote:
| My experience as a TA is that students definitely do not
| have the knowledge to critically verify responses.
| majeedkazemi wrote:
| Doesn't this make AI agents better, given that human TAs
| make mistakes a LOT, or in many cases are just unprepared
| (e.g. they haven't done the programming assignment
| themselves)?
|
| Human TAs have ego; AI doesn't. With proper tools, you
| should be able to steer an AI agent.
|
| I think humans and AI agents both have their drawbacks
| and benefits. That's why the last section of the paper
| discusses that we even need to teach students (or provide
| tools) to help them decide where to use AI vs non-AI
| tools.
| exe34 wrote:
| I agree with this. I keep trying to instill paranoia in the
| younger people I work with. Even if you can see that the
| code is doing set_x(5), if it's crashing 20 lines down, I
| want you to either print or breakpoint the code here and
| really prove to me that x is now 5, before I look any
| further. Sometimes set_x() might not do what you think.
| Other times there might be something stomping on it between
| here and there. But I want to be absolutely sure: I don't
| share your faith in the documentation, I don't trust my own
| eyes to read the code, I just want to be 100% sure.
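|
| A minimal sketch of that habit (get_x and the surrounding
| code are illustrative additions; only set_x(5) comes from
| the example above):
|
|     /* Prove the assumption right before the code that
|        later crashes, instead of trusting the call site. */
|     #include <assert.h>
|     #include <stdio.h>
|
|     static int x;
|     void set_x(int v) { x = v; }
|     int  get_x(void)  { return x; }
|
|     int main(void) {
|         set_x(5);
|         /* ... other code that might stomp on x ... */
|         assert(get_x() == 5);  /* fail loudly if it isn't 5 */
|         printf("x really is %d here\n", get_x());
|         return 0;
|     }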
| throw46365 wrote:
| Right. So can an LLM convey that paranoia?
|
| The way a formal methods lecturer explained to me his
| concerns about the Y2K problem by talking about the
| embedded systems in the automated medication pumps
| treating his sick partner, and how without an MMU and
| code that could not be inspected, there was a non-zero
| chance that rolled-over dates would cause logging data to
| overwrite configuration data?
|
| Can an LLM convey a bit of anger and fear when talking
| about Therac-25?
|
| Even though a TA is often at a much lower teaching level
| than this, every single person who has ever learned
| anything has done so with the benefit of a teacher who
| "got through to them" either on a topic or on a
| principle.
|
| It's bonkers to compare TAs and LLMs simply on their
| error rate, when the errors TAs make are of a _totally_
| different nature to the errors LLMs can make.
| exe34 wrote:
| Oh, my point was that somebody has to strike the fear of
| god into them before they start trusting the LLM blindly.
| I know the LLM can fake this kind of thing, especially if
| you add a prompt that forces "as an LLM, I'm probably
| going to shoot you in the foot randomly", but I'm sure
| they'll get used to ignoring it.
| throw46365 wrote:
| > Oh, my point was that somebody has to strike the fear of
| god into them before they start trusting the LLM blindly.
|
| We agree on that :-)
| throw46365 wrote:
| > The same goes with human TAs that are extensively used in
| undergrad introductory programming classes. They can also
| be unreliable in many cases.
|
| Ehh. Those TAs, if they feel they might be wrong, can
| consult the lecturer/professor. And if they are unsure,
| they can just say so.
|
| IMO there is little to no comparison between a bad TA and a
| confidently-wrong LLM (having been a TA who knew to consult
| the professor if I felt I was not on solid ground).
|
| LLMs have no experience with teaching, they have no empathy
| for students grappling with the more challenging things,
| and they can _gain_ no experience with teaching. Because
| it's not about spewing out text. It's about guiding and
| helping students with learning.
|
| For example: can an LLM sympathise or empathise with a
| cybernetics student who is grappling with the whole
| conceptual idea of Laplace transforms? No. It can only spew
| out text with just the same level of investment as if it
| was writing a silly song about cats in galoshes on the
| Moon.
|
| I wish we were not in this "well humans also..."
| justification phase.
|
| It is genuinely disrespectful to actual real people and
| it's founded on projection.
|
| And in this case, it will also shut down the pipeline of
| academic progression if TAs are no longer hired.
|
| Why are we doing this to academia when the better approach
| would be giving TAs better training in actual teaching?
| More-senior academics doing this kind of research work is
| absolutely riddled with moral hazard: it's not your jobs
| that are immediately on the line.
|
| ETA: sooner or later, people in the generative AI market
| should really consider not just saying that we should _talk
| about_ the ethical implications, but actually taking a
| stand on them. It's not enough to produce something that
| might cause a problem, rush it into production and just say
| "we might want to talk about the problems this might
| cause". Ethics are for everyone, not just ethicists.
| majeedkazemi wrote:
| LLMs are tools. They're not everything. Yes, they can't
| sympathize or empathize. But if they can help a student
| be more productive and learn at the same time, then I'm
| all in for designing them properly to be used in such
| educational contexts... "as an additional tool."
|
| We need both humans and AI. But there are problems with
| both, which is why they can hopefully complement each
| other. Humans might have limited patience, availability,
| etc., and AI lacks empathy and can be over-confident.
|
| > Why are we doing this to academia when the better
| approach would be giving TAs better training in actual
| teaching?
|
| Sure, that is a fantastic idea and some researchers have
| explored it.
|
| But what's wrong with doing exploratory research in a
| real-world deployment? In the paper we describe, very
| honestly, both where CodeAid failed and where students
| and educators found it useful.
| throw46365 wrote:
| > We need both humans and AI.
|
| Genuine question: Why do we _need_ both humans and AI?
| What's the evidence base for this statement?
|
| I feel this is another thing that proponents state as if
| it's unchallengeable fact, an all-progress-is-good thing.
|
| I question this assertion. People have become all too
| comfortable with it.
|
| (Personal opinion: I don't think teaching _needs_ AI at
| all, and if it does, a traditional simple expert system
| with crafted answers would still be better. I think
| there's a staggering range of opportunities for improving
| teaching materials that don't involve LLMs, and they are
| all being ignored because of where the hot money goes.)
| majeedkazemi wrote:
| I think my stance is pretty clear about "utilizing" AI in
| educational settings. We absolutely don't _need_ AI the
| same way we need air to breathe. But AI could potentially
| provide some solutions (and create new problems or have
| adverse effects as well), so why not explore it properly
| to find out where it works and where it doesn't?
| batch12 wrote:
| The statement is false to begin with. We don't have AI
| yet. Maybe when we have software that is truly
| intelligent, we can let it teach us. Until then, I see
| this more as a buggy interactive textbook: I agree with
| the author's description of it as a tool and disagree
| with the idea of it as a teacher.
| hombre_fatal wrote:
| Meanwhile, almost every TA I had at uni didn't really
| want to be there. They were there for their PhD, not as a
| professor in training, which would have made your
| position more understandable. And to boot, they rarely
| spoke English very well. I had a few TAs I understood so
| poorly that I stopped attending their labs.
|
| The TA system feels like a hack where the university gets
| free labor out of PhD students, but the undergrads suffer
| for it. I don't think there's much to glamorize. Nor do I
| think there's much to salvage from the days when you
| needed to attend office hours to get help. You see it as
| this critical human experience at uni, but I don't.
|
| That said, half my professors at uni also probably didn't
| want to teach. They were there for research.
| throw46365 wrote:
| > They were there for their PhD, not as a professor in
| training which would have made your position more
| understandable.
|
| Right. Not all TAs become professors. But to a first
| approximation, all professors have TA experience; it's
| generally their first experience of teaching.
|
| I was paid for my time as a TA, in the UK. It would be
| illegal for them not to pay.
| camdenreslink wrote:
| The TAs in my undergraduate intro to programming class were
| very knowledgeable and reliable, but that is a sample size
| of 1.
| tmpz22 wrote:
| The grad student teachers and TAs in my math courses,
| including discrete math, were at best ambivalent toward
| us lesser Computer Science students and at worst under-
| trained and contemptuous.
|
| University of Oregon ~2014ish
| batch12 wrote:
| > The same goes with human TAs that are extensively used in
| undergrad introductory programming classes. They can also
| be unreliable in many cases
|
| I think one difference is that human TAs can,
| theoretically, be held accountable for their reliability,
| whereas holding an LLM accountable is a little more
| difficult.
| jfarmer wrote:
| TAs can be unreliable, yes, but there's at least a social
| contract. When a person asserts something they're assuming
| a certain level of responsibility for the truth of that
| assertion. If a TA is wrong, students have a range of
| recourses, both formal and informal.
|
| For example, an informal recourse could be a student saying
| to a TA: "I've stopped trusting you because your answers
| aren't consistent." The only immediate recourse a student
| has with an LLM is formal: rephrasing the query.
|
| The TA "should know better" and if they don't then they
| should correct themselves. If they're wrong often enough
| they might learn to assert things in a different, more
| moderated way. If a TA is bad enough they could be removed
| from the classroom.
|
| If a student acts on a TA's advice there's a kind of
| "reliance" defense. There are several remedies if a TA
| gives their section bad advice/information which manifests
| on that section's exams, homework, etc. What remedies are
| there if a student acts on an LLM's bad advice?
|
| Who is responsible if the LLM is wrong? Whose behavior will
| have to change? The LLM's behavior can't change, not
| really, so the burden will (surely) shift from the TA to
| the student.
|
| The novice has to learn how to properly query the LLM and
| "critically verify [its] responses", but critical judgement
| is precisely what novices lack!
| stuartjohnson12 wrote:
| https://slatestarcodex.com/2014/08/14/beware-isolated-demand...
| newzisforsukas wrote:
| That article is borderline rambling, and I don't see how it
| applies to asking this question.
| jhp123 wrote:
| The explanation of *word = "hello" shown is completely
| incorrect, and the memmove explanation is also incorrect.
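|
| For reference, and assuming word was declared as a char **
| (the figure's actual declaration isn't shown here), this is
| what *word = "hello" actually does in C:
|
|     #include <stdio.h>
|
|     int main(void) {
|         char *s = "initial";
|         char **word = &s;
|
|         *word = "hello";   /* repoints s at the string
|                               literal "hello"; no characters
|                               are copied or moved */
|
|         printf("%s\n", s); /* prints "hello" */
|         return 0;
|     }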
| hombre_fatal wrote:
| > It was developed by Majeed as a web app that uses GPT3.5 to
| power an assortment of AI features
|
| You really need GPT-4 for technical explanations, but it's
| also much more expensive.
|
| Every once in a while, ChatGPT logs me out and switches to
| GPT-3, and I immediately notice just from the quality of
| the answers.
| eikenberry wrote:
| > Direct code solution queries (44%) where students asked CodeAid
| to generate the direct solution (by copying the task description
| of their assignment).
|
| Did these solutions' scores get penalized for lack of real
| understanding? Or, to put it another way, is your class about
| teaching programming itself or about teaching how to solve
| problems using any tool available (including an AI that solves
| it for you)?
| majeedkazemi wrote:
| No. Students' usage was anonymized, so the course instructors
| did not know who used the system in what way. This was to make
| sure that students could use the tool freely without feeling
| like the instructors were watching their usage.
| eikenberry wrote:
| OK, cool. I hadn't realized it was designed as an experiment.
| So that would be more something for potential users to
| consider when they read about this. Thank you for
| clarifying.
___________________________________________________________________
(page generated 2024-06-08 23:01 UTC)