[HN Gopher] CodeAid: A classroom deployment of an LLM-based coding assistant
       ___________________________________________________________________
        
       CodeAid: A classroom deployment of an LLM-based coding assistant
        
       Author : jermaustin1
       Score  : 38 points
       Date   : 2024-06-07 16:02 UTC (1 day ago)
        
 (HTM) web link (austinhenley.com)
 (TXT) w3m dump (austinhenley.com)
        
       | majeedkazemi wrote:
       | I'm the lead author of this paper. Feel free to ask me anything!
       | - MK
        
         | newzisforsukas wrote:
         | What do you think about the ethical implications of using
         | unreliable agents as educators?
        
           | majeedkazemi wrote:
            | The same goes for human TAs, who are used extensively in
            | undergrad introductory programming classes. They can also be
            | unreliable in many cases.
            | 
            | 1. Provide students with the tools and knowledge to
            | critically verify responses, whether they come from an
            | educator or an AI agent.
            | 
            | 2. Build more transparent AI agents that show how reliable
            | they are on different types of queries. Our deployment
            | showed that the Help Fix Code feature was less reliable,
            | while other features performed significantly better.
           | 
           | But totally agree that we should be discussing the ethical
           | implications much more.
        
             | lispisok wrote:
              | My experience as a TA is that students definitely do not
              | have the knowledge to critically verify responses.
        
               | majeedkazemi wrote:
                | Doesn't this make AI agents better, given that human TAs
                | make mistakes a LOT, or in many cases are simply
                | unprepared (e.g., haven't done the programming assignment
                | themselves)?
                | 
                | Human TAs have egos; AI doesn't. With proper tools, you
                | should be able to steer an AI agent.
                | 
                | I think humans and AI agents both have their drawbacks
                | and benefits. That's why the last section of the paper
                | argues that we even need to teach students (or provide
                | tools) to help them decide where to use AI vs. non-AI
                | tools.
        
             | exe34 wrote:
              | I agree with this. I keep trying to instill paranoia in
              | the younger people I work with. Even if you can see that
              | the code is doing set_x(5), if it's crashing 20 lines
              | down, I want you to either print or breakpoint the code
              | here and really prove to me that x is now 5 before I look
              | any further. Sometimes set_x() might not do what you
              | think; other times there might be something stomping on it
              | between here and there. But I want to be absolutely sure:
              | I don't share your faith in the documentation, I don't
              | trust my own eyes to read the code, I just want to be 100%
              | sure.
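              | 
              | A minimal C sketch of the habit I mean (set_x and x here
              | are hypothetical stand-ins, not from any real codebase):
              | 
              |     #include <assert.h>
              |     #include <stdio.h>
              |     
              |     static int x;            /* hypothetical state */
              |     
              |     static void set_x(int v)
              |     {
              |         x = v;               /* may be less simple in real code */
              |     }
              |     
              |     int main(void)
              |     {
              |         set_x(5);
              |         /* Don't trust the call site: print (or breakpoint)
              |          * and prove the state is what you think it is
              |          * before debugging anything further down. */
              |         printf("x = %d\n", x);
              |         assert(x == 5);
              |         /* ...the 20 lines that crash would follow here... */
              |         return 0;
              |     }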
        
               | throw46365 wrote:
               | Right. So can an LLM convey that paranoia?
               | 
               | The way a formal methods lecturer explained to me his
               | concerns about the Y2K problem by talking about the
               | embedded systems in the automated medication pumps
               | treating his sick partner, and how without an MMU and
               | code that could not be inspected, there was a non-zero
               | chance that rolled-over dates would cause logging data to
               | overwrite configuration data?
               | 
               | Can an LLM convey a bit of anger and fear when talking
               | about Therac-25?
               | 
               | Even though a TA is often at a much lower teaching level
               | than this, every single person who has ever learned
               | anything has done so with the benefit of a teacher who
               | "got through to them" either on a topic or on a
               | principle.
               | 
               | It's bonkers to compare TAs and LLMs simply on their
               | error rate, when the errors TAs make are of a _totally_
               | different nature to the errors LLMs can make.
        
               | exe34 wrote:
                | Oh, my point was that somebody has to strike the fear of
                | god into them first, before they start trusting the LLM
                | blindly. I know the LLM can fake this kind of thing,
                | especially if you put in a prompt that forces "as an LLM,
                | I'm probably going to shoot you in the foot randomly",
                | but I'm sure they'll get used to ignoring it.
        
               | throw46365 wrote:
                | > Oh, my point was that somebody has to strike the fear
                | of god into them first, before they start trusting the
                | LLM blindly.
               | 
               | We agree on that :-)
        
             | throw46365 wrote:
              | > The same goes for human TAs, who are used extensively in
              | undergrad introductory programming classes. They can also
              | be unreliable in many cases.
             | 
              | Ehh. Those TAs, if they feel they might be wrong, can
              | consult the lecturer/professor. Or they can simply say
              | they're not sure.
             | 
             | IMO there is little to no comparison between a bad TA and a
             | confidently-wrong LLM (having been a TA who knew to consult
             | the professor if I felt I was not on solid ground).
             | 
              | LLMs have no experience with teaching, they have no empathy
              | for students grappling with the more challenging things,
              | and they can _gain_ no experience with teaching. Because
              | it's not about spewing out text. It's about guiding and
              | helping students with learning.
              | 
              | For example: can an LLM sympathise or empathise with a
              | cybernetics student who is grappling with the whole
              | conceptual idea of Laplace transforms? No. It can only spew
              | out text with just the same level of investment as if it
              | were writing a silly song about cats in galoshes on the
              | Moon.
             | 
             | I wish we were not in this "well humans also..."
             | justification phase.
             | 
             | It is genuinely disrespectful to actual real people and
             | it's founded on projection.
             | 
             | And in this case, it will also shut down the pipeline of
             | academic progression if TAs are no longer hired.
             | 
              | Why are we doing this to academia when the better approach
              | would be giving TAs better training in actual teaching?
              | More-senior academics doing this kind of research work is
              | absolutely riddled with moral hazard: it's not your jobs
              | that are immediately on the line.
             | 
              | ETA: sooner or later, people in the generative AI market
              | should really consider not just saying that we should _talk
              | about_ the ethical implications, but actually taking a
              | stand on them. It's not enough to produce something that
              | might cause a problem, rush it into production, and just
              | say "we might want to talk about the problems this might
              | cause". Ethics are for everyone, not just ethicists.
        
               | majeedkazemi wrote:
                | LLMs are tools. They're not everything. Yes, they can't
                | sympathize or empathize. But if they can help a student
                | be more productive and learn at the same time, then I'm
                | all in for designing them properly to be used in such
                | educational contexts... "as an additional tool."
                | 
                | We need both humans and AI. But there are problems with
                | both, which is why they can hopefully complement each
                | other: humans might have limited patience, availability,
                | etc., while AI lacks empathy and can be overconfident.
               | 
               | > Why are we doing this to academia when the better
               | approach would be giving TAs better training in actual
               | teaching?
               | 
               | Sure, that is a fantastic idea and some researchers have
               | explored it.
               | 
                | But what's wrong with doing exploratory research in a
                | real-world deployment? In the paper we describe, very
                | honestly, both where CodeAid failed and where students
                | and educators found it useful.
        
               | throw46365 wrote:
               | > We need both humans and AI.
               | 
                | Genuine question: why do we _need_ both humans and AI?
                | What's the evidence base for this statement?
               | 
               | I feel this is another thing that proponents state as if
               | it's unchallengeable fact, an all-progress-is-good thing.
               | 
               | I question this assertion. People have become all too
               | comfortable with it.
               | 
                | (Personal opinion: I don't think teaching _needs_ AI at
                | all, and if it does, a traditional simple expert system
                | with crafted answers would still be better. I think
                | there's a staggering range of opportunities for improving
                | teaching materials that don't involve LLMs, and they are
                | all being ignored because of where the hot money goes.)
        
               | majeedkazemi wrote:
                | I think my stance is pretty clear about "utilizing" AI in
                | educational settings. We absolutely don't _need_ AI the
                | same way we need air to breathe. But AI could potentially
                | provide some solutions (and create new problems or have
                | adverse effects as well), so why not explore it properly
                | to find out where it works and where it doesn't?
        
               | batch12 wrote:
                | The statement is false to begin with. We don't have AI
                | yet. Maybe when we have software that is truly
                | intelligent, we can let it teach us. Until then I see
                | this more as a buggy interactive textbook; I agree with
                | the author's description of it as a tool and disagree
                | with the idea of it as a teacher.
        
               | hombre_fatal wrote:
                | Meanwhile, almost every TA I had at uni didn't really
                | want to be there. They were there for their PhD, not as
                | a professor in training, which would have made your
                | position more understandable. And to boot, they rarely
                | spoke English very well. I had a few TAs I understood so
                | poorly that I stopped attending their labs.
                | 
                | The TA system feels like a hack where the university
                | gets free labor out of PhD students, but the undergrads
                | suffer for it. I don't think there's much to glamorize.
                | Nor do I think there's much to salvage from the days
                | when you needed to attend office hours to get help. You
                | see it as a critical human experience in uni, but I
                | don't.
                | 
                | That said, half my professors at uni also probably
                | didn't want to teach. They were there for research.
        
               | throw46365 wrote:
               | > They were there for their PhD, not as a professor in
               | training which would have made your position more
               | understandable.
               | 
                | Right. Not all TAs become professors. But to a first
                | approximation, all professors have TA experience; it's
                | generally their first experience of teaching.
               | 
                | I was paid for my time as a TA in the UK; it would be
                | illegal for them not to pay.
        
             | camdenreslink wrote:
              | The TAs in my undergraduate intro to programming class were
              | very knowledgeable and reliable, but that is a sample size
              | of 1.
        
               | tmpz22 wrote:
                | The grad student teachers and TAs in my math courses -
                | including discrete math - were at best indifferent to us
                | lesser Computer Science students and at worst under-
                | trained and contemptuous.
               | 
               | University of Oregon ~2014ish
        
             | batch12 wrote:
              | > The same goes for human TAs, who are used extensively in
              | undergrad introductory programming classes. They can also
              | be unreliable in many cases.
              | 
              | I think one difference is that human TAs can,
              | theoretically, be held accountable for their reliability,
              | whereas holding an LLM accountable is a little more
              | difficult.
        
             | jfarmer wrote:
              | TAs can be unreliable, yes, but there's at least a social
              | contract. When a person asserts something, they're assuming
              | a certain level of responsibility for the truth of that
              | assertion. If a TA is wrong, students have a range of
              | recourses, both formal and informal.
             | 
             | For example, an informal recourse could be a student saying
             | to a TA: "I've stopped trusting you because your answers
             | aren't consistent." The only immediate recourse a student
             | has with an LLM is formal: rephrasing the query.
             | 
             | The TA "should know better" and if they don't then they
             | should correct themselves. If they're wrong often enough
             | they might learn to assert things in a different, more
             | moderated way. If a TA is bad enough they could be removed
             | from the classroom.
             | 
              | If a student acts on a TA's advice, there's a kind of
              | "reliance" defense. There are several remedies if a TA
              | gives their section bad advice/information that manifests
              | in that section's exams, homework, etc. What remedies are
              | there if a student acts on an LLM's bad advice?
             | 
             | Who is responsible if the LLM is wrong? Whose behavior will
             | have to change? The LLM's behavior can't change, not
             | really, so the burden will (surely) shift from the TA to
             | the student.
             | 
             | The novice has to learn how to properly query the LLM and
             | "critically verify [its] responses", but critical judgement
             | is precisely what novices lack!
        
           | stuartjohnson12 wrote:
           | https://slatestarcodex.com/2014/08/14/beware-isolated-
           | demand...
        
             | newzisforsukas wrote:
             | That article is borderline rambling, and I don't see how it
             | applies to asking this question.
        
       | jhp123 wrote:
        | The explanation of *word = "hello" shown is completely
        | incorrect, and the memmove explanation is also incorrect.
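        | 
        | For reference, a minimal C sketch of what those two constructs
        | actually do (assuming word is a char **, as the syntax implies;
        | the surrounding code is hypothetical, not taken from the post):
        | 
        |     #include <stdio.h>
        |     #include <string.h>
        |     
        |     int main(void)
        |     {
        |         /* *word = "hello" copies no characters; it only makes
        |          * *word point at the string literal. For it to
        |          * type-check, *word must be a char *, so word itself
        |          * must be a char **. */
        |         char *s = NULL;
        |         char **word = &s;
        |         *word = "hello";
        |         printf("%s\n", s);        /* prints: hello */
        | 
        |         /* memmove is like memcpy, but is also well-defined when
        |          * the source and destination regions overlap. */
        |         char buf[] = "abcdef";
        |         memmove(buf + 1, buf, 4); /* overlapping copy */
        |         printf("%s\n", buf);      /* prints: aabcdf */
        |         return 0;
        |     }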
        
         | hombre_fatal wrote:
         | > It was developed by Majeed as a web app that uses GPT3.5 to
         | power an assortment of AI features
         | 
          | You really need GPT-4 for technical explanations, but it's
          | also much more expensive.
          | 
          | Every once in a while, ChatGPT logs me out and switches to
          | GPT-3, and I immediately notice just from the quality of the
          | answers.
        
       | eikenberry wrote:
       | > Direct code solution queries (44%) where students asked CodeAid
       | to generate the direct solution (by copying the task description
       | of their assignment).
       | 
        | Did these solutions' scores get penalized for lack of real
        | understanding? Or, to put it another way, is your class about
        | teaching programming itself or about teaching how to solve
        | problems using any tool available (including an AI that solves
        | them for you)?
        
         | majeedkazemi wrote:
          | No. Students' usage was anonymized, so the course instructors
          | did not know who used the system in what way. This was to make
          | sure that students could use the tool freely without feeling
          | like the instructors were watching their usage.
        
           | eikenberry wrote:
            | OK, cool. I hadn't realized it was designed as an
            | experiment. That would be something potential users might
            | want to consider when they read about this. Thank you for
            | clarifying.
        
       ___________________________________________________________________
       (page generated 2024-06-08 23:01 UTC)