 (DIR) Post #AxVzlD3UKmAZlKaaMS by tao@mathstodon.xyz
       2025-08-25T01:30:01Z
       
       0 likes, 0 repeats
       
       Developers of AI tools often focus on creating a useful "success mode" for their tool, where the tool accomplishes the task that the user requested.  But having a clearly identified "failure mode" is equally important, and does not always receive a corresponding amount of attention.
       
       To give an analogy: if one is looking for some piece of information (e.g., an obscure quote from a film whose name one has forgotten), one can type some half-remembered version of the quote into a search engine and see what happens.  If one is lucky, one activates the "success mode" where a useful hit is returned.  Or, one can hit a clear "failure mode" in which some message along the lines of "no relevant matches found" is shown, and one can either switch to a different way of searching for the information, or abandon the search entirely.  However, there are also more ambiguous "intermediate modes" in which the search engine returns some hits that partially match one's query, but do not seem to be exactly what one wants.  This can lead to a somewhat frustrating and time-wasting exercise of investigating false leads and continuing fruitless queries (which could happen, for instance, if one had misremembered a key detail of the information one is searching for, and misconfigured the search queries accordingly).  This is often a significantly worse user experience than if the search engine had simply displayed a "failure mode" immediately. (1/3)
       
 (DIR) Post #AxVzlEMfSx6bp6ZPtY by tao@mathstodon.xyz
       2025-08-25T01:30:14Z
       
       0 likes, 0 repeats
       
       In contrast, AI chatbots are usually tuned to avoid a "failure mode" as much as possible, at the expense of increasing the occurrence of "intermediate modes" where the chatbot response looks potentially useful, and invites further interaction from the user, but is not exactly providing what the user wants, and could contain hallucinations or some fundamental misunderstanding of the task that would take significant effort to uncover.  Paradoxically, such tools may become significantly more useful if they simply reported that they were unable to provide a high-quality answer to a query in such cases.
       
       A comparison may be drawn with the increasingly advanced, but stringently verified, "tactics" used in a modern proof assistant such as Lean.  I have been experimenting recently with the new tactic `grind` in Lean, which is a powerful tool (inspired more by "good old-fashioned AI", such as satisfiability modulo theories (SMT) solvers, than by modern data-driven AI) to try to close complex proof goals if all the tools needed to do so are already provided in the proof environment; roughly speaking, this corresponds to proofs that can be obtained by "expanding everything out and trying all obvious combinations of the hypotheses".  When I apply `grind` to a given subgoal, it can report a success within seconds, closing that subgoal in a Lean-verified fashion and allowing me to move on to the next subgoal.  But, importantly, when this does not work, I quickly get a "`grind` failed" message, in which case I simply delete `grind` from the code and proceed by a more pedestrian sequence of lower-level tactics.  (2/3)
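       A minimal sketch of the workflow described above (a hypothetical example, assuming a recent Lean 4 version in which the `grind` tactic is available):
       
       ```lean
       -- A goal that follows by chaining the hypotheses together;
       -- `grind` can typically close this kind of equational goal
       -- in one step, reporting success quickly.
       example (a b c : Nat) (h1 : a = b) (h2 : b = c) : a = c := by
         grind
       
       -- When `grind` cannot close a goal, it fails promptly with
       -- an error, and one can fall back on a more pedestrian
       -- sequence of lower-level tactics, e.g.:
       example (a b c : Nat) (h1 : a = b) (h2 : b = c) : a = c := by
         calc a = b := h1
           _ = c := h2
       ```
       
       The point of the sketch is the second scenario: the failure is immediate and unambiguous, so the user loses essentially no time before switching approaches.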
       
 (DIR) Post #AxVzlFYl1UN1WtEaNU by tao@mathstodon.xyz
       2025-08-25T01:30:30Z
       
       0 likes, 0 repeats
       
       While `grind` does have some additional tuneable parameters and options that I could tweak to try to coax it to work, I am finding that the optimal way for me to use this tool is not to explore the "intermediate modes" too much by repeatedly reconfiguring the tool: if I cannot use `grind` to close the goal after one or two such attempts, I simply move on.  I still find the tool to be a non-trivial time-saver in many contexts, but in some circumstances the way it saves time is by promptly signaling that it is not the appropriate tool for the task at hand.
       
       Are there any prominent examples of chatbots that are tuned to report failure modes instead of intermediate modes when the confidence level of their answer is low?  I hypothesize that these may actually be a more useful and less frustrating type of assistant to incorporate into complex workflows, despite being seemingly less "powerful".  (3/3)
       
 (DIR) Post #AxVzlGaZCFPain5Xpw by mdcory@mastodon.social
       2025-08-25T01:34:32Z
       
       1 like, 0 repeats
       
       @tao Here is the appropriate film scene and quote. https://www.youtube.com/watch?v=MpmGXeAtWUw
       
 (DIR) Post #AxVzswvcB54y3ub4jI by highergeometer@mathstodon.xyz
       2025-08-25T02:50:38Z
       
       0 likes, 0 repeats
       
       @tao "invites further interaction from the user," the cynic in me says this is (consciously or unconsciously) by design, because it increases the tokens used and hence the spend. If an LLM just said "I couldn't find such a quote", then someone who is not yet satisfied will leave and try a different tool.
       
 (DIR) Post #AxVzsxs6fbrozJxmts by shironeko@fedi.tesaguri.club
       2025-08-25T03:06:10Z
       
       0 likes, 0 repeats
       
       @highergeometer @tao This is probably right; even before LLMs, Google search had gone down this path of using lower-quality results to boost ad revenue.