fsebugoutzone.org:9999

       Post ATcH5LrBI3llDO8a2K by StuartGray@mastodonapp.uk
 (DIR) More posts by StuartGray@mastodonapp.uk
 (DIR) Post #ATcEwR2ET6x2lCvXcm by TedUnderwood@sigmoid.social
       2023-03-14T19:20:08Z
       
       0 likes, 0 repeats
       
       Okay, I'm convinced. #GPT4 is better in ways that will matter for things humanists want to do. Here I gave it a (very obscure, recent) short story, except for the last few paragraphs, and asked it to speculate about the ending. ChatGPT is like "I don't know, man." GPT-4 catches that a mysterious character is wearing sandals and might be a time traveler from the Roman past. 4 on the left (black icon).
       
 (DIR) Post #ATcH5LrBI3llDO8a2K by StuartGray@mastodonapp.uk
       2023-03-14T19:44:09Z
       
       0 likes, 0 repeats
       
       @TedUnderwood Nice comparison. I had similar results with ChatGPT. I found that getting it to either extrapolate, create additional content, or suggest new alternatives was a mixed bag. Sometimes it worked, sometimes it didn't. I had the distinct impression that I was running up against deliberate restrictions each time it failed, and it usually required creative prompt editing to get the desired result. However, your example suggests that may just be "cover" for its lack of context.
       
 (DIR) Post #ATcHc3wyt3PXEeRqdM by TedUnderwood@sigmoid.social
       2023-03-14T19:50:04Z
       
       0 likes, 0 repeats
       
       @StuartGray Yeah, it's hard to separate. Idk how much RLHF GPT-4 has received. But my initial reaction is that it's less cautious.
       
 (DIR) Post #ATcHzZRrwYqHaN6aYq by StuartGray@mastodonapp.uk
       2023-03-14T19:54:18Z
       
       0 likes, 0 repeats
       
       @TedUnderwood It also raises an interesting point that I'd not really considered until now. When LLMs "fail" in their output, or are otherwise unable to produce a reliable response (say, due to lack of relevant training data points), are they being used to generate the subsequent output but with a modified/injected prompt for known fail conditions? And if so, how could we ever tell?