Post ATcH5LrBI3llDO8a2K by StuartGray@mastodonapp.uk
 (DIR) More posts by StuartGray@mastodonapp.uk
 (DIR) Post #ATcEwR2ET6x2lCvXcm by TedUnderwood@sigmoid.social
       2023-03-14T19:20:08Z
       
       0 likes, 0 repeats
       
       Okay, I'm convinced. #GPT4 is better in ways that will matter for things humanists want to do. Here I gave it a (very obscure, recent) short story—except for the last few paragraphs—and asked it to speculate about the ending. ChatGPT is like "I don't know, man." GPT-4 catches that a mysterious character is wearing sandals and might be a time traveler from the Roman past. 4 on the left (black icon).
       
 (DIR) Post #ATcH5LrBI3llDO8a2K by StuartGray@mastodonapp.uk
       2023-03-14T19:44:09Z
       
       0 likes, 0 repeats
       
       @TedUnderwood Nice comparison. I had similar results with ChatGPT. I found that getting it to either extrapolate, create additional content, or suggest new alternatives was a mixed bag. Sometimes it worked, sometimes it didn't. I had the distinct impression that I was running up against deliberate restrictions each time it failed, and it usually required creative prompt editing to get the desired result. However, your example suggests that may just be "cover" for its lack of context.
       
 (DIR) Post #ATcHc3wyt3PXEeRqdM by TedUnderwood@sigmoid.social
       2023-03-14T19:50:04Z
       
       0 likes, 0 repeats
       
       @StuartGray Yeah, it's hard to separate. Idk how much RLHF GPT-4 has received. But my initial reaction is that it's less cautious.
       
 (DIR) Post #ATcHzZRrwYqHaN6aYq by StuartGray@mastodonapp.uk
       2023-03-14T19:54:18Z
       
       0 likes, 0 repeats
       
       @TedUnderwood It also raises an interesting point that I'd not really considered until now. When LLMs "fail" in their output, or are otherwise unable to produce a reliable response (say, due to lack of relevant training data points), are they being used to generate the subsequent output but with a modified/injected prompt for known fail conditions? And if so, how could we ever tell?