[HN Gopher] RWKV RNN: Better than ChatGPT?
___________________________________________________________________
RWKV RNN: Better than ChatGPT?
Author : pffft8888
Score : 72 points
Date : 2023-03-23 20:45 UTC (2 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| jacobn wrote:
| From the project page: pronounced as "RwaKuv"
|
| That is still quite challenging to pronounce, maybe one of "rwkv"
| -> "raw-kv" -> "rawk-v" -> "rock-v"?
| dragonwriter wrote:
| "RwaKuv" seems like it would pretty closely match "Rock of"
| tjr wrote:
| Rocky V?
| adeon wrote:
| I've followed updates on this project on r/machinelearning, and
| for me the existence of projects like this is good evidence that
| the OpenAI moat is not that strong. It gives some hope that you
| are not going to need massive computers and GPUs to run decent
| language models.
|
| I hope this project will thrive.
| serverholic wrote:
| [dead]
| GaggiX wrote:
| The best thing about this model is that it has O(T) time and
| O(1) memory during inference, versus the O(T^2) time and O(T)
| memory (with FlashAttention) of a GPT model, yet it can still
| be trained in parallel like a GPT model.
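|
| A minimal sketch (Python/NumPy; a simplified stand-in, not the
| exact RWKV formulation, and the decay value is made up) of why
| the inference state is O(1): each step folds the new token into
| two fixed-size accumulators instead of appending to a growing
| KV cache.
|
|     import numpy as np
|
|     def rwkv_like_step(state, k_t, v_t, w):
|         """One token of a simplified WKV-style recurrence.
|         `state` is (num, den), both shape (d,): constant size
|         no matter how many tokens came before."""
|         num, den = state
|         num = np.exp(-w) * num + np.exp(k_t) * v_t  # decayed weighted value sum
|         den = np.exp(-w) * den + np.exp(k_t)        # decayed weight sum
|         out = num / (den + 1e-8)                    # attention-like readout
|         return out, (num, den)
|
|     d = 8
|     state = (np.zeros(d), np.zeros(d))
|     w = 0.5 * np.ones(d)       # per-channel decay (assumed value)
|     for t in range(1000):      # T steps: O(T) time, O(1) memory
|         k_t, v_t = np.random.randn(d), np.random.randn(d)
|         out, state = rwkv_like_step(state, k_t, v_t, w)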
| pffft8888 wrote:
| In addition,
|
| 1) it's open source.
|
| 2) you can run it yourself, so the rug won't be pulled out from
| under you when they decide to shut it down and move users on to
| the next version or another product, as they did with the older
| text-davinci models.
|
| 3) you get to align it yourself (using RLHF), as opposed to a
| corporation dictating what is "aligned" and what is "safe."
|
| 4) you won't have to deal with government-led censorship. For
| example, instead of the FBI using JIRA to manage a list of URLs
| to be censored (as they did, according to the latest
| revelations), they can train the AI to self-censor, as Bing has
| done.
|
| 5) you won't be using the product of a company that was started
| as a non-profit with a $100M donation (from Elon Musk) to
| promote transparent AI, only to take that money, turn into a
| for-profit company, and close-source the AI.
| pffft8888 wrote:
| What test cases do folks here recommend for measuring this new
| model's ability to reason? Specifically, can it reason about
| code with performance similar to (or better than!) GPT-4? Has
| anyone managed to get it running locally?
| gooseus wrote:
| OpenAI has been collecting a ton of evals here
| https://github.com/openai/evals with many of them including
| some comments about how well GPT-4 does vs GPT-3.5.
|
| You could clone that repo, adapt the oaieval script to run
| against different APIs, then run the evals against both and
| compare the results.
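|
| A minimal sketch of that comparison (hypothetical names
| throughout: `run_rwkv`, `run_openai`, and `samples.jsonl` are
| placeholders, with the JSONL format simplified from the evals
| repo's {"input": ..., "ideal": ...} samples):
|
|     import json
|
|     # Hypothetical completion functions -- replace the bodies
|     # with a real local RWKV call and a real OpenAI API call.
|     def run_rwkv(prompt: str) -> str:
|         return "stub answer"
|
|     def run_openai(prompt: str) -> str:
|         return "stub answer"
|
|     def exact_match_accuracy(completion_fn, samples):
|         """Fraction of samples whose completion equals the ideal."""
|         hits = sum(completion_fn(s["input"]).strip() == s["ideal"]
|                    for s in samples)
|         return hits / len(samples)
|
|     with open("samples.jsonl") as f:  # one JSON object per line
|         samples = [json.loads(line) for line in f]
|
|     for name, fn in [("rwkv", run_rwkv), ("openai", run_openai)]:
|         print(name, exact_match_accuracy(fn, samples))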
| macrolocal wrote:
| The author claims 61.0% on WinoGrande vis-a-vis GPT-4's 87.5%.
| pffft8888 wrote:
| "you can fine-tune RWKV into a non-parallelizable RNN (then
| you can use outputs of later layers of the previous token) if
| you want extra performance."
|
| Is that 61% using the non-parallelizable RNN mode or the
| standard mode? I wonder if it's the latter.
|
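| A toy illustration (not RWKV's actual layer math; `layer` here
| is just a stand-in mixing function) of the difference the quote
| is describing: in the trainable mode, each layer depends only
| on the layer below it, so all T tokens can be processed at once
| per layer; in the RNN-only mode, a token may also read what the
| top layer produced for the previous token, which forces
| strictly token-by-token execution.
|
|     import numpy as np
|
|     n_layers, T, d = 4, 16, 8
|     W = [np.random.randn(d, d) / np.sqrt(d) for _ in range(n_layers)]
|     emb = np.random.randn(T, d)
|
|     def layer(l, x_in, extra=None):
|         h = x_in if extra is None else x_in + extra
|         return np.tanh(h @ W[l])
|
|     # Trainable pattern: vectorized over all T tokens per layer.
|     x = emb
|     for l in range(n_layers):
|         x = layer(l, x)
|
|     # RNN-only pattern: token t also consumes the top layer's
|     # output for token t-1, so tokens must be computed in order.
|     top_prev = np.zeros(d)
|     for t in range(T):
|         h = emb[t]
|         for l in range(n_layers):
|             h = layer(l, h, extra=top_prev)
|         top_prev = h
|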
| This new model may be a viable alternative to ChatGPT, which
| is not only closed source but can be shut down in the future,
| just as happened with the older text-davinci models.
|
| Plus, the alignment and safety work has rendered ChatGPT
| useless for helping with areas such as critical analysis of
| social issues, and with any critical thinking that goes against
| the aligned views of those who own and program ChatGPT. This
| could be a viable free (as in freedom) alternative.
| macrolocal wrote:
| I think the Cambrian explosion is just beginning.
| MaxikCZ wrote:
| I can't seem to find it in the GitHub repo. Do you know the
| value for ChatGPT before it switched to GPT-4?
| macrolocal wrote:
| Here are a few benchmarks:
|
| https://paperswithcode.com/sota/common-sense-reasoning-on-wi...
| pffft8888 wrote:
| Imagine having ChatGPT-level AI running on an ASIC inside
| earphones. It could be like an always-on buddy, available
| offline and able to access resources when you're connected.
|
| Or in Google Glass. The README states that it's more
| ASIC-friendly than the transformer architecture used by
| ChatGPT.
___________________________________________________________________
(page generated 2023-03-23 23:00 UTC)