[HN Gopher] MiniGPT-4
___________________________________________________________________
MiniGPT-4
Author : GaggiX
Score : 790 points
Date : 2023-04-17 08:56 UTC (14 hours ago)
(HTM) web link (minigpt-4.github.io)
(TXT) w3m dump (minigpt-4.github.io)
| macrolime wrote:
| A modified BLIP2 that works with diffusion models rather than
| vision transformers would be pretty cool. Using Vicuna-13B or
| another large language model as the language model of Stable
| Diffusion, instead of the text encoder of CLIP, would be a game
| changer: it would completely change how you prompt Stable
| Diffusion, from something kinda like a tag list to being able
| to follow instructions in plain English.
| jokoon wrote:
| Are there more university/research-centric resources or
| communities regarding AI, I mean things on a more epistemic and
| higher level, that disregard current ML techniques and
| implementations?
|
| I have no problem when there is a lot of money spent to implement
| things that can be used and tested, but I wish I could read about
| the next step of AI, something that involves science and not
| software engineering.
|
| I've yet to read anything related to the analysis of trained data
| or trained neural networks, the structure of the brains of small
| insects, psychology on general intelligence, AI and emotions, the
| real ability of an AI to think and write meaningful code or
| understand what is a problem/solution etc, I mean anything that
| is related to AI that can be treated by computer science.
|
| I have no problems with algorithms and implementations, but don't
| university/professors also try to go beyond, with epistemology,
| to connect the dots with biology and neurology?
|
| Unless I see science cover several fields of research, it's not
| real, worthy research in my view, which is why I'm so stubbornly
| skeptical of ChatGPT.
| thewarrior wrote:
| We've tried, but we understand even less about how "real"
| intelligence works. At least in a way that we can replicate it.
|
| The people doing so-called alchemy are making far better
| systems than anyone who's trying to actually understand things.
| And we've been trying for decades now.
| visarga wrote:
| If the engineering part is of secondary importance for you,
| then at least keep the dataset in mind. It is in the
| composition and quality of the training data that all the
| skills find their origin.
|
| Most of our discussions are about model size, but few are about
| the dataset. Yet all the scaling laws hint at the great usefulness
| of more data. Sometimes even little data can have a great
| impact in the fine-tuning phase. In the end it is the training
| data that transforms a random init into the model.
| Tepix wrote:
| I think it's poor form that they are taking the GPT-4 name for an
| unrelated project. After all, the underlying Vicuna is merely a
| fine-tuned LLaMA. Plus they use the smaller 13B version.
|
| The results look interesting, however.
|
| Here's hoping that they'll add GPTQ 4-bit quantization so the 65B
| version of the model can be run on 2x 3090.
| seydor wrote:
| The way things are going, very soon GPT-4 will not be something
| to aspire to be
| twelvechairs wrote:
| It's poor form to be calling it 'Open' AI. So I guess it's swings
| and roundabouts. Everyone is leeching where they can.
| iandanforth wrote:
| To be fair they were open when that name was picked and it
| looks like they may be trying to transition to just 'ai.com'.
| teacpde wrote:
| Wow, that must be an expensive domain name.
| was_a_dev wrote:
| I'm sure they can afford it
|
| But justai.com would also be apt
| philosopher1234 wrote:
| They already did afford it. It redirects to ChatGPT.
| xiphias2 wrote:
| wow, how come it's not known?
| sebzim4500 wrote:
| What about this url redirect is interesting enough to you
| that you think it should be widely known?
| alsodumb wrote:
| It's pretty clear that they're doing it for citation farming.
| I've seen another paper from KAUST that puts a fancy title on a
| trivial paper.
| collaborative wrote:
| Adding GPT4 to anything now increases marketing x4. So much AI
| news is coming out lately that not adding it risks drowning in a
| sea of info... even in the case of a good project.
| Uehreka wrote:
| The word for this is "trademark infringement". You are
| specifically not allowed to capitalize on the marketing of
| another entity's product to bolster yours by implying through
| your name that you are somehow related. This is why "DALL-E
| Mini" had to change their name to craiyon.
| HarHarVeryFunny wrote:
| It's also just (deliberately) misleading. It's based on the
| 13B Vicuna/Llama model, not 175B GPT-3 or 1T GPT-4. There
| is zero justification for calling it MiniGPT-4.
|
| A more honest name would be Visual-Vicuna or Son-of-BLIP.
| sebzim4500 wrote:
| I don't see how it's misleading. MiniGPT-4 makes it sound
| like a smaller alternative to GPT-4, if it was based on
| GPT-4 there would be nothing 'mini' about it.
| HarHarVeryFunny wrote:
| It has more in common with GPT-3 than GPT-4 in terms of
| size, but in reality it's based on Vicuna/Llama which is
| 10x smaller than either, so as far as the LLM part of it
| goes it's not mini-anything - it's just straight-up Vicuna
| 13B.
|
| The model as a whole is just BLIP-2 with a larger linear
| layer, and using Vicuna as the LLM. If you look at their
| code it's literally using the entire BLIP-2 encoder
| (Salesforce code).
|
| https://arxiv.org/pdf/2301.12597.pdf
| [deleted]
| Tepix wrote:
| > _1T GPT-4_
|
| The number of parameters used for GPT-4 is unknown.
| HarHarVeryFunny wrote:
| I got the 1T GPT-4 number from here - this is the video
| that goes with the Microsoft "Sparks of AGI" paper, by a
| Microsoft researcher that had early access to GPT-4 as
| part of their relationship with OpenAI.
|
| https://www.youtube.com/watch?v=qbIk7-JPB2c
| sandkoan wrote:
| Bubeck has clarified that the "1 trillion" number he was
| throwing around was just hypothetical and metaphorical -- it
| was in no way, shape or form implying that GPT-4 has 1
| trillion parameters [0].
|
| [0] https://twitter.com/SebastienBubeck/status/1644151579
| 7238251...
| HarHarVeryFunny wrote:
| OK - thanks!
|
| So we're back to guessing ...
|
| A couple of years ago Altman claimed that GPT-4 wouldn't
| be much bigger than GPT-3 although it would use a lot
| more compute.
|
| https://news.knowledia.com/US/en/articles/sam-altman-q-
| and-a...
|
| OTOH, given the massive performance gains scaling from
| GPT-2 to GPT-3, it's hard to imagine them not wanting to
| increase the parameter count at least by a factor of 2,
| even if they were expecting most of the performance gain
| to come from elsewhere (context size, number of training
| tokens, data quality).
|
| So in 0.5-1T range, perhaps ?
| collaborative wrote:
| At this point the letters GPT make more sense than "AI" or
| "LLM" in many peoples minds
| Uehreka wrote:
| Hard disagree.
|
| Outside of the brand name ChatGPT, lay members of the
| general public are way more likely to call these chatbots
| (like Bard and Bing) "AIs" than "GPTs". And although GPT
| could technically refer to any model that uses a
| Generative Pre-trained Transformer approach (though it
| probably wouldn't be an open-and-shut case), the mark
| "GPT-4" definitely is associated with OpenAI and their
| product, and you can't just use it without their
| permission.
| collaborative wrote:
| So OpenAI ostensibly owns "GPT4" according to your
| argument. But does it own "MiniGPT4"? I hope you see the
| absurdity of this.
|
| Let's not discuss the amount of copyright licenses OpenAI
| has already infringed, too
| Uehreka wrote:
| I'll put it this way:
|
| At Brewer's Art in Baltimore, MD they just released a
| beer called GPT (Green Peppercorn Tripel)[1]. They're
| likely allowed to do that because a reasonable consumer
| would probably not actually think they had collaborated
| with OpenAI, because OpenAI does not make beer.
|
| OP is releasing a model called "MiniGPT-4". A reasonable
| consumer could look at that name and become confused
| about the origin of the product, thinking it was from
| OpenAI. This would be understandable, since OpenAI also
| makes large language models and has a well known one that
| they've been promoting whose brand name is "GPT-4". If
| MiniGPT-4 does not meet that consumer's expectation of
| quality (which has been built up through using and
| hearing about GPT-4) it may cause them to think something
| like "Wow, I guess OpenAI is going downhill".
|
| Trademark cases are generally decided on a "reasonable
| consumer" basis. So yeah, they can seem a little
| arbitrary. But it's important for consumers to be able to
| distinguish the origin of the goods they are consuming
| and for creators to be able to benefit from their
| investment in advertising and product development.
|
| [1] https://www.thebrewersart.com/bottles-cans
| nashashmi wrote:
| They can always say GPT-like. Or miniaturized GPT-like LLM.
| causi wrote:
| Someone needs to write a buyer's guide for GPUs and LLMs. For
| example, what's the best course of action if you don't need to
| train anything but do want to eventually _run_ whatever model
| becomes the first local-capable equivalent to ChatGPT? Do you
| go with Nvidia for the CUDA cores or with AMD for more VRAM? Do
| you do neither and wait another generation?
| thewataccount wrote:
| Nvidia and the highest amount of vram you can get.
|
| Currently that means the 4090; the rumor is the 4090 Ti will
| have 48GB of VRAM, idk if it's worth waiting or not.
|
| The more VRAM, the higher parameter count you can run entirely
| in memory (fastest by far).
|
| AMD is almost a joke in ML. The lack of CUDA support (which
| is Nvidia proprietary) is straight-up lethal, and even
| though ROCm does have much better support these days, from
| what I've seen it's still a fraction of the performance it
| should be. I'm also not sure if you need projects to
| support it or not; I know PyTorch has backend support for it,
| but I'm not sure how easy it is to drop in.
| brandall10 wrote:
| I think the most recent rumors were amended to it having
| 24, unfortunately.
| thewataccount wrote:
| Darn.
|
| I mean, in all honesty, there's no reason a gaming card
| would need 48GB at the moment when so few games even use
| 24GB.
|
| 48GB really only makes sense for workstation cards.
| causi wrote:
| That's good to know. A lot of people might look at the
| 7900XTX with its 24GB of ram for a grand and think "ah, a
| good deal for fitting an LLM in".
| lhl wrote:
| For a general guide, I recommend:
| https://timdettmers.com/2023/01/30/which-gpu-for-deep-
| learni...
|
| There's a subreddit r/LocalLLaMA that seems like the most
| active community focused on self-hosting LLMs. Here's a
| recent discussion on hardware: https://www.reddit.com/r/Local
| LLaMA/comments/12lynw8/is_anyo...
|
| If you're looking just for local inference, your best bet
| is probably to buy a consumer GPU w/ 24GB of VRAM (3090 is
| fine, 4090 has more performance potential), which can fit a 30B
| parameter 4-bit quantized model that can probably be fine-
| tuned to ChatGPT (3.5) level quality. If not, then you can
| probably add a second card later on.
|
| Alternatively, if you have an Apple Silicon Mac, llama.cpp
| performs surprisingly well, it's easy to try for free:
| https://github.com/ggerganov/llama.cpp
|
| Current AMD consumer cards have terrible software support and
| IMO aren't really an option. On Windows you _might_ be able to
| use SHARK or DirectML ports, but nothing will run out of the
| box. ROCm still has no RDNA3 support (supposedly coming w/
| 5.5 but no release date announced) and it's unclear how well
| it'll work - basically, unless you would rather be fighting
| w/ hardware than playing around w/ ML, it's probably best to
| avoid it (the older RDNA cards also don't have tensor cores, so
| perf would be hobbled even if you could get things running.
| Lots of software has been written w/ CUDA-only in mind).
| BeefWellington wrote:
| FWIW I had no real issues getting StableDiffusion to run on
| a 6800 I have in one of my systems.
|
| I haven't tried with LLaMA at all.
| lhl wrote:
| 6800 is RDNA2, not RDNA3. The latter is still waiting for
| ROCm support 4 months post-launch:
| https://github.com/RadeonOpenCompute/ROCm/issues/1813
| BeefWellington wrote:
| I'm aware that a 6800 is not RDNA3. You stated broadly:
|
| > Current AMD consumer cards have terrible software
| support and IMO isn't really an option. On Windows you
| might be able to use SHARK or DirectML ports, but nothing
| will run out of the box.
|
| I was merely sharing that I did not have that same
| experience that current consumer cards have terrible
| support.
| lhl wrote:
| Sure, and I was merely clarifying that only last-gen
| architectures work. While I'm glad that SD works for you,
| if we are getting into it, I think that having no support
| on current-gen flagship models _does_ equal broadly
| terrible software support and is more worth highlighting,
| since otherwise someone might assume they could just go
| pick one of those up and get a 24GB GPU on the cheap,
| especially in the context of LLMs (which is what the OP
| was asking about).
|
| For RDNA2, you apparently _can_ get LLMs running, but it
| requires forking/patching both bitsandbytes and GPTQ:
| https://rentry.org/eq3hg - and this will be true for any
| library (eg, can you use accelerate? deepspeed? fastgen?
| who knows, but certainly no one is testing it and AMD
| doesn't care if you're not on CDNA). It's important to
| note again, anything that works atm will still only work
| with last-gen cards, on Linux-only (ROCm does not work
| through WSL), w/ limited VRAM (no 30Bq4 models), and
| since RDNA2 tensor support is awful, if the SD benchmarks
| are anything to go by, performance will still end up
| worse than an RTX 3050:
| https://www.tomshardware.com/news/stable-diffusion-gpu-
| bench...
| AuryGlenz wrote:
| Running is one thing. Running with any sort of speed is
| another.
| l33tman wrote:
| You don't need a GPU, alpaca.cpp runs purely on the CPU if
| you have 32+ GB of RAM for the 30B Llama model. It's a bit
| slow, but you can toy with it..
| avereveard wrote:
| Depends on whether you just want to play around or have
| actual business needs, but to play around I'm waiting for the
| next generation of unified memory architectures
| ilteris wrote:
| Can you point to any source re: the next generation of unified
| memory architectures? How far away are they?
| avereveard wrote:
| Well, currently there's just Apple and they are super
| secretive about their roadmap. The M1/M2 top-line chips
| already perform quite well, and while the top version
| looks expensive, a pair of A6000s will definitely eat more
| money both to purchase and to run.
|
| The M3 is around the corner though, and there's some
| announcement to come from Intel or ARM following their
| partnership. There's also the new card coming from Intel
| that is supposed to be aimed squarely at machine learning
| workloads; they don't have to segment their market by
| memory sizing like Nvidia does, but they aren't well
| supported as device targets. A pair of these will
| likely be very cost effective if and only if they
| get credible compatibility with the libraries and models.
| Tepix wrote:
| Read Tim Dettmer's guide.
|
| Basically, you want nVidia, and you want lots of VRAM. Buy
| used for much more bang for the buck.
|
| Depending on your budget, get:
|
| - an RTX 3060 with 12GB or
|
| - 1 used RTX 3090 with 24GB (approx twice as expensive as the
| 3060 but twice the VRAM and much faster) or
|
| - 2 used RTX 3090 cards if you need more than 24GB.
|
| Everything beyond that gets quite a bit more expensive
| because then you need a platform with more PCIe lanes, you
| may need more than one PSU and you will have problems fitting
| and cooling everything.
|
| With two cards and 2x24GB you can run the largest version of
| the LLaMA model (the 65B variant) and all its descendants
| with 4-bit quantization inside your GPU's VRAM, i.e. with
| good performance. You can also try some low-resource fine-
| tuning variants (LoRA etc.).
|
| Oh and while you're at it also get a decent amount of RAM
| like 64GB or 128GB (it's very cheap right now) and a NVMe
| SSD. These models are quite large.
| mk_stjames wrote:
| Ask: how many parameters is the model? At how many bits? And
| has it been quantized to a smaller number of bits, or if not,
| can it be without losing quality? Multiply the number of bits by
| the number of parameters, convert to GB, and then add some
| amount of overhead for however much else needs to load and/or
| how horrible/amazing an implementation the inference code
| is. Done.
|
| Example: A 30B parameter model trained at 16bit FP gets
| quantized down to 4 bit ints. 4 bits = 0.5 byte. 30 billion *
| 0.5 byte = 15GB of VRAM (plus a GB or few of other overhead)
|
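| A minimal back-of-the-envelope version of that calculation in
| Python (the 20% overhead factor here is just an illustrative
| assumption, not a measured number):
|
|     def estimate_vram_gb(n_params_billions, bits_per_param,
|                          overhead_factor=1.2):
|         """Rough VRAM estimate: weight memory plus some overhead."""
|         bytes_per_param = bits_per_param / 8
|         weights_gb = n_params_billions * bytes_per_param
|         return weights_gb * overhead_factor
|
|     # 30B parameters quantized to 4-bit ints: 15GB of weights
|     print(estimate_vram_gb(30, 4))   # ~18GB with overhead
|     # the same model kept at 16-bit floats
|     print(estimate_vram_gb(30, 16))  # ~72GB with overhead
|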
| For more real world discussion see
|
| https://github.com/ggerganov/llama.cpp/issues/13
| redox99 wrote:
| It's pretty simple actually.
|
| Get a 3090 or 4090. Forget about AMD.
| mewpmewp2 wrote:
| Should I get a gaming PC with 4090 if I want to get into
| personal side projects in ML and also games?
|
| Do I need to dual boot? Or is Windows good?
| dharma1 wrote:
| The 4090 is amazing, but a very large card. The 3090 is "good
| enough" for ML - same 24GB VRAM - and you can pick them
| up used for half the price of a new 4090. That's what I
| did.
|
| WSL on Windows is apparently decent, or native PyTorch; dual
| boot Windows/Ubuntu is still prob best tho.
| artificial wrote:
| Getting CUDA on OpenSUSE was super easy. The Nvidia blob
| drivers are easy to install and CUDA needs another
| download and some copy paste. Even Unreal Editor was
| easier to install than on Windows.
| Regic wrote:
| I'm running models locally on my 3090 and it's fast
| enough, although for example building a vector database
| can take a while. I can run LoRA training but I haven't
| done anything meaningful with it so far. I chose the 3090
| because of the cable issue of the 4090 (also, no NVLink,
| although I'm not sure that matters), but it's debatable whether
| my fears are justified. I need to leave the GPU running
| while I'm away and I just don't feel comfortable doing
| that with a 4090. I'd rather take the lower performance.
|
| One caveat though: my Asus B650E-F is barely supported by
| the currently used Ubuntu kernel (e.g. my microphone
| doesn't work; before upgrading the kernel + BIOS I didn't
| have a LAN connection...), so expect some problems if you
| want to use a relatively new gaming setup for Linux.
| thewataccount wrote:
| Windows generally works but there may be a somewhat small
| performance hit. IMO Linux is much easier to get working,
| judging by all the GitHub issue threads I see about
| SD/LLaMA stuff on Windows - but I don't use Windows so I
| don't have personal experience.
|
| The 4090 24GB is 1800 USD; the Ada A6000 48GB is like 8000 USD
| and idk where you even buy it. So if you want to run games and
| models locally the 4090 is honestly the best option.
|
| EDIT: I forgot - there is a rumored 4090ti with 48gb of
| vram, no idea if thats worth waiting for.
| wincy wrote:
| If I was going to spend $8000 on a video card I'd hunt on
| eBay for an A100 80GB rather than settle for the A6000
| thewataccount wrote:
| Honestly yeah a used A100 80GB sounds like a better idea.
| macrolime wrote:
| The A6000 is actually the old generation, Ampere. The new
| Ada generation one is called 6000. Seems many places
| still sell A6000 (Ampere) for the same price as RTX 6000
| (Ada) though, even though the new one is twice as fast.
|
| Seems you can get used RTX A6000s for around $3000 on
| ebay.
| thewataccount wrote:
| That.... That explains why I can't find it and makes a
| ton of sense.....
|
| I think that's such a silly name for it, but oh well
|
| Thanks for the correction!
| macrolime wrote:
| Just to add to the confusion, there's another older RTX
| 6000 with 24GB ram. This is from an even older
| generation, same as the GeForce 20 series.
| thewataccount wrote:
| You're kidding? So they called it the RTX 6000, then
| called it the RTX A6000 for ampere, then back to RTX 6000
| for Ada?
|
| Why do they do this? Sometimes consumer products are
| versioned weirdly to mislead customers (like Intel CPUs)
| - but it wouldn't even make sense to do that here, as
| these are enterprise cards?
| macrolime wrote:
| Actually the first one is called Quadro RTX 6000, while
| the Ada one is just RTX 6000 without "Quadro" in front.
| Not that it makes the naming make much more sense.
|
| According to GPT-4 the next generation one will be called
| Galactic Unicorn RTX 6000 :D
| dannyw wrote:
| You could also get more detailed help from
| https://reddit.com/r/buildapc/
| whywhywhywhy wrote:
| Save some money and go 3090, same vram, speed difference
| probably isn't worth the premium for the 4090. Then
| upgrade when the rumored 5090 generational leap happens.
| sampa wrote:
| You can run Windows in a VM and pass your video card through
| to it. So it's basically native gaming (3-5% cost) w/o
| reboots. Look for VFIO.
| turmeric_root wrote:
| Windows reserves a certain percentage of VRAM for some
| reason, so I'd recommend Linux. Or find a way to disable
| the desktop/UI in Windows.
| sureglymop wrote:
| I have access to an Nvidia A100. But as a layman, what
| specs does the rest of the system need to use it for some
| real work? I would assume there needs to be at least as
| much ram as vram and maybe a few terabytes of disk space.
| Does anyone have experience with this?
| michaelt wrote:
| If you have an A100, which in its 80GB variant costs
| $23,667 [1], you would not generally quibble over the
| price of a few terabytes of disk space.
|
| [1] https://www.dell.com/en-us/shop/nvidia-
| ampere-a100-pcie-300w...
| seydor wrote:
| At least for llama derivatives, reddit has a list https://www
| .reddit.com/r/LocalLLaMA/comments/11o6o3f/how_to_...
| dannyw wrote:
| Definitely NVIDIA. Good luck running any real ML models on
| AMD GPUs.
| [deleted]
| iandanforth wrote:
| Filed an issue: Up or down vote there as you see fit.
|
| https://github.com/Vision-CAIR/MiniGPT-4/issues/9
| cloudking wrote:
| It doesn't seem to recognize text from a screenshot very well. I
| gave it a Windows screenshot with detailed error messages:
| https://filestore.community.support.microsoft.com/api/images...
|
| and prompted "Describe any issues found in this screenshot and
| steps to resolve them"
|
| while it correctly identified it as a screenshot from a computer,
| it gave a very generic response and didn't identify the error
| messages correctly:
|
| "The screenshot shows a computer screen with a warning message
| that reads, "The system is currently unavailable. Please try
| again later." This indicates that there is an issue with the
| computer system that needs to be resolved.
|
| To resolve this issue, you can try the following steps:
|
| 1. Check the system status: Check the system status to see if
| there are any errors or issues that need to be resolved.
|
| 2. Restart the computer: Try restarting the computer to see if
| that resolves the issue.
|
| 3. Check for updates: Check for any updates that may have been
| released for the computer system and apply them if necessary.
|
| 4. Check for viruses: Run a virus scan on the computer to check
| for any viruses that may be causing the issue.
|
| 5. Check for hardware issues: Check for any hardware issues that
| may be causing the issue. This could include issues with the
| power supply, hard drive, or other components.
|
| 6. Contact technical support: If none of the above steps resolve
| the issue, contact technical support for further assistance. They
| may be able to provide additional troubleshooting steps or help
| you resolve the issue."
| HansHamster wrote:
| Just tried it with a random schematic:
|
| > The image shows a block diagram of a power amplifier circuit
| with a power supply and a load resistor. The power supply is
| connected to the input of the amplifier, which is then connected
| to the output. The load resistor is connected to the output of
| the amplifier and the power supply. The amplifier is designed to
| amplify the input signal to a higher level, which is then sent to
| the load resistor to power the load. The power supply provides
| the necessary voltage to operate the amplifier.
|
| On the one hand I'm impressed that it recognises an amplifier,
| but just about everything in the description is (slightly) wrong.
| It is not a power amplifier, the power supply is not connected to
| the input (why should it be?), there is no load resistor...
| mftb wrote:
| I also got middling results. I gave it a novel NASA picture of
| the moon passing in front of the earth, taken by the Deep Space
| Climate Observatory Satellite, so an unusual vantage point.
| First it was just wrong, and said that it was a picture of a
| crescent moon taken from a mountain top, or something like
| that. Not terrible - at least it recognized the moon - but wrong.
| I tried to give it a chance and corrected it with the
| additional info that the picture was taken from space. It then
| did better, but on the last line it threw in that it was not a
| photograph, but a rendering of some sort, which again, is just
| wrong.
|
| More and more I get the impression that people are so taken
| with the chat interface that they are grading these models more
| favorably than is warranted.
| HansHamster wrote:
| And after pointing out all the mistakes in the description,
| it took a very ... interesting approach to address my
| remarks:
|
| > I apologize for the confusion. The image shows a block
| diagram of a basic amplifier circuit with a power supply, but
| it does not show any input stages, output stages, or load
| resistors. [...] The input stage is made up of an operational
| amplifier, which is used to amplify the input signal. The
| output of the input stage is then connected to the power
| supply, which provides the necessary voltage to operate the
| amplifier. The amplifier is designed to amplify the input
| signal to a higher level, which is then sent to the load
| resistor to power the load.
|
| So it acknowledges the remarks, describes everything that is
| _not_ in the image and then puts the stuff back into the
| description anyway :)
| aabajian wrote:
| From the radiology world this is fascinating. I'm not worried about
| job security as I'm an interventionalist. What I'm wondering is
| about go-to-market strategies for diagnostics. I do some
| diagnostic reads and I would love to have something like this
| pre-draft reports (especially for X-Rays). There are tons of "AI
| in rads" companies right now, none of which have models that come
| anywhere close to GPT-4 or even this. Personally, I would not
| _start_ such a company because the barrier to entry is now so
| low. I think there needs to be an open-source project that just
| fine-tunes these models on rads images and is available to all
| radiologists.
| euroderf wrote:
| Your profession and... a few hundred others ?
| deeplstm wrote:
| Here's the video summary https://youtu.be/uPV9Gk3IC-g
| atgctg wrote:
| It's hard to distinguish non-Google projects with Google Sans in
| their templates from actual Google Research papers, as the font
| is meant to be exclusively used by Google[1].
|
| [1]
| https://developers.google.com/fonts/faq#how_can_i_get_a_lice...
| sebzim4500 wrote:
| Surely most people would read the authors list to determine
| provenance rather than the font?
| pxeger1 wrote:
| I didn't think about it consciously but I think I did
| implicitly assume it was a Google project because of the font
| tarruda wrote:
| Looking forward to the next generation of cheap GPUs with enough
| VRAM to run models like Vicuna-13 locally.
| adam_arthur wrote:
| A lot of people are running fairly powerful models directly on
| the CPU these days... seems like inference will not be a
| GPU-exclusive activity going forward. Given that RAM is the main
| bottleneck at this point, running on CPU seems more practical
| for most end users.
|
| See: https://news.ycombinator.com/item?id=35602234
| valine wrote:
| Cheap is relative I suppose.
|
| I'm running Vicuna-13B in fp16 locally and it needs 26GB of VRAM,
| which won't even fit on a single RTX 4090. The next gen RTX
| Titan might have enough vram but that won't come cheap. I'm
| expecting a price point above $2500.
| 22c wrote:
| I'm not sure if it's the point GP is trying to make, but I
| would like to see GPUs with extra VRAM that don't have the
| extra compute, e.g. similar performance to a 4070 Ti but with
| 24GB or 32GB of VRAM.
|
| I don't see a really good reason why OEMs couldn't do that
| now, in the past there have been OEM cards that have more
| VRAM than the reference design. I'm sure there's an appetite
| for cards like that for people who don't want to refinance
| their home loan to get 2 x RTX 4090 cards.
| Tepix wrote:
| I'm fairly sure that NVIDIA is making sure that consumer
| cards are not a viable alternative to their $10000 80GB VRAM
| A100 cards.
| ggregoire wrote:
| Could you use something like this to summarize the content of a
| video? Or would you first need to extract the images from the
| video (like 1 fps, with ffmpeg) and tell the AI to analyze all
| the images and summarize them like if it was a video?
| all2 wrote:
| I'd do this using both streams of information: audio and video.
|
| I'd segment the audio semantically based on the topic of
| discussion, and I'd segment the video based on editing,
| subjects in scene, etc. We could start simply and just have a
| "timestamp": [ subjects, in, frame] key-value.
|
| It'd take some fiddling to sort how to mesh these two streams
| of data back together. The first thing I'd try is segment by
| time chunks (the resolution of which would depend on min/max
| segment lengths in video and audio streams) and then clump the
| time chunks together based on audio+video content.
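|
| A rough sketch of the frame-sampling half of that idea in Python
| (ffmpeg's fps filter is real; describe_frame and summarize are
| stand-ins for whatever captioning model and LLM you'd plug in,
| e.g. BLIP-2 or MiniGPT-4 plus a summarizer):
|
|     import glob
|     import subprocess
|
|     def extract_frames(video_path, out_dir, fps=1):
|         """Sample roughly one frame per second with ffmpeg."""
|         subprocess.run(
|             ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}",
|              f"{out_dir}/frame_%05d.png"],
|             check=True,
|         )
|         return sorted(glob.glob(f"{out_dir}/frame_*.png"))
|
|     def summarize_video(path, out_dir, describe_frame, summarize):
|         # describe_frame: image path -> caption (captioning model)
|         # summarize: list of captions -> text summary (an LLM call)
|         frames = extract_frames(path, out_dir)
|         captions = [describe_frame(f) for f in frames]
|         return summarize(captions)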
| golergka wrote:
| I don't know what unholy abomination they did to the scrolling,
| but after I increased the page font size, it just sticks to the
| bottom and you can't scroll up.
| arjonagelhout wrote:
| I'm experiencing the same thing on iOS Safari, it rapidly jumps
| back and forth between two positions.
| [deleted]
| detrites wrote:
| Right off the bat I lost all scrolling inertia here, it's like
| being in the dark ages.
| raylad wrote:
| I gave it a picture of a bottle, which it did a fairly good job
| of identifying (although it misread some of the text, mistaking a
| 1 for a 4).
|
| Then I asked it what are the likely ingredients of the product.
| It still hadn't replied after 2274s so I gave up on it.
| simonw wrote:
| Hate to be the person complaining about the name, but we already
| saw how this plays out with DALL-E mini: if you name your project
| directly after something else like this, no matter how much extra
| explanatory text you attach to it a large number of people will
| assume it's an "official" variant of the thing it was named
| after.
|
| Eventually you'll have to rename it, either to resolve the
| confusion or because OpenAI pressure you to do so, or both.
|
| So better to pick a less confusing name from the start.
|
| (This one is even more confusing because it's about image inputs,
| but GPT4 with image inputs had not actually been released to
| anyone yet - similar in fact to how DALL-E mini got massive
| attention because DALL-E itself was still in closed preview)
| xiphias2 wrote:
| ,,So better to pick a less confusing name from the start.''
|
| I'm not sure if that's better from a marketing standpoint
| though....it works, you still remember DALL-E mini
| jiwidi wrote:
| A bit nasty to call something MiniGPT-4 when it comes from LLaMA.
| wsgeorge wrote:
| Giving this a quick spin and I like what I'm seeing. I gave it a
| picture of Paolo Veronese's Lament over the Dead Christ [0] and
| asked what was going on in the background.
|
| > The painting depicts the scene of the crucifixion of Jesus
| Christ. In the foreground, there is a group of people standing
| around the cross, including Mary, the mother of Jesus, and the
| two thieves who were crucified with him. In the background, there
| is a hill with a group of people standing on it, watching the
| crucifixion. The sky is dark and stormy, with lightning
| illuminating the scene. The overall mood of the painting is one
| of sadness and tragedy.
|
| Then I asked for a sonnet on it:
|
| "In this painting, we see the scene of the cross
|
| Where Jesus is crucified, with Mary at his side
|
| And the thieves who were crucified with him
|
| In the background, a hill with people standing
|
| Watching the tragic event unfold
|
| The sky is dark and stormy, with lightning flashing
|
| Illuminating the scene of the crucifixion
|
| The mood is one of sadness and tragedy
|
| As we contemplate the sacrifice of Jesus
|
| But even in death, he shows us the way
|
| To love and forgiveness, to live another day
|
| And so we remember the crucifixion
|
| And the love that Jesus showed us that day."
|
| So...not quite what I was expecting, but it's progress!
|
| [0]
| https://en.wikipedia.org/wiki/Lament_over_the_Dead_Christ_(V...
| omeid2 wrote:
| What did you expect?
| wsgeorge wrote:
| A 14-line poem with a consistent rhyme scheme and meter.
| Perhaps my request should have been more specific.
| pmoriarty wrote:
| From my experimentation, LLMs tend to kind of suck at rhyme
| and meter, and all but the simplest types of poetry, so
| even if you'd specified it probably wouldn't have been able
| to deliver.
|
| This is definitely something they could be trained to be
| much better at, but I guess it hasn't been a priority.
| astrange wrote:
| GPT4 is surprisingly good at it, considering BPE
| tokenization means it shouldn't be able to do rhyme at
| all.
| jcims wrote:
| Has anyone tried using phonetic tokens instead of text?
| I'm curious if that would help with things like rhyming.
| causi wrote:
| I never know what to expect anymore. We live in a world where
| computers can describe paintings and write sonnets about them
| but a half-trillion dollar car company can't figure out how
| to parallel park with eight cameras.
| astrange wrote:
| That's a well known result from the last AI hype era.
|
| https://en.wikipedia.org/wiki/Moravec%27s_paradox
|
| Although, I'm not surprised AI can describe paintings, but
| I'm still pretty surprised it can generate them. How'd we
| get that to work!?
| causi wrote:
| Yeah, and I'm still completely lost as to why resolution
| is such a limiting factor. If you know you're drawing a
| soccer ball why is a 512x512 soccer ball so much easier
| than a 1024x1024 soccer ball?
| gary_0 wrote:
| There are a few cases where people have used ChatGPT to
| generate SVG[0], with mostly unimpressive results. I'm
| sure sooner or later models will be developed
| specifically for creating vector drawings instead of
| raster, including with the ability to apply textures and
| gradients.
|
| Also, the resolution of Stable Diffusion's output isn't
| much of a limitation if you're willing to use other tools
| to massage the output into something professional-
| quality. See [1]
|
| [0] For example:
| https://praeclarum.org/2023/04/03/chatsvg.html
|
| [1] https://news.ycombinator.com/item?id=35592847
| moffkalast wrote:
| DriveGPT as it hits 2 parked cars and runs over a dog:
| "Apologies for the confusion earlier. You are correct that
| the cars were in the planned local planner path..."
| dceddia wrote:
| Sorry, yes, that was a dog.
|
| Let me try again and this time I will definitely not hit
| anything.
|
| Sorry, that was another dog.
| brycedriesenga wrote:
| "Was that another dog?"
|
| BingDrive: I'm sorry, but I prefer not to continue this
| conversation.
| gary_0 wrote:
| It's like something right out of a Douglas Adams book.
| niemandhier wrote:
| That's because they decided they do not need lidar.
| valine wrote:
| Near term it's a frustrating decision, but if these gpt4
| vision LLMs are anything to go by it will prove to be the
| right decision in the long term.
| rafram wrote:
| Why wouldn't LIDAR _in addition to_ computer vision with
| cameras be a strictly better idea?
| bradstewart wrote:
| It's all trade offs. I'm just spitballing here, but if
| you have limited resources, you can either spend
| cash/time on lidar or invest in higher-quality mass-
| produced optics, or better computer vision software. If
| you get to a functional camera-only system sooner, might
| everyone be better off as you can deploy it more rapidly.
|
| Manufacturing capacity of lidar components might be
| limited.
|
| Another might be reliability/failure modes. If the system
| relies on lidar, that's another component that can break
| (or brownout and produce unreliable inputs).
|
| So in a vacuum, yeah a lidar+camera system is _probably_
| better, but who knows with real-life trade offs.
|
| (again, I just made these up, I do not work on this
| stuff, but these are a few scenarios I can imagine)
| astrange wrote:
| Remember that "cameras" aren't as good as human
| perception because human eyes interact with the
| environment instead of being passive sensors. (That is,
| if you can't see something you can move your head.)
|
| Plus we have ears, are under a roof so can't get rained
| on, are self cleaning, temperature regulating, have much
| better dynamic range, wear driving glasses...
| spaceman_2020 wrote:
| And we still get into millions of accidents every year...
| program_whiz wrote:
| I keep hearing this argument over and over, but I find it
| uncompelling. As a relatively young person with good
| vision, who has never been in an accident after many
| years of driving, and who doesn't make the kind of absurd,
| simple mistakes I've seen self-driving cars make, I would
| not trust my life to a self-driving car.
|
| Asking people to accept a driverless car based on over-
| arching statistics is papering over some very glaring
| issues. For example, are most accidents in cars being
| caused by "average" drivers or are they young / old /
| intoxicated / distracted / bad vision? Are the statistics
| randomly distributed (e.g. any driver is just as likely
| as the next to get in accidents)? Because the driverless
| cars seem to have accidents at random in unpredictable
| ways, but human drivers can be excellent (no accidents,
| no tickets ever), or terrible (drive fast, tickets, high
| insurance, accidents, etc). The distribution of accidents
| among humans is not close to uniform, and is usually
| explainable. I wouldn't trust a poor human driver on a
| regular basis, nor would I trust an AI because I'm
| actually a much better driver than both (no tickets, no
| accidents, can handle complex situations the AI can't).
| Are the comparisons of human accidents being treated as
| homogenous (e.g. the chance of ramming full speed into a
| parked car the same as a fender-bender?). I see 5.8M car
| crashes annually, but deaths remain fairly low (~40k,
| .68%), vs 400 driverless accidents with ~20 deaths (5%);
| I'm not sure we're talking about the same type of
| accidents.
|
| tl;dr papering over the complexity of driving and how
| good a portion of drivers might be by mixing non-
| homogenous groups of drivers and taking global statistics
| of all accidents and drivers to justify unreliable and
| relatively dangerous technology would be a strict
| downgrade for most good drivers (who are most of the
| population).
| rootusrootus wrote:
| Which sounds like a lot until you realize 1) we drive
| over three _trillion_ miles a year in the US, and 2) the
| majority of those accidents are concentrated in a
| fraction of all drivers. The median human driver is quite
| good, and the state-of-the-art AI isn't even in the same
| galaxy yet.
| Philip-J-Fry wrote:
| It's not lidar they need. BMW, Mercedes, Porsche, etc.
| can all park themselves almost perfectly every time.
| Teslas can't, and will take 5x as long to do it when they
| can, if the computer even bothers to recognise the
| space.
|
| It's software. Original Teslas with AP1 park better than
| Tesla's own in-house software on their latest AP.
| causi wrote:
| Sure, sure, but would it have killed them to drop in a
| few five dollar "don't hit this object" ultrasonic
| proximity sensors?
| michaelt wrote:
| While ultrasonic sensors would be fine for parking, they
| don't have very good range so they aren't much help in
| avoiding, for example, crashing into stationary fire
| trucks or concrete lane dividers at freeway speeds.
| itsyaboi wrote:
| Just disable autopilot 0.00001 seconds before impact and
| it becomes the driver's fault.
| tough wrote:
| Couldn't afford*
| bugglebeetle wrote:
| Don't worry. That valuation has always been ZIRP nonsense
| and will fall with Musk's addiction to publicly
| demonstrating his incompetence.
| pmoriarty wrote:
| How about asking it to describe a Robert Williams painting?
|
| Here's a relatively easy one: [1]
|
| [1] -
| http://www.illustrationartgallery.com/acatalog/WilliamsTimmy...
| wg0 wrote:
| I tried that and here's what it has to say:
|
| > This painting appears to be a depiction of a child running
| down a dirt road with a kite in hand. The child is wearing a
| yellow shirt and blue shorts, and the background is filled
| with various colors and shapes. The overall mood of the
| painting is playful and carefree.
|
| What's amazing is that it has correctly described the scene
| including details such as:
|
| - A boy
| - On a dirt road
| - Having a kite in hand
| - Wearing a yellow shirt and blue shorts
| jcims wrote:
| Used this prompt
|
| Could you describe the image as thoroughly as possible?
| Please include the content, any noteworthy symbology,
| techniques or styles, what kind of media, guess at the artist
| or identify any attribution, then attempt to describe the
| theme or idea it represents?
|
| Got this
|
| The image is a painting of a child running down a dirt road
| with a kite in hand. The child is wearing a yellow shirt and
| blue shorts. The background is a mix of urban and rural
| elements, with buildings and trees in the background and a
| field of grass in the foreground. The style of the painting
| is a mix of realism and abstraction, with a focus on the
| child and the kite. The artist may be attempting to convey a
| sense of freedom and playfulness, with the child running with
| the kite representing the joy and wonder of childhood.
|
| The medium used is oil paint on canvas. The artist may have
| used brushes and other tools to create the different textures
| and effects in the painting. The artist may have also used
| underpainting to establish the composition and values before
| adding the final layers of color and detail.
|
| It is difficult to identify the artist without more
| information, but the style and technique of the painting
| suggest that it may be the work of an artist who is skilled
| in realism and abstraction, and who is interested in
| exploring themes of childhood and playfulness.
| pmoriarty wrote:
| It completely missed what makes this painting distinctive
| and funny/scary, and what's actually going on.
|
| It also missed the right half of the painting.
|
| Try asking it why the painting is called "Timmy's Last
| Surprise".
| dom96 wrote:
| Can this decode CAPTCHAs? Are we finally in a place where
| CAPTCHAs are useless?
| spyder wrote:
| Just tried it with a simple 4-character one and it's bad at it:
| it detects one or two characters correctly out of the four, if it
| outputs anything. It's probably better with the "select the
| traffic lights" kind of captchas, but those are already possible
| to solve with other image models, if I remember correctly.
| alex_duf wrote:
| Would it be a good thing for captchas to be useless? Because
| in my career I've been a few times on the receiving end of spam
| attacks and various nastiness, and it's nice to be able to
| integrate such a thing - as unpleasant as it is for the user -
| and solve the problem at hand.
|
| if this disappears then spammers and the various botnets will
| have the upper hand again.
| dom96 wrote:
| It wouldn't be good; did I imply it would? I'm interested to
| know if we are now in a place where they do become useless.
| dalys wrote:
| I was also thinking about this the other day.
|
| Idea one: Captchas are about to become pretty useless as an "is
| this a human" tactic. Maybe they already are, I don't know. What
| other things could we think of to prove someone is human? I
| was watching Lex Fridman and Max Tegmark and they were
| remarking on how Twitter using payment as a differentiator
| between human and bot is actually really good. And maybe the
| only way we can reliably determine if someone is a human or not
| right now. Just by the virtue that having thousands of bots
| doing something, that suddenly costs $5 per event will deter
| most attacks. Integrating online identification systems from
| various countries could be one tactic (such as
| https://en.wikipedia.org/wiki/BankID that we use in Sweden to
| log in to basically any online service). New startup: Un-
| botable authentication as a service.
|
| Idea two: Since captchas will be useless, we'll be able to make
| bots that can do almost everything on the web. No need for writing
| automation scripts, headless browsers, regexp etc. Just feed
| real visual data from browser to GPT-4 (or MiniGPT-4 or
| similar). Give instructions like "You need to accomplish this
| task: Go to facebook.com and create a user account and be
| friends with 100 people and act like a human. Follow the
| instructions on the website.". Then let the bot figure out
| where to move the mouse and send click events, keyboard events
| etc. Obviously much slower than a scripted bot, but would be
| very hard to detect as a bot. Good for E2E testing also? New
| startup: WebGPT: Bots that can do any task on any website.
| TestGPT: E2E testing service. Just write test instructions like
| "Make sure the login works on /login" and you're done! And you
| thought flaky tests were a thing of the past... Would be kind
| of cool for UX testing also. "You are a 43 year old truck
| driver in Spain, please go to news.ycombinator.com and see how
| easy it is for you to register an account, log in, go to any
| news story and make a new comment. Report anything you think is
| confusing."
| tysam_and wrote:
| I was not expecting this to be good due to the hype wave, but at
| least looking at the examples, they seem very impressive and
| worthy of the (potential) hype! I can't wait to see what new
| waves of these models keep coming out.
|
| I for one will probably still stay near the fundamental side of
| things as all of this thrash is simply far too much for me, but
| it is quite nice and beautiful for me to get to see all of the
| variety and beauty of what is being built here. <3 <3 <3 <3
|
| Now, to figure out a good way to integrate these seamlessly into
| my development workflow....I go about as painfully low-tech as
| one could. I might as well be using an electronic typewriter with
| only Ctrl+F search, lolzers....
| mdrzn wrote:
| Interesting, will try it out.
|
| The queue is about 100 at the moment, with 700s of waiting.
| mritchie712 wrote:
| 1200s now...
| punkspider wrote:
| Also some links don't seem to open in a new tab and if you
| click around while you wait you might lose your progress.
|
| I clicked on the Video button while waiting, assuming that it
| would open in a new tab, and lost my place in queue.
| zapdrive wrote:
| Also once you get your turn and ask your question, it puts
| you back in the queue.
| djangovm wrote:
| If I want to learn how to read this code and
| understand what it and its dependencies are doing, where do I
| start? Is reading their paper an effective strategy?
|
| https://github.com/Vision-CAIR/MiniGPT-4/blob/main/MiniGPT_4...
| lxe wrote:
| Why Vicuna over something like oasst-pythia? Would love to see a
| table comparing all the new models side by side.
| GaggiX wrote:
| All Pythia models were trained on 300B tokens, LLaMa models
| were trained on 1/1.4T tokens.
| lhl wrote:
| Fabrice Bellard has run a standard set of benchmarks w/ lm-eval
| on a big chunk of open models here:
| https://bellard.org/ts_server/ - Flan T5 XXL and GPT-NeoX 20B
| both outperform Pythia 12B on average (LLaMA 13B+ tops the
| charts).
| [deleted]
| vlovic wrote:
| Do I understand this correctly: they just took BLIP-2 and replaced
| the LLM with Vicuna, and to do that they just added a single
| linear layer to translate between the frozen vision encoder and
| (frozen) Vicuna? Additionally, and importantly, they manually
| create a high quality dataset for finetuning their model.
|
| If that is the case, then this is really a very, very simple
| paper. But I guess simple things can lead to great improvements,
| and indeed their results seem very impressive. Goes to show how
| much low hanging fruit there must be in deep learning these days
| by leveraging the amazing, and amazingly general, capabilities of
| LLMs.
| visarga wrote:
| Yes, model composability magic.
| 23459872345 wrote:
| Has anyone gotten this running in Colab?
| felipesabino wrote:
| They lost me when the demo bot chose PHP for the website builder
| demo /s
| 152334H wrote:
| On a technical level, they're doing something really simple --
| take BLIP2's ViT-L+Q-former, connect it to Vicuna-13B with a
| linear layer, and train just the tiny layer on some datasets of
| image-text pairs.
|
| But the _results_ are pretty amazing. It completely knocks
| Openflamingo && even the original blip2 models out of the park.
| And best of all, it arrived before OpenAI's GPT-4 Image Modality
| did. Real win for Open Source AI.
|
| The repo's default inference code is kind of bad -- vicuna is
| loaded in fp16 so it can't fit on any consumer hardware. I
| created a PR on the repo to load it with int8, so hopefully by
| tomorrow it'll be runnable by 3090/4090 users.
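|
| (For context, int8 loading in the Hugging Face stack generally
| looks something like this sketch via bitsandbytes; the model path
| is a placeholder, and this isn't necessarily what the PR does:)
|
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     model_path = "path/to/vicuna-13b"  # placeholder local weights
|     tokenizer = AutoTokenizer.from_pretrained(model_path)
|     model = AutoModelForCausalLM.from_pretrained(
|         model_path,
|         load_in_8bit=True,   # needs bitsandbytes; ~half fp16 VRAM
|         device_map="auto",
|     )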
|
| I also developed a toy discord bot
| (https://github.com/152334H/MiniGPT-4-discord-bot) to show the
| model to some people, but inference is very slow so I doubt I'll
| be hosting it publicly.
| sho wrote:
| > they're doing something really simple -- take BLIP2's
| ViT-L+Q-former, connect it to Vicuna-13B with a linear layer,
| and train just the tiny layer on some datasets of image-text
| pairs
|
| Oh yes. Simple! Jesus, this ML stuff makes a humble web dev
| like myself feel like a dog trying to read Tolstoy.
| selfhoster11 wrote:
| Seriously, ChatGPT was the thing that gave me a foothold into
| the AI/machine learning world... because it gave me hope that
| a mere mortal can achieve something reasonable with this tech
| without a crazy amount of work and educational background.
| 152334H wrote:
| I don't have much experience myself. I only started ~10
| months ago -- just a month or two before Stable Diffusion.
|
| You just have to do it every day. It's fun!
| jack_riminton wrote:
| Can you recommend what kind of small daily activities
| would help a web dev get into it?
| 152334H wrote:
| Regardless of what you want to learn, "small daily
| activities" is a bit hard. You can learn some stuff by
| osmosis, following the feeds of AI devs && AI channels,
| but the bulk of what I learn comes from starting projects
| & digging into code & reading papers.
|
| If you can hold your attention over several days (I
| can't), work on a project bit by bit. Just make sure it
| uses modern AI stuff, and that you have smart people to
| talk things over with.
| anonzzzies wrote:
| There are really great resources now, from ELI5 explanations of
| all of this tech to books like 'The Little Learner', which any
| programmer can get into. Yes, it takes effort, but it is a
| great time for it.
| pansa2 wrote:
| Web stuff probably makes ML devs feel the same way.
|
| ML is just a different field, using a different set of
| technologies from those you're familiar with.
| tourgen wrote:
| [dead]
| spaceman_2020 wrote:
| The best ML PhDs can't do what frontend devs can:
| understand CSS :D
| ikurei wrote:
| > This ML stuff makes a humble web dev like myself feel like
| a dog trying to read Tolstoy.
|
| Just like any discussion between advanced web devs would make
| any humble woodworker feel?
|
| And just like any discussion between advanced woodworkers
| would make a humble web dev feel?
|
| _" It's really simple, they're just using a No. 7 jointer
| plane with a high-angle frog and a PM-V11 blade to flatten
| those curly birch boards, then a No. 4 smoother plane with a
| Norris-type adjuster and a toothed blade for the final
| pass."_
|
| Whut?
|
| _" You could use Webpack to bundle your HTML, CSS and Babel-
| transpiled TypeScript 5 down to shim-included Ecmascript 4"_,
| _" They're just using OAuth2 authentication with Passport.js
| and JWT tokens, which easily gets you CSRF protection"_, _"
| Our e-learning platform uses LMS.js and xAPI.js, plus SCORM
| for course packaging and Moodle as the LMS backend."_, ...
|
| There was a time you didn't know what any of that meant.
|
| Just because you don't know what the words mean shouldn't
| make it sound difficult. Not saying AI is easy, just that the
| jargon is not a good indication of difficulty and we should
| know better than to be so easily mystified.
| tracker1 wrote:
| Okay, I won't mention how much is wrong in the webdev
| statement... :-D
| brycedriesenga wrote:
| Hey, guys. Hey. Ready to talk plate processing and residue
| transport plate funneling? Why don't we start with joust
| jambs? Hey, why not? Plates and jousts. Can we couple them?
| Hell, yeah, we can. Want to know how? Get this. Proprietary
| to McMillan. Only us. Ready? We fit Donnely nut spacing
| grip grids and splay-flexed brace columns against beam-
| fastened derrick husk nuts and girdle plate Jerries, while
| plate flex tandems press task apparati of ten vertipin-
| plated pan traps at every maiden clamp plate packet.
| Knuckle couplers plate alternating sprams from the t-nut to
| the SKN to the chim line. Yeah. That is the McMillan way.
| And it's just another day at the office.
| SanderNL wrote:
| Talk dirty to me!
| intelVISA wrote:
| runtime polymorphism
| therein wrote:
| Just tell me do we need a turbo encabulator or not?
| Rzor wrote:
| This post is double great and I will never forgive Amazon
| for canceling that show.
|
| For those that don't know this is from a show called
| Patriot.
|
| https://en.wikipedia.org/wiki/Patriot_(TV_series)
|
| Scene: https://youtube.com/watch?v=-F-IHvF5OCA
| SpaceL10n wrote:
| Web devs have become blue collar!? =P
|
| Great idea, actually. I do hope for a curriculum that
| enables kids on the trade school path to learn more about
| programming. Why not Master/Journeyman/Apprentice style
| learning for web dev??
| wnolens wrote:
| That's kind of how I think about bootcamps pumping out
| web devs. They're like trade schools, teaching you just
| enough fundamentals to know how to use existing tools.
| tracker1 wrote:
| Mostly agree... though I don't think the bootcamps get
| enough fundamentals in. Not to mention that it takes the
| type of person that will go above and beyond what has
| been assigned to succeed trying to be a productive
| employee in the space. I'm self-taught and the first
| years of my career spent countless hours reading,
| practicing and solving problems. I still spend a good
| 10-15 hours a week reading and exploring software
| development and try to at least keep up with what's out
| there. In the end, the best you can do is be aware of
| what, or even that options are out there.
|
| I can't imagine starting out today...
| ikurei wrote:
| I kind of agree, but I'd add that I don't think it's a
| bad thing.
| f1codz wrote:
| You make a good point. Except that a number of these
| concepts and tooling in the ML world have been slingshotted
| into the forefront in a relatively short time and it has
| been hard to play catch up. For eg. - someone said "frozen
| Vicuna" below - what does that mean?
| idkyall wrote:
| Vicuna is a specific open source AI LLM:
| https://ai.plainenglish.io/vicuna-the-unparalleled-open-
| sour...
| birdyrooster wrote:
| I love your analysis.
| ttul wrote:
| The thing is, machine learning sorta requires a few math
| prerequisites: linear algebra, differential equations, and
| to some degree vector calculus. Most web developers don't
| have this background.
| craigching wrote:
| If you want to understand the theory, that's true. If you
| want to develop an intuitive understanding without having
| to understand all the nuts and bolts (and I understand
| that can be a big ask for how some people
| learn/understand), give this a try:
| https://karpathy.ai/zero-to-hero.html
| akiselev wrote:
| The irony is Karpathy presents the limit/epsilon
| definition of derivatives in the first half hour (quite
| well IMO and he never actually says "epsilon") which is
| very much a nuts and bolts kind of explanation in
| calculus.
|
| That said, when most people say differential equations
| they're usually thinking of analytical solutions which is
| very much not necessary for practical ML.
| idiotsecant wrote:
| I would say the limit/epsilon derivative is exactly the
| sort of thing the grandparent post is talking about. It's
| quite intuitive and requires hardly any mathematical
| foundation at all, other than basic geometry and algebra.
| You can understand topics that build on that simple
| concept without understanding the more formal derivative
| definitions.
| mjburgess wrote:
| In practice, it's a lot more like web dev than you might
| imagine.
|
| The above means that the approach is web-dev-like gluing,
| almost literally just:
|
|     from existingliba import someop
|     from existinglibb import anotherop
|     from someaifw import glue
|
|     a = someop(X)
|     b = glue(a)
|     Y = anotherop(b)
| amelius wrote:
| It's more like gardening:
|
|     1. plant seed
|     2. ...wait a very long time...
|     3. observe completely unexpected but cool result
|
| The unexpected part of step 3 is what makes this very
| different from any kind of engineering, even webdev.
|
| Of course, there _is_ a lot of engineering involved in good
| ML, but that is more comparable to agricultural engineering
| in the sense that it's just a lot of dumb plumbing that
| any engineer can do without knowledge of the actual
| application.
| tudorw wrote:
| count me in :)
| kerkeslager wrote:
| I mean, for me, the unexpected part of 3 is what got me
| into programming in general. The first time you type a
| mysterious incantation into an editor and a few more
| mysterious incantations into the console and the console
| prints "Hello, world" like it was supposed to, it's
| unexpected because it's hard to believe that any of this
| mysterious incantation stuff actually works at all.
|
| As you get better at programming you have to take on
| harder problems to create the surprise of something
| working, because you gain confidence, and as you gain
| confidence, you start expecting your code to work. It's
| only when you've compiled the thing 6 times with small
| corrections and gotten segfaults each time and the 7th
| time you finally find the place you weren't updating the
| pointer and you correct it, but this is the 7th error
| you've corrected without the segfault going away, so you
| don't really expect it to fix the problem, but then you
| run it and it's fixed!
|
| And then you get a job and the reality is that most of
| the jobs you're just writing CRUD apps and for a little
| while you can get some surprise out of learning the
| frameworks, but eventually you actually get really,
| really knowledgeable about the Postgres/Django/React
| stack and nothing surprises you any more, but _because_
| nothing surprises you any more, you're really effective
| and you start being able to bill the big bucks _but only
| for work on that stack_ because it takes time to struggle
| enough to get surprised, and the time that takes means
| your time is worth less to your clients. Money ruins
| everything. And if you don't do anything non-billable,
| it's easy to forget what programming felt like when you
| didn't know how your tools all worked inside and out. Not
| everyone takes this path but it's certainly the easiest
| path to take.
|
| I think for a lot of folks who have been doing this for a
| long time, the reason ML is so exciting is it's getting
| them back out of their comfort zone, and into a space
| where they can experience surprise again.
|
| But that surprise has always been available if you
| continue to find areas of programming that push you out
| of your comfort zone. For me it's been writing
| compilers/interpreters for programming languages.
| _Crafting Interpreters_ was awesome: for the first time I
| benchmarked a program written in _my_ language against a
| Python program, and _my_ program was faster: I never
| expected I'd be able to do that! More recently, I wrote
| a generational GC. It's... way too memory-intensive to be
| used in my language which uses one-GC-per-thread for
| potentially millions of threads, but it certainly was a
| surprise when that worked.
|
| Personally, I'm keeping track of ML enough to know broad
| strokes of things but I'm not getting my hands dirty with
| code until there are some giants to stand on the
| shoulders of. Those may already exist but it's not clear
| who they are yet. And I've got very little interest in
| plugging together opaque API components; I know how to
| make an API call. I want to write the model code and
| train it myself.
| incidentnormal wrote:
| I like how you've expressed this insight, and it is so
| true.
|
| Becoming great at a particular technology stack means
| modelling it in great detail in your head, so you can
| move through it without external assistance. But that
| leaves an arena without discovery, where you just
| reinforce the same synapses, leading to rigidity and an
| absence of awe.
| teruakohatu wrote:
| There is a little more to it than that. Abstractions in ML
| are very leaky.
| marban wrote:
| And repeat that ~4 times to make it look like LangChain
| guax wrote:
| And just like webdev, each of those were done in a
| different platform and require arcane incantations and 5h
| of doc perusing to make it work on your system.
| KaoruAoiShiho wrote:
| Buddy this ain't 2022 anymore, ask chatgpt (with a plugin
| that can read docs).
| MattPalmer1086 wrote:
| You can just ask GPT how to do it. Much like a lot of web
| dev!
| kerkeslager wrote:
| And the code GPT gives you won't work, much like a lot of
| web dev? ;P
| pc86 wrote:
| Maybe it's because of how I use it, but the code ChatGPT
| gives me has always been super helpful and 99% correct.
| But we have a policy at work not to use it for work
| product, so I have to spend time changing enough of it
| that it's different, and I'm never copy/pasting anything.
| I make enough changes to the structure and variables that
| it can't be considered pasting company data into GPT, ask
| my question(s), see what comes back out, refactor/type it
| manually into my IDE, and test. I'd say one out of every 8-9
| times I get something objectively wrong - a method that
| doesn't exist, something not compiling, etc. But it's
| faster than using google/DDG, especially with some
| prompting so that it just spits back code and not 5th-
| grade level explanatory paragraphs before and after. And
| well over half the time it does exactly what I need or
| sufficiently close that my initial refactoring step gets
| me the rest of the way.
| MattPalmer1086 wrote:
| Would you say that this satisfies the spirit of the
| company policy? Or is it a bit of a hack to get around
| it?
|
| I ask because we are about to produce a similar policy at
| work. We can see the advantages of it, but likewise, we
| can't have company data held in their systems.
| pc86 wrote:
| The policy is to not send any "sensitive company data"
| into ChatGPT, which I 100% agree with. How we implement a
| given Vue component or a particular API isn't sensitive
| or particularly novel so if I strip the business logic
| out I do honestly believe I'm complying with the spirit
| of the policy.
| SanderNL wrote:
| If I use it I also make sure it's something completely
| non-core business, like an arcane piece of sorting or
| ugly rxjs construction.
|
| I get the IP angst, but some companies think their
| GetGenericObjectFromDB() REST bs is secret sauce.
| intelVISA wrote:
| To the average VC a computer switching on is secret sauce
| enough, the rest is really just an implementation detail.
| Gravityloss wrote:
| At some point someone will make a service where you can
| let AI take over your computer directly. Easier that way!
| Curling straight to a shell, taken to the next level.
| wrayjustin wrote:
| So...AutoGPT? Now with command-line access! Have fun :)
|
| https://github.com/Significant-Gravitas/Auto-GPT/
| vorticalbox wrote:
| Found my next hobby project
| minimaxir wrote:
| This is why the Hugging Face transformer ecosystem is so
| good, as each of those blocks will roughly have the same
| unified API.
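|
| For instance (a rough sketch; the model ids here are just
| examples, not necessarily the ones you'd pick):
|
|     from transformers import pipeline
|
|     # two very different models, one calling convention
|     captioner = pipeline("image-to-text",
|                          model="Salesforce/blip2-opt-2.7b")
|     generator = pipeline("text-generation", model="gpt2")
|
|     caption = captioner("lobster.jpg")[0]["generated_text"]
|     answer = generator("Describe this dish: " + caption)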
| MrGilbert wrote:
| Just get rid of all the abbreviations in your mind - they
| seem to be very intimidating. I really liked the explanation
| that Stephen Wolfram did on ChatGPT:
|
| https://writings.stephenwolfram.com/2023/02/what-is-
| chatgpt-...
|
| Maybe someone has resources for understanding machine
| learning at an ELI5 level.
| wwalexander wrote:
| Wow, he waits until halfway through the article to mention
| A New Kind of Science. Usually he works it into the first
| couple of paragraphs!
| alchemist1e9 wrote:
| I know it's hard to believe, but I sense LLMs have
| slightly knocked his ego down and injected a small dose
| of humility.
|
| https://youtu.be/z5WZhCBRDpU
|
| I pick that up in the above video and also in the post above.
|
| Definitely healthy for him. Just to be clear, I'm a huge
| Wolfram fan and the ego doesn't really bother me; it's just
| part of who he is. However, I do find it nice that LLMs are
| making him self-reflect more than usual.
| HarHarVeryFunny wrote:
| Not a big Wolfram fan myself. I gave him the benefit of
| the doubt and bought "A New Kind of Science" (freakin'
| expensive when it first came out), and read the whole
| 1280 pages cover to cover ... Would have been better
| presented as a short blog post.
|
| I find it funny how despite being completely uninvolved
| in ChatGPT he felt the need to inject himself into the
| conversation and write a book about it. I guess it's the
| sort of important stuff that he felt an important person
| like himself should be educating the plebes on.
|
| Predictably he had no insight into it and will have left
| the plebes thinking it's something related to MNIST and
| cat-detection.
| MrGilbert wrote:
| I just happened to read this article of his, which I found
| easy to understand. I'm neither a huge proponent nor
| opponent of his work. Or, bluntly speaking:
| I don't know much else about his reputation in the
| community.
| pc86 wrote:
| Big "a monad is just a monoid in the category of
| endofunctors" vibes from this one.
| joaogui1 wrote:
| FWIW I work in LLMs and I consistently fail to do simple
| webdev stuff
| craigching wrote:
| I was where you're at about ... oh wow, it's been almost ten
| years since I jumped into machine learning. Mind you, I've
| been learning on the side most of this time, other than a
| theoretical class at the University of Minnesota. But that
| aside, and depending on where you're at in your
| understanding, this is a great resource for catching up if
| you're really interested: https://karpathy.ai/zero-to-hero.html
| It was posted on HN a couple of weeks ago, and I have to say
| it's a really good introduction; Andrej Karpathy is a
| passionate and excellent teacher. You may want to brush up on
| some intro calculus, but it's very understandable.
| thewarrior wrote:
| I've only been reading ML stuff for a few months and I kind
| of understand what it's saying. This stuff isn't as complex
| as it's made out to be.
|
| It's just a bunch of black boxes AKA "pure functions".
|
| BLIP2's ViT-L+Q-former AKA:
|
|     // I give you a picture of a plate of lobster,
|     // it will say "A plate of lobster".
|     getTextFromImage(image) -> Text
|
| Vicuna-13B AKA:
|
|     // I give you a prompt and you return a completion,
|     // ChatGPT style.
|     getCompletionFromPrompt(text) -> Text
|
| We want to take the output of the first one and then feed in
| a prompt to the LLM (Vicuna) that will help answer a question
| about the image. However, the datatypes don't match. Let's
| add in a mapper:
|
|     getAnswerToQuestion(image, question) -> answer
|         text = getTextFromImage(image)
|         prompt = mapTextToPrompt(text)
|         return getCompletionForPrompt(prompt)
|
| Now where did this mapTextToPrompt come from ?
|
| This is the magic of ML. We can just "learn" this function
| from data. And they plugged in a "simple" layer and learned
| it from a few examples of (image , question) -> answer. This
| is what frameworks like Keras, Pytorch allow you to do. You
| can wire up these black boxes with some intermediate layers
| and pass in a bunch of data and voila you have a new model.
| This is called differentiable programming.
|
| The thing is, you don't need to convert to text and then map
| back into numbers to feed into the LLM. You skip that, take
| the numbers it outputs, and multiply directly with an
| intermediate matrix:
|
|     getAnswerToQuestion(image, question) -> answer
|         embedding = getEmbeddingFromImage(image)
|         llm_embedding = mapEmbeddingToInputEmbeddingForLLM(embedding)
|         return getCompletionForEmbedding(llm_embedding)
|
| Congratulations, you now understand that sentence.
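|
| In (hypothetical) PyTorch shapes, that mapper really is just
| one linear layer gluing the two embedding spaces together
| (sizes below are illustrative, not the real MiniGPT-4 config):
|
|     import torch
|     import torch.nn as nn
|
|     image_feats = torch.randn(1, 32, 768)  # frozen Q-former output
|     proj = nn.Linear(768, 5120)            # the only trained piece
|     llm_inputs = proj(image_feats)         # now in the LLM's space
|
|     # llm_inputs can be prepended to the prompt's token
|     # embeddings and fed to the frozen LLM.
|     print(llm_inputs.shape)     # torch.Size([1, 32, 5120])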
| artificial wrote:
| Thank you for the insightful breakdown. Cheers!
| eternauta3k wrote:
| Interesting, so the LLM is "just" getting your question
| plus a normal text description of the image (as vectors)?
| thewarrior wrote:
| At a high level yes.
|
| More precisely, it gets the question plus the "description"
| of the image after that has been passed through a matrix
| which transforms it so the LLM can "understand" it.
|
| It maps from the space of one ML model to the other.
| visarga wrote:
| > take BLIP2's ViT-L+Q-former
|
| This thing takes an image and creates a representation
| matrix.
|
| > connect it to Vicuna-13B with a linear layer
|
| Vicuna is an open LLM, pretty good quality, not as good as
| GPT3.5 though.
|
| This is the beautiful part - a mere multiplication is enough
| to convert the image tensor to text tensor. One freaking line
| of code, and a simple one.
|
| > and train just the tiny layer on some datasets of image-
| text pairs
|
| You then get a shitload of image-text pairs and train the
| model to describe the images in text. But keep both the image
| and text model frozen. Is that hard? No, just flip a flag. So
| this "linear projection layer" (a matrix multiplication) is
| the only learned part. That means it takes less time to
| train, needs fewer examples and requires less memory.
|
| Training the image and text models was much more difficult.
| But here we don't train those models; we use them as ready-
| made parts. It's a hack on top of two unrelated models, so it
| is cheap.
|
| In the end, the finishing touches: they label 3,500 high-
| quality image-text pairs and fine-tune on them. Now the
| model becomes truly amazing. It has broad visual
| intelligence, and it scooped OpenAI, who haven't released
| GPT-4's image input in the API yet.
|
| The important lesson to take is that unrelated models can be
| composed together with a bit of extra training for the glue
| model. And that open AI is just as powerful as "Open"AI
| sometimes. It's breathing down their necks, just one step
| behind. This model is also significant for applications - it
| can power many automations in a flexible way.
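|
| A rough sketch of what "keep both frozen, train only the
| projection" amounts to, assuming PyTorch and an HF-style LLM
| that accepts inputs_embeds (the wrapper names and sizes here
| are hypothetical):
|
|     import torch
|     import torch.nn as nn
|
|     class Glue(nn.Module):
|         def __init__(self, image_encoder, llm,
|                      d_img=768, d_llm=5120):
|             super().__init__()
|             self.image_encoder = image_encoder
|             self.llm = llm
|             self.proj = nn.Linear(d_img, d_llm)  # only learned part
|
|             # "just flip a flag": freeze everything else
|             for p in self.image_encoder.parameters():
|                 p.requires_grad = False
|             for p in self.llm.parameters():
|                 p.requires_grad = False
|
|         def forward(self, image, prompt_embeds):
|             img = self.proj(self.image_encoder(image))  # trained
|             x = torch.cat([img, prompt_embeds], dim=1)
|             return self.llm(inputs_embeds=x)            # frozen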
| rafaelero wrote:
| > This is the beautiful part - a mere multiplication is
| enough to convert the image tensor to text tensor. One
| freaking line of code, and a simple one.
|
| I thought they were creating image tokens based on the
| queries during finetuning and appending them to the
| language model. They are not text tokens.
| relativeadv wrote:
| > like a dog trying to read Tolstoy
|
| this got a chuckle out loud from me. great visual.
| egeozcan wrote:
| This could be a great prompt to test the limits of txt2img
| models. The astronaut riding a horse got boring already :)
| mewpmewp2 wrote:
| Only because of big complicated sounding terms, that also
| exist in web dev.
| macawfish wrote:
| Arf!
| seydor wrote:
| Then it's an impressive demonstration of how modular neural
| networks can be. Maybe we don't even need to train monoliths
| artificial wrote:
| Maybe a distributed trainer? AI@Home?
| kramerger wrote:
| Can any of this realistically run on CPU at some point?
|
| (Not training obviously)
| tantony wrote:
| Someone is probably going to port it to llama.cpp soon.
| lhl wrote:
| Yes, you can run inference at decent speeds on CPU with
| llama.cpp. A token is about 0.75 words, so you can see lots
| of people getting 4-8 words/s on their CPUs:
| https://github.com/ggerganov/llama.cpp/issues/34
|
| There are a lot of optimizations that can be done. Here's one
| with potentially a 15X AVX speedup, for example:
| https://github.com/ggerganov/llama.cpp/pull/996
| anonzzzies wrote:
| It will do, probably quite soon. Many people are trying.
| GaggiX wrote:
| I've run LLaMA models on my CPU before; ViT-L and the
| Q-former are transformer models as well, so I can't see
| why they wouldn't run on a CPU.
| Ambix wrote:
| I'm developing a framework [1] in Golang with this goal in
| mind :) It successfully runs relatively big LLMs right now,
| and diffusion models will be the next step.
|
| [1] https://github.com/gotzmann/llama.go/
| muyuu wrote:
| quantized Vicuna runs ok-ish on my 16GB i7 laptop (onboard
| graphics) and the output is usable
|
| see this comparison: https://old.reddit.com/r/LocalLLaMA/comm
| ents/12ezcly/compari...
|
| these models quantised to 4-bit should run on CPU setups with
| 16GB of RAM + 16GB of swap (Linux), and perhaps other setups
| will run similarly
| HopenHeyHi wrote:
| Thanks for a useful comment.
|
| Do you reckon the 4bit quantized Vicuna just won't do here?
| https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-1...
|
| I think with this, everything OpenAI demonstrated ~5 weeks ago
| has been recreated by actually-open AI. Even if it runs much,
| much slower on prosumer hardware and with worse results, at
| least it has been de-magicked.
| p1esk wrote:
| The magic is in the quality of GPT-4 output. That hasn't been
| recreated yet.
| kolinko wrote:
| Open AI still hasn't exactly reached the level of GPT-3.5.
| GPT-4 is way ahead of anything else.
| 152334H wrote:
| It'll work! I just haven't touched any of the 4bit stuff
| myself, so I don't personally know how to add it. Great low-
| hanging fruit for anyone else to take on.
| chenzhekl wrote:
| It sounds like a BLIP2 with an extra linear layer for
| finetuning (or aligning the Q-former with a new LLM?). What
| makes it more powerful than BLIP2?
| 152334H wrote:
| It's better because
|
| 1. it's using vicuna as a base.
|
| 2. It has a pretty high quality fine-tuning dataset. I
| initially missed this, and it's a very important advantage.
|
| 3. (speculatively) it doesn't collapse to extremely short
| responses (which BLIP2 and other models trained on image-text
| caption pairs do) because of how small/simple the adapter is.
|
| I was interested in training a BLIP2-LLaMA model before this,
| and I might still do it just to test (3).
| matthewfcarlson wrote:
| >so hopefully by tomorrow it'll be runnable by 3090/4090 users.
|
| Taking a step back, this is just a wild statement. I know
| there's some doom and gloom out there, but in certain aspects,
| it's an awesome time to be alive.
| cs702 wrote:
| Indeed, really simple. And yes, the results are shockingly
| good. But what I find most remarkable about this is that the
| ViT-L+Q-former's hidden states _are related by only a linear
| projection_ (plus bias) to the Vicuna-13B's token embeddings:
|
|     emb_in_vicuna_space = emb_in_qformer_space @ W + B
|
| These two models are trained independently of each other, on
| very different data (RGB images vs integer token ids
| representing subwords), and yet somehow they learn to embed
| different data in feature vectors that are so... _similar_. WHY
| should that be the case?
|
| It suggests to me there may be something _universal_ about the
| embedding layers and hidden states of _all_ trained deep
| learning models.
| sigmoid10 wrote:
| >somehow they learn to embed different data in feature
| vectors that are so... similar
|
| At its core, BLIP2 already projects RGB inputs into text
| token space and Vicuna (or rather LLaMA) uses such tokens as
| inputs as well as outputs. The only reason why a linear layer
| is needed at all is because they are not trained at the same
| time, so you still have to move text embeddings from one
| space to another. But it should not be surprising at all that
| one hidden linear layer suffices to do just that (see the
| universal approximation theorem [1]). This approach is just
| an efficient way to combine different models for downstream
| fine-tuning tasks while keeping their weights frozen, but it
| is neither new nor particularly surprising.
|
| [1] https://en.wikipedia.org/wiki/Universal_approximation_the
| ore...
| cs702 wrote:
| Thanks. Your comment about BLIP2 already projecting RGB
| inputs into (a different) text token space makes sense to
| me. See also fpgaminer's comment at
| https://news.ycombinator.com/item?id=35603246 . However, I
| don't see how the universal approximation theorem is
| relevant here. The fact that deep models with sufficient
| capacity can approximate any function does not imply that
| two deep models trained independently of each other on
| different tasks will learn to approximate functions that
| relate to each other only by a linear transformation.
| sigmoid10 wrote:
| >I don't see how the universal approximation theorem is
| relevant here. The fact that deep models
|
| The universal approximation theorem is exactly _not_ about
| deep models. Deep means many layers. But in the simplest
| (and proven) case, a single hidden-layer perceptron is all
| that's needed according to the UAT. Technically it also
| needs a nonlinear activation function, but you get all
| sorts of nonlinearities for free downstream anyway in
| this particular model.
| fpgaminer wrote:
| BLIP2 is a contrastive Image-Language model. The embeddings
| from the BLIP2 image model are already both aligned with
| text, and linear. It should not be a surprise that only a
| projection is required to translate it to LLaMA's embedding
| space.
| cs702 wrote:
| This is the best answer. It makes sense to me. Thank you
| :-)
| AbrahamParangi wrote:
| I think it's just that affine transforms in high dimensions
| are surprisingly expressive. Since the functions are sparsely
| defined they're much less constrained compared to the low
| dimensional affine transformations we usually think of.
| cs702 wrote:
| Good point. Didn't think of that. It's a plausible
| explanation here, because the dimensionality of the spaces
| is so different, 5120 vs 768. Not surprisingly, the trained
| weight matrix has rank 768: it's using every feature in the
| lower-dimensional space.
|
| Still, it's kind of shocking that it works so well!
|
| I'd be curious to see if the learned weight matrix ends up
| being full-rank (or close to full-rank) if both spaces have
| the same dimensionality.
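|
| (As a quick illustration, with the sizes above and random
| weights standing in for the trained ones, the rank can't
| exceed the smaller dimension:)
|
|     import numpy as np
|
|     W = np.random.randn(768, 5120)    # 768-d space -> 5120-d space
|     print(np.linalg.matrix_rank(W))   # 768, the smaller dimension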
| visarga wrote:
| It would be full-rank because all of the embedding space
| is used. There are no large unused pockets.
| cs702 wrote:
| The weight matrix's rank would decrease for each feature
| in the target space that cannot be expressed as a
| linear combination of features in the input space (plus a
| bias). For example, if the target space has a feature
| representing a non-visual quality like "smelliness," it
| would not be expressible as a linear combination of
| features representing visual attributes like "redness,"
| "blueness," and "greenness," etc. in the input space.
|
| If both spaces have the same dimensionality, the learned
| weight matrix would be full-rank _only_ if every feature
| in the target space is expressible as a linear
| combination of features in the input space (plus a bias).
| Which brings me back to my original question: WHY would
| that be the case when the two models are trained
| independently on data that is so different?
| sdenton4 wrote:
| A random nxn matrix is full rank... So it's kinda the
| default: any amount of noise in the embedding is going to
| result in full-rank transformations.
|
| So it's really less-than-full rank which would require an
| explanation - ie, why does this image representation
| project into this perfectly isolated subspace of the
| language representation (or vice versa)?
|
| If that happened I would start looking for things like a
| vocabulary of smell which is completely distinct and non-
| overlapping with any visual context. But we use cross-
| modal analogies in language /constantly/ (many smells are
| associated with things we can see - 'smells like a rose')
| so you wouldn't expect any clean separations for
| different modalities... Maybe there's some branch of
| analytic philosophy which has managed to completely
| divorce itself from the physical world...
| eurekin wrote:
| > I created a PR on the repo to load it with int8, so hopefully
| by tomorrow it'll be runnable by 3090/4090 users.
|
| How about 2x3090? Can it be run on multiple gpus?
| collaborative wrote:
| With fp8, would 4GB be enough or is 6GB more like it?
| jxy wrote:
| This opens up huge possibilities. It's likely we could simply
| plug in stable diffusion using a linear layer. As well as
| whisper and some TTS. Getting a back to back mixed
| image/sound/text engine running on a laptop.
|
| I wonder if there's a powerful enough ViT model that does OCR.
| m00dy wrote:
| Hi CHEN,
|
| Invoker here, I would like to have a chat or send me an email @
| community@invoker.network
| endorphine wrote:
| DotA?
| m00dy wrote:
| I'm way better than him ;)
| xnzakg wrote:
| The ramen example is kind of hilarious. Wonder if it would make
| more sense with a bigger model.
| m348e912 wrote:
| Ok I'll bite. What makes the ramen example so funny?
| burkaman wrote:
| The recipe says to make a broth, cook the noodles in the
| broth, then throw the broth away and add the noodles to a pan
| with meat. You'll end up with burned crispy noodles and meat
| in a dry bowl.
| Thomashuet wrote:
| The recipe it suggests is completely off. Just search for
| "ramen recipe" and compare. For instance it tells you to add
| soy sauce, sesame oil and green onions to the broth (so far
| so good) but then tells you to throw all of it away ("once
| the noodles are cooked, drain them"). The problem here is
| that you should cook the noodles separately in plain water,
| not a broth with soy sauce.
| thatsadude wrote:
| How is this a paper? They simply fine-tune the fully-connected
| layer of BLIP-2: https://arxiv.org/pdf/2301.12597.pdf
| nobrains wrote:
| I think the real news is where this research is coming from. It
| is from King Abdullah University of Science and Technology in
| Saudi Arabia. The Arab world has recently been quite behind in
| tech research, but this is good news for the region: it will
| inspire other universities and researchers there to one-up their
| colleagues, which is good for the entire world.
| sterlind wrote:
| I would be more thrilled if it were literally any other country
| in the Arab world, but maybe that will come.
| Imnimo wrote:
| I'm pretty surprised that the frozen encoder has enough
| information in its representations to do the "design the website
| from a hand-written whiteboard mock-up" task, and that that
| information can be effectively accessed by the language model
| with just a small projection layer. I really would have guessed
| that you needed to be training the image encoder simultaneously.
| devinprater wrote:
| Holy crap! If this has image to text, and can run on a CPU, even
| with 16 GB RAM, one could hook up an NVDA addon, straight Python
| so it shouldn't be too hard, and have AI read whatever the user
| wants. Pictures, video frames, UI's, and if it can break the UI
| down into something the screen reader can understand, boom!
| m3kw9 wrote:
| This is basically Image to Text, then from the text, start a GPT
| type chat. No need to overthink anything
___________________________________________________________________
(page generated 2023-04-17 23:00 UTC)