[HN Gopher] Claude 3 Haiku: our fastest model yet
       ___________________________________________________________________
        
       Claude 3 Haiku: our fastest model yet
        
       Author : minimaxir
       Score  : 31 points
       Date   : 2024-03-13 20:59 UTC (2 hours ago)
        
 (HTM) web link (www.anthropic.com)
 (TXT) w3m dump (www.anthropic.com)
        
       | minimaxir wrote:
        | This is a new announcement, separate from last week's Claude 3
        | announcement:
       | https://news.ycombinator.com/item?id=39590666
       | 
        | Specifically, the smallest (and likely the most popular) model
        | is now available when it wasn't then. (The model ID is
        | claude-3-haiku-20240307.) Notably, this is also a cheap model
        | that supports image input, although per the documentation you
        | can only provide 20 images at a time, which won't work for
        | video input.
       | 
       | Testing around image inputs in the web Workbench, it's
       | surprisingly good for the price.
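A minimal sketch of how that 20-image cap might be enforced client-side when building a request. The payload shape follows Anthropic's Messages API image content blocks; the helper name and the sample bytes are hypothetical, and this only constructs the message rather than calling the API:

```python
# Sketch: build a Messages API user message for claude-3-haiku-20240307,
# enforcing the documented 20-images-per-request limit locally.
# build_image_message and the sample data are illustrative, not an SDK API.
import base64

MAX_IMAGES_PER_REQUEST = 20  # per Anthropic's documentation


def build_image_message(prompt: str, jpeg_blobs: list[bytes]) -> dict:
    if len(jpeg_blobs) > MAX_IMAGES_PER_REQUEST:
        raise ValueError(
            f"Got {len(jpeg_blobs)} images; the API accepts at most "
            f"{MAX_IMAGES_PER_REQUEST} per request"
        )
    # One base64 image block per image, followed by the text prompt.
    content = [
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/jpeg",
                "data": base64.b64encode(blob).decode("ascii"),
            },
        }
        for blob in jpeg_blobs
    ]
    content.append({"type": "text", "text": prompt})
    return {"role": "user", "content": content}


msg = build_image_message("Describe these frames", [b"\xff\xd8fake-jpeg"] * 3)
print(len(msg["content"]))  # 3 image blocks + 1 text block -> 4
```

The returned dict would go into the `messages` list of a `messages.create` call; anything over 20 images has to be batched across separate requests, which is why frame-by-frame video input doesn't fit.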
        
       | simonw wrote:
       | Pricing: $0.25/million tokens of input, $1.25/million of output
       | 
       | GPT-3.5 Turbo is $0.50/$1.50
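A quick back-of-envelope comparison of those quoted prices. The 10,000-in / 1,000-out token workload is just an illustrative assumption:

```python
# Compare the per-million-token prices quoted above on an illustrative
# workload of 10,000 input tokens and 1,000 output tokens per request.
PRICES = {  # model: (input $/M tokens, output $/M tokens)
    "claude-3-haiku": (0.25, 1.25),
    "gpt-3.5-turbo": (0.50, 1.50),
}


def request_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one request at the quoted per-million-token rates."""
    p_in, p_out = PRICES[model]
    return (tokens_in * p_in + tokens_out * p_out) / 1_000_000


haiku = request_cost("claude-3-haiku", 10_000, 1_000)
gpt35 = request_cost("gpt-3.5-turbo", 10_000, 1_000)
print(f"haiku ${haiku:.5f} vs gpt-3.5-turbo ${gpt35:.5f}")
# -> haiku $0.00375 vs gpt-3.5-turbo $0.00650
```

On this input-heavy mix Haiku works out to roughly 58% of GPT-3.5 Turbo's cost per request.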
       | 
       | I've updated the Claude 3 plugin for my LLM CLI tool to support
        | the new model: https://github.com/simonw/llm-claude-3/releases/tag/0.3
        | 
        |     pipx install llm
        |     llm install llm-claude-3
        |     llm keys set claude
        |     # Paste Anthropic API key here
        |     llm -m claude-3-haiku 'Fun facts about armadillos'
       | 
       | It's pretty fast! Animated GIF here:
       | https://github.com/simonw/llm-claude-3/issues/3#issuecomment...
        
         | BoorishBears wrote:
         | Seeing it be pretty slow in production with long prompts:
         | 
          | - 10-15 seconds for 400 tokens out with 4,000-10,000 tokens
          | in
         | 
         | - 6-8 seconds when using Claude Instant for the same prompts
         | 
         | Hoping it's just a rush at launch.
        
         | porphyra wrote:
          | For comparison, Groq [1] has (price per million tokens,
          | input/output):
          | 
          |     Llama 2 70B       (4096 context)  ~300 tokens/s  $0.70/$0.80
          |     Llama 2 7B        (2048 context)  ~750 tokens/s  $0.10/$0.10
          |     Mixtral 8x7B SMoE (32K context)   ~480 tokens/s  $0.27/$0.27
          |     Gemma 7B          (8K context)    ~820 tokens/s  $0.10/$0.10
         | 
         | [1] https://wow.groq.com/
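Converting those advertised decode speeds into wall-clock time for a fixed completion makes them easier to compare with the production latencies mentioned earlier in the thread. This ignores prompt processing and queueing, so it is a best-case floor, not a prediction:

```python
# Turn advertised decode throughput (tokens/s) into time to generate a
# fixed-length completion. Speeds are the figures quoted above; the
# 400-token completion mirrors the workload mentioned upthread.
SPEEDS_TOK_PER_S = {
    "Llama 2 70B": 300,
    "Llama 2 7B": 750,
    "Mixtral 8x7B": 480,
    "Gemma 7B": 820,
}


def decode_seconds(tokens_out: int, tok_per_s: float) -> float:
    """Best-case decode time, ignoring prompt processing and queueing."""
    return tokens_out / tok_per_s


for model, speed in SPEEDS_TOK_PER_S.items():
    print(f"{model}: {decode_seconds(400, speed):.2f}s for 400 tokens")
```

Even the slowest entry would decode 400 tokens in well under two seconds, which is why queueing, not raw throughput, dominates the perceived latency complaints below.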
        
           | BoorishBears wrote:
            | And zero capacity. Groq is coming across as a total paper
            | tiger: no billing, unusable rate limits, and, most
            | importantly, a request queue that makes it dramatically
            | slower than any other option.
            | 
            | They say they're just waiting on implementing billing, but
            | at this point it reads more like "we wouldn't be able to
            | meet the demand".
           | 
           | -
           | 
            | Groq is going through all that to offer a theoretical 500
            | tokens/s, while I'm seeing Fireworks.ai come in at 300+
            | tokens/s in actual production use.
        
       | gabev wrote:
        | This is fantastic! We just moved so many of our old Claude
        | Instant calls to Haiku and the results are great.
       | 
        | Zenfetch is now primarily powered by the Claude 3 family of
        | models :O https://www.zenfetch.com
        
       | josh-sematic wrote:
       | You can play with Haiku for free at
       | https://app.airtrain.ai/playground
        
       | ldjkfkdsjnv wrote:
        | Is Anthropic moving faster than OpenAI? Or is OpenAI working on
        | something so big that they aren't worried about being outpaced?
        | Regardless, I feel like I'm watching history in real time.
        
         | svdr wrote:
          | I guess they must have put some time into Sora?
        
         | a_wild_dandan wrote:
         | OpenAI are presently training toward GPT-5, with periodic
         | GPT-4.x releases planned. These models take immense training
         | resources, red teaming, etc. It'll be a hot minute before you
         | see GPT-4.5, etc.
        
       | devnullbrain wrote:
       | SMS verification seems to be broken. Nothing reported on the
       | status page.
       | 
       | https://status.anthropic.com/
        
       | GaggiX wrote:
        | The multilingual capabilities of the Claude 3 models are
        | incredible. Even the smallest model, Haiku, is fluent in
        | Georgian, a language that even GPT-4 can't speak without making
        | a huge number of mistakes.
        
         | brcmthrowaway wrote:
          | What are their training sources for this?
        
       | pedalpete wrote:
       | Is there a good reason why they are comparing Claude 3 with
       | ChatGPT 3.5 instead of 4? Does anybody really even care about/use
       | Gemini?
        
         | bryanlarsen wrote:
         | AFAICT Claude 3 Haiku is supposed to be cheap/fast so compares
         | with ChatGPT 3.5, and Claude 3 Ultra is supposed to be the best
         | so compares with ChatGPT 4.
        
           | pants2 wrote:
           | You mean Claude 3 Opus, but yes.
        
       ___________________________________________________________________
       (page generated 2024-03-13 23:01 UTC)