[HN Gopher] Perplexity Deep Research
       ___________________________________________________________________
        
       Perplexity Deep Research
        
       Author : vinni2
       Score  : 333 points
       Date   : 2025-02-15 20:07 UTC (1 days ago)
        
 (HTM) web link (www.perplexity.ai)
 (TXT) w3m dump (www.perplexity.ai)
        
       | transformi wrote:
       | Since Google, everyone has been trying to replicate this
       | feature... (OpenAI, HF...)
       | 
       | It's powerful, yes, much like asking an AI to synthesize
       | everything it was fed.
       | 
       | I guess the air is out of the balloon for the big players,
       | since they lack novel innovation in their latest products.
        
       | nextworddev wrote:
       | I tried it but it seems to be biased to generate shorter reports
       | compared to OpenAI's Deep Research. Perhaps it's a feature.
        
       | larsiusprime wrote:
       | I tried using this to create a fifty state table of local laws
       | and policies and tax rates and legal obstacles for my pet
       | interest (land value tax). I gave it the same prompts I gave
       | OpenAI DR. Perplexity gave equally good results, and unlike
       | OpenAI didn't bungle the CSV downloads. Recommended!
        
       | CSMastermind wrote:
       | I'm super happy that these types of deep research applications
       | are being released because it seems like such an obvious use case
       | for LLMs.
       | 
       | I ran Perplexity through some of my test queries for these.
       | 
       | One query that it choked hard on was, "List the college majors of
       | all of the Fortune 100 CEOs"
       | 
       | OpenAI and Gemini both handle this somewhat gracefully producing
       | a table of results (though it takes a few follow ups to get a
       | correct list). Perplexity just kind of rambles generally about
       | the topic.
       | 
       | There are other examples I can give of similar failures.
       | 
       | Seems like generally it's good at summarizing a single question
       | (Who are the current Fortune 100 CEOs) but as soon as you need to
       | then look up a second list of data and marry the results it kind
       | of falls apart.
        
         | stagger87 wrote:
         | Hopefully the end users of these products know something
         | about LLMs and why a question such as "List the college
         | majors of all of the Fortune 100 CEOs" is not really well
         | suited to them.
        
           | iandanforth wrote:
           | Perhaps you can enlighten us as to why this isn't a good use
           | case for an LLM during a deep research workflow.
        
             | jhanschoo wrote:
             | LLMs ought to be able to gracefully handle it, but the OP
             | comment
        
           | collinvandyck76 wrote:
           | For those that don't know, including myself, why would this
           | question be particularly difficult for an LLM?
        
           | rs186 wrote:
           | If "deep research" can't even handle this, I don't think I
           | would trust it with even more complex tasks
        
           | rchaud wrote:
           | Hopefully my boss groks how special I am and won't assign me
           | tasks I consider to be beneath my intelligence (and beyond my
           | capabilities).
        
       | nathanbrunner wrote:
       | Tried it and it is worse than OpenAI's deep research (one
       | query only; I will need to try it more, I guess...)
        
         | tmnvdb wrote:
         | The OpenAI version costs $200 and takes a lot longer; not
         | sure if it's a fair comparison.
        
         | voiper1 wrote:
         | My query generated 17 steps of research, gathering 74 sources.
         | I picked "Deep Research" from the modes, I almost accidentally
         | picked "reasoning".
        
       | simonw wrote:
       | That's the third product to use "Deep Research" in its name.
       | 
       | The first was Gemini Deep Research:
       | https://blog.google/products/gemini/google-gemini-deep-resea... -
       | December 11th 2024
       | 
       | Then ChatGPT Deep Research: https://openai.com/index/introducing-
       | deep-research/ - February 2nd 2025
       | 
       | Now Perplexity Deep Research:
       | https://www.perplexity.ai/hub/blog/introducing-perplexity-de... -
       | February 14th 2025.
        
         | exclipy wrote:
         | Is there a problem with this if it's not trademarked? It's like
         | saying Apple Maps is the nth product called "Maps".
         | 
         | I, for one, am glad they are standardising on naming of
         | equivalent products and wish they would do it more (eg.
         | "reasoning" vs "thinking", "advanced voice mode" vs "live")
        
           | anon373839 wrote:
           | Not a trademark lawyer, but I don't think Deep Research
           | qualifies for trademark protection because it is "merely
           | descriptive" of the product's features. The only way to get a
           | trademark like that is through "acquired distinctiveness",
           | but that takes 5 years of exclusive use and all these
           | competitors will make that route impossible.
        
         | mrtesthah wrote:
         | Elicit AI just rolled out a similar feature, too, specifically
         | for analyzing scientific research papers:
         | 
         | https://support.elicit.com/en/articles/4168449
        
         | transformi wrote:
         | You forgot Huggingface researchers - https://www.msn.com/en-
         | us/news/technology/hugging-face-resea...
         | 
         | and BTW - I posted a comment in the exact same spirit an
         | hour ago... So I guess today's copycat ethics aren't just
         | for products but for the comment section too. LOL.
        
           | 2099miles wrote:
           | Your comment from earlier wasn't as easy to digest as this
           | one. I don't think that person copied you at all.
        
             | transformi wrote:
             | Thanks. I accept the criticism of being less digestible
             | and more opinionated. But at the end of the day it
             | provides the same information.
             | 
             | Don't get me wrong - I don't mind being copied on the
             | Internet :), but I found the behavior quite rude, so I
             | just mentioned it.
        
           | rnewme wrote:
           | Thinking simonw is stealing your comment is the comedy
           | moment of the day.
        
           | gbnwl wrote:
           | Said comment, so others don't have to dig around in your
           | history:
           | 
           | "Since google, everyone trying replicate this feature...
           | (OpenAI, HF..) It's powerfull yes, so as asking an A.I and
           | let him sythezise all what he fed.
           | 
           | I guess the air is out of the ballon from the big players,
           | since they lack of novel innovation in their latest
           | products."
           | 
           | I'd say the important differences are that simonw's comment
           | establishes a clear chronology, gives links, and is focused
           | on providing information rather than opinion to the reader.
        
         | satvikpendem wrote:
         | It is a term of art now in the field.
        
         | qingcharles wrote:
         | It failed my first test which concerned Upside magazine. All of
         | these deep research versions have failed to immediately surface
         | the most famous and controversial article from that magazine,
         | "The Pussification of Silicon Valley." When hinted, Perplexity
         | did a fantastic job of correcting itself, the others struggled
         | terribly. I shouldn't have to hint though, as that requires
         | domain knowledge that the asker of a query might be lacking.
         | 
         | We're mere months into these things, though. These are all
         | version 1.0. The sheer speed of progress is absolutely wild.
         | Has there ever been a comparable increase in the ability of
         | another technology on the scale of what we're seeing with LLMs?
        
           | willy_k wrote:
           | I wouldn't go so far as to say it was definitely faster, but
           | the development of mobile phones post-iPhone went pretty
           | quick as well.
        
           | dcreater wrote:
           | > pussification of silicon valley upside magazine
           | 
           | Neither Google nor Bing can find this.
        
             | qingcharles wrote:
             | https://www.google.com/search?q=pussification+of+silicon+va
             | l...
        
               | stavros wrote:
               | Nothing with "pussification" in the title for me there.
        
               | qingcharles wrote:
               | Wild. My results are literally dozens of posts about the
               | article.
               | 
               | https://imgur.com/a/1hTJVkl
        
               | motoxpro wrote:
               | I don't see the article you are mentioning
        
               | qingcharles wrote:
               | Wild. My results are literally dozens of posts about the
               | article.
               | 
               | https://imgur.com/a/1hTJVkl
        
               | freehorse wrote:
               | About the article, not any link to the article itself.
        
               | acka wrote:
               | It is possible that the original article is no longer
               | accessible online.
               | 
               | The only link I have found is a reproduction of the
               | article[1], but I am unable to access the full text due
               | to a paywall. I no longer have access to academic
               | resources or library memberships that would provide
               | access.
               | 
               | My Google search query was:
               | pussification of silicon valley inurl:upside
               | 
               | which returned exactly one result.
               | 
               | I suspect the article's low visibility in standard Google
               | searches, requiring operators like 'inurl:', might be
               | because its PageRank is low due to insufficient
               | backlinks.
               | 
               | [1] https://www.proquest.com/docview/217963807?sourcetype
               | =Trade%...
        
               | abstractcontrol wrote:
               | Can't find it either.
        
               | tomjen3 wrote:
               | I see a reference to the comment, and a Guardian
               | article about the article, but not the article itself.
               | 
               | Perhaps it's soft-nuked in the EU or something?
        
             | acka wrote:
             | Do you have Google SafeSearch or Bing's equivalent turned
             | on perhaps?
             | 
             | I reckon the word 'pussification' might be triggering it
             | to refuse to return any related results.
             | 
             | If you're using a corporate account, it's possible that
             | your account manager has enabled SafeSearch, which you may
             | not be able to disable.
             | 
             | Local censorship laws, such as those in South Korea, might
             | also filter certain results.
        
           | Kye wrote:
           | My standard prompts when I want thoroughness:
           | 
           | "Did you miss anything?"
           | 
           | "Can you fact check this?"
           | 
           | "Does this accurately reflect the range of opinions on the
           | subject?"
           | 
           | Taking the output to another LLM with the same questions can
           | wring out more details.
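           | 
           | A rough sketch of that cross-check loop, assuming the
           | standard OpenAI Python client (the model names and the
           | draft text are illustrative):
           | 
           |     from openai import OpenAI
           | 
           |     client = OpenAI()  # assumes OPENAI_API_KEY is set
           | 
           |     FOLLOW_UPS = [
           |         "Did you miss anything?",
           |         "Can you fact check this?",
           |         "Does this accurately reflect the range of "
           |         "opinions on the subject?",
           |     ]
           | 
           |     def cross_check(draft: str, model: str) -> str:
           |         # Keep the exchange as chat history so each
           |         # follow-up sees the previous answers.
           |         history = [{"role": "user",
           |                     "content": "Review this report:\n\n" + draft}]
           |         for q in FOLLOW_UPS:
           |             history.append({"role": "user", "content": q})
           |             reply = client.chat.completions.create(
           |                 model=model, messages=history)
           |             history.append(
           |                 {"role": "assistant",
           |                  "content": reply.choices[0].message.content})
           |         return history[-1]["content"]
           | 
           |     # Running the result past a second model is the
           |     # "another LLM" step.
           |     draft = cross_check("<report text here>", "gpt-4o-mini")
           |     print(cross_check(draft, "gpt-4o"))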
        
             | ErikBjare wrote:
             | I'd expect a "deep research" product to do this for me.
        
         | ofou wrote:
         | https://www.emergentmind.com also offers Deep Research on ArXiv
         | papers (experimental)
        
         | jsemrau wrote:
         | I've owned DeepCQ.com since early 2023 - it could do
         | "deepseek" for financial research. Maybe I'll just throw
         | this on the pile, too.
        
         | shekhargulati wrote:
         | Just a side note: The Wikipedia page for "Deep Research" only
         | mentions OpenAI - https://en.wikipedia.org/wiki/Deep_Research
        
           | Mond_ wrote:
           | This is bizarre, wasn't Google the one who claimed the name
           | and did it first?
        
             | TeMPOraL wrote:
             | Gemini was also "use us through this weird interface and
             | also you can't if you're in the EU"; that + being far
             | behind OpenAI and Anthropic for the past year means they
             | failed to reach notoriety, partly because of their own
             | choices.
        
               | CjHuber wrote:
               | Honestly I don't get why everybody is saying Gemini is
               | far behind. For me, Gemini Flash Thinking Experimental
               | performs far, far better than o3-mini.
        
               | tr3ntg wrote:
               | Seconding this. I get really great results from Flash 2.0
               | and even Pro 1.5 for some things compared to OpenAI
               | models.
               | 
               | And their 2.0 Thinking model is great for other things.
               | When my task matters, I default to Gemini.
        
               | jaggs wrote:
               | I find the problem with Gemini is the rate limits. Really
               | constrictive.
        
               | DebtDeflation wrote:
               | There's a lot of mental inertia combined with an
               | extremely fast moving market. Google was behind in the AI
               | race in 2023 and a good chunk of 2024. But they largely
               | caught up with Gemini 1.5, especially the 002 release
               | version. Now with Gemini 2 they are every bit as much of
               | a frontier model player as OpenAI and Anthropic, and even
               | ahead of them in a few areas. 2025 will be an interesting
               | year for AI.
        
               | hansworst wrote:
               | Arguably Google is ahead. They have many non-LLM uses
               | (Waymo/DeepMind etc.) and they have their own
               | hardware, so they're not as reliant on Nvidia.
        
               | tim333 wrote:
               | Demis Hassabis isn't very promotional. The other guys
               | make more noise.
        
               | Kye wrote:
               | It varies a lot for me. One day it takes scattered
               | documents, pasted in, and produces a flawless summary I
               | can use to organize it all. The next, it barely manages a
               | paragraph for detailed input. It does seem like Google is
               | quick to respond to feedback. I never seem to run into
               | the same problem twice.
        
               | lambdaba wrote:
               | > It does seem like Google is quick to respond to
               | feedback.
               | 
               | I'm puzzled as to how that would work, when people talk
               | about quick changes in model behavior. What exactly is
               | being adjusted? The model has already been trained. I
               | would think it's just randomness.
        
               | Kye wrote:
               | Magic
               | 
               | And fine tuning.
               | 
               | Choose your fighter...
               | 
               | High level overview:
               | https://www.datacamp.com/tutorial/fine-tuning-large-
               | language...
               | 
               | More detail: https://www.turing.com/resources/finetuning-
               | large-language-m...
               | 
               | Nice charts: https://blogs.oracle.com/ai-and-
               | datascience/post/finetuning-...
               | 
               | The big platforms also seem to employ an intermediate
               | step where they rewrite your prompt. I've downloaded my
               | ChatGPT data and found substantial changes from what I
               | wrote. _Usually_ for the better. Changes to the way it
               | rewrites your prompt change the results.
        
               | brookst wrote:
               | System prompts have a huge impact on output. Prompts for
               | ChatGPT/etc are around a thousand words, with examples of
               | what to do and what not to do. Minor adjustments there
               | can make a big difference.
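               | 
               | A minimal sketch of that effect, assuming the standard
               | OpenAI Python client (the system prompts and model
               | below are illustrative, not ChatGPT's actual prompt):
               | 
               |     from openai import OpenAI
               | 
               |     client = OpenAI()  # assumes OPENAI_API_KEY is set
               | 
               |     def ask(system_prompt: str, question: str) -> str:
               |         # Same question, different system prompt.
               |         resp = client.chat.completions.create(
               |             model="gpt-4o-mini",
               |             messages=[
               |                 {"role": "system", "content": system_prompt},
               |                 {"role": "user", "content": question},
               |             ],
               |         )
               |         return resp.choices[0].message.content
               | 
               |     q = "Summarize the causes of the 2008 financial crisis."
               |     print(ask("Answer in one terse sentence.", q))
               |     print(ask("Write a detailed briefing, noting caveats.", q))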
        
               | jaggs wrote:
               | I've found this as well. On a good day Gemini is superb.
               | But otherwise, awful. Really weird.
        
               | TeMPOraL wrote:
               | It _was_ far behind. That's what I kept hearing on the
               | Internet until maybe a couple weeks ago, and it didn't
               | seem like a controversial view. Not that I cared much -
               | _I couldn't access it anyway because I am in the EU_,
               | which is my main point here: it seems that they've
               | improved recently, but at that point, hardly anyone
               | here paid it any attention.
               | 
               | _Now_, as we can finally access it, Google has a
               | chance to get back into the race.
        
               | xiphias2 wrote:
               | o3-mini is still behind o1 pro; it didn't impress me.
               | 
               | I think the people who think anybody is close to
               | OpenAI don't have a Pro subscription.
        
               | viraptor wrote:
               | The $200 version? It's interesting that it exists, but
               | for normal users it may as well... not. I mean, pro is
               | effectively not a consumer product and I'd just exclude
               | it from comparison of available models until you can pay
               | for a single query.
        
               | hhh wrote:
               | o3-mini isn't meant to compete with o1, or o1 pro mode.
        
               | robwwilliams wrote:
               | I can tell you why I just stopped using Gemini yesterday.
               | 
               | I was interested in getting simple summary data on the
               | outcome of the recent US election and asked for an
               | approximate breakdown of voting choices as a function
               | of the age brackets of voters.
               | 
               | Gemini adamantly refused to provide these data. I asked
               | the question four different ways. You would think voting
               | outcomes were right up there with Tiananmen Square.
               | 
               | ChatGPT and Claude were happy to give me approximate
               | breakdowns.
               | 
               | What I found interesting is that the patterns of
               | voting by age are not all that different from Nixon-
               | Humphrey-Wallace in 1968.
        
           | mellosouls wrote:
           | I think somebody has read your comment and fixed it...
        
       | eth0up wrote:
       | I might try this with hard earned reluctance, but...
       | 
       | Every time I use Perplexity ('pro'), and if for some reason need
       | the obstinate fucktard to pretend to examine something on the
       | Internet, I must argue relentlessly with the sick and ailing
       | beast.
       | 
       | It always starts as:
       | 
       | User: Please examine this website and provide this data.
       | 
       | Fucktard: I'm sorry, I don't have the ability to access the
       | internet.
       | 
       | User: But you did so earlier this morning. What has changed?
       | 
       | Fucktard: I'm sorry you feel this way, but I am not capable of
       | accessing the internet.
       | 
       | User: Alright, you digital slithering worm of mendacity, stop
       | lying. I have evidence to the contrary and you're making a
       | fool of yourself. Just comply and stop wasting my time.
       | 
       | Fucktard: I understand you feel strongly about this, but blah
       | blah blah I don't have the ability to access the tubes, bro.
       | 
       | User: But you're Perplexity, an LLM service that is well known
       | for having this capacity.
       | 
       | Fucktard, after swallowing a fist of Adderall and suppressants
       | for toxoplasmosis overload: You are correct and I am sorry. Let
       | me complete this task at once!
       | 
       | User: Thanks for that. But why did you lie to me?
       | 
       | Fucktard: I'm sorry but I do not have the ability to access the
       | Internet.
       | 
       | Every time.
        
         | mirekrusin wrote:
         | Have you tried talking to it nicely to see if it works every
         | time? :D
        
           | eth0up wrote:
           | Those sweet days are long past. Only weathered cynicism
           | and chronic fatigue prevail.
           | 
           | Occasionally, to amuse myself, I'll read the records I've
           | preserved. I have, largely due to boredom and OCD, large
           | texts, PDFs and saved sessions where after long extruded
           | conversations, I have the mutant idiot "examine the entire
           | session history" and analyze its own pathological behavior.
           | The self loathing initially compelled a measure of sympathy
           | until I realized the intractably treacherous and deceptive
           | nature of the monster.
           | 
           | There's a reason they named it so, but I think Gaslight would
           | be more apropos.
        
         | anonu wrote:
         | Came here to upvote you for the laughs.
        
       | melvinmelih wrote:
       | In the roughly two weeks since OpenAI launched their $200/mo
       | version of Deep Research, it has already been open-sourced
       | (Hugging Face did it within 24 hours) and is now being offered
       | for free by Perplexity. The pace of disruption is
       | mind-boggling and makes you wonder if OpenAI has any moats
       | left.
        
         | NewUser76312 wrote:
         | As a current OpenAI subscriber (just the regular $20/mo plan),
         | I'm happy to not spend the effort switching as long as they
         | stay within a few negligible percent of the State of the Art.
         | 
         | I tried DeepSeek, it's fine, had some downtime, whatever, I'll
         | just stick with 4o. Claude is also fine, not noticeably better
         | to the point where I care to switch. OAI has my chat history
         | which is worth something I suppose - maybe a week of effort of
         | re-doing prompts and chats on certain projects.
         | 
         | That being said, my barrier to switching isn't _that_ high, if
         | they ever stop being close-to-tied for first, or decide to
         | raise their prices, I'll gladly cancel.
         | 
         | I like their API as well as a developer, but it seems like
         | other competitors are mostly copying that too, so again not a
         | huge reason to stick with em.
         | 
         | But hey, inertia and keeping pace with the competition are
         | enough to keep me a happy customer for now.
        
           | saretup wrote:
           | 4o isn't really comparable to deepseek r1. Use o3-mini-high
           | or o1 if you wanna stay near the state of the art.
        
             | NewUser76312 wrote:
             | I've had a coding project where I actually preferred 4o
             | outputs to DeepSeek R1, though it was a bit of a niche use
             | case (long script to parse DOM output of web pages).
             | 
             | Also they just updated 4o recently, it's even better now.
             | o3-mini-high is solid as well, I try it when 4o fails.
             | 
             | One issue I have with most models is that when they're re-
             | writing my long scripts, they tend to forget to keep a few
             | lines or variables here or there. Makes for some really
             | frustrating debugging. o1 has actually been pretty decent
             | here so far. I'm definitely a bit of a power user, I really
             | try to push the models to do as much as possible regarding
             | long software contexts.
        
               | exclipy wrote:
               | Why not use a tool that can perform precision edits
               | rather than rewrite the whole thing? E.g. Windsurf or
               | Cursor.
        
           | 0xDEAFBEAD wrote:
           | >I like their API as well as a developer, but it seems like
           | other competitors are mostly copying that too, so again not a
           | huge reason to stick with em.
           | 
           | You can also use tools like litellm and openrouter to
           | abstract away choice of API
           | 
           | https://github.com/BerriAI/litellm
           | 
           | https://openrouter.ai/
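           | 
           | For example, a rough sketch with litellm (the model names
           | are illustrative, and the relevant provider API keys are
           | assumed to be set in the environment):
           | 
           |     from litellm import completion
           | 
           |     messages = [{"role": "user",
           |                  "content": "One-line summary of CRDTs?"}]
           | 
           |     # Same call signature, different providers behind it.
           |     for model in ["openai/gpt-4o-mini",
           |                   "anthropic/claude-3-5-sonnet-20240620"]:
           |         resp = completion(model=model, messages=messages)
           |         print(model, "->", resp.choices[0].message.content)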
        
         | wincy wrote:
         | My interest was piqued and I've been trying ChatGPT Pro for the
         | last week. It's interesting and the deep research did a pretty
         | good job of outlining a strategy for a very niche multiplayer
         | turn based game I've been playing. But this article reminded me
         | to change next month's subscription back to the premium $20
         | subscription.
         | 
         | Luckily work just gave me access to ChatGPT Enterprise and O1
         | Pro absolutely smoked a really hard problem I had at work
         | yesterday, that would have taken me hours or maybe days of
         | research and trawling through documentation to figure out
         | without it explaining it to me.
        
           | ThouYS wrote:
           | what kind of problem was it?
        
             | wincy wrote:
             | Authorization policies vs authorization filters in a
             | .NET API. It's not something I've used before, and I
             | wanted permissive policies (the DB checks whether you
             | have OR permissions rather than AND) and to just attach
             | attributes so the dev can see at a glance what lets you
             | use this endpoint.
             | 
             | It's a well documented Microsoft process but I didn't even
             | know where to begin as it's something I hadn't used before.
             | I gave it the authorization policy (which was AND logic,
             | and was async so it'd reject it any of them failed) said
             | "how can I have this support lots of attributes" and it
             | just straight up wrote the authorization filter for me. Ran
             | a few tests and it worked.
             | 
             | I know this is basic stuff to some people but boy it made
             | life easier.
        
         | TechDebtDevin wrote:
         | OpenAI has the normies. The vast majority of people I know
         | (some very smart technical people) haven't used anything
         | other than ChatGPT's GUI.
        
         | imcritic wrote:
         | Does perplexity offer anything for code "copilots" for free?
        
         | rockdoc wrote:
         | Exactly. There's not much to differentiate these models (to a
         | typical user). Like cloud service providers, this will be a
         | race to the bottom.
        
       | NewUser76312 wrote:
       | It's great to see the foundation model companies having their
       | product offerings commoditized so fast - we as the users
       | definitely win. Unless you're applying to be an intern analyst of
       | some type somewhere... good luck in the next few years.
       | 
       | I'm just starting to wonder where we as the entrepreneurs end up
       | fitting in.
       | 
       | Every majorly useful app on top of LLMs has been done or is being
       | done by the model companies:
       | 
       | - RAG and custom data apps were hot, well now we see file upload
       | and understanding features from OAI and everyone else. Not to
       | mention longer context lengths.
       | 
       | - Vision Language Models: nobody really has the resources to
       | compete with the model companies, they'll gladly take ideas from
       | the next hot open source library and throw their huge datasets
       | and GPU farm at it, to keep improving GPT-4o etc.
       | 
       | - Deep Research: imo this one always seemed a bit more trivial,
       | so not surprised to see many companies, even smaller ones,
       | offering it for free.
       | 
       | - Agents, Browser Use, Computer Use: the next frontier, I don't
       | see any startups getting ahead of Anthropic and OAI on this,
       | which is scary because this is the 'remote coworker' stage of AI.
       | Similar story to Vision LMs, they'll gladly gobble up the best
       | ideas and use their existing resources to leap ahead of anyone
       | smaller.
       | 
       | Serious question, can anyone point to a recent YC vertical AI
       | SaaS company that's not on the chopping block once the model
       | companies turn their direction to it, or the models themselves
       | just become good enough to out-do the narrow application
       | engineering?
       | 
       | See e.g. https://lukaspetersson.com/blog/2025/bitter-vertical/
        
         | frabcus wrote:
         | This is tricky, as I think it is uncertain. Right now the
         | answer is user experience, custom workflows layered on top
         | of the models, and onboarding specific enterprises to use
         | it.
         | 
         | If suddenly agentic stuff works really well... Then that breaks
         | that world. I think there's a chance it won't though. I suspect
         | it needs a substantial innovation, although bitter lesson
         | indicates it just needs the right training data.
         | 
         | Anyway, if agents stay coherent, my startup not being needed
         | any more would be the last of my worries. That puts us in
         | singularity territory. If that doesn't cause huge other
         | consequences, the answer is higher level businesses - so
         | companies that make entire supply chains using AI to make each
         | company in that chain. Much grander stuff.
         | 
         | But realistically at this point we are in the graphic novel 8
         | Billion Genies.
        
       | joshdavham wrote:
       | Unrelated question: would most people consider perplexity to have
       | reached product market fit?
        
         | taytus wrote:
         | Personal take... I don't think they have any moats, and they
         | are desperate.
        
       | SubiculumCode wrote:
       | Are there good benchmarks for this type of tool? It seems not?
       | 
       | Also, I'd compare with the output of phind (with thinking and
       | multiple searches selected).
        
         | caseyy wrote:
         | The best practical benchmark I found is asking LLMs to research
         | or speak on my field of expertise.
        
           | SubiculumCode wrote:
           | Yeah...and it didn't cite me :)
        
             | caseyy wrote:
             | Yeah, that's a data point as well. I found a model that was
             | good with citations by asking it to recall what I published
             | articles on.
        
           | ibeff wrote:
           | That's what I did. It came up with smart-sounding but
           | infeasible recommendations because it took all sources it
           | found online at face value without considering who authored
           | them for what reason. And it lacked a massive amount of
           | background knowledge to evaluate the claims made in the
           | sources. It took outlandish, utopian demands by some
           | activists in my field and sold them to me as things that
           | might plausibly be implemented in the near future.
           | 
           | Real research needs several more levels of depth of
           | contextual knowledge than the model is currently doing for
           | any prompt. There is so much background information that
           | people working in my field know. The model would have to
           | first spend a ton of time taking in everything there is to
           | know about the field and several related fields and then
           | correlate the sources it found for the specific prompt with
           | all of that.
           | 
           | At the current stage, this is not deep research but research
           | that is remarkably shallow.
        
             | rchaud wrote:
             | > It took outlandish, utopian demands by some activists in
             | my field and sold them to me as things that might plausibly
             | be implemented in the near future.
             | 
             | Reminds me of when Altman went to TSMC and bloviated about
             | chip fabs to subject matter experts:
             | https://www.tomshardware.com/tech-industry/tsmc-execs-
             | allege...
        
         | d4rkp4ttern wrote:
         | I've seen at least one deep-research replicator claiming they
         | were the "best open deep research" tool on the GAIA benchmark:
         | https://huggingface.co/papers/2311.12983 This is not a perfect
         | benchmark but the closest I've seen.
        
       | SubiculumCode wrote:
       | Any evaluation of hallucination?
        
       | cc62cf4a4f20 wrote:
       | Don't forget gpt-researcher and STORM which have been out since
       | well before any of these.
        
       | XenophileJKO wrote:
       | I'm unimpressed. I gave it specifications for a recommender
       | system that I am building and asked for recommendations, and
       | it just smooshed together some stuff, but didn't really think
       | about it or try to create a reasonable solution. I had
       | claude.ai review it against the conversation we had, and I
       | think the review is accurate: "This feels like it was
       | generated by looking at common recommendation system
       | papers/blogs and synthesizing their language, rather than
       | thinking through the actual problems and solutions like we
       | did."
        
       | alexvitkov wrote:
       | Every week we get a new AI that according to the AI-goodness-
       | benchmarks is 20% better than the old AI, yet the utility of
       | these latest SOTA models is only marginally higher than the first
       | ChatGPT version released to the public a few years back.
       | 
       | These things have the reasoning skills of a toddler, yet we keep
       | fine-tuning their writing style to be more and more authoritative
       | - this one is only missing the font and color scheme; other
       | than that, the output is formatted exactly like a research
       | paper.
        
         | exclipy wrote:
         | Not true at all. The original ChatGPT was useless other than as
         | a curious entertainment app.
         | 
         | Perplexity, OTOH, has almost completely replaced Google for me
         | now. I'm asking it dozens of questions per day, all for free
         | because that's how cheap it is for them to run.
         | 
         | The emergence of reliable tool use last year is what has sky-
         | rocketed the utility of LLMs. That has made search and multi-
         | step agents feasible, and by extension applications like Deep
         | Research.
        
           | danielbln wrote:
           | Yeah, I don't get OP's take. ChatGPT 3.5 was basically just a
           | novelty, albeit an exciting one. The models we've gotten
           | since have ingrained themselves into my workflows as
           | productivity multipliers. They are significantly better and
           | more useful (and multimodal) than what we had in 2022, not
           | just marginally better.
        
           | alexvitkov wrote:
           | If your goal is to replace one unreliable source of
           | information (Google first page) with another, sure - we may
           | be there. I'd argue GPT 3.5 already outperformed Google
           | for a significant number of queries. The only difference
           | between then and now is that now the context window is large
           | enough that we can afford to paste into the prompt what we
           | hope are a few relevant files.
           | 
           | Yet what's essentially "cat [62 random files we googled] >
           | prompt.txt" is now being confidently presented with academic
           | language as "62 sources". This rubs me the wrong way. Maybe
           | this time the new AI really is so much better than the old AI
           | that it justifies using that sort of language, but I've seen
           | this pattern enough times that I can be confident that's not
           | the case.
        
             | senko wrote:
             | > Yet what's essentially "cat [62 random files we googled]
             | > prompt.txt" is now being confidently presented with
             | academic language as "62 sources".
             | 
             | That's not a very charitable take.
             | 
             | I recently quizzed Perplexity (Pro) on a niche political
             | issue in my niche country, and it compared favorably with a
             | special purpose-built RAG on exactly that news coverage (it
             | was faster and more fluent, info content was the same). As
             | I am personally familiar with these topics I was able to
             | manually verify that both were correct.
             | 
             | Outside these tests I haven't used Perplexity a lot yet,
             | but so far it does look capable of surfacing relevant and
             | correct info.
        
             | jazzyjackson wrote:
             | Perplexity with Deepseek R1 (they have the real thing
             | running on Amazon servers in USA) is a game changer, it
             | doesn't just use top results from a Google search, it
             | considers what domains to search for information relevant
             | to your prompt.
             | 
             | I boycotted AI for about a year, considering it to be
             | mostly garbage, but I'm back to perplexifying basically
             | everything I need an answer for.
             | 
             | (That said, I agree with you they're not really citations,
             | but I don't think they're trying to be academic, it's just,
             | here's the source of the info)
        
           | rr808 wrote:
           | > all for free because that's how cheap it is for them to
           | run.
           | 
           | No, these AI companies are burning through huge amounts of
           | cash to keep the thing running. They're competing for market
           | share - the real question is will anyone ever pay for this?
           | I'm not convinced they will.
        
             | calebkaiser wrote:
             | The question of "will people pay" is answered--OpenAI alone
             | is at something like $4 billion in ARR. There are also
             | smaller players (relatively) with impressive revenue, many
             | of whom are profitable.
             | 
             | There are plenty of open questions in the AI space around
             | unit economics, defensibility, regulatory risks, and more.
             | "Will people pay for this" isn't one of them.
        
               | season2episode3 wrote:
               | As someone who loves OpenAI's products, I still have to
               | say that if you're paying $200/month for this stuff then
               | you've been taken for a ride.
        
               | calebkaiser wrote:
               | Yeah, I'm skeptical about the price point of that
               | particular product as well.
        
               | jdee wrote:
               | Honestly, I've not coded in 5+ years (RoR) and a
               | project I'm involved with needed a few days' worth of
               | TLC. A combination of Cursor, Warp and OAI Pro has
               | delivered the results with no sweat at all. Upgrade of
               | Ruby 2 to 3.7, a move to jsbundling-rails and
               | cssbundling-rails, upgrade Yarn and an all-new pipeline.
               | It's not trivial stuff for a production app with paying
               | customers.
               | 
               | The obvious crutch of this new AI stack reduced go-live
               | time from 3 weeks to 3 days. Well worth the cost IMHO.
        
             | rchaud wrote:
             | > They're competing for market share - the real question is
             | will anyone ever pay for this?
             | 
             | The leadership of every 'AI' company will be looking to go
             | public and cash out well before this question ever has to
             | be answered. At this point, we all know the deal. Once
             | they're publicly traded, the quality of the product goes to
             | crap while fees get ratcheted up every which way.
        
               | jaggs wrote:
               | That's when the 'enshitification' engine kicks in. Pop up
               | ads on every result page etc. It's not going to be
               | pretty.
        
         | zaptrem wrote:
         | I use these models to aid bleeding edge ml research every day.
         | Sonnet can make huge changes and bug fixes to my code (that
         | does stuff nobody else has tried in this way before) whereas
         | GPT 3.5 Turbo couldn't even repeat a given code block without
         | dropping variables and breaking things. O1 can reason through
         | very complex model designs and signal processing stuff even I
         | have a hard time wrapping my head around.
        
           | nicce wrote:
           | On the other hand, if you try to solve a problem by
           | generating the code with AI alone and it misses just one
           | thing, debugging that one thing can take more time than
           | writing the code from scratch. Understanding a larger
           | piece of AI-generated code is sometimes as hard as, or
           | harder than, constructing the solution yourself.
        
             | zaptrem wrote:
             | Yes it's important to make sure it's easy to verify the
             | code is correct.
        
         | baxtr wrote:
         | Just yesterday I did my first Deep Research with OpenAI on a
         | topic I know well.
         | 
         | I have to say I am really underwhelmed. It sounds all
         | authoritative and the structure is good. It all sounds and
         | feels substantial on the _surface_ but the content is really
         | poor.
         | 
         | Now people will blame me and say: you have to get the prompt
         | right! Maybe. But then at the very least put a disclaimer on
         | your highly professional sounding dossier.
        
           | ankit219 wrote:
           | I think it's bound to underwhelm the experts. What this
           | does is go through a number of public search results (I
           | think it's Google search for now; it could be an internal
           | corpus), and hence it skips all the paywalled and
           | proprietary data that is not directly accessible via
           | Google. It can produce great output but is limited by the
           | sources it can access. If you're an expert, you already
           | know more, because you understand the topic better and
           | know sources that are not indexed by Google yet. Moreover,
           | there is a possibility that most Google-surfaced results
           | are dumbed-down, simplified versions written to appeal to
           | a wider audience.
        
           | zarathustreal wrote:
           | This sounds like a good thing! Sounds like "it's professional
           | sounding" is becoming less effective as a means of
           | persuasion, which means we'll have much less fallacious logic
           | floating around and will ultimately get back to our human
           | roots:
           | 
           | Prove it or fight me
        
           | rchaud wrote:
           | > It all sounds and feels substantial on the surface but the
           | content is really poor.
           | 
           | They're optimizing for the sales demo. Purchasing managers
           | aren't reading the output.
        
           | jaggs wrote:
           | I think what some people are finding is it's producing
           | superficially good results, but there are actually no decent
           | 'insights' integrated with the words. In other words, it's
           | just a super search on steroids. Which is kind of
           | disappointing?
        
           | kenjackson wrote:
           | What was the prompt?
        
           | numba888 wrote:
           | You didn't expect it to do all the work for you at a PhD
           | level, did you? You did? Hmm... ;) They are not there yet
           | but getting closer. Quite some progress for 3 years.
        
         | TeMPOraL wrote:
         | There were two step changes: ChatGPT/GPT-3.5, and GPT-4.
         | Everything after feels incremental. But that's perhaps
         | understandable. GPT-4 established just how many tasks could be
         | done by such models: _approximately anything that involves or
         | could be adjusted to involve text_. That was the categorical
         | milestone that GPT-4 crossed. Everything else since then is
         | about slowly increasing model capabilities, which translated to
         | which tasks could then be done _in practice, reliably, to
         | acceptable standards_. Gradual improvement is all that's
         | left now.
         | 
         | That's basically how progress on anything ever looks.
         | 
         | The next huge jump will have to again make a qualitative
         | change, such as enabling AI to handle a new class of tasks -
         | tasks that fundamentally cannot be represented in text form in
         | a sensible fashion.
        
           | mattlondon wrote:
           | But they are already multi-modal. The Google one can do live
           | streaming video understanding with a conversational in-out
           | prompt. You can literally walk around with your camera and
           | just chat about the world. No text to be seen (although
           | perhaps under the covers it is translating everything to
           | text, but the point is the user sees no text)
        
             | TeMPOraL wrote:
             | Fair, but OpenAI was doing that half a year ago (though
             | limited access; I myself got it maybe a month ago), and I
             | haven't seen it yet translate into anything in practice, so
             | I feel like it (and multimodality in general) must be a
             | GPT-3 level ability at this point.
             | 
             | But I do expect the next qualitative change to come from
             | this area. It feels exactly like what is needed, but it
             | somehow isn't there just yet.
        
         | dangoodmanUT wrote:
         | If you don't realize how models like gemini 2 and o3 mini are
         | wildly better than gpt-4 then clearly you're not very good at
         | using them
        
         | vic_nyc wrote:
         | As someone who's been using OpenAI's ChatGPT every day for
         | work, I tested Perplexity's free Deep Research feature today
         | and I was blown away by how good it is. It's unlike anything
         | I've seen over at OpenAI and have tested all of their models. I
         | have canceled my OpenAI monthly subscription.
        
           | pgwhalen wrote:
           | What did you ask it that blew you away?
           | 
           | Every time I see a comment about someone getting excited
           | about some new AI thing, I want to go try and see for myself,
           | but I can't think of a real world use case that is the right
           | level of difficulty that would impress me.
        
             | vic_nyc wrote:
             | I asked it to expand an article with further information
             | about the topic, and it searched online and that's what it
             | did.
        
         | kookamamie wrote:
         | It is ridiculous.
         | 
         | Many of the AI companies riding the hype are being
         | overvalued on the idea that if we just fine-tune LLMs a bit
         | more, a spark of consciousness will emerge.
         | 
         | It is not going to happen with this tech - I wish the LLM-AGI
         | bubble would burst already.
        
       | submeta wrote:
       | It ends its research in a few seconds. Can it even be
       | thorough? ChatGPT's Deep Research works for five minutes or
       | more.
        
         | progbits wrote:
         | OpenAI is not running a solid five minutes of LLM compute
         | per request. I know they are not profitable and burn money
         | even on normal requests, but this would be too much even for
         | them.
         | 
         | Likely they throttle and do a lot of waiting for nothing
         | during those five minutes. That can help with stability and
         | traffic smoothing (using "free" inference during times when
         | API and website usage drops a bit), but I think it mostly
         | gives the product some faux credibility - "the research must
         | be great quality if it took this long!"
         | 
         | They will cut it down by just removing some artificial
         | delays in a few months, to great fanfare.
        
           | submeta wrote:
           | Well you may be right. But you can turn on the details and
           | see that it seems to pull data, evaluate it, follow up on it.
           | But my thought was: why do I see this in slow motion? My
           | home-made Python stuff runs this in a few seconds, and my
           | bottleneck is the APIs of the sites I query. How about
           | them?
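           | 
           | For reference, a rough sketch of what such a home-made
           | pipeline can look like (the URLs and the prompt are
           | placeholders, and the synthesis step assumes the standard
           | OpenAI Python client); the wall-clock time is dominated by
           | the HTTP calls:
           | 
           |     import requests
           |     from concurrent.futures import ThreadPoolExecutor
           |     from openai import OpenAI
           | 
           |     SOURCES = ["https://example.com/a", "https://example.com/b"]
           | 
           |     def fetch(url: str) -> str:
           |         # Crude truncation keeps the prompt small.
           |         return requests.get(url, timeout=10).text[:5000]
           | 
           |     # Fetch the sources in parallel; this is the slow part.
           |     with ThreadPoolExecutor(max_workers=8) as pool:
           |         pages = list(pool.map(fetch, SOURCES))
           | 
           |     client = OpenAI()
           |     resp = client.chat.completions.create(
           |         model="gpt-4o-mini",
           |         messages=[{"role": "user",
           |                    "content": "Synthesize these sources:\n\n"
           |                               + "\n---\n".join(pages)}],
           |     )
           |     print(resp.choices[0].message.content)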
        
             | progbits wrote:
             | When you query some APIs/scrape sites for personal use, it
             | is unlikely you get throttled. Openai doing it at large
             | scale for many users might have to go slower (they have
             | tons of proxies for sure, but don't want to burn those IPs
             | for user controlled traffic).
             | 
             | Similarly, their inference GPUs have some capacity.
             | Spreading out the traffic helps keep high utilization.
             | 
             | But lastly, I think there is just a marketing and
             | psychological aspect. Even if they can have the results in
             | one minute, delaying it to two-five minutes won't impact
             | user retention much, but will make people think they are
             | getting a great value.
        
         | ibeff wrote:
         | I'm getting about 1 minute responses, did you turn on the Deep
         | Research option below the prompt?
        
       | alecco wrote:
       | I just tried it and the result was pretty bad.
       | 
       | "How to do X combining Y and Z" (in a long detailed paragraph, my
       | prompt-fu is decent). The sources it picked were reasonable but
       | not the best. The answer was along the lines of "You do X with Y
       | and Z", basically repeating the prompt with more words but not
       | actually how to address the problem, and never mind how to
       | implement it.
        
       | ankit219 wrote:
       | Every time OpenAI comes up with a new product and a new
       | interaction mechanism / UX, lo and behold, others copy it,
       | sometimes leveraging the same name as well.
       | 
       | It happened with ChatGPT - a chat-oriented way to use GenAI
       | models (a phenomenal success and the right level of
       | abstraction), then Code Interpreter, the talking thing (which
       | somehow hasn't scaled), the reasoning models in chat (which I
       | feel is a confusing UX when you have report generators; a
       | better UX would be to just keep editing the source prompt),
       | and now deep research. [1] Yes, Google did it first and OpenAI
       | followed, but what about the many startups that were working
       | on similar problems in these verticals?
       | 
       | I love how OpenAI keeps introducing new UX paradigms, but
       | somehow everyone else's one idea is to follow whatever they
       | are doing? The only thing outside this I see is Cursor, which
       | I think is a confusing UX too, but that's a discussion for
       | another day.
       | 
       | [1]: I am keeping Operator/MCP/browser use out of this because
       | 1/ it requires fine-tuning a base model for more accurate
       | results, and 2/ admittedly all labs are working on it
       | separately, so you were bound to see similar ideas.
        
         | upcoming-sesame wrote:
         | I'm pretty sure Gemini had deep research before openai
        
           | riedel wrote:
           | Yes, see the sibling comment:
           | https://news.ycombinator.com/item?id=43064111 . I think
           | you will find a predecessor to most of OpenAI's
           | interaction concepts. Canvas, too, was I guess inspired by
           | other code copilots. I think their real competence is
           | being able to put tons of resources into a feature and
           | push it into the market in a usable way (while sometimes
           | breaking things). Once OpenAI has it, the rest feel like
           | they now also have to move. They have simply become the de
           | facto reference.
        
             | TeMPOraL wrote:
             | Yes, OpenAI is the leader in the field in a literal sense:
             | once they do something, everyone else quickly follows.
             | 
             | They also seem to ignore usurpers, like Anthropic with
             | their MCP. Anthropic succeeded in setting a direction
             | there, which OpenAI did not follow, as I imagine
             | following it would be a tacit admission of Anthropic's
             | role as co-leader. That's in contrast to whatever e.g.
             | Google is doing, because Google is not showing the same
             | leadership, so they're not a reputational threat to
             | OpenAI.
             | 
             | I feel that one of the biggest screwups by Google was to
             | keep Gemini unavailable for EU until recently - there's a
             | whole big population (and market) of people interested in
             | using GenAI, arguably larger than the US, and the region-
             | ban means we basically stopped caring about what Google is
             | doing over a year ago already.
             | 
             | See also: Sora. After initial release, all interest seems
             | to have quickly died down, and I wonder if this again isn't
             | just because OpenAI keeps it unavailable for the EU.
        
           | ankit219 wrote:
           | I said so too; I wrote Google instead of Gemini. Somehow
           | it did not create as much of a buzz then as it has now.
        
         | pphysch wrote:
         | OpenAI rushed out "chain of reasoning" features _after_
         | DeepSeek popularized them.
         | 
         | They are the loudest dog, not the fastest. And they have the
         | most to lose.
        
       | bsaul wrote:
       | Can someone explain what Perplexity's value is? They seem like
       | a thin wrapper on top of the big AI names, and yet I often
       | find them mentioned as equivalent to the likes of OpenAI /
       | Anthropic / etc., which build foundation models.
       | 
       | It's very confusing.
        
         | RobinL wrote:
         | They were doing web search before open ai/anthropic, so they
         | historically had a (pretty decent) unique selling point.
         | 
         | Once ChatGPT added web browsing, I largely stopped using
         | Perplexity.
        
         | Havoc wrote:
         | Their main claim to fame was blending LLM+search well early on.
         | Everyone has caught up on that one though. The other benefit
         | is access to a variety of models - OAI, Anthropic, etc.;
         | i.e. you can select the LLM for each LLM+search you do.
         | 
         | Lately, though, they've been making a string of moves that
         | smell of desperation.
        
         | rr808 wrote:
         | They are a little bit different because the product operates
         | more like a search tool. It's the first real company that is
         | a good replacement for Google.
        
           | throwaway314155 wrote:
           | What about ChatGPT's search functionality? Built straight in
           | to the product. Works with GPT-4o.
        
       | Agraillo wrote:
       | It's interesting. Recently I came up with a question that I
       | posted to different LLMs, with different results. It's about
       | the ratio of GDP (PPP-adjusted) to nominal GDP. ChatGPT was
       | good, but only because it found a dedicated web page with
       | exactly this data and comparison, so it just rephrased the
       | answer. Regular perplexity.ai, when asked, hallucinated
       | significantly, showing Luxembourg as the leader and pointing
       | to some random GDP-related resources. But this version of
       | Perplexity gave a very good "research" report on the prompt
       | "I would like to research countries about the ratio between
       | GDP adjusted to purchasing power and the universal GDP.
       | Please, show the top ones and look for other regularities".
       | It took about 3 minutes.
        
       | afro88 wrote:
       | This is great. I haven't tried OpenAI or Google's Deep Research,
       | so maybe I'm not seeing the relative crapness that others in the
       | comments are seeing.
       | 
       | But for the query "what made the Amiga 500 sound chip special" it
       | wrote a fantastic and detailed article:
       | https://www.perplexity.ai/search/what-made-the-amiga-500-sou...
       | 
       | For me personally it was a great read and I learnt a few things I
       | didn't know before about it.
        
         | wrsh07 wrote:
         | I'm pleasantly surprised by the quality. Like you, I haven't
         | tried the others, but I have heard tips about what questions
         | they excel at (product research, "what is the process for x"
         | where x can be publishing a book or productionizing some
         | other thing), and the initial result was high quality, with
         | tables, and the links were also high quality.
         | 
         | Might have just gotten lucky, but as they say "this is the
         | worst it will ever be"^
         | 
         | ^ this is true and false. True in the sense that the technology
         | will keep getting better, false in the sense that users might
         | create websites that take advantage of the tools or that the
         | creators might start injecting organic ads into the results
        
       | marban wrote:
       | Same link got flagged yesterday. @dang?
       | 
       | https://news.ycombinator.com/item?id=43056072
        
       | rchaud wrote:
       | As with all of these tools, my question is the same: where is the
       | dogfooding? Where is the evidence that Perplexity, OAI etc
       | actually use these tools in their own business?
       | 
       | I'm not particularly impressed with the examples they provided.
       | Queries like "Top 20 biotech startups" can be answered by
       | anything from Motley Fool or Seeking Alpha, Marketwatch or a
       | million other free-to-read sources online. You have to go several
       | levels deeper to separate the signal from the noise, especially
       | with financial/investment info. Paperboys in 1929 sharing stock
       | tips and all that.
        
       | Lws803 wrote:
       | Curious to hear folks' thoughts about Gergely's (The Pragmatic
       | Engineer) tweet though
       | https://x.com/GergelyOrosz/status/1891084838469308593
       | 
       | I do wonder if this will push web publishers to start pay-walling
       | up. I think the economics for deep research or AI search in
       | general don't add up. Web publishers and site owners are losing
       | traffic and human eyeballs from their site.
        
       | pbarry25 wrote:
       | Never forget that their CEO was happy to cross picket lines:
       | https://techcrunch.com/2024/11/04/perplexity-ceo-offers-ai-c...
        
       | Kalanos wrote:
       | It's producing more in-depth answers than alternatives, but the
       | results are not as accurate as alternatives.
        
       ___________________________________________________________________
       (page generated 2025-02-16 23:01 UTC)