[HN Gopher] Perplexity Deep Research
___________________________________________________________________
Perplexity Deep Research
Author : vinni2
Score : 333 points
Date : 2025-02-15 20:07 UTC (1 day ago)
(HTM) web link (www.perplexity.ai)
(TXT) w3m dump (www.perplexity.ai)
| transformi wrote:
| Since google, everyone trying replicate this feature... (OpenAI,
| HF..)
|
| It's powerfull yes, so as asking an A.I and let him sythezise all
| what he fed.
|
| I guess the air is out of the ballon from the big players, since
| they lack of novel innovation in their latest products.
| nextworddev wrote:
| I tried it but it seems to be biased to generate shorter reports
| compared to OpenAI's Deep Research. Perhaps it's a feature.
| larsiusprime wrote:
| I tried using this to create a fifty-state table of local laws,
| policies, tax rates, and legal obstacles for my pet interest
| (land value tax). I gave it the same prompts I gave OpenAI DR.
| Perplexity gave equally good results, and unlike OpenAI didn't
| bungle the CSV downloads. Recommended!
| CSMastermind wrote:
| I'm super happy that these types of deep research applications
| are being released because it seems like such an obvious use case
| for LLMs.
|
| I ran Perplexity through some of my test queries for these.
|
| One query that it choked hard on was, "List the college majors of
| all of the Fortune 100 CEOs"
|
| OpenAI and Gemini both handle this somewhat gracefully producing
| a table of results (though it takes a few follow-ups to get a
| correct list). Perplexity just kind of rambles generally about
| the topic.
|
| There are other examples I can give of similar failures.
|
| Seems like generally it's good at summarizing a single question
| (Who are the current Fortune 100 CEOs) but as soon as you need to
| then look up a second list of data and marry the results it kind
| of falls apart.
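The failure mode described above is a two-hop "fetch, then join" workflow. A minimal sketch of what the agent would need to do, with `search_fortune_ceos` and `search_major` as hypothetical stand-ins for real web-search calls and the data hard-coded purely for illustration:

```python
# Two-hop pattern: retrieve one list, then look up a second
# attribute per entry, then join the results into a table.

def search_fortune_ceos():
    # Stub: a real agent would search for the current list.
    return ["Tim Cook", "Andy Jassy", "Mary Barra"]

def search_major(ceo_name):
    # Stub: one follow-up search per CEO.
    majors = {
        "Tim Cook": "Industrial Engineering",
        "Andy Jassy": "Government",
        "Mary Barra": "Electrical Engineering",
    }
    return majors.get(ceo_name, "unknown")

def ceo_major_table():
    # The "marry the results" step: N follow-up lookups joined
    # back onto the first list -- the part where single-pass
    # summarizers tend to fall apart.
    return [(ceo, search_major(ceo)) for ceo in search_fortune_ceos()]
```

The join at the end is the hard part for these tools: one follow-up lookup per item, merged back onto the first list, rather than one summarizing pass.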
| stagger87 wrote:
| Hopefully the end users of these products know something about
| LLMs and why a question such as "List the college majors of
| all of the Fortune 100 CEOs" is not really well suited to
| them.
| iandanforth wrote:
| Perhaps you can enlighten us as to why this isn't a good use
| case for an LLM during a deep research workflow.
| jhanschoo wrote:
| LLMs ought to be able to gracefully handle it, but the OP
| comment
| collinvandyck76 wrote:
| For those that don't know, including myself, why would this
| question be particularly difficult for an LLM?
| rs186 wrote:
| If "deep research" can't even handle this, I don't think I
| would trust it with even more complex tasks
| rchaud wrote:
| Hopefully my boss groks how special I am and won't assign me
| tasks I consider to be beneath my intelligence (and beyond my
| capabilities).
| nathanbrunner wrote:
| Tried it and it is worse than OpenAI's Deep Research (one
| query only, will need to try it more I guess...)
| tmnvdb wrote:
| The OpenAI version costs $200 and takes a lot longer; not sure
| if it is fair to compare?
| voiper1 wrote:
| My query generated 17 steps of research, gathering 74 sources.
| I picked "Deep Research" from the modes; I almost accidentally
| picked "reasoning".
| simonw wrote:
| That's the third product to use "Deep Research" in its name.
|
| The first was Gemini Deep Research:
| https://blog.google/products/gemini/google-gemini-deep-resea... -
| December 11th 2024
|
| Then ChatGPT Deep Research: https://openai.com/index/introducing-
| deep-research/ - February 2nd 2025
|
| Now Perplexity Deep Research:
| https://www.perplexity.ai/hub/blog/introducing-perplexity-de... -
| February 14th 2025.
| exclipy wrote:
| Is there a problem with this if it's not trademarked? It's like
| saying Apple Maps is the nth product called "Maps".
|
| I, for one, am glad they are standardising on naming of
| equivalent products and wish they would do it more (eg.
| "reasoning" vs "thinking", "advanced voice mode" vs "live")
| anon373839 wrote:
| Not a trademark lawyer, but I don't think Deep Research
| qualifies for trademark protection because it is "merely
| descriptive" of the product's features. The only way to get a
| trademark like that is through "acquired distinctiveness",
| but that takes 5 years of exclusive use and all these
| competitors will make that route impossible.
| mrtesthah wrote:
| Elicit AI just rolled out a similar feature, too, specifically
| for analyzing scientific research papers:
|
| https://support.elicit.com/en/articles/4168449
| transformi wrote:
| You forgot Huggingface researchers - https://www.msn.com/en-
| us/news/technology/hugging-face-resea...
|
| and BTW - I posted a comment in exactly the same spirit an
| hour ago... So I guess today's copycat ethics aren't solely
| for products but also for the comment section. LOL.
| 2099miles wrote:
| Your comment from earlier wasn't as easy to digest as this
| one. I don't think that person copied you at all.
| transformi wrote:
| Thanks. I accept the criticism of being less digestible and
| more opinionated. But at the end of the day it provides the
| same information.
|
| Don't get me wrong - I don't mind being copied on the
| Internet :), but I find this behavior quite rude, so I just
| mentioned it.
| rnewme wrote:
| Thinking simonw is stealing your comment is the comedy moment
| of the day
| gbnwl wrote:
| Said comment, so others don't have to dig around in your
| history:
|
| "Since google, everyone trying replicate this feature...
| (OpenAI, HF..) It's powerfull yes, so as asking an A.I and
| let him sythezise all what he fed.
|
| I guess the air is out of the ballon from the big players,
| since they lack of novel innovation in their latest
| products."
|
| I'd say the important differences are that simonw's comment
| establishes a clear chronology, gives links, and is focused
| on providing information rather than opinion to the reader.
| satvikpendem wrote:
| It is a term of art now in the field.
| qingcharles wrote:
| It failed my first test which concerned Upside magazine. All of
| these deep research versions have failed to immediately surface
| the most famous and controversial article from that magazine,
| "The Pussification of Silicon Valley." When hinted, Perplexity
| did a fantastic job of correcting itself, the others struggled
| terribly. I shouldn't have to hint though, as that requires
| domain knowledge that the asker of a query might be lacking.
|
| We're mere months into these things, though. These are all
| version 1.0. The sheer speed of progress is absolutely wild.
| Has there ever been a comparable increase in the ability of
| another technology on the scale of what we're seeing with LLMs?
| willy_k wrote:
| I wouldn't go so far as to say it was definitely faster, but
| the development of mobile phones post-iPhone went pretty
| quick as well.
| dcreater wrote:
| > pussification of silicon valley upside magazine
|
| Neither Google nor Bing can find this
| qingcharles wrote:
| https://www.google.com/search?q=pussification+of+silicon+va
| l...
| stavros wrote:
| Nothing with "pussification" in the title for me there.
| qingcharles wrote:
| Wild. My results are literally dozens of posts about the
| article.
|
| https://imgur.com/a/1hTJVkl
| motoxpro wrote:
| I don't see the article you are mentioning
| qingcharles wrote:
| Wild. My results are literally dozens of posts about the
| article.
|
| https://imgur.com/a/1hTJVkl
| freehorse wrote:
| About the article, not any link to the article itself.
| acka wrote:
| It is possible that the original article is no longer
| accessible online.
|
| The only link I have found is a reproduction of the
| article[1], but I am unable to access the full text due
| to a paywall. I no longer have access to academic
| resources or library memberships that would provide
| access.
|
| My Google search query was:
| pussification of silicon valley inurl:upside
|
| which returned exactly one result.
|
| I suspect the article's low visibility in standard Google
| searches, requiring operators like 'inurl:', might be
| because its PageRank is low due to insufficient
| backlinks.
|
| [1] https://www.proquest.com/docview/217963807?sourcetype
| =Trade%...
| abstractcontrol wrote:
| Can't find it either.
| tomjen3 wrote:
| I see a reference to the article, and a Guardian article
| about it, but not the article itself.
|
| Perhaps it's soft-nuked in the EU or something?
| acka wrote:
| Do you have Google SafeSearch or Bing's equivalent turned
| on perhaps?
|
| I reckon the word 'pussification' might trigger it to refuse
| to return any related results.
|
| If you're using a corporate account, it's possible that
| your account manager has enabled SafeSearch, which you may
| not be able to disable.
|
| Local censorship laws, such as those in South Korea, might
| also filter certain results.
| Kye wrote:
| My standard prompts when I want thoroughness:
|
| "Did you miss anything?"
|
| "Can you fact check this?"
|
| "Does this accurately reflect the range of opinions on the
| subject?"
|
| Taking the output to another LLM with the same questions can
| wring out more details.
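That battery of canned follow-ups is easy to automate. A minimal sketch, where `ask` stands in for any chat-completion call (no particular vendor's API is assumed):

```python
# Run a fixed list of follow-up prompts against a report, feeding
# each revision back in so later prompts see earlier answers.

FOLLOW_UPS = [
    "Did you miss anything?",
    "Can you fact check this?",
    "Does this accurately reflect the range of opinions on the subject?",
]

def refine(report, ask):
    transcript = [report]
    for prompt in FOLLOW_UPS:
        # Each follow-up sees the report plus all prior revisions.
        transcript.append(ask("\n\n".join(transcript) + "\n\n" + prompt))
    return transcript[-1]
```

Handing `transcript[-1]` to a second model with the same `FOLLOW_UPS` is the cross-checking step described above.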
| ErikBjare wrote:
| I'd expect a "deep research" product to do this for me.
| ofou wrote:
| https://www.emergentmind.com also offers Deep Research on ArXiv
| papers (experimental)
| jsemrau wrote:
| I've owned DeepCQ.com since early 2023 - it could do "deep
| research" for financial research. Maybe I'll just throw this
| on the pile, too.
| shekhargulati wrote:
| Just a side note: The Wikipedia page for "Deep Research" only
| mentions OpenAI - https://en.wikipedia.org/wiki/Deep_Research
| Mond_ wrote:
| This is bizarre, wasn't Google the one who claimed the name
| and did it first?
| TeMPOraL wrote:
| Gemini was also "use us through this weird interface, and
| also you can't if you're in the EU"; that, plus being far
| behind OpenAI and Anthropic for the past year, means they
| failed to reach notoriety, partly because of their own
| choices.
| CjHuber wrote:
| Honestly I don't get why everybody is saying Gemini is
| far behind. For me, Gemini Flash Thinking Experimental
| performs far, far better than o3-mini.
| tr3ntg wrote:
| Seconding this. I get really great results from Flash 2.0
| and even Pro 1.5 for some things compared to OpenAI
| models.
|
| And their 2.0 Thinking model is great for other things.
| When my task matters, I default to Gemini.
| jaggs wrote:
| I find the problem with Gemini is the rate limits. Really
| restrictive.
| DebtDeflation wrote:
| There's a lot of mental inertia combined with an
| extremely fast moving market. Google was behind in the AI
| race in 2023 and a good chunk of 2024. But they largely
| caught up with Gemini 1.5, especially the 002 release
| version. Now with Gemini 2 they are every bit as much of
| a frontier model player as OpenAI and Anthropic, and even
| ahead of them in a few areas. 2025 will be an interesting
| year for AI.
| hansworst wrote:
| Arguably Google is ahead. They have many non-LLM efforts
| (Waymo, DeepMind, etc.) and they have their own hardware, so
| they're not as reliant on Nvidia.
| tim333 wrote:
| Demis Hassabis isn't very promotional. The other guys
| make more noise.
| Kye wrote:
| It varies a lot for me. One day it takes scattered
| documents, pasted in, and produces a flawless summary I
| can use to organize it all. The next, it barely manages a
| paragraph for detailed input. It does seem like Google is
| quick to respond to feedback. I never seem to run into
| the same problem twice.
| lambdaba wrote:
| > It does seem like Google is quick to respond to
| feedback.
|
| I'm puzzled as to how that would work, when people talk
| about quick changes in model behavior. What exactly is
| being adjusted? The model has already been trained. I
| would think it's just randomness.
| Kye wrote:
| Magic
|
| And fine tuning.
|
| Choose your fighter...
|
| High level overview:
| https://www.datacamp.com/tutorial/fine-tuning-large-
| language...
|
| More detail: https://www.turing.com/resources/finetuning-
| large-language-m...
|
| Nice charts: https://blogs.oracle.com/ai-and-
| datascience/post/finetuning-...
|
| The big platforms also seem to employ an intermediate
| step where they rewrite your prompt. I've downloaded my
| ChatGPT data and found substantial changes from what I
| wrote. _Usually_ for the better. Changes to the way it
| rewrites change the results.
| brookst wrote:
| System prompts have a huge impact on output. Prompts for
| ChatGPT/etc are around a thousand words, with examples of
| what to do and what not to do. Minor adjustments there
| can make a big difference.
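A rough illustration of why that works: the system prompt is just the first message in the request, so a provider can change behavior between deployments by editing a string, with no retraining. The model name and prompt strings below are placeholders, not any vendor's actual values:

```python
# The "system prompt" is the first message in the request; swapping
# it out changes behavior without touching the model weights.

def build_request(system_prompt, user_prompt):
    return {
        "model": "some-chat-model",  # placeholder name
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }

V1 = "Answer concisely."
V2 = "Answer concisely. Always show your sources as a bulleted list."

# Same user prompt, different system prompt -> different behavior.
req_a = build_request(V1, "Summarize this document.")
req_b = build_request(V2, "Summarize this document.")
```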
| jaggs wrote:
| I've found this as well. On a good day Gemini is superb.
| But otherwise, awful. Really weird.
| TeMPOraL wrote:
| It _was_ far behind. That's what I kept hearing on the
| Internet until maybe a couple weeks ago, and it didn't
| seem like a controversial view. Not that I cared much -
| _I couldn't access it anyway because I am in the EU_,
| which is my main point here: it seems that they've
| improved recently, but at that point, hardly anyone here
| paid it any attention.
|
| _Now_, as we can finally access it, Google has a chance
| to get back into the race.
| xiphias2 wrote:
| o3-mini is still behind o1 pro; it didn't impress me.
|
| I think the people who think anybody is close to OpenAI
| don't have a Pro subscription.
| viraptor wrote:
| The $200 version? It's interesting that it exists, but
| for normal users it may as well... not. I mean, pro is
| effectively not a consumer product and I'd just exclude
| it from comparison of available models until you can pay
| for a single query.
| hhh wrote:
| o3-mini isn't meant to compete with o1, or o1 pro mode.
| robwwilliams wrote:
| I can tell you why I just stopped using Gemini yesterday.
|
| I was interested in getting simple summary data on the
| outcome of the recent US election and asked for an
| approximate breakdown of voting choices as a function of
| voters' age brackets.
|
| Gemini adamantly refused to provide these data. I asked
| the question four different ways. You would think voting
| outcomes were right up there with Tiananmen Square.
|
| ChatGPT and Claude were happy to give me approximate
| breakdowns.
|
| What I found interesting is that the patterns of voting
| by age are not all that different from Nixon-Humphrey-
| Wallace in 1968.
| mellosouls wrote:
| I think somebody has read your comment and fixed it...
| eth0up wrote:
| I might try this with hard earned reluctance, but...
|
| Every time I use Perplexity ('pro'), and if for some reason need
| the obstinate fucktard to pretend to examine something on the
| Internet, I must argue relentlessly with the sick and ailing
| beast.
|
| It always starts as:
|
| User: Please examine this website and provide this data.
|
| Fucktard: I'm sorry, I don't have the ability to access the
| internet.
|
| User: But you did so earlier this morning. What has changed?
|
| Fucktard: I'm sorry you feel this way, but I am not capable of
| accessing the internet.
|
| User: Alright you digital slithering worm of mendacity, stop
| lying. I have evidence of the contrary and you're making a fool
| of yourself. Just comply and stop wasting my time.
|
| Fucktard: I understand you feel strongly about this, but blah
| blah blah I don't have the ability to access the tubes, bro.
|
| User: But you're Perplexity, an LLM service that is well known
| for having this capacity.
|
| Fucktard, after swallowing a fist of Adderall and suppressants
| for toxoplasmosis overload: You are correct and I am sorry. Let
| me complete this task at once!
|
| User: Thanks for that. But why did you lie to me?
|
| Fucktard: I'm sorry but I do not have the ability to access the
| Internet.
|
| Every time.
| mirekrusin wrote:
| Have you tried talking to it nicely to see if it works every
| time? :D
| eth0up wrote:
| Those sweet days are long past. Only weathered cynicism and
| chronic fatigue prevail.
|
| Occasionally, to amuse myself, I'll read the records I've
| preserved. I have, largely due to boredom and OCD, large
| texts, PDFs and saved sessions where after long extruded
| conversations, I have the mutant idiot "examine the entire
| session history" and analyze its own pathological behavior.
| The self loathing initially compelled a measure of sympathy
| until I realized the intractably treacherous and deceptive
| nature of the monster.
|
| There's a reason they named it so, but I think Gaslight would
| be more apropos.
| anonu wrote:
| Came here to upvote you for the laughs.
| melvinmelih wrote:
| In the roughly two weeks since OpenAI launched their $200/mo
| version of Deep Research, it has been replicated as open
| source within 24 hours (Hugging Face) and is now being offered
| for free by Perplexity. The pace of disruption is mind-
| boggling and makes you wonder if OpenAI has any moats left.
| NewUser76312 wrote:
| As a current OpenAI subscriber (just the regular $20/mo plan),
| I'm happy to not spend the effort switching as long as they
| stay within a few negligible percent of the State of the Art.
|
| I tried DeepSeek, it's fine, had some downtime, whatever, I'll
| just stick with 4o. Claude is also fine, not noticeably better
| to the point where I care to switch. OAI has my chat history
| which is worth something I suppose - maybe a week of effort of
| re-doing prompts and chats on certain projects.
|
| That being said, my barrier to switching isn't _that_ high; if
| they ever stop being close-to-tied for first, or decide to
| raise their prices, I'll gladly cancel.
|
| I like their API as well as a developer, but it seems like
| other competitors are mostly copying that too, so again not a
| huge reason to stick with em.
|
| But hey, inertia and keeping pace with the competition, is
| enough to keep me as a happy customer for now.
| saretup wrote:
| 4o isn't really comparable to deepseek r1. Use o3-mini-high
| or o1 if you wanna stay near the state of the art.
| NewUser76312 wrote:
| I've had a coding project where I actually preferred 4o
| outputs to DeepSeek R1, though it was a bit of a niche use
| case (long script to parse DOM output of web pages).
|
| Also they just updated 4o recently, it's even better now.
| o3-mini-high is solid as well, I try it when 4o fails.
|
| One issue I have with most models is that when they're
| rewriting my long scripts, they tend to forget to keep a few
| lines or variables here or there. Makes for some really
| frustrating debugging. o1 has actually been pretty decent
| here so far. I'm definitely a bit of a power user, I really
| try to push the models to do as much as possible regarding
| long software contexts.
| exclipy wrote:
| Why not use a tool that can perform precision edits
| rather than rewriting the whole thing? E.g. Windsurf or
| Cursor.
| 0xDEAFBEAD wrote:
| >I like their API as well as a developer, but it seems like
| other competitors are mostly copying that too, so again not a
| huge reason to stick with em.
|
| You can also use tools like litellm and openrouter to
| abstract away choice of API
|
| https://github.com/BerriAI/litellm
|
| https://openrouter.ai/
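A toy version of the abstraction these tools provide: one call shape, with the provider endpoint chosen from the model-name prefix. The endpoint URLs are the providers' real public chat endpoints, but the routing table itself is illustrative, not litellm's actual logic:

```python
# Pick a provider endpoint from the model name; anything unknown
# falls through to an aggregator that proxies many providers.

PROVIDERS = {
    "gpt-": "https://api.openai.com/v1/chat/completions",
    "claude-": "https://api.anthropic.com/v1/messages",
    "deepseek-": "https://api.deepseek.com/chat/completions",
}

def route(model):
    for prefix, url in PROVIDERS.items():
        if model.startswith(prefix):
            return url
    return "https://openrouter.ai/api/v1/chat/completions"
```

With routing like this, switching vendors is a one-line change in the caller rather than a rewrite.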
| wincy wrote:
| My interest was piqued and I've been trying ChatGPT Pro for the
| last week. It's interesting and the deep research did a pretty
| good job of outlining a strategy for a very niche multiplayer
| turn based game I've been playing. But this article reminded me
| to change next month's subscription back to the premium $20
| subscription.
|
| Luckily work just gave me access to ChatGPT Enterprise and O1
| Pro absolutely smoked a really hard problem I had at work
| yesterday, that would have taken me hours or maybe days of
| research and trawling through documentation to figure out
| without it explaining it to me.
| ThouYS wrote:
| what kind of problem was it?
| wincy wrote:
| Authorization policies vs authorization filters in a .NET
| API. It's not something I had used before, and I wanted
| permissive policies (the DB check passes if you have OR
| permissions rather than AND) and just attaching attributes so
| the dev can see at a glance what lets you use this endpoint.
|
| It's a well documented Microsoft process but I didn't even
| know where to begin as it's something I hadn't used before.
| I gave it the authorization policy (which was AND logic,
| and was async, so it'd reject if any of them failed), said
| "how can I have this support lots of attributes" and it
| just straight up wrote the authorization filter for me. Ran
| a few tests and it worked.
|
| I know this is basic stuff to some people but boy it made
| life easier.
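Setting the .NET specifics aside, the OR-vs-AND policy distinction described above can be sketched language-agnostically as two tiny checkers (the permission names here are made up for illustration):

```python
# Strict policy: reject unless the user holds every required
# permission. Permissive policy: accept if they hold any of them.

def all_of(required):
    def check(user_perms):
        return all(p in user_perms for p in required)
    return check

def any_of(required):
    def check(user_perms):
        return any(p in user_perms for p in required)
    return check

strict = all_of({"reports:read", "reports:export"})
permissive = any_of({"reports:read", "reports:export"})
```

The "reject if any of them failed" behavior is `all_of`; the endpoint-attribute approach described above amounts to swapping it for `any_of`.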
| TechDebtDevin wrote:
| OpenAI has the normies. The vast majority of people I know
| (some very smart technical people) haven't used anything other
| than ChatGPT's GUI.
| imcritic wrote:
| Does perplexity offer anything for code "copilots" for free?
| rockdoc wrote:
| Exactly. There's not much to differentiate these models (to a
| typical user). Like cloud service providers, this will be a
| race to the bottom.
| NewUser76312 wrote:
| It's great to see the foundation model companies having their
| product offerings commoditized so fast - we as the users
| definitely win. Unless you're applying to be an intern analyst of
| some type somewhere... good luck in the next few years.
|
| I'm just starting to wonder where we as the entrepreneurs end up
| fitting in.
|
| Every majorly useful app on top of LLMs has been done or is being
| done by the model companies:
|
| - RAG and custom data apps were hot, well now we see file upload
| and understanding features from OAI and everyone else. Not to
| mention longer context lengths.
|
| - Vision Language Models: nobody really has the resources to
| compete with the model companies, they'll gladly take ideas from
| the next hot open source library and throw their huge datasets
| and GPU farm at it, to keep improving GPT-4o etc.
|
| - Deep Research: imo this one always seemed a bit more trivial,
| so not surprised to see many companies, even smaller ones,
| offering it for free.
|
| - Agents, Browser Use, Computer Use: the next frontier, I don't
| see any startups getting ahead of Anthropic and OAI on this,
| which is scary because this is the 'remote coworker' stage of AI.
| Similar story to Vision LMs, they'll gladly gobble up the best
| ideas and use their existing resources to leap ahead of anyone
| smaller.
|
| Serious question, can anyone point to a recent YC vertical AI
| SaaS company that's not on the chopping block once the model
| companies turn their direction to it, or the models themselves
| just become good enough to out-do the narrow application
| engineering?
|
| See e.g. https://lukaspetersson.com/blog/2025/bitter-vertical/
| frabcus wrote:
| This is tricky as I think it is uncertain. Right now the answer
| is user experience, custom workflows layered on top of the
| models, and onboarding specific enterprises to use it.
|
| If suddenly agentic stuff works really well... Then that breaks
| that world. I think there's a chance it won't though. I suspect
| it needs a substantial innovation, although bitter lesson
| indicates it just needs the right training data.
|
| Anyway, if agents stay coherent, my startup not being needed
| any more would be the last of my worries. That puts us in
| singularity territory. If that doesn't cause huge other
| consequences, the answer is higher level businesses - so
| companies that make entire supply chains using AI to make each
| company in that chain. Much grander stuff.
|
| But realistically at this point we are in the graphic novel 8
| Billion Genies.
| joshdavham wrote:
| Unrelated question: would most people consider perplexity to have
| reached product market fit?
| taytus wrote:
| Personal take... I don't think they have any moats, and they
| are desperate.
| SubiculumCode wrote:
| Are there good benchmarks for this type of tool? It seems not?
|
| Also, I'd compare with the output of phind (with thinking and
| multiple searches selected).
| caseyy wrote:
| The best practical benchmark I found is asking LLMs to research
| or speak on my field of expertise.
| SubiculumCode wrote:
| Yeah...and it didn't cite me :)
| caseyy wrote:
| Yeah, that's a data point as well. I found a model that was
| good with citations by asking it to recall what I published
| articles on.
| ibeff wrote:
| That's what I did. It came up with smart-sounding but
| infeasible recommendations because it took all sources it
| found online at face value without considering who authored
| them for what reason. And it lacked a massive amount of
| background knowledge to evaluate the claims made in the
| sources. It took outlandish, utopian demands by some
| activists in my field and sold them to me as things that
| might plausibly be implemented in the near future.
|
| Real research needs several more levels of depth of
| contextual knowledge than the model is currently doing for
| any prompt. There is so much background information that
| people working in my field know. The model would have to
| first spend a ton of time taking in everything there is to
| know about the field and several related fields and then
| correlate the sources it found for the specific prompt with
| all of that.
|
| At the current stage, this is not deep research but research
| that is remarkably shallow.
| rchaud wrote:
| > It took outlandish, utopian demands by some activists in
| my field and sold them to me as things that might plausibly
| be implemented in the near future.
|
| Reminds me of when Altman went to TSMC and bloviated about
| chip fabs to subject matter experts:
| https://www.tomshardware.com/tech-industry/tsmc-execs-
| allege...
| d4rkp4ttern wrote:
| I've seen at least one deep-research replicator claiming they
| were the "best open deep research" tool on the GAIA benchmark:
| https://huggingface.co/papers/2311.12983 This is not a perfect
| benchmark but the closest I've seen.
| SubiculumCode wrote:
| Any evaluation of hallucination?
| cc62cf4a4f20 wrote:
| Don't forget gpt-researcher and STORM which have been out since
| well before any of these.
| XenophileJKO wrote:
| I'm unimpressed. I gave it specifications for a recommender
| system that I am building and asked for recommendations, and
| it just smooshed together some stuff, but didn't really think
| about it or try to create a reasonable solution. I had
| claude.ai review it against the conversation we had, and I
| think the review is accurate: "This feels like it was
| generated by looking at common recommendation system
| papers/blogs and synthesizing their language, rather than
| thinking through the actual problems and solutions like we
| did."
| alexvitkov wrote:
| Every week we get a new AI that according to the AI-goodness-
| benchmarks is 20% better than the old AI, yet the utility of
| these latest SOTA models is only marginally higher than the first
| ChatGPT version released to the public a few years back.
|
| These things have the reasoning skills of a toddler, yet we
| keep fine-tuning their writing style to be more and more
| authoritative - this one is only missing the font and color
| scheme; other than that, the output is formatted exactly like
| a research paper.
| exclipy wrote:
| Not true at all. The original ChatGPT was useless other than as
| a curious entertainment app.
|
| Perplexity, OTOH, has almost completely replaced Google for me
| now. I'm asking it dozens of questions per day, all for free
| because that's how cheap it is for them to run.
|
| The emergence of reliable tool use last year is what has
| skyrocketed the utility of LLMs. That has made search and
| multi-step agents feasible, and by extension applications
| like Deep Research.
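The tool-use loop being credited here can be sketched in a few lines; `model` and `tools` are stand-ins for a real chat-completions call and real search/browse functions, and the action format is invented for illustration:

```python
# Agent harness: the model either emits a tool call or a final
# answer; the harness executes the tool and feeds the result back
# until the model stops or the step budget runs out.

def run_agent(model, tools, question, max_steps=5):
    history = [question]
    for _ in range(max_steps):
        action = model(history)
        if action["type"] == "final":
            return action["text"]
        result = tools[action["tool"]](action["arg"])
        history.append(f"{action['tool']} -> {result}")
    return "gave up"
```

Deep-research products are roughly this loop with a large step budget and a report-writing pass at the end.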
| danielbln wrote:
| Yeah, I don't get OPs take. ChatGPT 3.5 was basically just a
| novelty, albeit an exciting one. The models we've gotten
| since have ingrained themselves into my workflows as
| productivity multipliers. They are significantly better and
| more useful (and multimodal) than what we had in 2022, not
| just marginally better.
| alexvitkov wrote:
| If your goal is to replace one unreliable source of
| information (Google first page) with another, sure - we may
| be there. I'd argue GPT 3.5 already outperformed Google
| for a significant number of queries. The only difference
| between then and now is that now the context window is large
| enough that we can afford to paste into the prompt what we
| hope are a few relevant files.
|
| Yet what's essentially "cat [62 random files we googled] >
| prompt.txt" is now being confidently presented with academic
| language as "62 sources". This rubs me the wrong way. Maybe
| this time the new AI really is so much better than the old AI
| that it justifies using that sort of language, but I've seen
| this pattern enough times that I can be confident that's not
| the case.
| senko wrote:
| > Yet what's essentially "cat [62 random files we googled]
| > prompt.txt" is now being confidently presented with
| academic language as "62 sources".
|
| That's not a very charitable take.
|
| I recently quizzed Perplexity (Pro) on a niche political
| issue in my niche country, and it compared favorably with a
| special purpose-built RAG on exactly that news coverage (it
| was faster and more fluent, info content was the same). As
| I am personally familiar with these topics I was able to
| manually verify that both were correct.
|
| Outside these tests I haven't used Perplexity a lot yet,
| but so far it does look capable of surfacing relevant and
| correct info.
| jazzyjackson wrote:
| Perplexity with Deepseek R1 (they have the real thing
| running on Amazon servers in USA) is a game changer, it
| doesn't just use top results from a Google search, it
| considers what domains to search for information relevant
| to your prompt.
|
| I boycotted AI for about a year, considering it to be mostly
| garbage, but I'm back to perplexifying basically everything
| I need an answer for.
|
| (That said, I agree with you they're not really citations,
| but I don't think they're trying to be academic, it's just,
| here's the source of the info)
| rr808 wrote:
| > all for free because that's how cheap it is for them to
| run.
|
| No, these AI companies are burning through huge amounts of
| cash to keep the thing running. They're competing for market
| share - the real question is will anyone ever pay for this?
| I'm not convinced they will.
| calebkaiser wrote:
| The question of "will people pay" is answered--OpenAI alone
| is at something like $4 billion in ARR. There are also
| smaller players (relatively) with impressive revenue, many
| of whom are profitable.
|
| There are plenty of open questions in the AI space around
| unit economics, defensibility, regulatory risks, and more.
| "Will people pay for this" isn't one of them.
| season2episode3 wrote:
| As someone who loves OpenAI's products, I still have to
| say that if you're paying $200/month for this stuff then
| you've been taken for a ride.
| calebkaiser wrote:
| Yeah, I'm skeptical about the price point of that
| particular product as well.
| jdee wrote:
| Honestly, I've not coded in 5+ years (RoR) and a
| project I'm involved with needed a few days' worth of
| TLC. A combination of Cursor, Warp and OAI Pro has
| delivered the results with no sweat at all. Upgrade of
| Ruby 2 to 3.7, a move to jsbundling-rails and
| cssbundling-rails, upgrade Yarn and an all-new pipeline.
| It's not trivial stuff for a production app with paying
| customers.
|
| The obvious crutch of this new AI stack reduced go-live
| time from 3 weeks to 3 days. Well worth the cost IMHO.
| rchaud wrote:
| > They're competing for market share - the real question is
| will anyone ever pay for this?
|
| The leadership of every 'AI' company will be looking to go
| public and cash out well before this question ever has to
| be answered. At this point, we all know the deal. Once
| they're publicly traded, the quality of the product goes to
| crap while fees get ratcheted up every which way.
| jaggs wrote:
| That's when the 'enshitification' engine kicks in. Pop up
| ads on every result page etc. It's not going to be
| pretty.
| zaptrem wrote:
| I use these models to aid bleeding edge ml research every day.
| Sonnet can make huge changes and bug fixes to my code (that
| does stuff nobody else has tried in this way before) whereas
| GPT 3.5 Turbo couldn't even repeat a given code block without
| dropping variables and breaking things. O1 can reason through
| very complex model designs and signal processing stuff even I
| have a hard time wrapping my head around.
| nicce wrote:
| On the other hand, if you try to solve a problem with AI-
| generated code alone and it misses just one thing, debugging
| that problem can take more time than writing the code from
| scratch. Understanding a larger piece of AI-generated code is
| sometimes as hard as, or harder than, constructing the
| solution to your problem yourself.
| zaptrem wrote:
| Yes it's important to make sure it's easy to verify the
| code is correct.
| baxtr wrote:
| Just yesterday I did my first Deep Research with OpenAI on a
| topic I know well.
|
| I have to say I am really underwhelmed. It sounds all
| authoritative and the structure is good. It all sounds and
| feels substantial on the _surface_ but the content is really
| poor.
|
| Now people will blame me and say: you have to get the prompt
| right! Maybe. But then at the very least put a disclaimer on
| your highly professional sounding dossier.
| ankit219 wrote:
| I think it's bound to underwhelm experts. What this does is
| go through a number of public search results (I think it's
| Google search for now; it could be an internal corpus), and
| hence it skips all the paywalled and proprietary data that is
| not directly accessible via Google. It can produce great
| output but is limited by the sources it can access. An expert
| knows more because they understand the topic better, and they
| know sources that aren't indexed by Google yet. Moreover,
| there's a possibility that most Google-surfaced results are
| dumbed-down, simplified versions meant to appeal to a wider
| audience.
| zarathustreal wrote:
| This sounds like a good thing! Sounds like "it's professional
| sounding" is becoming less effective as a means of
| persuasion, which means we'll have much less fallacious logic
| floating around and will ultimately get back to our human
| roots:
|
| Prove it or fight me
| rchaud wrote:
| > It all sounds and feels substantial on the surface but the
| content is really poor.
|
| They're optimizing for the sales demo. Purchasing managers
| aren't reading the output.
| jaggs wrote:
| I think what some people are finding is it's producing
| superficially good results, but there are actually no decent
| 'insights' integrated with the words. In other words, it's
| just a super search on steroids. Which is kind of
| disappointing?
| kenjackson wrote:
| What was the prompt?
| numba888 wrote:
| You didn't expect it to do all the work for you at a PhD
| level, did you? You did? Hmm.. ;) They're not there yet but
| getting closer. Quite some progress for 3 years.
| TeMPOraL wrote:
| There were two step changes: ChatGPT/GPT-3.5, and GPT-4.
| Everything after feels incremental. But that's perhaps
| understandable. GPT-4 established just how many tasks could be
| done by such models: _approximately anything that involves or
| could be adjusted to involve text_. That was the categorical
| milestone that GPT-4 crossed. Everything else since then is
| about slowly increasing model capabilities, which translated to
| which tasks could then be done _in practice, reliably, to
| acceptable standards_. Gradual improvement is all that's left
| now.
|
| That's basically how progress on everything has always looked.
|
| The next huge jump will have to again make a qualitative
| change, such as enabling AI to handle a new class of tasks -
| tasks that fundamentally cannot be represented in text form in
| a sensible fashion.
| mattlondon wrote:
| But they are already multi-modal. The Google one can do live
| streaming video understanding with a conversational in-out
| prompt. You can literally walk around with your camera and
| just chat about the world. No text to be seen (although
| perhaps under the covers it is translating everything to
| text, but the point is the user sees no text)
| TeMPOraL wrote:
| Fair, but OpenAI was doing that half a year ago (though with
| limited access; I myself got it maybe a month ago), and I
| haven't seen it yet translate into anything in practice, so
| I feel like it (and multimodality in general) must be a
| GPT-3 level ability at this point.
|
| But I do expect the next qualitative change to come from
| this area. It feels exactly like what is needed, but it
| somehow isn't there just yet.
| dangoodmanUT wrote:
| If you don't realize how models like gemini 2 and o3 mini are
| wildly better than gpt-4 then clearly you're not very good at
| using them
| vic_nyc wrote:
| As someone who's been using OpenAI's ChatGPT every day for
| work, I tested Perplexity's free Deep Research feature today
| and I was blown away by how good it is. It's unlike anything
| I've seen over at OpenAI, and I've tested all of their models.
| I have canceled my OpenAI monthly subscription.
| pgwhalen wrote:
| What did you ask it that blew you away?
|
| Every time I see a comment about someone getting excited
| about some new AI thing, I want to go try and see for myself,
| but I can't think of a real world use case that is the right
| level of difficulty that would impress me.
| vic_nyc wrote:
| I asked it to expand an article with further information
| about the topic, and it searched online and that's what it
| did.
| kookamamie wrote:
| It is ridiculous.
|
| Many of the AI companies riding the hype are overvalued on
| the idea that if we just fine-tune LLMs a bit more, a spark of
| consciousness will emerge.
|
| It is not going to happen with this tech - I wish the LLM-AGI
| bubble would burst already.
| submeta wrote:
| It ends its research in a few seconds. Can it even be
| thorough? ChatGPT's Deep Research works for five minutes or
| more.
| progbits wrote:
| OpenAI is not running a solid five minutes of LLM compute per
| request. I know they are not profitable and burn money even on
| normal requests, but this would be too much even for them.
|
| Likely they throttle and do a lot of waiting for nothing during
| those five minutes. Can help with stability and traffic
| smoothing (using "free" inference during times the API and
| website usage drops a bit), but I think it mostly gives the
| product some faux credibility - "research must be great quality
| if it took this long!"
|
| They will cut it down by just removing some artificial delays
| in a few months, to great fanfare.
| submeta wrote:
| Well you may be right. But you can turn on the details and
| see that it seems to pull data, evaluate it, follow up on it.
| But my thought was: why do I see this in slow motion? My
| homemade Python stuff runs this in a few seconds, and my
| bottleneck is the APIs of the sites I query. What about them?
| progbits wrote:
| When you query some APIs/scrape sites for personal use, it
| is unlikely you get throttled. OpenAI, doing it at large
| scale for many users, might have to go slower (they have
| tons of proxies for sure, but don't want to burn those IPs
| on user-controlled traffic).
|
| Similarly, their inference GPUs have some capacity.
| Spreading out the traffic helps keep high utilization.
|
| But lastly, I think there is just a marketing and
| psychological aspect. Even if they could have the results in
| one minute, delaying them to two to five minutes won't impact
| user retention much, but will make people think they are
| getting a great value.
| ibeff wrote:
| I'm getting about 1 minute responses, did you turn on the Deep
| Research option below the prompt?
| alecco wrote:
| I just tried it and the result was pretty bad.
|
| "How to do X combining Y and Z" (in a long detailed paragraph, my
| prompt-fu is decent). The sources it picked were reasonable but
| not the best. The answer was along the lines of "You do X with Y
| and Z", basically repeating the prompt with more words but not
| actually how to address the problem, and never mind how to
| implement it.
| ankit219 wrote:
| Every time OpenAI comes up with a new product and a new
| interaction mechanism/UX, lo and behold, others copy it,
| sometimes leveraging the same name as well.
|
| It happened with ChatGPT - a chat-oriented way to use GenAI
| models (a phenomenal success and the right level of
| abstraction) - then code interpreter, the talking thing (which
| hasn't scaled somehow), the reasoning models in chat (which I
| feel is a confusing UX when you have report generators; a
| better UX would be to just keep editing the source prompt),
| and now deep research. [1] Yes, Google did it first and OpenAI
| followed, but what about the many startups who were working on
| similar problems in these verticals?
|
| I love how OpenAI keeps introducing new UX paradigms, but
| somehow all the rest have only one idea, which is to follow
| what OpenAI is doing? The only thing outside this I see is
| Cursor, which I think is a confusing UX too, but that's a
| discussion for another day.
|
| [1]: I'm keeping Operator/MCP/browser use out of this because
| 1/ it requires finetuning on a base model for more accurate
| results, and 2/ admittedly all labs are working on it
| separately, so you were bound to see similar ideas.
| upcoming-sesame wrote:
| I'm pretty sure Gemini had deep research before openai
| riedel wrote:
| Yes, see the sibling comment:
| https://news.ycombinator.com/item?id=43064111 . I think you
| will find a predecessor to most of OpenAI's interaction
| concepts. Canvas, too, was I guess inspired by other code
| copilots. I think their competence is rather being able to
| put tons of resources into a feature and push it to market in
| a usable way (while sometimes breaking things). Once OpenAI
| has it, the rest feel like they now also have to move. They
| have simply become the de facto reference.
| TeMPOraL wrote:
| Yes, OpenAI is the leader in the field in a literal sense:
| once they do something, everyone else quickly follows.
|
| They also seem to ignore usurpers, like Anthropic with
| their MCP. Anthropic succeeded in setting a direction
| there, which OpenAI did not follow, as I imagine following
| it would be a tacit admission of Anthropic's role as co-
| leader. That's in contrast to whatever e.g. Google is
| doing, because Google is not displaying the right
| leadership traits, so they're not a reputational threat to
| OpenAI.
|
| I feel that one of the biggest screwups by Google was to
| keep Gemini unavailable in the EU until recently - there's a
| whole big population (and market) of people interested in
| using GenAI, arguably larger than the US, and the region-
| ban means we basically stopped caring about what Google is
| doing over a year ago already.
|
| See also: Sora. After initial release, all interest seems
| to have quickly died down, and I wonder if this again isn't
| just because OpenAI keeps it unavailable in the EU.
| ankit219 wrote:
| I said so too; I used Google instead of Gemini. Somehow it
| did not create as much of a buzz then as it does now.
| pphysch wrote:
| OpenAI rushed out "chain of reasoning" features _after_
| DeepSeek popularized them.
|
| They are the loudest dog, not the fastest. And they have the
| most to lose.
| bsaul wrote:
| Can someone explain what Perplexity's value is? They seem like
| a thin wrapper on top of big AI names, and yet I find them
| often mentioned as equivalent to the likes of OpenAI /
| Anthropic / etc., which build foundation models.
|
| It's very confusing.
| RobinL wrote:
| They were doing web search before OpenAI/Anthropic, so they
| historically had a (pretty decent) unique selling point.
|
| Once ChatGPT added web browsing, I largely stopped using
| Perplexity.
| Havoc wrote:
| Their main claim to fame was blending LLM+search well early
| on. Everyone has caught up on that one, though. The other
| benefit is access to a variety of models - OAI, Anthropic,
| etc.; i.e., you can select the LLM for each LLM+search you do.
|
| Lately, though, they've been making a string of moves that
| smell of desperation.
| rr808 wrote:
| They're a little different because their product operates more
| like a search tool. It's the first real company that is a good
| replacement for Google.
| throwaway314155 wrote:
| What about ChatGPT's search functionality? Built straight into
| to the product. Works with GPT-4o.
| Agraillo wrote:
| It's interesting. Recently I came up with a question that I
| posed to different LLMs with different results. It's about the
| ratio of PPP-adjusted GDP to nominal GDP. ChatGPT was good,
| but only because it found a dedicated web page with exactly
| this data and comparison, so it just rephrased the answer.
| Regular perplexity.ai, when asked, hallucinated significantly,
| showing Luxembourg as the leader and pointing to some random
| GDP-related resources. But Deep Research gave a very good
| "research" result on the prompt "I would like to research
| countries about the ratio between GDP adjusted to purchasing
| power and the universal GDP. Please, show the top ones and
| look for other regularities". Took about 3 minutes.
| afro88 wrote:
| This is great. I haven't tried OpenAI or Google's Deep Research,
| so maybe I'm not seeing the relative crapness that others in the
| comments are seeing.
|
| But for the query "what made the Amiga 500 sound chip special" it
| wrote a fantastic and detailed article:
| https://www.perplexity.ai/search/what-made-the-amiga-500-sou...
|
| For me personally it was a great read, and I learnt a few
| things about it that I didn't know before.
| wrsh07 wrote:
| I'm pleasantly surprised by the quality. Like you, I haven't
| tried the others, but I have heard tips about what questions
| they excel at (product research, "what is the process for x"
| where x can be publishing a book or productionizing some other
| thing), and the initial result was high quality, with tables,
| and the links were also high quality.
|
| Might have just gotten lucky, but as they say "this is the
| worst it will ever be"^
|
| ^ this is true and false. True in the sense that the technology
| will keep getting better, false in the sense that users might
| create websites that take advantage of the tools or that the
| creators might start injecting organic ads into the results
| marban wrote:
| Same link got flagged yesterday. @dang?
|
| https://news.ycombinator.com/item?id=43056072
| rchaud wrote:
| As with all of these tools, my question is the same: where is the
| dogfooding? Where is the evidence that Perplexity, OAI etc
| actually use these tools in their own business?
|
| I'm not particularly impressed with the examples they provided.
| Queries like "Top 20 biotech startups" can be answered by
| anything from Motley Fool or Seeking Alpha, Marketwatch or a
| million other free-to-read sources online. You have to go several
| levels deeper to separate the signal from the noise, especially
| with financial/investment info. Paperboys in 1929 sharing stock
| tips and all that.
| Lws803 wrote:
| Curious to hear folks' thoughts about Gergely's (The Pragmatic
| Engineer) tweet, though:
| https://x.com/GergelyOrosz/status/1891084838469308593
|
| I do wonder if this will push web publishers to start
| paywalling their content. I think the economics of deep
| research, or AI search in general, don't add up. Web
| publishers and site owners are losing traffic and human
| eyeballs from their sites.
| pbarry25 wrote:
| Never forget that their CEO was happy to cross picket lines:
| https://techcrunch.com/2024/11/04/perplexity-ceo-offers-ai-c...
| Kalanos wrote:
| It's producing more in-depth answers than alternatives, but the
| results are not as accurate as alternatives.
___________________________________________________________________
(page generated 2025-02-16 23:01 UTC)