[HN Gopher] Haystack 1.0 - open-source NLP framework to build NL...
___________________________________________________________________
Haystack 1.0 - open-source NLP framework to build NLProc back end
applications
Author : antti909
Score : 81 points
Date : 2021-12-09 18:27 UTC (4 hours ago)
(HTM) web link (github.com)
| artembugara wrote:
| Interesting. Could anyone think of a use case with news data? We
| index over 1 million news articles daily.
|
| Maybe there's a way to have something for a specific industry?
|
| https://newscatcherapi.com/
| whalesalad wrote:
| This is perhaps just an innocent question for the community -
| but it's simultaneously an ingenious approach to inbound
| marketing.
| artembugara wrote:
| That's both and I feel no shame for it.
| antti909 wrote:
| Whoa, this is cool :) I could think of a ton of marketing (or
| maybe even 'devrel') applications for it.
| visarga wrote:
| Tried the demo, but it could not answer any of my questions
| correctly. Maybe it didn't have the answers in its index.
| tholor wrote:
| The demo corpus there just contains documents about countries
| and capital cities. So you could try asking questions like
| "What's the climate of Beijing?" or "How many people live in
| the capital of the US?".
| timomo wrote:
| btw the demo can be found at
| https://haystack-demo.deepset.ai/
| anentropic wrote:
| Haystack looks great, but the demo maybe highlights some
| difficulties with this kind of task.
|
| "What is the population of Italy?" ...gives the population of
| Rome as first answer at 78.32 relevance :)
|
| I get similar results for some other countries.
|
| "What is the population of Cambridge?" ...to be fair, this is
| an ambiguous place name as there are several around the
| world. However the answer it gives is quite far removed from
| any of them: "In 1788, Kingston had a population of 25,000",
| Relevance: 93.14
| antti909 wrote:
| Yep, that's definitely a challenge with commonly available
| models. In real-life product development there is usually an
| important step of evaluating the model(s) and fine-tuning them
| if necessary.
| antti909 wrote:
| Re "Kingston" - interesting! :) Probably, because of
| "Cambridgeshire"?
| szanz wrote:
| (Disclaimer: I'm a Haystack maintainer and I helped create
| this demo)
|
| I had to try out the questions you asked, because your first
| one seems totally answerable to me. And indeed I do get the
| right answer in the first position (60 million). Did you ask
| exactly the same question you posted?
|
| For the second, unfortunately we included only country
| pages and capital city pages, so it's likely that the
| information about the population of Cambridge simply wasn't
| there.
|
| In general though I agree this task is not perfect for a
| demo. It's hard to tell whether the model is wrong because
| it doesn't have enough info, or whether it does have the
| data but couldn't find it. The best way to evaluate it will
| always be to try it out on your own data :)
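
The distinction drawn above (the data is missing versus the data is there but was not found) can be measured on a small labelled sample. The following is a minimal sketch, not Haystack's own evaluation code: the `evaluate` function, the dictionary keys, and the toy records are all hypothetical, and it assumes you have already logged which documents the retriever returned and what the reader answered for each question.

```python
def evaluate(examples, top_k=3):
    """Split failures into retrieval misses and reader misses.

    Each example is a dict with these (hypothetical) keys:
      gold_doc    - id of the document that contains the answer
      gold_answer - expected answer string
      retrieved   - ranked list of document ids from the retriever
      predicted   - answer string produced by the reader
    """
    n = len(examples)
    if n == 0:
        return {"retrieval_recall": 0.0, "exact_match": 0.0,
                "reader_accuracy_given_hit": 0.0}

    retrieval_hits = 0
    reader_hits = 0
    for ex in examples:
        # Did the document holding the answer make it into the top k?
        found = ex["gold_doc"] in ex["retrieved"][:top_k]
        if found:
            retrieval_hits += 1
            # Only credit the reader when the evidence was retrieved.
            same = (ex["predicted"].strip().lower()
                    == ex["gold_answer"].strip().lower())
            if same:
                reader_hits += 1

    return {
        "retrieval_recall": retrieval_hits / n,
        "exact_match": reader_hits / n,
        "reader_accuracy_given_hit":
            reader_hits / retrieval_hits if retrieval_hits else 0.0,
    }


# Toy, made-up records purely for illustration.
EXAMPLES = [
    {"gold_doc": "d1", "gold_answer": "a1",
     "retrieved": ["d1", "d5"], "predicted": "A1 "},   # found, answered
    {"gold_doc": "d2", "gold_answer": "a2",
     "retrieved": ["d6", "d2"], "predicted": "other"},  # found, misread
    {"gold_doc": "d3", "gold_answer": "a3",
     "retrieved": ["d7", "d8", "d3"], "predicted": "a3"},  # below cutoff
    {"gold_doc": "d4", "gold_answer": "a4",
     "retrieved": ["d9"], "predicted": "other"},        # not retrieved
]

metrics = evaluate(EXAMPLES, top_k=2)
print(metrics)
```

Read this way, a low `retrieval_recall` points at the index or the retriever, while a high recall combined with a low `reader_accuracy_given_hit` points at the reader model.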
| timomo wrote:
| What is the population of Cambridge?
|
| for me the demo returns that the model did not find an
| answer...
| JPKab wrote:
| I've been looking at this for a while now.
|
| My company needs to identify records in Elasticsearch with high
| precision, but using more of a semantic match than existing
| Elasticsearch plugins support. Ideally, the best of Hugging Face
| on top of Elasticsearch.
|
| Has anyone on here tried this out? Curious what your experiences
| are.
| tholor wrote:
| Semantic document search is one of the core use cases we see in
| the community (besides Question Answering) and Haystack was
| pretty much started because we saw that you need much more than
| just models. It's so much pain to integrate models properly
| with document storage (e.g. elasticsearch), route requests
| effectively in larger pipelines or track user feedback in
| production. Have you tried using DPR or sentence transformers
| for your case?
|
| Disclaimer: I am one of the maintainers of Haystack :)
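
Semantic search with dense models such as DPR or sentence transformers comes down to comparing a query embedding with document embeddings. Below is a minimal sketch of the ranking step only, assuming the vectors already exist. The 3-dimensional vectors and the names `cosine`, `rank`, `DOC_VECS`, and `QUERY_VEC` are made up for illustration; a real encoder would produce much larger vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid dividing by zero for empty embeddings
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs, top_k=2):
    """Return (doc_id, score) pairs sorted by descending similarity."""
    scored = [(doc_id, cosine(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy "embeddings"; a real model would compute these.
DOC_VECS = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
QUERY_VEC = [0.9, 0.1, 0.0]

results = rank(QUERY_VEC, DOC_VECS)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

The brute-force loop is fine for a handful of documents. For larger collections this comparison is normally delegated to a vector index, which is the role FAISS and similar backends play in the setups discussed in this thread.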
| antti909 wrote:
| Happy to help too - feel free to ask in the community channels
| as well :)
| matheist wrote:
| I had a client who was using sentence transformers with
| elasticsearch already. My colleague suggested switching to
| haystack to enable a larger number of model architectures.
| Switching over to haystack was pretty straightforward because
| we just used it as a wrapper around sentence transformers, but
| I do remember some inconvenience around all the other
| dependencies that haystack pulled in.
|
| Haystack does a lot more besides just wrapping sentence
| transformers, and we weren't using the rest of it, so it was
| just a lot of extra dependencies sitting around taking up disk
| space and memory (I think we had to go up to a larger instance
| size). I remember feeling a bit frustrated that the
| dependencies weren't split up into "core" and "optional" in a
| more fine-grained way, but maybe most users don't mind and so
| it doesn't make sense for them to prioritize that?
|
| [edit: looks like there's an open issue related to this:
| https://github.com/deepset-ai/haystack/issues/1070]
|
| [edit 2: JPKab, happy to share more about using huggingface and
| elasticsearch. Email is in my profile.]
| antti909 wrote:
| Noted, we've been discussing dependencies internally indeed
| :) Thanks for the highlight above!!
| eriklarsonr wrote:
| Yeah I'm using their FAISS document store and QA pipeline to
| run semantic search over a set of YouTube transcripts. Was
| easier to set up than Jina AI in my specific use case, and the
| search results are actually useful. The only real constraint for
| me is GPU access: creating the embeddings to store in the FAISS
| index without a GPU takes an unreasonable amount of time.
| antti909 wrote:
| Glad it worked for you - thanks for sharing!
| dexter89_kp3 wrote:
| Interesting. My current side project is exploring semantic search
| using sentence transformers. Will definitely check this out.
| antti909 wrote:
| Happy to help! - we've got Slack and everything :)
| dragosbulugean wrote:
| Does it work on a typical ES index, or do you have to re-index in
| a certain way for it to work?
| sorenbs wrote:
| This is really cool!
|
| Can Haystack be used to index structured data, or just text?
|
| Is it required to use elastic as the backend, or can you use a
| simpler file-based or in-memory backend?
| antti909 wrote:
| Re structured data - in theory, yes :) We have to work a bit
| more in that direction. A first step is querying table data,
| which could be really helpful for reports, financial data, etc.
| Regarding the storage backend - it's currently Elasticsearch,
| OpenSearch, SQL+FAISS/Milvus/Weaviate (when using dense
| vectors/dense passage retrieval). There is also an in-memory
| datastore using Python primitives for fast prototyping.
|
| (Also, the latest feature highlights are here:
| https://www.deepset.ai/blog/new-features-in-haystack-v1.0)
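
To illustrate what an in-memory store built from Python primitives can look like for prototyping, here is a toy sketch. It is not Haystack's implementation: `TinyInMemoryStore` and its methods are hypothetical names, the sample documents are invented, and the keyword-overlap scoring is only a stand-in for a real retriever.

```python
import re
from collections import Counter


class TinyInMemoryStore:
    """Hypothetical toy store: documents in a dict, keyword search."""

    def __init__(self):
        self._docs = {}

    def write(self, doc_id, text, meta=None):
        """Add or overwrite a document."""
        self._docs[doc_id] = {"text": text, "meta": meta or {}}

    def get(self, doc_id):
        """Return the stored document, or None if it is unknown."""
        return self._docs.get(doc_id)

    @staticmethod
    def _tokens(text):
        # Lowercase alphanumeric tokens; punctuation is dropped.
        return re.findall(r"[a-z0-9]+", text.lower())

    def search(self, query, top_k=3):
        """Rank documents by total occurrences of the query tokens."""
        query_tokens = set(self._tokens(query))
        scores = []
        for doc_id, doc in self._docs.items():
            counts = Counter(self._tokens(doc["text"]))
            score = sum(counts[token] for token in query_tokens)
            if score > 0:
                scores.append((doc_id, score))
        # Highest score first; ties broken by document id.
        scores.sort(key=lambda pair: (-pair[1], pair[0]))
        return scores[:top_k]


store = TinyInMemoryStore()
store.write("d1", "Quarterly revenue report for the finance team")
store.write("d2", "Notes on revenue, revenue growth and costs")
store.write("d3", "Holiday schedule")

hits = store.search("revenue report")
print(hits)
```

Nothing here survives a process restart, which is acceptable for quick experiments and is the usual reason to move to a persistent backend later.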
| Der_Einzige wrote:
| Is there any path forward to make Haystack do word-level
| extractive summarization? e.g. like this:
| https://github.com/Hellisotherpeople/CX_DB8
|
| or like this:
| https://huggingface.co/spaces/Hellisotherpeople/Unsupervised...
|
| I am trying to find anything better than these two for this task.
| I feel like Haystack could be an option - but I am not sure.
___________________________________________________________________
(page generated 2021-12-09 23:00 UTC)