[HN Gopher] Loki: An open-source tool for fact verification
       ___________________________________________________________________
        
       Loki: An open-source tool for fact verification
        
       Author : Xudong
       Score  : 151 points
       Date   : 2024-04-06 10:59 UTC (12 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | dscottboggs wrote:
        | That seems like something unlikely to lend itself to automation,
        | and not something that at least current-gen AI is capable of.
       | 
       | Does it...work?
        
         | Xudong wrote:
          | Hi there, I agree that fact-checking is not something that
          | current generative AI models can directly solve. Therefore, we
          | decompose this complex task into five simpler steps, which
          | current techniques can solve better. Please refer to
         | https://github.com/Libr-AI/OpenFactVerification?tab=readme-o...
         | for more details.
         | 
         | However, errors can always occur. We try to help users in an
         | interpretable and transparent way by showing all retrieved
         | evidence and the rationale behind each assessment. We hope this
         | could at least help people when dealing with such problems.
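          | 
          | At a high level, the pipeline looks roughly like this (a
          | minimal sketch; the five components are injected, and the
          | names are paraphrases rather than the exact identifiers in
          | the repo):
          | 
          |   def fact_check(text, decompose, is_checkworthy,
          |                  generate_queries, retrieve, verify):
          |       # 1. split the input into atomic claims
          |       # 2. keep only the check-worthy ones
          |       for claim in filter(is_checkworthy, decompose(text)):
          |           queries = generate_queries(claim)  # 3. claim -> queries
          |           evidence = retrieve(queries)       # 4. gather evidence
          |           yield claim, verify(claim, evidence)  # 5. verdict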
        
         | szszrk wrote:
          | I just tried queries similar to the ones in their screenshots
          | with Kagi. Basically asked it the exact same question.
          | 
          | While it answered a general "yes" when the more precise answer
          | was "no", the reasoning in the answer was perfectly on point
          | and mentioned exactly the same things.
          | 
          | As a general LLM for regular users, FastGPT (their LLM service)
          | is in my opinion "meh" (it lacks conversations, for instance).
          | But it's really impressive that it contains VERY recent data
          | (like news and articles from the last few days) and always
          | provides great references.
        
       | chamomeal wrote:
       | Very cool! I've toyed with an idea like this for a while. The
       | scraping is a cool extra feature, but tbh just breaking down text
       | into verifiable claims and setting up the logic tokens is way
       | cooler.
       | 
        | I imagine somebody feeding a live presidential debate into this.
        | Could be a great tool for fact-checking.
        
         | Xudong wrote:
         | ahah thanks!
        
       | vinni2 wrote:
        | It's a bit misleading to call it an open-source tool when it
        | relies on proprietary LLMs for everything.
        
         | btbuildem wrote:
         | Presumably the LLMs are swappable -- today the proprietary ones
         | are very powerful and accessible, but the landscape may yet
         | change.
        
           | vinni2 wrote:
            | Well, but they don't mention that. It is clickbait to call it
            | an open-source fact-checking tool when it needs LLMs to do
            | everything. Also, the code is not designed to easily swap in
            | a free, locally running LLM.
        
             | Xudong wrote:
             | I apologize for any confusion caused earlier. The core
             | components have been defined separately
             | (https://github.com/Libr-
             | AI/OpenFactVerification/tree/main/fa...) to make
             | customization easier. We understand that switching between
             | different LLMs isn't particularly easy in the current
             | version. However, we will be adding these features in
             | future versions. You are most welcome to collaborate with
             | us and contribute to this project!
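              | 
              | As a sketch of the direction (a hypothetical interface,
              | not the current code), the idea is that any backend
              | satisfying a small protocol could be dropped in:
              | 
              |   from typing import Protocol
              | 
              |   class LLMClient(Protocol):
              |       def complete(self, prompt: str) -> str:
              |           """Return the model's completion for a prompt."""
              |           ...
              | 
              |   # an OpenAI-backed client today, a local model tomorrow;
              |   # the rest of the pipeline would only see LLMClient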
        
       | vinni2 wrote:
        | So the only thing they open-sourced is the prompts [1] and the
        | code to call LLM APIs? There are plenty of such libraries out
        | there. And the prompts seem to be copied from here [2]?
       | 
       | [1] https://github.com/Libr-
       | AI/OpenFactVerification/blob/main/fa...
       | 
       | [2] https://github.com/yuxiaw/Factcheck-
       | GPT/blob/main/src/utils/...
        
         | raycat7 wrote:
          | Regarding your last concern, I found that yuxiaw is their
          | COO [1], so it can't be considered a copy?
         | 
         | [1] https://www.librai.tech/team
        
           | vinni2 wrote:
            | OK, but the bigger issue is that there is evidence that LLMs
            | are not better than specialized models for fact-checking.
           | https://arxiv.org/abs/2402.12147
        
             | Xudong wrote:
             | Hello vinni2, thank you for mentioning the paper. However,
             | I noticed that it hasn't gone through peer review yet.
             | Also, the paper suggests that fine-tuning may work better
              | than in-context learning, but that's not a problem. You can
              | fine-tune any LLM, such as GPT-3.5, for this purpose and
              | use it with this framework. Once you have fine-tuned GPT
              | with specific data, for example, you'll only need to modify
              | the
             | model name (https://github.com/Libr-
             | AI/OpenFactVerification/blob/8fd1da9...). I believe this
             | approach can lead to better results than what the paper
             | suggests.
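              | 
              | Roughly (an illustrative snippet; the fine-tuned model ID
              | is a placeholder):
              | 
              |   from openai import OpenAI
              | 
              |   client = OpenAI()  # reads OPENAI_API_KEY from the env
              |   resp = client.chat.completions.create(
              |       # the only change vs. the stock pipeline is this ID
              |       model="ft:gpt-3.5-turbo-1106:your-org::abc123",
              |       messages=[{"role": "user",
              |                  "content": "Verify: <claim text here>"}],
              |   )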
        
       | rjb7731 wrote:
       | Isn't this similar to the Deepmind paper on long form factuality
       | posted a few days ago?
       | 
       | https://arxiv.org/abs/2403.18802
       | 
       | https://github.com/google-deepmind/long-form-factuality/tree...
        
         | Xudong wrote:
         | Yes, they are similar. Actually, our initial paper was
         | presented around five months ago
          | (https://arxiv.org/abs/2311.09000). Unfortunately, our paper
          | isn't cited by the DeepMind paper; see this discussion as an
          | example:
         | https://x.com/gregd_nlp/status/1773453723655696431
         | 
          | Compared with our initial version, we have mainly focused on
          | efficiency, achieving a 10x faster checking process without
          | decreasing accuracy.
        
           | westurner wrote:
           | > _We further construct an open-domain document-level
           | factuality benchmark in three-level granularity: claim,
           | sentence and document_
           | 
           | A 2020 Meta paper [1] mentions FEVER [2], which was published
           | in 2018.
           | 
           | [1] "Language models as fact checkers?" (2020)
           | https://scholar.google.com/scholar?cites=3466959631133385664
           | 
           | [2] https://paperswithcode.com/dataset/fever
           | 
           | I've collected various ideas for publishing premises as
           | linked data; "#StructuredPremises" "#nbmeta"
           | https://www.google.com/search?q=%22structuredpremises%22
           | 
           | From "GenAI and erroneous medical references"
           | https://news.ycombinator.com/item?id=39497333 :
           | 
           | >> _Additional layers of these 'LLMs' could read the
           | responses and determine whether their premises are valid and
           | their logic is sound as necessary to support the presented
           | conclusion(s), and then just suggest a different citation URL
           | for the preceding text_
           | 
           | > [...] _" Find tests for this code"_
           | 
           | > _" Find citations for this bias"_
           | 
           | From https://news.ycombinator.com/item?id=38353285 :
           | 
           | > _" LLMs cannot find reasoning errors, but can correct them"
           | https://news.ycombinator.com/item?id=38353285 _
           | 
           | > _" Misalignment and [...]"_
        
       | RcouF1uZ4gsC wrote:
       | > This tool is especially useful for journalists, researchers,
       | and anyone interested in the factuality of information.
       | 
        | Sorry, I think an individual who is not aware of reliable
        | sources to verify information, and who is not familiar enough
        | with LLMs to come up with appropriate prompts and judge output,
        | should be the last person presenting themselves as the judge of
        | factual information.
        
         | Xudong wrote:
         | Thanks for your response. When discussing fact-checking
         | capabilities, the key question is always: Can we guarantee that
         | it will always offer the correct justification? While it's
         | unfortunate, errors can occur. Nonetheless, we prioritize
         | making the checking process both interpretable and transparent,
         | allowing users to understand and trust the rationale behind
         | each assessment.
         | 
          | We present the results at each step to help users understand
          | the decision process, as shown in our screenshot at
          | https://raw.githubusercontent.com/Libr-AI/OpenFactVerificati...
         | 
          | We will try our best to ensure this tool makes a positive
          | difference.
        
       | tudorw wrote:
       | Anyone tried this?
       | https://journaliststudio.google.com/pinpoint/about
        
       | meling wrote:
       | My friend's startup: https://factiverse.ai/
        
       | Der_Einzige wrote:
       | You might want to look into integrating DebateSum or
       | OpenDebateEvidence (OpenCaseList) into this tool as sources of
       | evidence. They are uniquely good for these sorts of tasks:
       | 
       | https://huggingface.co/datasets/Hellisotherpeople/DebateSum
       | 
       | https://huggingface.co/datasets/Yusuf5/OpenCaselist
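        | 
        | Both are standard HuggingFace datasets, so (an untested sketch,
        | assuming the default configs load) pulling them in should be as
        | simple as:
        | 
        |   from datasets import load_dataset
        | 
        |   debatesum = load_dataset("Hellisotherpeople/DebateSum")
        |   opencaselist = load_dataset("Yusuf5/OpenCaselist")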
        
         | Xudong wrote:
         | Hi Der_Einzige, thanks for pointing out these two great
         | datasets! We are currently working on including customized
         | evidence sources internally and will definitely consider these
          | two datasets in a future version of this open-source project.
        
       | axegon_ wrote:
        | Overall a great idea though; I'll definitely be checking back on
        | it in the future. A few things that hit me out of the box:
       | 
        | * The idea behind using Serper is great, however it would be
        | cool if other search engines/data sources could be used instead,
        | e.g. Kagi or some private search engine/data. Reason for the
        | latter: there are tons of people who are sourcing all sorts of
        | information which will not immediately show up on Google, and
        | some might never show up at all. For context: I have roughly
        | 60GB (and growing) of cleaned news articles, along with where I
        | got them from and a good amount of pre-processing done on the
        | fly (I collect those all the time).
       | 
        | * Relying heavily on OpenAI. Yes, OpenAI is great, but there's
        | always the thing at the back of our minds that is "where are all
        | those queries going and do we trust that shit won't hit the fan
        | some day?". It would be nice to have the ability to use a local
        | LLM, given how many good ones are around.
       | 
        | * The installation can be improved massively: setuptools +
        | entry_points + console_scripts to avoid all the hassle of
        | managing dependencies, tracking where your scripts are located
        | and all that. The cp factcheck/config/secret_dict.template
        | factcheck/config/secret_dict.py step is a bit... Uuuugh...
        | pydantic[dotenv] + .env instead (see the sketch below)? That
        | would also make containerizing the application so much easier.
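        | 
        | A rough sketch of what I mean (pydantic v1 with the dotenv
        | extra; the field names are made up, match them to whatever the
        | secret dict holds):
        | 
        |   # factcheck/config/settings.py (hypothetical path)
        |   from pydantic import BaseSettings
        | 
        |   class Settings(BaseSettings):
        |       openai_api_key: str
        |       serper_api_key: str
        | 
        |       class Config:
        |           env_file = ".env"  # also read from real env vars
        | 
        |   settings = Settings()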
        
         | xyst wrote:
          | I fully expect some sort of enshittification of OpenAI at some
          | point.
        
           | lta wrote:
           | That's assuming it's not done already with their mission of
           | being open completely forgotten
        
         | Xudong wrote:
          | Thank you for your suggestions, axegon!!! We will definitely
          | consider them and add these features in upcoming versions.
         | 
          | Regarding the first point, we are currently working on
          | enabling customized evidence retrieval, including local files.
          | Our plan is to integrate existing tools like LlamaIndex. Any
          | suggestions are greatly appreciated!
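          | 
          | For local files, the integration we are picturing is along
          | these lines (a sketch using LlamaIndex's high-level API; the
          | path and the wiring into Loki are illustrative only):
          | 
          |   from llama_index.core import (SimpleDirectoryReader,
          |                                 VectorStoreIndex)
          | 
          |   # index a folder of local documents once...
          |   docs = SimpleDirectoryReader("my_evidence/").load_data()
          |   index = VectorStoreIndex.from_documents(docs)
          | 
          |   # ...then query it as an evidence retriever per claim
          |   retriever = index.as_retriever()
          |   evidence = retriever.retrieve("the claim text to check")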
         | 
         | Regarding the second point, we have found OpenAI's JSON mode to
         | be greatly helpful, and have optimized our prompts to fully
         | utilize these advances. However, we agree that it would be
         | beneficial to enable the use of other models. As promised, we
         | will add this feature soon.
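          | 
          | For reference, the JSON mode we lean on is just the
          | response_format flag (the model name here is only an
          | example):
          | 
          |   from openai import OpenAI
          | 
          |   client = OpenAI()
          |   resp = client.chat.completions.create(
          |       model="gpt-4-turbo",  # any JSON-mode-capable model
          |       response_format={"type": "json_object"},
          |       messages=[
          |           {"role": "system",
          |            "content": "Answer in JSON with fields "
          |                       "'label' and 'rationale'."},
          |           {"role": "user", "content": "Claim: water is wet."},
          |       ],
          |   )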
         | 
         | Lastly, we appreciate your suggestion and will work on
         | improving the installation process for the next version.
        
           | big_hacker wrote:
           | Dead internet.
        
             | antihipocrat wrote:
              | Have to agree with you; every comment from the product
              | creator reads like a ChatGPT response.
        
               | Xudong wrote:
               | I will take it as a compliment, lol. But I do hope
               | ChatGPT or some agents could help me with this. Btw, our
               | recent study on machine-generated text detection might be
               | interesting to you.
               | 
               | https://arxiv.org/abs/2305.14902
               | https://arxiv.org/abs/2402.11175
        
       | swores wrote:
        | Feedback on the example gif: at the moment it's almost comically
        | useless. First you're bored watching the first 90% while
        | commands are slowly being typed, and then the bit that's
        | actually interesting and worth reading scrolls by too fast and
        | resets to the beginning of the gif before there's a chance to
        | read it.
        
         | Xudong wrote:
         | Thanks for your feedback on the gif figure, swores! We will
         | revise it soon.
        
         | eMPee584 wrote:
         | mpv ftw: playback speed control even for gifs..
        
       | eeue56 wrote:
        | Interesting. In the Nordics, we have a couple of sites dedicated
        | to fact-checking news stories, done by real people. I think
        | these kinds of automated tools can be helpful too, but they need
        | to be tied to reliable sources. This became pretty apparent to
        | me with the tech news coverage of xz, too. Lots of accidental
        | (or sometimes intentional?) misinformation being spread in news
        | articles. I wrote about it a bit [0]; it was pretty sad to see
        | big international publishers publishing an article based
        | entirely on the journalist's misunderstandings of the situation.
        | Facts and truth are important, especially as we see gen AI
        | increasing the amount of legitimate-looking content online that
        | might not actually be true.
       | 
       | [0] - https://open.substack.com/pub/thetechenabler/p/trust-in-
       | brea...
        
         | pelasaco wrote:
         | > In the Nordics, we have a couple of sites dedicated to fact
         | checking news stories, done by real people.
         | 
          | We have it everywhere. The problem is, however, well-known:
          | human bias, political engagement from the fact-checkers, etc.
          | AI (without any kind of lock or political bias built in) could
          | be the real deal, but because it may not be politically
          | correct, it will never happen.
        
         | Xudong wrote:
         | I wholeheartedly agree on the necessity of linking fact-
         | checking tools to credible sources. Currently, our team's
         | expertise lies primarily in AI, and we find ourselves at a
         | disadvantage when it comes to pinpointing authoritative
         | sources. Acknowledging the challenges posed by the rapid spread
         | of misinformation, as highlighted by recent studies, we
         | developed this prototype to assist in information verification.
         | We recognize the value of collaboration in enhancing our tool's
         | effectiveness and invite those experienced in evaluating
         | sources to join our effort. If our project interests you and
         | you're willing to contribute, please don't hesitate to reach
         | out. We're eager to collaborate and make a positive impact
         | together.
        
       | siffland wrote:
       | When I saw Loki as the name, I instantly thought of Grafana Loki
       | for logging. I click on the GitHub and get Libr-AI and
       | OpenFactVerification.
       | 
       | I am not commenting on the actual software and I know names are
       | hard and often overlap, but with something as popular as Loki
       | already used for logging I think it might get confusing.
        
         | Xudong wrote:
         | Hi siffland! Thank you for your feedback. We understand your
         | concern about the potential confusion given the popularity of
         | Grafana Loki in the logging space. When naming our project, we
         | sought a name that encapsulates our goal of combating
         | misinformation. We chose Loki, inspired by the Norse god often
         | associated with stories and trickery, to symbolize our
         | commitment to unveiling the truth hidden within nonfactual
         | information.
         | 
         | When we named our project, we were unaware of the overlap with
         | Grafana Loki. We appreciate you bringing this to our attention!
         | I will discuss this issue with my team in the next meeting, and
         | figure out if there is a better way of solving this. If you
         | have any suggestions or thoughts on how we can better
         | differentiate our project, we would love to hear them.
         | 
         | Thank you again for your valuable input!
        
       | dekervin wrote:
        | I have a project where I take a different approach [0]. I
        | basically extract statements, explicit or implicit, that should
        | be accompanied by a reference to some data but aren't, and I let
        | users find the most relevant data for those statements.
        | 
        | [0] https://datum.alwaysdata.net/
        
       | martinbaun wrote:
        | Maybe the name is not so fitting, as Loki is a name from Norse
        | mythology, known for deceiving and lying, which is basically the
        | opposite of what you're trying to do :)
        
         | smoyer wrote:
          | It's also the name of a well-known open-source log aggregation
          | system that's part of the LGTM stack (predominantly led by
          | Grafana Labs).
        
         | croes wrote:
         | Maybe it's on purpose.
         | 
          | Who could know the patterns of liars better than the god of
          | lying?
        
       | badrunaway wrote:
        | I found this very interesting. I had the funny thought that,
        | just like CAPTCHA, maybe soon we will have to ask humans to give
        | their input to fact-verification systems at scale.
        
       | redder23 wrote:
       | The name Loki is such a great fit! WOW!
       | 
       | This is some giant BS that is for sure. Some stupid, literally
       | brain-dead AI searching things created by humans to determine
       | what is a "fact". This is beyond dystopian crap.
       | 
       | We all know all the fact-checker orgs. used by big tech like
       | Facebook and others are filled with hyper biased woke people who
       | do not actually fact-check things but get off on having the power
       | to enforce their beliefs, feelings and biases.
       | 
        | I can already tell this is total BS without even looking into
        | it. What kinds of sources will it use? What ranking will they
        | give them? Snopes? ROFL. Probably just uses some woke-infested,
        | censored and curated language model to determine a fact based on
        | what has the most matches, or THE MOST LIKELY answer, because
        | that's how AI works. Has absolutely nothing to do with facts.
       | 
       | And it's even worse, we are literally in a time when AI
       | hallucinates things that do not exist. I won't use a stupid AI to
       | find me "facts".
        
       ___________________________________________________________________
       (page generated 2024-04-06 23:00 UTC)