[HN Gopher] Retrieval Augmented Generation for New Orleans City ...
       ___________________________________________________________________
        
       Retrieval Augmented Generation for New Orleans City Council
       Transparency
        
       Author : yhvstnpst
       Score  : 62 points
       Date   : 2024-01-03 17:39 UTC (5 hours ago)
        
 (HTM) web link (eyeonsurveillance.org)
 (TXT) w3m dump (eyeonsurveillance.org)
        
       | thecosas wrote:
       | Interesting idea for making public meetings more accessible to
       | the public. I know 3pm on a random weekday doesn't work for most
       | citizens and yet... that's when it feels like many council
       | meetings are.
       | 
       | Side note: Spotting a bit of lorem ipsum on the site which seems
       | odd.
        
       | datadrivenangel wrote:
       | The basic use case here of making videos and transcripts more
       | accessible is super valuable. Are LLMs better than full text
       | search though?
        
         | poxrud wrote:
         | For many cases yes. With LLM-based embeddings you get "semantic
         | search", so for example if someone searches for "pets" they
         | will most likely get results that include "dogs" and "cats".
         | This is not the case for regular text search.
        
         | blihp wrote:
         | The problem with a text search is that you have to get your
         | keywords exactly right. With LLMs you can ask inexact questions
         | like 'has the topic of X ever been discussed?'[1] without
         | needing to have an exact match on X. An LLM front end which
         | could return references to the full text seems like the best
         | use of both.
         | 
         | [1] For example, your query might have had X be 'crime' and the
         | transcript would have references to multiple specific types of
         | crime such as 'muggings', 'vandalism' etc. which a full text
         | search isn't going to match. Further, with the LLM front-end
         | you could refine the query to ask about violent crime etc.
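
The contrast drawn in the two replies above can be sketched in a few lines. This is a toy illustration only, not how Sawt works: the three-dimensional vectors are hand-made stand-ins for what an embedding model would produce, the sentences are invented, and the names TOY_EMBEDDINGS, keyword_search and semantic_search are hypothetical.

```python
import math

# Hand-made toy vectors standing in for real embedding-model output (assumption).
# Dimensions loosely mean: (animal-ness, crime-ness, budget-ness).
TOY_EMBEDDINGS = {
    "pets": (0.9, 0.0, 0.1),
    "crime": (0.0, 0.9, 0.1),
    "Residents complained about stray dogs and cats.": (0.8, 0.1, 0.1),
    "Officers reported more muggings and vandalism.": (0.1, 0.9, 0.0),
    "The council approved the annual budget.": (0.0, 0.1, 0.9),
}

# Everything except the two query terms is treated as a transcript sentence.
DOCUMENTS = [k for k in TOY_EMBEDDINGS if k not in ("pets", "crime")]


def keyword_search(query, documents):
    """Return documents containing the query as a literal substring."""
    return [d for d in documents if query.lower() in d.lower()]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def semantic_search(query, documents):
    """Return the document whose toy vector is closest to the query's."""
    q = TOY_EMBEDDINGS[query]
    return max(documents, key=lambda d: cosine(q, TOY_EMBEDDINGS[d]))


print(keyword_search("pets", DOCUMENTS))    # no literal match for "pets"
print(semantic_search("pets", DOCUMENTS))   # nearest toy vector
print(semantic_search("crime", DOCUMENTS))  # nearest toy vector
```

With a real embedding model the vectors would have hundreds of dimensions and come from the model rather than a lookup table, but the ranking step would look much the same.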
        
       | thorum wrote:
       | > Despite targeted outreach, we've noticed that community members
       | are not returning to use the tool after their initial
       | interaction. There has been minimal engagement apart from a spike
       | during a mid-November focus group. This indicates that Sawt is
       | not helpful enough yet for people to want to come back.
       | 
       | Perhaps they have trouble thinking of additional questions to
       | ask? Their first few interactions probably cover the things they
       | care most about. They get an answer and that's it. Maybe a
       | "subscribe to updates whenever this topic comes up in a meeting
       | again" feature would be useful?
        
         | araes wrote:
         | Your tool is too effective. It actually answered our questions.
         | Don't you know the money's in treating the symptoms? Partial
         | joke, yet interesting to see the jump to a result (weird when
         | they're all about bias).
         | 
         | The suggestion's a cool idea. In addition to generic warning
         | updates, you could also do "send me a summary" every time it
         | appears. Kinda RSS feed summaries. Since they're integrating
         | the news, they could also do full news coverage of the day
         | summaries. Might actually be helpful for some people. Daily
         | summary of "what happened in New Orleans yesterday?"
        
         | simonw wrote:
         | Right - my intuition is that most people don't have questions
         | that they want answered in this way very often.
         | 
         | Learning how to use this tool involves learning what kind of
         | questions can be answered by this data, and formulating those
         | questions requires a pretty in-depth knowledge of how city
         | politics actually works.
        
         | cdkmoose wrote:
         | Perhaps it's more apathy. Outside of hot-button issues, it
         | feels like many people don't care to pay attention to their
         | local government.
        
       | animal_spirits wrote:
       | This is the kind of tool that makes me excited about the AI boom.
       | I can imagine this being helpful for journalists. Think of a tool
       | that you can say "Watch all the CSPAN footage and send me an
       | email when someone talks about $TOPIC"
        
         | mobiuscog wrote:
         | Good cop, bad cop
         | 
         | Think of a tool that you can say "Listen to all phone
         | conversations and notify when someone talks about $TOPIC"
        
           | linsomniac wrote:
           | "They" already have that tool, right?
        
             | selimthegrim wrote:
             | I think their Palantir contract is over and done with
        
           | happyopossum wrote:
           | "Listen to all phone conversations"
           | 
           | The people who are in a position to do this are probably in a
           | position to do anything else they want to do with said phone
           | calls. What's changing is that random joes can do this stuff
           | now.
        
         | trinsic2 wrote:
         | It feels more like it's going to make it easier to deceive
         | people. It seems like, as people rely more and more on
         | automated systems like this, it would be easy to own and change
         | the output on the fly.
        
       | 3d27 wrote:
       | How did you calculate accuracy and bias?
        
       | xrd wrote:
       | I'm really fascinated by this.
       | 
       | Attention is all you need. But not the way we here at HN expect.
       | 
       | If you investigate police statistics, you'll see that the "story"
       | about the statistics is often dictated by the availability of the
       | statistics. Go to a "safe" city and review the statistics on the
       | police department website. The availability of those statistics
       | is all over the place, and one city will claim they can't publish
       | the information from past years because of COVID-19, and the next
       | city will say that they can only publish information from the
       | past because of COVID-19. One city will claim outcomes that are
       | impossible to verify, and another will publish information that
       | will be used by failing newspapers to prove political points.
       | 
       | This feels like it could really shift attention to processing
       | information and bringing attention to when that information isn't
       | available.
        
         | onthecanposting wrote:
         | Humorous fact: half of Asher Avenue was renamed to Colonel
         | Glenn Avenue in Little Rock to confound crime reporting that
         | was linked to street names. Lies, damned lies, and
         | statistics...
        
       | MattDaEskimo wrote:
       | The described technique of RAG is not only expensive, but also
       | prone to hallucinations.
       | 
       | I would have liked more discussion on hallucinations, which is
       | the ultimate pitfall of LLMs. This is critical for discovery-
       | based public-facing chatbots.
       | 
       | I'm also very skeptical of real-world HyDE applications as they
       | depend on the underlying model to properly answer the question,
       | and can easily drift from the intention.
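
For readers unfamiliar with HyDE (Hypothetical Document Embeddings): the model first writes a hypothetical answer, and that answer, not the question, is embedded and used for retrieval. The sketch below is a rough illustration of why retrieval can drift with the generated text. Everything in it is an assumption made for the example: the bag-of-words embed function stands in for a real embedding model, the two "LLM" functions are hard-coded stubs, and the documents are invented.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in embedding: a bag-of-words count vector (not a real model)."""
    cleaned = text.lower().replace(".", "").replace("?", "")
    return Counter(cleaned.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hyde_retrieve(question, documents, generate):
    """Embed a generated hypothetical answer instead of the question itself."""
    hypothetical = generate(question)
    query_vec = embed(hypothetical)
    return max(documents, key=lambda d: cosine(query_vec, embed(d)))


# Invented example documents.
DOCUMENTS = [
    "The council voted to restrict facial recognition cameras.",
    "The council voted to fund new street lights.",
]


def on_topic_llm(question):
    # Stub: a hypothetical answer that stays on the question's topic.
    return "The council voted to restrict facial recognition cameras last year."


def drifting_llm(question):
    # Stub: a hypothetical answer that wanders off topic.
    return "The council voted to fund new street lights downtown."


question = "What did the council decide about surveillance?"
print(hyde_retrieve(question, DOCUMENTS, on_topic_llm))
print(hyde_retrieve(question, DOCUMENTS, drifting_llm))
```

The same question retrieves a different document depending only on what the generator wrote, which is the dependency on the underlying model that the comment describes.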
        
         | Der_Einzige wrote:
         | Fat citation needed on RAG being expensive. Most embeddings
         | models these days are smaller than most LLM models, and run
         | more cheaply and more effectively than the ones provided by
         | OpenAI.
         | 
         | If you mean that increasing token counts is expensive, I
         | suppose sure - but the retrieval side itself is not the cost
         | center in general.
        
           | MattDaEskimo wrote:
           | I am speaking of the OP's implementation of RAG, not in
           | general.
           | 
           | The retrieval part can be expensive if an LLM is used to
           | confirm that it is sufficient, and if it's decided to
           | continue searching if it's not.
           | 
           | OpenAI's retrieval is a perfect example. It works, but it's
           | very expensive.
        
         | simonw wrote:
         | I've seen a lot of people (including people who I trust) say
         | RAG is the best current mitigation we have for hallucinations,
         | because grounding the LLM in additional context makes it much
         | less likely it will make something up as opposed to use the
         | information that has been passed to it along with the user's
         | question.
         | 
         | Have you heard differently?
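
The "grounding" step mentioned above usually amounts to placing retrieved passages in the prompt next to the user's question. A minimal sketch, assuming a retriever has already returned the passages; the function name build_grounded_prompt, the instruction wording and the placeholder passages are all hypothetical and not taken from Sawt.

```python
def build_grounded_prompt(question, passages):
    """Combine retrieved passages and the user's question into one prompt."""
    # Number the passages so an answer can cite them as [1], [2], ...
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the numbered passages below. "
        "If they do not contain the answer, say you do not know.\n\n"
        f"Passages:\n{numbered}\n\n"
        f"Question: {question}\n"
    )


prompt = build_grounded_prompt(
    "When was the item discussed?",
    ["Placeholder passage one.", "Placeholder passage two."],
)
print(prompt)
```

Whether the model actually sticks to the supplied passages still depends on the model and the prompt, so this reduces the chance of made-up answers without ruling them out.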
        
           | simonw wrote:
           | Related: just saw this paper https://arxiv.org/abs/2401.01313
           | "A Comprehensive Survey of Hallucination Mitigation
           | Techniques in Large Language Models"
        
           | MattDaEskimo wrote:
           | RAG conceptually is the solution for hallucinations. I'm
           | being critical of the implementations used to achieve it and
           | the lack of awareness for hallucinations.
           | 
           | It's definitely not ready.... Yet.
        
           | zmmmmm wrote:
           | I spent a bit of time playing with h2ogpt, which is a popular
           | RAG framework. I gave it all our architectural documentation
           | for our software and then tried asking it questions that
           | transcended basic search (so the answer is not directly in
           | there, but you could formulate it if you put disparate parts
           | together). It started hallucinating pretty fast and told me
           | all kinds of BS about how our software works.
           | 
           | I think RAG _can_ be used in a way that eliminates or
           | drastically reduces hallucinations, but to do that you have
           | to do quite a lot of work to constrain the context and
           | structure the prompting to address very specific questions.
           | When you apply these more general frameworks they pump in
           | large amounts of context in an unstructured manner and you
           | just end up right back at hallucinations again because the
           | context isn't constrained enough.
           | 
           | So RAG is useful to me but not a silver bullet. It doesn't
           | solve the original problem of wanting all the features of an
           | LLM but without the hallucinations. It gives you some
           | targeted way to use the LLM that doesn't hallucinate but
           | misses a lot of the functionality people want.
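
One way to read "constrain the context" is to pass the model only a few passages that clear a relevance threshold, and to decline to answer when none do. The sketch below assumes a retriever that returns (similarity, text) pairs. The threshold, the passage limit, the function names and the stub model are illustrative assumptions, not values or APIs from h2ogpt or Sawt.

```python
def select_context(scored_passages, min_score=0.75, max_passages=3):
    """Keep only the highest-scoring passages at or above a threshold.

    scored_passages: list of (similarity_score, passage_text) pairs.
    """
    relevant = [sp for sp in scored_passages if sp[0] >= min_score]
    relevant.sort(key=lambda sp: sp[0], reverse=True)
    return [text for _, text in relevant[:max_passages]]


def answer_or_abstain(scored_passages, ask_llm):
    """Call the model only when some passage is relevant enough."""
    context = select_context(scored_passages)
    if not context:
        return "No sufficiently relevant passages found."
    return ask_llm(context)


def fake_llm(context):
    # Stub standing in for a real model call.
    return f"Answer based on {len(context)} passage(s)."


scored = [(0.91, "Passage A"), (0.42, "Passage B"), (0.80, "Passage C")]
print(answer_or_abstain(scored, fake_llm))
print(answer_or_abstain([(0.30, "Passage D")], fake_llm))
```

The trade-off is the one described in the comment: a stricter filter means fewer fabricated answers but also more questions the tool refuses to answer.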
        
       | araes wrote:
       | Total aside, yet had no idea what cell-site simulators were. Why
       | is simulating cell phones an issue? Turns out "mobile cell phone
       | interceptors". [1]
       | 
       | [1] https://www.eff.org/pages/cell-site-simulatorsimsi-catchers
       | 
       | University of Washington actually has a neat project where they
       | drove around Seattle and Milwaukee to try and do International
       | Mobile Subscriber Identity (IMSI)-Interceptor detection.
       | Basically, find cars or buildings with mobile towers that
       | intercept and redirect your cell signal through a man-in-the-
       | middle observation. [2] Notably, United States Citizenship and
       | Immigration Services building (USCIS) apparently operates one.
       | The pictures are rather pretty.
       | 
       | [2] https://seaglass.cs.washington.edu/
        
         | ajcp wrote:
         | > To covertly transmit on the same frequencies as the normal
         | cellular network, IMSI-catchers may mimic the identifying
         | properties (mcc, mnc, cell id, etc.) of legitimate cell towers.
         | [2]
         | 
         | How is this not illegal?
        
           | BobaFloutist wrote:
           | Because it's largely being done by law enforcement.
        
       | Dowwie wrote:
       | sawt: https://github.com/eye-on-surveillance/sawt
        
       | onthecanposting wrote:
       | On the surface this is a great idea, but my experience with city
       | councils and planning commissions is that real decision making is
       | done outside chambers and in private. What happens at public
       | meetings is performance art. I've never seen public discussion
       | impact decisions, even when very compelling legal arguments were
       | made. Searchable minutes are probably adequate for most purposes.
       | 
       | My own opinion is that in-person meetings run under Robert's
       | Rules were necessary before telecommunications, but they're long
       | past their useful life.
        
         | mistrial9 wrote:
         | this may be true but is also a "gravity always wins" sort of
         | statement. Some legal venues have "sunshine rules" to
         | _mitigate_ and _dampen_ the inevitable. I have seen exactly as
         | you say, and on different councils, quite a lot of involvement.
        
       | sjkoelle wrote:
       | all time naming miss not going with ragtime
        
         | selimthegrim wrote:
         | The better one would probably be "Eye on your shoes"
        
       | Eextra953 wrote:
       | I think it would be great to augment this tool to diagram the
       | council meetings with questions, discussions, and motions. This
       | would make it easier to track what city staff presents, which
       | questions are asked by which council members, and finally how the
       | motions are made and modified for any action items. I find that
       | keeping track of all this during a meeting is very hard to do. It
       | can make your head spin.
        
       | selimthegrim wrote:
       | Thank the Lord, whoever is doing this- City Council meetings here
       | are hardly beacons of transparency as it is
        
       | mellosouls wrote:
       | The prompt seems reasonable enough but considering the group's
       | _somewhat partial_ agenda [1] and the massively loaded
       | supplementary datasets [2, 3, etc], I'd be very cautious about
       | using this tool without significant testing for bias.
       | 
       | [1] https://eyeonsurveillance.org/blog/nola-israel-connections
       | 
       | [2] https://github.com/eye-on-surveillance/sawt/tree/main/packag...
       | 
       | [3] https://github.com/eye-on-surveillance/sawt/blob/main/packag...
        
       ___________________________________________________________________
       (page generated 2024-01-03 23:00 UTC)