[HN Gopher] Show HN: A Who is Hiring app with AI filters
       ___________________________________________________________________
        
       Show HN: A Who is Hiring app with AI filters
        
       Author : bernawil
       Score  : 51 points
       Date   : 2024-01-03 19:10 UTC (3 hours ago)
        
 (HTM) web link (bernawil.github.io)
 (TXT) w3m dump (bernawil.github.io)
        
       | afropack wrote:
       | This is cool. You should add counts to the filters and the result
       | list.
        
         | simonw wrote:
         | Here's that data with counts on the filters, via Datasette
         | Lite: https://lite.datasette.io/?install=datasette-pretty-json&jso...
        
       | NickC25 wrote:
       | You should also have filters for tech-tangential jobs, such as
       | product, operations, design, etc...
       | 
       | Not everyone here is a developer!
        
         | spondylosaurus wrote:
         | Seconding this, I'd love a filter for technical writers :)
        
         | bernawil wrote:
         | Will probably do that in a future iteration! The current job
         | categories were just the ones I could come up with; I then used
         | the OpenAI API to check whether each post conforms to them :)
        
       | simonw wrote:
       | Here's the underlying repo:
       | https://github.com/bernawil/hn-who-is-hiring
       | 
       | This JSON file has the annotated data:
       | https://github.com/bernawil/hn-who-is-hiring/blob/main/src/H...
       | 
       | Since it's JSON on GitHub you can explore using Datasette Lite
       | like this:
       | 
       | https://lite.datasette.io/?install=datasette-pretty-json&jso...
       | 
       | Here's an example of a custom SQL query:
       | https://lite.datasette.io/?install=datasette-pretty-json&jso...
        
         | bernawil wrote:
         | Thanks for this, I didn't know about datasette. Neat!
        
           | mrkstu wrote:
           | simonw is the best kind of spammer- he brings his creation
           | into the discussion- but he does so in a way that enriches
           | the value of what is being discussed.
           | 
           | His tool [Datasette] is of course such that it often is
           | immensely handy at dissecting data in useful ways so is often
           | exactly on point for a discussion on HN...
        
             | bernawil wrote:
             | haha right it's like, how did I make it this far without
             | knowing about datasette?
        
       | bugglebeetle wrote:
       | There should be a filter for breaking California law by not
       | including salary details for companies over 15 people:
       | 
       | https://californiapayroll.com/blog/californias-new-pay-trans...
        
         | bernawil wrote:
         | hah not a bad idea at all, watch out for next month's update!
        
       | minimaxir wrote:
       | What's the AI usage here? From the raw JSON data, it seems you
       | wrote a prompt to an LLM to extract structured data from the Who
       | is Hiring comments, although I am not sure if that counts as an
       | "AI filter" since the filtering criteria are explicitly defined
       | beforehand.
        
         | bernawil wrote:
         | Right, I'm feeding each post to several queries to the OpenAI
         | API. I guess I put "AI filters" so people knew this is actually
         | curated by post content and not just a contains() filter, where
         | you'd get posts with the text "we don't do remote!" when you
         | select the remote checkbox.
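         | 
         | A minimal sketch of that difference, assuming each post carries
         | a boolean "remote" annotation produced by the LLM step. The
         | field name, helper names and sample posts are made up for
         | illustration and are not taken from the repo:

```python
# Hypothetical sample data: "remote" stands in for an LLM-produced annotation.
POSTS = [
    {"id": 1, "text": "Acme | Backend Engineer | Remote (US)", "remote": True},
    {"id": 2, "text": "Initech | SRE | Onsite NYC, we don't do remote!", "remote": False},
]

def contains_filter(posts, keyword):
    """Naive substring match: keeps any post whose text mentions the keyword."""
    needle = keyword.lower()
    return [p["id"] for p in posts if needle in p["text"].lower()]

def annotated_filter(posts, field):
    """Keeps only posts whose annotation for `field` is True."""
    return [p["id"] for p in posts if p.get(field) is True]

print(contains_filter(POSTS, "remote"))   # [1, 2]
print(annotated_filter(POSTS, "remote"))  # [1]
```

         | The substring version returns the onsite post too, because its
         | text happens to contain the word "remote".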
        
           | superfrank wrote:
           | Not knocking the approach, but how do you do quality control
           | on the posts? Are you just spot checking? How often have you
           | found bad data?
           | 
           | I've thought about doing something similar (using ChatGPT to
           | structure and categorize unstructured data) for a different
           | project in a completely different space and I'm worried about
           | ChatGPT hallucinating things, especially when it comes to
           | numbers.
        
             | jonnycoder wrote:
             | The quality control is a good question, and one that can
             | probably be addressed using evaluation as taught by some of
             | the deeplearning.ai short courses (1).
             | 
             | I made an interactive resume ai bot on my personal website
             | and there is an instance where I can ask it "tell me about
             | your intel experience" and it added in C++ as one of the
             | languages, but that is untrue. I had done C++ at a
             | different company.
             | 
             | 1. https://www.deeplearning.ai/short-courses/
        
               | superfrank wrote:
               | Can you give more details? No offense, but I'm not going
               | to sign up for a random site to watch a video of unknown
               | quality and length.
        
               | jonnycoder wrote:
               | I posted the short courses just as an answer to how to
               | address quality control. I'm not selling anything, and
               | those courses are free anyway. deeplearning.ai was
               | cofounded by Andrew Ng, who is probably the most well
               | known for his work on teaching machine learning through
               | deeplearning.ai, Coursera, Stanford, etc. He has taught
               | and influenced millions.
               | 
               | https://en.wikipedia.org/wiki/Andrew_Ng
               | 
               | With regard to "evaluation", I think this is what those
               | short courses will cover:
               | 
               | Self-Evaluation with the LLM: The idea is to use the
               | language model to generate an answer and then use the
               | same or a different model to evaluate that answer. The
               | evaluation could involve asking the model to rate the
               | answer's accuracy, coherence, relevance, or any other
               | desired metric. This self-evaluation process can be
               | automated and scaled, although it's important to be aware
               | of the limitations, as the model might inherit biases or
               | blind spots from its training data.
               | 
               | LangChain for Structured Evaluation: LangChain can be
               | used to structure this self-evaluation process. It can
               | orchestrate the flow where the LLM first generates an
               | answer and then follows a series of steps to evaluate it.
               | This might include breaking down the evaluation into
               | specific questions or tasks that the LLM must perform to
               | assess its initial response.
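               | 
               | A rough sketch of the generate-then-grade loop, written
               | without LangChain and with stub functions in place of real
               | model calls. Every name below, the 0.7 threshold and the
               | scores are assumptions for illustration only:

```python
from typing import Callable

def generate_then_grade(
    question: str,
    generate: Callable[[str], str],
    grade: Callable[[str, str], float],
    threshold: float = 0.7,
) -> dict:
    """Generate an answer, score it with a second call, and flag low scores."""
    answer = generate(question)
    score = grade(question, answer)
    return {"answer": answer, "score": score, "accepted": score >= threshold}

# Stub "models" so the sketch runs offline; real ones would call an LLM API.
def stub_generate(question: str) -> str:
    return "Python and C++"

def stub_grade(question: str, answer: str) -> float:
    # Pretend the grader penalizes an unsupported claim about C++.
    return 0.4 if "C++" in answer else 0.9

result = generate_then_grade("Which languages were used?", stub_generate, stub_grade)
print(result["accepted"])  # False
```

               | Answers that fall below the threshold could be sent to a
               | human for review instead of being published directly.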
        
             | bernawil wrote:
             | Well, to be fair, the original Who is Hiring post doesn't do
             | much quality control. Then again, neither do the other apps.
             | Honestly, this whole thing came just out of my frustration
             | using one of those and filtering for Remote, reading the
             | text and finding out it wasn't remote at all.
             | 
             | As for quality control, there's a categorization step that
             | returns some tags. Posts that don't match any tag are
             | rejected, which kind of filters for relevancy.
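             | 
             | Roughly, the rejection rule looks like this sketch. The tag
             | names and the post shape are placeholders, not the ones the
             | app actually uses:

```python
# Hypothetical category tags; the real list is whatever the categorization step uses.
KNOWN_TAGS = {"backend", "frontend", "data", "devops"}

def split_by_relevance(posts):
    """Return (kept, rejected): a post is kept if it has at least one known tag."""
    kept, rejected = [], []
    for post in posts:
        matched = KNOWN_TAGS.intersection(post.get("tags", []))
        (kept if matched else rejected).append(post["id"])
    return kept, rejected

posts = [
    {"id": 1, "tags": ["backend", "data"]},
    {"id": 2, "tags": []},
    {"id": 3, "tags": ["gardening"]},
]
print(split_by_relevance(posts))  # ([1], [2, 3])
```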
        
       | PaulRobinson wrote:
       | When providing a huge list of technologies, structure them
       | somehow. Alphabetically ordered, for example - I shouldn't have
       | to Ctrl-F to find my preferred programming language.
       | 
       | Great idea, just needs some UX love.
        
         | bernawil wrote:
         | You're absolutely right, will get to it in the next iteration.
         | For now, note that you can filter by your tech in the
         | technologies list. Most people are looking for either remote or
         | a specific location, and after choosing one of those there are
         | honestly not that many posts left to sort through.
        
           | araes wrote:
           | I think that's what is being noted: there's a huge list of
           | technologies, and it would be nice if they could be sorted by
           | some criterion (alphabetical, old->recent tech, grouped (all
           | these are JavaScript based), categories (compiled,
           | interpreted, data science, graphics/imagery, ...)).
           | 
           | Also, is there a way to put in prior "Who is Hiring?" dates?
           | If people keep putting out the same listing again and again
           | it would be nice to have a way to find them. Totally in the
           | Nice to Have category.
        
             | bernawil wrote:
             | Gosh now I get it! thanks for the clarification, looks like
             | I didn't get the point. Totally right.
             | 
             | The search bar does filter the filters list. So if looking
             | for "rust": 1. expand the technologies list 2. type rust in
             | the search bar 3. select rust option.
             | 
             | But yes, should probably sort it though.
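             | 
             | Something like this sketch would cover both the search and
             | the sorting. It is written in Python for brevity and is not
             | the app's actual code; the option names are examples:

```python
def filter_options(options, query=""):
    """Return options containing `query` (case-insensitive), sorted alphabetically."""
    needle = query.strip().lower()
    matches = [o for o in options if needle in o.lower()]
    return sorted(matches, key=str.lower)

TECHNOLOGIES = ["TypeScript", "Rust", "Go", "Python", "Ruby"]

print(filter_options(TECHNOLOGIES, "ru"))  # ['Ruby', 'Rust']
print(filter_options(TECHNOLOGIES))        # ['Go', 'Python', 'Ruby', 'Rust', 'TypeScript']
```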
             | 
             | As for old posts, I just overwrite the old data and modify
             | the month label, purely out of laziness. I will keep post
             | history in future iterations!
        
       | hughdbrown wrote:
       | I have a similar-but-hacky command-line app that I put together
       | to find just Rust positions:
       | 
       | https://github.com/hughdbrown/who-is-hiring
       | 
       | It's built to be pretty fast by not pulling data it does not
       | need. Since it operates in multiple passes on stored data, it
       | would be easy to modify/add a pass to get what you want. Feel
       | free to use parts you like.
       | 
       | A couple of things I think would help:
       | 
       | - sorted attributes (too hard to go through a hundred computer
       | technologies to find Rust)
       | 
       | - multiple geographic entries for the same name (multiple entries
       | for Germany, USA, UK, Europe)
       | 
       | - ability to select a month
       | 
       | - if you are showing a static pull of the data, the ability to
       | refresh some month would be helpful
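       | 
       | For the duplicate geographic entries, one possible approach is an
       | alias table that collapses spelling variants before building the
       | filter list. The aliases below are illustrative guesses, not
       | labels taken from the data:

```python
# Hypothetical alias table; a real one would be built from the observed labels.
ALIASES = {
    "us": "USA", "u.s.": "USA", "united states": "USA", "usa": "USA",
    "uk": "UK", "united kingdom": "UK",
}

def canonical_locations(labels):
    """Collapse spelling variants into one sorted list of canonical names."""
    canonical = set()
    for label in labels:
        cleaned = label.strip()
        canonical.add(ALIASES.get(cleaned.lower(), cleaned))
    return sorted(canonical)

print(canonical_locations(["USA", "United States", " uk", "UK", "Germany"]))
# ['Germany', 'UK', 'USA']
```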
        
       | Retr0id wrote:
       | It would be nice to have negative filters, e.g. "anything that
       | doesn't mention AI"
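       | 
       | A sketch of what that could look like on top of tag-based
       | filters, assuming each post has a list of tags (the tag names
       | and post shape here are hypothetical):

```python
def apply_filters(posts, include=(), exclude=()):
    """Keep posts that have every `include` tag and none of the `exclude` tags."""
    include, exclude = set(include), set(exclude)
    return [
        p["id"] for p in posts
        if include <= set(p["tags"]) and not (exclude & set(p["tags"]))
    ]

posts = [
    {"id": 1, "tags": ["remote", "ai"]},
    {"id": 2, "tags": ["remote", "rust"]},
    {"id": 3, "tags": ["onsite", "rust"]},
]
print(apply_filters(posts, exclude=["ai"]))                      # [2, 3]
print(apply_filters(posts, include=["remote"], exclude=["ai"]))  # [2]
```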
        
       ___________________________________________________________________
       (page generated 2024-01-03 23:00 UTC)