[HN Gopher] Show HN: Search inside 15,000 pitchdeck slides
       ___________________________________________________________________
        
       Show HN: Search inside 15,000 pitchdeck slides
        
       Author : ashiban
       Score  : 47 points
       Date   : 2023-01-27 20:00 UTC (3 hours ago)
        
 (HTM) web link (www.searchthedeck.com)
 (TXT) w3m dump (www.searchthedeck.com)
        
       | gnabgib wrote:
       | Other products in this space: OpenDeck (2020-Oct-11, 47 comments,
       | 203pts)[0], PitchDeckHunt (2020-Mar-26, 3 comments, 6pts)[1],
       | BillionDollarPitchDecks (2022-03-23, 17 comments, 80pts)[2]
       | 
       | A question of where the decks were sourced and whether there are
       | rights to redistribute them sometimes comes up.
       | 
       | [0]: https://news.ycombinator.com/item?id=24745542 [1]:
       | https://news.ycombinator.com/item?id=23308267 [2]:
       | https://news.ycombinator.com/item?id=30783677
        
       | scrollaway wrote:
       | What's happening with some of your slides? The text looks like
       | it's drunk.
       | 
       | https://search-the-deck-images.s3.amazonaws.com/MONZO__5.png...
        
         | ashiban wrote:
         | the source images of some of the slides start at too low a
         | resolution for the upscaling algorithm to recognize/improve
         | them - so the text gets all mangled up
        
       | ohyoutravel wrote:
       | This is cool, I was in a similar position when I was going to try
       | to raise some money for a potential product (which I didn't end
       | up doing...). I was thinking about putting something together
       | like this for fun out of the hundred or so decks I had found
       | online and downloaded, but wasn't sure how to go about requesting
       | permission from all the deck creators and even managing how to
       | find them. So I didn't go through with it.
       | 
       | The fact that you were able to get permission from all these
       | people, with an order of magnitude more decks than I had is
       | astounding! Kudos, do you mind if I ask about the secret sauce to
       | how you were able to get all these deck authors to agree to let
       | you use these on your site?
        
         | ashiban wrote:
         | I aggregated from other aggregators - not the deck authors
         | directly
        
       | moralestapia wrote:
       | Are these real?
       | 
       | Some of these seem like something an AI would generate.
       | thispitchdeckdoesnotexist or whatever.
       | 
       | https://search-the-deck-images.s3.amazonaws.com/Heyday__37.p...
        
       | ashiban wrote:
       | When I was putting together the pitchdeck for our startup I
       | wanted to search for slides to learn from - but I was looking for
       | specific sections or types of startups for slide decks. I had to
       | open tens of decks and scroll through them which sucked. So I
       | decided to make a tool that would allow me to search inside the
       | decks more easily. Happy to answer questions
        
         | lming wrote:
         | Nice project for looking for pitchdeck references. Thanks for
         | building and sharing it. I am curious about the tech behind it
         | - are you doing OCR on images? The search is very responsive -
         | it's definitely not Elasticsearch. What index/search system
         | are you using?
        
           | gnabgib wrote:
           | You might enjoy the blog post[0] (150GB of images, tesseract
           | OCR, 2GB of data, Algolia for search). There's a github repo
           | too[1]
           | 
           | [0]: https://www.alashiban.com/search-the-deck/ [1]:
           | https://github.com/klothoplatform/klotho
        
           | ashiban wrote:
           | Glad it helps! There are 4 key steps that I took:
           | 
           | - Upscaling (using Upscayl[0])
           | - OCR (using tesseract[1])
           | - Indexing (using Algolia[2])
           | - Scaling the processing and running on AWS (Klotho[3] - our
           |   startup)
           | 
           | I wrote a more in-depth blog post about it[4]
           | 
           | [0] https://github.com/upscayl/upscayl [1]
           | https://github.com/tesseract-ocr/tesseract [2]
           | https://www.algolia.com/ [3]
           | https://github.com/KlothoPlatform/klotho [4]
           | https://www.alashiban.com/search-the-deck/
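
The OCR and indexing steps described above can be sketched roughly as follows. This is a minimal illustration, not the author's code: the "<deck>__<slide>.png" naming scheme is only inferred from the image URLs in this thread, the record fields other than Algolia's objectID are hypothetical, and the OCR step is passed in as a function so that a tesseract wrapper (for example pytesseract.image_to_string) could be plugged in. Upscaling and the actual upload to Algolia are left out; the local search function is only a stand-in for the hosted index.

```python
import re
from typing import Callable, Dict, Iterable, List

# Assumed naming scheme "<deck>__<slide number>.png" (inferred, not confirmed).
SLIDE_NAME = re.compile(r"^(?P<deck>.+)__(?P<slide>\d+)\.png$")


def build_records(filenames: Iterable[str],
                  ocr: Callable[[str], str]) -> List[Dict[str, object]]:
    """Turn slide images into one search record per slide.

    `ocr` is any function mapping an image filename to its recognized text.
    Files that do not match the naming scheme, and slides whose OCR output
    is empty, are skipped.
    """
    records = []
    for name in filenames:
        match = SLIDE_NAME.match(name)
        if match is None:
            continue
        # Collapse the line breaks and runs of spaces that OCR tends to emit.
        text = " ".join(ocr(name).split())
        if not text:
            continue
        records.append({
            "objectID": name,  # Algolia identifies records by objectID
            "deck": match["deck"],
            "slide": int(match["slide"]),
            "text": text,
        })
    return records


def search(records: List[Dict[str, object]], query: str) -> List[Dict[str, object]]:
    """Naive local stand-in for the hosted index: every query word must occur."""
    words = query.lower().split()
    return [r for r in records
            if all(w in str(r["text"]).lower() for w in words)]


if __name__ == "__main__":
    # Hypothetical filenames and OCR output, for illustration only.
    fake_text = {"ExampleCo__3.png": "Market  size\nand growth"}
    recs = build_records(fake_text, lambda name: fake_text[name])
    print(search(recs, "market growth"))
```

Keeping one record per slide, rather than one per deck, is what would let a search land on a specific slide inside a deck.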
        
       ___________________________________________________________________
       (page generated 2023-01-27 23:00 UTC)