[HN Gopher] Show HN: Search inside 15,000 pitchdeck slides
___________________________________________________________________
Show HN: Search inside 15,000 pitchdeck slides
Author : ashiban
Score : 47 points
Date : 2023-01-27 20:00 UTC (3 hours ago)
(HTM) web link (www.searchthedeck.com)
(TXT) w3m dump (www.searchthedeck.com)
| gnabgib wrote:
| Other products in this space: OpenDeck (2020-10-11, 47 comments,
| 203pts)[0], PitchDeckHunt (2020-03-26, 3 comments, 6pts)[1],
| BillionDollarPitchDecks (2022-03-23, 17 comments, 80pts)[2].
|
| The question of where the decks were sourced from, and whether
| there are rights to redistribute them, sometimes comes up.
|
| [0]: https://news.ycombinator.com/item?id=24745542
| [1]: https://news.ycombinator.com/item?id=23308267
| [2]: https://news.ycombinator.com/item?id=30783677
| scrollaway wrote:
| What's happening with some of your slides? The text looks like
| it's drunk.
|
| https://search-the-deck-images.s3.amazonaws.com/MONZO__5.png...
| ashiban wrote:
| Some of the source slide images start at too low a resolution
| for the upscaling algorithm to recognize or improve them, so
| the text gets all mangled up.
| ohyoutravel wrote:
| This is cool. I was in a similar position when I was going to try
| to raise some money for a potential product (which I didn't end
| up doing...). I was thinking about putting something like this
| together for fun out of the hundred or so decks I had downloaded
| and found online, but wasn't sure how to go about requesting
| permission from all the deck creators, or even how to find them.
| So I didn't go through with it.
|
| The fact that you were able to get permission from all these
| people, with an order of magnitude more decks than I had, is
| astounding! Kudos. Do you mind if I ask about the secret sauce
| for getting all these deck authors to agree to let you use their
| decks on your site?
| ashiban wrote:
| I aggregated from other aggregators, not from the deck authors
| directly.
| moralestapia wrote:
| Are these real?
|
| Some of these seem like something an AI would generate.
| thispitchdeckdoesnotexist or whatever.
|
| https://search-the-deck-images.s3.amazonaws.com/Heyday__37.p...
| ashiban wrote:
| When I was putting together the pitch deck for our startup, I
| wanted to search for slides to learn from, but I was looking for
| specific sections or specific types of startups. I had to open
| tens of decks and scroll through them, which sucked. So I decided
| to make a tool that would let me search inside the decks more
| easily. Happy to answer questions.
| lming wrote:
| Nice project for finding pitch deck references. Thanks for
| building and sharing it. I am curious about the tech behind it:
| are you doing OCR on the images? The search is very responsive,
| so it's definitely not Elasticsearch. What index/search system
| are you using?
| gnabgib wrote:
| You might enjoy the blog post[0] (150GB of images, tesseract
| OCR, 2GB of data, Algolia for search). There's a GitHub repo
| too[1].
|
| [0]: https://www.alashiban.com/search-the-deck/
| [1]: https://github.com/klothoplatform/klotho
| ashiban wrote:
| Glad it helps! There are 4 key steps that I took:
|
| - Upscaling (using Upscayl[0])
| - OCR (using tesseract[1])
| - Indexing (using Algolia[2])
| - Scaling the processing and running on AWS (Klotho[3], our
|   startup)
|
| I wrote a more in-depth blog post about it[4]
|
| [0]: https://github.com/upscayl/upscayl
| [1]: https://github.com/tesseract-ocr/tesseract
| [2]: https://www.algolia.com/
| [3]: https://github.com/KlothoPlatform/klotho
| [4]: https://www.alashiban.com/search-the-deck/
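A minimal sketch of the OCR-then-index idea described above, under stated assumptions: the OCR text for each slide is passed in as plain strings (in a real pipeline it might come from tesseract, for example via the pytesseract wrapper's image_to_string), and a toy in-memory inverted index stands in for Algolia, which is a hosted service with ranking and typo tolerance that this sketch does not attempt. The names Slide, tokenize and SlideIndex are hypothetical and are not taken from the Search the Deck code.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Slide:
    deck: str    # deck name (hypothetical field)
    number: int  # slide position within the deck
    text: str    # OCR output for this slide (assumed already extracted)


def tokenize(text: str) -> list[str]:
    # Lowercase and keep only alphanumeric runs, so stray OCR
    # punctuation is dropped rather than indexed.
    return re.findall(r"[a-z0-9]+", text.lower())


class SlideIndex:
    """Toy inverted index: token -> set of slide ids (stand-in for Algolia)."""

    def __init__(self) -> None:
        self.slides: list[Slide] = []
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, slide: Slide) -> None:
        # Assign the next id and record every token the slide contains.
        slide_id = len(self.slides)
        self.slides.append(slide)
        for token in tokenize(slide.text):
            self.postings[token].add(slide_id)

    def search(self, query: str) -> list[Slide]:
        tokens = tokenize(query)
        if not tokens:
            return []
        # AND semantics: a slide must contain every query token.
        # .get() avoids inserting empty entries into the defaultdict.
        ids = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return [self.slides[i] for i in sorted(ids)]


if __name__ == "__main__":
    index = SlideIndex()
    # Made-up example slides, not real deck content.
    index.add(Slide("deck-a", 1, "Market size: 5M users"))
    index.add(Slide("deck-b", 2, "Team and market"))
    for hit in index.search("market"):
        print(hit.deck, hit.number)
```

Queries here use AND semantics, so a two-word query only returns slides containing both words; matching the last query token as a prefix would move this closer to search-as-you-type behaviour.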
___________________________________________________________________
(page generated 2023-01-27 23:00 UTC)