[HN Gopher] Show HN: I made a website that converts YT videos in...
       ___________________________________________________________________
        
       Show HN: I made a website that converts YT videos into step-by-step
       guides
        
       Hey HN,  I've been working on this side project for the past month.
       It generates a step-by-step tutorial guide for YouTube videos that
       you can follow along without watching long videos. Best suited for
       tutorial videos but can work for other videos as well. No BS. Just
       straight to the point.  The guides are generated from pure
       transcript so you don't have to worry about it being AI. It's my
       first project as a total beginner. Something I had to do in order to
       get out of tutorial hell.  Please let me know if you have any
       suggestions or if you face any problems or bugs. I would try to fix
       them to the best of my abilities and as soon as possible.  I would
       appreciate your feedback on this. Let me know what you think!
        
       Author : aka_sh
       Score  : 134 points
       Date   : 2024-04-21 10:52 UTC (12 hours ago)
        
 (HTM) web link (stepify.tech)
 (TXT) w3m dump (stepify.tech)
        
       | toddmorey wrote:
       | Love how the AI turned "drop a comment below" into a project
       | step:
       | 
       | "Seek feedback from stakeholders or viewers by encouraging
       | questions and comments for further engagement."
       | 
       | This is from a bathroom remodel video.
        
         | aka_sh wrote:
         | Sorry for that, I'm looking into it. The problem is for videos
         | that have no transcript. Maybe it's because I'm feeding it the
         | description of the video for now. I'll find some workaround for
         | this. Thanks!
        
           | toddmorey wrote:
           | It's not a problem! Just funny sometimes what AI does
        
           | pushfoo wrote:
           | > The problem is for videos that have no transcript.
           | 
           | Whisper or other models can help with that too, but remember
           | to preprocess to cut silence. The dataset tends to include
           | ads in the captions, which results in hallucinated text
           | during silence.
           | 
           | You could also add a transcript-evaluation step which checks
           | whether this actually looks like a step-by-step video, but
           | I'd consider skipping it for cost and efficiency. Trying to
           | be helpful by evaluating whether the video is instructions or
           | not is added complexity where bugs and strange behavior can
           | creep in.
        
           | notahacker wrote:
           | Feels like you might have to explicitly ask it not to put
           | "drop a comment below" or "like and subscribe" into the
           | instructions (or strip it from transcripts), since most
           | YouTubers who take YouTube seriously are going to ask...
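
           The "strip it from transcripts" option above could be sketched
           roughly as follows. This is a minimal illustration, not the
           site's actual code: the phrase list and the function name
           strip_calls_to_action are hypothetical, and it assumes the
           transcript has sentence punctuation, which auto-generated
           captions often lack.

```python
import re

# Hypothetical phrase list; tune it against real transcripts.
CTA_PATTERNS = [
    r"like and subscribe",
    r"drop a comment",
    r"hit the (?:bell|notification)",
    r"smash that like",
]
CTA_RE = re.compile("|".join(CTA_PATTERNS), re.IGNORECASE)


def strip_calls_to_action(transcript: str) -> str:
    """Drop every sentence that contains a known call-to-action phrase."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    kept = [s for s in sentences if s and not CTA_RE.search(s)]
    return " ".join(kept)
```

           Filtering before the prompt is cheaper than asking the model
           to ignore these phrases, though a fixed list will miss
           unusual wording.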
        
       | toddmorey wrote:
       | This is a great & useful resource! So many guides on YouTube are
       | unfortunately padded with so much silliness and fluff. Would be
       | great to link out to time codes if possible.
        
         | aka_sh wrote:
         | Thank you! Great suggestion. I'll try adding timecodes ASAP.
        
       | ejang0 wrote:
       | This seems like something people on HN have asked for before. I
       | clicked on one Recent video about how to create a simple Flask
       | app in 5 minutes and the instructions seemed good on a cursory
       | view.
       | 
       | I tried entering a new video but I got a Heroku application
       | error. Maybe it's a limits thing.
       | 
       | When I look at the Recent videos, a lot of them are not for
       | instructions/tutorials. Perhaps people do not understand the
       | purpose of this project. Maybe they are just testing it out with
       | non-tutorial content.
       | 
       | Maybe you could add representative videos towards the top so that
       | people would get a better sense of the use of this project?
       | 
       | I don't know why this isn't more popular here. It's a good idea.
       | (Maybe it has already been implemented elsewhere?) Reading is
       | much faster than watching a video for many instruction-based
       | tasks. Good luck!
        
         | aka_sh wrote:
         | Yeah, you just said what was on my mind since I launched it.
         | The code I wrote is for tutorial videos. Non-tutorial video
         | responses are just gibberish. Putting representative videos at
         | the top is a great idea. I'll look into it.
         | 
         | Can you tell me more about the video you entered? Did it have a
         | transcript? How many hours long was it?
        
       | mbesto wrote:
       | Super interesting. I recently went down the DIY rabbit hole for
       | solar, electricity, etc. I tested out
       | https://stepify.tech/video/O8eVxRVwlnw and looks decent:
       | 
       | 1. It took about ~45 seconds for the page to load once I put the
       | URL in. You should have a loader on a page showing that the
       | website is "doing something" while the AI transcribes.
       | 
       | 2. It would be great to sync the chapters in the YT video with
       | the guide details.
       | 
       | 3. Even more advanced would be if specific items like _"Drill
       | holes, insert expansion bolts, and secure the inverter to the
       | wall using nuts and washers."_ showed a timestamp and thumbnail
       | with a link to the video part.
       | 
       | 4. It would be great to have a checklist functionality (maybe
       | this is the "pro version"). I often do something, get halfway and
       | then need to scrub the YT video to find the specific place where
       | he talks about the action item.
       | 
       | EDIT:
       | 
       | 5. IMO iFixit has the best "guide" formatting:
       | https://www.ifixit.com/Guide/How+to+Recover+Data+From+a+MacB...
       | if you could somehow generate this by the video, that would be
       | insanely useful.
        
         | aka_sh wrote:
         | Great suggestions! I really appreciate your feedback. I'll work
         | on implementing these as soon as possible.
        
         | sonnyw603 wrote:
         | Check out this app called Razzl. Pretty much does what you've
         | described.
        
       | anonymouse008 wrote:
       | Whoever did this is a prankster and hilarious:
       | https://stepify.tech/video/co7KgV2edvI
       | 
       | I hope that didn't wreck your compute costs
        
         | aka_sh wrote:
         | This one really made me laugh. Good thing the website takes in
         | only transcript to produce the response. This video had none,
         | otherwise it would've been a problem hah.
        
         | mrbluecoat wrote:
         | Yeah, definitely some interesting examples:
         | https://stepify.tech/video/ikc6PUSwdK4
        
       | metadat wrote:
       | This is a brilliant and useful application of LLM technology, I'm
       | impressed.
       | 
       | One question- On the backend, is it downloading each video CC
       | (closed-caption) transcript and feeding that into a tuned prompt?
       | What happens for videos where this is missing? Asking because
       | I've noticed CC is occasionally unavailable for some YouTube
       | videos.
       | 
       | If you cared to have a fallback, a potentially interesting
       | experiment / solution for such cases is to download the video,
       | extract the audio to a WAV file, then run the audio through
       | Whisper [1] to generate the transcript. Using a CPU, it will still
       | be incredibly intensive and slow, generally not much faster than
       | real-time (e.g. a 5 minute clip will take on the order of ~5
       | minutes to complete transcription). However, with Whisper running
       | on a fancy GPU it is insanely faster, between 100-200x faster,
       | meaning even for long videos, generating the transcripts will
       | complete in only a few seconds.
       | 
       | Great job @aka_sh!
       | 
       | [1] https://github.com/openai/whisper
       | 
       | p.s. Is there any chance you'd open source your code? Or do you
       | plan to turn this into a business? The code itself is exactly a
       | huge moat, and it'd be cool to see how you did this. Cheers.
       | 
       | p.p.s. stepify.tech app is currently crashing out to a heroku
       | error page when I try to submit a YT link.
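
       The captions-first, speech-to-text-second fallback described
       above could be structured something like this. It is only a
       sketch under assumptions: fetch_captions and transcribe_audio are
       hypothetical callables supplied by the caller (the second could
       wrap Whisper), and nothing here reflects how stepify.tech is
       actually built.

```python
from typing import Callable, Optional, Tuple


def get_transcript(
    video_id: str,
    fetch_captions: Callable[[str], Optional[str]],
    transcribe_audio: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (transcript, source), preferring existing captions."""
    captions = fetch_captions(video_id)
    # Treat missing or whitespace-only captions as unavailable.
    if captions and captions.strip():
        return captions, "captions"
    # Slow path: download the audio and run speech-to-text on it.
    return transcribe_audio(video_id), "speech-to-text"
```

       Returning the source alongside the text lets the page say which
       path produced the guide, and keeps the expensive path out of the
       way when captions exist.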
        
         | aka_sh wrote:
         | Thank you! I'm getting the transcript through an API and
         | feeding it to the GPT. For now, the fallback function for no
         | captions is just to make something out of the description of
         | the video. I really appreciate the suggestion, I'll experiment
         | around using Whisper. Regarding open source or business, I
         | don't really know about that yet. Maybe I'll lean towards the
         | business side to cover the costs and see where this goes. And
         | sorry for the downtime! API credits ran out. It should be fixed
         | by now
        
           | metadat wrote:
           | Eek, so many typos in my comment - but the most egregious was
           | where I meant to convey the code itself is not a huge moat.
           | Even still, no worries if you don't want to give it away, I
           | totally understand.
           | 
           | Keep up the good execution.
        
           | ravenstine wrote:
           | It's epic how well that works. Even with Whisper locally,
           | most of what I throw at it becomes readable.
        
         | j45 wrote:
         | Comparing yt transcript to open whisper transcripts could be
         | interesting if it could pick up on something extra.
         | 
         | There is limited need to reinvent the wheel to process audio
         | when other things can be solved.
        
       | jghn wrote:
       | As someone who can't stand the modern trend away from text and
       | towards video, I can't praise this idea enough. The number of
       | circumstances where a video is better than text with some
       | clarifying pictures is quite small
        
         | mavamaarten wrote:
         | Yeah. The only way to find some written instructions these days
         | is searching for reddit specifically. Which I'm not a big fan
         | of, either.
         | 
         | I've had multiple instances where I had a simple issue with
         | zero decent Google results, and a YouTube result with literally
         | the exact question I had in the title. I had to sift through 12
         | minutes of "like and subscribe", a dude clicking around in
         | various screens mumbling some stuff... I would have been very
         | happy with a simple blog post
        
         | aka_sh wrote:
         | Totally agree with you on that. I hope this lives up to your
         | expectations. Thank you!
        
         | SoftTalker wrote:
         | 100% agree. Video can be helpful for supplementary
         | illustration, to show exactly how to orient parts in an
         | assembly, etc. but at the cost of (often) sitting through a lot
         | of rambling monologue that is not.
         | 
         | I haven't tried this yet but it would be helpful if each step
         | included a link to the spot in the video where that step is
         | shown, so that in case you need it it's easy to find.
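
         A per-step link of this kind could be built as below, assuming
         each transcript segment comes with a start time in seconds
         (that field is an assumption here, as is the helper name
         timestamp_link). The sketch relies on the t parameter that
         YouTube watch URLs accept for a start offset.

```python
from urllib.parse import urlencode


def timestamp_link(video_id: str, start_seconds: float) -> str:
    """Build a YouTube watch URL that starts playback at the given offset."""
    # Whole seconds only; fractional offsets are truncated.
    query = urlencode({"v": video_id, "t": f"{int(start_seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"
```

         For example, timestamp_link("1-Rm0mgg2RI", 95.7) gives a link
         that opens the video 95 seconds in.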
        
       | plufz wrote:
       | I made something a little similar, but just as a little cli
       | script that I run locally for myself. You can input a url for a
       | YouTube video, podcast link or local audio/video file. It
       | transcribes it with whisper and outputs the full transcript in
       | one text file and I use another model to summarize it into a
       | bullet list in a separate file.
       | 
       | I so appreciate these open source/access models allowing us to
       | build these kinds of tools without having to pay and send our
       | data to openai.
        
       | makuchaku wrote:
       | Great work. A few ideas
       | 
       | 1) Speed : the site is often showing heroku errors. Seems like
       | you are running the entire processing in the request-response
       | cycle. If not already done, please try to use a queueing system
       | to perform async processing - and then let the user know when
       | their video is ready to view as steps (probably via email or
       | browser notifications). This will stop your site from crashing
       | frequently and you'll be able to scale to many users very
       | quickly.
       | 
       | 2) Please add link-backs to the specific time in the video from
       | where the step is shown.
       | 
       | Cheers!
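
       The queueing idea in point 1 could look roughly like this
       in-process sketch. It is an illustration only: generate_guide is
       a hypothetical stand-in for the slow transcript and LLM work, the
       job store is a plain dict, and a real deployment would more
       likely use a separate worker process and a persistent queue.

```python
import queue
import threading
import uuid

jobs = {}  # job_id -> {"status": ..., "result": ...}
work_queue = queue.Queue()


def generate_guide(video_url: str) -> str:
    """Hypothetical placeholder for the slow guide-generation step."""
    return f"guide for {video_url}"


def submit(video_url: str) -> str:
    """Enqueue a job and return at once, so the web request stays fast."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "queued", "result": None}
    work_queue.put((job_id, video_url))
    return job_id


def worker() -> None:
    """Process queued jobs one at a time, outside the request cycle."""
    while True:
        job_id, video_url = work_queue.get()
        jobs[job_id]["status"] = "processing"
        try:
            jobs[job_id]["result"] = generate_guide(video_url)
            jobs[job_id]["status"] = "done"
        except Exception as exc:
            jobs[job_id]["result"] = str(exc)
            jobs[job_id]["status"] = "failed"
        finally:
            work_queue.task_done()


threading.Thread(target=worker, daemon=True).start()
```

       The page can then poll jobs[job_id]["status"] and show a loader
       until it reads "done", instead of holding the request open.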
        
         | makuchaku wrote:
         | Also, +1 to chapters as someone mentioned in the comments.
        
         | aka_sh wrote:
         | Noted! I'll look into that. Thank you.
        
         | j45 wrote:
         | Not sure if putting the site behind cloudflare or something
         | could help.
         | 
         | Heroku just wants a bigger bill.
        
       | Terretta wrote:
       | For the "Paid" or "Pro" version, let me have a browser extension
       | that replaces ALL OF YOUTUBE with your text based breakdowns.
       | 
       | // I'm not really kidding! Because boy do I hate 15 minute videos
       | with the one CLI command you need buried like a needle in a
       | haystack. Seeing the nonsense distilled into a handful of
       | straightforward steps is so refreshing. Awesome work!
        
         | layer8 wrote:
         | You'd have to be lucky to get the correct and complete CLI
         | command from the transcript though, unless this is also doing
         | OCR, which I don't think it is.
        
         | aka_sh wrote:
         | Thank you! I'll try implementing something like that and get
         | back to you.
        
       | typpo wrote:
       | Great idea and congrats on shipping the project!
       | 
       | I'm curious if you noticed certain models worked better for
       | summarizing and converting to steps. For example, in my projects
       | I've found that Gemini outperforms "better" models like GPT for
       | similar use cases, which I guess makes sense given Google's
       | interest in summarization.
        
       | iamflimflam1 wrote:
       | I think, to be fairer to the people actually creating the
       | content, you should make a much more obvious link back to the
       | original video.
        
         | aka_sh wrote:
         | I will. Could you suggest a place where it would be more
         | obvious?
        
       | cvhashim04 wrote:
       | Wow you might have done something, saved
       | 
       | How are you managing costs and offering this for free?
        
         | aka_sh wrote:
         | I am not. I'm from a 3rd world country and trust me when I say
         | this: I've burned through half of my paycheck in a few hours,
         | which is like barely 3 digits.
        
       | iamflimflam1 wrote:
       | Tried it on one of my latest videos. Interesting results. My
       | video is not quite a tutorial video, so I can understand why the
       | results are not perfect. But it has invented quite a lot of
       | content...
       | 
       | https://stepify.tech/video/1-Rm0mgg2RI
       | 
       | Here's the video for reference:
       | 
       | https://www.youtube.com/watch?v=1-Rm0mgg2RI
        
         | aka_sh wrote:
         | Thank you for trying this out!
        
       | robblbobbl wrote:
       | Hilarious thank you
        
       | toddmorey wrote:
       | This is fantastic for recipe videos:
       | https://stepify.tech/video/wUFbhygzbqQ
        
         | aka_sh wrote:
         | Recipe ones are the best lol
        
       | cushychicken wrote:
       | Interesting; this is similar to an idea suggested by a Scott
       | Galloway/Section weekly email.
       | 
       | 1) record an SOP using Loom while you narrate, 2) grab a
       | transcript of your narration, 3) feed transcript into ChatGPT to
       | write list of instructions.
       | 
       | Was billed as a way to easily hand off processes to contractors
       | or subordinates.
       | 
       | This seems like a cool riff on that. Neat.
        
       | Simon_ORourke wrote:
       | I've been looking for something like this for absolutely ages. If
       | I want to figure out how to fix my cellphone, reset a warning
       | sensor on my auto dashboard or more recently install a NAS box,
       | there's always this long winded YouTube video packed full of ads.
       | Thanks for helping cut through this nonsense.
        
         | aka_sh wrote:
         | Appreciate the kind words. This really means a lot
        
       | brycelarkin wrote:
       | Love the Filthy Frank survival guide!
        
       | ghoulishly wrote:
       | Heh, it did more or less what I was hoping it would for the song
       | 'How To Be A Heartbreaker':
       | https://stepify.tech/video/vKNcuTWzTVw
        
       | userbinator wrote:
       | _The guides are generated from pure transcript so you don't have
       | to worry about it being AI._
       | 
       | That just means you have to worry about voice recognition errors
       | instead.
        
         | notahacker wrote:
         | True, but voice recognition errors typically involve an oddly
         | out-of-place word or two which you can usually spot and
         | mentally correct. That's less likely to make you take the wrong
         | series of steps than a completely coherent and topic-relevant
         | "hallucinated" sentence that just happens to not be part of the
         | guide at all.
         | 
         | Edit: although in this instance the LLM pretty heavily
         | editorialises the transcript anyway...
        
       | nickjj wrote:
       | Hi,
       | 
       | Is there a way to request that submitted items get removed? Can
       | you provide a way to contact you such as an email address? There
       | wasn't one posted on your site.
       | 
       | It's just a suggestion, I mean right now anyone can submit
       | anyone's videos without their consent or ownership verification.
       | How do you plan to handle that? I'm sure there will be folks out
       | there who wouldn't feel comfortable that a site will be scraping
       | their video content attempting to generate a large network of
       | pages on 1 domain with loads of SEO terms. It provides a conflict
       | of interest with the original creators. This conflict of interest
       | is around SEO competition, reducing views from original creators
       | and then there's the other can of worms of any future plans to
       | monetize your site through subscriptions, paid features or ads
       | where you'd be profiting from the content of others without their
       | consent.
       | 
       | I posted one of my videos just to see what would happen and then
       | it created a permanently hosted page on your domain with an AI
       | generated recap of the video. I didn't realize that was going to
       | happen. There was no warning, label of how it works, TOS that I
       | agreed to or options available to make it private and there's no
       | option to delete it. I put in the URL, hit submit and that was
       | it.
       | 
       | It's nothing personal and I hope you don't see this as a
       | deterrent. I'm all for building cool things and generally openly
       | share almost everything for free (I've been blogging and making
       | videos for ~9 years and don't have a single ad on anything I ever
       | posted) but the idea of having inaccurate AI generated content
       | does rub me the wrong way.
       | 
       | > The guides are generated from pure transcript so you don't have
       | to worry about it being AI.
       | 
       | You mentioned it's generated from pure transcripts but most of
       | the phrases used aren't what was mentioned in the video. It looks
       | like a paraphrased version of it but it's also missing all of the
       | details that would allow someone to follow along.
       | 
       | Directly under the video on the page it says "This response is AI
       | generated". On one hand you say it's not AI generated but then
       | on the other hand it is.
        
       | thih9 wrote:
       | > Internal Server Error
       | 
       | > The server encountered an internal error and was unable to
       | complete your request. Either the server is overloaded or there
       | is an error in the application.
       | 
       | Hugs all around - I'd take it as positive feedback. Congrats on
       | the launch!
        
       | pedalpete wrote:
       | I could have used this on the weekend. I was working on my car,
       | and though I had watched a few videos about removing the door,
       | and electrical connections, etc etc. I missed on some of the
       | details, or had to make a mental note of "this, then this, not
       | the other way around".
       | 
       | What I think might be a great addition is if you had a screenshot
       | for each point? Though I'm not sure how you'd figure out which
       | image would best capture the action.
        
       ___________________________________________________________________
       (page generated 2024-04-21 23:01 UTC)