https://wakatime.com/blog/67-bots-so-many-bots

Bots, so many Bots

Oct 01, 2024 * Alan Hamlett * Engineering

ProductHunt has over 1 million user signups. More than 60% of those are bots.

How it started

I've used ProductHunt since early 2014. Besides Hacker News, it was a good way to see the latest product launches in tech. Using the comments on products, I could discover similar tools or collect feedback on my own products. Lately though, I've noticed most of the comments seem to be generated by ChatGPT.

A simple test

To test my theory, I launched my own product on ProductHunt but with a simple LLM prompt injection in the product's description. Sure enough, almost all the comments were automated.

[Figure: Comments screenshot. Can you spot the bots?]

Now we see that it's a waste of time for people launching on ProductHunt to reply to comments. Is launching on ProductHunt even worth it? If so many bots are commenting, they must be upvoting too, right? Based on all the emails I receive offering votes for money, some people must be purchasing upvotes.

The data

To analyze upvotes, I found a publicly available list of all ProductHunt users, launches, upvotes, and comments. ProductHunt has over 1 million user signups, over 300 thousand launches, 2.5 million comments, and 20 million upvotes. Each product also has a daily rank, which is its position 24 hours after midnight PDT, when each new launch day starts. First place is daily rank 1. I didn't check what the lowest rank was, but some products have a null rank. Maybe those were deleted, flagged, or never launched.

Detecting bot accounts

Detecting bots is difficult, especially with only public data.
At first, I tried analyzing the times of day of user comments to find trends. For example, here's one user who signed up 677 days ago, commented 2,009 times, and upvoted 4,649 launches. Definitely a power user and using some automation, but probably not a bot (and wasn't categorized as one).

[Figure: Comments by hour histogram, real user]

Now look at a bot user's comments. This user signed up 140 days ago, commented 173 times, and upvoted 246 launches.

[Figure: Comments by hour histogram, bot]

Notice how the bot comments at regular intervals and the chart looks boxy instead of smooth? However, this alone wasn't enough to detect bot accounts. I assigned each user a risk score based on many different criteria, like account activity duration, upvote patterns over time, number of upvotes shared with other bots, and content of comments.

Did you know ChatGPT-generated comments have a higher frequency of words like game-changer? Bot comments also contained characters that aren't easy to type, like an em-dash, or the product's name verbatim even when it's very long or contains characters like (tm). They also commonly included the name and bio word for word from a real person's LinkedIn profile, but those people said they never created a ProductHunt account.

Clustering works to an extent, but many bot accounts are thrown away after use, so the bots only share one similar vote out of their many random votes. I ran some clustering, but only on small data sets because cupy and cudf haven't implemented the necessary methods to run on GPUs. If someone has more experience with this, clustering might improve bot detection.

In the end, I detected over 60% of user signups to be automated bot accounts. That's a conservative number, because I didn't detect all the bots. It would be a lot easier for ProductHunt themselves to detect bot activity more accurately using insider data.

Bot activity over time

User signups

Since 2018, there have been more bot users created than real users.
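
To make the idea concrete, here is a minimal sketch of how a per-user risk score like the one described above might feed a signups-per-year count. It is not the analysis behind the charts in this post: the `User` fields, the choice of only two signals, the equal weights, and the 0.6 threshold are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass

EM_DASH = "\u2014"                   # hard-to-type character mentioned in the post
SUSPECT_PHRASES = ("game-changer",)  # the only phrase the post names


@dataclass
class User:
    """Hypothetical record; not the schema of the public data set."""
    signup_year: int
    comments: list       # comment texts
    comment_hours: list  # hour of day (0-23) of each comment


def hour_flatness(hours):
    """Share of active hours that have the most common comment count.

    A boxy histogram (many hours with identical counts) scores near 1.
    """
    per_hour = Counter(hours)
    if len(per_hour) < 2:
        return 0.0
    _, hours_at_modal_count = Counter(per_hour.values()).most_common(1)[0]
    return hours_at_modal_count / len(per_hour)


def risk_score(user):
    """Combine two signals with assumed equal weights; returns 0..1."""
    if not user.comments:
        return 0.0
    flagged = sum(
        1 for text in user.comments
        if EM_DASH in text or any(p in text.lower() for p in SUSPECT_PHRASES)
    )
    return 0.5 * hour_flatness(user.comment_hours) + 0.5 * flagged / len(user.comments)


def signups_by_year(users, threshold=0.6):
    """Count signups per year, split by whether risk_score >= threshold."""
    counts = {}
    for user in users:
        label = "bot" if risk_score(user) >= threshold else "real"
        year = counts.setdefault(user.signup_year, {"bot": 0, "real": 0})
        year[label] += 1
    return counts
```

Calling `signups_by_year(users)` on a list of such records returns a dict keyed by year with `bot` and `real` counts. A real score would need many more signals (shared upvotes, account lifetime, and so on) and weights tuned against accounts known to be automated.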
[Figure: Bot User Signups]

Comments

In late 2022, bot comments really took off... around the same time ChatGPT was first widely available. The spike in 2024 is because bot accounts get deleted over time, though I'm not sure if by ProductHunt or by the account owner. Newer accounts are less likely to have been deleted, so we still have access to their comments.

[Figure: Bot Comments]

Upvotes

Also in 2022, bot upvotes surpassed real votes.

[Figure: Bot Votes]

These bots form voting rings where makers pay for upvotes to increase their chances of getting into the ProductHunt newsletter.

Rankings

Most launches get only a few real upvotes. Since bots vote randomly to blend in, the bot trend line is smoother than the one for real user upvotes.

[Figure: Count of launches grouped by number of votes]

Daily rank

First place launches get featured in the daily and weekly ProductHunt newsletter, so let's see how many bot upvotes the top launches receive.

[Figure: Launches in first place by number of bot votes]

Looks like 15% bot votes is a safe amount to get your product in first place for the day. Anything over 60% bot votes doesn't seem to make it to first place for some reason. Here's the same chart limited to launches after 2020, showing bots are accounting for more of the upvotes in top posts lately.

[Figure: Launches in first place by number of bot votes since 2020]

Launches paying for upvotes probably aren't high quality products, so they often rank top 5 instead of first place:

[Figure: Launches in top 5 by number of bot votes since 2020]

Final Thoughts

I wanted to create a list of launches without the bot votes, to see if the top launch of the day changes when the bot votes are removed. However, I don't want to call out launches that didn't really pay for votes but just happen to have many upvotes from bots, and I don't want more publicity for the launches that did pay. Either way, I've spent too much time on this already, so that will have to wait for a possible future blog post.

Join the discussion on HN.
Also check out my attempt to make this better with wonderful.dev, with my profile at wonderful.dev/alan.

This article is open source, feel free to open a PR on GitHub.

Tags in this article: data-science, producthunt