[HN Gopher] Google Algorithm Leaked
___________________________________________________________________
Google Algorithm Leaked
Author : certifiedloud
Score : 44 points
Date : 2024-05-29 17:25 UTC (5 hours ago)
(HTM) web link (www.seroundtable.com)
(TXT) w3m dump (www.seroundtable.com)
| advisedwang wrote:
| It's not clear to me whether the leak is actually for Google
| Search or one of the products around search that isn't "Search",
| like Document Warehouse [1]. Is there anything definitive one way
| or the other in all this? Nobody seems to even be questioning this.
|
| [1] https://cloud.google.com/document-warehouse/docs/overview
| 9dev wrote:
| If you read the original publication on this [1], they mention
| there's a stray commit publishing the internal variant of the
| SDK intended for the actual Google warehouse database. So the
| code bases probably live close enough together for someone to
| accidentally pass the wrong folder name or something.
|
| This has been fixed, but the commit and all its changes are
| out there and, tragically, were published alongside a copy of
| the Apache 2.0 license (intended for the Document Warehouse API
| SDK), which officially permits freely copying and using the
| code. So there is really nothing Google can do about it.
|
| [1] https://ipullrank.com/google-algo-leak
| atonse wrote:
| This looks like it's written in Elixir (the docs use ExDoc,
| Elixir's documentation tool).
|
| These can't possibly be the actual search index rules, which are
| probably decades-old code (my guess is Python or Java?), unless
| they rewrote all of it in the past few years?
|
| Can anyone else confirm this?
| 9dev wrote:
| It's not. Google uses a content warehouse database internally
| that holds all stored web page content, and to access this vast
| database, they have an API. The code discovered here is a
| generated SDK for Elixir for this content warehouse API.
|
| Apparently, Google had a now-deprecated product (who would have
| guessed that? Consider me shocked!) that provided customers
| with a trimmed-down version of this database for their own
| purposes, but it mistakenly published the internal SDK code to
| GitHub instead of the version intended for Google Cloud customers.
|
| So while this doesn't directly show the search index source
| code, it describes the data schema of the index in great
| detail, so at least some interesting educated guesses about the
| workings of the actual index can be drawn from it.
| ChrisArchitect wrote:
| [dupe]
|
| Some more discussion:
| https://news.ycombinator.com/item?id=40496967
___________________________________________________________________
(page generated 2024-05-29 23:02 UTC)