[HN Gopher] A 5x reduction in RAM usage with Zoekt memory optimi...
___________________________________________________________________
A 5x reduction in RAM usage with Zoekt memory optimizations
Author : janisz
Score : 56 points
Date : 2021-08-19 18:38 UTC (4 hours ago)
(HTM) web link (about.sourcegraph.com)
(TXT) w3m dump (about.sourcegraph.com)
| cbm-vic-20 wrote:
| Not directly related to the linked content, but "N times less"
| never really made that much sense to me. In this case, I'm
| guessing "5x reduction" means "80% less"? Or "20% of the previous
| usage"?
| DistractionRect wrote:
| N times less usually just means it uses 1/N. So this would mean
| it uses only 20% of the memory it used originally.
| rusk wrote:
| Yeah, it doesn't make sense: I would have thought the x in 5x
| represents a multiplication rather than a division.
|
| It would make more sense to me to say a fifth, or 1/5.
| ksec wrote:
| Yes, it is marketing speak for 80% less, and generally
| speaking it works much better than a percentage.
| nytgop77 wrote:
| I saw advertisements saying "now 20% cheaper!" when the price
| change (original->new) was 30EUR->25EUR.
|
| Marketing always finds a way: 5/30 ≈ 16.7%, but 5/25 = 20%.
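The trick in that ad is only the choice of denominator: the same 5 EUR saving is a different percentage depending on which price you divide by. A quick check in Python; `percent_change` is just an illustrative helper, not from any library.

```python
def percent_change(old: float, new: float, base: float) -> float:
    """Size of the price change as a percentage of the chosen base price."""
    return abs(old - new) * 100 / base


old_price, new_price = 30.0, 25.0  # EUR, the prices from the comment above

# Relative to the original price: the usual meaning of "X% cheaper".
print(f"{percent_change(old_price, new_price, old_price):.1f}%")  # 16.7%
# Relative to the new price: the figure the advertisement used.
print(f"{percent_change(old_price, new_price, new_price):.1f}%")  # 20.0%
```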
| tyingq wrote:
| _" We went from 1400KB of RAM per repo to 310KB with no
| measurable latency changes."_
|
| So, not exactly, but close: ~22% of the previous usage.
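Reading "N times less" as 1/N, the per-repo figures quoted from the post work out as below: roughly a 4.5x reduction, which the headline rounds to 5x. The variable names are only for illustration.

```python
before_kb, after_kb = 1400, 310  # per-repo RAM figures quoted from the post

fraction = after_kb / before_kb  # share of the original usage that remains
factor = before_kb / after_kb    # the "N times less" factor, i.e. 1/fraction
saved = 1 - fraction             # share of memory saved

print(f"{fraction:.1%} of the original usage")  # 22.1% of the original usage
print(f"{factor:.2f}x reduction")               # 4.52x reduction
print(f"{saved:.1%} saved")                     # 77.9% saved
```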
| [deleted]
| nijaru wrote:
| Yes that is correct. It's not always the most intuitive
| language.
|
| 5 times less than 20 would be 4 (20 * 1/5), just as 20 is 5
| times more than 4.
| cinntaile wrote:
| If anyone besides me wonders where the name comes from: Zoekt
| is Dutch for "seek".
|
| Context: "Zoekt, en gij zult spinazie eten" - Jan Eertink
|
| ("seek, and ye shall eat spinach" - My primary school teacher)
| https://github.com/google/zoekt
| Scaevolus wrote:
| Author here, ask me anything! :-)
| beff_jesos wrote:
| Is this available in the self-hosted version as well now? I am
| getting RAM issues on the AWS-hosted cluster.
| therealmarv wrote:
| Related: the last time I checked, a standard Ubuntu install did
| not have RAM compression (ZRAM) enabled by default (unlike current
| Windows and Mac, which have had it for years). It helps a lot with
| programs like browsers.
| klysm wrote:
| I understand why it's opt-in, though; it's not a clear win in all
| cases.
| kevincox wrote:
| I'm surprised that they are storing Unicode characters instead of
| bytes. For example, Rust's regex library works on bytes, and
| Unicode patterns are compiled into byte patterns, which means that
| it doesn't need to worry about Unicode and variable-length
| characters when matching.
|
| You'd think that for code, where the vast majority is ASCII, this
| would be a huge improvement. I guess the downside is that searches
| for emoji and other "long" characters would need to look up more
| index entries. However, given how rare such characters are, I
| would expect it to be beneficial overall.
| Scaevolus wrote:
| The source code itself is stored as UTF-8, but the trigrams
| were represented as Unicode codepoints in the index. The last
| optimization packs the ASCII trigrams for efficiency.
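To make the "8 bytes for 3 codepoints" figure discussed here concrete, here is a minimal Python sketch. It assumes each codepoint is given 21 bits (the largest codepoint, U+10FFFF, needs 21), so one trigram fills 63 bits of a 64-bit integer, and it adds a hypothetical ASCII-only form at 7 bits per character. The function names are illustrative; this is not Zoekt's actual code, and the post's real ASCII packing layout may differ.

```python
from typing import Optional

# Assumption: 21 bits per codepoint, enough for the largest one (U+10FFFF),
# so a trigram of three codepoints occupies 63 bits of a 64-bit integer.
BITS_PER_RUNE = 21
RUNE_MASK = (1 << BITS_PER_RUNE) - 1


def pack_runes(trigram: str) -> int:
    """Pack three codepoints into one integer that fits in 64 bits."""
    if len(trigram) != 3:
        raise ValueError("expected exactly 3 codepoints")
    a, b, c = (ord(ch) for ch in trigram)
    return (a << 2 * BITS_PER_RUNE) | (b << BITS_PER_RUNE) | c


def unpack_runes(packed: int) -> str:
    """Inverse of pack_runes: recover the three codepoints."""
    shifts = (2 * BITS_PER_RUNE, BITS_PER_RUNE, 0)
    return "".join(chr((packed >> s) & RUNE_MASK) for s in shifts)


def pack_ascii(trigram: str) -> Optional[int]:
    """Hypothetical ASCII fast path: 7 bits per character, 21 bits total,
    so the result fits in a 32-bit integer. Returns None for non-ASCII
    input, in which case the caller keeps the 21-bits-per-rune form."""
    if len(trigram) != 3 or not trigram.isascii():
        return None
    a, b, c = (ord(ch) for ch in trigram)
    return (a << 14) | (b << 7) | c
```

The design point is that most trigrams in source code are pure ASCII, so a compact form for them saves memory while the wide form remains available for everything else.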
| kevincox wrote:
| But that's my point. Why not just store bytes? Then you don't
| have to worry about packing; it is always packed.
|
| If I understand correctly, they are using 8 bytes for 3
| codepoints. They could instead use 3 bytes for a trigram of 3
| bytes. This would use significantly less memory and would rarely
| be less selective. (If that were a concern, they could probably
| consider 4-grams instead of trigrams and still use less memory.)
|
| This also doesn't preclude splitting each trigram into its first
| 2 characters and its last 1.
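A minimal Python sketch of the byte-based alternative proposed above, assuming UTF-8 input: every 3-byte window becomes a 24-bit key, and each key is split into its first two bytes and its last byte to form a two-level map. `build_index` and `candidates` are hypothetical names, not Zoekt's API, and the sketch only finds candidate offsets; a real search would still verify each match against the content.

```python
from collections import defaultdict


def byte_trigrams(text):
    """Yield (byte_offset, key) for every 3-byte window of the UTF-8 text.
    Each key is the 3 bytes read as a 24-bit integer, so no packing step."""
    data = text.encode("utf-8")
    for i in range(len(data) - 2):
        yield i, int.from_bytes(data[i:i + 3], "big")


def build_index(text):
    """Hypothetical two-level index: first two bytes -> last byte -> offsets."""
    index = defaultdict(lambda: defaultdict(list))
    for offset, key in byte_trigrams(text):
        index[key >> 8][key & 0xFF].append(offset)
    return index


def candidates(index, query):
    """Start offsets where every byte trigram of the query lines up.
    Queries shorter than 3 bytes cannot use the index; this sketch
    returns an empty set for them."""
    result = None
    for q_off, key in byte_trigrams(query):
        postings = index.get(key >> 8, {}).get(key & 0xFF, [])
        starts = {pos - q_off for pos in postings}
        result = starts if result is None else result & starts
    return result or set()
```

For example, indexing `'fn main() { let crab = "🦀"; }'` and querying `"main"` yields the single byte offset of `main`. A 4-byte emoji such as 🦀 becomes two overlapping byte trigrams, which illustrates the extra index lookups mentioned upthread for "long" characters.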
___________________________________________________________________
(page generated 2021-08-19 23:00 UTC)