[HN Gopher] A neat XOR trick
       ___________________________________________________________________
        
       A neat XOR trick
        
       Author : jmillikin
       Score  : 312 points
       Date   : 2022-12-11 22:03 UTC (2 days ago)
        
 (HTM) web link (www.mattkeeter.com)
 (TXT) w3m dump (www.mattkeeter.com)
        
       | random_coder wrote:
        | A neat trick indeed, using bitsets in a sliding window. Contest
        | programming makes heavy use of tricks like this.
        
       | jvanderbot wrote:
        | This is also essentially the trick applied in Kadane's Algorithm
        | for the "maximum contiguous sum" search, albeit using masks to
        | "undo" the set-of-characters check, whereas Kadane's uses
        | subtraction to "undo" the addition for the maximum-sum check.
       | 
       | https://www.geeksforgeeks.org/largest-sum-contiguous-subarra...
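        | 
        | For reference, a quick sketch of (one formulation of) Kadane's
        | algorithm in Rust, where the running sum is "undone" by
        | resetting whenever the prefix can only drag it down:
        | 
        |     // Maximum sum over all non-empty contiguous subarrays.
        |     fn max_subarray_sum(xs: &[i64]) -> i64 {
        |         let mut best = i64::MIN;
        |         let mut running = 0i64;
        |         for &x in xs {
        |             // Drop the prefix if it only hurts the sum.
        |             running = running.max(0) + x;
        |             best = best.max(running);
        |         }
        |         best
        |     }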
        
       | ribit wrote:
        | One can improve this even further I think. First, by using an
        | iterator instead of indexing into a UTF-8 string. Second, by
        | using the codepoint value directly as a bitmask (why is the
        | author realigning the bit mask to "a" in the first place?)
        
         | dahfizz wrote:
          | The point is that each character represents a single bit in the
          | mask, so it's easy to count the number of characters in the
          | window using popcount. If you XOR in the character values
          | directly, you can no longer easily tell which characters are in
          | the set.
        
         | grogers wrote:
         | The author passed the input as &[char] not &str, so indexing is
         | no problem. That's weird in its own right, but obviously not
         | the core of the post.
        
         | mytailorisrich wrote:
          | Looking at ASCII decimal values, a-z correspond to 97-122, so
          | if you want to use a 32-bit bitmap you remap 97-122 to 0-25 by
          | subtracting 97 ('a').
          | 
          | If you can use a 128-bit bitmap for the same cost then you
          | could indeed directly index by ASCII values.
         | 
         | You can also get rid of the 'if' on window size in the main
         | loop by partially unrolling it and taking those cases (start
         | and end of string) outside.
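          | 
          | Going back to the 128-bit idea, a minimal sketch (same shape as
          | the article's code, just indexing by the raw byte value):
          | 
          |     fn run(s: &[u8], window_size: usize) -> Option<usize> {
          |         let mut set = 0u128;
          |         for i in 0..s.len() {
          |             // ASCII values are < 128, so each byte gets its
          |             // own bit without any remapping.
          |             set ^= 1u128 << s[i];
          |             if i >= window_size {
          |                 set ^= 1u128 << s[i - window_size];
          |             }
          |             if set.count_ones() as usize == window_size {
          |                 return Some(i + 1);
          |             }
          |         }
          |         None
          |     }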
        
         | opk wrote:
          | I assume the author is realigning to get only one bit for each
          | letter. This relies on having an alphabet of 26 characters,
          | where 26 is less than the 32 available in a u32. In most real-
          | world problems, you need to allow for the full ASCII or Unicode
          | range and this wouldn't be possible.
          | 
          | Using codepoints directly, there is overlap: 'f' ^ 'd' gives
          | the same value as 'c' ^ 'a', for example. You could keep an xor
          | value for each window size smaller than the full window, but
          | that effectively brings back the inner loop that using xor is
          | avoiding, and you could just use equality. With codepoints,
          | there may be a solution similar to a bloom filter to
          | efficiently determine whether a duplicate is possible, but I've
          | not thought that through fully.
        
       | SaddledBounding wrote:
        | 'e' ^ 'f' ^ 'g' == 0x64 == 'd'. Uh-oh.
       | 
       | Edit: I misread the post; my bad. The post effectively uses a bit
       | vector to store the last N chars in a window, and the bit vector
       | happens to fit in a single machine word. Also, XOR happens to be
       | a good way to update the bit vector, because it turns out it's
       | sufficient to store how many times each character appears in the
       | window mod 2.
       | 
        | So to be clear, my "demonstration" above only works because the
        | ASCII representations of the letters are not a linearly
        | independent set with respect to xor (here, 'e' ^ 'f' ^ 'g' lands
        | on 'd'). In the article, however, the one-bit-per-letter
        | representations of all of the characters form a linearly
        | independent set.
        
         | jandrese wrote:
         | The example code in the article left out the lookup table to
         | convert each character into a single bit representation. All
         | the tricky xor and popcount stuff could have been a 26 byte
         | array just as easily and been O(n).
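          | 
          | A sketch of that array variant (assuming lowercase ASCII input;
          | a count per letter plus a running distinct count keeps it
          | O(n)):
          | 
          |     fn run(s: &[u8], window_size: usize) -> Option<usize> {
          |         let mut counts = [0u32; 26];
          |         let mut distinct = 0usize;
          |         for i in 0..s.len() {
          |             // Character entering the window.
          |             let c = (s[i] - b'a') as usize;
          |             counts[c] += 1;
          |             if counts[c] == 1 {
          |                 distinct += 1;
          |             }
          |             // Character leaving the window.
          |             if i >= window_size {
          |                 let d = (s[i - window_size] - b'a') as usize;
          |                 counts[d] -= 1;
          |                 if counts[d] == 0 {
          |                     distinct -= 1;
          |                 }
          |             }
          |             if distinct == window_size {
          |                 return Some(i + 1);
          |             }
          |         }
          |         None
          |     }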
        
           | robinhouston wrote:
            | There's no lookup table. The example code in the article does
            | the conversion using this expression:
            | 
            |     1 << (s[i + j] as u32 - 'a' as u32)
        
             | WorldMaker wrote:
             | Yes, for some additional explanation for those that might
             | use it, that's "ASCII math": it assumes all lowercase ASCII
             | Latin letters (which most Unicode encodings including UTF-8
             | and UTF-32 inherit). ASCII was intentionally designed so
             | that the lowercase letters a-z are in English alphabet
             | order and so subtracting 'a' from the letter gives you an
             | alphabet position number from 0-25. (The same applies in
             | ASCII for the upper case range A-Z and the numerical range
             | 0-9, though doing math _between_ ranges is less fun.)
             | 
             | Then you've just got a standard single 1 left shifted 0-25
             | positions. (So 'a' is 1 and 'z' is 1*2^25.)
        
         | quietbritishjim wrote:
          | Ah OK, the article is actually suggesting making a list of
          | Booleans whose length equals the number of possible characters.
          | It just happens that it's assuming 26 allowed characters, which
          | fit in the bits of a 32-bit number.
          | 
          | The running time does not depend on the window length, but
          | _does_ depend on the number of possible characters. If it's
          | all of Unicode, for example, you'd be stuffed: you could fit
          | the vector of values in memory (currently there are approx.
          | 150,000 Unicode code points), even if you used a byte per
          | value, but counting the number of true values would require
          | iterating over the whole vector.
         | 
         | Even just going from 26 Latin letters to 256 byte values makes
         | this trick quite messy unless your language has a really nice
         | bit vector type (admittedly many do).
         | 
         | This comment was helpful for me to understand what was going
         | on, even though it's a mistake followed by a correction.
        
       | xnorswap wrote:
        | I had considered this approach, but the nature of AoC means it
        | rewards real-world time to solution rather than any kind of
        | computation time, and a naive solution still runs fast enough
        | that it more than makes up for it in programming time saved.
        | 
        | There are actually whole classes of problems which would be more
        | interesting if the naive solution weren't fast enough; for
        | example, even on day 7 (or was it 8?) naively exploring every
        | edge from every square would be O(N^3) but still execute just
        | fine.
       | 
       | I'm a couple of days behind but so far this year hasn't yet had
       | lanternfish style problems where the naive solution blows up
       | completely, but hopefully they will come as they are a lot more
       | interesting.
        
         | ladberg wrote:
         | If you aren't already familiar with it, you might find
         | projecteuler.net fun!
        
           | kadoban wrote:
           | codeforces.com or one of those is slightly more likely to be
           | fun for programmers.
           | 
           | projecteuler.net is great, but it _very_ quickly becomes deep
           | math.
        
         | n4r9 wrote:
         | Arguably day 11 part 2 is such a "lanternfish" problem,
         | although it essentially tells you to watch out for it.
        
           | WJW wrote:
           | I definitely ran into that. I read "impractically large
           | number" and thought "well the language I'm using has
           | arbitrary size integers so I should be fine". So, I changed
           | the code from 20 iterations to 10000 iterations and suddenly
           | all 16 GB of my RAM was filled with Santa's worry counters.
        
         | nraynaud wrote:
          | I love the Sunday quarterbacking of AoC, more than the
          | competition itself (I'm not even doing it this year). And I
          | agree with you that a quick reminder that some problems are
          | "naively intractable" is a good wake-up call once in a while.
        
       | hairtuq wrote:
       | On a related note, almost any task around shuffling or shifting
       | bits can be improved with xor. For example the straightforward
        | way to remove bit i is
        | 
        |     mask = -1 << i;
        |     return (x & ~mask) | ((x >> 1) & mask);
        | 
        | but with xor we can do it in one less instruction:
        | 
        |     mask = -1 << i;
        |     return ((x ^ (x >> 1)) & mask) ^ x;
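        | 
        | A quick check that the two forms agree (a Rust sketch with u64
        | masks; names are just for illustration):
        | 
        |     fn remove_bit_classic(x: u64, i: u32) -> u64 {
        |         let mask = u64::MAX << i;
        |         (x & !mask) | ((x >> 1) & mask)
        |     }
        | 
        |     fn remove_bit_xor(x: u64, i: u32) -> u64 {
        |         let mask = u64::MAX << i;
        |         ((x ^ (x >> 1)) & mask) ^ x
        |     }
        | 
        |     fn main() {
        |         for x in [0u64, 0b1011_0110, 0xDEAD_BEEF, u64::MAX] {
        |             for i in 0..64 {
        |                 assert_eq!(remove_bit_classic(x, i),
        |                            remove_bit_xor(x, i));
        |             }
        |         }
        |     }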
        
         | [deleted]
        
       | DannyBee wrote:
        | You don't need to count ones at all, because you can stop looking
        | at a window as soon as you discover it's not unique. As a result,
        | you only have to detect whether the bitmask changed from adding a
        | new character, not how many one bits there are in it. (This is
        | true in the hash-set version as well - if insert returned false
        | you can move on.)
        | 
        |     old bit mask = current bit mask
        |     current bit mask = current bit mask OR new character
        |     if old bit mask == current bit mask
        |         window is not unique, move to the next window
        |     (otherwise window is so far unique)
        
         | meindnoch wrote:
         | And how do you remove the character leaving the window on the
         | left? I don't think this solution works the way you think it
         | does...
        
           | DannyBee wrote:
            | Yes, I wrote it too quickly; you would have to xor characters
            | in and out instead. I doubt it is worth it vs popcnt at that
            | point.
            | 
            | It doesn't change the fact that the problem is incremental,
            | and most importantly you can early exit windows as soon as
            | you discover a single non-unique character. You don't have to
            | wait till the end of the window. They don't implement this
            | for the hash set, for example.
           | 
           | Since most (n>1) windows are not unique (well, more
           | accurately the probability of the window being unique
           | decreases as the size of the window approaches the number of
           | possible characters), early exit from windows will likely
           | beat bit munging except for small windows, until you get into
           | SIMD solutions.
        
             | rep_lodsb wrote:
             | Bit operations are fast, branches are slow.
        
               | DannyBee wrote:
               | Well, yes, but most of these branches can be if-converted
               | into conditional moves anyway if you want.
               | 
               | So they are data dependencies and not control ones. There
               | are still some control ones.
               | 
               | Beyond that, let me be super clear: Imagine the following
               | six versions:
               | 
               | 1. One window at a time processing. No early exit, no bit
               | munging.
               | 
               | 2. One window at a time processing. No early exit, bit
               | munging.
               | 
               | 3. One window at a time processing. You don't use direct
               | bit munging, but you early exit the window when you hit
               | the non-unique character, and skip the window forward to
               | the first instance of that non-unique character.
               | 
               | IE given "hmma<something>", n=4, you early exit at the
               | second m, and skip processing windows forward to right
               | after the first m (since no window prior to that can be
               | unique, as they will all hit the double m)
               | 
               | 4. One window at a time processing, bit munging, same
               | otherwise as #3
               | 
               | 5. Sliding window processing, no bit munging
               | 
               | 6. Sliding window processing, bit munging.
               | 
                | The speedup going from 1-2 to 3-6 is much greater than
                | that from 5 to 6 or from 3 to 4.
               | 
               | You could probably make 3 pretty darn competitive with 6,
               | and 4 could probably beat 6 with SIMD (IE processing
               | multiple windows in parallel really fast, and maybe
               | wasting work, vs guaranteeing you only do the minimum
               | work to find a unique window)
        
               | dahfizz wrote:
                | I don't think you are correct. Certainly not with a
                | window size of 4. For very large window sizes, the early
                | return will be a big win. But you are really underrating
                | how expensive the hashmap will be vs "bit munging".
               | 
               | You could roll your own perfect hash, but you would end
               | up with a solution that looks almost identical to TFA. If
               | you use a stdlib hash, you are going to be chasing
               | pointers all over memory to do a single insert / lookup.
               | De-referencing a single pointer not in cache costs 50 -
               | 100 clock cycles on a modern system. By the time you do
               | one insert, TFA will have XOR'd at least 32 chars into
               | its bit mask. And your cache won't be as nice as in TFA,
               | slowing you down even more.
               | 
                | Given the two scenarios:
                | 
                | 1) bit munging, no early return
                | 
                | 2) hash lookup, early return
                | 
                | I would guess scenario 1 wins until the window size gets
                | above ~256. And obviously, we can add an early return to
                | scenario 1 to make it unquestionably the fastest.
        
         | rep_lodsb wrote:
         | OR wouldn't work with a sliding window since it can't be
         | inverted?
         | 
          | This should work:
          | 
          |     init bit mask and count of set bits to 0
          |     for each new char:
          |         old = bit mask
          |         bit mask = bit mask XOR old char
          |         if bit mask > old then count++ else count--
          |         old = bit mask
          |         bit mask = bit mask XOR new char
          |         if bit mask > old then count++ else count--
          |         if count == window length: return match
         | 
         | The idea is that each XOR will always set or clear exactly one
         | bit, so we can maintain a count of set bits.
         | 
         | But if there is a POPCNT instruction it would probably be
         | faster to use it.
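          | 
          | A Rust sketch of the same idea (assuming lowercase ASCII; the
          | count is adjusted by checking which way each bit flipped):
          | 
          |     fn run(s: &[u8], window_size: usize) -> Option<usize> {
          |         let mut mask = 0u32;
          |         let mut count = 0usize; // set bits in `mask`
          |         for i in 0..s.len() {
          |             // Toggle the bit of the character entering.
          |             let enter = 1u32 << (s[i] - b'a');
          |             mask ^= enter;
          |             if mask & enter != 0 {
          |                 count += 1;
          |             } else {
          |                 count -= 1;
          |             }
          |             // Toggle the bit of the character leaving.
          |             if i >= window_size {
          |                 let leave = 1u32 << (s[i - window_size] - b'a');
          |                 mask ^= leave;
          |                 if mask & leave != 0 {
          |                     count += 1;
          |                 } else {
          |                     count -= 1;
          |                 }
          |             }
          |             if count == window_size {
          |                 return Some(i + 1);
          |             }
          |         }
          |         None
          |     }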
        
           | DannyBee wrote:
           | My guess is, it's still faster to just check whether
           | hashset.insert returns false. Since most windows are not
           | unique early exit will likely beat faster processing.
        
             | gpderetta wrote:
             | For small windows it is possible that an early exit might
             | be counterproductive as it exercises the branch predictor.
             | 
              | One would have to test to see where the cutoff is, but
              | often doing extra work is faster.
             | 
             | Edit: but this is discussed elsethread.
             | 
             | A fixed size run also helps with vectorization.
        
             | dahfizz wrote:
              | I doubt the hashmap would beat the XOR method from the
              | article. Hashmaps mean allocations, and they mean hashing.
              | Hashing is going to be at least as much work as XORing a
              | couple of values into the mask. Hashmaps also mean chasing
              | pointers all over memory and ruining your cache.
              | 
              | You can add an early return to the OP's XOR method by just
              | checking if the char is already in the bit mask. This will
              | be faster if the window size is large.
        
               | DannyBee wrote:
                | the domain/range is fixed, so you don't have to allocate,
                | actually, because you can have a fixed-size table and a
                | perfect hash :)
               | 
               | That said, I agree you can make the bit munging faster in
               | the end, i'm just saying i don't think the speed up is
               | anywhere near the improvement from early-exiting.
        
               | rep_lodsb wrote:
               | Even if the hashmap library knew that the only valid keys
               | were 'a' .. 'z', it wouldn't be magically faster. The
               | best it could do is use basically the same code as a
               | hand-rolled implementation.
               | 
               | Bit operations and shifts take a single clock cycle, and
               | the mask can be stored in a register throughout the
               | entire loop.
               | 
               | If "early exit" brings any improvement, I don't see why
               | the best wouldn't be to combine the two solutions instead
               | of choosing one or the other.
        
               | dahfizz wrote:
               | > Even if the hashmap library knew that the only valid
               | keys were 'a' .. 'z', it wouldn't be magically faster.
               | The best it could do is use basically the same code as a
               | hand-rolled implementation.
               | 
                | This is known as a perfect hash[1]. Knowing that you will
                | never have collisions does allow for a faster
               | implementation. The hash map can be backed by an array
               | which will never need to be resized, and you don't have
               | to fiddle with linked lists to chain collisions.
               | 
               | You're correct though, that this is something you will
               | have to implement yourself. Library hashmaps are going to
               | trade performance for general usefulness.
               | 
               | [1] https://en.wikipedia.org/wiki/Perfect_hash_function
        
           | meindnoch wrote:
            | You can combine the two XORs:
            | 
            |     new_mask = old_mask XOR old_char XOR new_char
            |     count += signum(new_mask - old_mask)
            | 
            | Where signum(x) = (x > 0) - (x < 0)
        
             | rep_lodsb wrote:
             | If the old and new char are different, two bits will have
             | changed state and comparing the old and new mask will have
             | unreliable results.
             | 
             | e.g.
             | 
                |     old mask = 011100
                |     new mask = 001111
                | 
                | The new mask is numerically less even though it has one
                | more bit set.
        
               | meindnoch wrote:
               | Dangit! I concede...
        
       | meindnoch wrote:
       | This is basically the prefix sum technique in disguise: the sum
       | over an interval is the difference between the prefix sums to the
       | endpoints. The summation can be substituted with any group
       | operation, in this case modulo-2 addition of 26-long vectors.
       | 
       | Many leetcode-type problems are amenable to O(n) speedups using
       | this technique.
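        | 
        | A sketch of that view in Rust (assuming lowercase ASCII):
        | prefix[i] is the xor of the one-hot vectors of the first i
        | characters, so any window is summarized by the "difference"
        | prefix[r] ^ prefix[r - w].
        | 
        |     fn run(s: &[u8], w: usize) -> Option<usize> {
        |         let mut prefix = vec![0u32; s.len() + 1];
        |         for (i, &c) in s.iter().enumerate() {
        |             prefix[i + 1] = prefix[i] ^ (1u32 << (c - b'a'));
        |         }
        |         (w..=s.len()).find(|&r| {
        |             (prefix[r] ^ prefix[r - w]).count_ones() as usize == w
        |         })
        |     }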
        
       | hanemile wrote:
       | gamozolabs benchmarked this kind of stuff in a recent stream:
       | https://www.youtube.com/watch?v=ru97E9xA0s4
        
       | twic wrote:
        | One small thing:
        | 
        |     // Turn on bits as they enter the window
        |     set ^= 1 << (s[i] as u32 - 'a' as u32);
        | 
        |     // Turn off bits as they leave the window
        |     if i >= window_size {
        |         set ^= 1 << (s[i - window_size] as u32 - 'a' as u32);
        |     }
        | 
        | "turn on" and "turn off" actually mean "flip", right? Each bit is
        | not a present/not-present flag, it's the parity of the number of
        | occurrences. Which still works, because of the "counting how many
        | characters appear an odd number of times in a window" thing.
        
       | piersj225 wrote:
       | You can also swap two variables without using a temp variable
       | thanks to XOR
       | 
       | https://stackoverflow.com/a/19618810
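        | 
        | For illustration, the swap looks like this (assuming the two
        | variables are distinct locations; see the reply below about
        | aliasing):
        | 
        |     fn main() {
        |         let (mut x, mut y) = (0b1010u8, 0b0101u8);
        |         x ^= y;
        |         y ^= x; // y now holds the original x
        |         x ^= y; // x now holds the original y
        |         assert_eq!((x, y), (0b0101, 0b1010));
        |     }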
        
         | rep_lodsb wrote:
         | Unless they are the same memory location!
         | 
         | http://underhanded-c.org/_page_id_16.html
         | 
         | x86 CPUs have a dedicated instruction to swap two registers, or
         | a register with memory.
        
         | eru wrote:
         | A note of caution: using that trick on modern hardware and/or
         | languages is likely to be slower than just using the temporary
         | variable. Because compilers and processors can see through the
         | latter better than the XOR.
        
       | rytis wrote:
        | I wonder how long before this question starts popping up in
        | interviews, with an immediate reject if one does not come up with
        | the magic-xor solution in 5 minutes on the whiteboard. EDIT: neat
        | solution nonetheless!!
        
         | mafuy wrote:
         | This was basically part of a public set of Google interview
         | questions around 2009 already ;)
        
       | senderista wrote:
       | Note that you could also solve this with AND/OR, with just a
       | little more complexity. To "turn on" the bit for the new leading
       | character as you slide the window forward, you of course OR it
       | with the bitmask for the window, and to "turn off" the bit for
       | the old trailing character, you AND its complemented bit-value
       | with the bitmask. Not as elegant as XOR, but still works, and
       | shows that XOR properties aren't essential to the solution.
        
         | rep_lodsb wrote:
         | If the same character occurs twice in the window, the second OR
         | will have no effect but the AND will remove it.
         | 
         | The trick is that XOR will leave the bit set if there is an odd
         | number of occurrences, and the only way to have the bit count
         | equal the window size is if every char is unique.
        
       | DonHopkins wrote:
       | Of all the combinational logic functions, XOR is the coolest.
       | Even its name is cool! It sounds like an evil super hero, or an
       | ancient god.
       | 
       | ChatGPT did offer a useful suggestion I'd never heard of when I
       | asked it to describe The Mighty XOR:
       | 
       | >As I mentioned in my previous responses, XOR is a logical
       | operation and does not have any physical form or abilities, so it
       | cannot be a superhero. Therefore, it is not possible for a
       | character named "The Mighty XOR" to be part of the Marvel
       | universe or any other fictional universe, as they would not exist
       | in reality. If you are interested in characters from the Marvel
       | universe with powers related to digital logic or computing, you
       | might consider the character of The Calculator from DC Comics,
       | who has the ability to use his super-genius intellect to perform
       | complex calculations and hack into computer systems. However,
       | this character is not part of the Marvel universe and does not
       | have the name "The Mighty XOR".
       | 
       | https://en.wikipedia.org/wiki/Calculator_(character)
       | 
       | >Calculator (Noah Kuttler) is a supervillain appearing in
       | American comic books published by DC Comics. Originally
       | introduced as one of many villains in Batman's rogues' gallery,
       | the character was later redeveloped in the 2000s as a master
       | information broker, hacker, and tactical supervisor to other
       | supervillains, and foil to Batman's partner Oracle.
       | 
       | >[...] Calculator suffers from severe obsessive-compulsive
       | disorder, unbeknownst to his peers (even though this was hinted
       | at when he was in charge of monitoring Supergirl), and initially
       | controlled this with medication.
        
         | nerder92 wrote:
          | I ran the unoptimized version of the code through ChatGPT and
          | it was able to optimize it similarly to OP's solution.
          | 
          | I was quite impressed, here is the ChatGPT version:
          | 
          |     fn run(s: &[char], window_size: usize) -> usize {
          |         let mut unique_chars = 0;
          |         for i in 0..window_size {
          |             unique_chars |= 1 << (s[i] as u32 - 'a' as u32);
          |         }
          |         if unique_chars.count_ones() as usize == window_size {
          |             return window_size;
          |         }
          |         for i in 1..s.len() - window_size {
          |             let prev = s[i - 1] as u32 - 'a' as u32;
          |             let next = s[i + window_size - 1] as u32 - 'a' as u32;
          |             unique_chars ^= 1 << prev;
          |             unique_chars |= 1 << next;
          |             if unique_chars.count_ones() as usize == window_size {
          |                 return i + window_size;
          |             }
          |         }
          |         panic!("No unique window found");
          |     }
          | 
          | // NOTE: my prompt was to make the code O(N)
        
         | ajnin wrote:
         | It's also overlooking that there is an actual hero named X-OR,
         | which is the French name of Space Sheriff Gavan, a somewhat
         | popular (at least in France) Japanese show of the prolific
         | "heroes in shiny suits and rubbery monsters from space" genre.
         | 
         | https://fr.m.wikipedia.org/wiki/X-Or
        
       | NathanaelRea wrote:
       | Isn't the memory just O(Unique) since the window size is bounded
       | by the max number of unique characters (in this case 26)? In the
       | general case, the hash would need to store one bit for each
       | unique element.
       | 
        | If you use a bit mask, you allocate all the memory up front. With
        | a set you allocate at runtime, but you could just use
        | set.add(char - 'a') for a similar memory bound. But both need to
        | be able to store every unique element. They are both O(Unique),
        | it just happens that 26 <= num_bits(u32).
        
         | qsort wrote:
         | Yes. It's a neat trick, but from an algorithmic point of view
         | it's exactly the same as the obvious solution, replacing a set
         | with a hash-set. I did this with a counter in python:
          |     from collections import Counter
          | 
          |     def first_diff(s, n):
          |         cnt = Counter()
          |         for i, c in enumerate(s, 1):
          |             cnt[c] += 1
          |             if i > n:
          |                 top = s[i - n - 1]
          |                 cnt[top] -= 1
          |                 if cnt[top] == 0:
          |                     del cnt[top]
          |             if len(cnt) == n:
          |                 return i
        
       | brilee wrote:
        | This trick is called Zobrist hashing in chess/go programming. It
        | allows for incremental calculation of the board hash, which is
        | particularly useful because all of the game variations you're
        | exploring are nearly identical to one another, saving a lot of
        | compute.
        
         | FartyMcFarter wrote:
         | It's reminiscent of Zobrist hashing in that it's an incremental
         | hash calculated with xor. But Zobrist hashing has the crucial
         | difference of using a random k-bit number to be xor'ed into the
         | hash (k is a constant, typically 64 these days) for each member
         | of the set that is present, instead of an N-bit string where N
         | is the number of possible elements in the set.
         | 
         | For example, in a chess engine: if the hash value for "white
         | knight on f3 square" is 0x8BADF00D and the hash value for
         | "white king on e4 square" is 0x1BADB002, then a chess board
         | containing only "white knight on f3 and white king on e4" would
         | be hashed as (0x8BADF00D xor 0x1BADB002) = 0x9000400F.
         | 
         | The upside of this trick is that if your set can contain N
         | different objects (e.g. 768 combinations of 12 different chess
         | pieces on each of 64 squares), you don't need to use an N-bit
         | hash value, which would be pretty big. In the particular
         | problem described in this blog, this isn't an advantage as it's
         | using a set containing only up to 26 letters, which neatly fits
         | in a u32 even if you dedicate a bit to each letter.
         | 
         | However there are downsides to Zobrist hashing - it's very hard
         | to guarantee that you don't get hash collisions in this manner,
         | as AFAIK you'd have to try all combinations of valid sets,
         | which is prohibitively expensive. So every algorithm relying on
         | this hash value either has to be robust to hash collisions, or
         | it accepts a small probability of failure.
         | 
         | Most importantly in this case, Zobrist hashing doesn't let you
         | test whether a particular element is present in the set, nor
         | does it let you count the number of elements in a set. So it
         | wouldn't work as a solution to the problem in this blog, which
         | requires counting how many unique letters are present in the
         | hash value.
        
         | thomasahle wrote:
          | Zobrist hashing is different. With Zobrist you have a table of
          | random numbers. You look each key up in the table and xor the
          | values.
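          | 
          | A minimal sketch of that table-lookup-then-xor shape (the
          | random keys here are faked with a hasher; a real engine would
          | precompute a table of true random numbers):
          | 
          |     use std::collections::hash_map::DefaultHasher;
          |     use std::hash::{Hash, Hasher};
          | 
          |     // Fixed 64-bit key for a (piece, square) pair.
          |     fn key(piece: usize, square: usize) -> u64 {
          |         let mut h = DefaultHasher::new();
          |         (piece, square).hash(&mut h);
          |         h.finish()
          |     }
          | 
          |     fn main() {
          |         // Position hash = xor of keys of occupied squares.
          |         let mut hash = key(1, 21) ^ key(5, 28);
          |         // Moving piece 1 from square 21 to 27 is an
          |         // incremental update: xor out the old, xor in the new.
          |         hash ^= key(1, 21);
          |         hash ^= key(1, 27);
          |         println!("{hash:#x}");
          |     }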
        
       | retrac wrote:
       | It's pretty well-known but my favourite XOR trick is XOR linked
       | lists. [1]
       | 
        | It's possible to store a doubly-linked list using only a single
        | pointer per node. For an entry B sitting between A and C, the
        | stored link is &A ^ &C. You can walk the list in either direction
        | as long as you have pointers to two neighbouring nodes in the
        | list. Half as much memory overhead compared to an implementation
        | where each node stores two pointers.
       | 
       | [1] https://en.wikipedia.org/wiki/XOR_linked_list
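        | 
        | A sketch of the idea using indices into a Vec instead of raw
        | pointers (index 0 plays the role of the null pointer):
        | 
        |     struct Node {
        |         value: char,
        |         link: usize, // index of prev XOR index of next
        |     }
        | 
        |     fn main() {
        |         // The list a <-> b <-> c, built by hand.
        |         let nodes = vec![
        |             Node { value: '\0', link: 0 }, // slot 0 = "null"
        |             Node { value: 'a', link: 0 ^ 2 },
        |             Node { value: 'b', link: 1 ^ 3 },
        |             Node { value: 'c', link: 2 ^ 0 },
        |         ];
        |         // Walking forward only needs the previous index to
        |         // recover the next one.
        |         let (mut prev, mut cur) = (0usize, 1usize);
        |         while cur != 0 {
        |             print!("{} ", nodes[cur].value);
        |             let next = nodes[cur].link ^ prev;
        |             prev = cur;
        |             cur = next;
        |         }
        |     }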
        
       | someweirdperson wrote:
       | count_ones() is a challenge on its own.
       | 
       | https://graphics.stanford.edu/~seander/bithacks.html#CountBi...
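        | 
        | For reference, one of the classic branch-free counts from that
        | page, transcribed as a Rust sketch for 32-bit values (in practice
        | you would just call count_ones or __builtin_popcount):
        | 
        |     fn popcount32(mut v: u32) -> u32 {
        |         v = v - ((v >> 1) & 0x5555_5555);
        |         v = (v & 0x3333_3333) + ((v >> 2) & 0x3333_3333);
        |         ((v + (v >> 4)) & 0x0F0F_0F0F)
        |             .wrapping_mul(0x0101_0101) >> 24
        |     }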
        
         | averne_ wrote:
         | You can just use __builtin_popcount or equivalent, which maps
         | to a single instruction on most platforms.
        
       | jhoechtl wrote:
       | > There's no moral to the story, other than "xor is cool"
       | 
        | The moral is that knowledge is king: no matter how much iron you
        | throw at a problem, a well-designed algorithm will beat it.
        
         | deathanatos wrote:
         | ...a simple histogram (the second proposed solution in TFA)
         | runs "instantly", for both parts.
         | 
         | Part of the meta-game to AoC is knowing that you can limit your
         | answer to _only_ the requirements in the combination of the
         | question and input given. If the naive solution runs fast
          | enough, and is vastly quicker to implement, that's the one you
          | want.
        
         | mangamadaiyan wrote:
         | Cache misses, TLB flushes, and pipeline stalls would beg to
         | differ with you.
        
       | toxik wrote:
       | The true insight here is not using XOR, it's sliding the window
       | efficiently. Here is pseudocode not using XOR.
        |     members = [0] * len(alphabet)
        | 
        |     for i in 0...input length:
        |         members[input[i - window_size]] = 0 if bounds ok
        |         members[input[i]]               = 1
        |         if sum(members) == window_size:
        |             return i
       | 
       | If `members` is a bit vector and `sum()` is a popcount, this code
       | is equivalent.
        
       | [deleted]
        
       | ErikCorry wrote:
       | I had a very similar solution for my AoC in Toit, but I used a
       | Deque instead of just having a trailing index into the input
       | string. https://github.com/erikcorry/advent-of-
       | code-2022/blob/main/d...
       | 
       | Then a friend solved it with regular expressions, but I found a
       | sicker regex. After all, that's what regexes are about.
       | 
       | https://mobile.twitter.com/erikcorry/status/1600524753596456...
        
       | peeters wrote:
       | Note that for enumerable domains of less than 64, using a long
       | plus bit ops has been a standard Set implementation for quite a
       | while. For example, Java's EnumSet uses a long if the enum has
       | less than 64 values:
       | 
       | https://github.com/frohoff/jdk8u-jdk/blob/master/src/share/c...
       | 
       | Where add() uses `|= bitmask`, remove() uses `&= ~bitmask`, and
       | size() uses a count of the 1's in the long.
       | 
       | Adding XOR as an efficient toggle would be interesting, but
       | unnecessary to keep this O(n), if I understand correctly. It's
       | just toggling the value, so (albeit with an extra branch), you
        | could implement it as:
        | 
        |     if (!set.contains(val)) {
        |         set.add(val);
        |     } else {
        |         set.remove(val);
        |     }
        
       | dentalperson wrote:
       | I've seen this use of constants inside big O notation in leetcode
       | and 'informal' discussions. Is it pedantic to say that O(NM) ==
       | O(N) if M is a constant (in this case since it's bound by 26)? Or
       | is this the current and expected usage?
        
         | tkzed49 wrote:
         | It's correct to omit constants from big O, but in the article W
         | is a distinct input variable so the usage is valid. The size of
         | the window W is bound by N, not a constant 26.
        
           | ErikCorry wrote:
           | But it's impossible for W to be bigger than 26 since you
           | can't have 27 letters in a row that are all different.
           | 
           | Normally for a search algorithm the size of the alphabet is
           | assumed to be constant.
        
       | bsza wrote:
       | W cannot be larger than the size of the character set, so even
       | the first algorithm runs in O(N) time.
       | 
       | It's a cool trick, but it didn't make the algorithm
       | asymptotically more efficient as the OP suggests.
        
         | [deleted]
        
         | TreeRingCounter wrote:
        
         | thomasahle wrote:
         | You could imagine a much larger character set though. For
         | example you might be interested in a sliding window of unique
         | words instead.
        
           | bsza wrote:
           | Yes, but then the last algorithm runs in the same time as the
           | one before it. Using xor here really only gives you a
           | constant factor improvement (which is still nice!).
           | 
           | Reminds me of the problem "find the only missing number in a
           | scrambled list of 1..n". You can get an "O(n)" solution using
           | xor, but once bit size becomes a concern, there are less
           | hacky solutions that are just as fast.
        
             | thomasahle wrote:
             | "Bit tricks" are worth a factor `log n`. Let me explain: If
             | you have an array of `n` integers in {0,1}, and you want to
             | count the number of 1s, you need `n` time. But if you have
             | an array of `n` bits, with bit level operations you can do
                | it in `n/W` time, where `W` is your word size. We often
                | have
             | W=32, 64 or 256 on some architectures, but you may still
             | argue it's just a constant.
             | 
                | The thing is that we usually say you have random access to
                | all of your input data in constant time. But if your input
             | size is `n`, your pointer size must be `log n`. So for
             | standard algorithm analysis to work, you need to allow a
             | word size of at least `log n`.
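                | 
                | For example, a sketch of the `n/W`-style count in Rust:
                | 
                |     // n bits packed into 64-bit words instead of one
                |     // array entry per bit.
                |     fn count_bits(words: &[u64]) -> u32 {
                |         words.iter().map(|w| w.count_ones()).sum()
                |     }
                | 
                |     fn main() {
                |         let bits = vec![u64::MAX, 0b1011, 0];
                |         assert_eq!(count_bits(&bits), 64 + 3);
                |     }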
        
               | bsza wrote:
               | You're using n as a parameter of the machine's specs,
               | which is a big no-no. You can't just come up with a
               | bigger machine for each consecutive value of n. If that
               | were allowed, all algorithms would trivially be O(1).
               | 
               | We usually assume the pointer size to be infinite (e.g.
               | RAM machine), to avoid having to deal with access time.
               | But if we don't, then it must be a constant size (and we
               | have to adjust the big O formulas accordingly). In
               | neither of those cases do you get free arbitrary length
               | xor operations. Just like you don't get e.g. free
               | arbitrary length multiplications either.
               | 
               | One nice property of xor, however, is that you can
               | parallelize the heck out of it. So you can get arbitrary
               | constant factor improvements by adding a bunch of CPUs.
        
       | moloch-hai wrote:
       | The moral is that POPCNT is critical to zillions of massive
       | optimizations, and omitting it from base instruction sets, as has
       | been done over and over and then later corrected, each time at
       | great expense, is extremely foolish. The latest offender was
       | RISC-V, but the overwhelming majority of x86 code is still
       | compiled to a target version lacking it.
        
         | [deleted]
        
         | [deleted]
        
         | adrian_b wrote:
          | It is good to remember that the POPCNT instruction was first
          | included in the instruction set of a computer by Alan Turing,
          | back in 1949, in the Ferranti Mark I (as "sideways add").
          | 
          | Unfortunately few other computers included it before the
          | Cray-1, which made it well known under its current name.
        
           | js2 wrote:
           | Sideways add makes a bit of sense for the name of this
           | operation. But I can't figure out why Cray named it
           | "population count." It just seems odd to refer to the set
           | bits as part of a population that you're counting.
        
           | pklausler wrote:
           | We had it on the CDC 6600 in 1963 as 'Bi NXj' if I remember
           | rightly.
        
         | DannyBee wrote:
          | Well, actually, in this case, no, POPCNT is unnecessary here,
          | since you don't actually have to count anything. You only need
          | to know if the bitmask changed when you added a new character
          | ;)
         | 
         | But otherwise, yes.
         | 
         | I spent quite a while optimizing GCC's bitmap operations years
         | ago, including implementing some new sparse bitmap types, and
          | the sparse bitmap types I implemented ended up dependent on the
         | speed of popcnt and friends, so yes.
        
           | nebulous1 wrote:
           | it's not sufficient that the newest character in the window
           | is unique? the older characters need to be unique too.
        
             | DannyBee wrote:
             | I posted how to do it in another comment on the thread.
        
               | nebulous1 wrote:
               | If you're using OR (unlike the OP) how are you removing
               | the left side of the window? And still, even if the
               | newest character is unique, previous characters might not
               | be
        
               | rep_lodsb wrote:
               | Input: "abccdef"
               | 
               | Your algorithm will report a match at "bccd" since the
               | "a" wasn't removed from the bitmask.
        
               | DannyBee wrote:
               | Yes, you are correct you would have to xor them back out
        
               | actionfromafar wrote:
               | I swear I have seen this conversation before. A glitch?
        
               | DannyBee wrote:
               | No - he responded on both parts of the thread where i
               | commented. I started to respond here as well, then
               | updated it to redirect to the other comment so we don't
               | end up with two long threads.
        
               | nerpderp82 wrote:
               | If you XOR them together, you get back to the original
               | conversation.
        
           | syrrim wrote:
           | In practice if you want the popcount of anything larger than
           | a couple registers you'll want to use vector instructions
           | anyways though. There are a lot of operations that will speed
           | up certain applications if made into their own instruction,
           | not all of them need to be.
        
         | gpderetta wrote:
          | Legend says that popcnt was the "NSA instruction": it was
          | heavily used for cryptographic analysis and was kept out of
          | common instruction sets for a long time to give the NSA an
          | advantage.
          | 
          | It is probably just a legend though.
        
           | kjs3 wrote:
           | The usefulness of popcnt, et al, in cryptography was known at
           | least as far back as Alan Turing. It wasn't 'kept out of'
           | ISAs, if for no other reason than not all computer
           | manufacturers were (are) American, so the NSA wouldn't have
           | had much leverage to keep, say, Ferranti or Hitachi from
           | including it in their computers.
           | 
           | The legend you're _probably_ misremembering is the one where
              | the NSA approached Seymour Cray at CDC while he was
              | designing the 6600 super and 'suggested' that if he
              | included a popcnt
           | instruction in the ISA, the NSA would certainly look
           | favorably on purchasing some. He did and they did (quite a
           | few). This story is also possibly apocryphal.
        
             | _a_a_a_ wrote:
             | > The usefulness of popcnt, et al, in cryptography was
             | known at least as far back as Alan Turing
             | 
             | Ok. Why??????????????
             | 
                | @gpderetta is correct at least in quoting Hacker's
                | Delight, where it was also said to be rumoured that the
                | NSA wanted popcount, but it was unclear to HD's author
                | why they wanted it.
        
               | rep_lodsb wrote:
               | IIRC the Colossus machine built during WW2 was basically
               | counting the number of bits resulting from doing various
               | boolean operations on the input. It was used to crack the
               | Lorenz cipher, which XORed the plaintext with a pseudo-
               | random keystream (generated using mechanical rotors!).
               | 
               | Cryptography has advanced since then - and I'm not an
               | expert - but there may still be statistical weaknesses
               | that could be found by counting bits?
        
               | moloch-hai wrote:
               | I think it is generally taken as useful in, specifically,
               | cryptanalysis, presumably for estimating Hamming
               | distance.
        
         | snvzz wrote:
         | >RISC-V
         | 
         | Doesn't B have a POPCNT?
         | 
         | B is mandatory in RVA22.
        
           | moloch-hai wrote:
           | Can you buy any RISC-V embodiment that implements such an
           | instruction? It seems to be extremely optional.
        
             | Vt71fcAqt7 wrote:
             | B is pretty recent, but Sifive p650 will have it
             | 
             | https://www.sifive.com/cores/performance-p650
             | 
              | Here's another:
             | 
             | https://www.andestech.com/en/2022/12/08/andes-technology-
             | unv...
             | 
             | I expect all future Linux cores to have it
        
           | cbm-vic-20 wrote:
           | These are the cpop and cpopw instructions.
           | 
           | https://github.com/riscv/riscv-
           | bitmanip/blob/main/bitmanip/i...
        
       | iamevn wrote:
       | If ascii characters aren't guaranteed, would this solution still
        | be able to scale? I wonder how the overhead from scanning the
        | input to find the domain to use for the bitset compares to the
        | overhead of allocating a bunch of sets.
       | 
       | Edit: More generally, I feel like these sorts of tricks are
       | getting less relevant over time. With the prevalence of Unicode
       | text how many clever interview question optimizations that assume
       | characters are just another way of saying u8 are no longer
       | relevant?
        
         | [deleted]
        
       | polynomial wrote:
       | I am wondering why the bit masks are Big (not Little) Endian? I'm
       | sure there's a reason the example is done that way, I just don't
       | know what it is.
        
         | mkeeter wrote:
         | I wanted to put the 1 visually close to the equal sign, so you
         | didn't have to scan all the way to the end of
         | 0000000000000000000000000000000 to find it. Perhaps this is
         | misleading, because the code places the set bit at the LSB!
        
       | ivanstojic wrote:
        | Isn't the real performance gain from eliminating the triple-
        | nested for loops and memoizing over a sliding window, rather than
        | from the use of XOR? You would get the same order-of-magnitude
        | improvement if you used a map combined with an iterate-only-once
        | approach.
       | 
       | Ie, it's undeniable that the final program works much faster, but
       | the article should really be called "a smarter memoization
       | trick."
        
       | boredumb wrote:
       | Something about seeing fun bit tricks solving higher level
       | problems is exciting. As someone who writes higher level software
       | in general using golang/rust/python to solve "boring business"
       | problems, seeing someone using XORs to accomplish something feels
       | like leaving my house and going camping for the weekend.
        
         | foverzar wrote:
         | Right? Things like fast inverse square root feel like total
         | black magic.
        
           | boredumb wrote:
           | Very curious what you're up to on camping trips that involves
           | black magic.
        
         | timerol wrote:
         | One fun repository of logic operations like this is
         | https://graphics.stanford.edu/~seander/bithacks.html
        
         | chasil wrote:
         | This is vaguely reminiscent of the xor swap procedure, although
         | it seems more practical.
         | 
         | https://en.wikipedia.org/wiki/XOR_swap_algorithm
        
       | [deleted]
        
       | fifilura wrote:
       | I think my intuition tells me that this is analogous to the
       | convolutions you do in neural network image processing. Just that
       | this is one dimensional.
        
       | timvisee wrote:
       | I did something similar and counted from the end, to make much
       | bigger jumps!
       | 
       | https://github.com/timvisee/advent-of-code-2022/blob/master/...
       | 
       | Maybe a bit less neat in the binary sense, but definitely very
       | fast.
        
       | scrapheap wrote:
        | That's a nice low-level solution to the problem - I have to
        | confess that I just wrote a regular expression for it :)
        
         | commandlinefan wrote:
         | > I just wrote a regular expression
         | 
         | Running time of O(MG)
        
       | notorandit wrote:
       | Very cool! ^
        
       | dukoid wrote:
        | FTR: This can be done with "only" two loops (as opposed to the
        | "naive" 3 in the article) -- without any extra data structures.
        | It's sufficient to keep track of how many characters have been
        | unique so far. For each "new" character, check whether it's
        | different from all of them. If so, increase the count. Otherwise,
        | reset the count to the distance to the match.
        | 
        |     def main():
        |         count = 1
        |         for i in range(1, len(SIGNAL)):
        |             for j in range(0, count):
        |                 if SIGNAL[i - j - 1] == SIGNAL[i]:
        |                     count = j
        |                     break
        |             count = count + 1
        |             if count == 4:
        |                 print(i + 1)
        |                 break
        | 
        | If we consider the window length a constant this should be O(n).
        
         | flatline wrote:
         | That was my conclusion too, this is a super easy problem with a
         | fixed window length. The bit hashing is neat but I was too
         | confused by what seemed to be an over-complication of the
         | problem to really appreciate it. Did you and I both miss
         | something here?
        
           | dukoid wrote:
           | I don't think so. As much as I love using bit operations: in
           | this case I'd actually prefer a table of character counts for
           | a "true" O(n) solution, as bit counting isn't guaranteed to
           | be a "native" operation.
        
         | segmondy wrote:
         | The article started with 3 loops, took it down to 2 then one.
         | Why are you talking about 2 loops?
        
           | dukoid wrote:
           | The article took it down to 2 by adding an extra hashtable.
           | It's possible to go down to 2 without using an extra data
           | structure.
        
       | gigatexal wrote:
        | Can someone explain the 1 << stuff? I'm very new to bit fiddling
        | but I did follow the blog.
        | 
        |     fn run(s: &[char], window_size: usize) -> usize {
        |         let mut set = 0u32;
        |         for i in 0..s.len() {
        |             // Turn on bits as they enter the window
        |             set ^= 1 << (s[i] as u32 - 'a' as u32);
        | 
        |             // Turn off bits as they leave the window
        |             if i >= window_size {
        |                 set ^= 1 << (s[i - window_size] as u32 - 'a' as u32);
        |             }
        | 
        |             // Check the current window and see if we're done
        |             if set.count_ones() as usize == window_size {
        |                 return i + 1;
        |             }
        |         }
        |         panic!("No unique window found");
        |     }
        
         | dahfizz wrote:
         | The "stuff" assigns an integer value to the char. 'b' - 'a' =
         | 1, for example. This assigns each char a unique integer value.
         | By shifting 1 up that many times, you assign a unique bit
         | position to each char.
        
           | alanning wrote:
           | For those new to bitshifting, here are the results of the
           | "stuff" operation that @dahfizz describes:
            |     'a' - 'a' = 0, so take 1 and shift it left 0 times:
            |     00000000000000000000000000000001 = 'a'
            | 
            |     'b' - 'a' = 1, so take 1 and shift it left 1 time:
            |     00000000000000000000000000000010 = 'b'
            | 
            |     'c' - 'a' = 2, so take 1 and shift it left 2 times:
            |     00000000000000000000000000000100 = 'c'
           | 
           | (I'm not sure why the post's author chose to use
           | 10000000000000000000000000000000 as their example for 'a'
           | rather than the above which IIUC is how the code actually
           | works.)
        
         | OJFord wrote:
         | '1 << stuff' = 'bit shift left 1 by stuff many positions'. Say
         | you have a byte, 8 bits, '1' is 00000001 - if we shift it left
         | 3 bits we have 00001000 (8).
         | 
         | 'Stuff' there is just any expression (to understand separately)
         | that evaluates to the number of positions to shift; `<<` is the
         | operator for bit-shifting (left, cf. `>>`).
         | 
         | In brief `(s[i - window_size] as u32 - 'a' as u32)` is finding
         | the character that just left the window on the left,
         | represented as a number, starting from 'a' as 0 so that all 26
         | fit inside 32 bit positions.
        
           | gigatexal wrote:
           | Thank you all who kindly took to answering my question. It
           | makes sense now!
        
         | jsd1982 wrote:
         | 1 << n sets a 1 bit at the nth bit position. << is the left
         | shift operator.
        
       | twic wrote:
       | This seems a bit overcomplicated to me.
       | 
       | Using a bitmap to track letters you've seen is a great idea. But
       | the parity and counting gives me a headache. How about you start
       | with a zero-size window at the start of the string, then
       | iteratively try to grow it by moving the end forward until it's
       | long enough, and if the character added would be a duplicate,
       | moving the start forward until it's no longer a duplicate? Sort
       | of inchworm-style movement.
       | 
       | I am not smart enough to write Rust, so here it is in Java:
        |     private static int run(String input, int windowSize) {
        |         int windowStart = 0; // start at the start
        |         int windowEnd = 0; // the window is initially zero-size
        |         int lettersSeen = 0; // a bitmap of letters we have seen
        |         while (windowEnd - windowStart < windowSize) {
        |             if (windowEnd >= input.length()) {
        |                 throw new IllegalArgumentException(
        |                     "No unique window found");
        |             }
        |             int letterToAdd = 1 << input.charAt(windowEnd) - 'a';
        |             while ((lettersSeen & letterToAdd) != 0) {
        |                 // duplicate, slide the start until we drop the
        |                 // original
        |                 int letterToRemove =
        |                     1 << input.charAt(windowStart) - 'a';
        |                 // sanity check that we have seen this letter
        |                 assert (lettersSeen & letterToRemove)
        |                     == letterToRemove;
        |                 lettersSeen ^= letterToRemove;
        |                 ++windowStart;
        |             }
        | 
        |             // add the new letter
        |             // sanity check that we have not yet seen this letter
        |             assert (lettersSeen & letterToAdd) == 0;
        |             lettersSeen ^= letterToAdd;
        |             ++windowEnd;
        |         }
        | 
        |         return windowStart;
        |     }
       | 
        | This is not quite optimal in terms of bit operations - I mask the
        | bitmap in the inner loop condition, but you could just test
        | letterToRemove against letterToAdd and break if they're the same.
        | I think that's a bit less readable though.
       | 
       | Note that although this has two nested loops, they don't both
       | independently range over the whole string, so this is not O(n^2).
       | The outer loop moves windowEnd over the whole string, and the
       | inner one moves windowStart over the whole string bit by bit.
       | Each index visits each position in the string at most once.
        
         | nerpderp82 wrote:
         | I asked ChatGPT for a Rust port, https://play.rust-
         | lang.org/?version=stable&mode=debug&editio...
         | 
         | I had to make zero changes, it appears to work.
        
         | feoren wrote:
         | This is clever: you let the window size vary such that you
         | never allow two copies of the same character in your window,
         | and just stop when your window size reaches the target size.
         | Seems like it should work well. But:
         | 
         | > i mask the bitmap in the inner loop condition, but you could
         | just test letterToRemove against letterToAdd and break if
         | they're the same
         | 
         | You still need to check ((lettersSeen & letterToAdd) != 0) once
         | to know whether you need to enter that loop at all. Worst case
         | (a long string of the same letter) you do this twice per letter
         | in the input anyway.
        
       | n4r9 wrote:
       | I think the claim at the end is supposed to say `O(D)` running
       | time rather than `O(N)` (or I can't see where `N` is defined).
       | 
       | Does this technique catch letters duplicated more than once?
       | 
       | In a Python 3 shell:
       | 
       |     >>> 2 ^ 4
       |     6
       |     >>> 2 ^ 4 ^ 2
       |     4
       |     >>> 2 ^ 4 ^ 2 ^ 2
       |     6
       | 
       | Yes I guess it does, because inputs are used up in order to turn
       | bits on and off. Nice!
        
         | Robin_Message wrote:
         | The insight is that the only way to get W bits set is for the
         | last W letters to all have been unique. Duplicate letters
         | toggle bits on and off, but never get you any extra 1 bits set.
         | 
         | If you wanted to detect V unique characters from a window of W
         | characters where V < W, this trick wouldn't work. But you could
         | still use a rolling window; it would just be a map counting how
         | many of each character are in the window, incrementing a
         | character's count as it enters the window and decrementing it
         | as it leaves.
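         | 
         | For example, a rough sketch of that rolling count (illustrative
         | names, not from the article) might look like:
         | 
         |     use std::collections::HashMap;
         | 
         |     // End index of the first window of `w` items containing at
         |     // least `v` distinct values, if any.
         |     fn first_window_with_v_distinct(items: &[char], w: usize,
         |                                     v: usize) -> Option<usize> {
         |         let mut counts: HashMap<char, usize> = HashMap::new();
         |         let mut distinct = 0; // keys with a non-zero count
         |         for (i, &c) in items.iter().enumerate() {
         |             let entry = counts.entry(c).or_insert(0);
         |             if *entry == 0 { distinct += 1; } // c enters the window
         |             *entry += 1;
         |             if i >= w {
         |                 // the item at i - w slides out of the window
         |                 let e = counts.get_mut(&items[i - w]).unwrap();
         |                 *e -= 1;
         |                 if *e == 0 { distinct -= 1; }
         |             }
         |             if i + 1 >= w && distinct >= v {
         |                 return Some(i + 1);
         |             }
         |         }
         |         None
         |     }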
        
         | mkeeter wrote:
         | > I think the claim at the end is supposed to say `O(D)`
         | running time rather than `O(N)` (or I can't see where `N` is
         | defined).
         | 
         | Good catch; this should be fixed (using N for input length and
         | W for window size consistently in the writeup).
        
       | [deleted]
        
       | thomasahle wrote:
       | One issue with this technique is when the set of characters grows
       | above 64, so you can no longer give each one a unique bit.
       | 
       | This could happen if we are instead interested in a window of
       | unique words in a long document.
       | 
       | Amazingly there is still a solution based on XOR!
       | 
       | It is a variant of Bloom Filters, which I know everyone loves,
       | but it uses XOR instead of OR to combine things.
       | 
       | Basically each word is hashed to three locations and you flip
       | those bits. Then you have to do some probabilistic analysis.
       | 
       | https://arxiv.org/abs/2211.03683
       | 
       | Edit: Actually the technique in this paper would take linear time
       | to determine uniqueness. I wonder if there's a way to do it in
       | constant time...
       | 
       | I guess you could just implement linear probing or cuckoo hashing
       | in bits. But then update time would be amortized constant only.
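       | 
       | A very rough sketch of the flip-three-hashed-bits idea (made-up
       | names, not the paper's construction; the popcount scan is the
       | linear-time step mentioned above, and the check is one-sided
       | because unrelated hash collisions can also cancel bits):
       | 
       |     use std::collections::hash_map::DefaultHasher;
       |     use std::hash::{Hash, Hasher};
       | 
       |     const M: usize = 1 << 12; // bit-array size, assumed >> 3 * window
       | 
       |     fn positions(word: &str) -> [usize; 3] {
       |         // three positions derived from one hash (illustrative only)
       |         let mut h = DefaultHasher::new();
       |         word.hash(&mut h);
       |         let x = h.finish();
       |         [(x as usize) % M, ((x >> 21) as usize) % M, ((x >> 42) as usize) % M]
       |     }
       | 
       |     fn toggle(bits: &mut [u64], word: &str) {
       |         for p in positions(word) {
       |             bits[p / 64] ^= 1u64 << (p % 64);
       |         }
       |     }
       | 
       |     // End index of the first window of `w` words that the filter
       |     // proves pairwise distinct (popcount == 3*w can only happen if
       |     // no word repeats; distinct windows may still fail the test).
       |     fn first_distinct_window(words: &[&str], w: usize) -> Option<usize> {
       |         let mut bits = vec![0u64; M / 64];
       |         for (i, &word) in words.iter().enumerate() {
       |             toggle(&mut bits, word);
       |             if i >= w {
       |                 toggle(&mut bits, words[i - w]); // word leaving the window
       |             }
       |             let ones: u32 = bits.iter().map(|b| b.count_ones()).sum();
       |             if i + 1 >= w && ones as usize == 3 * w {
       |                 return Some(i + 1);
       |             }
       |         }
       |         None
       |     }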
        
         | nynx wrote:
         | This article's technique can definitely expand into dynamically
         | allocated bitsets.
        
           | FartyMcFarter wrote:
           | In that case the complexity becomes O(N * A) where A is the
           | size of the alphabet, since at every step you have to go
           | through the whole bitset in order to count how many bits are
           | set.
           | 
           | Edit - actually it should be possible to update the count
           | incrementally, so it should still be O(N) I think.
        
           | thomasahle wrote:
           | Say we don't want to allocate memory dynamically though. We
           | just want to allocate O(W) bits at the start of the program.
           | Can you still do it?
        
         | at_compile_time wrote:
         | >One issue with this technique is when the set of characters
         | grow above 64, so you can no longer give each one a unique bit.
         | 
         | You can xor arrays of integers.
        
           | thomasahle wrote:
            | Right, but if the universe of possible characters grows much
            | larger than the window size (as it does in many
            | applications), we want a method that only pays in terms of
            | window size, not universe size.
        
             | twic wrote:
              | So you use a sparse bitmap, i.e. a hash set. The size of the
             | hash set is bounded to the size of the window, so this
             | should be pretty efficient.
             | 
             | If you have a huge string, a huge alphabet, and also a huge
             | window, then probabilistic techniques start to make sense.
             | But let's not run while we can successfully walk!
        
               | thomasahle wrote:
               | Sure. We can always use a hash set, but then you need to
               | use chaining or some other collision resolution policy.
                | The whole point of this exercise is to avoid that and get
               | something really fast and truly constant time :-)
        
       | mjburgess wrote:
       | I may be missing something, but isn't the obvious answer just to
       | scan and compare 4 chars at a time? (Noting that != is XOR on
       | booleans.)
       | 
       |     #include <stdio.h>
       | 
       |     int main(void) {
       |         char *c = 0, *data = "nznrnfrfntjfmvfwmzdfjlvtqnbhcprsg";
       |         for (c = data;
       |              !((c[1] != c[0]) &&
       |                ((c[2] != c[1]) && (c[2] != c[0])) &&
       |                ((c[3] != c[2]) && (c[3] != c[1]) && (c[3] != c[0])));
       |              c++)
       |             ;
       |         printf("%ld", c - data + 4);
       |     }
        
         | leipert wrote:
         | 2nd part is a window of length 14.
        
         | aarestad wrote:
         | Sure, but now do 14 at a time with a similar algorithm (which
         | was part 2 of the problem).
        
           | mjburgess wrote:
           | I've just submitted in order to see the second part, which
           | just says "now look for 14 unique".
           | 
           | I don't see the problem here. If you want a notation to
           | express it, a macro for something like
           | 
           |     0 != 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
           |     1 != 0, 2, 3, 4, 5, 6, 7, ...
           | 
           | is straightforward, and could be parameterised on the window
           | size.
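           | 
           | A minimal sketch of that, parameterised with a Rust const
           | generic instead of a macro (hypothetical names, still O(W^2)
           | comparisons per position):
           | 
           |     fn first_marker<const W: usize>(data: &[u8]) -> Option<usize> {
           |         data.windows(W)
           |             .position(|win| {
           |                 // are all W characters pairwise distinct?
           |                 (0..W).all(|i| (i + 1..W).all(|j| win[i] != win[j]))
           |             })
           |             .map(|start| start + W) // characters consumed
           |     }
           | 
           |     // e.g. first_marker::<4>(input) or first_marker::<14>(input)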
        
             | jstanley wrote:
             | That's fine, but it's O(n) in the window size whereas the
             | xor trick is O(1).
        
               | mjburgess wrote:
                | The window size is always constant in the problem, so
                | it's O(1).
                | 
                | There shouldn't be a loop over the window size in
                | solutions to this (particular) problem.
        
               | deathanatos wrote:
               | TFA's implementation is using a loop over the window size
               | to compare each character to all the other characters, to
                | see if they're unique. So, the article's O(N * W^2) is
               | correct, but I agree, the window size is fixed and small;
               | it's fair to boil it down to just O(N).
               | 
               | (I never even considered doing a for loop like that to
               | determine if the window is unique; I just histogram the
               | window, and then check if all the values in the histogram
               | are 1. If yes, unique. If I had been cleverer, I wouldn't
               | rebuild the histogram as the window shifts, but there was
               | no need for that. This is essentially the next solution
               | TFA presents.)
        
               | deathanatos wrote:
               | Part 1 and part 2 are the same, only the window size
               | varies. But within a part, the window size is constant.
                | For most, it should have just meant that, if you hard-
                | coded the window size during part 1, you need to make it
                | a parameter.
               | 
               | But for both parts, the naive solution should be O(N *
               | W), where N is the input length, and W is the window
               | size. Like a sibling says, since within a part the window
               | size is fixed, it is acceptable to call it O(N).
               | 
               | Edit: ah, I see what his solution is doing. It is
               | correct. I need to adjust my definition of naive, I
                | guess. His is O(N * W^2). You can still reasonably
                | consider W^2 constant, though, I think. And he gets to
               | what I would call the "naive" solution.
               | 
               | (Previously.) ~The "naive solution" in the OP does not
               | appear to be correct. Or at least, if it does work,~ it
                | appears to do far more computation than it needs to.
               | 
               | The "xor trick" is O(N), too. (There cannot be a more
               | efficient solution, as the entire input must be
               | considered in the worst case. TFA is correct that its
               | final form omits W, though.)
        
         | deathanatos wrote:
         | If you consider W the window size, and N the input size, this
         | is still O(N * W^2), as TFA states. (TFA's implementation is
         | generic over W, yours is constant and essentially -funroll'd in
         | place. But it's the same, for big-O; as you mention in your
         | reply later, you could even make a macro, generic over W.)
        
       | mihaic wrote:
       | Am I the only one bothered that he didn't first optimize the
       | HashSet solution to O(N)? When sliding the window, you increase
       | the counter for the new element, decrease the counter for the
       | leaving element, and keep a running count of how many elements
       | have their counter set to 1.
       | 
       | That makes the case for using popcount a bit weaker, since
       | popcount is actually O(W / word size), which happens to be O(1)
       | in this case.
        
         | DannyBee wrote:
         | You don't even have to do this.
         | 
         | If HashSet.insert ever returns false, the character was already
         | in the set, and the window is not unique, so you can early exit
         | and move on. If you get through the window without insert
         | returning false, it is unique. You don't need to do anything
         | else (like check the length of the hash set at the end)
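         | 
         | A tiny sketch of that early exit (assumed helper name, not from
         | the article):
         | 
         |     use std::collections::HashSet;
         | 
         |     fn window_is_unique(win: &[u8]) -> bool {
         |         let mut seen = HashSet::new();
         |         // insert() returns false if the value was already present
         |         win.iter().all(|c| seen.insert(c))
         |     }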
        
           | TYMorningCoffee wrote:
            | I think he meant to use a hash map plus a numOfOnes counter
            | instead of a hash set. With a hash map, you don't need to
            | rebuild the entire window every loop. It's only a constant
            | amount of work per iteration, achieving O(N).
        
             | DannyBee wrote:
             | Yes. You can slide a hash map and a counter, as we went
             | through below. You can also replace the hash map with a
             | bitmask to store the set.
             | 
              | You actually don't need anything extra at all if you want:
              | you can just keep a single count, because you don't need
              | to look back at old windows, and can keep moving forward
              | to the last non-unique point, making it O(N).
             | 
             | I'm sure this being HN, someone will come along and post
             | how to do this.
             | 
             | I already got caught out once by trying to code it too fast
             | so not gonna do it ;)
        
         | kevinwang wrote:
         | Thank you. I was wondering the same thing and I think the
         | article is overfocusing on "cool bit tricks" while this simple
         | approach works just fine.
        
           | SquareWheel wrote:
           | Without the cool bit trick, would it be worth publishing the
           | article at all? I found it a novel approach to explore.
        
       ___________________________________________________________________
       (page generated 2022-12-13 23:01 UTC)