[HN Gopher] On a great interview question
       ___________________________________________________________________
        
       On a great interview question
        
       Author : behdad
       Score  : 36 points
       Date   : 2023-04-12 19:54 UTC (3 hours ago)
        
 (HTM) web link (behdadesfahbod.medium.com)
 (TXT) w3m dump (behdadesfahbod.medium.com)
        
       | texuf wrote:
       | I guess I've always been confused by this question. If the
       | dictionary isn't infinitely large, and you have plenty of space,
       | why can't you put the dictionary in a Set and look up words in
       | O(1), reducing the overall complexity to O(n)? Obviously hashing
       | has its limitations, but i thought we could hand wave over that
       | part. If we're limited by space can we assume that the dictionary
       | is ordered and make the lookup in log(n) via a simple binary
       | search? Or are we obsessing over the string equality operation? I
       | feel like I'm missing something.
        
         | behdad wrote:
         | Yes, the string equality itself is o(n).
        
         | elijaht wrote:
         | Yea, I dislike it (a bit) for that reason. The "meme" in
         | leetcode interviews is that "hashmap lookup is O(1)". It does
         | feel like a bit of a gotcha to (correctly, mind you) expect an
         | answer of O(n) for the lookup.
         | 
         | That being said, I think you could get around this with careful
         | prompting (ie, ask first "what is the Big-O for dictionary
         | lookup with the respect to the number of letters" before asking
         | the overall big O). But I think there is a bit too much wiggle
         | room to make this a completely consistent question, too much
         | variance candidate to candidate.
         | 
         | I like the problem a lot, but wouldn't feel comfortable using
         | it.
        
           | texuf wrote:
           | Because the author didn't bother to explain that this is what
           | she's looking for in a well thought out blog post, I assume
           | she also didn't explain this is what she was looking for in
           | the interview, leaving a lot of candidates who didn't have
           | the same educational background floundering.
        
       | Herval_freire wrote:
       | The interviewer is not thinking logically. How does he know it's
       | a good interview problem? Let's look at the data:
       | 
       | >The fastest I had a candidate solve the entirety of the problem
       | with all bells and whistles was in 20 minutes, by a freshman from
       | the University of Waterloo, the type who did high-school
       | competitions.
       | 
       | >The most depressing failure I saw was a PhD graduate from
       | University of Toronto who could not produce working code for the
       | first section of the problem in 45 minutes.
       | 
       | >I once had a candidate with 15-years work experience give me
       | lots of attitude for asking them such a simple question, while at
       | the same time struggling at it.
       | 
       | All of this data points to the fact that this question may not be
       | good. A phd graduate and a person with 15 years of experience
       | rejected for someone who practices programming for competitions?
       | What gets me is that the data is painting an obvious picture
       | here. A very obvious picture. An obvious picture that we aren't
       | sure what's a good interview and a bad interview question.
       | 
       | But the problem is that most people when looking at this
       | completely miss it. It's not obvious to the interviewer and it's
       | not obvious to alot of people who like google style questions. We
       | literally have not much data and not much science backing any of
       | this up.
       | 
       | It's an illustration of how biased humans are and illustration of
       | how extra biased interviewing for software positions is. If
       | there's anything more unknowingly biased and then the replication
       | crisis in science it's technical interviews for companies. There
       | needs to be real feedback loops that correlate interview question
       | passing with Actual performance.
       | 
       | Google is in a good position to grab this data but I'm not sure
       | they are doing so given how they just okayed this guys gut
       | decision to go against the grain and use this question. I'm not
       | against this question, but certainly to call this great in the
       | face of controversial data that he himself gathered and listed on
       | his post is just a complete blueprint of the extent of human
       | blindness.
       | 
       | The reality of what's going on here is the person here in the
       | interview is just getting off on dominating other people with a
       | hard question. It's not on purpose but he's doing it without
       | realizing it. The blog post in itself is a bit showy. It's like
       | "I can answer this question but a phd graduate can't".
        
         | abtinf wrote:
         | As a hiring manager, I've noticed most peer hiring managers
         | vastly overestimate their ability to hire.
         | 
         | What I've seen happen is that there are two narratives they
         | flip on: I hired this great person and they are doing amazing
         | so I am amazing; I hired this person and they aren't doing
         | great so I fired them so I am amazing.
         | 
         | Of course, they also tend to be terrible at assessing who is
         | actually good/bad.
         | 
         | I call it the "arbitrary onion": layer after layer of plausible
         | claims that each turn out to be just as arbitrary as the outer
         | claim.
        
       | tptacek wrote:
       | This is a fun puzzle, but it's not clear to me how this selects
       | for people with the aptitude to perform well in any particular
       | software development job, as the number of times I've been asked
       | to determine is a string is/contains the concatenation of two
       | dictionary words is I think zero.
       | 
       | Instead of doing stuff like this, you can just give candidates
       | realistic programming challenges and have them write code. You
       | don't even need to be in the room with them; after all, you won't
       | be when they're actually doing the work.
        
         | abecedarius wrote:
         | I actually have had to solve a version of this problem on the
         | job. There was a database of tagged photos; for some reason all
         | the tag strings had had their spaces stripped. So you had to
         | efficiently find a probable chunking into words. (Of course the
         | original text wasn't always all dictionary words.)
         | 
         | OTOH re the concern for O(n^2) time when n is bounded by word
         | lengths, and when the inner loop was in already-optimized
         | string comparisons in your language's hashtable library -- I
         | think that part is very artificial.
        
         | adamwk wrote:
         | At least in my experience dynamic programming only comes up
         | when I'm studying for interviews. Of all algorithms common in
         | interview questions, I think DP has the highest ratio for
         | occurrence on whiteboards vs occurrence in most software
         | development jobs. I understand it's fun when you spot the
         | solution but I think it only really proves your candidate did
         | their homework
        
           | dekhn wrote:
           | I came to the field from a slightly different direction; I
           | never studied DP in classes (I was a bio major) but used them
           | extensively in my undergraduate research and later when
           | implementing a few toy bioinformatics algorithms (it's a real
           | joy when it works for the first time!). Even though I've
           | written them a number of times, I really doubt I'd ever be
           | able to spot a DP problem and solve it with that, without
           | some hinting. I do not think it provides useful signal unless
           | you're looking for people who will be writing algorithms.
        
           | behdad wrote:
           | You are correct in that. A few points though:
           | 
           | - While DP might be rare, memoize is quite common
           | alternative. Sticking a @functools.cache in Python for
           | example. They are functionally equivalent in many cases,
           | 
           | - I believe knowing their data-structures and algorithms is
           | the difference between an okay programmer and a good one,
           | 
           | - When interviewing entry-level software-engineers, there's
           | not much else you can ask them except for what they learned
           | in class, which is these kinds of questions.
        
             | lmarcos wrote:
             | > I believe knowing their data-structures and algorithms is
             | the difference between an okay programmer and a good one
             | 
             | For me, the difference between and okay programmer and a
             | good one is: how much Linux/Unix they know (processes, core
             | system calls, networking, sockets, awk/sed/bash). Whether
             | you are writing apis for ecommerce platforms or writing
             | custom k8s providers, Linux knowledge is a must. Knowing
             | how to implement a trie? I bet most of us have never been
             | in that situation.
        
               | bonzini wrote:
               | That depends on what kind of system you are developing.
               | For example if you're writing APIs for e-commerce
               | platforms you may need to know how to tune a container's
               | CPU and memory limits but not the difference between
               | cgroups v1 and v2.
               | 
               | Likewise, there are jobs where being able to reason on
               | complexity and data structures is important.
        
             | adamwk wrote:
             | Those are fair points and you also said you didn't expect
             | them to implement DP. I'm curious what your minimum
             | expectations are for the candidate to solve. When I ask
             | these questions it's usually about the journey more than
             | the destination (which I believe you mentioned in the
             | beginning) but I'm guessing you'd at least expect the
             | candidate to come up with the brute force solution.
        
         | lmarcos wrote:
         | But even if the job actually requires solving puzzles like that
         | one, what's the point of asking the candidate to solve it in 45
         | min? Without googling? That's what it's actually totally
         | unrealistic.
         | 
         | Give me any problem related to algorithms+ds to be solved in
         | 45mins without internet access and someone looking over my
         | shoulder and I will give you a poor solution. But give me a day
         | and internet connection and I will solve the problem in the
         | most efficient/readable/maintainable way possible.
        
           | tptacek wrote:
           | It's especially ironic that this puzzle structure selects
           | away from candidates whose ordinary method would be to look
           | up available algorithms, rather than just going with the best
           | answer they came up with off the top of their head.
        
           | bonzini wrote:
           | To be honest the first solution is literally one line of
           | code. The data structures are those included in Python,
           | there's nothing to look up. It's an important skill to take
           | some code written by others and understand precisely it's
           | performance characteristics.
           | 
           | The interview question is not just about knowing data
           | structures and is not about Behdad showing off either. It's
           | the starting point for a Socratic discussion.
        
         | behdad wrote:
         | That might have worked before ChatGPT. :)
        
           | pcwalton wrote:
           | Unfortunately, I just checked and ChatGPT gives the correct
           | (slow) answer to your question:                   def
           | is_concatenation_of_two_dict_words(word, dict_words):
           | """             Returns True if the input word is a
           | concatenation of two words in the input list of dictionary
           | words,             and False otherwise.
           | Args:             - word (str): The input string to check.
           | - dict_words (list[str]): A list of dictionary words to check
           | against.                          Returns:             -
           | (bool): True if the input word is a concatenation of two
           | words in the input list of dictionary words,             and
           | False otherwise.             """             for i in
           | range(1, len(word)):                 prefix = word[:i]
           | suffix = word[i:]                 if prefix in dict_words and
           | suffix in dict_words:                     return True
           | return False
           | 
           | In general, I don't think that LeetCode questions have any
           | particular advantage over work sample tests when LLMs are
           | involved. Your questions will end up on LeetCode, where LLMs
           | will index them and will be able to recite the answers.
        
             | espadrine wrote:
             | Bing Chat also gave this answer to part 2:
             | def is_concatenation_of_dictionary_words(s, words):
             | if not s or not words:                 return False
             | n = len(s)             dp = [False] * (n + 1)
             | dp[0] = True             for i in range(1, n + 1):
             | for j in range(i):                     if dp[j] and s[j:i]
             | in words:                         dp[i] = True
             | break             return dp[n]
             | 
             | It doesn't find the trie (or regex) solutions to parts 1
             | and 2, though. It also finds the right complexity for its
             | solution for part 1, but when asked for an O(n) solution,
             | it first repeats an equivalent algorithm, and suddenly
             | claims it is O(n) because the hash is O(1).
             | 
             | That said, I believe an engineer with the answers it gave,
             | would easily figure out from its output what the right
             | complexity is. (Figuring out the trie, may not be so easy.)
             | 
             | That said, meanwhile, ChatGPT is not yet at a point where
             | it can write out a full working repository solving a
             | nontrivial problem. It can help, but it is not autonomous;
             | and realistically, if someone can get that help to reach a
             | good solution for a test, they can do so for a salary.
        
             | sho_hn wrote:
             | > I don't think that LeetCode questions have any particular
             | advantage over work sample tests when LLMs are involved.
             | 
             | I agree. We do work sample tests, and in addition to the
             | code and docs the candidates hand in, what really matters
             | is the walkthroughs we do with them. Why did they do it
             | that way? What alternatives did they consider? What are the
             | pros and cons? What past projects did they draw on? How did
             | they research this?
             | 
             | Candidates usually enjoy this - most programmers enjoy
             | talking about a just-finished project, especially when they
             | feel good about the result - and you get to learn a lot
             | about them.
             | 
             | If someone turned in an LLM-assisted work and lied about
             | it, I doubt they'd fare well. And if they did use LLM
             | assist - could be an interesting conversation all the same.
             | What did you reject? What did you correct? Why?
        
               | Paul-Craft wrote:
               | [dead]
        
             | [deleted]
        
         | sfvisser wrote:
         | I don't think it's super relevant for the challenge to be
         | similar to the day job challenges. Maybe it's even unclear what
         | those are at the point of interviewing.
         | 
         | What is important is that the candidate's skills generalize
         | well to both the challenge in the interview as well as the day
         | job. A question like this might be a good indicator or not
         | depending on the job.
         | 
         | If people completely fail at this question it can reveal a lot
         | of relevant information.
        
         | sethammons wrote:
         | "work sample test"
         | 
         | I've given this same advice for _years_ for designing technical
         | interviews. Take a problem that you or your team has actually
         | faced in the last n months. Distill it down to remove tribal
         | knowledge and get it to something you or a capable colleague
         | can hammer out in m minutes. Give the candidate 3m minutes to
         | solve it. Typically this means it should take you 15 minutes to
         | solve and give them 45 minutes.
         | 
         | Our biggest hit for a question that followed this for an SRE
         | type role: we had a cron job that was legacy and buggy and
         | caused a couple of issues over 6 months. We went back and
         | created a very similar script with very similar bugs, added
         | some similar data and re-seeding scripts, and told the
         | candidates "customers say that feature $x is not working, and
         | that is ran by this script. Here is how to run it, here is the
         | user's id that can be found in the data seed." We added some
         | other data validation problems. After they fixed it, we asked
         | them what they would do to make this more production safe and
         | dig in on what they have and have not done for improving
         | existing systems. We expected discussions on structured
         | logging, metrics, alerting, testing, code layout, etc.
         | 
         | Over time, the roles that you are looking to fill and the
         | weaknesses on the existing team that you want to shore up will
         | change and you will need different interview questions. Maybe
         | more towards distributed systems or more towards testing or
         | observability.
        
         | OkayPhysicist wrote:
         | I suspect it's less useful as a "this person will excel" test
         | and more useful as a "this person won't fail miserably" test.
         | Which for most hiring decisions is the more important factor. I
         | would under no circumstances hire someone who couldn't at least
         | get the brute-force level solutions and talk intelligently
         | about performance concerns.
         | 
         | IMO, the disdain from LeetCode from developers vs. its
         | continued use disconnect stems from the fact that we developers
         | tend to assume that the goal of the hiring process is to find
         | the very best possible candidate, whereas the goal in reality
         | is to find somebody "good enough" as efficiently as possible.
        
           | activitypea wrote:
           | The disdain is fueled by companies' behavior. Your approach
           | makes sense, but if that were the case, then a DSA whiteboard
           | interview should be a fail/pass -- a company shouldn't prefer
           | a candidate that managed to cook up an O(n log n) solution
           | over a candidate that only got a solution in O(n2). In my
           | experience, no company doing whiteboard interviews works like
           | that. As OP themselves said in another comment, "it's not a
           | binary yes/no, but gives a range to score the candidate"
           | 
           | At the end of the day, you can't test for negatives and just
           | pick a random person that doesn't fail anything. You need to
           | have a defined set of values to look for in a candidate, and
           | to design an interview that tests for those values. That's
           | why I hate DSAs -- for most (all?) companies, they're treated
           | as dogma rather than a pragmatic solution to a difficult
           | problem.
        
           | lisasays wrote:
           | _I suspect it 's less useful as a "this person will excel"
           | test and more useful as a "this person won't fail miserably"
           | test._
           | 
           | Unfortunately the OP seems to be using it in the former sense
           | (when applied to the tries solution that he things will
           | separate the "good" candidates from the plodding lunkheads).
        
           | Paul-Craft wrote:
           | [dead]
        
           | kolbe wrote:
           | "it's less useful as a "this person will excel" test and more
           | useful as a "this person won't fail miserably" test."
           | 
           | This is also an unproven statement with dubious validity.
           | 
           | I come from the trading world where the leetcode equivalent
           | is a brain teaser. After much deliberation, the only thing we
           | could think of for why a brain teaser was useful was that it
           | often proved someone wanted the job so badly that they
           | memorized all the brain teasers. And that type of dedication
           | and drive is what we really wanted.
           | 
           | To me, I don't understand why someone who develops with
           | advanced IDEs, using complex tools and libraries, is in the
           | least bit judged by putting them in front of what is
           | essentially a text editor, and asking them to solve a problem
           | that's of no relevance to their job, using no tools that
           | they're comfortable with
        
       | agolio wrote:
       | > I then push them on this until they realize that any
       | implementation of the dictionary lookup function of a string
       | would be o(n) at best, and as such the total runtime of the
       | solution is O(n2).
       | 
       | Is that true?
       | 
       | I am actually siding towards the candidates on this one
       | 
       | presumably it would be O(n_words_in_dictionary) for the
       | dictionary check, which as the number of words in the dictionary
       | is constant, would make the overall algorithm O(n) ...
        
         | behdad wrote:
         | You still need to query every byte of the input string...
         | 
         | Any better than that you can do would be O(min(n, k)) where k
         | is the length of the longest word in the dictionary. But
         | without loss of generality that would be O(n).
        
           | light_hue_1 wrote:
           | I don't think so.
           | 
           | You take the first n bytes of your input string. You hash
           | them. You look them up in the dictionary. The lookup is O(1)
           | but hashing your input is O(n).
           | 
           | But there's a simple way around this.
           | 
           | We have incremental hash functions. When you add a byte to
           | the string and you use an incremental hash, you do a fixed
           | amount of extra work to update the hash. Like that you only
           | need to O(1) extra work to compute the new hash.
           | 
           | One trivial incremental hash is to break your input into
           | fixed size chunks, hash each chunk, and xor the chunks
           | together.
        
             | behdad wrote:
             | That's the same as the trie solution.
        
           | abtinf wrote:
           | You only need to do that if the length of words is unbound,
           | which would be nonsensical.
           | 
           | The maximum number of iterations is length of the largest
           | word in the dictionary.
        
           | agolio wrote:
           | Good point, though I guess the point of the disagreement is
           | then for me that the question's "dictionary words" implies to
           | me an english dictionary, rather than an abstract dictionary
           | of words of potentially infinite length
        
       | sverona wrote:
       | > I then push them on this until they realize that any
       | implementation of the dictionary lookup function of a string
       | would be o(n) at best, and as such the total runtime of the
       | solution is O(n2).
       | 
       | Is n the length of the word or the length of the dictionary? And
       | if it's the length of the word (<< the length of the dictionary),
       | why does it matter if it's linear or quadratic?
        
         | behdad wrote:
         | n is the length of the string. As for why it matters whether
         | it's linear or quadratic, we are assessing the candidates
         | ability to analyze a problem and possibly improve an algorithm.
         | It might not matter in the problem at hand.
        
           | sverona wrote:
           | Okay, yeah, that's fair.
        
           | edflsafoiewq wrote:
           | Surely there is a longest word in the dictionary, so you can
           | short circuit for any string longer than twice that. So an
           | asymptotic analysis in the length of the string doesn't
           | really make sense.
        
       | Ensorceled wrote:
       | A question I used for a number of years was a simple
       | "compression" algorithm that would turn a string of lower case
       | letters into a shorter string by length encoding using the
       | capital equivalent:
       | 
       | abbbbbcdddddde => aB5cD6e
       | 
       | The number of developers who struggled or utterly failed with
       | this was depressing.
        
       | returningfory2 wrote:
       | This is very interesting, thank you for sharing! I do have one
       | tangential comment based on this:
       | 
       | > I continued to use [the interview question] because I got good
       | signal from the candidates.
       | 
       | Working at Google, this is something I hear a lot: that some
       | question, or other interview thing, is good for getting "signal"
       | from candidates. But how do you actually know?
       | 
       | Like, I feel to definitely know a question is good you would
       | actually need to track candidates after they start working and
       | see if their performance on the question correlates with their
       | job performance. (Unfortunately, it seems impossible to track
       | candidates who don't get an offer.)
        
         | behdad wrote:
         | Thanks for the comment. By saying "I get good signal" I mean
         | that it's not a binary yes/no, but gives a range to score the
         | candidate. You are right about actual correlation to job
         | performance. I know the HR at Google performed correlation
         | analysis between interview scores and job performance and made
         | internal observations, which as you note is biased because only
         | includes hired candidates.
        
         | gwbas1c wrote:
         | I find that "good signal" is understanding a candidate's
         | thought process and ability to intuit general programming
         | concepts.
         | 
         | These are hard to do if you think of an interview question like
         | a high school quiz or a college final.
        
       | abtinf wrote:
       | There are a finite number of words in the dictionary, which
       | implies there is an upper bound in the length of words.
       | 
       | The maximum number of iterations is the length of the longest
       | word in the dictionary.
       | 
       | This means that part 1 of the problem can be solved in O(1)+O(1).
       | 
       | That doesn't mean you will be happy with it. Not all O(1)s are
       | equal.
        
         | hhmc wrote:
         | Similarly you can precompute a dictionary for the Cartesian
         | product of the dictionary with itself, then lookup into that.
         | 
         | Big in space but O(1) (wrt word len at least)
        
           | abtinf wrote:
           | Excellent point.
           | 
           | The only way to evaluate if this is a good or bad solution is
           | the actual operational context.
           | 
           | If you had a service that had to do millions of these matches
           | per second with low latency, then this might be a reasonable
           | solution.
        
       | languagehacker wrote:
       | I'm a big fan of trie-based interview questions because if an
       | individual can call them out immediately, they know their data
       | structures -- awesome.
       | 
       | If they only know simple data structures, going from a hash to a
       | hash of hashes is one of those "eureka" moments that can really
       | shows how a candidate is willing to collaborate on a problem and
       | react to new ideas.
       | 
       | Heaps are an interesting data structure to use for interviewing,
       | but I do find they can be a little less intuitive, and their use
       | cases translate less directly to business logic and are more of
       | an optimization task. It depends on what talents you're looking
       | for on your team!
        
       | jhp123 wrote:
       | compiling the dictionary into a DFA would give you O(n) runtime.
        
         | lightbendover wrote:
         | Both the author's and your run-times pretty plainly illustrate
         | everything wrong with BigO.
        
           | jhp123 wrote:
           | ?
        
         | behdad wrote:
         | How?
        
           | jhp123 wrote:
           | use a regex like (aardvark|apple|...|zebra)* and the standard
           | DFA construction for regular expressions.
        
             | behdad wrote:
             | You are technically right. Though if we assume that the
             | size of the dictionary is at least o(n), then the size of
             | such DFA will be exponential in n, and indexing it will be
             | o(n), resulting in a o(n^2) solution again, I think.
        
               | [deleted]
        
               | ghusbands wrote:
               | They are completely correct. If the DFA fits in RAM,
               | following state transitions will be O(1) and using the
               | DFA for a concatenated string of length N will be O(N).
               | You simply missed a good solution.
        
               | jhp123 wrote:
               | The number of states is exponential for a general regex.
               | This regex has an NFA closely resembling the trie for the
               | dictionary. There could be at most one live state per
               | depth in the trie. So the number of possible states is
               | bounded at least by (words in dictionary)^(length of
               | longest word). Really much smaller because the states at
               | different levels have to share prefixes.
        
             | bonzini wrote:
             | Without the *, you get a graph that is a compressed version
             | of the trie. With the *, the size of the DFA is extremely
             | likely to explode.
        
       | behdad wrote:
       | Between 2010 and 2019 I interviewed dozens of Software Engineer
       | candidates at Google. Almost always I asked the same interview
       | question. Moreover, this question happened to be on the banned
       | list at Google, because it was publicly available on Glassdoor
       | and other interview websites, but I continued to use it because I
       | got good signal from the candidates.
        
         | foobarian wrote:
         | I stuck to the same question for nearly a decade too, even if
         | it was barely more difficult than fizzbuzz. The ability to
         | compare against the past candidates was very useful, which
         | would not be possible when changing questions too often.
         | 
         | The sad reality was that it was waaaay too effective as a
         | fizzbuzz eliminator (> 80%) which I don't know what it tells me
         | about the candidates in general or just about our recruiting
         | :-)
        
           | yamtaddle wrote:
           | > The sad reality was that it was waaaay too effective as a
           | fizzbuzz eliminator (> 80%) which I don't know what it tells
           | me about the candidates in general or just about our
           | recruiting :-)
           | 
           | I suspect there are a lot more people who frequently choke on
           | easy coding questions, under the very specific conditions of
           | interview-stress (which is very different from, "the client
           | is unhappy and is also onsite and you'll be in a meeting with
           | them later" stress, and from "production broke" stress, so
           | no, I don't think it's a good proxy for general "grace under
           | fire", as it were, either) than there are actual full-on
           | bullshitters who manage to land software jobs while being
           | _truly_ incapable of doing even the basics of the job.
           | 
           | For one thing, I think such top-tier, extremely-convincing
           | bullshitters would have an easier and less-stressful time
           | applying that skill directly to some other role, since there
           | are some well-paid roles where that kind of thing is exactly
           | what companies want, so it doesn't pass the smell test for me
           | that so _very_ many would be trying to land software jobs--
           | not just ones who exaggerate skill or try to get into the
           | industry while woefully unprepared, but who _also_ are able
           | to fool interviewers at a reasonably-high rate in
           | conversational interviews, yet fail simple coding tests.
           | 
           | That is, I suspect a large majority of "ha! I caught a
           | faker!" anecdotes or anec-data are actually a false-negative
           | on the test from people choking under interview conditions,
           | and that most _real_ fakers would also have been caught in
           | any half-competent conversational interview anyway.
           | 
           | The remainder should be probably be hired regardless, then
           | re-homed to sales when you realize what's going on, since
           | they're evidently top-percentile bullshitters and social
           | chameleons. Joking, of course... kinda. Maybe.
        
             | medvezhenok wrote:
             | There's certainly part of that, but there's also likely
             | selection/sample bias. The better the programmer, the less
             | time they probably spend in interviews. And the "reward" on
             | passing a programming interview if you're marginal is quite
             | high, depending on your alternative choices of employment.
             | 
             | (Top tier Ivy League university acceptance rate is 3-10%,
             | so if you're looking for someone of at least that caliber
             | among the general population (or even among developers), an
             | 80% rejection rate might not be crazy. FAANG (or its ilk)
             | are the "Ivy League" of Software Development jobs)
        
       | activitypea wrote:
       | "A great interview question" for what? The author sings their
       | praises, but never actually explains what this question is good
       | at testing or what makes it worth an interviewer and candidate's
       | time. Results _might_ favor younger or earlier-in-career folks --
       | okay, why is that a good thing?
        
         | musicale wrote:
         | Younger/earlier in career folks are usually cheaper, which
         | companies like.
         | 
         | Though I imagine they could just ask "when was your last
         | algorithms course?" (and possibly where, and what grade you
         | got) and save a lot of time.
         | 
         | But that wouldn't be as useful for "making the candidate jump
         | through more hoops" and reinforcing compliance and status
         | hierarchy, which is a major purpose of interviews.
        
       | jacknews wrote:
       | What kind of programming will be done on the job by these
       | candidates? Probably nothing like this puzzle.
       | 
       | This kind of 'let's watch people struggle with my favourite LC
       | puzzle, which I know inside-out' just seems very inconsistent,
       | not very predictive, and just quite flaky to me.
       | 
       | On the job, my very first response to any kind of problem that
       | looks algorithmic is to google the hell out of it, even if I
       | think I know the answer already. In future, it will be to get gpt
       | to write it.
       | 
       | I think the actual challenges in real-world software are not
       | algorithmic, but architectural and beyond, and then only after
       | thoroughly understanding the problem in the first place.
        
       | bonzini wrote:
       | I am guilty as charged of saying impulsively that the first
       | solution is O(n)!
       | 
       | The reverse trie is genius, but with dynamic programming do you
       | still need it? I would say that if in the general setting you can
       | afford O(n^2) preparatory steps, and therefore you can just
       | remember which (start, end) pairs correspond to a valid word with
       | regular trie lookups. That is O(n) steps for each start position,
       | giving a total of O(n^2).
        
         | eesmith wrote:
         | Doesn't that confuse two uses of n?
         | 
         | One is the length of the word. The other is the number of
         | dictionary elements.
         | 
         | If your language is constrained to 1,000 words, then the
         | dictionary will be pretty small and the hash table can be
         | constructed with no chains. That should have constant-time
         | lookup no matter how long the query concatenation is.
         | 
         | I don't see why O(n^2) should be optimal. The question is only
         | to determine if a match exists, and that can be expressed as a
         | regular expression. In the following I omit all 1-letter words
         | since my dictionary includes all the letters.
         | >>> words = [line.strip() for line in
         | open("/usr/share/dict/words") if len(line.strip()) > 1]
         | >>> words.sort(key=len, reverse=True) # because Python uses
         | left-to-right match, not longest       >>> pattern = "(" +
         | "|".join(words) + ")+$"       >>> import re       >>> matcher =
         | re.compile(pattern) # takes a few seconds       >>>
         | matcher.match("ducksoup")       <re.Match object; span=(0, 8),
         | match='ducksoup'>       >>> matcher.match("ducksquom")
         | >>> matcher.match("theremanywordsinthisone")       <re.Match
         | object; span=(0, 23), match='theremanywordsinthisone'>
         | >>> matcher.match("theremanywordsqinthisone") is None
         | True
         | 
         | Python uses backtracking, so this probably isn't O(n),
         | especially with the ability to choose the dictionary.
         | 
         | But with there are non-backtracking matchers which would make
         | this O(n). Here's re2 from https://github.com/google/re2 :
         | >>> import re2       >>> opts = re2.Options()       >>>
         | opts.max_mem = 200000000       >>> matcher =
         | re2.compile(pattern, opts)       >>> match =
         | matcher.match("ducksoup")       re2/re2.cc:821: DFA out of
         | memory: pattern length 2493008, program size       1323834,
         | list count 1089251, bytemap range 54       >>> match.start(),
         | match.end()       (0, 8)       >>> match =
         | matcher.match("ducksquom")       re2/re2.cc:821: DFA out of
         | memory: pattern length 2493008, program size       1323834,
         | list count 1089251, bytemap range 54       >>> match is None
         | True
         | 
         | This is first time I've used re2. There's probably a way to
         | prevent that warning message - I believe this is falling back
         | to an NFA, which might not be O(n)?
         | 
         | Anyway, here it is processing the concatenation of 100,000
         | words selected at random (I've omitted the warning message):
         | >>> s="".join(random.sample(words, 100000))       >>> len(s)
         | 956249       >>> import time       >>> t1 = time.time();
         | print(f"is concatenation? {matcher.match(s) is not None} " +
         | f"time: {time.time()-t1:.1f} seconds")       is concatenation?
         | True time: 12.3 seconds       >>> t1 = time.time(); print(f"is
         | concatenation? {matcher.match(s+'Q') is not None} " + f"time:
         | {time.time()-t1:.1f} seconds")       is concatenation? False
         | time: 12.3 seconds
         | 
         | Cut the size by 10 and it's about 10x faster:
         | >>> s="".join(random.sample(words, 10000))       >>> len(s)
         | 95457       >>> t1 = time.time(); print(f"is concatenation?
         | {matcher.match(s) is not None} " + f"time: {time.time()-t1:.1f}
         | seconds")       is concatenation? True time: 1.3 seconds
         | 
         | For this test it appears to be linear in the query size.
         | 
         | Edit: if I increase max_mem to 2_000_000_000 then I don't get
         | the warning message. But then the DFA-based search takes 141.5
         | seconds. That should be O(n) time in the query length, but an
         | order of magnitude slower than the NFA-based search.
         | 
         | That's why O(n) analysis isn't enough to really figure out "a
         | faster solution."
        
           | bonzini wrote:
           | In the article n always refers to the length of the string.
        
         | behdad wrote:
         | With the dynamic-programming you don't need the reverse trie
         | indeed.
        
       ___________________________________________________________________
       (page generated 2023-04-12 23:01 UTC)