[HN Gopher] A regular expression to check for prime numbers (2007)
___________________________________________________________________
A regular expression to check for prime numbers (2007)
Author : graderjs
Score : 267 points
Date : 2022-03-05 04:34 UTC (18 hours ago)
(HTM) web link (www.noulakaz.net)
(TXT) w3m dump (www.noulakaz.net)
| [deleted]
| amp108 wrote:
| Interesting, I wrote a quick script to use this:
|
| #!/usr/bin/env ruby
|
| def is_prime(n) ("1" \* n.to_i)!~
| /^1?$|^(11+?)\1+$/
|
| end
|
| num = ARGV.shift
|
| $stdout.puts is_prime(num)
|
| ... then decided to test it on a button-mashed number:
|
| $ruby ~/prime.rb 767423
|
| ... and it hung.
|
| Thought it was the length of the number, so I tried something
| smaller (101). Quick "true". Then I tried progressively more
| digits. I got to `ruby ~/prime.rb 7673121` (false) and it took
| less than a second.
|
| But it doesn't like something about the number 767423.
| TrianguloY wrote:
| The regexp basically tests all dividers of the number (from 2
| to the number-1), if it founds one, it says false and stops, if
| nothing is found, it says true.
|
| 101 is prime, but it does "only" a hundred checks approx.
|
| 7673121 is divisible by 3, so as soon as it tries to match
| groups of 3 it succeeds and stops.
|
| 767423 is prime, so it needs to do hundreds of millions of
| checks, that's probably why it hungs.
|
| Basically: the runtime of the algorithm should be equivalent to
| the minimum divisor excluding 1 of the number.
| vbezhenar wrote:
| Simpler regexp: ^(11+)\1+$
|
| it does not handle "1" for clarity. I also don't understand why
| author used `+?`, because simple `+` should work just as well. At
| least it works in Idea for me.
| MauranKilom wrote:
| Yeah, should work even with greedy matching as it just does the
| same checks but in reverse order.
| catlifeonmars wrote:
| I wonder which one will reject composite numbers faster. The
| non greedy version starts with the smallest (potential) factor
| first, while the greedy version starts with the largest
| candidate factor first.
|
| I suspect that since a factor must be smaller than the square
| root of a composite, the greedy version you proposed would
| waste a lot of time checking numbers which could never be
| factors of the composite under test.
|
| If the number is prime, I think both variants have the same
| performance because they will exhaustively evaluate all numbers
| less than or equal to the composite under test.
|
| Edit: this is not quite correct. One of the prime factors must
| be smaller than the square root of the composite numbers, but
| the other one can obviously be larger. For example: 2 * 17 =
| 34. I'm leaving the original statement as is, because I'm
| hoping someone with a better math background can chime in and
| explain if the way that prime factors are distributed is
| symmetrical or not.
| johnx123-up wrote:
| FWIW...
|
| Better explanations in http://neilk.net/blog/2000/06/01/abigails-
| regex-to-test-for-...
|
| Abigail's other regular expression hacks
| https://metacpan.org/author/ABIGAIL?sort=[[3,1]]
| bawolff wrote:
| This is interesting - i thought it would be something complex,
| but really its just using the fundamental meaning of prime.
|
| Literally what its doing is turn a number in to all 1s, if the
| number is 1 return true. Otherwise try all possible ways to put
| it into 2 or more equal groups, return true once we find one that
| works.
|
| You don't get closer to the definition of "prime" than that.
| russfink wrote:
| If the number of 1's itself is prime, then n is prime. That's
| all this does. Search the other comments for 'hung' or 'hang'
| to see examples of small numbers that take forever to resolve.
| bawolff wrote:
| Sure. its basically a less efficient form of trial division.
| pxeger1 wrote:
| Here is a regex to check for integer powers of four:
| https://codegolf.stackexchange.com/a/223129
|
| And a regex to integer-divide numbers by the square root of 2:
| https://codegolf.stackexchange.com/a/198428
| userbinator wrote:
| It's basically https://en.wikipedia.org/wiki/Trial_division using
| the + operator, and enforcing the bounds of the match to require
| a zero remainder.
| mmaunder wrote:
| I prefer the source which I think is a clearer explanation. The
| key is that perl's x operator turns the number into a string of
| 1s that is the length of the number. The rest is a hack to use
| regex to perform division using minimal capturing and back
| references.
|
| http://montreal.pm.org/tech/neil_kandalgaonkar.shtml
| btdmaster wrote:
| It's crazy how fast Perl's regex engine does this!
| avinash wrote:
| Aha. I wrote this post 15 years ago after explaining to my wife
| (who is a Language major) how this "regular expression" works.
|
| It has been discussed a few times on Hacker News, Reddit, etc.
| over the past 15 years...
| bmn__ wrote:
| Credit is due to Abigail
| <https://web.archive.org/web/2020/http://www.abigail.be/>.
| Please give it and be a good Internet citizen by amending the
| post.
| phgn wrote:
| @dang can we please add (2007) to the title?
| beardyw wrote:
| It doesn't seem to be past its "best before" date.
| phgn wrote:
| What do you mean? I actually love posts about old articles,
| they're often more useful than any fleeting news.
| chx wrote:
| As an aside, a Hungarian mathematician wrote an entire book
| explaining how mathematics works to a, I guess, a language
| major (well, Marcell Benedek was a professor, a writer, a
| literary historian, a translator, a theatre director).
|
| https://en.wikipedia.org/wiki/Playing_with_Infinity
| [deleted]
| rofr999 wrote:
| I'm going to test this on large numbers and post the results!
| saagarjha wrote:
| You'll probably not be able to finish before the 14-day reply
| period expires.
| xcambar wrote:
| I can see you have a very efficient method for wasting time and
| energy.
|
| Why? The method works. Size doesn't matter. But it's extremely
| inefficient. Which can be calculated rather than measured.
| tmalsburg2 wrote:
| I think OP understands this and their comment was a joke.
| mrjin wrote:
| Yep. It seems doesn't matter what, there is always people
| trying to use it in a way it wasn't meant to.
| wittycardio wrote:
| Not sure if anything about his comment suggests that tbh.
| reactspa wrote:
| Is this a case of P != NP?
| pxx wrote:
| primality testing is in P by the AKS construction
| (https://www.microsoft.com/en-us/research/wp-
| content/uploads/...)
| rofr999 wrote:
| dhosek wrote:
| At first I was thinking this was actually operating on a binary
| representation of the numbers. Some interesting things become
| apparent when you look at numbers in binary, for example, numbers
| of the form 2^n-1 can be represented as a string of n 1s. From
| there, it's pretty obvious that if n is composite that 2^n is
| also composite since you can take that string of n=kp 1s and
| write it as k groups of p 1s so obviously, 2^p-1 divides 2^n-1.
| There's the famous quote from Leopold Kronecker, "God made the
| integers" and I always thought that when he did, he wrote them in
| binary.
| Rerarom wrote:
| No, in unary.
| cmeacham98 wrote:
| .
| zodiac wrote:
| From reading the proof, I think he meant "n composite => 2^n
| - 1 composite" and it's just a typo
| randv wrote:
| while compact and looks elegant...
|
| it is basically trying to group by 11 (divide by 2), if there no
| match (via backtracking) group by 111 (divide by 3) and so on
|
| not very efficient by any stretch esp with string matches.
| randv wrote:
| steps:
|
| - it is turning the number into a string of 1s 3 => 111 4 =>
| 1111 and so on
|
| - then trying to simulate divisibility test by trying to find
| perfect groups of '11' to simulate division by 2 failing which
| (that is the backtracking part) and trying find perfect groups
| of '111' to simulate division by 3 and so on...
|
| Clever but very inefficient
| dorianmariefr wrote:
| > require "prime" => true > 10.prime? =>
| false
| [deleted]
| brightffw wrote:
| tasty_freeze wrote:
| What it is really doing is checking if the length of a string is
| a prime number or not. And as coded, it is checking specifically
| if the length of a string consisting of some number of '1'
| characters is prime or not.
| randv wrote:
| - it is turning the number into a string of 1s 3 => 111 4 =>
| 1111 and so on
|
| - then trying to simulate divisibility test by trying to find
| perfect groups of '11' to simulate division by 2 failing which
| (that is the backtracking part) and trying find perfect groups
| of '111' to simulate division by 3 and so on...
|
| Clever but very inefficient
| rawling wrote:
| I've done some Codewars problems where you have to write a
| regex to check whether an input number is divisible by a
| specific number, so I was really interested to see how that
| would work for primes.
|
| Nope, it's just playing with lengths.
| colanderman wrote:
| I'll be "that guy" -- this is _not_ a regular expression [1]. A
| Perl regexp, sure. But backreferences are not part of the grammar
| of regular expressions, and moreover, using them in that
| particular manner (to refer to a capture of unbounded length)
| cannot be encoded as a DFA and therefore does not describe a
| regular language.
|
| (I have no clue whether unary-representation primality is
| actually regular or not -- it's been far too long since my
| automata class to remember how to (dis)prove that -- but I guess
| it's at least sublinear, since primality is polynomial w/r/t the
| logarithm of the number being tested.)
|
| [1]
| https://en.wikipedia.org/wiki/Regular_expression#Formal_defi...
| dudeman13 wrote:
| Thanks for that.
|
| My initial reaction to the title was "wait, you can describe
| prime numbers using formal languages?". I could feel it in my
| guts that this would probably have some mind blowing Math/CS
| implications.
|
| A shame how I'm so rusty in my Mathematics and CS nowadays :(
| aghilmort wrote:
| had same immediate reaction, like whoa, huge if true, time to
| read the comments
| rurban wrote:
| So left recursive is not regular, so not a language? Every
| simple language is defined recursively. Regularity just defines
| it as terminating, to avoid the simplicity of recursion.
|
| DFA is the absurdity, not NFA.
| colanderman wrote:
| What are you talking about? "Not a language?" "DFA is the
| absurdity?" I'm sorry, I don't follow.
| rurban wrote:
| read for example about the state explosion of DFA's:
|
| "It is important because NFAs can be used to reduce the
| complexity of the mathematical work required to establish
| many important properties in the theory of computation. For
| example, it is much easier to prove closure properties of
| regular languages using NFAs than DFAs." https://en.m.wikip
| edia.org/wiki/Nondeterministic_finite_auto...
|
| but more importantly look at the various attempts to the
| solve backtracking "problem" (stack overflow by unbounded
| recursion), as done recently in perl and pcre, which led to
| more insecure AND more unmaintainable regex code by
| converting their simple recursion to iterations on the
| heap. first it got much slower, as the state is now handled
| on the heap, and second it got much more insecure as DOS
| attacks can now exhaust the whole heap, not just around
| 1024 stack depths. and third, it got 10x more complicated,
| and unreadable.
|
| ad language: you can of course define a language by
| forbidding left recursion, which yacc folks would prefer.
| but you can define it much easier with left recursive
| parsers, like 10x smaller. of course you can always expand
| simple NFA's to deterministic DFA's at the cost of
| exponential state explosion and increased complexity. but
| it is a monstrosity only impractical CS folks prefer. SW
| engineers prefer the similar, faster and more secure NFA
| backtracking versions, as long as you count the stack
| depth.
| colanderman wrote:
| As I mentioned elsewhere, linear performance is a hard
| requirement for some domains such as intrusion
| prevention. Unbounded recursion isn't the primary
| problem; exponential runtime is.
|
| If state-space explosion is a concern, you can implement
| a non-backtracking regular expression matcher using term
| rewriting. For true regular expressions, the size of the
| rewritten terms are a function of the size of the pattern
| (and thus performance is linear). For non-regular
| expressions, this does not hold.
| jonstewart wrote:
| You can also just implement an NFA but apply
| determinization to much of it, with tight memory
| allocation bounds, to get most of the benefit.
| danieldk wrote:
| _DFA is the absurdity, not NFA._
|
| Every non-deterministic finite state automaton can be
| converted into a deterministic finite state automaton that
| accepts the same language (see powerset construction [1]), so
| I am not sure what you are saying here.
|
| [1] https://en.wikipedia.org/wiki/Powerset_construction
| rurban wrote:
| yes, and this absurdity does not make it a better language.
| rather a worse.
|
| eg tons of simple recursively defined languages were
| converted to DFA monstrosities, and since then they became
| slower and unmaintainable.
| DyslexicAtheist wrote:
| as the saying goes if your only tool is a hammer ...
| perl/regexp is obviously the wrong tool for the job. Finding
| new prime numbers, minting bitcoin, running seti@home, is what
| C++ exceptions have been invented for.
| colejohnson66 wrote:
| Coming soon: "Mining Bitcoin at Compile Time using constexpr"
| pxx wrote:
| you can use the pumping lemma (https://en.wikipedia.org/wiki/Pu
| mping_lemma_for_regular_lang...) to do this proof trivially.
| it'll look something like
|
| {1^x : x is prime} is not a regular language. suppose it is a
| regular language. then by the pumping lemma there exists a p
| such that every string of length at least p can be written as w
| = xyz, where |y| >= 1, |xy| <= p [not an important observation
| for this proof], and we can "pump" a new string w' = xy^nz (for
| any n > 0) and have w' remain in L.
|
| let p' be the prime immediately above p. then the string
|
| w = 1^p'
|
| is in the language L and must be "pumpable" for some N = |y| >
| 0. Choose the number of times we "pump" (n in the expression
| xy^nz) to be p'. Then that would imply
|
| w' = 1^(p' + p' * N)
|
| is in the language L, a contradiction, as p' + p' * N has
| factors p' and (1 + N). Note we can also use the CFL pumping
| lemma to prove that this particular language is not context-
| free either; if we choose s = uvwxy, we can still just pump p'
| times; the new string just is of size p' * (1 + |v| + |x|), and
| |v| + |x| > 0.
| thaumasiotes wrote:
| > we can "pump" a new string w' = xy^nz (for any n > 0)
|
| While this is certainly true, I would prefer to note that n
| >= 0. n > 0 is just a special case of that.
|
| (You can see clearly that 0 repetitions will always be valid
| by considering the proof of the pumping lemma: since a
| regular expression is recognized by a finite state machine,
| the only way to recognize a string of length exceeding the
| (finite!) number of states in the machine is to visit some
| states more than once, which means traveling around a cycle
| in the state graph. The 'pumpable substring' of the pumping
| lemma is a path through such a cycle, and repeating it zero
| times corresponds to traveling around the cycle zero times.
| The rest of the path is just as valid as always.)
| maweki wrote:
| Just to note additionally that the pumping lemma may only be
| used in that direction, not the other way.
|
| The language {1^x : x is not prime} is easily pumped with
| x=z=1 and y=11. And regular languages are closed under
| complement.
|
| No pumping Lemma -> Not regular, Yeah Pumping Lemma -> Who
| knows
| Capira wrote:
| this is why I love hackernews
| kevinventullo wrote:
| It's been a while since I took Theory of Computation. Does
| this imply that primes in binary or n-ary are also not
| regular?
| jwilk wrote:
| You would need a different proof (likely a much more
| complicated one) for n-ary primes.
| jwilk wrote:
| https://math.stackexchange.com/questions/1232463/how-to-
| prov... has a proof for base 2. This method works for
| other bases too.
| colanderman wrote:
| Fantastic! It's been 15 years since I've had to do those
| proofs, I'll use your proof to refresh my memory how they
| work.
| petmon wrote:
| "A Perl regexp, sure." is not part of the grammar of English
| because it lacks a verb.
| 867-5309 wrote:
| perhaps you meant "does not follow". only the loose rules are
| "part of"
|
| besides, "it is" is implied in that sentence, so there's your
| verb
| xigoi wrote:
| Natural languages don't have a formal grammar.
| tromp wrote:
| While we're nitpicking, it doesn't check for prime numbers
| either. It checks for composite length.
| nickelpro wrote:
| I'll be "the other guy" -- no one uses the term regex or
| regular expression in a programming context to mean a strictly
| regular language in the theoretical computer science sense.
| Nitpicking this point is ignorant of how the usage of the term
| has evolved within the programming community.
|
| Good. There's that obligative conversation over and done with.
| firechickenbird wrote:
| Regular expressions are Regular Languages, thus equivalent to
| deterministic finite state automata. These languages have
| very interesting properties due to their regularity (eg.
| pumping lemma), and they are a very small strict subset of
| Context-Free languages, which are also a strict subset of
| Turing-Complete languages. If you add recursion and
| backtracking to RegExps you easily go outside the Regular
| Languages class, and you probably end up in the TC class. Now
| you are dealing with an undecidable machine which is also
| very hard to define and reason about (and you also have two
| more problems). What's the point of calling it regular?
| nickelpro wrote:
| I'm hitting Poe's Law here, are you joking?
|
| Your average coding bootcamp kid cracking away in node and
| invoking the builtin regex engine has no idea what any of
| those words mean.
|
| To those programmers the words "regular expression" have
| become completely separated from their original etymology
| in lexical analysis. To them a regex means "input to a
| regex engine", and thus the term "use a regular expression"
| means "use a regex engine". It could be "use a froofroo
| boingo", for all the meaning the words hold. "Regular
| expression" just became the vernacular due to its
| origination in lexical analysis.
| firechickenbird wrote:
| Wow, I have no idea what regexps mean? I can write a book
| about regexps, and if you really want to learn what
| regexps are, let me know, we can scedule a session.
|
| Anyway, real REGUALR expressions belong to the regular
| languages class, thus equivalent to DFAs. Given a regexp,
| you can construct a DFA in O(2^n), where n is the number
| of operators in the regexp. Now this DFA will have a
| Linear time complexity match. If you extend regular
| expressions with backreferences, you can't build DFAs
| anymore, and you also lose the fundamental property of
| regularity: the pumping lemma for regular languages.
| Furthermore, the languages you can build with this
| extension go even outside the context-free class. Not
| only that: you are also dealing with an NP-complete
| matching. Again, why do you still call them regular?
| Biganon wrote:
| I think you two are agreeing 100%, but your egos prevent
| you from even realizing it
| eutropia wrote:
| I'll be the guy who postscripts that "hardmode" regexes would
| have important mathematical and computer science implications
| when/if they solve certain problems (like identifying
| primes). So it's not pure pedantry, and there's an important
| difference between "regexes to solve programming problems"
| and "regexes solving math problems".
| dpwm wrote:
| "hardmode" regexes do have important computer science
| implications: they can be matched orders of magnitude
| faster [0].
|
| Demonstrating the power of a regex engine by solving a
| problem like this is interesting and very clever, but it's
| hopefully not used in production.
|
| [0] https://swtch.com/~rsc/regexp/regexp1.html
| colanderman wrote:
| Having worked in an industry where the linear vs. exponential
| performance implications of regular expressions and PCREs was
| crucial (intrusion protection systems), I'd just as soon
| software professionals stop being sloppy with terminology.
|
| Not to mention the plethora of mostly-but-not-totally
| compatible regexp languages which exist. C++ supports 6 [1],
| and that doesn't even include PCREs. Miscommunication about
| which language one is actually using is a common source of
| regexp bugs.
|
| [1]
| https://en.cppreference.com/w/cpp/regex/syntax_option_type
| earleybird wrote:
| Yes, thank you.
|
| If you're sloppy with words you probably have other nasty
| habits. (apologies to Heinlein)
|
| There are reasons to employ regular languages with
| extensions (to not quite context free languages) but be
| clear about what you're talking about.
| efitz wrote:
| Grammar sticklers usually fail on this point: human
| languages are not static, stuck in place from the time
| the stickler learned the rules.
|
| The current state of a human language is defined by
| common usage. The purpose of a language is for people to
| be able to communicate ideas. Ideas exist all up and down
| a specificity spectrum, e.g. "regex" might mean a
| specific syntax or one of several syntaxes depending on
| context. Kind of like "kleenex" might refer to a specific
| brand or to the general concept of tissue.
|
| If 99.9% of people successfully (by their own measure,
| not yours) communicate what they mean when they use the
| term "regular expression", but you are among the 0.1% for
| which that interaction fails, then I would argue that the
| problem is your pedantry, not the term. I'm not saying
| this to be mean; I have similar issues.
| bawolff wrote:
| I'd argue that if this is sloppy, then the use of "you"
| instead of "thou" is sloppy too. Using "you" for
| everything is confusing. How can i possibly know if you
| mean plural or singular?
|
| Sure some people might say "thou" is archaic, however so
| is using "regex" outside of academic contexts to mean
| regular languages. I don't see how one could justify one
| but not the other.
| lloeki wrote:
| > Having worked in an industry where the linear vs.
| exponential performance implications of regular expressions
| and PCREs was crucial (intrusion protection systems)
|
| same boat. that's why we use re2 in a way inspired by Go
| (which means losing a few features but the tradeoff is
| worth it):
|
| > The regexp implementation provided by this package is
| guaranteed to run in time linear in the size of the input.
| (This is a property not guaranteed by most open source
| implementations of regular expressions.) For more
| information about this property, see
|
| > https://swtch.com/~rsc/regexp/regexp1.html
|
| > or any book about automata theory.
|
| https://pkg.go.dev/regexp
| colanderman wrote:
| OCaml's regexps run in linear time also. They're based on
| Ville Laurikari's Tagged REs: https://laurikari.net/tre/
| skissane wrote:
| > C++ supports 6
|
| Wow, TIL. I always thought C++ had evolved into an
| excessively complex language, but this really seems to take
| the cake. ECMAScript (with C++-only changes), basic POSIX,
| extended POSIX, POSIX awk, POSIX grep, and POSIX extended
| grep. You are right there is no PCRE in there, yet.
| (There's always C++23... and if not that, C++26 or C++30)
|
| (Although ECMASCript regex was heavily inspired by
| Perl/PCRE anyway, albeit without being 100% compatible with
| either, so in a sense PCRE is already there.)
| leni536 wrote:
| It's unlikely that std::regex will gain new dialects.
| Unfortunately it is very broken, it's unlikely that a
| proposal to extend it would go through.
| MauranKilom wrote:
| Can you elaborate (or point me to sources about) how
| std::regex is broken? I'd like to learn more.
| leni536 wrote:
| AFAIK it's not even the interface that is broken, but
| some standard libraries poushed out slow implementations
| that cannot be changed without an ABI break.
| tomcam wrote:
| I tried to imagine the test suite for these C++ variations
| but failed due to a coronary event
| bspammer wrote:
| This isn't entirely a pedantic thing. Cloudflare had an
| outage a couple of years ago which was caused by someone not
| knowing the difference:
|
| https://blog.cloudflare.com/details-of-the-cloudflare-
| outage...
| Thorrez wrote:
| >no one uses the term regex or regular expression in a
| programming context to mean a strictly regular language in
| the theoretical computer science sense.
|
| The RE2 library seems to:
|
| >loosely speaking, a regular language is a set of strings
| that can be matched in a single pass through the text using
| only a fixed amount of memory.
|
| https://github.com/google/re2/wiki/Syntax
|
| Note that RE2 doesn't support backreferences.
| jltsiren wrote:
| There is no such thing as _the_ programming community. Only a
| huge number of partially overlapping communities using wildly
| different terminology. Some communities still talk about
| regular expressions in the original meaning (because labeled
| graphs are useful objects), just like some communities still
| use the word "algorithm" outside machine learning or
| "crypto" in non-blockchain contexts.
|
| I personally prefer using "regular expression" in the
| language-theoretic sense and "regex" or "regexp" for related
| concepts in programming languages and software.
| [deleted]
| dpwm wrote:
| > I personally prefer using "regular expression" in the
| language-theoretic sense and "regex" or "regexp" for
| related concepts in programming languages and software.
|
| This is also what Larry Wall proposes:
|
| > [...] Pattern Matching, generally having to do with what
| we call "regular expressions", which are only marginally
| related to real regular expressions. Nevertheless, the term
| has grown with the capabilities of our pattern matching
| engines, so I'm not going to try to fight linguistic
| necessity here. I will, however, generally call them
| "regexes" (or "regexen", when I'm in an Anglo-Saxon mood).
|
| https://raku.org/archive/doc/design/apo/A05.html
| thaumasiotes wrote:
| > (or "regexen", when I'm in an Anglo-Saxon mood)
|
| There's some kind of idea going around that because the
| plural of "ox" is "oxen", plurals with _-n_ must be
| triggered by a word ending in "x". So you see cutesy
| references to "boxen" or similar.
|
| But _-n_ is just an ordinary plural marker; the same
| ending that makes "oxen" the plural of "ox" also makes
| "kine" the plural of "cow" and "children" the plural of
| "child". (In those last two cases, the words are sporting
| two plural markers in sequence.)
| knome wrote:
| >There's some kind of idea going around
|
| It's not new, by any stretch. I remember seeing such many
| references in the early 2000s and beyond. Doing some
| vague poking around textfiles.com, I didn't pull any easy
| to find 1980s 'boxen' references. It almost looks like it
| might have been the addition of 'unix boxen' to the 'new
| hacker dictionary' by esr which spread the joke into the
| mid-90s communities. I wasn't online at that point
| myself, so I can't speak from any personal experience.
| But the various hacker zines etc are using 'boxen' by the
| later 90s and early 2000s, which would allow for it to
| have spread through the folks of the time finding and
| trolling through esr's version of the 'hacker
| dictionary'. https://www.dourish.com/goodies/jargon.html
| complains about esr adding unix references, and does not
| include any boxen references that are found in the http:/
| /cd.textfiles.com/hackersencyc/ETC/MISC/JARG_331.TXT
| version folks in the 90s would likely have spread around.
|
| edit: nope. definitely there in the 80s on usenet
|
| so he was just encoding existing slang usage, rather than
| introducing it, on this point.
|
| https://www.usenetarchives.com/view.php?id=net.wanted&mid
| =PD...
|
| more edit: considered the second oldest boxen reference
| in the usenet archive is esr himself, maybe he was the
| one pushing it :)
|
| https://www.usenetarchives.com/view.php?id=comp.sys.att&m
| id=...
| thaumasiotes wrote:
| I didn't say it was new. I said it was current. (And, um,
| unmotivated.)
|
| We might also note that from the perspective of modern
| English, plurals in -en are more characteristic of
| Geoffrey Chaucer than of Alfred the Great. We preserve
| three such words: oxen, children, and brethren. But only
| one of those is an Anglo-Saxon form; in Old English, the
| plural forms of cild and brothor are cildru and
| brothru[1]. The -ens appeared later.
|
| [1] "Brother" is a fairly irregular word, but there's no
| form ending in -n.
| samatman wrote:
| > _[1] "Brother" is a fairly irregular word, but there's
| no form ending in -n._
|
| It's brethren, for the record.
| ectopod wrote:
| That footnote is clearly talking about Old English.
| samatman wrote:
| Well. You're on Hacker News. So I'm going to quote the
| Jargon File and then you'll stop complaining. Right,
| cobber?
|
| > Further, note the prevalence of certain kinds of
| nonstandard plural forms. Some of these go back quite a
| ways; the TMRC Dictionary includes an entry which implies
| that the plural of `mouse' is {meeces}, and notes that
| the defined plural of `caboose' is `cabeese'. This latter
| has apparently been standard (or at least a standard
| joke) among railfans (railroad enthusiasts) for many
| years.
|
| > On a similarly Anglo-Saxon note, almost anything ending
| in `x' may form plurals in `-xen' (see {VAXen} and
| {boxen} in the main text). Even words ending in phonetic
| /k/ alone are sometimes treated this way; e.g., `soxen'
| for a bunch of socks. Other funny plurals are
| `frobbotzim' for the plural of `frobbozz' (see
| {frobnitz}) and `Unices' and `Twenices' (rather than
| `Unixes' and `Twenexes'; see {UNIX}, {TWENEX} in main
| text). But note that `Unixen' and `Twenexen' are never
| used; it has been suggested that this is because `-ix'
| and `-ex' are Latin singular endings that attract a
| Latinate plural. Finally, it has been suggested to
| general approval that the plural of `mongoose' ought to
| be `polygoose'.
|
| > The pattern here, as with other hackish grammatical
| quirks, is generalization of an inflectional rule that in
| English is either an import or a fossil (such as the
| Hebrew plural ending `-im', or the Anglo-Saxon plural
| suffix `-en') to cases where it isn't normally considered
| to apply.
| polski-g wrote:
| I wish everytime an application said "regular expression" it
| meant PCRE. People say sed supports RE but it's so divorced
| from PCRE I end up using `| perl -pe ....`
| nvader wrote:
| And takes space exponential to the number of bits in the input.
| xcambar wrote:
| "Take a problem. Solve it with a RegExp. Now you have two
| problems".
| asicsp wrote:
| See https://blog.codinghorror.com/regular-expressions-now-
| you-ha... for context around this (mis)quote.
| bmn__ wrote:
| Collapse that indirect pointer. The source is
| http://regex.info/blog/2006-09-15/247
| xcambar wrote:
| Thx, I had actually no idea it came from there.
| cqcn1991 wrote:
| this is very useful; thanks!
| rabbits77 wrote:
| I think it's worth noting that the original author of the regex
| primality test is Abigail, a well known Perl hacker.
|
| https://www.masteringperl.org/2013/06/how-abigails-prime-num...
|
| This blog article does a nice job discussing Abigail's work.
| ahmed_ds wrote:
| I feel like Charlie Kelly from IASIP reading this regex. "1.
| Starts out with 1, then there is other stuff...."
|
| https://youtu.be/AFDU_R6ShmI?t=74
| Dylan16807 wrote:
| Huh, looking at some regex explaining sites, they don't seem to
| do a good job of simplifying.
|
| Which is a shame because it's really easy to simplify away most
| of this regex.
|
| / does nothing
|
| ^ and $ only exist to make it do the whole string
|
| | is just saying "or", and only the part to the right really
| matters.
|
| And the ? doesn't do anything meaningful.
|
| Throw all that out and all you have is (11+)\1+
|
| It's still a mash of symbols, but it's _much_ simpler.
|
| And changing it to (cc+)\1+ would probably help naive
| readability too.
| mordechai9000 wrote:
| The question mark isn't meaningless - it changes the matching
| behavior of + from greedy to non-greedy. The non-greedy
| version will try to match shorter strings first. This should
| be more efficient, because non-primes will fail sooner. Most
| numbers will fail a lot sooner.
| Dylan16807 wrote:
| It takes just as long on primes, so the overall runtime
| will be very similar. So sure it's an optimization but it's
| not one to care about when you're trying to understand it.
| aghilmort wrote:
| my fave regex explainer gives this, idk that it adds much? ht
| tps://regexper.com/#%2F%5E1%3F%24%7C%5E%2811%2B%3F%29%5C1%...
| Dylan16807 wrote:
| That's the best I've seen so far, at least. Much better
| than a bunch of bullet points of various indentation.
| ogogmad wrote:
| Can you decide if a number is prime using a _true_ regular
| expression?
|
| No. This won't work because the time cost of primality testing is
| too high (cubic in the number of binary digits?), and checking if
| a string satisfies a regular expression takes only linear time.
| Since cubic is bigger than linear, this won't work. Too good to
| be true!
|
| Not a proof, but a rule of thumb. No need to even use the pumping
| lemma.
| thaumasiotes wrote:
| > This won't work because the time cost of primality testing is
| too high
|
| This is just a conjecture, though.
|
| > No need to even use the pumping lemma.
|
| But it's _really easy_ to prove this with the pumping lemma.
| This could be the first problem in a textbook section that
| introduces the pumping lemma. It would be difficult to come up
| with a language for which the pumping lemma disproof was more
| obvious.
| chakkepolja wrote:
| The canonical example for pumping lemma disproof is a^n b^n,
| which is more obvious IMO.
| thaumasiotes wrote:
| Here are the disproofs:
|
| -----
|
| (1) a^n b^n
|
| The pumping lemma tells us that for some string in this
| language, we can divide the string into three substrings _x
| y z_ such that _x_ _y_ ^ _n_ _z_ remains in the language
| for all _n_.
|
| There are two cases:
|
| (1)(a) The number of 'a' characters in _y_ does not equal
| the number of 'b' characters. In this case, pumping _y_
| will violate the constraint that a string in the language
| must have an equal number of 'a's and 'b's.
|
| (1)(b) _y_ consists of _k_ 'a's followed by _k_ 'b's for
| some _k_ > 0. In this case, pumping it will violate the
| constraint that for every string in this language, each 'a'
| character must precede every 'b' character.
|
| Thus, the language a^n b^n is not regular.
|
| -----
|
| (2) a^n, where n is prime
|
| The pumping lemma tells us that for some string in this
| language, we can divide the string into three substrings _x
| y z_ such that _x_ _y_ ^ _n_ _z_ remains in the language
| for all _n_. [same as above]
|
| But the string _x_ _y_ ^| _xz_ | _z_ has length | _xz_ | +
| | _y_ |*| _xz_ |, which is not prime.
|
| Thus, the language a^n (where n is prime) is not regular.
|
| -----
|
| Which of those was easier?
| Yajirobe wrote:
| > Since cubic is bigger than linear, this won't work.
|
| So it actually 'works', it's just slower.
| laszlokorte wrote:
| No, the point is that the article does not use a real regular
| expression: 1. Real regular expressiona can always be checked
| in linear time 2. primes can not be checked in linear time
| Ergo: primes can not be checked with regular expressions,
| otherwise there would be a way to check primes in linear
| time, which would contratict premise 2
___________________________________________________________________
(page generated 2022-03-05 23:02 UTC)