[HN Gopher] Waking up science's sleeping beauties (2023)
       ___________________________________________________________________
        
       Waking up science's sleeping beauties (2023)
        
       Author : bookofjoe
       Score  : 56 points
       Date   : 2024-10-21 14:34 UTC (8 hours ago)
        
 (HTM) web link (worksinprogress.co)
 (TXT) w3m dump (worksinprogress.co)
        
       | jessriedel wrote:
       | I think studying this stuff is always going to seem mysterious
       | unless you account for the concept of fashion in science.
       | Specifically what I mean is that two papers (or ideas or
       | approaches) X and Y can have equal "objective scientific merit"
       | but X is more popular than Y because of random initial conditions
       | (e.g., a famous researcher happened upon X first and started
       | mentioning it in their talks) that are self-reinforcing. The root
       | cause of this phenomenon is that most/all researchers can't
       | justify what they work on from first principles; for both good
       | and bad reasons, they ultimately rely on the wisdom of the crowd
       | to make choices about what to study and cite. This naturally
       | leads to big "flips" when a critical mass of people realize that
        | Y is better than X, and then suddenly everyone switches en masse.
        
         | ahazred8ta wrote:
         | Granted. I'm still trying to find out what led up to several
         | people revisiting Mendel's work in 1900.
         | https://en.wikipedia.org/wiki/Mendelian_inheritance
        
       | Componica wrote:
       | The Yann LeCun paper 'Gradient-Based Learning Applied to Document
       | Recognition' specified the modern implementation of a
       | convolutional neural network and was published in 1998. AlexNet,
       | which woke up the world to CNNs, was published in 2012.
       | 
        | In the years between, during the early 2000s, I was selling
        | implementations of really good object classifiers and OCR systems.
        
         | jonas21 wrote:
         | It's not like people had been ignoring Yann LeCun's work prior
         | to AlexNet. It received quite a few citations and was famously
         | used by the US Postal Service for reading handwritten digits.
         | 
         | AlexNet happened in 2012 because the conditions necessary to
         | scale it up to more interesting problems didn't exist until
         | then. In particular, you needed:
         | 
         | - A way to easily write general-purpose code for the GPU (CUDA,
         | 2007).
         | 
         | - GPUs with enough memory to hold the weights and gradients
         | (~2010 - and even then, AlexNet was split across 2 GPUs).
         | 
         | - A popular benchmark that could demonstrate the magnitude of
         | the improvement (ImageNet, 2010).
         | 
         | Additionally, LeCun's early work in neural networks was done at
         | Bell Labs in the late 80s and early 90s. It was patented by
         | Bell Labs, and those patents expired in the late 2000s and
         | early 2010s. I wonder if that had something to do with CNNs
         | taking off commercially in the 2010s.
        
           | Componica wrote:
            | My take during that era was that neural nets were considered
            | taboo after the second AI winter of the early 90s. For
            | example, I once proposed to a start-up that they consider a
            | CNN as an alternative to their handcrafted SVM for detecting
            | retina lesions. The CEO scoffed and told me neural networks
            | were dead, only to acknowledge a decade later that they had
            | been wrong. Younger people
           | today might not understand, but there was a lot of pushback
           | if you even considered using a neural network during those
           | years. At the time, people knew that multi-layered neural
           | networks had potential, but we couldn't effectively train
           | them because machines weren't fast enough, and key
           | innovations like ReLU, better weight initializations, and
           | optimizers like Adam didn't exist yet. I remember it taking
           | 2-3 weeks to train a basic OCR model on a desktop pre-GPU. It
           | wasn't until Hinton's 2006 work on Restricted Boltzmann
           | Machines that interest in what we now call deep learning
           | started to grow.
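            | 
            | For a sense of scale, here's roughly what that whole recipe
            | looks like today (a throwaway sketch in modern PyTorch with
            | ReLU and Adam, not the OCR model I actually trained back
            | then):
            | 
            |   import torch
            |   import torch.nn as nn
            | 
            |   # tiny LeNet-style net, assuming 28x28 grayscale inputs
            |   model = nn.Sequential(
            |       nn.Conv2d(1, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
            |       nn.Conv2d(16, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
            |       nn.Flatten(),
            |       nn.Linear(32 * 4 * 4, 10),
            |   )
            |   opt = torch.optim.Adam(model.parameters(), lr=1e-3)
            |   loss_fn = nn.CrossEntropyLoss()
            | 
            |   # one training step on a fake batch
            |   x = torch.randn(8, 1, 28, 28)
            |   y = torch.randint(0, 10, (8,))
            |   loss = loss_fn(model(x), y)
            |   opt.zero_grad()
            |   loss.backward()
            |   opt.step()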
        
             | mturmon wrote:
             | > My take during that era was neural nets were considered
             | taboo after the second AI winter of the early 90s.
             | 
              | I'm sure there is more detail to unpack here (more than one
              | paragraph, yours or mine, can do justice to). But as
              | written, this isn't accurate.
             | 
             | The key thing missing from "were considered taboo ..." is
             | _by whom_.
             | 
             | My graduate studies in neural net learning rates
             | (1990-1995) were supported by an NSF grant, part of a
             | larger NSF push. The NeurIPS conferences, then held in
             | Denver, were very well-attended by a pretty broad community
             | during these years. (Nothing like now, of course - I think
             | it maybe drew ~300 people.) A handful of major figures in
             | the academic statistics community would be there -- Leo
             | Breiman of course, but also Rob Tibshirani, Art Owen, Grace
             | Wahba (e.g., https://papers.nips.cc/paper_files/paper/1998/
             | hash/bffc98347...).
             | 
             | So, not taboo. And remember, many of the people in that
             | original tight NeurIPS community (exhibit A, Leo Breiman;
             | or Vladimir Vapnik) were visionaries with enough
             | sophistication to be confident that there was something
             | actually _there_.
             | 
             | But this was very research'y. The application of ANNs to
             | real problems was not advanced, and a lot of the people
             | trying were tinkerers who were not in touch with what
             | little theory there was. Many of the very good reasons NNs
             | weren't reliably performing well are (correctly) listed in
             | your reply starting with "At the time".
             | 
             | If you can't _reliably_ get decent performance out of a
              | method that has such patchy theoretical guidance, you'll
             | have to look elsewhere to solve your problem. But that's
             | not taboo, that's just pragmatic engineering consensus.
        
               | Componica wrote:
               | You're probably right in terms of the NN research world,
                | but I've been staring at a wall reminiscing for half an
                | hour and concluded... Neural networks weren't widely used
               | in the late 90s and early 00s in the field of computer
               | vision.
               | 
               | Face detection was dominated by Viola-Jones and Haar
               | features, facial feature detection relied on active shape
               | and active appearance models (AAMs), with those iconic
               | Delaunay triangles becoming the emblem of facial
               | recognition. SVMs were used to highlight tumors, while
               | kNNs and hand-tuned feature detectors handled tumors and
               | lesions. Dynamic programming was used to outline CTs and
               | MRIs of hearts, airways, and other structures, Hough
               | transforms were used for pupil tracking, HOG features
               | were popular for face, car, and body detectors, and
               | Gaussian models & Hidden Markov Models were standard in
               | speech recognition. I remember seeing a few papers
               | attempting to stick a 3-layer NN on the outputs of AAMs
               | with limited success.
               | 
               | The Yann LeCun paper felt like a breakthrough to me. It
               | seemed biologically plausible, given what I knew of the
               | Neocognitron and the visual cortex, and the shared
               | weights of the kernels provided a way to build deep
               | models beyond one or two hidden layers.
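                | 
                | To make the weight-sharing point concrete, a rough
                | back-of-the-envelope (numbers are illustrative, not from
                | the paper):
                | 
                |   # params to map a 32x32 image to 16 feature maps:
                |   # fully connected layer vs. shared 5x5 conv kernels
                |   h = w = 32
                |   fc = (h * w) * (h * w * 16)   # ~16.8M weights
                |   conv = 5 * 5 * 1 * 16 + 16    # 416 (incl. biases)
                |   print(fc, conv)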
               | 
                | At the time, I felt like Cassandra, going around to past
                | colleagues and computer vision companies in the region,
                | trying to convey to them just how much of a game changer
                | that paper was.
        
       | Jun8 wrote:
        | This is fascinating, and I think it is one of the major areas
        | where new AI systems will impact humanity, i.e., by combing
        | through millions of papers to make connections and discover such
        | sleeping beauties.
       | 
        | BTW, I noticed a similar phenomenon with HN submissions (on much
        | shorter timescales): sometimes they just sit for a few hours with
        | 2-3 points and then shoot up.
        
       | leoc wrote:
       | It's not the case that Bell etc. were simply overlooked: the
       | whole question of experimental tests of interpretations of
       | quantum mechanics was actively stigmatised and avoided by
       | physicists until well into the '70s, at least. Clauser couldn't
       | get a job in the area. https://arxiv.org/abs/physics/0508180
        
       | QuesnayJr wrote:
       | An interesting example of someone who managed to produce two
       | unrelated "sleeping beauties" in different fields is the German
       | mathematician Grete Hermann. Her Ph.D. thesis in the 20s gave
       | effective algorithms for many questions in abstract algebra. I
       | think her motivation was philosophical, that an effective
       | algorithm is better than an abstract existence result, but it
       | wasn't considered that interesting of a question until computers
       | were invented and computer algebra developed, and then
       | immediately several of her algorithms became the state-of-the-art
       | at the time.
       | 
        | Unrelatedly, she wrote on the foundations of quantum mechanics
        | and showed, three years after he published it, that a "theorem"
        | of John von Neumann, which would have ruled out later research by
        | Bohm and Bell if it were correct, was false. Bohm and Bell had to
        | independently rediscover that the result was false years later.
        
       | dang wrote:
       | Related. Others?
       | 
       |  _The World Is Full of Sleeping Beauties_ -
       | https://news.ycombinator.com/item?id=35975866 - May 2023 (1
       | comment)
        
       | whatshisface wrote:
       | This would not happen as often if professors had time to read,
       | instead of being under pressure to write. The only external
       | incentive to read is so that you won't be turned down for lack of
       | novelty, and relative to that metric a paper the field does not
       | remember is better left nonexistent, and forgetting is as good as
       | creating a future discovery.
       | 
       | In an environment where the only effect of going around
       | popularizing information from the previous decade is interfering
       | with other people's careers, it is no wonder that it does not
       | happen. How did we end up with an academic system functioning as
       | the only institution in the world with a reward for ignorance?
        
         | psb217 wrote:
         | I figure a reasonable rule of thumb is that if someone got to
         | the top of some system by maximizing some metric X, where X is
         | the main metric of merit in that system, then they're unlikely
         | to push for the system to prefer some other metric Y, even if Y
         | is more aligned with the stated goals of the system. Pushing
         | for a shift from X-based merit to Y-based merit would
         | potentially imply that they're no longer sufficiently
         | meritorious to rule the system.
         | 
         | To your last point, I think a lot of systems reward ignorance
          | in one way or another. E.g., plausible deniability, the
          | appearance of good intent, and all other sorts of crap that can
          | be exploited by the unscrupulous.
        
           | godelski wrote:
            | While that's true, there are always exceptions, so I wouldn't
            | say this with a defeated attitude. But I think it is _also_
            | important to recognize that you can never measure things
            | directly; it is always a proxy, meaning there's always a
            | difference between what you measure and your actual goals.
            | And I don't think it is just entrenched people who don't want
            | to recognize this. The truth is that this means problems are
            | much more complex and uncertain than we'd like. There is
            | actually good reason people believe the simple story: the
            | person selling it has clear evidence for their case, while a
            | truth that amounts to "be careful" or "maintain skepticism"
            | is not only less exciting, it is, by nature, more abstract.
           | 
           | Despite this, I think getting people to recognize that
           | measures are proxies, that they are things that must be
           | interpreted rather than read, is a powerful force when it
            | comes to fixing these issues. After all, even if you remove
            | those who are entrenched and change the metrics, you'll later
            | end up with entrenchment again. This isn't all bad, as time to
           | entrenchment matters, but we should try to make that take as
           | long as possible and try to fix things before entrenchment
           | happens. It's much easier to maintain a clean house than to
           | clean a dirty one. It's the small subtle things that add up
           | and compound.
        
         | mistermann wrote:
          | World-class guerrilla marketing might have something to do with
          | it; it is arguably the most adored institution in existence.
         | 
          | If you're the best, resting on your laurels is not an uncommon
          | consequence.
        
         | godelski wrote:
         | > so that you won't be turned down for lack of novelty
         | 
          | I think this is also a reason for lots of fraud. It can be
          | flat-out fraud, it can be subtle exaggeration because you know,
          | or have a VERY good hunch, that something is true but can't
          | prove it or don't have the resources to prove it (but will if
          | you get this work through), or it can be the far more common
          | obfuscation. The latter happens a lot because if something is
          | easy to understand, it is far more likely to be seen as not
          | novel, and if communicated too well it may even be viewed as
          | obvious or trivial. It does not matter if no one else has done
          | it or how many people/papers you quote that claim the opposite.
         | 
          | On top of this, novelty scales extremely poorly. As we
          | progress, what is novel becomes more subtle, and the more ideas
          | we see, the easier it is to relate one idea to another.
         | 
         | But I think the most important part is that the entire
         | foundation of science is replication. So why do we have a
         | system that not only does not reward the most important thing,
         | but actively discourages it? You cannot confirm results by
         | reading a paper (though you can invalidate by reading). You can
          | only confirm results by repeating. But I think the secret is
          | that you're almost always going to learn something new, though
          | the information gain decreases with the number of replications.
         | 
         | We have a very poor incentive system which in general relies
         | upon people acting in good faith. It is a very hard system to
         | solve but the biggest error is to not admit that it is a noisy
         | process. Structures can only be held together by high morals
         | when the community is small and there is clear accountability.
         | But this doesn't hold at scale, because there are always
         | incentives to cut corners. But if you have to beat someone who
         | cuts corners it is much harder to do so without cutting more
         | corners. It's a slow death, but still death.
        
           | schmidtleonard wrote:
           | > the entire foundation of science is replication. So why do
           | we have a system that...
           | 
           | Because science is just like a software company that has
           | outgrown "DIY QA": even as the problem becomes increasingly
           | clear, nobody on the ground wants to be the one to split off
           | an "adversarial" QA team because it will make their immediate
           | circumstances significantly worse, even though it's what the
           | company needs.
           | 
           | I wouldn't extrapolate all the way to death, though. If there
           | are enough high-profile fraud busts that funding agencies
           | start to feel political heat, they will suddenly become
           | willing to fund QA. Until that point, I agree that nothing
           | will happen and the problem will get steadily worse until it
           | does.
        
             | godelski wrote:
              | I think I would say short-term rewards heavily outweigh
              | long-term rewards. This is true even when the long-term
              | rewards are much higher, and even if the time to reward is
              | not much longer than for the short-term version. Time is
              | important, but I think it is greatly overvalued.
        
       | shae wrote:
       | I wish I had access to papers for free, I'd read more and do more
       | things.
       | 
       | For example, the earliest magnetic gears papers were $25 each and
       | I needed about ten that cited each other. That's why I didn't try
        | to create a magnetic hub for cycling. At the time I thought I
       | could make a more compact geared hub, but needed the torque
       | calculations to be sure. I was a college student, my university
       | did not have access to those journals, and I had no money.
        
         | whatshisface wrote:
         | You do, they're on SciHub.
        
           | jessriedel wrote:
           | Also, ~every physics paper since 1993 is on the arXiv. The
           | same is true for math and CS with later cutoffs.
        
             | bonoboTP wrote:
             | Interesting that some communities (apparently physics) use
             | "arXiv" with the definite article ("the arXiv"), but in
             | machine learning / CS we always say simply "arXiv". I went
             | and checked, and the official site doesn't use an article
             | (https://info.arxiv.org/about/index.html)
        
       | fuzzfactor wrote:
       | Tip, meet iceberg.
       | 
       | Science is like music, _most of it_ is never recorded to begin
       | with.
       | 
       | Much less achieves widespread popularity.
       | 
        | When you restrict it to academic journals, the real treasure
        | trove cannot even be partially contained in the vessel you are
        | searching within.
        
       | m3kw9 wrote:
       | Just looking at the headline, I was expecting to see a few 10s
       | after the link.
        
       | BenFranklin100 wrote:
        | I'm skeptical of LLMs' ability to reason, but trawling through
       | the vast research literature is an area where they can shine.
       | They can both summarize and serve as a superior search engine.
        
       ___________________________________________________________________
       (page generated 2024-10-21 23:00 UTC)