Posts by thserra@sigmoid.social
(DIR) Post #AUZdcDYJ1RBNsysWKO by thserra@sigmoid.social
2023-04-12T11:04:35Z
0 likes, 0 repeats
We revisit the theory of linear regions of fully-connected neural networks with ReLU activations. These networks model piecewise linear functions, each piece corresponding to a different activation pattern; the inputs associated with each piece are the linear regions. 2/N
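To make the idea concrete, here is a minimal numpy sketch (not from the paper; the toy layer sizes and sampling are my own) that labels inputs by which neurons fire and counts the distinct activation patterns a single layer produces:

```python
# The activation pattern of a ReLU layer is the 0/1 vector saying which neurons
# fire; inputs that share a pattern lie in the same linear region.
import numpy as np

rng = np.random.default_rng(0)
W, b = rng.standard_normal((5, 2)), rng.standard_normal(5)  # toy layer: R^2 -> R^5

def activation_pattern(x):
    return tuple((W @ x + b > 0).astype(int))

# Sample the input square and count the distinct patterns (= regions hit by the samples).
samples = rng.uniform(-1, 1, size=(10_000, 2))
patterns = {activation_pattern(x) for x in samples}
print(len(patterns), "distinct activation patterns found among the samples")
```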
(DIR) Post #AUZdgJOVFCGPA8FHJQ by thserra@sigmoid.social
2023-04-12T11:05:19Z
0 likes, 0 repeats
Geometrically, each neuron partitions its inputs into half-spaces in which the neuron is active or not. The maximum number of regions produced by the arrangement depends on the dimension of the space being partitioned. That's all you need for shallow networks. 3/N
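For reference, the classical hyperplane-arrangement bound behind this statement (a known result, not specific to this paper) can be computed directly; the small helper below assumes the hyperplanes are in general position:

```python
# n hyperplanes in general position split R^d into sum_{j=0}^{d} C(n, j) regions,
# which is what a single hidden layer with n ReLU neurons can reach on a d-dim input.
from math import comb

def max_regions(n_neurons: int, input_dim: int) -> int:
    return sum(comb(n_neurons, j) for j in range(input_dim + 1))

print(max_regions(5, 2))   # 16 regions for 5 neurons on a 2-D input
print(max_regions(5, 10))  # 32 = 2^5 once input dimension is no longer the bottleneck
```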
(DIR) Post #AUZdknmCE4OmY7T7q4 by thserra@sigmoid.social
2023-04-12T11:06:08Z
0 likes, 0 repeats
In prior work with Christian Tjandraatmadja and @srikumarRam, we have seen that the number of neurons active in each linear region affects the dimension of the output, and therefore how much it can be partitioned by the next layer. 🔗 https://arxiv.org/abs/1711.02114 4/N
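A rough illustration of that point, with a hypothetical activation pattern chosen by hand rather than taken from the paper: within one linear region the layer acts linearly, so the dimension of its image there is the rank of the pattern-masked weight matrix, which is at most the number of active neurons.

```python
# Within a region the layer maps x -> diag(pattern) @ (W x + b), so the dimension of
# its image is the rank of diag(pattern) @ W -- at most the number of active neurons,
# and it is this dimension that the next layer gets to partition.
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((5, 3))
pattern = np.array([1, 0, 1, 0, 0])  # hypothetical region: only neurons 0 and 2 active
rank_here = np.linalg.matrix_rank(np.diag(pattern) @ W)
print(rank_here)  # <= pattern.sum() = 2 active neurons
```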
(DIR) Post #AUZdqQZ9R0KkIj38SW by thserra@sigmoid.social
2023-04-12T11:07:09Z
0 likes, 0 repeats
In this new work, we consider what happens if we start sparsifying the weight matrices by pruning the neural network. Under extreme circumstances, such as leaving all but one weight per layer, it is possible but very unlikely that the dimension of the output is not affected. 5/N
(DIR) Post #AUZdwxu4Oqi7Qsx3nE by thserra@sigmoid.social
2023-04-12T11:08:20Z
0 likes, 0 repeats
We account for the effect that sparsifying the weight matrices has on their rank when deciding how to prune, so that we can perhaps prune one layer more than another. We apply this to magnitude pruning, one of the simplest yet most effective pruning strategies. 6/N
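As a hedged sketch of the baseline being built on, magnitude pruning fits in a few lines; the function name and sparsity levels below are illustrative, and the rank check only hints at the rank-aware layer budgets developed in the paper:

```python
# Magnitude pruning: zero out the smallest-magnitude weights, then inspect how much
# rank each sparsified matrix retains.
import numpy as np

def magnitude_prune(W: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of entries with the smallest |weight|."""
    k = int(sparsity * W.size)
    if k == 0:
        return W.copy()
    threshold = np.sort(np.abs(W), axis=None)[k - 1]
    return np.where(np.abs(W) > threshold, W, 0.0)

rng = np.random.default_rng(2)
W = rng.standard_normal((8, 8))
for s in (0.5, 0.9, 0.98):
    Ws = magnitude_prune(W, s)
    print(f"sparsity {s:.2f}: rank {np.linalg.matrix_rank(Ws)}")
```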
(DIR) Post #AUZdzvvsBTTyBlD69I by thserra@sigmoid.social
2023-04-12T11:08:52Z
0 likes, 0 repeats
Curiously, our upper bound on the expected number of linear regions is a better proxy for accuracy after pruning than actually counting the linear regions. In retrospect, that makes sense given that the upper bound tells us about the potential of the architecture. 7/N
(DIR) Post #AUZe4WP6Wfc2Sf4S8m by thserra@sigmoid.social
2023-04-12T11:09:42Z
0 likes, 0 repeats
By judiciously choosing how much to prune from each layer to prevent the upper bound from dropping as much as before, we observe a considerable gain in accuracy. I believe this is one of the first applications of theoretical results on the number of linear regions. 8/N
(DIR) Post #AUZe8cDsphYjfPm4um by thserra@sigmoid.social
2023-04-12T11:10:26Z
0 likes, 0 repeats
If you would like to hear more about it, this paper is on tour:
📅 April 13: AI4OPT Seminar at Georgia Tech
📅 April 19: US Naval Academy
📅 April 26: Syracuse University
📅 May 5: SNN workshop (ICLR)
📅 May 29-June 1: CPAIOR
9/9
(DIR) Post #AVH6HvlTpX3kLTU0iO by thserra@sigmoid.social
2023-05-03T10:18:26Z
0 likes, 0 repeats
In case you missed Calvin Tsay's tweet, we have just released a survey along with Gonzalo Munoz and Joey Huchette on polyhedral theory in deep learning: https://arxiv.org/abs/2305.00241 This thread covers some of the main points, why we did this, and why you should care about it. Read along!
(DIR) Post #AVH6KgAAIHoepGL7nE by thserra@sigmoid.social
2023-05-03T10:18:57Z
0 likes, 0 repeats
First, this is a topic that we, the authors, have all explored in recent years, in one way or another. Polyhedral theory can help us understand what neural networks with ReLU activations can model, and also how to train them and optimize over trained networks more efficiently. 2/N
(DIR) Post #AVH6OodmiRcUHGAOVE by thserra@sigmoid.social
2023-05-03T10:19:42Z
0 likes, 0 repeats
Second, we have learned a lot in the process of writing about neural networks from the very basics. The introduction starts from the most basic models, includes a historical perspective, and argues why we are where we are with these networks today.
(DIR) Post #AVH6TOOhiOIwKjIMQS by thserra@sigmoid.social
2023-05-03T10:20:31Z
0 likes, 0 repeats
(I recall Mike Trick saying that you do a new PhD every 5 years in academia, and indeed this is how I feel about this survey; especially because the idea was born in 2019 and developed very slowly!) 4/N
(DIR) Post #AVH6Um80yC3QDs9FLs by thserra@sigmoid.social
2023-05-03T10:20:44Z
0 likes, 0 repeats
Third, this survey allows us to go back to our own research and related scholarship and talk about them in a longer and more didactic format. For me, that means explaining the concept of linear regions and what we make of them in much more detail than in research papers. 5/N
(DIR) Post #AVH6YFWE9YAgFE87RQ by thserra@sigmoid.social
2023-05-03T10:21:22Z
0 likes, 0 repeats
We are talking about neural networks that model piecewise linear functions, and the number of these pieces can quickly grow very large. Now I had room to talk about that in a more accessible way. 6/N
(DIR) Post #AVH6be9r0AeBWwHjMW by thserra@sigmoid.social
2023-05-03T10:21:59Z
0 likes, 0 repeats
Not to mention the geometry of these pieces - or linear regions - produced by each layer of the neural network and how they affect the number of pieces that can be produced by the next layers as well. 7/N
(DIR) Post #AVH6csspjpmhURuj1U by thserra@sigmoid.social
2023-05-03T10:22:15Z
0 likes, 0 repeats
Ultimately, we can think about the union of these pieces as a disjunctive program, which is indeed one of the topics from my PhD that ended up attracting me to do theoretical work in deep learning. 8/N
(DIR) Post #AVH6hZLDp4o0KTQUXA by thserra@sigmoid.social
2023-05-03T10:23:05Z
0 likes, 0 repeats
Conveniently, that takes us to the next topic of the survey: optimization over a trained neural network. This has many relevant applications, including robustness against adversarial attacks (nicely illustrated with a picture of Calvin Tsay's dog). 9/N
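For readers new to this, one standard way to embed a single ReLU neuron y = max(0, w^T x + b) in a MILP is the big-M encoding below, assuming finite bounds L <= w^T x + b <= U with L < 0 < U; the survey discusses considerably stronger formulations.

```latex
% Big-M MILP encoding of y = max(0, w^T x + b), given bounds L <= w^T x + b <= U.
% The binary variable z selects whether the neuron is active (z = 1) or not (z = 0).
\begin{aligned}
  & y \ge w^\top x + b, && y \ge 0, \\
  & y \le w^\top x + b - L\,(1 - z), && y \le U z, \\
  & z \in \{0, 1\}.
\end{aligned}
```

With z = 1 the constraints force y = w^T x + b >= 0; with z = 0 they force y = 0 and w^T x + b <= 0, so the two branches of the ReLU are recovered exactly.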
(DIR) Post #AVH6lhfymBWJKz6gQi by thserra@sigmoid.social
2023-05-03T10:23:47Z
0 likes, 0 repeats
Strengthening these MILP formulations is a topic to which both Joey Huchette and @CalvinTsay have made great contributions, which they contextualize in the survey with the current state of the field. 10/N
(DIR) Post #AVH6r4aqzuHbrWGnAm by thserra@sigmoid.social
2023-05-03T10:24:48Z
0 likes, 0 repeats
Up to this point, the survey is about ways in which discrete optimization and related theoretical tools can complement deep learning. In the last stretch, Gonzalo Munoz discusses at length how we may (and sometimes should) train neural networks using discrete optimization! 11/N
(DIR) Post #AVH6sGSaMZdexIyEYi by thserra@sigmoid.social
2023-05-03T10:25:01Z
0 likes, 0 repeats
This was a long and ambitious undertaking, and it is nevertheless very likely that we have missed important work beyond the 329 references we cite. Hence, your feedback to any of us is deeply appreciated! 12/12