Post AUZe4WP6Wfc2Sf4S8m by thserra@sigmoid.social
(DIR) Post #AUZdULraEqdrLS91cm by thserra@sigmoid.social
2023-04-12T11:03:09Z
0 likes, 0 repeats
📝 Getting Away with More Network Pruning: From Sparsity to Geometry and Linear Regions
In this #orms paper accepted at #cpaior2023, we extended and applied the theory of linear regions to better prune neural networks.
🔗 https://arxiv.org/abs/2301.07966
(DIR) Post #AUZdcDYJ1RBNsysWKO by thserra@sigmoid.social
2023-04-12T11:04:35Z
0 likes, 0 repeats
We revisit the theory of linear regions of fully-connected neural networks with ReLU activations. These networks model piecewise linear functions, each piece corresponding to a different activation pattern; the inputs associated with each piece are the linear regions. 2/N
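To make the definition concrete, here is a small sketch of mine (not from the paper; the layer sizes and weights are arbitrary): inputs that induce the same ReLU activation pattern lie in the same linear region, so counting distinct patterns over sampled inputs gives an empirical lower bound on the number of regions.

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy 2-layer fully-connected ReLU network with made-up dimensions.
    W1, b1 = rng.standard_normal((8, 2)), rng.standard_normal(8)
    W2, b2 = rng.standard_normal((8, 8)), rng.standard_normal(8)

    def activation_pattern(x):
        # Record which neurons are active (preactivation > 0) at each layer.
        h1 = W1 @ x + b1
        h2 = W2 @ np.maximum(h1, 0.0) + b2
        return tuple(h1 > 0) + tuple(h2 > 0)

    # Each distinct pattern among the sampled inputs witnesses a distinct
    # linear region, so this count is an empirical lower bound.
    patterns = {activation_pattern(rng.uniform(-1, 1, 2)) for _ in range(50000)}
    print("distinct activation patterns found:", len(patterns))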
(DIR) Post #AUZdgJOVFCGPA8FHJQ by thserra@sigmoid.social
2023-04-12T11:05:19Z
0 likes, 0 repeats
Geometrically, each neuron partitions its inputs into half-spaces in which the neuron is active or not. The maximum number of regions produced by the arrangement depends on the dimension of the space partitioned. That's all you need for shallow networks. 3/N
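As a reference point (this is the classic hyperplane-arrangement bound, not a result specific to the paper): n hyperplanes in general position in R^d split the space into at most sum_{j=0}^{d} C(n, j) regions, which is the bound for a shallow ReLU layer with n neurons on a d-dimensional input. A tiny sketch:

    from math import comb

    def max_regions_shallow(n_neurons, input_dim):
        # Hyperplanes in general position in R^d create at most
        # sum_{j=0}^{d} C(n, j) regions.
        return sum(comb(n_neurons, j) for j in range(input_dim + 1))

    print(max_regions_shallow(8, 2))  # 1 + 8 + 28 = 37 regions at most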
(DIR) Post #AUZdknmCE4OmY7T7q4 by thserra@sigmoid.social
2023-04-12T11:06:08Z
0 likes, 0 repeats
In prior work with Christian Tjandraatmadja and @srikumarRam, we have seen that the number of neurons active in each linear region affects the dimension of the output, and therefore how much it can be partitioned by the next layer. 🔗 https://arxiv.org/abs/1711.02114 4/N
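One way to see why (my illustration with made-up matrices and a hypothetical activation pattern, not the paper's argument): restricted to one region, the network acts as the linear map W2·D·W1, where the diagonal matrix D keeps only the active neurons; the image of that region therefore has dimension at most the number of active neurons, which caps how finely the next layer can partition it.

    import numpy as np

    rng = np.random.default_rng(1)
    W1 = rng.standard_normal((8, 5))  # layer 1: 5 inputs -> 8 neurons
    W2 = rng.standard_normal((6, 8))  # layer 2: 8 -> 6

    # Hypothetical activation pattern for one region: 4 of 8 neurons active.
    active = np.array([1, 1, 0, 1, 0, 0, 1, 0])
    D = np.diag(active)

    # The local linear map on that region; its rank is at most active.sum().
    local_map = W2 @ D @ W1
    print(np.linalg.matrix_rank(local_map))  # prints 4 here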
(DIR) Post #AUZdqQZ9R0KkIj38SW by thserra@sigmoid.social
2023-04-12T11:07:09Z
0 likes, 0 repeats
In this new work, we consider what happens if we start sparsifying the weight matrices by pruning the neural network. Under extreme circumstances, such as removing all but one weight per layer, it is possible but very unlikely that the dimension of the output is not affected. 5/N
(DIR) Post #AUZdwxu4Oqi7Qsx3nE by thserra@sigmoid.social
2023-04-12T11:08:20Z
0 likes, 0 repeats
We account for this effect of sparsification on the rank of the weight matrices to decide how to prune, so that we can perhaps prune one layer more than another. We apply that to magnitude pruning, one of the simplest yet very effective pruning strategies. 6/N
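For readers who haven't seen it, here is a minimal sketch of plain magnitude pruning (my own illustration of the baseline, not the paper's layer-allocation method; the sparsity level and matrix are arbitrary): zero out the fraction of weights with the smallest absolute value, then check what happened to the rank the earlier posts care about.

    import numpy as np

    def magnitude_prune(weights, sparsity):
        # Zero out the given fraction of weights with the smallest magnitude.
        w = weights.copy()
        k = int(sparsity * w.size)
        if k > 0:
            threshold = np.partition(np.abs(w), k - 1, axis=None)[k - 1]
            w[np.abs(w) <= threshold] = 0.0
        return w

    rng = np.random.default_rng(2)
    W = rng.standard_normal((8, 8))
    W_pruned = magnitude_prune(W, 0.7)
    print("nonzeros:", np.count_nonzero(W_pruned),
          "rank:", np.linalg.matrix_rank(W_pruned))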
(DIR) Post #AUZdzvvsBTTyBlD69I by thserra@sigmoid.social
2023-04-12T11:08:52Z
0 likes, 0 repeats
Curiously, our upper bound on the expected number of linear regions is a better proxy for accuracy after pruning than actually counting the linear regions. In retrospect, that makes sense given that the upper bound tells us about the potential of the architecture. 7/N
(DIR) Post #AUZe4WP6Wfc2Sf4S8m by thserra@sigmoid.social
2023-04-12T11:09:42Z
0 likes, 0 repeats
By judiciously choosing how much to prune from each layer to prevent the upper bound from dropping as much as it otherwise would, we observe a considerable gain in accuracy. I believe this is one of the first applications of theoretical results on the number of linear regions. 8/N