The Death of Feature Engineering is Greatly Exaggerated

December 11, 2021 By Pete Warden in Uncategorized

Image by OfSmallThings

One of the most exciting aspects of deep learning's emergence in computer vision a few years ago was that it didn't appear to require any feature engineering, unlike previous techniques like histograms-of-gradients or Haar cascades. As neural networks ate up other fields like NLP and speech, the hope was that feature engineering would become unnecessary for those domains too. At first I fully bought into this idea, and saw any remaining manually-engineered feature pipelines as legacy code that would soon be subsumed by more advanced models. Over the last few years of working with product teams to deploy models in production I've realized I was wrong. I'm not the first person to raise this idea, but I have some thoughts I haven't seen widely discussed on exactly why feature engineering isn't going away anytime soon.

One of them is that even the original vision case actually does rely on a *lot* of feature engineering, we just haven't been paying attention. Here's a quote from a typical blog post discussing image models:

"a deep learning system is a fully trainable system beginning from *raw input*, for example image pixels" (Emphasis added by me)

I spent over a decade working on graphics and image processing, so the implicit assumption that the kinds of images we train networks on are at all "raw" always bothered me a bit. I was used to starting with truly RAW image files to preserve as much information from the original scene as possible. These formats reflect the output of the camera's CCD hardware pretty closely. This means that the values for each pixel correspond roughly linearly to the number of photons hitting the detector at that point, and the position of each measured value is actually in a Bayer pattern, rather than a simple grid of pixels.

Image from Wikipedia

So, even to get to the kind of two-dimensional array of evenly spaced pixels with RGB values that ML practitioners expect an image to contain, we have to execute some kind of algorithm to resample the original values. There are deep learning approaches to this problem, but it's clear that this is an important preprocessing step, and one that I'd argue should count as feature engineering.

There's a whole world of other transformations like this that have to be performed before we get what we'd normally recognize as an image. These include some very complex and somewhat arbitrary transformations like white balancing, which everyday camera users might only become aware of during an apocalypse. There are also steps like gamma correction, which take the high dynamic ranges possible for the CCD output values (which reflect photon counts) and scale them into numbers which more closely resemble the human eye's response curve. Put very simplistically, we can see small differences in dark areas with much more sensitivity than differences in bright parts, so to represent images in an eight-bit byte it's convenient to apply a gamma curve so that more of the codes are used for darker values.
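To make that chain of hidden preprocessing concrete, here's a minimal sketch of the kind of pipeline that sits between a Bayer-pattern sensor readout and the RGB image a model actually trains on. It's purely illustrative: the function names, the fixed white balance gains, and the crude box-filter interpolation are my own assumptions for this example, not what any real camera firmware does.

```python
import numpy as np
from scipy.signal import convolve2d

def demosaic_bilinear(raw):
    """Crudely interpolate an RGGB Bayer mosaic into a full RGB image.
    `raw` is a 2D array of linear sensor values in [0, 1]."""
    h, w = raw.shape
    y, x = np.mgrid[0:h, 0:w]
    r_mask = (y % 2 == 0) & (x % 2 == 0)   # R at even rows, even cols
    b_mask = (y % 2 == 1) & (x % 2 == 1)   # B at odd rows, odd cols
    g_mask = ~(r_mask | b_mask)            # G everywhere else
    rgb = np.zeros((h, w, 3), dtype=np.float32)
    kernel = np.ones((3, 3), dtype=np.float32)
    for c, mask in enumerate([r_mask, g_mask, b_mask]):
        samples = np.where(mask, raw, 0.0)
        # Average the known same-color samples in each 3x3 neighborhood.
        num = convolve2d(samples, kernel, mode="same")
        den = convolve2d(mask.astype(np.float32), kernel, mode="same")
        rgb[..., c] = np.where(mask, raw, num / np.maximum(den, 1e-6))
    return rgb

def white_balance(rgb, r_gain=2.0, b_gain=1.5):
    # Per-channel gains; real cameras estimate these from the scene,
    # the values here are made up for illustration.
    balanced = rgb * np.array([r_gain, 1.0, b_gain], dtype=np.float32)
    return np.clip(balanced, 0.0, 1.0)

def gamma_encode(rgb, gamma=2.2):
    # Compress the linear photon counts so more of the 8-bit codes go to
    # darker values, roughly matching the eye's response curve.
    return np.power(rgb, 1.0 / gamma)

# "Raw" sensor readout -> the kind of uint8 image an ML model usually sees.
raw = np.random.rand(8, 8).astype(np.float32)  # stand-in for a CCD readout
image = (gamma_encode(white_balance(demosaic_bilinear(raw))) * 255).astype(np.uint8)
```

None of these steps are learned; they're choices originally made for human viewers, which is exactly why I'd count them as feature engineering.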
I don't want this to turn into an image processing tutorial, but I hope that these examples illustrate that there's a lot of engineering happening before ML models get an image. I've come to think of these steps as feature engineering for the human visual system, and see deep learning as piggy-backing on all this work without realizing it. It makes intuitive sense to me that models benefit from the kinds of transformations that help us recognize objects in the world too. My instinct is that gamma correction makes it a lot easier to spot things in natural scenes, because you'd hope that the differences between two materials would remain roughly constant regardless of lighting conditions, and scaling the values keeps the offsets between the colors from varying as widely as they would with the raw measurements. I can easily believe that neural networks benefit from this property just like we do.

If you accept that there is a lot of hidden feature engineering happening behind the scenes even for the classic vision models, what does this mean for other applications of deep networks? My experience has been that it's important to think explicitly about feature engineering when designing models, and if you believe your inputs are raw, it's worth doing a deep dive to understand what's really happening before you get your data.

For example, I've been working with a team that's using accelerometer and gyroscope data to interpret gestures. They were getting good results in their application, but thanks to supply-chain problems they had to change the IMU they were using. It turned out that the original part included sensor fusion to produce estimates of the device's absolute orientation, and that's what they were feeding into the network. Other parts had different fusion algorithms which didn't work as well, and even trying software fusion wasn't effective. Problems included significant lag in responding to movement and biases that sent the orientation estimate way off over time. We switched the model to using the unfused accelerometer and gyroscope values, and were able to get back a lot of the accuracy we'd lost.

In this case, deep learning did manage to eat that part of the feature engineering pipeline, but because we didn't have a good understanding of what was happening to our input data before we started, we ended up spending extra time dealing with problems that could have been more easily handled in the design and prototyping phase. Also, I don't have deep knowledge of accelerometer hardware, but I wouldn't be at all surprised if the "raw" values we're now using have actually been through some significant processing.

Another area where feature engineering has surprised me with its usefulness is labeling and debugging data problems. When I was working on building a more reliable magic wand gesture model, I was getting very frustrated with my inability to tell if the training data I was capturing from people was good enough. Just staring at six curves of the accelerometer and gyroscope X, Y, Z values over time wasn't enough for me to tell if somebody had actually performed the expected gesture or not. I thought about trying to record video of the contributors, but that seemed like a lot to ask. Instead, I put some work into reconstructing the absolute position and movement from the "raw" values. This effectively became an extremely poor man's version of sensor fusion, but focused on the needs of this particular application.
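For readers who haven't run into sensor fusion before, here's a rough sketch of the sort of thing an IMU (or a software library) might be doing before it hands you an "orientation" value. It's a toy complementary filter, not the algorithm in any of the parts we tried, and the sample rate and blending constant are made up for illustration:

```python
import numpy as np

def complementary_filter(accel, gyro, dt=0.01, alpha=0.98):
    """Toy orientation estimate (roll, pitch in radians) from IMU samples.

    accel, gyro: arrays of shape (N, 3), accelerations and angular rates.
    Integrating the gyroscope alone drifts over time; the accelerometer's
    gravity vector is noisy but drift-free, so blend the two.
    """
    roll, pitch = 0.0, 0.0
    estimates = []
    for (ax, ay, az), (gx, gy, gz) in zip(accel, gyro):
        # Tilt implied by the gravity direction (only valid when the
        # device isn't accelerating much).
        accel_roll = np.arctan2(ay, az)
        accel_pitch = np.arctan2(-ax, np.sqrt(ay * ay + az * az))
        # Blend the integrated gyro rates with the accelerometer estimate.
        roll = alpha * (roll + gx * dt) + (1.0 - alpha) * accel_roll
        pitch = alpha * (pitch + gy * dt) + (1.0 - alpha) * accel_pitch
        estimates.append((roll, pitch))
    return np.array(estimates)

# Example: one second of synthetic samples at 100 Hz, device lying flat.
accel = np.tile([0.0, 0.0, 1.0], (100, 1))
gyro = np.zeros((100, 3))
angles = complementary_filter(accel, gyro)
```

Even this toy version hints at where the lag and drift we saw can creep in: the blending constant trades responsiveness to movement against how quickly gyro bias pulls the estimate off.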
I was not only able to visualize the data to check its quality, I also started feeding the rendered results into the model itself, improving the accuracy. It had the side-benefit that I could display an intuitive visualization of the gesture as seen by the model back to the user, so that they could gain an understanding of why it failed to recognize some attempts and learn to adapt their movements to be clearer from the model's perspective!

Image from the Colab notebook

I don't want to minimize deep learning's achievements in reducing the toil involved in building feature pipelines; I'm still constantly amazed at how effective these models are. I would like to see more emphasis put on feature engineering in research and teaching though, since it's still an important issue that practitioners have to wrestle with to successfully deploy ML applications. I'm hoping this post will at least spark some curiosity about where your data has really been before you get it!
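As a postscript for anyone curious about the wand visualization above, here's a minimal sketch of how a reconstructed trace could be rasterized into something that doubles as a debugging view and a model input. The `rasterize_trace` helper is hypothetical code for illustration, not what's in my Colab notebook:

```python
import numpy as np

def rasterize_trace(xy, size=32):
    """Render a reconstructed 2D gesture trace as a small grayscale image.

    xy: array of shape (N, 2) holding estimated wand-tip positions.
    Returns a (size, size) float image that a person can eyeball and an
    image-style model can take as input.
    """
    img = np.zeros((size, size), dtype=np.float32)
    # Normalize the trace to fill the image, guarding against a zero range.
    mins, maxs = xy.min(axis=0), xy.max(axis=0)
    span = np.maximum(maxs - mins, 1e-6)
    scaled = (xy - mins) / span * (size - 1)
    for x, y in scaled:
        img[int(round(y)), int(round(x))] = 1.0
    return img

# Example: a circular "O" gesture rendered into a 32x32 input image.
t = np.linspace(0, 2 * np.pi, 200)
trace = np.stack([np.cos(t), np.sin(t)], axis=1)
image = rasterize_trace(trace)
```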