The Death of Feature Engineering is Greatly Exaggerated

December 11, 2021 By Pete Warden in Uncategorized

Image by OfSmallThings

One of the most exciting aspects of deep learning's emergence in computer vision a few years ago was that it didn't appear to require any feature engineering, unlike previous techniques like histograms-of-gradients or Haar cascades. As neural networks ate up other fields like NLP and speech, the hope was that feature engineering would become unnecessary for those domains too. At first I fully bought into this idea, and saw any remaining manually-engineered feature pipelines as legacy code that would soon be subsumed by more advanced models. Over the last few years of working with product teams to deploy models in production I've realized I was wrong. I'm not the first person to raise this idea, but I have some thoughts I haven't seen widely discussed on exactly why feature engineering isn't going away anytime soon.

One of them is that even the original vision case actually does rely on a *lot* of feature engineering, we just haven't been paying attention. Here's a quote from a typical blog post discussing image models:

"a deep learning system is a fully trainable system beginning from *raw input*, for example image pixels" (Emphasis added by me)

I spent over a decade working on graphics and image processing, so the implicit assumption that the kinds of images we train networks on are at all "raw" always bothered me a bit. I was used to starting with truly RAW image files to preserve as much information from the original scene as possible. These formats reflect the output of the camera's CCD hardware pretty closely. This means that the values for each pixel correspond roughly linearly to the number of photons hitting the detector at that point, and the position of each measured value is actually in a Bayer pattern, rather than a simple grid of pixels.

Image from Wikipedia

So, even to get to the kind of two-dimensional array of evenly spaced pixels with RGB values that ML practitioners expect an image to contain, we have to execute some kind of algorithm to resample the original values. There are deep learning approaches to this problem, but it's clear that this is an important preprocessing step, and one that I'd argue should count as feature engineering.

There's a whole world of other transformations like this that have to be performed before we get what we'd normally recognize as an image. These include some very complex and somewhat arbitrary transformations like white balancing, which everyday camera users might only become aware of during an apocalypse. There are also steps like gamma correction, which take the high dynamic ranges possible for the CCD output values (which reflect photon counts) and scale them into numbers which more closely resemble the human eye's response curve. Put very simplistically, we can see small differences in dark areas with much more sensitivity than differences in bright parts, so to represent images in an eight-bit byte it's convenient to apply a gamma curve so that more of the codes are used for darker values.
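To make that chain of hidden preprocessing concrete, here's a minimal sketch of the kind of pipeline that sits between a Bayer-pattern sensor readout and the RGB image a model actually trains on. It's purely illustrative: the function names, the fixed white balance gains, and the crude box-filter interpolation are my own assumptions for this example, not what any real camera firmware does.

```python
import numpy as np
from scipy.signal import convolve2d

def demosaic_bilinear(raw):
    """Crudely interpolate an RGGB Bayer mosaic into a full RGB image.
    `raw` is a 2D array of linear sensor values in [0, 1]."""
    h, w = raw.shape
    y, x = np.mgrid[0:h, 0:w]
    r_mask = (y % 2 == 0) & (x % 2 == 0)   # R at even rows, even cols
    b_mask = (y % 2 == 1) & (x % 2 == 1)   # B at odd rows, odd cols
    g_mask = ~(r_mask | b_mask)            # G everywhere else
    rgb = np.zeros((h, w, 3), dtype=np.float32)
    kernel = np.ones((3, 3), dtype=np.float32)
    for c, mask in enumerate([r_mask, g_mask, b_mask]):
        samples = np.where(mask, raw, 0.0)
        # Average the known same-color samples in each 3x3 neighborhood.
        num = convolve2d(samples, kernel, mode="same")
        den = convolve2d(mask.astype(np.float32), kernel, mode="same")
        rgb[..., c] = np.where(mask, raw, num / np.maximum(den, 1e-6))
    return rgb

def white_balance(rgb, r_gain=2.0, b_gain=1.5):
    # Per-channel gains; real cameras estimate these from the scene,
    # the values here are made up for illustration.
    balanced = rgb * np.array([r_gain, 1.0, b_gain], dtype=np.float32)
    return np.clip(balanced, 0.0, 1.0)

def gamma_encode(rgb, gamma=2.2):
    # Compress the linear photon counts so more of the 8-bit codes go to
    # darker values, roughly matching the eye's response curve.
    return np.power(rgb, 1.0 / gamma)

# "Raw" sensor readout -> the kind of uint8 image an ML model usually sees.
raw = np.random.rand(8, 8).astype(np.float32)  # stand-in for a CCD readout
image = (gamma_encode(white_balance(demosaic_bilinear(raw))) * 255).astype(np.uint8)
```

None of these steps are learned; they're choices originally made for human viewers, which is exactly why I'd count them as feature engineering.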
I don't want this to turn into an image processing tutorial, but I hope that these examples illustrate that there's a lot of engineering happening before ML models get an image. I've come to think of these steps as feature engineering for the human visual system, and see deep learning as piggy-backing on all this work without realizing it. It makes intuitive sense to me that models benefit from the kinds of transformations that help us recognize objects in the world too. My instinct is that gamma correction makes it a lot easier to spot things in natural scenes, because you'd hope that the differences between two materials would remain roughly constant regardless of lighting conditions, and scaling the values keeps the offsets between the colors from varying as widely as they would with the raw measurements. I can easily believe that neural networks benefit from this property just like we do.

If you accept that there is a lot of hidden feature engineering happening behind the scenes even for the classic vision models, what does this mean for other applications of deep networks? My experience has been that it's important to think explicitly about feature engineering when designing models, and if you believe your inputs are raw, it's worth doing a deep dive to understand what's really happening before you get your data.

For example, I've been working with a team that's using accelerometer and gyroscope data to interpret gestures. They were getting good results in their application, but thanks to supply-chain problems they had to change the IMU they were using. It turned out that the original part included sensor fusion to produce estimates of the device's absolute orientation, and that's what they were feeding into the network. Other parts had different fusion algorithms which didn't work as well, and even trying software fusion wasn't effective. Problems included significant lag in responding to movement and biases that sent the orientation estimate way off over time. We switched the model to using the unfused accelerometer and gyroscope values, and were able to get back a lot of the accuracy we'd lost.

In this case, deep learning did manage to eat that part of the feature engineering pipeline, but because we didn't have a good understanding of what was happening to our input data before we started, we ended up spending extra time dealing with problems that could have been more easily handled in the design and prototyping phase. Also, I don't have deep knowledge of accelerometer hardware, but I wouldn't be at all surprised if the "raw" values we're now using have actually been through some significant processing.

Another area where feature engineering has surprised me with its usefulness is labeling and debugging data problems. When I was working on building a more reliable magic wand gesture model, I was getting very frustrated with my inability to tell if the training data I was capturing from people was good enough. Just staring at six curves of the accelerometer and gyroscope X, Y, Z values over time wasn't enough for me to tell if somebody had actually performed the expected gesture or not. I thought about trying to record video of the contributors, but that seemed like a lot to ask. Instead, I put some work into reconstructing the absolute position and movement from the "raw" values. This effectively became an extremely poor man's version of sensor fusion, but focused on the needs of this particular application.
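For readers who haven't run into sensor fusion before, here's a rough sketch of the sort of thing an IMU (or a software library) might be doing before it hands you an "orientation" value. It's a toy complementary filter, not the algorithm in any of the parts we tried, and the sample rate and blending constant are made up for illustration:

```python
import numpy as np

def complementary_filter(accel, gyro, dt=0.01, alpha=0.98):
    """Toy orientation estimate (roll, pitch in radians) from IMU samples.

    accel, gyro: arrays of shape (N, 3), accelerations and angular rates.
    Integrating the gyroscope alone drifts over time; the accelerometer's
    gravity vector is noisy but drift-free, so blend the two.
    """
    roll, pitch = 0.0, 0.0
    estimates = []
    for (ax, ay, az), (gx, gy, gz) in zip(accel, gyro):
        # Tilt implied by the gravity direction (only valid when the
        # device isn't accelerating much).
        accel_roll = np.arctan2(ay, az)
        accel_pitch = np.arctan2(-ax, np.sqrt(ay * ay + az * az))
        # Blend the integrated gyro rates with the accelerometer estimate.
        roll = alpha * (roll + gx * dt) + (1.0 - alpha) * accel_roll
        pitch = alpha * (pitch + gy * dt) + (1.0 - alpha) * accel_pitch
        estimates.append((roll, pitch))
    return np.array(estimates)

# Example: one second of synthetic samples at 100 Hz, device lying flat.
accel = np.tile([0.0, 0.0, 1.0], (100, 1))
gyro = np.zeros((100, 3))
angles = complementary_filter(accel, gyro)
```

Even this toy version hints at where the lag and drift we saw can creep in: the blending constant trades responsiveness to movement against how quickly gyro bias pulls the estimate off.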
I was not only able to visualize the data to check its quality, I also started feeding the rendered results into the model itself, improving the accuracy. It had the side-benefit that I could display an intuitive visualization of the gesture as seen by the model back to the user, so that they could gain an understanding of why it failed to recognize some attempts and learn to adapt their movements to be clearer from the model's perspective!

Image from the Colab notebook

I don't want to minimize deep learning's achievements in reducing the toil involved in building feature pipelines; I'm still constantly amazed at how effective these models are. I would like to see more emphasis put on feature engineering in research and teaching though, since it's still an important issue that practitioners have to wrestle with to successfully deploy ML applications. I'm hoping this post will at least spark some curiosity about where your data has really been before you get it!
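As a postscript for anyone curious about the wand visualization above, here's a minimal sketch of how a reconstructed trace could be rasterized into something that doubles as a debugging view and a model input. The `rasterize_trace` helper is hypothetical code for illustration, not what's in my Colab notebook:

```python
import numpy as np

def rasterize_trace(xy, size=32):
    """Render a reconstructed 2D gesture trace as a small grayscale image.

    xy: array of shape (N, 2) holding estimated wand-tip positions.
    Returns a (size, size) float image that a person can eyeball and an
    image-style model can take as input.
    """
    img = np.zeros((size, size), dtype=np.float32)
    # Normalize the trace to fill the image, guarding against a zero range.
    mins, maxs = xy.min(axis=0), xy.max(axis=0)
    span = np.maximum(maxs - mins, 1e-6)
    scaled = (xy - mins) / span * (size - 1)
    for x, y in scaled:
        img[int(round(y)), int(round(x))] = 1.0
    return img

# Example: a circular "O" gesture rendered into a 32x32 input image.
t = np.linspace(0, 2 * np.pi, 200)
trace = np.stack([np.cos(t), np.sin(t)], axis=1)
image = rasterize_trace(trace)
```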