https://judyye.github.io/ihoi/

What's in your hands? 3D Reconstruction of Generic Objects in Hands
CVPR 2022
---------------------------------------------------------------------
Yufei Ye, Abhinav Gupta, Shubham Tulsiani
Carnegie Mellon University, Meta AI

Paper | Colab Notebook | Code | Slides

[teaser]
---------------------------------------------------------------------
Our work aims to reconstruct hand-held objects from a single RGB image. In contrast to prior works that typically assume a known 3D template and reduce the problem to 3D pose estimation, our work reconstructs generic hand-held objects without knowing their 3D templates. Our key insight is that hand articulation is highly predictive of the object shape, and we propose an approach that conditionally reconstructs the object based on the articulation and the visual input. Given an image depicting a hand-held object, we first use off-the-shelf systems to estimate the underlying hand pose and then infer the object shape in a normalized hand-centric coordinate frame. We parameterize the object by its signed distance field, inferred by an implicit network that combines pixel-aligned visual features with articulation-aware coordinates of each query point. We perform experiments across three datasets and show that our method consistently outperforms baselines and is able to reconstruct a diverse set of objects. We analyze the benefits and robustness of explicit articulation conditioning and also show that it allows the hand pose estimate to be further improved by test-time optimization.

Video
---------------------------------------------------------------------

Comparison with Baselines
View more results on: ObMan | HO3D | MOW
---------------------------------------------------------------------
Our approach differs from the two baselines (HO [Hasson et al., 2019], GF [Karunratanakul et al., 2020]) in three main aspects: 1) we formulate hand-held object reconstruction as conditional inference; 2) we additionally use pixel-aligned local features; 3) we predict the object in a normalized wrist frame with an articulation-aware positional encoding.

[Qualitative comparisons: Input | Hasson et al., 2019 | Karunratanakul et al., 2020 | Our method]

Zero-shot Cross-dataset Generalization
More Results
---------------------------------------------------------------------
We directly evaluate models trained only on the ObMan or MOW dataset and report their reconstruction results on the HO3D dataset. Even without finetuning, both models still outperform baselines trained on HO3D. Interestingly, although MOW provides only 350 training images, far fewer than the 21K images of the synthetic ObMan dataset, training on MOW still helps cross-dataset generalization. This indicates the importance of diverse, in-the-wild training data.

[Cross-dataset results: Input | Ground Truth | Trained on ObMan | Trained on MOW]

Ablations on Hand Articulation
More Results
---------------------------------------------------------------------
We ablate reconstruction with and without explicitly conditioning on hand pose. The comparison suggests that hand articulation provides cues complementary to the visual input and helps alleviate depth ambiguity (see the sketch below).

[Ablation results: Input | Ground Truth | W/O Articulation | Our Method]
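To make the conditioning concrete, here is a minimal PyTorch sketch of an articulation-conditioned implicit network. This is not the released ihoi code: the module name, feature dimensions, and the number of conditioning joints are illustrative assumptions. It shows the two ingredients described above, pixel-aligned features sampled at each query point's image projection and articulation-aware coordinates obtained by expressing the point in the local frame of each hand joint, feeding a small MLP that outputs a signed distance.

```python
# Minimal sketch of articulation-conditioned signed-distance prediction.
# NOT the released ihoi code; names, dimensions, and joint count are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ArticulationConditionedSDF(nn.Module):
    """Signed distance of query points, conditioned on pixel-aligned image
    features and articulation-aware (per-joint) coordinates."""

    def __init__(self, feat_dim=64, num_joints=16, hidden=256):
        super().__init__()
        point_dim = 3 + 3 * num_joints  # wrist-frame point + its coords in every joint frame
        self.mlp = nn.Sequential(
            nn.Linear(point_dim + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # signed distance
        )

    def forward(self, points, feat_map, uv, joint_transforms):
        """
        points:           (B, N, 3)    queries in the normalized wrist frame
        feat_map:         (B, C, H, W) feature map from an image encoder
        uv:               (B, N, 2)    image projections of the queries, in [-1, 1]
        joint_transforms: (B, J, 4, 4) wrist-frame -> joint-frame rigid transforms
        """
        B, N, _ = points.shape
        J = joint_transforms.shape[1]

        # Pixel-aligned features: bilinearly sample the feature map at each projection.
        feats = F.grid_sample(feat_map, uv.unsqueeze(2), align_corners=True)  # (B, C, N, 1)
        feats = feats.squeeze(-1).transpose(1, 2)                             # (B, N, C)

        # Articulation-aware coordinates: express each point in every joint frame.
        points_h = F.pad(points, (0, 1), value=1.0)                           # (B, N, 4)
        local = torch.einsum("bjrc,bnc->bjnr", joint_transforms, points_h)    # (B, J, N, 4)
        local = local[..., :3].permute(0, 2, 1, 3).reshape(B, N, 3 * J)       # (B, N, 3J)

        return self.mlp(torch.cat([points, local, feats], dim=-1))            # (B, N, 1)


if __name__ == "__main__":
    # Toy shapes only, to show how the pieces fit together.
    B, N, J, C = 2, 512, 16, 64
    net = ArticulationConditionedSDF(feat_dim=C, num_joints=J)
    sdf = net(points=torch.randn(B, N, 3),
              feat_map=torch.randn(B, C, 32, 32),
              uv=torch.rand(B, N, 2) * 2 - 1,
              joint_transforms=torch.eye(4).repeat(B, J, 1, 1))
    print(sdf.shape)  # torch.Size([2, 512, 1])
```

In this sketch, the W/O Articulation ablation above corresponds to dropping the per-joint `local` coordinates from the concatenation, leaving the network to rely on visual features alone.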
Test-Time Refinement
More Results
---------------------------------------------------------------------
At test time, we optimize the hand articulation parameters to encourage contact and discourage intersection, constraints that are naturally incorporated with an SDF representation. This test-time optimization increases the physical plausibility of the reconstruction.

[Refinement results: Input | Before Refinement | After Refinement]

Paper
[thumbnail]

Bibtex
---------------------------------------------------------------------
@inproceedings{ye2022hand,
  author    = {Ye, Yufei and Gupta, Abhinav and Tulsiani, Shubham},
  title     = {What's in your hands? 3D Reconstruction of Generic Objects in Hands},
  booktitle = {CVPR},
  year      = {2022}
}
---------------------------------------------------------------------
Send feedback and questions to Yufei Ye. The website template is borrowed from SIREN.