Disentangling Grasp-Object Representations in the Latent Space: Toward Brain-Like Affordances for Machines

Di Nuovo A. (last author)
2026-01-01

Abstract

Humans perceive the world through their bodies. The theory of object affordances suggests that, when we encounter an object, our brain encodes it not only by its physical properties but also by how we intend to use it. Decades of foundational research in neuroscience indicate that object properties are associated with distinct regions of the sensorimotor cortex, depending on the grasp type they tend to activate. In this study, we trained a Conditional Variational Autoencoder (CVAE) on the HO-3D_v3 dataset to reconstruct hand poses conditioned on object properties. Principal Component Analysis (PCA), clustering, and visualization of the model’s latent space revealed structured patterns in the abstract representation of the hand, distinctly organized according to the associated objects. This bears a notable resemblance to the neural strategies observed in the human sensorimotor cortex for representing object-grasp relationships, and it supports the notion that artificial intelligence systems can develop brain-like latent representations of object affordances. In the future, such representations could significantly enhance robotic control by enabling real-time motor planning for high-degree-of-freedom humanoid hand actions in an abstract latent space, bypassing low-level pixel- and joint-level computations.
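The abstract describes the pipeline only at a high level. What follows is a minimal NumPy sketch, under our own assumptions, of the two ingredients it names: a CVAE whose encoder and decoder both receive an object-condition vector, scored with the usual negative ELBO (reconstruction error plus KL divergence to a standard normal prior), and a post-hoc PCA and k-means analysis of the latent means. Everything concrete here is an assumption for illustration rather than the paper's method: the sizes (a 63-D pose vector such as 21 hand joints × 3 coordinates, a hypothetical 8-D object descriptor, a 16-D latent), the single-hidden-layer tanh networks, the random untrained weights, and the synthetic data standing in for HO-3D_v3. Training (gradient descent on this loss, typically with an autodiff framework) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (assumptions, not taken from the paper).
POSE_DIM = 63    # e.g. 21 hand joints x 3 coordinates
COND_DIM = 8     # hypothetical object-property descriptor
LATENT_DIM = 16
HIDDEN = 64


def init_layer(n_in, n_out):
    """Random weights for one dense layer (untrained, for illustration only)."""
    return rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out)), np.zeros(n_out)


# Encoder q(z | x, c) and decoder p(x | z, c): both see the condition c.
W_enc, b_enc = init_layer(POSE_DIM + COND_DIM, HIDDEN)
W_mu, b_mu = init_layer(HIDDEN, LATENT_DIM)
W_lv, b_lv = init_layer(HIDDEN, LATENT_DIM)
W_dec, b_dec = init_layer(LATENT_DIM + COND_DIM, HIDDEN)
W_out, b_out = init_layer(HIDDEN, POSE_DIM)


def encode(x, c):
    h = np.tanh(np.concatenate([x, c], axis=1) @ W_enc + b_enc)
    return h @ W_mu + b_mu, h @ W_lv + b_lv  # mean, log-variance


def decode(z, c):
    h = np.tanh(np.concatenate([z, c], axis=1) @ W_dec + b_dec)
    return h @ W_out + b_out


def cvae_loss(x, c):
    """Negative ELBO: squared-error reconstruction + KL(q(z|x,c) || N(0, I))."""
    mu, logvar = encode(x, c)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps  # reparameterization trick
    x_hat = decode(z, c)
    recon = np.sum((x - x_hat) ** 2, axis=1)
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1)
    return np.mean(recon + kl), mu, kl


def pca(Z, k=2):
    """Project centred latent codes onto their top-k principal components."""
    Zc = Z - Z.mean(axis=0)
    _, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # fraction of variance per component
    return Zc @ Vt[:k].T, explained[:k]


def kmeans(X, k, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


# Synthetic stand-in data: 3 hypothetical objects, 100 poses each.
n_obj, per_obj = 3, 100
c = np.repeat(np.eye(n_obj, COND_DIM), per_obj, axis=0)  # one-hot-like object codes
x = rng.normal(size=(n_obj * per_obj, POSE_DIM))
loss, mu, kl = cvae_loss(x, c)
Z2, explained = pca(mu, k=2)   # analyse latent means (an assumed choice)
labels = kmeans(Z2, k=n_obj)
print(f"loss={loss:.2f}  PC1+PC2 variance={explained.sum():.2f}")
```

With random weights and random poses the clusters carry no meaning; in a real analysis one would train the model first and then compare the k-means labels (or the PCA scatter, coloured by object) against the known object identity of each sample.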
2026
ISBN: 9783032074478; 9783032074485
Keywords

Conditional Variational Autoencoders (CVAEs)
Grasp Embeddings
Latent Space Analysis
Object Affordances
Principal Component Analysis

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/707272