Cross-Modal Latent Dynamics
We propose a cross-modal dynamics model that learns how proprioceptive and semantic transitions jointly evolve under actions. The two modalities are coupled by asymmetric cross-attention, in which the semantic transition is interpreted through a cue derived from the proprioceptive transition.
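To make the asymmetry concrete, the following is a minimal single-head sketch, not the model's actual implementation. It makes several assumptions that are not stated in the text: a transition is taken to be the difference between consecutive latents, the semantic transition supplies the queries, the proprioceptive transition supplies the keys and values, and a residual connection is used. Action conditioning is omitted, and all names (`asymmetric_cross_attention`, `sem_delta`, `prop_delta`, `w_q`, `w_k`, `w_v`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def asymmetric_cross_attention(sem_delta, prop_delta, w_q, w_k, w_v):
    """Update a semantic transition using a proprioceptive transition cue.

    Assumed shapes (hypothetical, for illustration only):
      sem_delta:  (n_sem, d_sem)   semantic transition tokens
      prop_delta: (n_prop, d_prop) proprioceptive transition tokens
      w_q: (d_sem, d_attn), w_k: (d_prop, d_attn), w_v: (d_prop, d_sem)
    """
    q = sem_delta @ w_q    # queries come from the semantic side only
    k = prop_delta @ w_k   # keys come from the proprioceptive side only
    v = prop_delta @ w_v   # values come from the proprioceptive side only

    scores = q @ k.T / np.sqrt(q.shape[-1])   # (n_sem, n_prop)
    weights = softmax(scores, axis=-1)        # each row sums to 1

    # Residual update: only the semantic transition is modified.
    # The proprioceptive transition is read but never written,
    # which is the asymmetry assumed in this sketch.
    return sem_delta + weights @ v, weights


# Toy usage with random latents; transitions are assumed to be z_{t+1} - z_t.
rng = np.random.default_rng(0)
sem_t, sem_next = rng.normal(size=(2, 4, 8))     # 4 semantic tokens, dim 8
prop_t, prop_next = rng.normal(size=(2, 3, 6))   # 3 proprioceptive tokens, dim 6

w_q = rng.normal(size=(8, 5))
w_k = rng.normal(size=(6, 5))
w_v = rng.normal(size=(6, 8))

updated_sem, attn = asymmetric_cross_attention(
    sem_next - sem_t, prop_next - prop_t, w_q, w_k, w_v
)
```

In this sketch the attention is one-directional: information flows from the proprioceptive transition into the semantic transition, and there is no reverse pass. A symmetric variant would add a second call with the roles of the two modalities swapped.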