Action anticipation from multimodal data