Understanding how we synchronize our actions with stimuli from different sensory modalities is central to establishing how we interact with our multisensory environment. Recent research has shown better performance with multisensory than with unisensory stimuli; however, the stimuli used have mainly been auditory and tactile. The aim of this article was to expand our understanding of sensorimotor synchronization with multisensory audio-visual stimuli and to compare these findings with their individual unisensory counterparts. This research also aimed to assess the role of spatio-temporal structure for each sensory modality. The visual and/or auditory stimuli carried either temporal or spatio-temporal information and were presented to participants in unimodal and bimodal conditions. Overall, performance was significantly better in the bimodal than in the unimodal conditions; however, this benefit was limited to only one of the bimodal conditions. Among the unimodal conditions, synchronization was better with visual than with auditory stimuli, and while a benefit was observed for the spatio-temporal over the temporal visual stimulus, this was not replicated with the auditory stimulus.