Collective behavior refers to the group-level organization that arises from interactions among individuals. To exchange information, individuals rely on their sensing modalities. Experimental studies provide evidence that integrating information from multiple sensory modalities can influence animal navigation and social communication. In this paper, we present a modified Vicsek model with a composite sensing scheme that combines auditory and visual cues through the set of sensory neighbors. Using numerical simulations, we investigate the combined effect of auditory and visual sensing on group behavior and compare it with pure vision and pure audition. We observe that, by exploiting the composite modality, the particles gain access to more information, which enables them to form a single large, cohesive, and perfectly aligned group with a narrow sensing region; either vision or audition alone achieves this only with a wider sensing region.
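To make the setup concrete, the following is a minimal sketch of a Vicsek-style update with a composite neighbor set. The specific rule used here, an omnidirectional auditory radius `r_aud` combined with a forward visual cone of range `r_vis` and half-angle `half_angle`, is an illustrative assumption, not necessarily the paper's exact scheme; each particle aligns with the mean heading of its composite neighbors, plus angular noise.

```python
import numpy as np

def vicsek_step(pos, theta, L, v0, eta, r_aud, r_vis, half_angle, rng):
    """One update of a Vicsek-style model with a composite (auditory +
    visual) neighbor set in a periodic box of side L.  The cone/radius
    sensing rule below is an illustrative assumption."""
    N = len(pos)
    # Pairwise displacements with periodic (minimum-image) boundaries.
    d = pos[:, None, :] - pos[None, :, :]
    d -= L * np.round(d / L)
    dist = np.linalg.norm(d, axis=-1)
    # Auditory neighbors: every particle within radius r_aud (omnidirectional).
    aud = dist < r_aud
    # Visual neighbors: within r_vis AND inside a forward cone of half_angle.
    heading = np.stack([np.cos(theta), np.sin(theta)], axis=-1)
    to_j = -d  # direction from particle i toward particle j
    cosang = np.einsum("ijk,ik->ij", to_j, heading) / np.where(dist > 0, dist, 1.0)
    vis = (dist < r_vis) & (cosang > np.cos(half_angle))
    # Composite sensing: the union of auditory and visual neighbors.
    nbr = aud | vis
    np.fill_diagonal(nbr, True)  # each particle also counts itself
    # Align with the mean heading of the composite neighbor set, plus noise.
    mean_sin = (nbr * np.sin(theta)[None, :]).sum(axis=1)
    mean_cos = (nbr * np.cos(theta)[None, :]).sum(axis=1)
    theta_new = np.arctan2(mean_sin, mean_cos)
    theta_new += eta * np.pi * (2.0 * rng.random(N) - 1.0)  # uniform angular noise
    step = v0 * np.stack([np.cos(theta_new), np.sin(theta_new)], axis=-1)
    return (pos + step) % L, theta_new

def polar_order(theta):
    """Alignment order parameter: 1 = perfectly aligned, ~0 = disordered."""
    return np.hypot(np.cos(theta).mean(), np.sin(theta).mean())
```

Iterating `vicsek_step` from random initial headings and tracking `polar_order` gives the usual diagnostic for the ordered (aligned) versus disordered phase as the noise amplitude `eta` or the sensing ranges are varied.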