| dc.description.abstract | Multimodal recommendation systems (MMRS) aim to capture user preferences accurately by integrating users’ historical interaction behaviors with the rich multimodal features of recommended items. Prior research has primarily focused on enriching item-side representations by embedding modality features into item vectors. However, user-side modeling has remained underexplored: existing methods typically treat each modality as a monolithic entity and fail to capture the nuanced structure of user interests within modalities, which limits the model’s ability to represent intricate user preferences. To address this challenge, we propose a novel framework named USER (User-Side modality representation Enhancement for multimodal Recommendation). Specifically, our approach constructs a unified cross-modal preference representation that captures users’ co-perception behaviors across modalities. Building upon this representation, we propose a fine-grained preference mining module that extracts users’ preferences at a finer granularity and selectively emphasizes the most relevant preference factors for each modality at the token level, thereby refining the unified cross-modal preference representation to be more discriminative and modality-aware. Extensive experiments on three real-world datasets show that USER achieves notable improvements, with performance gains of 3.24%, 5.76%, and 7.08% on these datasets, respectively, underscoring the effectiveness of USER in enhancing user-side modality representation within multimodal recommendation systems. The source code and data are available at https://github.com/brave-child/USER | es |