CROSS-INTERACTION-BASED MULTIMODAL FEATURE COMPARISON FOR MOVING OBJECT IDENTIFICATION IN CROWDED VIDEO SCENES

Begmatov Shohruh; Arabboev Mukhriddin; Nishanov Akhram

doi:10.5281/zenodo.20341226

Authors

Shohruh Begmatov, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi; Doctor of Philosophy (PhD) in Technical Sciences, Doctoral (DSc) student
Mukhriddin Arabboev, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi; Doctor of Philosophy (PhD) in Technical Sciences, Doctoral (DSc) student
Akhram Nishanov, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi; Doctor of Science in Technical Sciences, Professor

Keywords:

moving object identification, multimodal comparison, cross-interaction, object tracking, re-identification, crowded scenes.

Abstract

Identifying moving objects in crowded video scenes is difficult because appearance information alone may be unreliable. Different people or objects may look alike, while the same object may look different because of pose variation, scale changes, partial occlusion, illumination variation, or low visibility. To address this problem, this paper presents a cross-interaction-based multimodal feature comparison method for moving object identification. The proposed method represents each moving object using several complementary modalities: appearance, geometry, spatial position, context, reliability, and clothing-color features. These heterogeneous features are projected into a common latent space before comparison. For two candidate detections, modality-wise comparison features are constructed using element-wise multiplication and absolute difference. A cross-interaction function then learns relationships between modalities, and a multilayer perceptron (MLP) estimates the final similarity probability. The proposed method is especially useful in difficult cases such as occlusion, lost-track recovery, candidate ambiguity, and object reappearance. Compared with simple feature concatenation, the cross-interaction approach enables the model to learn conditional relationships across modalities and improves the reliability of moving-object identification in crowded scenes.
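
The comparison pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give feature sizes, the form of the cross-interaction function, or the MLP layout, so the dimensions in `MODALITY_DIMS`, the pairwise dot-product interaction, the single hidden layer, and all function names below are assumptions made for illustration. The weights are random and untrained, so the returned value only shows the data flow, not a meaningful similarity.

```python
import math
import random
from itertools import combinations

# Hypothetical raw feature sizes per modality (not given in the abstract).
MODALITY_DIMS = {
    "appearance": 8, "geometry": 4, "position": 2,
    "context": 6, "reliability": 1, "clothing_color": 3,
}
LATENT_DIM = 4   # assumed size of the common latent space
HIDDEN_DIM = 16  # assumed width of the MLP hidden layer


def random_matrix(rows, cols, rng):
    """Untrained placeholder weights; a real model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(weights, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def init_params(seed=0):
    rng = random.Random(seed)
    n_pairs = len(list(combinations(MODALITY_DIMS, 2)))
    mlp_in = len(MODALITY_DIMS) * 2 * LATENT_DIM + n_pairs
    return {
        "proj": {m: random_matrix(LATENT_DIM, d, rng)
                 for m, d in MODALITY_DIMS.items()},
        "hidden": random_matrix(HIDDEN_DIM, mlp_in, rng),
        "out": random_matrix(1, HIDDEN_DIM, rng),
    }


def compare_modality(proj, feat_a, feat_b):
    """Project both features, then build [product, absolute difference]."""
    z_a, z_b = matvec(proj, feat_a), matvec(proj, feat_b)
    product = [x * y for x, y in zip(z_a, z_b)]
    abs_diff = [abs(x - y) for x, y in zip(z_a, z_b)]
    return product + abs_diff


def similarity(params, det_a, det_b):
    """Estimate the probability that two detections show the same object."""
    comps = {m: compare_modality(params["proj"][m], det_a[m], det_b[m])
             for m in MODALITY_DIMS}
    # Assumed cross-interaction: one dot product per pair of modalities.
    cross = [sum(x * y for x, y in zip(comps[m], comps[n]))
             for m, n in combinations(MODALITY_DIMS, 2)]
    mlp_input = [v for m in MODALITY_DIMS for v in comps[m]] + cross
    hidden = [max(0.0, h) for h in matvec(params["hidden"], mlp_input)]  # ReLU
    return sigmoid(matvec(params["out"], hidden)[0])
```

Calling `similarity(init_params(), det_a, det_b)` with two dictionaries that map each modality name to a list of floats of the matching length returns a value between 0 and 1. Because both the element-wise product and the absolute difference are symmetric in their arguments, swapping the two detections leaves the result unchanged, which is a convenient property for a pairwise matching score.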


Innovative Academy RSC