Author: Priya Balasubramanian
Published in: Journal of Science Technology and Research (Volume 6, Issue 1)
Immersive Multimedia Intelligence AI
The goal of immersive multimedia is no longer just to display content; it is to engage, interpret, and respond to user interactions in real time. Technologies such as 360° video offer panoramic
visual experiences that simulate real-world environments, while spatial audio adds directional
soundscapes that align with a user’s head movements and position. XR technologies further
blur the boundary between the physical and digital worlds, enabling users to manipulate and
explore multimedia elements in three-dimensional, context-aware settings. However, achieving
seamless interaction, personalization, and adaptability in such environments presents a
significant computational and cognitive challenge—one that can be effectively addressed
through intelligent systems powered by AI.
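To make the spatial-audio idea concrete, the sketch below shows one simple way directional channel gains could track a user's head orientation. It uses a constant-power pan law over yaw only; this is an illustrative assumption, not the paper's method, and production engines typically apply full HRTF convolution with elevation and front-back cues instead.

import math

def stereo_gains(source_azimuth_deg: float, head_yaw_deg: float) -> tuple[float, float]:
    """Constant-power pan: map a source's direction relative to the
    listener's head onto left/right channel gains (yaw only; this toy
    model ignores elevation and front-back disambiguation)."""
    # Angle of the source relative to where the head is pointing.
    relative = math.radians(source_azimuth_deg - head_yaw_deg)
    # Convert to a pan position in [0, 1]: 0 = hard left, 1 = hard right.
    pan = (math.sin(relative) + 1.0) / 2.0
    # Constant-power law keeps perceived loudness stable across pan positions.
    left = math.cos(pan * math.pi / 2)
    right = math.sin(pan * math.pi / 2)
    return left, right

# A source 30 degrees to the listener's right, head facing straight ahead.
print(stereo_gains(30.0, 0.0))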
Related Works
This research investigates the application of AI and GenAI in creating, managing, and enhancing
immersive multimedia experiences. Unlike conventional multimedia systems that rely on static
content delivery, this framework incorporates AI models for context-aware media generation,
predictive interaction, and adaptive content modulation. Using deep learning and multi-modal
fusion techniques, the proposed system interprets user behavior, preferences, and
environmental inputs to deliver customized, real-time multimedia experiences. For example, a
360° learning module can adjust the complexity of visual and auditory elements based on the
learner’s focus and pace, while a VR-based collaboration tool can dynamically reconfigure
virtual spaces for optimal engagement.
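As a minimal sketch of the adaptive 360° learning module described above, the hypothetical function below steps content complexity up or down from two signals, a focus score (e.g., from gaze tracking) and a pace estimate. The thresholds, signal names, and level scale are all illustrative assumptions; the paper does not specify its modulation policy.

from dataclasses import dataclass

@dataclass
class LearnerState:
    focus: float   # 0.0 (distracted) .. 1.0 (fully attentive), e.g. from gaze tracking
    pace: float    # completed items per minute (assumed metric)

def adjust_complexity(level: int, state: LearnerState,
                      min_level: int = 1, max_level: int = 5) -> int:
    """Step the module's visual/auditory complexity up or down
    based on the learner's attention and progress."""
    if state.focus > 0.7 and state.pace > 1.0:
        level += 1          # engaged and moving quickly: enrich the content
    elif state.focus < 0.4 or state.pace < 0.3:
        level -= 1          # signs of overload or disengagement: simplify
    return max(min_level, min(max_level, level))

# An attentive, fast-moving learner at level 3 is stepped up to level 4.
print(adjust_complexity(3, LearnerState(focus=0.85, pace=1.4)))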
At the heart of this study lies the integration of Generative AI models, such as Transformer-
based architectures and diffusion models, which enable the automatic creation of realistic
environments, avatars, narratives, and ambient effects. When combined with spatial computing
and computer vision algorithms, these models empower multimedia systems to become
intelligent and generative rather than merely reactive. Additionally, natural language
processing (NLP) and AI-based emotion recognition components are introduced to refine user-system interaction, making communication more fluid and contextually appropriate.
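One common way to combine such NLP and emotion-recognition signals is late fusion of modality embeddings. The sketch below shows a simple weighted average of two normalized vectors; the paper does not state its fusion method, so the weighting scheme, dimensionality, and toy inputs here are assumptions for illustration only.

import numpy as np

def late_fusion(text_emb: np.ndarray, vision_emb: np.ndarray,
                w_text: float = 0.5) -> np.ndarray:
    """Weighted late fusion of two modality embeddings (assumes both
    are L2-normalized and share the same dimensionality)."""
    fused = w_text * text_emb + (1.0 - w_text) * vision_emb
    return fused / np.linalg.norm(fused)

# Toy 4-d vectors standing in for NLP and facial-emotion embeddings.
text = np.array([0.9, 0.1, 0.0, 0.4]); text /= np.linalg.norm(text)
face = np.array([0.2, 0.8, 0.3, 0.1]); face /= np.linalg.norm(face)
print(late_fusion(text, face))

A more elaborate system might learn the fusion weights jointly with the downstream task (e.g., via an attention layer over modalities) rather than fixing them by hand.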