ISSN : 2663-2187

Cross-Lingual Visual Understanding: A Transformer-Based Approach for Bilingual Image Caption Generation


Emran Al-Buraihy, Dan Wang
doi: 10.48047/AFJBS.6.8.2024.2990-3002

Abstract

In the evolving landscape of artificial intelligence, automatically generating image captions that are not only accurate but also culturally and linguistically nuanced remains a significant challenge, especially across languages as different as Arabic and English. This research addresses the gap in bilingual image captioning by developing a transformer-based model designed to handle cultural and linguistic diversity effectively. The proposed model combines Convolutional Neural Networks (CNNs) for visual feature extraction with a dual-language transformer architecture that incorporates a novel cultural context embedding layer, enabling the generation of culturally sensitive and linguistically accurate captions. The model was trained and evaluated on a curated dataset of culturally diverse images annotated in both target languages, and it outperformed existing models: it achieved CIDEr scores of 60.2 for English and 58.7 for Arabic, underscoring its efficacy in generating contextually and culturally coherent captions. This study not only advances multilingual image captioning but also sets a new standard for integrating cultural sensitivity into AI, with significant implications for future applications in global digital content accessibility.
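The pipeline described above, CNN visual features conditioned on a per-language cultural context embedding before caption decoding, might be sketched as follows. All names, dimensions, and the fusion-by-addition operator are illustrative assumptions; the abstract does not specify the authors' implementation, so this is a minimal stand-in, not their architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 512  # shared feature dimension (assumed, not stated in the paper)
LANGUAGES = {"en": 0, "ar": 1}

# Hypothetical stand-in for the paper's cultural context embedding layer:
# one learned vector per target language.
cultural_embeddings = rng.normal(size=(len(LANGUAGES), D))

# Fixed random projection standing in for the CNN's final layer (assumption).
projection = rng.normal(size=(3, D))

def extract_visual_features(image: np.ndarray) -> np.ndarray:
    """Stand-in for the CNN encoder: global-average-pool the pixel
    channels and project into the shared D-dimensional space."""
    pooled = image.reshape(-1, image.shape[-1]).mean(axis=0)  # (C,)
    return pooled @ projection                                # (D,)

def fuse_with_cultural_context(visual: np.ndarray, lang: str) -> np.ndarray:
    """Condition visual features on the target language by adding its
    cultural-context embedding (one plausible fusion operator; the
    paper does not specify which is used)."""
    return visual + cultural_embeddings[LANGUAGES[lang]]

# The fused vector would then be fed to the language-specific
# transformer decoder to produce the caption.
image = rng.random((224, 224, 3))
visual = extract_visual_features(image)
features_en = fuse_with_cultural_context(visual, "en")
features_ar = fuse_with_cultural_context(visual, "ar")
```

Because the two languages share the visual encoder but receive distinct context embeddings, the decoder can produce captions tailored to each language from a single image representation.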
