Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/114025
DC Field: Value [Language]
dc.contributor: Department of Chinese and Bilingual Studies [en_US]
dc.creator: Winata, GI [en_US]
dc.creator: Hudi, F [en_US]
dc.creator: Irawan, PA [en_US]
dc.creator: Anugraha, D [en_US]
dc.creator: Putri, RA [en_US]
dc.creator: Wang, Y [en_US]
dc.creator: Nohejl, A [en_US]
dc.creator: Prathama, UA [en_US]
dc.creator: Ousidhoum, N [en_US]
dc.creator: Amriani, A [en_US]
dc.creator: Rzayev, A [en_US]
dc.creator: Das, A [en_US]
dc.creator: Pramodya, A [en_US]
dc.creator: Adila, A [en_US]
dc.creator: Wilie, B [en_US]
dc.creator: Mawalim, CO [en_US]
dc.creator: Lam, CL [en_US]
dc.creator: Abolade, D [en_US]
dc.creator: Chersoni, E [en_US]
dc.creator: Santus, E [en_US]
dc.creator: Ikhwantri, F [en_US]
dc.creator: Kuwanto, G [en_US]
dc.creator: Zhao, H [en_US]
dc.creator: Wibowo, HA [en_US]
dc.creator: Lovenia, H [en_US]
dc.creator: Cruz, JCB [en_US]
dc.creator: Putra, JWG [en_US]
dc.creator: Myung, J [en_US]
dc.creator: Susanto, L [en_US]
dc.creator: Machin, MAR [en_US]
dc.creator: Zhukova, M [en_US]
dc.creator: Anugraha, M [en_US]
dc.creator: Adilazuarda, MF [en_US]
dc.creator: Santosa, N [en_US]
dc.creator: Limkonchotiwat, P [en_US]
dc.creator: Dabre, R [en_US]
dc.creator: Audino, RA [en_US]
dc.creator: Cahyawijaya, S [en_US]
dc.creator: Zhang, SX [en_US]
dc.creator: Salim, SY [en_US]
dc.creator: Zhou, Y [en_US]
dc.creator: Gui, Y [en_US]
dc.creator: Adelani, DI [en_US]
dc.creator: Lee, EA [en_US]
dc.creator: Okada, S [en_US]
dc.creator: Purwarianti, A [en_US]
dc.creator: Aji, A [en_US]
dc.creator: Watanabe, T [en_US]
dc.creator: Wijaya, DT [en_US]
dc.creator: Oh, A [en_US]
dc.creator: Ngo, CW [en_US]
dc.date.accessioned: 2025-07-10T01:31:45Z
dc.date.available: 2025-07-10T01:31:45Z
dc.identifier.isbn: 979-8-89176-189-6 [en_US]
dc.identifier.uri: http://hdl.handle.net/10397/114025
dc.description: 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, Albuquerque, New Mexico, April 29–May 4, 2025 [en_US]
dc.language.iso: en [en_US]
dc.publisher: Association for Computational Linguistics [en_US]
dc.rights: ©2025 Association for Computational Linguistics [en_US]
dc.rights: ACL materials are Copyright © 1963–2025 ACL; other materials are copyrighted by their respective copyright holders. Materials prior to 2016 here are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License. [en_US]
dc.rights: The following publication Winata, G. I., Hudi, F., Irawan, P. A., Anugraha, D., Putri, R. A., Wang, Y., ... & Ngo, C. W. 2025. WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 3242–3264, Albuquerque, New Mexico. Association for Computational Linguistics is available at https://doi.org/10.18653/v1/2025.naacl-long.167. [en_US]
dc.title: WorldCuisines: a massive-scale benchmark for multilingual and multicultural visual question answering on global cuisines [en_US]
dc.type: Conference Paper [en_US]
dc.identifier.spage: 3242 [en_US]
dc.identifier.epage: 3264 [en_US]
dc.identifier.doi: 10.18653/v1/2025.naacl-long.167 [en_US]
dcterms.abstract: Vision Language Models (VLMs) often struggle with culture-specific knowledge, particularly in languages other than English and in underrepresented cultural contexts. To evaluate their understanding of such knowledge, we introduce WorldCuisines, a massive-scale benchmark for multilingual and multicultural, visually grounded language understanding. This benchmark includes a visual question answering (VQA) dataset with text-image pairs across 30 languages and dialects, spanning 9 language families and featuring over 1 million data points, making it the largest multicultural VQA benchmark to date. It includes tasks for identifying dish names and their origins. We provide evaluation datasets in two sizes (12k and 60k instances) alongside a training dataset (1 million instances). Our findings show that while VLMs perform better with correct location context, they struggle with adversarial contexts and predicting specific regional cuisines and languages. To support future research, we release a knowledge base with annotated food entries and images along with the VQA data. [en_US]
dcterms.accessRights: open access [en_US]
dcterms.bibliographicCitation: In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (v. 1: Long Papers), p. 3242–3264. Albuquerque, New Mexico: Association for Computational Linguistics, 2025 [en_US]
dcterms.issued: 2025
dc.relation.ispartofbook: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) [en_US]
dc.relation.conference: Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics [en_US]
dc.description.validate: 202507 bcwh [en_US]
dc.description.oa: Version of Record [en_US]
dc.identifier.FolderNumber: a3877
dc.identifier.SubFormID: 51501
dc.description.fundingSource: Self-funded [en_US]
dc.description.pubStatus: Published [en_US]
dc.description.oaCategory: CC [en_US]
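
The abstract above describes concrete released artifacts: a VQA dataset of text-image pairs in 30 languages and dialects, evaluation sets in two sizes (12k and 60k instances), and a 1M-instance training set. As a minimal sketch of how such a release is typically consumed, the Python snippet below loads and inspects the data with the Hugging Face `datasets` library. The repository ID `worldcuisines/vqa`, the split name `test_small`, and the column names are assumptions for illustration only; they are not confirmed by this record, so consult the paper's official release page for the actual identifiers.

    # Minimal sketch: inspecting a multilingual VQA release with the Hugging
    # Face `datasets` library. NOTE: the repo ID "worldcuisines/vqa", the
    # split name, and the column names are assumptions for illustration.
    from datasets import load_dataset

    # The paper ships two evaluation sizes (12k and 60k instances) plus a
    # 1M-instance training set; "test_small" is a hypothetical split name.
    vqa = load_dataset("worldcuisines/vqa", split="test_small")

    print(f"{len(vqa)} instances loaded")

    example = vqa[0]
    # Each instance pairs an image with a question and answer in one of the
    # 30 languages/dialects; field names ("lang", "question", "answer") are
    # illustrative, not taken from this record.
    print(example.get("lang"), "|", example.get("question"), "->", example.get("answer"))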
Appears in Collections: Conference Paper
Files in This Item:
File: 2025.naacl-long.167.pdf (5.79 MB, Adobe PDF)
Open Access Information
Status: open access
File Version: Version of Record
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.