Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/117478
Title: Cross-lingual keyword extraction for pesticide terminology in Brazilian Portuguese and English
Authors: de Souza, JV
Amamou, H
Chen, R 
Salari, E
Gubelmann, R
Niklaus, C
Serpa, T
de Freitas Lima, MM
Pinto, PT
Kshirsagar, S
Davoust, A
Handschuh, S
Avila, AR
Issue Date: 17-Jan-2025
Source: Journal of the Brazilian Computer Society, 17 Jan. 2025, v. 31, no. 1, p. 972-989
Abstract: Agriculture plays a crucial role in Brazil's economy. As the country intensifies its activities in the sector, the use of pesticides also increases. Hence, the risks associated with consuming pesticide-laden food have become a concern for chemistry researchers. One issue affecting the regulatory standardization of pesticides in Brazil is the difficulty of translating pesticide names, particularly from English. For example, the word malathion can be translated from English to Portuguese as malatiom or malatião, resulting in inconsistent labeling. This issue extends to the broader problem of translating highly technical terms between languages, in particular for low-resource languages. In this work, we investigate terminological variation in the chemistry of organophosphorus pesticides. Our goal is to study strategies for domain-specific multilingual keyword extraction. To that end, two corpora were built from pesticide-related scientific documents in Brazilian Portuguese and English, comprising 84 and 210 texts, respectively, and representing the low- and high-resource languages in this study. We then assessed six methods for keyword extraction: Simple Maths, TF-IDF, YAKE, TextRank, MultipartiteRank, and KeyBERT. We relied on a multilingual contextual BERT embedding to retrieve corresponding pesticide names in the target language. Fine-tuning was also explored to further improve the multilingual representation. Moreover, we evaluated the use of large language models (LLMs) combined with the recent retrieval-augmented generation (RAG) framework. We found that the contextual approach, combined with fine-tuning, provided the best results, contributing to better pesticide terminology extraction in a multilingual scenario.
Keywords: BERT embeddings
Multilingual extraction
Pesticides
Word alignment
Publisher: SpringerOpen
Journal: Journal of the Brazilian Computer Society 
ISSN: 0104-6500
EISSN: 1678-4804
DOI: 10.5753/jbcs.2025.5815
Rights: Copyright (c) 2025 José Victor de Souza, Hazem Amamou, Rubing Chen, Elmira Salari, Reto Gubelmann, Christina Niklaus, Talita Serpa, Marcela Marques de Freitas Lima, Paula Tavares Pinto, Shruti Kshirsagar, Alan Davoust, Siegfried Handschuh, Anderson Raymundo Avila
This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).
The following publication de Souza, J. V., Amamou, H., Chen, R., Salari, E., Gubelmann, R., Niklaus, C., Serpa, T., Lima, M. M. de F., Pinto, P. T., Kshirsagar, S., Davoust, A., Handschuh, S., & Avila, A. R. (2025). Cross-Lingual Keyword Extraction for Pesticide Terminology in Brazilian Portuguese and English. Journal of the Brazilian Computer Society, 31(1), 972-989 is available at https://doi.org/10.5753/jbcs.2025.5815.
Appears in Collections: Journal/Magazine Article

Files in This Item:
File: 5815-Article Text-32536-1-10-20251009.pdf
Size: 1.08 MB
Format: Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record