Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/113613
View/Download Full Text
DC Field | Value | Language
dc.contributor | Department of Electrical and Electronic Engineering | en_US
dc.creator | Guo, H | en_US
dc.creator | Chan, YH | en_US
dc.creator | Law, NF | en_US
dc.date.accessioned | 2025-06-16T00:36:48Z | -
dc.date.available | 2025-06-16T00:36:48Z | -
dc.identifier.isbn | 979-8-3503-6733-1 | en_US
dc.identifier.uri | http://hdl.handle.net/10397/113613 | -
dc.description | 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 3-6 Dec. 2024, Macau, China | en_US
dc.language.iso | en | en_US
dc.publisher | Institute of Electrical and Electronics Engineers | en_US
dc.title | Deep learning-based intraoperative video analysis for cataract surgery instrument identification | en_US
dc.type | Conference Paper | en_US
dc.identifier.spage | 1 | en_US
dc.identifier.epage | 7 | en_US
dc.identifier.doi | 10.1109/APSIPAASC63619.2025.10848777 | en_US
dcterms.abstract | Surgical instrument detection and classification is a critical task for enhancing the monitoring of surgical procedures, assisting surgical operations, supporting medical education, and enabling the development of intelligent surgical systems. The domain poses several challenges, however. The foremost is the impact of varying background conditions; another is class imbalance, which can lead to biased classification results. To address these challenges, this study proposes a deep learning-based system consisting of two key components: an attention region detection module and a ResNet50 classification model. The attention region detection module employs an optical flow-based method that incorporates both temporal and spatial information from the surgical video, so that critical attention regions covering the surgical instruments are identified. Our experimental results show that classification accuracy improves from 58.7% to 81.9% when the attention region detection component is used. To deal with class imbalance, we use focal loss together with an interleaved sampling strategy. Interleaved sampling exploits both the spatial and temporal information of surgical videos to balance the number of samples across instrument classes; scarce instrument classes are thereby expanded, which prevents the model from learning a biased decision rule. The validation accuracy on the balanced dataset reaches 87.1%. This study demonstrates the effectiveness of deep learning techniques in addressing challenges in cataract surgery video analysis. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Macau, Macao, 2024, p. 1-7, https://doi.org/10.1109/APSIPAASC63619.2025.10848777 | en_US
dcterms.issued | 2024 | -
dc.relation.conference | Asia-Pacific Signal and Information Processing Association Annual Summit and Conference [APSIPA ASC] | en_US
dc.description.validate | 202506 bcch | en_US
dc.description.oa | Accepted Manuscript | en_US
dc.identifier.FolderNumber | a3693 | -
dc.identifier.SubFormID | 50742 | -
dc.description.fundingSource | Self-funded | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | Green (AAM) | en_US
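The abstract above describes an optical flow-based attention region detection step that locates the instrument region before classification. The following is a minimal sketch of that idea, assuming OpenCV's Farneback dense optical flow and a simple motion-magnitude threshold; the paper's actual module is not specified in this record, so the function name, threshold value, and fallback behaviour are illustrative assumptions.

```python
# Minimal sketch of optical-flow-based attention-region detection.
# Assumes OpenCV's Farneback dense flow; the paper's exact method,
# threshold, and post-processing are not given in this record.
import cv2
import numpy as np

def attention_region(prev_frame: np.ndarray, frame: np.ndarray,
                     mag_thresh: float = 1.0) -> tuple[int, int, int, int]:
    """Return an (x, y, w, h) box around the strongest motion, which in
    cataract videos typically covers the moving surgical instrument."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Dense flow between consecutive frames (temporal information).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    mask = (mag > mag_thresh).astype(np.uint8)   # pixels with significant motion
    if mask.sum() == 0:                          # static frame: fall back to full image
        h, w = gray.shape
        return 0, 0, w, h
    # Spatial information: tight box around the moving pixels.
    return cv2.boundingRect(mask)
```

The returned box would then crop the frame before it is passed to the classifier, removing background clutter, which is the mechanism behind the reported accuracy gain from 58.7% to 81.9%.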
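The abstract also names focal loss as one remedy for class imbalance. Below is a minimal PyTorch sketch of the standard multi-class focal loss of Lin et al. (2017); the alpha and gamma values are common defaults, not values reported in this record.

```python
# Minimal focal-loss sketch in PyTorch, using the standard formulation;
# hyperparameters alpha and gamma are assumed defaults.
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Down-weights well-classified examples so that rare instrument
    classes contribute more to the gradient."""
    ce = F.cross_entropy(logits, targets, reduction="none")  # -log p_t per sample
    p_t = torch.exp(-ce)                                     # confidence for the true class
    loss = alpha * (1.0 - p_t) ** gamma * ce                 # modulating factor (1 - p_t)^gamma
    return loss.mean()

# Example: logits from a ResNet50 head over 10 hypothetical instrument classes.
logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))
print(focal_loss(logits, targets))
```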
Appears in Collections: Conference Paper
Files in This Item:
File | Description | Size | Format
Guo_Deep_Learning-based_Intraoperative.pdf | Pre-Published version | 2.63 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Final Accepted Manuscript
Access: View full-text via PolyU eLinks SFX Query
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.