Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/114426
Title: | A study on explainable end-to-end autonomous driving |
Authors: | Feng, Yuchao |
Degree: | Ph.D. |
Issue Date: | 2025 |
Abstract: | In recent years, end-to-end networks have emerged as a promising approach to achieving advanced autonomous driving in self-driving vehicles. Unlike modular pipelines, which divide autonomous driving into separate modules, this approach learns to drive by directly mapping raw sensory data to driving decisions (or control outputs). Compared to modular systems, end-to-end networks avoid the accumulation of errors across modules and scale better to complex scenarios. Despite these advantages, a major limitation of this approach is its lack of explainability: the outputs of end-to-end networks are generally not interpretable, making it difficult to understand why a specific input produces a given output. This limitation raises significant concerns about the safety and reliability of such systems and hinders their broader application and acceptance in real-world traffic environments. Within this context, this study develops three methods to enhance the explainability of end-to-end autonomous driving networks.

First, natural-language explanations are proposed to improve explainability. A novel explainable network, named the Natural-Language Explanation for Decision Making (NLE-DM), is designed to jointly predict driving decisions and natural-language explanations. While natural-language explanations are an effective way to explain driving decisions, they often fall short of revealing the internal processes of the network. In contrast, visual explanations can provide insight into the network's inner workings. Therefore, to further enhance explainability, we propose combining natural-language and visual explanations in a multimodal approach. An explainable end-to-end network, named Multimodal Explainable Autonomous Driving (Multimodal-XAD), is designed to jointly predict driving decisions and multimodal environment descriptions. Finally, we revisit the concept of visual explanations and introduce an innovative Bird's-Eye-View (BEV) perception method, named PolarPoint-BEV, which uses a polar coordinate-based representation to better illustrate how the network perceives spatial relationships in the driving environment.

The three methods proposed in this study not only enhance the explainability of end-to-end networks but also address distinct key scientific problems in autonomous driving. For NLE-DM, the effect of natural-language explanations on driving-decision prediction performance is investigated; the results demonstrate that jointly predicting natural-language explanations improves the accuracy of driving-decision prediction. For Multimodal-XAD, the issue of error accumulation in downstream tasks of vision-based BEV perception is addressed by incorporating both context and local information before predicting driving decisions and environment descriptions; experimental results show that combining context and local information enhances the prediction performance of both tasks. For PolarPoint-BEV, the limitations of traditional BEV maps are identified and addressed: traditional BEV maps treat all regions equally, risking oversight of safety-critical details, and use dense grids, resulting in high computational costs. To overcome these limitations, PolarPoint-BEV prioritizes regions closer to the ego vehicle, ensuring greater attention is given to critical areas, while its sparse structure provides a more lightweight representation. To evaluate the impact of PolarPoint-BEV on explainability and driving performance, a multi-task end-to-end driving network, XPlan, is proposed to jointly predict control commands and polar point BEV maps. |
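As a concrete illustration of the joint-prediction idea behind NLE-DM described in the abstract, the following is a minimal sketch of a network with a shared encoder and two heads, one for driving decisions and one for explanation tokens. All class names, dimensions, and the simple non-autoregressive explanation head are illustrative assumptions, not the thesis architecture.

```python
# Illustrative sketch only: a shared encoder with two heads that jointly
# predict a driving decision and natural-language explanation tokens.
# All names and dimensions are hypothetical, not taken from the thesis.
import torch
import torch.nn as nn

class JointDecisionExplanationNet(nn.Module):
    def __init__(self, feat_dim=512, num_decisions=6, vocab_size=1000, max_len=20):
        super().__init__()
        # Shared visual encoder (placeholder CNN).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Head 1: driving decision (e.g. keep lane, turn left, stop, ...).
        self.decision_head = nn.Linear(feat_dim, num_decisions)
        # Head 2: explanation tokens, predicted per time step from the same features.
        self.max_len, self.vocab_size = max_len, vocab_size
        self.explanation_head = nn.Linear(feat_dim, max_len * vocab_size)

    def forward(self, images):
        feats = self.encoder(images)                    # (B, feat_dim)
        decision_logits = self.decision_head(feats)     # (B, num_decisions)
        expl_logits = self.explanation_head(feats)      # (B, max_len * vocab)
        return decision_logits, expl_logits.view(-1, self.max_len, self.vocab_size)

# Joint training: both losses share the encoder, so explanation supervision
# can also shape the features used for decision prediction.
model = JointDecisionExplanationNet()
images = torch.randn(2, 3, 128, 128)
decisions = torch.randint(0, 6, (2,))
expl_tokens = torch.randint(0, 1000, (2, 20))
dec_logits, expl_logits = model(images)
loss = nn.functional.cross_entropy(dec_logits, decisions) + \
       nn.functional.cross_entropy(expl_logits.reshape(-1, 1000), expl_tokens.reshape(-1))
loss.backward()
```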
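In the same spirit, the Multimodal-XAD point about combining context and local information before prediction can be sketched as a simple fusion module that attaches a pooled global context vector to every local BEV cell feature. The shapes, pooling choice, and module name below are hypothetical placeholders rather than the method from the thesis.

```python
# Illustrative sketch of fusing global context with local BEV cell features
# before the prediction heads. Shapes and names are hypothetical placeholders.
import torch
import torch.nn as nn

class ContextLocalFusion(nn.Module):
    def __init__(self, local_dim=64, context_dim=128, out_dim=128):
        super().__init__()
        # Global context: pooled over all BEV cells, then projected.
        self.context_proj = nn.Linear(local_dim, context_dim)
        # Fuse each cell's local feature with the shared context vector.
        self.fuse = nn.Sequential(nn.Linear(local_dim + context_dim, out_dim), nn.ReLU())

    def forward(self, bev_feats):                               # (B, N_cells, local_dim)
        context = self.context_proj(bev_feats.mean(dim=1))      # (B, context_dim)
        context = context.unsqueeze(1).expand(-1, bev_feats.size(1), -1)
        return self.fuse(torch.cat([bev_feats, context], dim=-1))  # (B, N_cells, out_dim)

fusion = ContextLocalFusion()
bev = torch.randn(2, 200, 64)
print(fusion(bev).shape)   # torch.Size([2, 200, 128])
```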
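Finally, the PolarPoint-BEV idea of a sparse, ego-centred polar representation can be illustrated by sampling BEV points on a polar grid whose radial spacing grows with distance from the ego vehicle, so nearby regions receive denser coverage. The ranges, bin counts, and growth factor below are assumed for illustration only and are not the settings used in the thesis.

```python
# Illustrative sketch of a polar point grid for BEV perception: sample points
# in polar coordinates around the ego vehicle, with radial spacing that grows
# with distance so nearby regions get denser coverage while the overall
# representation stays sparse.
import numpy as np

def polar_point_grid(max_range=50.0, num_rings=12, num_angles=36, growth=1.25):
    # Radial ring edges that expand geometrically: fine near the ego vehicle,
    # coarse far away.
    steps = growth ** np.arange(num_rings)
    radii = max_range * np.cumsum(steps) / steps.sum()
    angles = np.linspace(-np.pi, np.pi, num_angles, endpoint=False)
    r, a = np.meshgrid(radii, angles, indexing="ij")
    # Each point carries (x, y) in the ego frame plus its polar coordinates.
    xs, ys = r * np.cos(a), r * np.sin(a)
    return np.stack([xs, ys, r, a], axis=-1).reshape(-1, 4)

points = polar_point_grid()
print(points.shape)    # (12 * 36, 4) polar points instead of a dense grid
print(points[:3, 2])   # innermost ring sits close to the ego vehicle
```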
Subjects: | Self-driving cars | Automated vehicles | Hong Kong Polytechnic University -- Dissertations |
Pages: | xix, 128 pages : color illustrations |
Appears in Collections: | Thesis |
Access: | View full-text via https://theses.lib.polyu.edu.hk/handle/200/13711 |