Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/84481
Title: Cross domain data analytics for urban computing
Authors: Wang, Yuqi
Degree: Ph.D.
Issue Date: 2018
Abstract: With the rapid development of information technologies, we are entering the era of big data. Large amounts of data in urban spaces are collected from various domains such as transportation, logistics, and Points of Interest (POI). These data reflect different aspects of cities, offering great opportunities to better understand how a city operates and to optimize its infrastructure. Effective data analytics is the key to unlocking the power of these data. Although previous work mostly focuses on data from a single domain, Cross Domain Data Analytics is attracting increasing attention and lies at the core of many urban problems and applications. Cross domain data analytics offers two opportunities beyond traditional single-domain data analytics. First, it provides a more comprehensive picture of the studied problems by drawing on information from different angles, which helps gain new insights by discovering correlations among cross-domain datasets. Second, it improves decision making by complementing data sources for joint analysis, especially in cases where data are insufficient in some domains. Meanwhile, urban computing aims to utilize urban big data, typically from different domains, to facilitate important urban operations such as traffic management and energy reduction. Urban computing is therefore a natural application scenario for cross domain data analytics. This thesis focuses on Cross Domain Data Analytics for Urban Computing: it studies the problem of jointly analyzing data from different domains to generate hidden insights and enable intelligent decision making, and it proposes effective solutions to three important applications in urban computing as demonstrations.
First, we study the problem of traffic congestion and show how to jointly utilize data from three domains, namely GPS trajectories, the road network, and POI data, to generate insights. Previous work mainly focuses on the prediction of congestion and the analysis of traffic flows, while the congestion correlation between road segments has not yet been studied. In this work, we propose a three-phase framework to explore the congestion correlation between road segments from multiple real-world datasets. In the first phase, we extract congestion information on each road segment from the GPS trajectories of over 10,000 taxis, define congestion correlation, and propose a corresponding mining algorithm to find all existing correlations. In the second phase, we extract various features for each pair of road segments from the road network and POI data. In the last phase, the results of the first two phases are fed into several classifiers to predict congestion correlation. We further analyze the important features and evaluate the trained classifiers through experiments. We found some important patterns that lead to high or low congestion correlation, which can facilitate building various transportation applications. In addition, we found that traffic congestion correlation has obvious directionality and transmissibility.
Second, we study the problem of order response time prediction to enable intelligent decision making in logistics services by jointly considering data from two different domains: historical order records and driver GPS trajectories. Accurate prediction of order response time would not only facilitate decision making on order dispatching, but also pave the way for applications such as supply-demand analysis and driver scheduling, leading to high system efficiency. In this work, we forecast order response time on the current day by fusing data from order history and historical driver locations. Specifically, we propose Coupled Sparse Matrix Factorization (CSMF) to deal with the heterogeneous fusion and data sparsity challenges raised by this problem. CSMF jointly learns from multiple heterogeneous sparse data sources through its proposed weight setting mechanism. Experiments on real-world datasets demonstrate the effectiveness of our approach compared to various baseline methods. The performance of several variants of the proposed method is also presented to show the effectiveness of each component.
Third, we extend the previous method to incorporate more context information by proposing Coupled Weighted Tensor-matrix Factorization (CWTF) for accurate prediction of the order accepting probabilities of van drivers, which would facilitate efficient order dispatching and improve user experience. However, it is difficult to handle the inherent heterogeneous data fusion, sparsity, and efficiency challenges simultaneously. In this work, we propose a three-stage framework built on the CWTF method for order accepting probability prediction in logistics services. Specifically, orders are first grouped into clusters to enrich the sparse interactions between orders and drivers; then an accepting probability tensor with the three dimensions of driver, order cluster, and time is generated by a tensor-matrix factorization method that fuses order characteristics and driver behaviors in an efficient way; finally, given a new order, the accepting probability of each driver is efficiently predicted by retrieving it directly from the learned tensor. Experimental results on a large dataset from a well-known app-based logistics platform demonstrate the superiority of the proposed method over various baseline methods.
Subjects: Hong Kong Polytechnic University -- Dissertations
Big data
Data mining
Pages: xviii, 119 pages : color illustrations
Appears in Collections: Thesis

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.