Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/4934
Title: Design of scalable and efficient information retrieval systems
Authors: Sun, Xiaocui
Keywords: Information storage and retrieval systems -- Design.
Information retrieval.
Ethernet (Local area network system)
Hong Kong Polytechnic University -- Dissertations
Issue Date: 2011
Publisher: The Hong Kong Polytechnic University
Abstract: As the fast growing of the Internet, scalability and efficiency become the critical issues for future Internet design. Caching is an effective way to improve the scalability and efficiency for information retrieval systems. As the media applications such as YouTube and Facebook explosively increase, video caching has gained significant attention recently. By caching video chunks at the proxy servers close to end users, the server bandwidth requirement can be significantly reduced, and hence it can greatly improve the system performance. Ethernet is a widely used layer 2 technology for information retrieval system. Re-cently, Ethernet has gained popularity to be deployed in Metropolitan Area Networks (MANs). Metro Ethernet is highly attractive mainly due to its cost effectiveness, minimal management and maintenance, easy interoperability and convenience. Clearly, deploying the same technology, such as Ethernet, in both MAN and Local Area Networks (LAN) segments can potentially reduce the complexity and cost in network design and management, and hence improve packet forwarding performance. Even though Ethernet has many advantages, to be deployed in MANs, it has to solve the scalability issues. Ethernet has poor scalability due to using a flat addressing scheme (i.e., non-hierarchical Media Access Control (MAC) addresses) and broadcasting based address resolution scheme. In a MAN composing of a large number of LAN segments, a node in the provider network needs to keep a potential large number of MAC-to-port-mapping entries for frame forwarding. This may either cause MAC learning table explosion or excessive frame flooding, thus significantly degrading the system performance. Moreover, a broadcast-based address resolution scheme in Metro Ethernet introduces a large amount of broadcast messages into the provider network. Many protocols such as Address Resolution Protocol (ARP) and Dynamic Host Configuration Protocol (DHCP) use the broadcast service as a service discovery mechanism. The broadcast based address resolution schemes make Ethernet extremely convenient and easily accomplished. However, for MANs with millions of end users, high frequency broadcast messages waste a lot of bandwidth for address resolution. Frequently broadcast frames also accelerate the replacement of the table entries in Ethernet switches, which may lead to forwarding table explosion and hence trigger excessive frame flooding. Moreover, every end user has to take resource to handle every broadcast message.
This thesis aims at developing efficient and scalable information retrieval systems: to achieve this goal, an optimized cache replacement/group algorithm for video system has been designed to improve the scalability of the video system; To make Metro Ethernet scalable and efficient, an enabled Cache effect on Forwarding Table (CFT) scheme, an End user enabled Mac-in-Mac (EMiM) encapsulation scheme and two distributed Registration based address resolution Protocol (RP) are proposed in this thesis. Firstly, an Optimized Cache Replacement (OCR) scheme for video streaming is designed. OCR groups the users into different cache groups based on their request patterns. It calculates the user density among all possible intervals, and then selects the maximized user density group to cache. Based on the optimized scheme in a single cache, we also extend OCR for cooperative cache. The simulation results show that OCR can increase the hit ratio and reduce the server load. Secondly, we describe the CFT scheme in Metro Ethernet. CFT learns the IP and MAC mapping pair in a frame and eliminates the subsequent broadcast frames asking for this mapping. To further reduce the forwarding table size of Provider Edge (PE) node, CFT can be cooperated with EMiM. The forwarding table entries in Customer Edge (CE) node and PE node in this scheme are modified to learn both the IP and MAC addresses. By receiving a frame, the CE and PE nodes cache the IP-MAC address mapping carried in the frame. Once the mapping is recorded, it can be served for its subsequent requests by searching it in the cache. Hence, the CE and PE nodes not only response for forwarding but also answer the ARP request. The proposed architecture is easy to be accomplished and fully backward compatible. The simulation results show that the proposed scheme can decrease both the communication messages for address resolution and forwarding table size in PE nodes. Thirdly, an EMiM encapsulation scheme is proposed. In the proposed scheme, a user's MAC address as well as its PE node's MAC address are associated with its ARP entry. The modified ARP entry allows an end user to do Mac-in-Mac (MiM) encapsulation directly by adding the destination user's MAC address and PE node's MAC address in the frame when it initiates a session. Hence a PE node does not need to maintain the entries of mapping end user's MAC address to PE node's MAC address, thus significantly reducing the forwarding table size. The simulation results show that the proposed scheme could provide high scalability by reducing up to 65% maximum forwarding table size in PE nodes. Finally, we propose two RPs. In a RP, multiple ARP registers are allocated to support address resolution. Each IP address has a home register which stores its ARP entry. When an end user moves to another location but keeps its IP address, its current PE or CE node is considered to be its foreign register. A foreign register temporally caches the ARP entry for an immigrated user and is in charge of the ARP entry updating in the home register. The IP address is used as an index to locate the corresponding home register through unicast, thus eliminating the broadcast to solve an unknown address. The proposed schemes can save more than 60% messages for address resolution and reduces up to 80% forwarding table size in PE nodes.
Description: xx, 153 p. : ill. ; 30 cm.
PolyU Library Call No.: [THS] LG51 .H577P COMP 2011 Sun
URI: http://hdl.handle.net/10397/4934
Rights: All rights reserved.
Appears in Collections:Thesis

Files in This Item:
File Description SizeFormat 
b24625656_link.htmFor PolyU Users162 BHTMLView/Open
b24625656_ir.pdfFor All Users (Non-printable) 2.81 MBAdobe PDFView/Open
Show full item record

Page view(s)

463
Last Week
5
Last month
Checked on Oct 15, 2017

Download(s)

396
Checked on Oct 15, 2017

Google ScholarTM

Check



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.