CSEE FRIDAY SEMINAR SERIES: FRIDAY, NOVEMBER 6, 2009

 

 

Title: A Data Parallel Algorithm for XML DOM Parsing 

Abstract:

The Extensible Markup Language (XML) has become the de facto standard for information representation and interchange on the Internet. XML parsing is a core operation that must be performed on an XML document before it can be accessed and manipulated. This operation is known to cause performance bottlenecks in applications and systems that process large volumes of XML data. We believe that parallelism is a natural way to boost performance. Leveraging multicore processors can offer a cost-effective solution, because future multicore processors will support hundreds of cores and will offer a high degree of parallelism in hardware. We propose a data parallel algorithm called ParDOM for XML DOM parsing, which builds an in-memory tree structure for an XML document. ParDOM offers fine-grained parallelism by adopting a flexible chunking scheme: each chunk can contain an arbitrary number of start and end XML tags that are not necessarily matched. ParDOM can be conveniently implemented using a data parallel programming model that supports map and sort operations. In this talk, we will present the design of ParDOM and its evaluation on commodity multicore processors. This work appeared in the Proceedings of the 6th International XML Database Symposium (XSym 2009).
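To make the chunking idea concrete, here is a minimal Python sketch. It is not the ParDOM algorithm as published; it only illustrates the general pattern described in the abstract: split the document into chunks whose tags need not be matched, scan each chunk independently (the map step), order the resulting tag events by document offset (the sort step), and then link them into a tree. All function names (split_chunks, scan_chunk, build_tree, parse) are hypothetical, the final tree-building pass is sequential, and the input is assumed to be simplified XML with no comments, CDATA sections, or "<" and ">" characters inside attribute values.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Matches start, end, and empty-element tags; skips declarations such as <?xml ...?>.
TAG = re.compile(r"<(/?)([A-Za-z_][\w:.-]*)[^<>]*?(/?)>")


def split_chunks(text, size):
    """Cut text into chunks of at least `size` characters.

    Every cut is moved forward to the next '<', so no tag is split in two,
    but a chunk may hold start and end tags that do not match each other.
    """
    bounds = [0]
    pos = size
    while pos < len(text):
        cut = text.find("<", pos)
        if cut == -1:
            break
        bounds.append(cut)
        pos = cut + size
    bounds.append(len(text))
    return [(b, text[b:e]) for b, e in zip(bounds, bounds[1:])]


def scan_chunk(item):
    """Map step: turn one chunk into (document offset, kind, name) events."""
    offset, chunk = item
    events = []
    for m in TAG.finditer(chunk):
        closing, name, empty = m.groups()
        kind = "end" if closing else ("empty" if empty else "start")
        events.append((offset + m.start(), kind, name))
    return events


def build_tree(events):
    """Sort events by offset, then link them into a tree with a stack."""
    root = {"name": "#document", "children": []}
    stack = [root]
    for _pos, kind, name in sorted(events):
        if kind == "end":
            if len(stack) == 1 or stack[-1]["name"] != name:
                raise ValueError(f"unexpected end tag </{name}>")
            stack.pop()
        else:
            node = {"name": name, "children": []}
            stack[-1]["children"].append(node)
            if kind == "start":
                stack.append(node)
    if len(stack) != 1:
        raise ValueError("unclosed element: " + stack[-1]["name"])
    return root


def parse(text, chunk_size=16, workers=4):
    """Scan chunks concurrently, then merge the events into one tree."""
    chunks = split_chunks(text, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = list(pool.map(scan_chunk, chunks))
    events = [e for chunk_events in per_chunk for e in chunk_events]
    return build_tree(events)
```

Calling parse("<a><b>hi</b><c/></a>", chunk_size=5) scans three chunks, the first of which holds two start tags and no end tag. The thread pool is used only to show the shape of the map step; in CPython, threads do not run this kind of scanning code in parallel, so real speedups would need processes or a language without that restriction.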

Biography:

Prof. Praveen Rao is an Assistant Professor of Computer Science and Electrical Engineering at UMKC. His research interests include XML indexing and query processing in centralized, peer-to-peer, and cloud environments; indexing for large-scale data centers; parallel XML processing; and graph indexing and query processing. More information about his research can be found at http://r.web.umkc.edu/raopr.

 

Bhavik Shah is a graduate student in the Department of Computer Science and Electrical Engineering at UMKC's School of Computing and Engineering. He earned his bachelor's degree in Computer Science at the Walchand Institute of Technology in India. He is currently pursuing a master's degree in Computer Science and is writing his thesis under Prof. Praveen Rao. His research area is XML DOM parsing on multicore processors. He presented a paper titled "A Data Parallel Algorithm for XML DOM Parsing" at the XSym '09 conference in Lyon, France.