Journal of Engineering and Applied Sciences

Year: 2018
Volume: 13
Issue: 9
Page No. 2722 - 2731

Deep Data Mining: An Approach to Convert Semistructure Data to Structured Data Using Improved Tree Matching

Authors : K. Jayamalini and M. Ponnavaikko

Abstract: Large amounts of data available on web pages are used by various advanced applications nowadays. Such web data can be obtained by searching through the query interface provided on a website. The information retrieved as the result of a query is called ‘deep data’ or ‘hidden data’. These data are wrapped in HTML pages in the form of data records. The dynamically generated pages that carry them are called Search Engine Resultant Pages (SERPs). SERPs are presented to users as web documents along with other content. These web pages contain an enormous amount of data and are a virtual gold mine for business if they are mined effectively. Web wrappers, or Web Data Extraction Systems (WDES), are software applications that interact with web sources such as web pages and extract the information stored in them. The data extracted from the web source are converted into a structured format and stored in a database for future use. First, this study explains the development of a web wrapper that takes SERPs as input, extracts the deep data stored in them and converts the data into a structured format. Secondly, it explains the steps used for wrapper generation. Thirdly, it explains the implementation of a tree-structure algorithm called ‘Improved Tree Matching (ITM)’, which is used for data extraction. Finally, it explains how the extracted data are stored in a database. Implementation results show that ITM works with less complexity than other existing algorithms.
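
The abstract does not spell out the steps of ITM, so the following is only a minimal sketch of the classic Simple Tree Matching baseline that tree-matching wrappers commonly build on, not the authors' improved algorithm. It shows how two candidate data records, viewed as tag trees, can be scored for structural similarity. The names `Node`, `simple_tree_matching`, `size` and `similarity` are hypothetical, and the two sample records are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element: a tag name plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Number of node pairs in a maximum top-down, order-preserving matching."""
    # Trees whose roots carry different tags do not match at all.
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # best[i][j]: best score using the first i subtrees of a and the first j of b.
    best = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = simple_tree_matching(a.children[i - 1], b.children[j - 1])
            best[i][j] = max(best[i - 1][j], best[i][j - 1], best[i - 1][j - 1] + pair)
    # Add 1 for the matched pair of roots.
    return best[m][n] + 1


def size(t: Node) -> int:
    """Total number of nodes in the tree."""
    return 1 + sum(size(c) for c in t.children)


def similarity(a: Node, b: Node) -> float:
    """Matching score normalised by the average size of the two trees."""
    return 2 * simple_tree_matching(a, b) / (size(a) + size(b))


# Two made-up data records, as they might appear as rows of a result table.
record_1 = Node("tr", [Node("td", [Node("a")]), Node("td"), Node("td")])
record_2 = Node("tr", [Node("td", [Node("a")]), Node("td")])

if __name__ == "__main__":
    print(simple_tree_matching(record_1, record_2))
    print(round(similarity(record_1, record_2), 3))
```

In a wrapper of the kind the abstract describes, sibling subtrees whose similarity exceeds a chosen threshold could be treated as instances of the same data record before their fields are written to a database; the threshold and the storage step are outside this sketch.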

How to cite this article:

K. Jayamalini and M. Ponnavaikko, 2018. Deep Data Mining: An Approach to Convert Semistructure Data to Structured Data Using Improved Tree Matching. Journal of Engineering and Applied Sciences, 13: 2722-2731.
