Journal of Engineering and Applied Sciences

Year: 2017

Volume: 12

Issue: 6 SI

Page No. 7771 - 7775

DOI: 10.36478/jeasci.2017.7771.7775

Sampling Assortment Approach for Huge Range Deduplication for Web Data Exploration

Authors: R. Lavanya and Harika Rallapalli

References

Arasu, A., C. Re and D. Suciu, 2009. Large-scale deduplication with constraints using dedupalog. Proceedings of the IEEE 25th International Conference on Data Engineering (ICDE'09), March 29-April 2, 2009, IEEE, Shanghai, China, ISBN:978-1-4244-3422-0, pp: 952-963.

Arasu, A., M. Gotz and R. Kaushik, 2010. On active learning of record matching packages. Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, June 6-10, 2010, ACM, Indianapolis, Indiana, USA., ISBN:978-1-4503-0032-2, pp: 783-794.

Bayardo, R.J., Y. Ma and R. Srikant, 2007. Scaling up all pairs similarity search. Proceedings of the 16th International Conference on World Wide Web, May 8-12, 2007, ACM, Banff, Alberta, Canada, pp: 131-140.

Bellare, K., S. Iyengar, A.G. Parameswaran and V. Rastogi, 2012. Active sampling for entity matching. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 12-16, 2012, ACM, Beijing, China, ISBN:978-1-4503-1462-6, pp: 1131-1139.

Beygelzimer, A., S. Dasgupta and J. Langford, 2009. Importance weighted active learning. Proceedings of the 26th Annual International Conference on Machine Learning, June 14-18, 2009, ACM, Montreal, Quebec, Canada, ISBN:978-1-60558-516-1, pp: 49-56.

Bianco, G.D., R. Galante, C.A. Heuser and M.A. Goncalves, 2013. Tuning large scale deduplication with reduced effort. Proceedings of the 25th International Conference on Scientific and Statistical Database Management, July 29-31, 2013, ACM, Baltimore, Maryland, USA., ISBN:978-1-4503-1921-8, pp: 1-12.

Bilenko, M. and R.J. Mooney, 2003. On evaluation and training-set construction for duplicate detection. Proceedings of the KDD-2003 Workshop on Data Cleaning, Record Linkage and Object Consolidation, August 24-27, 2003, ACM, Washington DC., pp: 7-12.

Chaudhuri, S., V. Ganti and R. Kaushik, 2006. A primitive operator for similarity joins in data cleaning. Proceedings of the 22nd International Conference on Data Engineering (ICDE'06), April 3-7, 2006, IEEE, Atlanta, Georgia, USA., pp: 5-5.

Christen, P. and T. Churches, 2002. Febrl-freely extensible biomedical record linkage. MSc Thesis, Department of Computer Science, Australian National University, Canberra, Australia.

Christen, P., 2008. Automatic record linkage using seeded nearest neighbour and support vector machine classification. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 24-27, 2008, ACM, Las Vegas, Nevada, USA., ISBN:978-1-60558-193-4, pp: 151-159.

Christen, P., 2012. A survey of indexing techniques for scalable record linkage and deduplication. IEEE Trans. Knowl. Data Eng., 24: 1537-1555.

Cohn, D., L. Atlas and R. Ladner, 1994. Improving generalization with active learning. Mach. Learn., 15: 201-221.

Elmagarmid, A.K., P.G. Ipeirotis and V.S. Verykios, 2007. Duplicate record detection: A survey. IEEE Trans. Knowl. Data Eng., 19: 1-16.
