Journal of Engineering and Applied Sciences

Year: 2018
Volume: 13
Issue: 6
Page No. 1499 - 1505

Web Documents Similarity Using K-Shingle Tokens and MinHash Technique

Authors: Mehdi Ebady Manaa and Ghufran Abdulameer

References

Al-Anazi, S., H. AlMahmoud and I. Al-Turaiki, 2016. Finding similar documents using different clustering techniques. Procedia Comput. Sci., 82: 28-34.

Baraglia, R., G.D.F. Morales and C. Lucchese, 2010. Document similarity self-join with MapReduce. Proceedings of the 2010 IEEE 10th International Conference on Data Mining (ICDM), December 13-17, 2010, IEEE, Sydney, NSW, Australia, ISBN:978-1-4244-9131-5, pp: 731-736.

Bhaya, W. and M.E. Manaa, 2014. Review clustering mechanisms of distributed denial of service attacks. J. Comput. Sci., 10: 2037-2046.

Chauhan, S.S. and S. Batra, 2014. Finding similar items using LSH and bloom filter. Proceedings of the International Conference on Advanced Communication Control and Computing Technologies (ICACCCT), May 8-10, 2014, IEEE, Ramanathapuram, India, ISBN:978-1-4799-3915-2, pp: 1662-1666.

Deng, F., S. Siersdorfer and S. Zerr, 2012. Efficient jaccard-based diversity analysis of large document collections. Proceedings of the 21st ACM International Conference on Information and Knowledge Management, October 29-November 02, 2012, ACM, Maui, Hawaii, USA., ISBN:978-1-4503-1156-4, pp: 1402-1411.

Hamilton, H., E. Gurak, L. Findlater and W. Olive, 2013. Computer science 831: Knowledge discovery in databases. Master Thesis, Department of Computer Science, University of Oxford, Oxford, England, UK.

Han, J. and M. Kamber, 2006. Data Mining: Concepts and Techniques. 2nd Edn., Morgan Kaufmann Publisher, San Francisco, USA., ISBN-13: 978-1558609013, Pages: 800.

Manning, C.D., P. Raghavan and H. Schutze, 2008. An Introduction to Information Retrieval. Cambridge University Press, Cambridge, UK., ISBN:9780521865715, Pages: 482.

Niwattanakul, S., J. Singthongchai, E. Naenudorn and S. Wanapu, 2013. Using of jaccard coefficient for keywords similarity. Proceedings of the International MultiConference of Engineers and Computer Scientists (IMECS 2013) Vol. 1, March 13-15, 2013, MECS Publisher, Hong Kong, ISBN:978-988-19251-8-3, pp: 1-5.

Parziale, L., W. Liu, C. Matthews, N. Rosselot and C. Davis et al., 2006. TCP/IP Tutorial and Technical Overview. 8th Edn., IBM Redbooks, India, Pages: 963.

Peshave, M. and K. Dezhgosha, 2005. How search engines work: And a web crawler application. Ph.D Thesis, University of Illinois Springfield, Springfield, Illinois.

Phan, T.N., M. Jager, S. Nadschlager, J. Kung and T.K. Dang, 2015. An efficient document indexing-based similarity search in large datasets. Proceedings of the 2nd International Conference on Future Data and Security Engineering (FDSE 2015), November 23-25, 2015, Springer, Ho Chi Minh City, Vietnam, pp: 16-31.

Rajaraman, A. and J.D. Ullman, 2011. Mining of Massive Datasets. Cambridge University Press, UK., ISBN-13: 978-1107015357, Pages: 326.

Spasojevic, N. and G. Poncin, 2011. Large scale page-based book similarity clustering. Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), September 18-21, 2011, IEEE, Beijing, China, ISBN:978-1-4577-1350-7, pp: 119-125.

Thella, P.P. and G. Sridevi, 2013. A novel clustering method for similarity measuring in text documents. Intl. J. Modern Eng. Res., 3: 2823-2826.

Wang, C., Y. Song, H. Li, M. Zhang and J. Han, 2015. Knowsim: A document similarity measure on structured heterogeneous information networks. Proceedings of the 2015 IEEE International Conference on Data Mining (ICDM), November 14-17, 2015, IEEE, Atlantic City, New Jersey, USA., ISBN:978-1-4673-9504-5, pp: 1015-1020.

Zamora, J., M. Mendoza and H. Allende, 2016. Hashing-based clustering in high dimensional data. Expert Syst. Appl., 62: 202-211.

Zhang, Q., H. Ma, W. Qian and A. Zhou, 2013. Duplicate detection for identifying social spam in microblogs. Proceedings of the 2013 IEEE International Congress on Big Data (BigData Congress), June 27-July 2, 2013, IEEE, Santa Clara, California, USA., ISBN:978-1-4799-0182-1, pp: 141-148.
