Graph-based Techniques for Topic Classification of Tweets in Spanish

Cordobés de la Calle, Héctor; Fernández Anta, Antonio; Chiroque, Luis F.; Pérez, Fernando; Redondo, Teófilo; Santos, Agustín

doi:10.9781/ijimai.2014.254

dc.contributor.author	Cordobés de la Calle, Héctor
dc.contributor.author	Fernández Anta, Antonio
dc.contributor.author	Chiroque, Luis F.
dc.contributor.author	Pérez, Fernando
dc.contributor.author	Redondo, Teófilo
dc.contributor.author	Santos, Agustín
dc.date.accessioned	2021-07-13T10:08:52Z
dc.date.available	2021-07-13T10:08:52Z
dc.date.issued	2014-03
dc.identifier.issn	1989-1660
dc.identifier.uri	http://hdl.handle.net/20.500.12761/1287
dc.description.abstract	Topic classification of texts is one of the most interesting challenges in Natural Language Processing (NLP). Topic classifiers commonly use a bag-of-words approach, in which the classifier uses (and is trained with) selected terms from the input texts. In this work we present techniques based on graph similarity to classify short texts by topic. In our classifier we build graphs from the input texts, and then use properties of these graphs to classify them. We have tested the resulting algorithm by classifying Twitter messages in Spanish among a predefined set of topics, achieving more than 70% accuracy.
dc.language.iso	eng
dc.publisher	IMAI Research Group
dc.title	Graph-based Techniques for Topic Classification of Tweets in Spanish	en
dc.type	journal article
dc.journal.title	IJIMAI International Journal of Interactive Multimedia and Artificial Intelligence (Special issue: AI Techniques to Evaluate Economics and Happiness)
dc.type.hasVersion	VoR
dc.rights.accessRights	open access
dc.volume.number	2
dc.issue.number	5
dc.identifier.doi	DOI:10.9781/ijimai.2014.254
dc.page.final	37
dc.page.initial	31
dc.subject.keyword	Classification
dc.subject.keyword	Graphs
dc.subject.keyword	Happiness
dc.subject.keyword	NLP
dc.subject.keyword	Text Classification
dc.subject.keyword	Topic Classification
dc.description.status	pub
dc.eprint.id	http://eprints.networks.imdea.org/id/eprint/723

Files in this item

Name: ijimai20142_5_4_pdf_17528.pdf
Size: 585.1Kb
Format: PDF

This item appears in the following collection(s)

IMDEA Networks
