{"id":235,"date":"2014-01-12T07:53:01","date_gmt":"2014-01-12T07:53:01","guid":{"rendered":"https:\/\/staff.fnwi.uva.nl\/m.derijke\/?p=235"},"modified":"2015-05-03T10:38:46","modified_gmt":"2015-05-03T10:38:46","slug":"ecir-2014-paper-on-predicting-new-concepts-in-social-streams-online","status":"publish","type":"post","link":"https:\/\/staff.fnwi.uva.nl\/m.derijke\/ecir-2014-paper-on-predicting-new-concepts-in-social-streams-online\/","title":{"rendered":"ECIR 2014 paper on predicting new concepts in social streams online"},"content":{"rendered":"<p>&#8220;Generating Pseudo-ground Truth for Predicting New Concepts in Social Streams\u201d by David Graus, Manos Tsagkias, Lars Buitinck and Maarten de Rijke is\u00a0<a href=\"http:\/\/staff.science.uva.nl\/~mdr\/content\/publications\/ecir2014-fp-ookb.pdf\" rel=\"self\">available<\/a>\u00a0online now.<\/p>\n<p>The manual curation of knowledge bases is a bottleneck in fast paced domains where new concepts constantly emerge. Identification of nascent concepts is important for improving early entity linking, content interpretation, and recommendation of new content in real-time applications. We present an unsupervised method for generating pseudo-ground truth for training a named entity recognizer to specifically identify entities that will become concepts in a knowledge base in the setting of social streams. We show that our method is able to deal with missing labels, justifying the use of pseudo-ground truth generation in this task. Finally, we show how our method significantly outperforms a lexical-matching baseline, by leveraging strategies for sampling pseudo-ground truth based on entity confidence scores and textual quality of input documents.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;Generating Pseudo-ground Truth for Predicting New Concepts in Social Streams\u201d by David Graus, Manos Tsagkias, Lars Buitinck and Maarten de Rijke is\u00a0available\u00a0online now. The manual curation of knowledge bases is a bottleneck in fast paced domains where new concepts constantly emerge. Identification of nascent concepts is important for improving early entity linking, content interpretation, and&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"_links":{"self":[{"href":"https:\/\/staff.fnwi.uva.nl\/m.derijke\/wp-json\/wp\/v2\/posts\/235"}],"collection":[{"href":"https:\/\/staff.fnwi.uva.nl\/m.derijke\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/staff.fnwi.uva.nl\/m.derijke\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/staff.fnwi.uva.nl\/m.derijke\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/staff.fnwi.uva.nl\/m.derijke\/wp-json\/wp\/v2\/comments?post=235"}],"version-history":[{"count":1,"href":"https:\/\/staff.fnwi.uva.nl\/m.derijke\/wp-json\/wp\/v2\/posts\/235\/revisions"}],"predecessor-version":[{"id":236,"href":"https:\/\/staff.fnwi.uva.nl\/m.derijke\/wp-json\/wp\/v2\/posts\/235\/revisions\/236"}],"wp:attachment":[{"href":"https:\/\/staff.fnwi.uva.nl\/m.derijke\/wp-json\/wp\/v2\/media?parent=235"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/staff.fnwi.uva.nl\/m.derijke\/wp-json\/wp\/v2\/categories?post=235"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/staff.fnwi.uva.nl\/m.derijke\/wp-json\/wp\/v2\/tags?post=235"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}