Research: A Study of “Churn” in Tweets and Real-Time Search Queries (Extended Version)
Applicability: “A Study of “Churn” in Tweets and Real-Time Search Queries (Extended Version)” offers unique insight into the temporal dynamics of term distribution which may hold implications the design of search systems. As the growing importance of real-time search brings with it several information retrieval challenges; this paper frames one such challenge, that of rapid changes to term distributions, particularly for queries.
Abstract: The real-time nature of Twitter means that term distributions in tweets and in search queries change rapidly: the most frequent terms in one hour may look very different from those in the next. Informally, we call this phenomenon “churn”. Our interest in analyzing churn stems from the perspective of real-time search. Nearly all ranking functions, machine-learned or otherwise, depend on term statistics such as term frequency, document frequency, as well as query frequencies. In the real-time context, how do we compute these statistics, considering that the underlying distributions change rapidly? In this paper, we present an analysis of tweet and query churn on Twitter, as a first step to answering this question. Analyses reveal interesting insights on the temporal dynamics of term distributions on Twitter and hold implications for the design of search systems.
Analysis: Summarized analysis from this paper includes observations on:
Authors: Prepared by Jimmy Lin and Gilad Misne of Twitter, Inc., “A Study of “Churn” in Tweets and Real-Time Search Queries (Extended Version)” is a prepared paper submitted and accepted by the 6th International AAAI Conference on Weblogs and Social Media (ICWSM 2012).
This entry was posted on Tuesday, June 5th, 2012 at 2:39 pm. It is filed under chronology, discover and tagged with research, social media. You can follow any responses to this entry through the RSS 2.0 feed.
Comments are closed.
Taken from a combination of public market sizing estimations as shared in leading electronic discovery reports, publications and posts over time, the following eDiscovery Market Size Mashup shares general worldwide market sizing considerations for both the software and service areas of the electronic discovery market for the years between 2013 and 2018.
Provided as a non-comprehensive overview of key and publicly announced eDiscovery related mergers, acquisitions and investments to date in 2014, the following listing highlights key industry activities through the lens of announcement date, acquired company, acquiring or investing company and acquisition amount (if known).
Cormack and Grossman set up an ingenious experiment to test the effectiveness of three machine learning protocols. It is ingenious for several reasons, not the least of which is that they created what they call an “evaluation toolkit” to perform the experiment. They have even made this same toolkit, this same software, freely available for use by any other qualified researchers. They invite other scientists to run the experiment for themselves. They invite open testing of their experiment. They invite vendors to do so too, but so far there have been no takers.
I want to talk about an issue that is attracting attention at the moment: how to select documents for training a predictive coding system. The catalyst for this current interest is “Evaluation of Machine Learning Protocols for Technology Assisted Review in Electronic Discovery”, recently presented at SIGIR by Gord Cormack and Maura Grossman.
Maura Grossman and Gordon Cormack just released another blockbuster article, “Comments on ‘The Implications of Rule 26(g) on the Use of Technology-Assisted Review,’” 7 Federal Courts Law Review 286 (2014). The article was in part a response to an earlier article in the same journal by Karl Schieneman and Thomas Gricks, in which they asserted that Rule 26(g) imposes “unique obligations” on parties using TAR for document productions and suggested using techniques we associate with TAR 1.0.