Research: A Study of “Churn” in Tweets and Real-Time Search Queries (Extended Version)
Applicability: “A Study of “Churn” in Tweets and Real-Time Search Queries (Extended Version)” offers insight into the temporal dynamics of term distributions, which may hold implications for the design of search systems. The growing importance of real-time search brings with it several information retrieval challenges; this paper frames one such challenge, that of rapid changes to term distributions, particularly for queries.
Abstract: The real-time nature of Twitter means that term distributions in tweets and in search queries change rapidly: the most frequent terms in one hour may look very different from those in the next. Informally, we call this phenomenon “churn”. Our interest in analyzing churn stems from the perspective of real-time search. Nearly all ranking functions, machine-learned or otherwise, depend on term statistics such as term frequency, document frequency, as well as query frequencies. In the real-time context, how do we compute these statistics, considering that the underlying distributions change rapidly? In this paper, we present an analysis of tweet and query churn on Twitter, as a first step to answering this question. Analyses reveal interesting insights on the temporal dynamics of term distributions on Twitter and hold implications for the design of search systems.
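Illustration: To make the idea of churn concrete, the sketch below shows one simple way it could be measured: take the top-k most frequent terms in one interval and compute the fraction that no longer appear in the top-k of the next interval. This is an illustrative assumption, not necessarily the paper's exact definition; the function names (`top_terms`, `churn`), the whitespace tokenization, and the sample text are all hypothetical.

```python
from collections import Counter


def top_terms(texts, k):
    """Return the set of the k most frequent whitespace-separated terms."""
    counts = Counter(term for text in texts for term in text.lower().split())
    return {term for term, _ in counts.most_common(k)}


def churn(previous_texts, current_texts, k=2):
    """Fraction of the previous interval's top-k terms missing from the current top-k.

    Illustrative metric only (an assumption, not taken from the paper).
    """
    prev_top = top_terms(previous_texts, k)
    curr_top = top_terms(current_texts, k)
    if not prev_top:
        return 0.0
    return len(prev_top - curr_top) / len(prev_top)


# Made-up sample data standing in for two consecutive hours of tweets or queries.
hour_1 = ["election results tonight", "election debate", "results election"]
hour_2 = ["earthquake election", "earthquake election", "earthquake tokyo"]

print(churn(hour_1, hour_2, k=2))
```

In this toy data the top two terms shift from "election" and "results" to "earthquake" and "election", so half of the earlier top terms drop out. A value near 1 would indicate that term statistics computed an hour ago say little about the current hour.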
Analysis: Summarized analysis from this paper includes observations on:
Authors: Prepared by Jimmy Lin and Gilad Mishne of Twitter, Inc., “A Study of “Churn” in Tweets and Real-Time Search Queries (Extended Version)” is a paper submitted to and accepted by the 6th International AAAI Conference on Weblogs and Social Media (ICWSM 2012).
This entry was posted on Tuesday, June 5th, 2012 at 2:39 pm. It is filed under chronology, discover and tagged with research, social media.
ComplexDiscovery | Creative Commons Attribution 4.0 International