Research: A Study of “Churn” in Tweets and Real-Time Search Queries (Extended Version)
Applicability: “A Study of “Churn” in Tweets and Real-Time Search Queries (Extended Version)” offers unique insight into the temporal dynamics of term distributions, which may hold implications for the design of search systems. The growing importance of real-time search brings with it several information retrieval challenges; this paper frames one such challenge, that of rapid changes to term distributions, particularly for queries.
Abstract: The real-time nature of Twitter means that term distributions in tweets and in search queries change rapidly: the most frequent terms in one hour may look very different from those in the next. Informally, we call this phenomenon “churn”. Our interest in analyzing churn stems from the perspective of real-time search. Nearly all ranking functions, machine-learned or otherwise, depend on term statistics such as term frequency, document frequency, as well as query frequencies. In the real-time context, how do we compute these statistics, considering that the underlying distributions change rapidly? In this paper, we present an analysis of tweet and query churn on Twitter, as a first step to answering this question. Analyses reveal interesting insights on the temporal dynamics of term distributions on Twitter and hold implications for the design of search systems.
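To make the notion of churn concrete, the following is a minimal sketch, not the paper's own method. It assumes churn is measured as the fraction of one interval's top-k terms that no longer appear in the next interval's top-k. The function names (top_terms, churn), the whitespace tokenization, and the sample tweets are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def top_terms(texts, k):
    """Return the set of the k most frequent whitespace-delimited terms."""
    counts = Counter(term for text in texts for term in text.lower().split())
    return {term for term, _ in counts.most_common(k)}


def churn(previous_texts, current_texts, k):
    """Fraction of the previous interval's top-k terms missing from the current top-k.

    This definition is an assumption made for illustration only.
    """
    previous_top = top_terms(previous_texts, k)
    current_top = top_terms(current_texts, k)
    if not previous_top:
        return 0.0
    return len(previous_top - current_top) / len(previous_top)


# Hypothetical tweets from two consecutive hours.
hour_1 = ["earthquake in tokyo", "tokyo earthquake news", "earthquake"]
hour_2 = ["election results tonight", "election news", "election"]

print(churn(hour_1, hour_2, k=2))
```

A value near 1 would indicate that the most frequent terms were almost entirely replaced between the two intervals, while a value near 0 would indicate a stable distribution.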
Analysis: Summarized analysis from this paper includes observations on:
Authors: Written by Jimmy Lin and Gilad Mishne of Twitter, Inc., “A Study of “Churn” in Tweets and Real-Time Search Queries (Extended Version)” was submitted to and accepted by the 6th International AAAI Conference on Weblogs and Social Media (ICWSM 2012).
This entry was posted on Tuesday, June 5th, 2012 at 2:39 pm. It is filed under chronology, discover and tagged with research, social media.
ComplexDiscovery | Creative Commons Attribution 4.0 International