Fri. Apr 19th, 2024
ARCHIVED CONTENT
You are viewing ARCHIVED CONTENT released online between 1 April 2010 and 24 August 2018 or content that has been selectively archived and is no longer active. Content in this archive is NOT UPDATED, and links may not function.

With the growing number of articles, investments, and activities featuring technologies that support information governance, auto-classification, and text analytics, it is important to understand the similarities and differences between structured and unstructured data. The following short extract highlights several of these similarities and differences and also shares a reference link to additional considerations and approaches for classifying data.

Extract from an article by Bill Inmon of Forest Rim Technology

There are at least two very different schools of thought about what is and what is not structured data. One school of thought holds that everything not in a standard DBMS is unstructured. Another holds that something is unstructured only if there is no rational way to explain its structure. These are two very different interpretations of what is meant by unstructured. Both viewpoints are rational and valid, yet they conflict with each other. And these are just two viewpoints; there are undoubtedly others on what constitutes structured and unstructured data.

Based on some recent research, another, less confusing way of classifying data exists: looking at how often data occurrences repeat. Repetitive data, data that occurs frequently, is data in a record that appears very similar to the data in every other record. The records are similar in size and structure, and in many cases even their content is the same. There are many examples of repetitive data, including metering data; click-stream data; telephone call record data, such as the time of the call, the caller's telephone number, and the length of the call; analog data; and so on.

The converse of repetitive data, nonrepetitive data, is data in which each occurrence is unique in terms of content—that is, each nonrepetitive record is different from the others. Any similarity of record content, size, or structure that may exist among nonrepetitive data is strictly a matter of chance. There are many different forms of nonrepetitive data, and examples include emails, call center conversations, corporate contracts, warranty claims, insurance claims, and so on.
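To make the repetitive/nonrepetitive distinction concrete, here is a minimal Python sketch of one possible heuristic. It is an illustration under stated assumptions, not Inmon's classification method: the functions `signature`, `size`, and `classify`, the thresholds `share` and `max_cv`, and the sample records are all hypothetical. Following the description above, it checks two properties that repetitive records share: similar structure (here, the same set of field names) and similar size (here, the total number of characters in the field values).

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Any, Mapping, Sequence


def signature(record: Mapping[str, Any]) -> tuple[str, ...]:
    # Structure: the record's field names, sorted so field order does not matter.
    return tuple(sorted(record))


def size(record: Mapping[str, Any]) -> int:
    # Size: total number of characters across all field values.
    return sum(len(str(value)) for value in record.values())


def classify(records: Sequence[Mapping[str, Any]],
             share: float = 0.9, max_cv: float = 0.25) -> str:
    """Label a batch of records 'repetitive' or 'nonrepetitive'.

    Hypothetical heuristic: a batch is repetitive when at least `share` of the
    records have the same structure AND record sizes vary little (coefficient
    of variation <= `max_cv`). Both thresholds are illustrative assumptions.
    """
    if not records:
        raise ValueError("need at least one record")
    # Fraction of records that share the most common structure.
    top_count = Counter(signature(r) for r in records).most_common(1)[0][1]
    same_structure = top_count / len(records) >= share
    # Spread of record sizes relative to their average (0.0 = identical sizes).
    sizes = [size(r) for r in records]
    avg = mean(sizes)
    cv = pstdev(sizes) / avg if avg else 0.0
    similar_size = cv <= max_cv
    return "repetitive" if same_structure and similar_size else "nonrepetitive"


# Made-up sample data: call records (time, caller, length in seconds) vs. emails.
calls = [
    {"time": "09:15", "caller": "555-0100", "length_s": 42},
    {"time": "09:17", "caller": "555-0199", "length_s": 305},
    {"time": "09:20", "caller": "555-0142", "length_s": 7},
]
emails = [
    {"body": "Please review the attached contract before Friday."},
    {"body": "Thanks!"},
    {"body": "Following up on the warranty claim we discussed last week; "
             "the customer reports the unit failed twice and wants a replacement."},
]

print(classify(calls))   # repetitive
print(classify(emails))  # nonrepetitive
```

Every sample email has the same single `body` field, so structure alone would not separate the emails from the call records; the wide spread in record size is what marks them as nonrepetitive. A practical classifier would also need to compare content, which this sketch does not attempt.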

The many distinctions between repetitive and nonrepetitive data are important, but perhaps the most important is the pattern of business value. In the typical situation with repetitive data, there are many occurrences of data, yet only a few records are of real business value.

ComplexDiscovery OÜ is a highly recognized digital publication focused on providing detailed insights into the fields of cybersecurity, information governance, and eDiscovery. Based in Estonia, a hub for digital innovation, ComplexDiscovery OÜ upholds rigorous standards in journalistic integrity, delivering nuanced analyses of global trends, technology advancements, and the eDiscovery sector. The publication expertly connects intricate legal technology issues with the broader narrative of international business and current events, offering its readership invaluable insights for informed decision-making.

For the latest in law, technology, and business, visit ComplexDiscovery.com.

Generative Artificial Intelligence and Large Language Model Use

ComplexDiscovery OÜ recognizes the value of GAI and LLM tools in streamlining content creation and enhancing the overall quality of its research, writing, and editing. To this end, since late 2022 ComplexDiscovery OÜ has regularly employed GAI tools, including ChatGPT, Claude, Midjourney, and DALL-E, to assist, augment, and accelerate the development and publication of both new and revised posts and pages.

ComplexDiscovery also provides a ChatGPT-powered AI article assistant for its users. This feature leverages LLM capabilities to generate relevant and valuable insights related to specific page and post content published on ComplexDiscovery.com. By offering this AI-driven service, ComplexDiscovery OÜ aims to create a more interactive and engaging experience for its users, while highlighting the importance of responsible and ethical use of GAI and LLM technologies.