Big Data 101

  • 7 May 2021
  • 3 Mins Read
  • by Acha Ouma

The term ‘big data’ refers to a collection of data that is large and complex. It is data that grows drastically with time and is difficult to process or store using traditional data management systems (relational databases and data warehouses). Although the storage of large sets of information can be traced to earlier decades, the term ‘big data’ was first mentioned in the late 1990s and only gained momentum in the early 2000s.

Dimensions of Data

In 2001, Doug Laney, an analyst at Meta Group, coined what is now known as the three dimensions of big data: volume, velocity and variety (the 3Vs). These have become the generally accepted definition and characteristics of big data.

Over the years, the volume and speed of data generated has grown far beyond what any person could process manually. For instance, in 2010, International Data Corporation (IDC) estimated that 1.2 zettabytes (ZB) of new data were generated. In 2018, IDC estimated that the volume of new data generated annually would grow from 33 ZB in 2018 to 175 ZB by 2025.

Currently, more than 5 billion consumers interact with data in their day-to-day lives. IDC estimates that by 2025 this will grow to about 75% of the world’s population, creating over 90 ZB of data in that year. At that point, it is estimated that each person will have at least one data interaction every 18 seconds. This is attributed to devices that connect wirelessly to a network, including computing devices and wireless sensors, among others.

There are three types of big data: structured, unstructured and semi-structured. Structured data can be stored, processed and accessed in a fixed format. Unstructured data lacks a predefined format and poses a number of challenges in processing. Semi-structured data contains elements of both the structured and unstructured forms.
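To make the distinction concrete, here is a minimal sketch (the field names and records are hypothetical) of how the same kind of information might appear in each of the three forms:

```python
import csv
import io
import json

# Structured: rows with a fixed schema, as in a relational table (CSV here).
structured = "id,name,age\n1,Alice,34\n2,Bob,29\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: JSON is self-describing and flexible; records
# need not share the same fields.
semi_structured = json.loads(
    '[{"id": 1, "name": "Alice", "age": 34},'
    ' {"id": 2, "name": "Bob", "emails": ["bob@example.com"]}]'
)

# Unstructured: free text with no predefined schema; extracting
# fields requires parsing or analysis rather than a simple lookup.
unstructured = "Alice (34) emailed Bob yesterday about the quarterly report."

print(rows[0]["name"])                  # fixed-format field access
print(semi_structured[1]["emails"][0])  # optional field, present on this record
print("Alice" in unstructured)          # only search is possible, not field access
```

The structured rows can be queried by column name in a fixed way; the semi-structured records can vary per entry; the unstructured text can only be searched or mined.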

Data & The 3Vs

Volume

Data in organizations is collected from various sources such as business transactions, smart devices, social media and industrial equipment among others. The size of the data is enormous, hence the term ‘big data’. 

The size of the data plays an important role in determining its significance and potential insight, as well as whether it qualifies as big data at all. In the past, storing data at this scale would have been challenging. New technologies such as data lakes, however, have made storage far easier.

Velocity

Data streams in from various sources at unprecedented speed, and the flow is massive and continuous. It must also be dealt with in a timely fashion, often in near real-time, to meet the demands placed on it. How fast the data can be streamed and processed determines its potential value.

Variety

This refers to the diverse sources and nature of data, from structured numeric data to unstructured email text, financial transactions and video. The diverse nature of unstructured data poses challenges for storage, mining and analysis.

Basically, the data comes in all types and formats. While earlier technologies could handle structured data efficiently, the shift from structured data toward semi-structured and unstructured data presents a challenge to them. Modern big data technologies are therefore built to store, analyze and process data that is generated at high velocity and volume and that arrives in all types and formats.

Other Vs

Veracity – this describes the quality and trustworthiness of the data. Because the data originates from different sources, it is a challenge to link and match it across systems. Businesses dealing with big data need to link records and establish correlations; otherwise, the data may run out of control.

Variability – this describes the changing nature of the data. It is unpredictable, changes often and is sometimes inconsistent. Businesses need to know what is trending on social media and how to manage daily, seasonal or event-triggered trends that are characterized by high peak data loads.

Importance of Big Data

The importance of big data is not pegged on how much data one has but on what one does with it. Analysing and processing big data may yield solutions that lead to cost and time reductions, product development and enhanced offerings. A business that processes big data effectively may reap the following benefits:

  • Detecting fraudulent behaviour
  • Improving customer service
  • Better operational efficiency
  • Identification of potential risks and determining the root cause of failures and issues
  • Better decision making

Although big data has clear benefits, it also poses challenges in representativeness, harmonization, generalizability and data overload, among others.

The Point of Big Data

In essence, it is not the amount of data a business has that matters, but what the organization does with it. Big data can be analyzed and processed for insights that lead to efficient and effective strategic moves. The ability to harness big data is a key determinant of how successful a business will be in the digital era.