What's Big Data?

Unlike many terms in computer science and engineering, the term big data is largely self-explanatory: it describes extremely large data sets. These data sets are fed to analytics software that attempts to draw inferences from the data. In other words, the software tries to find patterns.

The question then becomes, who uses big data, and what is it useful for?


Today, many businesses use big data to track market signals and customer interactions so that they can better market their goods and services. While not every data scientist agrees on exactly what constitutes relevant data, most believe that the more data you have, the better your analysis will be.


Big data channels therefore tend to include information gleaned from search engines, social media, and web pages, as well as real-world signals like product purchases, point-of-sale information, and financial records.


How is Big Data Collected?


There are two key players in the process of organizing and interpreting big data: data engineers and data scientists. The major difference between them is that data engineers are responsible for creating the pipelines that gather large amounts of data and turn it into a format that data scientists can use.


Data scientists are responsible for developing algorithms that organize, interpret, analyze, and visualize big data.


Big Data Analogy


A good analogy for understanding this dynamic is to imagine a hoarder’s home. A hoarder has huge piles of stuff and it’s all over the place. There are piles of clothing in the kitchen. There’s dinnerware in the bathroom. It’s all one great big mess.


The job of the data engineer is to train the hoarder to distribute their stuff among the rooms of the home. You want clothing in the bedroom and kitchenware in the kitchen, not the two mixed together.


In this case, the hoarder is the pipeline through which data is gathered. The different rooms in the hoarder’s home are the databases in which data is stored. Engineering big data is thus about organizing it so that it is as easy as possible to find and access. This way, the data scientist does not have to look through the cupboards in order to find the winter gloves.


How is Big Data Analyzed?


Data engineers create ways of sorting information in real-time. That means as soon as a new object comes into the hoarder’s home, the hoarder knows exactly where it goes. The data engineer has programmed the hoarder to put it in the proper place.
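As a rough sketch of the analogy, the sorting step might look like the following Python. The room names, the ROOM_FOR_CATEGORY mapping, and the route_item function are all hypothetical and invented for illustration; a real pipeline would write to actual databases rather than in-memory lists.

```python
from collections import defaultdict

# Hypothetical mapping from an item's category to the "room" (database) that stores it.
ROOM_FOR_CATEGORY = {
    "clothing": "bedroom",
    "kitchenware": "kitchen",
}


def route_item(home, item):
    """Place one incoming item in its room as soon as it arrives."""
    # Items with an unknown category go to a holding area instead of being lost.
    room = ROOM_FOR_CATEGORY.get(item["category"], "unsorted")
    home[room].append(item)
    return room


# The "home" starts empty; each room is created the first time it is used.
home = defaultdict(list)

# Made-up items arriving one at a time.
incoming = [
    {"name": "purple sweater", "category": "clothing"},
    {"name": "dinner plate", "category": "kitchenware"},
    {"name": "garden gnome", "category": "decor"},
]

for item in incoming:
    route_item(home, item)
```

Because every item is placed the moment it arrives, nobody ever has to sort the whole pile at once, which is the point of doing this work in real time.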


The data scientist is responsible for the next phase of the process. This can involve adding another layer of organization so that the objects are even more accessible.


For instance, let’s say that the hoarder’s neighbor wants to know the average cost of all the purple sweaters in the hoarder’s home. The data scientist must first determine what is and is not a purple sweater. This process is easier because a purple sweater is a piece of clothing, and all the clothing can now be found in the bedroom. But there are two more categorical qualities that need to be accounted for: the color (purple) and the type of clothing (sweater).


The Job of the Data Scientist


The job of the data scientist is to create an algorithm that sorts sweaters from other clothing, as well as an algorithm that separates purple clothes from clothes of other colors. Supposing that each purple sweater carries the price that the hoarder paid for it, that information can be extracted. The algorithm will also know precisely how many purple sweaters the hoarder owns. All of the prices are added up and divided by the number of sweaters. And now you have the average cost of a purple sweater!
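The purple sweater calculation can be sketched in a few lines of Python. This is only an illustration: the bedroom list, its fields, and the prices are made up, and average_price is a hypothetical helper rather than part of any real library.

```python
from statistics import mean

# Hypothetical contents of the bedroom; the fields and prices are invented for illustration.
bedroom = [
    {"type": "sweater", "color": "purple", "price": 40.0},
    {"type": "sweater", "color": "purple", "price": 60.0},
    {"type": "sweater", "color": "green", "price": 35.0},
    {"type": "scarf", "color": "purple", "price": 15.0},
]


def average_price(items, item_type, color):
    """Return (count, average price) for the items matching both type and color."""
    prices = [
        item["price"]
        for item in items
        if item["type"] == item_type and item["color"] == color
    ]
    if not prices:
        return 0, None  # no matching items, so there is no average to report
    return len(prices), mean(prices)


count, avg = average_price(bedroom, "sweater", "purple")
print(count, avg)
```

Both filters are applied together here: the green sweater fails the color test and the purple scarf fails the type test, so only the two purple sweaters contribute to the average.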


The neighbor asked about the average cost of a purple sweater because he happens to be in the market for one himself. He can use the information the data scientist has provided to predict what a purple sweater is likely to cost him.


Big Data and Predictive Models


Let’s say that another neighbor drops by, and she is a fashion designer whose online business sells and distributes top-of-the-line purple sweaters all over the world. She notices that the hoarder has thousands of purple sweaters, but that she only sees him wearing five or six of them.


It’s obvious which ones he wears a lot because those sweaters are at the top of the massive pile in the hoarder’s bedroom, while the sweaters the hoarder doesn’t like so much are way at the bottom. By analyzing the texture, quality, and design of the hoarder’s favorite purple sweaters, the fashion designer will have a better idea of which purple sweaters will be big hits with her customers.
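A minimal sketch of the designer’s reasoning, in Python, is shown below. It is not a real predictive model, just a frequency count over the sweaters nearest the top of the pile. The pile, its attribute values, and the favorite_traits function are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical pile, ordered from top (worn most) to bottom (worn least).
pile = [
    {"texture": "soft", "design": "cable knit"},
    {"texture": "soft", "design": "plain"},
    {"texture": "soft", "design": "cable knit"},
    {"texture": "scratchy", "design": "striped"},
    {"texture": "scratchy", "design": "plain"},
]


def favorite_traits(pile, top_n):
    """Return the most common value of each attribute among the top_n sweaters."""
    favorites = pile[:top_n]
    if not favorites:
        return {}  # nothing to analyze
    traits = {}
    for attribute in ("texture", "design"):
        counts = Counter(sweater[attribute] for sweater in favorites)
        traits[attribute] = counts.most_common(1)[0][0]
    return traits


print(favorite_traits(pile, top_n=3))
```

With thousands of sweaters and many more attributes, the same idea (compare what the favorites have in common against the rest) is what a statistical model would do more rigorously.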


Unique Industries that Use Big Data


For centuries, economists and marketing professionals have been attempting to determine what drives consumer decision making. While much of this work has been caught up in theoretical models that make and test assumptions about the average consumer, big data has allowed marketing specialists to draw inferences from statistics on actual behavior.


While this tactic isn’t necessarily new, the speed at which the information can be processed is what separates big data from simple data. Computers can now process transactions from websites immediately and determine, from a customer’s interactions with a site, what is likely to persuade that customer to make a purchase.


Politics and Big Data


This information isn’t only valuable to retail businesses. It’s also valuable to politicians. Consider that politics is, in many ways, one large marketing campaign. Politicians who can process real-time signals from social media and the news have a distinct advantage over those who can’t. Campaign managers can tailor their candidate’s campaign to untapped demographics or attempt to cross enemy lines by appealing to the values of their opponent’s supporters.


Law Enforcement and Big Data


Remember the scandal that Edward Snowden caused when he blew the whistle on the NSA? It turned out that the NSA was collecting information indiscriminately from US citizens’ phones and computers. Setting the morality or necessity of this aside for a moment, imagine how useless all that information would be if it were to come in as one amorphous blob.


This is where engineering big data comes into play. The NSA needs a way to categorize and sort the information as it comes in. Otherwise, the sheer volume of information would render it practically useless. Data scientists can then go over this information and devise algorithms that use the content of the messages to predict the likelihood that a terrorist attack is imminent, or so we’re being led to believe.


The Role of Big Data in the Future


Many other fields employ big data as well. These include investment traders who analyze financial markets, scientists, city planners, athletic coaches, healthcare professionals, efficiency experts, and even the engineers who are perfecting Google’s self-driving car.


As the potential of big data grows, data scientists and engineers will streamline the processes by which computers collect and analyze information. In doing so, they make this information more accessible to humans. This concept was unheard of up until 20 years ago.


Big data may sound as if it’s geared toward working with massive data sets, and it is to a certain extent. But it’s the speed at which this information can be processed that sets big data apart from other forms of computational analysis.


With so much information at our fingertips and the processing speed to use that data in real time, it won’t be long before machine learning surpasses human learning in key areas.



R-ALGO Engineering Big Data provides articles on Artificial Intelligence, Algorithms, Big Data, Data Science, and Machine Learning. Also, R-ALGO Engineering Big Data provides R Tutorials on how to implement Machine Learning Algorithms with provided datasets.
