
Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.
–Dan Ariely
I heard this quote again this week at the BigData Conference and Exhibition in Toronto (1). Manmit Shrimali from Dextro Analytics included it in one of his slides while presenting on Bridging Gap between Data, Analytics and Decision Making.
Originally, Dan Ariely published this sentence in 2013.
Have we seen enough change in the Big Data area to make this expression no longer applicable?
My strong impression is that interest in applying Big Data in business practice has increased significantly since 2013.
The fact that over 1100 participants attended the Big Data Toronto 2016 conference on the first day alone shows extremely high curiosity about Big Data topics.
Many attendees were standing and listening to the presentations because there were not enough chairs in the auditorium.
Here are my other observations after reviewing the conference vendors' offerings and listening to the conference presentations:
- The gap is growing between leading businesses that realize benefits from Big Data and the rest that don't.
- More presenters and vendors are trying to relabel their traditional (small) data and analytics initiatives as applications of Big Data.
- We have far more people with Data Scientist titles. One of the presenters mentioned that some of the Fortune 500 companies employ hundreds of Data Scientists.
Why do we fool ourselves by applying new terms to old concepts, labeling traditional data initiatives as Big Data and converting the Statistical Analyst job title to Data Scientist?
Big Data Characteristics and Challenges
There are a few essential characteristics of Big Data that create new business opportunities and the basis for potential disruptions in multiple industries.
The Big Data article in Wikipedia (2) provides a summary of the differences between Big Data and business intelligence:
- Business Intelligence uses descriptive statistics with data with high information density to measure things, detect trends, etc.
- Big data uses inductive statistics and concepts from nonlinear system identification to infer laws (regressions, nonlinear relationships, and causal effects) from large sets of data with low information density to reveal relationships and dependencies, or to perform predictions of outcomes and behaviors.
My simplified version of this comparison is:
- Business Intelligence helps us to understand the past.
- Big Data helps us to predict the future, make informed decisions, and automate actions.
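The contrast between the two bullets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the monthly sales figures are made up, the helper names (`describe`, `fit_line`, `predict`) are hypothetical, and a simple least-squares line on six points stands in for the far richer inductive models used on real Big Data.

```python
from statistics import mean

# Hypothetical monthly sales figures (illustrative data only).
months = [1, 2, 3, 4, 5, 6]
sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]


def describe(values):
    """Descriptive view: summarize what has already happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_line(xs, ys):
    """Inductive view: fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a value of x that has not been observed yet."""
    slope, intercept = model
    return slope * x + intercept


summary = describe(sales)         # looks backward: what were sales like?
model = fit_line(months, sales)   # infers a relationship from the data
forecast = predict(model, 7)      # looks forward: what might month 7 bring?

print(summary)
print(forecast)
```

The summary only reports on the six months already observed, while the fitted line is used to say something about a month that has not happened yet. Whether such a forecast is trustworthy still has to be checked against new observations.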
What are the barriers to adopting Big Data by businesses?
I believe that one of the main reasons many leaders of traditional businesses fail to leverage Big Data is the difficulty of translating numbers and complex statistical concepts into a story that a person without a scientific mindset can understand.
When we try to simplify the complexity of Big Data, we often eliminate some of the essential elements that can make a real difference.
Data Science and Data Scientists
The role of data scientists was another hot topic of the conference panel discussions. The main questions were: What do they do? How do we find and hire them?
Many people try to define a data scientist as an individual who combines multiple skills, such as advanced knowledge of statistics, the ability to program in R and Python, and expertise in a business domain.
Yes, a data scientist needs those skills to perform their job. However, these skills do not make a person a data scientist.
The essence of Data Science
Wikipedia defines Data Science as follows (3):
“Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics similar to Knowledge Discovery in Databases (KDD).”
The essence of Data Science is the ability to create new knowledge from data, not the specific technologies and skills applied to do so.
People often focus on the first word “data” and forget the second part “scientist” in the data scientist definition.
The essential criterion for being considered a data scientist is the ability to produce new knowledge and insights from data.
Different branches of science share a common pattern in the knowledge creation process. This process is called the scientific method. Data Science is no exception. The scientific method cycle has four steps that continually produce new evolutionary knowledge and eventually breakthroughs: Observation, Theory, Prediction, and Experiment.

These four steps match the four skills that a person must have to qualify as a Data Scientist:
- Plan and conduct an experiment that produces data
- Collect observational data and be able to extract signal from noise
- Come up with an initial hypothesis or theory (insights and story)
- Apply the theory (model) to predict the future and plan actions
How do Big Data technologies change the data science role?
- The advancements in machine learning (ML) and artificial intelligence (AI) technologies help data scientists shorten the cycle and deliver insights in a much shorter period.
- We can develop a predictive model today in hours or minutes instead of days.
- New technologies help us to move faster but do not replace the data science role.
- The art of data science is still vital in extracting practical knowledge and insights from data.
- A data scientist should also be able to communicate a story to others that helps them make decisions and action plans.
The continuous evolutionary scientific sequence is critical to producing new insights.
The inability to integrate the data science process within business operations is one of the main reasons that the older generation of corporations has failed to compete. In contrast, the new cohorts of post-internet companies have been able to apply the data science cycle at the core of their decision making.
Big Data Toronto 2016 Key Lessons
- Big data is still like teenage sex for many businesses.
- Business Intelligence with larger volumes of data is not yet Big Data.
- Data Scientist is not a new name for a statistical analyst. Don’t try to substitute a data scientist with a statistician.
- The goal of Big Data and Data Science is to acquire knowledge and actionable insights from the data and to predict the future rather than only understand the past.
- Investments in technology help us to accelerate the data science cycle, but we need actions and experiments to realize business benefits.
- The old formula is still relevant: we require the combination of three elements (process, people, and technology) to succeed in Big Data.
- Companies that have learned how to use cloud computing and to partner with data science professional services are moving much faster with Big Data adoption.
- If you are still not sure how to approach your Big Data challenge, then IBM and its Watson technology are ready to help you.
References
(1) Big Data Toronto, http://www.bigdata-toronto.com/
(2) Big Data, https://en.wikipedia.org/wiki/Big_data
(3) Data Science, https://en.wikipedia.org/wiki/Data_science