Since social media platforms spread into our lives, the amount of data exchanged across them has surged. We write texts describing an idea, an opinion, a fact; we upload images and videos; we share our preferences using simple buttons ("like", "favorite", "follow", "share", "pin", etc.); we accept into our network people we know very well in real life and people we have never met and probably never will... and everything enters the network almost in real time!
Suddenly, we realize that the amount of data handled in a given period of time is measured in exabytes. This data is not only big in volume, but also extremely diverse, and it moves at incredible speeds. The information it contains is almost impossible to measure. The fact is that Facebook, Twitter and Pinterest can see when you fall in love, what mood you are in, where you are and many other behaviors that you decide to show.
The question is: what can we do with this massive amount of data created through social media?
According to information gathered by IBM in a report based on sources provided by McKinsey Global Institute, Twitter, Cisco, EMC, SAS, MEPTEC and QAS, the following facts are worth paying attention to:
At first sight, we can describe Big Data as very large and complex data sets that are impossible or hard to handle with classic data processing tools. The expression is generally used in its original English form; we must note that French specialists currently translate it as "grosses données" (big data), "données massives" (massive data) or even "datamasse" (datamass), by analogy with "biomass". The novelty of the concept and its blurred definition hinder the localization of the term.
In 2012, Gartner (which had outlined the term in the early 2000s) updated its definition: "Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."
The above definition outlines the dimensions of Big Data, the well-known 3Vs: volume, velocity, variety. Recently a 4th V has been attached to it: veracity. Yet the great thing about this formulation is that it opens multiple perspectives on the Big Data concept: we may note a technology view, a process view and a business view.
One of the essential characteristics of Big Data originating from social media is that it is real-time or near-real-time. This gives exploratory analysis a wide perspective on what is happening, and what is about to happen, at a certain time in a certain area.
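To make the near-real-time idea more tangible, here is a minimal Python sketch of a sliding-window counter that keeps track of which topics were mentioned most often during the last minute. The class name, the 60-second window and the sample events are all assumptions made for illustration; the sketch also assumes that events arrive in timestamp order, which a real stream would not guarantee.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Counts topic mentions seen during the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, topic) pairs in arrival order
        self.counts = Counter()  # topic -> mentions still inside the window

    def add(self, timestamp, topic):
        """Register a new mention, then forget mentions that are too old."""
        self.events.append((timestamp, topic))
        self.counts[topic] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop every event that has fallen out of the time window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_topic = self.events.popleft()
            self.counts[old_topic] -= 1
            if self.counts[old_topic] == 0:
                del self.counts[old_topic]

    def top(self, n=3):
        """Return the n most mentioned topics in the current window."""
        return self.counts.most_common(n)


# Hypothetical stream of (seconds since start, topic) events.
sample_stream = [(0, "election"), (10, "storm"), (20, "storm"),
                 (70, "storm"), (75, "match")]

window = SlidingWindowCounter(window_seconds=60)
for ts, topic in sample_stream:
    window.add(ts, topic)

print(window.top())
```

At second 75 the two oldest events have left the window, so only the recent "storm" and "match" mentions are still counted. This is the kind of continuously refreshed picture that an exploratory analysis of social media data can build on.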
Each fundamental trait of Big Data can be understood as a parameter for quantitative, qualitative and exploratory information analysis.
One of the biggest challenges at present is to build the proper tools and systems to manage big data. As real-time or near-real-time information delivery is one of the key features of big data analytics, research aims to set up database management systems able to meet the new requirements.
The technology in progress involves the following:
Storage: For the storage and retrieval of data, the underlying NoSQL developments are best represented by MongoDB, DynamoDB, CouchBase, Cassandra, Redis and Neo4j. Currently they are among the best-performing document, key-value, column, graph and distributed databases.
Software: The Apache Hadoop ecosystem includes Cloudera, HortonWorks and MapR. Their main goal is to extend the usage of big data platforms to a wider and more diverse range of users. Secondly, these technologies focus on increasing the reliability of big data platforms and on improving their manageability and performance.
Data Exploration and Discovery: Big data analytic discovery is a hot research and innovation topic. Major developments have been made by Datameer, Hadapt, Karmasphere, Platfora and Splunk.
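The four NoSQL data models named under Storage (document, key-value, column, graph) can be contrasted with a small Python sketch. It uses only plain dictionaries and lists and does not reproduce the API of any of the products listed above; the user records and the helper name followers_of are invented for illustration, and real systems differ in many important details.

```python
import json

# Document model: one self-contained, nested record per entity.
document_store = {
    "user:1": {"name": "Alice", "likes": ["photography", "hiking"], "follows": ["user:2"]},
    "user:2": {"name": "Bob", "likes": ["cooking"], "follows": []},
}

# Key-value model: the store only knows a key and an opaque value
# (here a JSON string); it cannot look inside the value.
key_value_store = {key: json.dumps(doc) for key, doc in document_store.items()}

# Column-family model: each row key maps to named groups of columns,
# and rows do not need to share the same columns.
column_store = {
    key: {
        "profile": {"name": doc["name"]},
        "likes": {topic: True for topic in doc["likes"]},
    }
    for key, doc in document_store.items()
}

# Graph model: entities are nodes, relationships are first-class edges.
nodes = {key: {"name": doc["name"]} for key, doc in document_store.items()}
edges = [
    (key, "FOLLOWS", target)
    for key, doc in document_store.items()
    for target in doc["follows"]
]


def followers_of(user_id):
    """Answer a relationship question by walking the edge list."""
    return [src for src, relation, dst in edges if relation == "FOLLOWS" and dst == user_id]


print(followers_of("user:2"))
print(json.loads(key_value_store["user:1"])["name"])
```

The same two users are stored four times. What changes is the kind of question each layout answers easily: fetching a whole profile, looking up a value by key, reading a few columns across many rows, or following relationships between people.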
When dealing with a completely new order of magnitude, the capture, storage, search, distribution, analysis and visualization of data must be redefined. The prospects opened by handling big data are enormous and still largely unforeseen!
Frequently mentioned possibilities include exploring the information shared in the media, acquiring and assessing knowledge, analyzing trends and issuing forecasts, and managing risks of all kinds (commercial, insurance, industrial, natural) and phenomena of all kinds (social, political, religious, etc.). In geodynamics, meteorology, medicine and other exploratory fields, big data is expected to improve the way processes are carried out and data is interpreted.
In order to answer our initial question, the best thing we can do with this data mass is to EXPLORE it.
As simple as it may seem, this statement has deep implications for the way we will see data analysis in the near future. The model is shifting from the traditional one, in which we plan, collect data and then analyze it, to a new one in which we collect everything and only afterwards try to find significant patterns.
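The "collect first, look for patterns afterwards" model can be sketched in a few lines of Python. No hypothesis is fixed in advance: the code counts which hashtags appear together and reports the pairs that occur often. The sample posts, the function names and the min_count threshold are assumptions chosen for illustration, not a description of any real platform's data.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical raw posts, collected without any prior plan for analysis.
posts = [
    "Stuck at the airport again #delay #storm",
    "Flights cancelled because of the #storm #delay",
    "Great concert tonight #music",
    "#storm warnings issued, expect #delay on all lines",
]


def hashtags(text):
    """Return the distinct hashtags of a post, lowercased and sorted."""
    return sorted(set(re.findall(r"#(\w+)", text.lower())))


def frequent_pairs(texts, min_count=2):
    """Find hashtag pairs that appear together in at least min_count posts."""
    pair_counts = Counter()
    for text in texts:
        # Sorting inside hashtags() makes (a, b) and (b, a) the same pair.
        pair_counts.update(combinations(hashtags(text), 2))
    return [(pair, n) for pair, n in pair_counts.most_common() if n >= min_count]


print(frequent_pairs(posts))
```

In this tiny sample the only recurring pattern is that "storm" and "delay" show up together. Nobody asked that question beforehand; the link emerges from the collected data, and that is the essence of the exploratory model.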
The new analysis model has its own risks, but it also opens the way for a new generation of data analysts and scientists. At this point, I consider this to be the main impact that social media has had on the way we see Big Data.
by Larisa Gota