Big Data Challenges

Although data mining provides the means to deal efficiently with large amounts of data that are usually peculiar, healthcare data analysis in the Internet of Things era is far from a simple task. The term “big data” is very popular nowadays and describes explosively growing collections of data that carry significantly more peculiarities. The popularity of the concept reflects both the many challenges that have arisen in managing and analyzing big data and the manifold, attractive benefits and opportunities that exploiting these data promises. Conventional data analysis tools, including machine learning and data mining, are no longer efficient as they stand and have to be scaled to deal with the obstacles posed by the big data paradigm. The basic peculiarities that define the challenging character of big data are described by the “5-V’s”. Laney (2001) introduced the “3-V’s” to the big data community, namely volume, variety, and velocity. More “V’s” have been proposed since then, of which two, variability and value, are the most popular. These “5-V’s” are described in the following lines:

Volume: The sheer amount of data that is generated and has to be stored and processed exceeds what conventional tools can handle.

Velocity: Data streams arrive continuously and need to be analyzed as they arrive, which poses challenging real-time constraints.

Variety: There are many different types of data (e.g. text, sensor data, images, audio, video, graphs) and various degrees of structure (structured, semi-structured, unstructured), and different types and structures are often mixed.

Variability: The structure of the data, as well as the way users want to interpret these data, usually changes over time, which poses additional data and knowledge management challenges.

Value: Value refers to the quality of the extracted knowledge, which may make it possible to take better decisions and answer more questions.
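
To make the velocity constraint described above more concrete, the following is a minimal Python sketch of processing a stream one reading at a time with bounded memory, rather than storing the whole stream before analyzing it. It is an illustration only, not a method taken from the text: the function name `sliding_mean`, the window size, and the heart-rate values are all hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator


def sliding_mean(readings: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` readings after each arrival.

    Only the last `window` values are kept, so memory use does not grow
    with the length of the stream.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque()   # the readings currently inside the window
    total = 0.0        # running sum of the values in `recent`
    for value in readings:
        recent.append(value)
        total += value
        if len(recent) > window:
            # Drop the oldest reading once the window is full.
            total -= recent.popleft()
        yield total / len(recent)


# Hypothetical heart-rate samples (beats per minute), consumed one at a time.
heart_rate_stream = iter([72.0, 75.0, 90.0, 120.0, 118.0])
means = list(sliding_mean(heart_rate_stream, window=3))
```

Each incoming reading is handled with a constant amount of work, which is the property a real-time analysis needs; a production system would typically rely on a dedicated stream-processing framework rather than a hand-written loop.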