What Is Big Data
Big data is an evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information. Big data can be characterized by the three Vs: the extreme volume of data, the wide variety of types of data and the velocity at which the data must be processed. Although big data doesn't refer to any specific quantity, the term is often used when speaking about petabytes and exabytes of data, much of which cannot be integrated easily.
Because big data takes too much time and costs too much money to load into a traditional relational database for analysis, new approaches to storing and analyzing data have emerged that rely less on data schema and data quality. Instead, raw data with extended metadata is aggregated in a data lake, and machine learning and artificial intelligence (AI) programs use complex algorithms to look for repeatable patterns.
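As a rough illustration of storing raw data with extended metadata and deferring structure until analysis time, here is a minimal sketch. It assumes an in-memory list standing in for the data lake; the function names (ingest, read_json_records) and the metadata fields are hypothetical and do not come from any particular product.

```python
import json
from datetime import datetime, timezone


def ingest(lake, raw_bytes, source, content_type):
    """Append a raw payload unchanged, tagged with descriptive metadata."""
    lake.append({
        "raw": raw_bytes,  # stored as-is, no schema enforced on write
        "metadata": {
            "source": source,
            "content_type": content_type,
            "ingested_at": datetime.now(timezone.utc).isoformat(),
        },
    })


def read_json_records(lake):
    """Apply structure at read time, only to entries tagged as JSON."""
    for entry in lake:
        if entry["metadata"]["content_type"] == "application/json":
            yield json.loads(entry["raw"])


# Hypothetical sample data: one semi-structured and one unstructured payload.
lake = []
ingest(lake, b'{"user": "a", "clicks": 3}', "web-log", "application/json")
ingest(lake, b"free-form support ticket text", "helpdesk", "text/plain")

records = list(read_json_records(lake))
```

Both payloads are kept in the lake, but only the one tagged as JSON is parsed when an analysis asks for structured records. The metadata, not a fixed schema, decides how each item is interpreted.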
Big Data Analytics
Big data analytics is often associated with cloud computing because the analysis of large data sets in real time requires a platform like Hadoop to store large data sets across a distributed cluster and a framework like MapReduce to coordinate, combine and process data from multiple sources.
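To make the idea of coordinating, combining and processing data from multiple sources more concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases, using word counting as the example. It is not Hadoop's API: the function names are hypothetical, and a real cluster would run the map and reduce steps in parallel on different machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce over several input sources."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical inputs standing in for data held on different nodes.
sources = ["big data is big", "small data is for people"]
counts = word_count(sources)
```

Each source is mapped independently, which is what lets the work be spread across a cluster. Only the shuffle step needs to bring together values that share a key.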
Although the demand for big data analytics is high, there is currently a shortage of data scientists and other analysts who have experience working with big data in a distributed, open source environment. In the enterprise, vendors have responded to this shortage by creating Hadoop appliances to help companies take advantage of the semi-structured and unstructured data they own.
Big data can be contrasted with small data, another evolving term that's often used to describe data whose volume and format can be easily used for self-service analytics. A commonly quoted axiom is that "big data is for machines; small data is for people."
Understanding Big Data Analytics
The most common source of confusion results from the conflation of Big Data storage with Big Data analytics. The term “Big Data” originated from within the open source community, where there was an effort to develop analytics processes that were faster and more scalable than traditional data warehousing, and could extract value from the vast amounts of unstructured data produced daily by web users.
Big Data storage is related in that it also aims to address the vast amounts of unstructured data fueling data growth at the enterprise level. But the technologies underpinning Big Data storage, such as scale-out NAS and object-based storage, have existed for a number of years and are relatively well understood. At a very simplistic level, Big Data storage is nothing more than storage that handles a lot of data for applications that generate huge volumes of unstructured data. This includes high-definition video streaming, oil and gas exploration, genomics -- the usual suspects. A marketing executive at a large storage vendor that has yet to announce or introduce a product told me his company was considering “Huge Data” as a moniker for its Big Data storage entry.
Big Data analytics is more emergent and multifaceted, but less understood by the IT generalist. Development of Big Data analytics processes has been driven historically by the web. However, applications for Big Data analytics are now growing rapidly in all major vertical industry segments, and that growth represents an opportunity for vendors that's worth all the hype.
Big Data analytics is an area of rapidly growing diversity, so trying to define it is probably not helpful. What is helpful, however, is identifying the characteristics that are common to the technologies now identified with Big Data analytics.
These include:
- The perception that traditional data warehousing processes are too slow and limited in scalability
- The ability to converge data from multiple data sources, both structured and unstructured
- The realization that time to information is critical to extract value from data sources that include mobile devices, RFID, the web and a growing list of automated sensory technologies
In addition, there are at least four major developmental segments that underline the diversity to be found within Big Data analytics. These segments are MapReduce, scalable database, real-time stream processing and Big Data appliance.