How does Big Data relate to the recent major scientific discovery of the Higgs boson particle? Consider how the particle was finally coaxed into existence for less than a nanosecond, and the effort it took: 10,000 scientists and engineers from 600 institutions in more than 100 countries (hurrah for grid computing!). It was all coordinated at the European Organization for Nuclear Research near Geneva, better known as CERN. (For those of you unfamiliar with particle physics and this big leap in our understanding of the forces of the universe, the Higgs boson is important because it explains how matter acquires mass, filling in one of the last missing pieces of the Standard Model.)

Then consider that each institution involved brings its own massive computing power and its own plethora of data storage devices. Collecting all of the data tied to CERN’s particle accelerator, analyzing it, theorizing about what it all means, and producing volumes of scientific conclusions and research adds up to one of the largest grid computing projects in human history. And in Geneva, CERN’s Big Data processing relies on the Hadoop Distributed File System (HDFS) for storing massive amounts of data.

Some stats: CERN’s 17-mile-long collider generates hundreds of millions of particle collisions each second. Recording, storing, and analyzing these collisions represents a massive challenge; the collider produces roughly 20 million gigabytes of data each year. CERN stores part of that data on premises in Geneva but distributes roughly 80% to data centers around the world. So far, CERN has amassed about 200 petabytes of data. The CERN example also points to some guiding principles for managing “Big Connectivity” in the context of today’s Big Data trends.
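To put those figures in perspective, here is a quick back-of-the-envelope calculation in Python. The volumes and the 80% split are the ones quoted above; the decimal convention 1 PB = 1,000,000 GB is an assumption for simplicity.

```python
# Back-of-the-envelope arithmetic for the data volumes quoted above.
# Assumes decimal (SI) units: 1 petabyte = 1,000,000 gigabytes.
GB_PER_PB = 1_000_000

annual_gb = 20_000_000                 # ~20 million GB produced per year
annual_pb = annual_gb / GB_PER_PB      # -> ~20 PB per year
distributed_pb = annual_pb * 0.80      # ~80% shipped to external data centers
on_site_pb = annual_pb - distributed_pb

print(f"Annual volume: {annual_pb:.0f} PB")       # 20 PB
print(f"Distributed:   {distributed_pb:.0f} PB")  # 16 PB
print(f"Kept on site:  {on_site_pb:.0f} PB")      # 4 PB
```

At that rate, the roughly 200 petabytes amassed so far corresponds to about a decade of accumulated collision data.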
CERN was able to draw extreme collaboration from many different data elements – seeing further by collecting data and information from processes, applications, Web services, rule sets, social networks, active content, and activities, and using all of that Big Data to trigger appropriate changes and actions. It’s all about adapting quickly based on as much intelligence and analytics as possible. The growth of Big Data is illustrated simply below. As data grows and merges into the cloud, and as the need to mine that data becomes increasingly important to business and customer analysis, we see the rise of business intelligence and a variety of data analytics. And just as in the CERN example, we see some significant trends in data connectivity that are integral to the Progress DataDirect strategy:
- With the rise of grid computing and Big Data, “connectivity” grids are essential for weaving together the myriad types of data stores required for collaborative research.
- As Enterprise Application Integration migrates to the cloud, broad connectivity is required to tie in both on-premises and cloud data sources and integrate them with back-end systems.
- Data as a service is an emerging approach to aggregating and managing large data sets from multiple sources, making this information more easily available and usable to user communities and businesses.
- As more companies move their back-end business processes to the cloud, the need for data security, both in the cloud and in motion between on-premises and cloud-based systems and data repositories, correspondingly increases to lower risk and meet compliance requirements.
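The data-in-motion point can be made concrete with a small sketch: any client moving data between on-premises systems and a cloud repository should insist on verified, modern TLS. This is illustrative only, using Python’s standard-library ssl module rather than any specific CERN or DataDirect stack, and the endpoint name in the comment is hypothetical.

```python
import ssl

# Build a client-side TLS context with sensible security defaults:
# server certificates are verified and hostnames are checked.
context = ssl.create_default_context()

# Refuse legacy protocol versions; require TLS 1.2 or newer.
context.minimum_version = ssl.TLSVersion.TLSv1_2

# context.wrap_socket(sock, server_hostname="data.example.com") would then
# upgrade a plain TCP socket into an authenticated, encrypted channel
# before any data moves between on-premises and cloud systems.
print(context.verify_mode == ssl.CERT_REQUIRED)  # certificate checks are on
print(context.check_hostname)                    # hostname checks are on
```

Enforcing these checks at the connectivity layer, rather than per application, is one practical way to meet the compliance requirements described above.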
As these trends bear fruit, new ecosystems and markets are being created for broad cross-enterprise data connectivity. Use cases like the CERN accelerator project give us greater insight into how important Big Data connectivity and analytics are to the scientific community as well as to businesses – enabling us all to “see further.”