What’s the deal with “Big Data”? Part I

Part I: The History

Big data. You hear about it at every turn. From Google to Facebook, Amazon to Twitter, it seems that everywhere you look, someone is tracking your online activities. There is no doubt that this information can be very valuable and, in some cases, very embarrassing to the person involved if it were to come to light. Few people realize just how much these companies can learn about you just by watching what you click, what you search for and where you go on the Internet. Further, there is not a lot of protection for the data that is collected. It is no wonder that, for many, this is a great cause for concern.

Yet, there can be a very positive side to “Big Data,” particularly in the realm of higher education.

Before we get to what “Big Data” is today and how it can be used positively in higher education, let’s first take a look at history and see how we got here:

In the not-so-distant past, soon after computers were first created, databases were developed to manage transactional operations. Instead of using vast numbers of people to do such things, we found that computers could do these tasks not only more efficiently, but more accurately. The cost savings were also substantial. For example, in the higher education context, we developed systems to manage the registration for classes. Instead of doing this task by hand as was done in the past, we created data systems to manage it. Over the years, these systems were aggregated and bundled into cohesive wholes and marketed together under various brand names such as Datatel, Banner and PeopleSoft. In time, these new systems were collectively named ERPs, or Enterprise Resource Planning systems. This is how all these systems are categorized today.

Almost as soon as these systems came into being, people began to want to generate reports from them. This turned out not to be so easy. A system that was designed to run operational transactions efficiently and accurately might not have data in a format that is conducive to longitudinal reporting.

Thus was born the concept of a “Data Warehouse.” In general, the purpose of a data warehouse is to aggregate, normalize and transform data from various transactional system sources and move it into an authoritative central source to make reporting easy. To turn that complex definition into something more like English, it means, at least in the university context, that the information is taken from all these various transactional systems and then reformed. This reformation is done in such a way that it now makes it possible to run simple queries or searches across the data and get meaningful results without the need for complex programming. In other words, it makes the data easily accessible. At least it does in theory.
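To make the idea concrete, here is a minimal sketch in Python of what "aggregate, normalize and transform" might look like. Everything in it is an assumption made for illustration: the two source systems (registration and housing), their field names, the term codes and the sample rows are all invented, and a real warehouse would involve far more data and far more rules.

```python
import sqlite3

# Hypothetical extracts from two transactional systems. The field names
# and values are invented for illustration only.
registration_rows = [
    {"stu_id": "A001", "term": "2013FA", "credits": "12"},
    {"stu_id": "A002", "term": "2013FA", "credits": "15"},
]
housing_rows = [
    {"StudentID": "a001", "Semester": "Fall 2013", "Hall": "North"},
]

def normalize_term(value):
    """Map each system's term label to one shared code such as '2013FA'."""
    if " " in value:  # e.g. "Fall 2013"
        season, year = value.split()
        return year + {"Fall": "FA", "Spring": "SP"}[season]
    return value  # already in the shared format

def build_warehouse():
    """Load both sources into one in-memory database with common formats."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE enrollment (student_id TEXT, term TEXT, credits INTEGER)")
    db.execute("CREATE TABLE housing (student_id TEXT, term TEXT, hall TEXT)")
    for r in registration_rows:
        db.execute(
            "INSERT INTO enrollment VALUES (?, ?, ?)",
            (r["stu_id"].upper(), normalize_term(r["term"]), int(r["credits"])),
        )
    for r in housing_rows:
        db.execute(
            "INSERT INTO housing VALUES (?, ?, ?)",
            (r["StudentID"].upper(), normalize_term(r["Semester"]), r["Hall"]),
        )
    db.commit()
    return db

def credits_by_hall(db):
    """A simple report that spans both source systems."""
    query = """
        SELECT h.hall, SUM(e.credits)
        FROM enrollment e
        JOIN housing h
          ON e.student_id = h.student_id AND e.term = h.term
        GROUP BY h.hall
    """
    return db.execute(query).fetchall()
```

Once the identifiers and term codes agree, a question that spans both systems, such as credits carried per residence hall, becomes a single short query by calling `credits_by_hall(build_warehouse())`.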

With such a promise, many universities in the 1990s began ambitious projects to develop data warehouses for reporting purposes. Many spent millions on the endeavors with the idea of getting great insight from the transactional data that they had been collecting for years. Unfortunately, many of these projects did not quite turn out as planned. Indeed, many turned out to be absolute disasters.

The reasons for the failures were many. However, among the biggest was the sheer number of different data sources that needed to be brought together. At the time, universities often did not yet have aggregated ERP systems, but rather a collection of different, smaller systems that were linked together through something called “batch processing.” This meant that normalizing the data (making the data match across systems so that it meant the same thing regardless of the system being used) was incredibly complex. Because each department’s definitions were already built into its own reporting, changing a definition to get it to match across systems was no easy task and often took a great deal of negotiation among various departments. For example, what is an applicant? For a university, that definition can have huge consequences.
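The "what is an applicant?" question can be illustrated with a small sketch in Python. The records, field names and both definitions below are hypothetical, chosen only to show how two reasonable definitions can produce different counts from the same data.

```python
# Hypothetical application records; the field names are invented.
records = [
    {"id": 1, "form_started": True, "form_submitted": True, "fee_paid": True},
    {"id": 2, "form_started": True, "form_submitted": True, "fee_paid": False},
    {"id": 3, "form_started": True, "form_submitted": False, "fee_paid": False},
]

def count_applicants(rows, definition):
    """Count the rows that satisfy a given definition of 'applicant'."""
    return sum(1 for row in rows if definition(row))

# One office might count anyone who started a form (an assumption).
def started(row):
    return row["form_started"]

# Another might count only submitted, paid applications (also an assumption).
def completed(row):
    return row["form_submitted"] and row["fee_paid"]

counts = {
    "started": count_applicants(records, started),
    "completed": count_applicants(records, completed),
}
```

With these three sample records, the first definition counts every record while the second counts only one. Until the departments agree on which definition the warehouse should use, any report built on "applicants" is open to dispute.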

Another challenge was that the technology used to build data warehouses was relatively new at the time. The tools, processes and procedures to build these systems were simply not all that well developed. In addition, in a lot of cases, university IT departments were ill-equipped to develop such systems with the often limited resources available to them.

Thus, a great many of these data warehouse projects were abandoned or dramatically scaled back. The promise that they offered remained just that, a promise. Many an IT career was derailed because of the money pits many of these projects had become. For years afterward, at any gathering of university IT professionals, you could still elicit cringes with the mere mention of the term “data warehouse.”

But, as is the case with many things in technology, what might seem like an old or even bad idea often gets reborn in some new way. So it has come to be with data warehouses. Today, they have been redefined and reconstituted as “Big Data,” with all the same promises and opportunities, and with some of the same challenges too. In some ways, “Big Data” may even have new, more significant issues.

In Part II, we will take a close look at this modern take on the data warehouse and how “Big Data” can make a large, positive difference in higher education while also presenting some daunting challenges.

Photo Credits:

Photo 1:

http://www.flickr.com/photos/8399025@N07/3300683055/in/photolist-62ES5x-4xjF6L-9tmQrm-4xjZVu-6ZhcKK-88XKCJ-717Wbe-9tiNNe-51CVki-51CVUg-9tmMqQ-aBchPi

Photo 2:

http://www.flickr.com/photos/60057912@N00/4404695769/in/photolist-7HedAv-uvWfh-51Ys8K-7bYqCN-2kmuW2-uvWdP-uvWeP-uvWd9-7vpAUr-9A37wt-78yDYU-e6BPLD-dKLm3U-d3F8j3-d3FcJy-d3F947-d3F7eE-d3F1Ej-d3F9tG-d3F2T5-d3F8HU-d3F5BG-dqMxba-2FU8tq-jXtiMZ-daWsnN-cqkBfw-9MKokD-dqMx4V-dqMGCA-dqMwPT-dqL9zJ-dtzYYF-dqMwT6-523CXd-523KLC-51YrzP-51YtC6-51YAhX-51YqLv-523Goy-51YzJB-8w7E4V-51YwSH-51Yqh6-51Yy5R-523Eg1-523PUj-523HVb-523Hoj-51YzmB