Top Big Data Challenges
The path to the successful application of Big Data to educational institutions is going to face at least six major Big Data challenges or roadblocks that will have to be addressed one at a time:
Integration across institutional boundaries – K-12 schools are generally organized around academic disciplines. Universities are organized as separate schools, faculties, and departments. Each of these units operates somewhat independently of the others and shares real estate as a matter of convenience. Integrating data across these organizational boundaries is going to be a major challenge. No organizational unit is going to surrender any part of its power base easily. Data is power.
Self-service analytics and data visualization – It is going to be a piece of cake to give planners and decision makers the technology-based tools they need to do their own analytics and visualize the results of their studies graphically. It is going to be a genuine challenge to create a culture that requires them to do their own studies using those tools. An even greater challenge will be to create a climate that informs their decision making with the results of their own studies because they are so accustomed to making decisions intuitively.
Privacy – There is a great deal of concern – perhaps even excessive concern – about the privacy of the information collected about each student and her family. The concern is that this data could fall into the wrong hands or be abused by those who have been given responsibility for safeguarding the information. To some extent, this is a technological and management issue. However, the fundamental issue is fear that the technical and management safeguards either won’t work or will be abused. Lisa Shaw, a parent in the New York City public school system said, “It’s really invasive. There’s no amount of monetary funds that could replace personal information that could be used to hurt or harm our children in the future.”
Correlation vs ‘cause and effect’ – Purists in rational argument want to see arguments that clearly spell out cause-and-effect relationships before blessing them as a basis for decision making. The fact that two factors may be highly correlated does not satisfy this demand for cause-and-effect. Nevertheless, real world experience in other areas of Big Data has shown that high correlations are sufficient by themselves to make decisions that are either lucrative or achieve the objectives the players have in mind. This means they have been able to realize significant benefits based on correlation without being able to argue the underlying mechanics.
Money – Nearly all educational institutions are strapped for money. When they make decisions to invest in the hardware, software, staff, and training to exploit Big Data, they are making decisions not to hire another professor, equip a student lab, or expand an existing building. That can be a tough call.
Numbers game – Some argue – perhaps rightfully so – that Big Data reduces interactions with students to a numbers game. Recommendations and assessments are based entirely on analytics. This means that compassion, personal bonding, and an understanding of the unique circumstances of every student get lost in the mix. Others argue that Big Data is an assist to the human process. In any event, this is unquestionably a stumbling block.
Privacy vs. Evidence Based Research
There is a great deal of concern about student privacy as we mentioned above, and it is one of the top Big Data challenges that must be resolved. One of the key reasons for this concern focuses on the process of growing up itself. It’s not unusual for students to participate in activist organizations in their youth that they reject later in life. Or they drank too much in university but sobered up once they had the responsibilities of jobs and families. Or a teacher may have given a student a negative evaluation that should not have survived his graduation or departure from the school. In the past, we simply forgot these things. Life moves on and we don’t give a great deal of attention to what happened 25 years ago. But permanent records that can be pulled up and viewed decades later may cast shadows on job candidates that are completely unwarranted at that time. In other words, we lose the ability to forget.
There is an even greater threat, though. Although there is general agreement about the value of predictive analytics, no one pretends that the predictions are inevitable. Nevertheless, a computer-generated prediction can take on the aura of truth. A prediction that a student is not suitable for a particular line of work may prevent hiring managers from hiring her for a position she is perfectly well suited to handle. These predictions can severely limit her opportunities in life forever.
One way of dealing with this is to pass legislation that limits access to student information, protects the identity of individuals, and yet still makes it available to those conducting legitimate educational research. Unfortunately, this ideal is better served in rhetoric than in reality.
Consider stripping student information of any identifying information and releasing it, along with records of other students in the same cohort, for general access for educational research. Yes, the school has taken all the required and appropriate steps to protect the students’ identity. But, no, it doesn’t work. That’s because Big Data practitioners generally access large data sets from a wide variety of sources. Some of those other sources (e.g., Facebook) make no attempt to protect the individual’s identity. Those secondary sources contain enough unique identifying characteristics that they can be accurately correlated with the de-identified school records to re-identify those records. The best-laid plans of mice and men …
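The re-identification problem described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any real attack or data set: the records, the profile names, and the choice of birth year, zip code, and gender as linking fields are all invented assumptions.

```python
# Hypothetical data: every name, field, and value here is invented for illustration.
deidentified_records = [
    {"birth_year": 2001, "zip": "10027", "gender": "F", "gpa": 2.1},
    {"birth_year": 2001, "zip": "10027", "gender": "M", "gpa": 3.4},
    {"birth_year": 2002, "zip": "10031", "gender": "F", "gpa": 3.9},
    {"birth_year": 2002, "zip": "10031", "gender": "F", "gpa": 2.8},
]

# A secondary source that makes no attempt to hide identity.
public_profiles = [
    {"name": "Student A", "birth_year": 2001, "zip": "10027", "gender": "F"},
    {"name": "Student B", "birth_year": 2001, "zip": "10027", "gender": "M"},
    {"name": "Student C", "birth_year": 2002, "zip": "10031", "gender": "F"},
]

# Assumed linking fields shared by both sources.
QUASI_IDENTIFIERS = ("birth_year", "zip", "gender")


def key(row):
    """Return the combination of linking fields for one row."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def reidentify(records, profiles):
    """Attach a name to every record whose linking fields match exactly one record."""
    by_key = {}
    for record in records:
        by_key.setdefault(key(record), []).append(record)

    matches = {}
    for profile in profiles:
        candidates = by_key.get(key(profile), [])
        if len(candidates) == 1:  # unambiguous match, so the record is re-identified
            matches[profile["name"]] = candidates[0]
    return matches


matches = reidentify(deidentified_records, public_profiles)
for name, record in matches.items():
    print(name, "->", record)
```

In this toy data, two of the three profiles link to exactly one school record each, even though the school removed all names. The third stays ambiguous only because two records happen to share the same field values, which is the kind of overlap that de-identification schemes have to rely on.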
There is no shortage of legislation in the US to protect student information. The most relevant legislation includes:
- The Family Educational Rights and Privacy Act of 1974 (FERPA). This act prohibits the unauthorised disclosure of educational records. FERPA applies to any school receiving federal funds and levies financial penalties for non-compliance.
- The Protection of Pupil Rights Amendment (PPRA) of 1978. This act regulates the administration of surveys soliciting specific categories of information. It imposes certain requirements regarding the collection and use of student information for marketing purposes.
- The Children’s Online Privacy Protection Act of 1998 (COPPA). This act applies specifically to online service providers that have direct or actual knowledge of users under 13 and collect information online.
Unfortunately, this legislation is outdated and somewhat useless today. For example, it applies to schools but not to third party companies operating under contract to the schools. This legislation was enacted before the era of Big Data and doesn’t address the issues that this current technology raises. Further, the acts don’t include a private “right of action.” This means that the students and parents affected cannot go to court to enforce the law themselves.
In light of this, there are ongoing legislative attempts to deal with the need to protect the privacy of student information. By September 2015, 46 states had introduced 162 bills dealing with student privacy; 28 of those bills had been enacted in 15 states. There have been ongoing initiatives at the federal level as well. Relevant pieces of federal legislation that have been introduced include:
- Student Digital Privacy and Parental Rights Act (SDPPRA)
- Protecting Student Privacy Act (PSPA)
- Student Privacy Protection Act (SPPA)
These acts are primarily concerned with protecting student data that schools pass along to third party, private sector companies for processing. In spite of the fact that these companies have generally built in their own data protection policies and procedures that already meet the requirements of this legislation, there is still considerable fear that the companies will use the data for nefarious purposes such as tailoring marketing messages to particular students – something that is clearly outside the scope of providing education or conducting educationally related research.
The US is not alone in its concern. The European Union has developed regulations that apply throughout the EU. This is in contrast to the fragmented American approach. To be fair to the Americans, however, the Constitution leaves education to the states rather than to the federal government.
The EU 1995 Directive 95/46/EC is the most important EU legal instrument regarding personal data protection of individuals. Rather than discourage the use of third parties storing and processing student information, the EU prefers to regulate it. The EU recognizes that private sector companies provide a valuable service.
The Directive gives parents the option of opting out of data sharing arrangements for their children. However, doing so would likely jeopardize the educational opportunities their children would enjoy otherwise. In other words, while parents have the right to opt out, it would be imprudent in practice to do so.
After considerable discussion and consultation, the EU Parliament approved the General Data Protection Regulation (GDPR or Regulation). This Regulation is set to go into effect in May 2018. It pays particular attention to requiring schools to communicate “in a concise, transparent, intelligible and easily accessible form, using clear and plain language, in particular for any information addressed specifically to a child.”
Unfortunately, this is problematical. Big Data and Machine Learning develop algorithms that are quite opaque. Even the professionals who operate Big Data systems don’t know the inner workings of the algorithms their systems develop. Interestingly, they don’t even know which pieces of input are pivotal to the output and recommendations of those systems. In this context, it is reasonable that the general public sees EdTech companies as a threat to students’ autonomy, liberty, freedom of thought, equality and opportunity.
On the other hand, when you visit these EdTech websites, it certainly appears that they are driven by a sense of enlightenment. Their websites clearly suggest that they have the best interests of the students and their client schools in mind. Aside from the opaque nature of Big Data and Machine Learning algorithms, it is not clear – to this author at least – that EdTech companies deserve to be treated as skeptically as they are. It’s quite possible that the nub of the issue is not the stated objectives and current operations of these companies, but rather the uses that this data might be put to in the future and have not been foreseen today. In other words, the way the data might be used in the future is unpredictable. The unpredictable uses of the data could lead to unintended consequences.
In both Europe and the US, when we look at the furor about the importance of the privacy of student information, it often boils down to pedagogical issues.
Here is the nub of the conundrum in a nutshell. There is clearly a potential benefit of conducting educational research using student information. There is good reason to believe that tracking students over the course of their academic years – and perhaps even into their working careers – would allow scholars to identify early indicators of eventual success or failure. However, if restrictions on student identification or on the length of time data can be stored prevent scholars from doing so, then that sort of research cannot be conducted. This could conceivably lead to a loss of value both to individual students, who could benefit from counseling informed by reliable research, and to society at large.
How Is the Future of Big Data in Education Likely to Unfold?
Here are the trends to look for – in no particular order. These trends are instrumental in informing the schools’ policy development, strategic planning, tactical operations, and resource allocation, and overcoming the Big Data challenges in Education.
Focus student recruitment – Historically, colleges and universities have had student recruitment programs that were fairly broad in terms of geography and demographics. This led to a large number of student applications for admission. Unfortunately, many of the students the institutions accepted did not enrol in those schools. Colleges are now using Big Data to find those geographic areas and demographics where their promotional efforts not only generate large numbers of high caliber applicants, but also applicants who, if accepted into the college, will actually enrol.
Student retention and graduation – Universities need to do more than attract high caliber students. They need to attract students who will stay in school and graduate. Big Data coupled with Machine Learning can help identify those students. In parallel with student recruitment, the schools will increasingly use Big Data to identify at-risk students at the moment they show signs of falling behind. This will enable the schools to assist the students, help ensure their success, retain them in school, and increase the chances they will graduate.
Construction planning and facility upgrades – Educational institutions at all levels have more demands to add or expand their buildings and upgrade their facilities than their budgets will permit. They need to establish priorities. Big Data will help planners sort through the data to identify those areas that are likely to be in highest demand and provide the greatest benefit to the students and the institutions.
Data centralization – At the moment, nearly all data in educational institutions is held in organizational silos. That means that each department or organizational unit collects, stores, and manages the data it needs for its own purposes. That is a natural result of the need for each function to get its work done. However, it is counterproductive if we wish to apply Big Data. In the future, we can expect these siloed data stores to be integrated or linked virtually. Integration means that the data will be moved to a central repository and managed by a central function – like the IT department. Virtual integration means that the functional units will remain where they are at the moment but the IT department will have read access to each of these repositories. Quite likely, we will see both options in practice for the foreseeable future.
Data based decision making and planning – Although Education has enjoyed the benefit of quantitative studies for centuries, the practice of education is generally driven by the philosophical views of educators more than data or evidence based studies. In fact, this approach has been enshrined in our commitment to academic freedom at the university level and has trickled down, to some extent, to public and private K-12 schools. Big Data will enable a data-rich culture that will inform policy development and operational planning to an extent we’ve never seen in the past.
Greater use of predictive analytics – Machine Learning applied to Big Data will become increasingly successful at predicting students’ future success based on their past performance. Schools of all stripes will rely on these predictive analytics more and more in the future. This is likely to lead to two types of outcomes. On the one hand, schools will allocate more resources to those students most likely to succeed and, as a result, graduate more high-performing students who will deliver significant benefits to their communities and the world. On the other hand, predictive analytics will restrict the academic opportunities of failing students or those who show little promise – like Albert Einstein. Predictive analytics will also help institutions develop counter-intuitive insights that will challenge long cherished values and lead to better student and institutional results.
Local adoption of analytics tools – Older readers will remember the days when word processing was handled by a pool of word processing typists. Over time, word processing migrated from the pool to executives’ assistants and, eventually, to the desks of the executives themselves. Once word processing reached the desks of the executives and other knowledge workers, word processing shifted from being a mechanical function to being a creative one. Knowledge workers crafted their messages as they took form on their screens. The same will be true of predictive analytics. We are going to see the hands-on management of predictive analytics studies migrate from Big Data specialists to the desktops (and laptops) of executives who need to think through, propose, and defend policy statements, strategic plans, and operational or tactical initiatives.
User experience – Educators often don’t know a student is having a problem until they see the student failing (or just barely passing) quizzes and tests. But, even when they recognize the problem, they don’t know the reasons any given student is falling behind. Big Data will help students by recognizing the problems they have as those problems occur. Then it can offer tutorials that address those problems as they occur – not days or weeks later when it may be too late to affect the students’ learning trajectories.
Real-time quiz evaluations and corrective action – As computers and tablets become ever more pervasive in classrooms, schools at all levels will be better able to collect digital breadcrumbs about how students perform on quizzes and determine what corrective action is required. This is going to eventually become the norm. Steven Ross, a professor at the Center for Research and Reform in Education at Johns Hopkins University, agrees. He said, “Most of us in research and education policy think that for today’s and tomorrow’s generation of kids, it’s probably the only way.”
Privacy, privacy, privacy – The privacy of student and family data will continue to be a hot issue. Over time, however, the benefits of sharing data with student identification data will outweigh the concerns of the general public. Sharing this data among qualified research professionals will become more socially acceptable not only as technological safeguards are put into place, but as they are accepted as being appropriate. In practice, society will discover that the student data they thought was secure is not. Witness the data breach at Equifax that spilled confidential data about 143 million people. Do you remember the data breaches at Target and Home Depot? Again, tens of millions of people who trusted these companies with their credit card information were affected.
Learning Analytics and Educational Data Mining – We are seeing a new professional discipline emerge. The professionals in this field will have both the professional and technical skills to sort through the masses of unstructured educational data being collected on a wholesale basis, know what questions to ask, and then drill through the data to find useful, defensible insights that make a genuine difference in the field of Education. The demand for these specialists is likely to outstrip the supply for many years to come.
Games – We are likely to see far more games introduced into the educational curriculum than we’ve ever seen before. Games are not only proven to be instrumental in the learning process, they also lend themselves to data acquisition for immediate or later analyses.
Flipped classrooms – The Khan Academy has reversed the historical process of delivering course material during class time and assigning homework to be handled out of class. In its flipped classrooms, students watch streaming videos at their leisure out of class. Class time is dedicated to providing students a forum where they can work through their problem sets and ask for – and get – help as they need it. This flipped classroom is going to become far more widespread because our technologies today enable it – and it just makes a lot of sense.
Adaptation on steroids – Adaptation is nothing new. It’s been going on for thousands of years. The idea is that course material or explanations or problem sets or tutoring are tailored to the individual needs of the student. But when we put that adaptation on steroids, we see a shift in “kind.” In other words, we see something that was not present before. Today we can monitor every move students make, not just count the right and wrong answers they give to a quiz question. By analyzing facial expressions, delays in responding, and a myriad of other variables, we can tailor-make and deliver a tutorial specifically suited to a student’s learning problem at the moment the problem occurs.
Institutional evaluation – Schools have always presumed to grade their students. Until relatively recently, it was presumptuous for students to grade their teachers or their schools. Now it is becoming common practice. In fact, Big Data will play an ever-growing role in assessing the performance of individual instructors. More importantly, Big Data will rank order universities, colleges, and high schools on a wide range of variables that can be supported through empirical evidence. True, some of that evaluation will be based on “sentiment” – but much of it will be based on hard analytics that would have been too time consuming or too expensive to collect and analyze in a holistic manner.
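Several of the trends above (student retention, user experience, real-time quiz evaluation) rest on the same basic mechanism: watch recent results and raise a flag as soon as a student starts to slip. The sketch below shows that mechanism in its simplest form. It is a hand-written rule standing in for the Machine Learning models the trends describe, and the student IDs, scores, window size, and threshold are all hypothetical.

```python
from statistics import mean


def flag_at_risk(scores_by_student, window=3, threshold=60.0):
    """Return the IDs of students whose last `window` quiz scores average below `threshold`.

    `window` and `threshold` are illustrative defaults, not recommended values.
    Students with fewer than `window` scores are skipped rather than guessed at.
    """
    flagged = []
    for student_id, scores in scores_by_student.items():
        recent = scores[-window:]
        if len(recent) == window and mean(recent) < threshold:
            flagged.append(student_id)
    return sorted(flagged)


# Hypothetical quiz scores, oldest first.
quiz_scores = {
    "s001": [82, 78, 85, 80],  # steady
    "s002": [75, 62, 55, 48],  # declining
    "s003": [58, 61],          # too few quizzes to judge
}

at_risk = flag_at_risk(quiz_scores)
print(at_risk)
```

A real early-warning system would presumably draw on far more signals than quiz scores, but the design question is the same: how soon to raise the flag, and how much evidence to require before doing so.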
The Jury Is Still Out
In spite of all the investment, the excitement, and the promise of Big Data in Education, we still don’t have enough experience to make categorical claims about its value. We are still struggling with the top Big Data challenges we face.
In an article in The Washington Post last year, Sahlberg and Hasak claimed that the promised benefits of Big Data have not been delivered. As a visiting professor at The Harvard Graduate School of Education, Sahlberg is an authority we should listen to. He claims that our preoccupation with test results reveals nothing about the emotions and relationships that are pivotal in the learning process. Our commitment to judging teachers by their students’ test scores has the effect of steering top performing teachers away from low performing schools – exactly where they are most needed. There are extensive efforts to evaluate both teachers and students. However, according to Sahlberg, this has NOT led to any improvement in teaching in the US.
The most that Big Data can offer is an indication of a high correlation between one factor and another. It cannot tell us about cause and effect. In fact, cause and effect arguments are difficult for people to make – and yet they are instrumental in building compelling arguments. Having said that, it is revealing to recognize that finding high correlations in other fields – even without a demonstrated cause and effect relationship – has proven to be quite beneficial.
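A small simulation can make the gap between correlation and cause and effect concrete. In the sketch below, which uses entirely made-up variables and numbers, a hidden factor (study hours) drives both library visits and grades. The two observed factors end up highly correlated, which is useful for prediction, yet forcing students to visit the library would change nothing, because the visits never caused the grades.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Hidden factor (an assumption of this toy model): weekly study hours.
study_hours = [rng.uniform(0, 10) for _ in range(n)]

# Both observed factors depend on study hours, not on each other.
library_visits = [h + rng.gauss(0, 1) for h in study_hours]
grades = [2 * h + rng.gauss(0, 1) for h in study_hours]

# What the analytics see: a strong correlation between visits and grades.
r_observed = pearson(library_visits, grades)

# An intervention: visits are assigned at random, independent of study hours.
assigned_visits = [rng.uniform(0, 10) for _ in range(n)]
r_assigned = pearson(assigned_visits, grades)

print(f"observed correlation:  {r_observed:.2f}")
print(f"after random assignment: {r_assigned:.2f}")
```

The first figure comes out high and the second close to zero. That matches the point made above: a high correlation can still be valuable for spotting which students are likely to do well, but on its own it does not say which lever to pull.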