The Top Six Big Data Challenges in Education


Top Big Data Challenges

The path to the successful application of Big Data in educational institutions will face at least six major Big Data challenges or roadblocks that will have to be addressed one at a time:

Integration across institutional boundaries – K-12 schools are generally organized around academic disciplines. Universities are organized as separate schools, faculties, and departments. Each of these units operates somewhat independently of the others and shares real estate as a matter of convenience. Integrating data across these organizational boundaries is going to be a major challenge. No organizational unit is going to surrender any part of its power base easily. Data is power.

Self-service analytics and data visualization – It is going to be relatively easy to give planners and decision makers the technology-based tools they need to do their own analytics and visualize the results of their studies graphically. It is going to be a genuine challenge to create a culture that requires them to do their own studies using those tools. An even greater challenge will be to create a climate that informs their decision making with the results of their own studies, because they are so accustomed to making decisions intuitively.

Privacy – There is a great deal of concern – perhaps even excessive concern – about the privacy of the information collected about each student and her family. The concern is that this data could fall into the wrong hands or be abused by those who have been given responsibility for safeguarding the information. To some extent, this is a technological and management issue. However, the fundamental issue is fear that the technical and management safeguards either won’t work or will be abused. Lisa Shaw, a parent in the New York City public school system said, “It’s really invasive. There’s no amount of monetary funds that could replace personal information that could be used to hurt or harm our children in the future.”

Correlation vs cause and effect – Purists in rational argument want to see arguments that clearly spell out cause-and-effect relationships before blessing them as a basis for decision making. The fact that two factors may be highly correlated does not satisfy this demand for cause and effect. Nevertheless, real-world experience in other areas of Big Data has shown that high correlations are sufficient by themselves to make decisions that are either lucrative or achieve the objectives the players have in mind. This means they have been able to realize significant benefits based on correlation without being able to explain the underlying mechanics.

Money – Nearly all educational institutions are strapped for money. When they make decisions to invest in the hardware, software, staff, and training to exploit Big Data, they are making decisions not to hire another professor, equip a student lab, or expand an existing building. That can be a tough call.

Numbers game – Some argue – perhaps rightfully so – that Big Data reduces interactions with students to a numbers game. Recommendations and assessments are based entirely on analytics. This means that compassion, personal bonding, and an understanding of the unique circumstances of every student get lost in the mix. Others argue that Big Data is an assist to the human process. In any event, this is unquestionably a stumbling block.

Privacy vs. Evidence-Based Research

There is a great deal of concern about student privacy, as we mentioned above, and it is one of the top Big Data challenges that must be resolved. One of the key reasons for this concern is the process of growing up itself. It’s not unusual for students to participate in activist organizations in their youth that they reject later in life. Or they drank too much at university but sobered up once they had the responsibilities of jobs and families. Or a teacher may have given a student a negative evaluation that should not have followed the student beyond graduation or departure from the school. In the past, we simply forgot these things. Life moves on, and we don’t give a great deal of attention to what happened 25 years ago. But permanent records that can be pulled up and viewed decades later may cast shadows on job candidates that are completely unwarranted at that time. In other words, we lose the ability to forget.

There is an even greater threat, though. Although there is general agreement about the value of predictive analytics, no one pretends that the predictions are inevitable. Nevertheless, a computer-generated prediction can take on the aura of truth. A prediction that a student is not suitable for a particular line of work may prevent hiring managers from hiring her for a position she is perfectly well suited to handle. These predictions can severely limit her opportunities in life forever.

One way of dealing with this is to pass legislation that limits access to student information, protects the identity of individuals, and yet still makes it available to those conducting legitimate educational research. Unfortunately, this ideal is better served in rhetoric than in reality.

Consider stripping student records of any identifying information and releasing them, along with the records of other students in the same cohort, for general access for educational research. Yes, the school has taken all the required and appropriate steps to protect the students’ identity. But, no, it doesn’t work. That’s because Big Data practitioners generally access large data sets from a wide variety of sources. Some of those other sources (e.g., Facebook) make no attempt to protect the individual’s identity. Those secondary sources contain enough unique identifying characteristics that they can be accurately correlated with the de-identified school records to re-identify those records. The best laid plans of mice and men...
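To make the mechanics concrete, here is a minimal sketch of such a re-identification, or “record linkage,” attack: join the de-identified records to an outside source on shared quasi-identifiers such as birth year, ZIP code, and gender. The field names and data below are hypothetical, and real attacks typically use many more attributes.

```python
# Minimal sketch: re-identifying "anonymized" records by joining on quasi-identifiers.
# Field names and data are hypothetical; real linkage attacks use many more attributes.
import pandas as pd

# De-identified school records: names stripped, but quasi-identifiers remain.
school = pd.DataFrame([
    {"record_id": 101, "birth_year": 1999, "zip": "10027", "gender": "F", "gpa": 2.1},
    {"record_id": 102, "birth_year": 2000, "zip": "11201", "gender": "M", "gpa": 3.8},
])

# A public or commercial source (e.g., scraped profiles) with identities attached.
public = pd.DataFrame([
    {"name": "Jane Roe", "birth_year": 1999, "zip": "10027", "gender": "F"},
    {"name": "John Doe", "birth_year": 2000, "zip": "11201", "gender": "M"},
])

# Joining on the shared quasi-identifiers re-attaches names to the "anonymous" records.
reidentified = school.merge(public, on=["birth_year", "zip", "gender"], how="inner")
print(reidentified[["record_id", "name", "gpa"]])
```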

There is no shortage of legislation in the US to protect student information. The most relevant legislation includes:

  • The Family Educational Rights and Privacy Act of 1974 (FERPA). This act prohibits the unauthorized disclosure of educational records. FERPA applies to any school receiving federal funds and levies financial penalties for non-compliance.
  • The Protection of Pupil Rights Amendment (PPRA) of 1978. This act regulates the administration of surveys soliciting specific categories of information. It imposes certain requirements regarding the collection and use of student information for marketing purposes.
  • The Children’s Online Privacy Protection Act of 1998 (COPPA). This act applies specifically to online service providers that have direct or actual knowledge of users under 13 and collect information online.

Unfortunately, this legislation is outdated and largely ineffective today. For example, it applies to schools but not to third-party companies operating under contract to the schools. This legislation was enacted before the era of Big Data and doesn’t address the issues that this current technology raises. Further, the acts don’t include a private “right of action,” which means individuals have no way to sue to enforce the law.

In light of this, there are ongoing legislative attempts to deal with the need to protect the privacy of student information. Up until September 2015, 46 states introduced 162 laws dealing with student privacy; 28 of those pieces of legislation have been enacted in 15 states. There have been ongoing initiatives at the federal level as well. Relevant pieces of federal legislation that have been introduced include:

  • Student Digital Privacy and Parental Rights Act (SDPPRA)
  • Protecting Student Privacy Act (PSPA)
  • Student Privacy Protection Act (SPPA)

These acts are primarily concerned with protecting student data that schools pass along to third-party, private sector companies for processing. Although these companies have generally built their own data protection policies and procedures that already meet the requirements of this legislation, there is still considerable fear that the companies will use the data for nefarious purposes such as tailoring marketing messages to particular students – something that is clearly outside the scope of providing education or conducting educationally related research.

The US is not alone in its concern. The European Union has developed regulations that apply throughout the EU, in contrast to the fragmented American approach. To be fair to the Americans, however, the Constitution leaves education to the states, not the federal government.

The EU’s 1995 Directive 95/46/EC is the most important EU legal instrument regarding the protection of individuals’ personal data. Rather than discourage the use of third parties to store and process student information, the EU prefers to regulate it. The EU recognizes that private sector companies provide a valuable service.

The Directive gives parents the option of opting out of data-sharing arrangements for their children. However, doing so would likely jeopardize the educational opportunities their children would otherwise enjoy. In other words, while parents have the right to opt out, it would be imprudent in practice to do so.

After considerable discussion and consultation, the EU Parliament approved the General Data Protection Regulation (GDPR or Regulation), which is set to go into effect in May 2018. The Regulation pays particular attention to requiring schools to communicate “in a concise, transparent, intelligible and easily accessible form, using clear and plain language, in particular for any information addressed specifically to a child.”

Unfortunately, this is problematic. Big Data and Machine Learning develop algorithms that are quite opaque. Even the professionals who operate Big Data systems don’t know the inner workings of the algorithms their systems develop. They often don’t even know which pieces of input are pivotal to the output and recommendations of those systems. In this context, it is understandable that the general public sees EdTech companies as a threat to students’ autonomy, liberty, freedom of thought, equality, and opportunity.

On the other hand, when you visit these EdTech websites, it certainly appears that they are driven by a sense of enlightenment. Their websites clearly suggest that they have the best interests of the students and their client schools in mind. Aside from the opaque nature of Big Data and Machine Learning algorithms, it is not clear – to this author at least – that EdTech companies deserve to be treated as skeptically as they are. It’s quite possible that the nub of the issue is not the stated objectives and current operations of these companies, but rather the unforeseen uses the data might be put to in the future. In other words, unpredictable future uses of the data could lead to unintended consequences.

In both Europe and the US, when we look at the furor about the importance of the privacy of student information, it often boils down to pedagogical issues.

Here is the nub of the conundrum in a nutshell. There is clearly a potential benefit to conducting educational research using student information. There is good reason to believe that tracking students over the course of their academic years – and perhaps even into their working careers – would allow scholars to identify early indicators of eventual success or failure. However, restrictions on student identification, or on the length of time data can be stored, would prevent that research from being conducted. This could conceivably mean a loss of value both to individual students, who could benefit from counseling informed by reliable research, and to society at large.

How Is the Future of Big Data in Education Likely to Unfold?

Here are the trends to look for – in no particular order. These trends will be instrumental in informing schools’ policy development, strategic planning, tactical operations, and resource allocation, and in overcoming the Big Data challenges in Education.

Focus student recruitment – Historically, colleges and universities have had student recruitment programs that were fairly broad in terms of geography and demographics. This led to a large number of student applications for admission. Unfortunately, many of the students the institutions accepted did not enroll in those schools. Colleges are now using Big Data to find those geographic areas and demographics where their promotional efforts not only generate large numbers of high-caliber applicants, but also applicants who, if accepted into the college, will actually enroll.

Student retention and graduation – Universities need to do more than attract high-caliber students. They need to attract students who will stay in school and graduate. Big Data coupled with Machine Learning can help identify those students. In parallel with student recruitment, schools will increasingly use Big Data to identify at-risk students at the moment they show signs of falling behind. This will enable the schools to assist the students, help ensure their success, retain them in school, and increase the chances they will graduate.

Construction planning and facility upgrades – Educational institutions at all levels have more demands to add or expand their buildings and upgrade their facilities than their budgets will permit. They need to establish priorities. Big Data will help planners sort through the data to identify those areas that are likely to be in highest demand and provide the greatest benefit to the students and the institutions.

Data centralization – At the moment, nearly all data in educational institutions is held in organizational silos. That means that each department or organizational unit collects, stores, and manages the data it needs for its own purposes. That is a natural result of the need for each function to get its work done. However, it is counterproductive if we wish to apply Big Data. In the future, we can expect these siloed data stores to be integrated or linked virtually. Integration means that the data will be moved to a central repository and managed by a central function – like the IT department. Virtual integration means that the data stores will remain with the functional units, but the IT department will have read access to each of these repositories, as sketched below. Quite likely, we will see both options in practice for the foreseeable future.
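As a rough illustration of the virtual-integration option, the following sketch keeps each department’s data in its own store and gives a central function read-only access that assembles a combined view on demand. The departments, tables, and columns are hypothetical.

```python
# Minimal sketch of "virtual integration": data stays in separate departmental stores,
# and a central function only reads from each, assembling a combined view on demand.
# The departments, tables, and columns are hypothetical (in-memory DBs stand in for silos).
import sqlite3
import pandas as pd

def make_silo(rows, schema, table):
    """Stand-in for an existing departmental database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} ({schema})")
    conn.executemany(f"INSERT INTO {table} VALUES ({','.join('?' * len(rows[0]))})", rows)
    return conn

registrar = make_silo([(1, 3.4), (2, 2.1)], "student_id INTEGER, gpa REAL", "grades")
housing   = make_silo([(1, "North Hall"), (2, "West Hall")],
                      "student_id INTEGER, residence_hall TEXT", "rooms")

# The central function reads from each silo; the silos themselves stay where they are.
grades = pd.read_sql_query("SELECT student_id, gpa FROM grades", registrar)
rooms  = pd.read_sql_query("SELECT student_id, residence_hall FROM rooms", housing)

combined = grades.merge(rooms, on="student_id", how="left")
print(combined)
```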

Data-based decision making and planning – Although Education has enjoyed the benefit of quantitative studies for centuries, the practice of education is generally driven by the philosophical views of educators more than by data or evidence-based studies. In fact, this approach has been enshrined in our commitment to academic freedom at the university level and has trickled down, to some extent, to public and private K-12 schools. Big Data will enable a data-rich culture that will inform policy development and operational planning to an extent we’ve never seen in the past.

Greater use of predictive analytics – Machine Learning applied to Big Data will become increasingly successful at predicting students’ future success based on their past performance. Schools of all stripes will rely on these predictive analytics more and more in the future. This is likely to lead to two types of outcomes. On the one hand, schools will allocate more resources to those students most likely to succeed and, as a result, graduate more high-performing students who will deliver significant benefits to their communities and the world. On the other hand, predictive analytics will restrict the academic opportunities of failing students or those who show little promise – like Albert Einstein. Predictive analytics will also help institutions develop counter-intuitive insights that will challenge long-cherished values and lead to better student and institutional results.
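To illustrate the basic idea, here is a minimal sketch of how a school might train a simple model to estimate a student’s likelihood of graduating from past performance. The features, tiny training set, and model choice are invented for illustration and are not any institution’s actual system.

```python
# Minimal sketch: predicting whether a student graduates from past performance.
# The features and tiny training set are invented for illustration only.
from sklearn.linear_model import LogisticRegression

# Each row: [first-year GPA, attendance rate, credits completed]
X = [
    [3.6, 0.95, 30], [2.1, 0.70, 18], [3.9, 0.98, 32], [2.5, 0.80, 22],
    [1.8, 0.60, 12], [3.2, 0.90, 28], [2.9, 0.85, 26], [1.5, 0.55, 10],
]
y = [1, 0, 1, 1, 0, 1, 1, 0]  # 1 = graduated, 0 = did not graduate

model = LogisticRegression().fit(X, y)

# Probability that a new student (2.3 GPA, 75% attendance, 20 credits) graduates.
print(model.predict_proba([[2.3, 0.75, 20]])[0][1])
```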

Local adoption of analytics tools – Older readers will remember the days when word processing was handled by a pool of word processing typists. Over time, word processing migrated from the pool to executives’ assistants and, eventually, to the desks of the executives themselves. Once word processing reached the desks of the executives and other knowledge workers, word processing shifted from being a mechanical function to being a creative one. Knowledge workers crafted their messages as they took form on their screens. The same will be true of predictive analytics. We are going to see the hands-on management of predictive analytics studies migrate from Big Data specialists to the desktops (and laptops) of executives who need to think through, propose, and defend policy statements, strategic plans, and operational or tactical initiatives.

User experience – Educators often don’t know a student is having a problem until they see the student failing (or just barely passing) quizzes and tests. But even when they recognize the problem, they don’t know the reasons any given student is falling behind. Big Data will help students by recognizing their problems as they occur. Then it can offer tutorials that address those problems immediately – not days or weeks later, when it may be too late to affect the students’ learning trajectories.

Real-time quiz evaluations and corrective action – As computers and tablets become ever more pervasive in classrooms, schools at all levels will be better able to collect digital breadcrumbs about how students perform on quizzes and determine what corrective action is required. This is eventually going to become the norm. Steven Ross, a professor at the Center for Research and Reform in Education at Johns Hopkins University, agrees. He said, “Most of us in research and education policy think that for today’s and tomorrow’s generation of kids, it’s probably the only way.”

Privacy, privacy, privacy – The privacy of student and family data will continue to be a hot issue. Over time, however, the benefits of sharing identified student data will outweigh the concerns of the general public. Sharing this data among qualified research professionals will become more socially acceptable not only as technological safeguards are put into place, but as they are accepted as being appropriate. In practice, society will discover that the student data they thought was secure is not. Witness the data breach at Equifax that spilled confidential data about 143 million people. Do you remember the data breaches at Target and Home Depot? Again, tens of millions of people who trusted these companies with their credit card information were affected.

Learning Analytics and Educational Data Mining – We are seeing a new professional discipline emerge. The professionals in this field will have both the professional and technical skills to sort through the masses of unstructured educational data being collected on a wholesale basis, know what questions to ask, and then drill through the data to find useful, defensible insights that make a genuine difference in the field of Education. The demand for these specialists is likely to outstrip the supply for many years to come.

Games – We are likely to see far more games introduced into the educational curriculum than we’ve ever seen before. Games have not only proven to be instrumental in the learning process, they also lend themselves to data acquisition for immediate or later analysis.

Flipped classrooms – The Khan Academy has reversed the historical process of delivering course material during class time and assigning homework to be handled out of class. In its flipped classrooms, students watch streaming videos at their leisure out of class. Class time is dedicated to providing students a forum where they can work through their problem sets and ask for – and get – help as they need it. The flipped classroom is going to become far more widespread because our technologies today enable it – and it just makes a lot of sense.

Adaptation on steroids – Adaptation is nothing new. It’s been going on for thousands of years. The idea is that course material, explanations, problem sets, or tutoring is tailored to the individual needs of the student. But when we put that adaptation on steroids, we see a shift in “kind.” In other words, we see something that was not present before. Today we can monitor every move students make, not just count the right and wrong answers they give to a quiz question. By analyzing facial expressions, delays in responding, and a myriad of other variables, we can tailor-make and deliver a tutorial specifically suited to a student’s learning problem at the moment the problem occurs.

Institutional evaluation – Schools have always presumed to grade their students. Until relatively recently, it was presumptuous for students to grade their teachers or their schools. Now it is becoming common practice. In fact, Big Data will play an ever-growing role in assessing the performance of individual instructors. More importantly, Big Data will rank-order universities, colleges, and high schools on a wide range of variables that can be supported through empirical evidence. True, some of that evaluation will be based on “sentiment” – but much of it will be based on hard analytics that would have been too time consuming or too expensive to collect and analyze in a holistic manner.

The Jury Is Still Out

In spite of all the investment, the excitement, and the promise of Big Data in Education, we still don’t have enough experience to make categorical claims about its value. We are still struggling with the top Big Data challenges we face.

In an article in The Washington Post last year, Sahlberg and Hasak claimed that the promised benefits of Big Data have not been delivered. As a visiting professor at the Harvard Graduate School of Education, Sahlberg is an authority we should listen to. He claims that our preoccupation with test results reveals nothing about the emotions and relationships that are pivotal in the learning process. Our commitment to judging teachers by their students’ test scores has the effect of steering top-performing teachers away from low-performing schools – exactly where they are most needed. There are extensive efforts to evaluate both teachers and students. However, according to Sahlberg, this has NOT led to any improvement in teaching in the US.

The most that Big Data can offer is an indication of a high correlation between one factor and another. It cannot tell us about cause and effect. In fact, cause-and-effect arguments are difficult for people to make – and yet they are instrumental in building compelling arguments. Having said that, it is revealing to recognize that finding high correlations in other fields – even without a demonstrated cause-and-effect relationship – has proven to be quite beneficial.
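A tiny worked example makes the point: the correlation below is nearly perfect, yet the calculation says nothing about whether one variable causes the other. The numbers are invented.

```python
# Tiny worked example: a high correlation between two made-up variables.
# A correlation this strong still says nothing about cause and effect.
import numpy as np

hours_on_learning_platform = np.array([2, 4, 5, 7, 8, 10])
final_exam_score           = np.array([55, 62, 68, 75, 80, 88])

r = np.corrcoef(hours_on_learning_platform, final_exam_score)[0, 1]
print(f"Pearson r = {r:.2f}")  # close to 1.0, yet platform use may not cause higher scores
```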

Digitally Transforming the Healthcare Industry


Big Data Has Changed the Practice of Healthcare Forever – and the Change is Just Beginning. Healthcare organizations – old and new – are investing heavily in Big Data applications.

Big Data projects process data measured in petabytes to deliver significant healthcare benefits. Only a small proportion of that data comes from traditional databases with well-structured data. Instead, almost all of the data comes from sources that are messy, inconsistent, and never intended for a computer to use: unstructured patient records. Accessing this unstructured data and making sense of it gives healthcare professionals and leaders insights they would never have otherwise. Those insights directly affect the way healthcare is delivered on a patient-by-patient basis.

I’ll give you four real-world examples of benefits the healthcare industry has already realized. We’ll take a quick look at Apixio, Fitbit, the Centers for Disease Control, and IBM’s Watson Health.

APIXIO

Medical research has traditionally been conducted on randomized trials of small populations. No one tried to conduct massive healthcare research using all the data on all patients because the work would have been overwhelming. Limiting the size of the data sets researchers used made their research manageable. Working with small sample sizes, however, creates methodological flaws of its own. This is not to criticize those studies but to recognize the limitations of the research outcomes, given what was feasible at the time those studies were conducted.

Apixio set out to change all that. Apixio developed mechanisms for conducting healthcare research based on studies of actual patient healthcare records. Their mechanisms leverage both Big Data and machine learning. Further, they work with ALL the patient healthcare records a facility has to offer – not just a randomized subset. As new patients are treated, Apixio collects data about the symptoms, diagnoses, treatment plans, and actual outcomes. By integrating these new cases into the mix, the company can quickly determine what works and what doesn’t. The difference between measuring the effectiveness of treatment programs through limited clinical research studies and measuring it through reviews of the outcomes of ALL patients can be dramatic.

Only about 20% of patient healthcare records reside in well-ordered databases. The other 80% is messy, unstructured data: the GP’s notes, consultants’ notes, and forms prepared for Medicare reimbursement purposes. Working with unstructured data used to be problematic. Institutions had to hire and train “coders” who would read free-form materials (handwritten notes, typed notes, etc.) and capture the meanings of those notes in a form suitable for computer processing. Apixio dealt with this issue quite differently. It used computer-based algorithms to scan and interpret this data. The company found that its computer-assisted techniques enable coders to process two to three more patient records per hour. Further, the coded data created this way can be as much as 20% more accurate than the manual-only approach.
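As a rough illustration of computer-assisted coding, the sketch below scans a free-text note and suggests candidate diagnosis codes for a human coder to confirm. The keyword-to-code map and the note are invented; this is not Apixio’s actual algorithm.

```python
# Minimal sketch of computer-assisted coding: scan a free-text note and suggest
# diagnosis codes for a human coder to confirm. The keyword-to-code map and the
# note are invented; this is not Apixio's actual algorithm.
import re

KEYWORD_TO_CODE = {
    "type 2 diabetes": "E11.9",
    "hypertension": "I10",
    "chronic kidney disease": "N18.9",
}

note = ("Pt seen for follow-up. Long-standing hypertension, well controlled. "
        "Type 2 diabetes, A1c improving on metformin.")

def suggest_codes(text: str) -> list[str]:
    """Return candidate codes whose keywords appear in the note."""
    found = []
    for keyword, code in KEYWORD_TO_CODE.items():
        if re.search(keyword, text, flags=re.IGNORECASE):
            found.append(code)
    return found

print(suggest_codes(note))  # ['E11.9', 'I10'] -> queued for coder review
```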

This computer-assisted approach also finds gaps in the documentation. In one nine-month period, Apixio reviewed 25,000 patient records and found 5,000 records that either did not record a disease or didn’t label it correctly. Correcting the data can only improve diagnoses and treatment programs.

Apixio does far more than produce studies that physicians can use to inform their treatment plans. It takes the next step. It reviews the healthcare records of each patient and develops personalized treatment plans based on a combination of the data it has collected for that patient and the results of its analyses of practice-based clinical data. This enables physicians to only order the tests that are useful and avoid expensive but worthless procedures.

This pays off handsomely for insurance companies that cover patients enrolled in Medicare Advantage plans. Under these plans, Medicare pays a “capitated payment” – a fixed payment for treating each patient based on that patient’s expected healthcare costs. By tailoring the diagnostic tests and treatment programs to the individual, the company is able to reduce its costs dramatically. Those savings drop directly to the bottom line.

It’s not just the insurance companies that benefit, though. Patients benefit as well. Patients are not required to undergo inconvenient or painful procedures that would provide no benefit.

FITBIT

Fitbit is the leader in the sale of wearable devices that track fitness metrics, although Apple is hot on its heels with its Apple Watch. Fitbit sold 11 million devices between its founding in 2007 and March 2014. These devices track fitness metrics such as activity, exercise, sleep, and calorie intake. The data collected daily can be synchronized with a cumulative database that allows users to track their progress over time.

The driving principle here is that people can improve their health and fitness if they can measure their activity, diet, and its outcomes over time. In other words, people need to be informed in order to make better fitness decisions. Fitbit provides users with progress reports presented in a preformatted dashboard. This dashboard tracks body fat percentage, body mass index (BMI), and weight among other metrics.

Patients can share their data with their physicians to give them an on-going record of their key healthcare parameters. This means that doctors are not forced to rely on the results of tests that they order on an infrequent basis. To be fair, however, not all physicians are willing to treat the data their patients collect on their own as being as credible as data collected in a clinical setting.

Insurance companies are prepared to adjust their premiums based on the extent to which their policyholders look after themselves as measured by Fitbit. This means that policyholders are required to share their Fitbit or Apple Watch data with the company. John Hancock already offers discounts to those who wear Fitbit devices and the trend is likely to spread to other insurance companies.

The fastest-growing sub-market for Fitbit is employers, who can provide their employees with Fitbit devices to monitor their health and activity levels (with their permission).

The CDC and NIH

The Centers for Disease Control and Prevention (CDC) and the National Institutes of Health (NIH) are leaders in applying Big Data to identifying epidemics, tracking the spread of those epidemics, and – in some cases – projecting how they are likely to spread.

The CDC tracks the spread of public health threats, including epidemics, through analyses of social media such as Facebook posts.

The NIH launched a project in 2012 it calls Big Data to Knowledge or BD2K. This project encourages initiatives to improve healthcare innovation by applying data analytics. The NIH website says, “Overall, the focus of the BD2K program is to support the research and development of innovative and transforming approaches and tools to maximize and accelerate the integration of Big Data and data science into biomedical research.”

A couple of years ago the CDC used Big Data to track the likely spread of the Ebola virus. It used BigMosaic, a Big Data analytics program that the CDC coupled with HealthMap, a database that maps census data and migration patterns. HealthMap shows where immigrants from various countries are likely to live – right down to the county or even the community level. When the CDC identifies countries where there is a public health problem – like the Ebola virus – it can link that census data showing the distribution of expat communities with airline schedules to determine how the disease is likely to spread in the US – or even in other countries. This allows the CDC to track the spread of disease in near real time. In some cases, it can even project how diseases are likely to spread.

These Big Data applications merge data about weather patterns, climate data, and even the distribution of poultry and swine. These applications present this data in a graphic form that makes it easier for epidemiologists to visualize how diseases are spreading geographically. The benefit, of course, is that the CDC and the World Health Organization can deploy their scarce resources to the areas where they can do the most good. They can do that because Big Data provides the tools to chart the spread of diseases by international travellers.

The CDC now uses Big Data linked with social media to forecast the spread of communicable diseases. Historically, the CDC tracked the observed, reported spread of diseases; forecasting how diseases will spread is a new ball game. The CDC ran competitions for research groups to develop Big Data models that accurately forecast the spread of diseases. The CDC received proposals for 28 systems. The two most successful were both submitted by Carnegie Mellon’s Delphi research group. These models are not predetermined; instead, they leverage Machine Learning to develop tailored models that forecast the specific spread of each disease.

The model is by no means perfect. The CDC gave the Carnegie Mellon model a score of 0.451, where 1.000 would be a perfect model. The average score for all 28 models was 0.430. That means the model the CDC will use is the best available and much better than nothing, but it still has considerable room for improvement.

The Delphi group is studying the spread of dengue fever. It has plans to study the spread of HIV, Ebola, and Zika.

IBM and Watson Health

IBM is particularly proud of Watson, its artificial intelligence system on steroids. Although Watson has produced some stunning results, such as winning the TV game show Jeopardy! against the two best Jeopardy! contestants, our interests today are in healthcare.

Watson is machine learning at its finest. In the healthcare field, its managers feed it an on-going stream of peer reviewed research papers from medical journals and pharmaceutical data. Given that Big Data knowledge base, Watson applies that knowledge to individual patient records to suggest the most effective treatment programs for cancer patients. Watson’s suggestions are personalized to each patient.

Watson’s handlers don’t program the software to deliver predetermined outcomes. Instead, they apply Big Data algorithms to enable Watson to learn for itself based on the research it reviews as well as the diagnoses, treatment programs, and observed outcomes for individual patients.

IBM is partnering with Apple, Johnson & Johnson, and Medtronic to build and deploy a cloud-based service to provide personalized, tailored guidance to hospitals, insurers, physicians, researchers and even individual patients. This IBM offering is based on Watson – its remarkably successful system that integrates Big Data with machine learning to enable personalized healthcare on a massive scale.

Until now, IBM has used Watson in leading edge medical centers including the University of Texas MD Anderson Cancer Center, the Cleveland Clinic, and the Memorial Sloan Kettering Cancer Center in New York. Given its successes to date, IBM is now ready to take its system mainstream and broad based.

Big Data is transforming the Food and Beverage Industry


A 2015 McKinsey study reported that food retailers can improve their operating margins by up to 60% simply by harnessing the power of Big Data. In order to keep pace with consumers’ fickle buying habits, food and beverage companies need to begin combining raw point-of-sale data with the Big Data that is now available. Analytical capabilities then can transform this data into meaningful intelligence that can inform management decisions. Those decisions will boost sales and improve their overall bottom-line performance. For example, food and beverage retailers, suppliers, and trading partners can share Big Data to ensure they offer the right products, in the right quantities, in individual stores and online.

Big Data Helps Drive In-Store Revenues

Food and beverage companies can use Big Data to increase traffic to their brick-and-mortar stores. The GPS location capabilities of most mobile phones provide a channel for retailers to display “pop-up” promotional messages that are highly relevant to an individual’s specific location and past purchasing history. A shopper standing in a frozen food aisle, for example, can receive a text offering a discount on a nearby ice cream flavor she has bought in the past.

Big Data Helps Schedule Food Deliveries

Big Data can optimize on-time deliveries of orders to restaurants, food chains, and home customers. Big Data will collect recent information from various sources about road traffic, weather, temperature, routes, etc. and provide an accurate estimate of delivery times. This data analysis helps ensure that food and beverage companies don’t waste their resources transporting stale products. They will deliver perishable food items when they are fresh.

Big Data Helps Allocate Food Across the Country

By using Big Data to track purchasing decisions from wholesalers down to the customer level, food and beverage companies can learn what products are being purchased and where. For example, a company might learn that customers in the Pacific Northwest are purchasing 15% more of a diet beverage than the nationwide average. Further, they may learn that the Midwest is purchasing 15% less of that same beverage. This knowledge allows the company to know to ship more of the diet product to the Pacific Northwest and less to the Midwest.

Big Data Helps Maintain Consistent Food Quality

Big Data allows restaurants to maintain consistent quality of their products. Consumers expect the same taste in food at the chain restaurants they love. The taste of food not only depends upon the proper measurement of ingredients, but also on their quality, storage, and season. Big Data analytics can analyze such changes and predict the impact of each on the food quality and taste. The insights from these analyses will be used to identify pain points and suggest measures for improvement.

Big Data Analyzes Customer Sentiment

Big Data can analyze customer sentiment by monitoring customer emotions expressed on social media networks. Food companies use sentiment analysis to track their customers’ emotions. They can assess negative reviews and take appropriate preventive steps before the word spreads. Large food retailers like McDonald’s, KFC, and Pizza Hut have found this particularly valuable.

Big Data Has a Good Idea What Customers Will Purchase Next

Food and beverage companies use Big Data for “market basket analysis.” Market basket analysis is a technique that predicts the next item a customer is most likely to purchase based on her purchase history and the items already in her cart. Food retailers and restaurants use these projections to create effective combo deals and improve their marketing messages. For example, if the market basket analysis identifies that a customer prefers a muffin with her coffee, then the retailer can create a combo to help her enjoy them together.
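Here is a minimal sketch of the underlying idea: count how often items are bought together, then suggest the item most frequently paired with what is already in the cart. The transaction history is invented for illustration.

```python
# Minimal sketch of market basket analysis: count how often items are bought together,
# then suggest the item most frequently paired with what is already in the cart.
# The transaction history is invented for illustration.
from collections import Counter
from itertools import combinations

transactions = [
    {"coffee", "muffin"},
    {"coffee", "muffin", "juice"},
    {"coffee", "bagel"},
    {"tea", "muffin"},
    {"coffee", "muffin"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

def suggest(cart_item: str) -> str:
    """Item most often bought together with the given item."""
    candidates = {other: n for (a, b), n in pair_counts.items()
                  for other in (a, b) if cart_item in (a, b) and other != cart_item}
    return max(candidates, key=candidates.get)

print(suggest("coffee"))  # -> 'muffin'
```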

Big Data in the Home Improvement Industry


The Home Depot is the unchallenged leader in the home improvement retail sector in terms of applying Big Data to advantage. The Home Depot collects data from its own website, promotional emails, and social media. It uses that information to drive traffic to its stores by improving its marketing programs. As a result, The Home Depot is beating investors’ expectations and is described as “Amazon-proof.” Interestingly, this is happening at the same time many retailers are struggling to connect with their customers and deliver meaningful results to their investors.

The Home Depot Will Spend $4 Billion Over Three Years on Big Data

The Home Depot is spending roughly $4 billion from 2016-2018 to improve the company’s e-commerce platform and physical stores and bolster the link between the two. The Home Depot is creating a system that allows customers to easily order what they need online, have employees collect those orders in store, and let customers drop by the stores to pick up their purchases in moments. This buy-online, pickup-in-store (BOPIS) model has proven vital for The Home Depot. According to the 2016 Internet Retailer report, about 25% of The Home Depot’s $3.76 billion in total website sales, or nearly $1 billion, came from its BOPIS program.

The Home Depot Uses Big Data to Reconfigure Its Supply Chain

Using technology and Big Data to rethink The Home Depot’s supply chain has been a key part of the company’s success. It will prove increasingly vital as the company moves forward. The Home Depot has used Big Data to improve its supply chain several ways:

  • Dynamic ETA, which gives customers delivery data and delivery estimates based on their exact location.
  • Sync is a multi-year project that will reduce shipping and inventory costs through better coordination between stores and distribution centers.
  • Its Customer Order Management System helps balance store and web inventories. It also enables buy-online, pickup in-store customers to choose the store with the shortest wait time for pick up rather than requiring customers to choose stores only by location.
  • An easy-to-use website and mobile shopping platform will make the customer experience more seamless while allowing The Home Depot to better collect customer data. It will use that data to further improve its Big Data initiatives.

Wayfair Is a Winner Due to Big Data

Home goods e-commerce company Wayfair was created in the digital ecosystem of 2002. Since then it has thrived due to its consistent commitment to and use of Big Data. In 2016, Wayfair introduced a search-by-photo capability. This capability taps into Wayfair’s Computer Vision System, which is based on the company’s own machine learning techniques and its massive proprietary data sets. This system allows customers to upload images of furniture they are looking for, and Wayfair will return search results that match the image provided as closely as possible.

The data collected from this visual search feature creates a powerful feedback loop, which makes Wayfair’s results more useful for customers. Wayfair measures the impact of its photo search system by tracking the number of loyal repeat customers. In the second quarter of 2017, the number of orders per customer and the number of repeat customers both increased year-over-year. Orders from repeat customers grew to more than 61% of total orders in the second quarter of 2017, compared with 58% in the second quarter of 2016. Repeat customers placed 2.6 million orders in the second quarter of 2017, an increase of 55% year-over-year. These increases in repeat customers and their orders are a testament to the effectiveness of the Big Data-driven photo search capability.

Big Data is everywhere in the Retail Industry


Big Data is everywhere in the Retail Industry. It would be hard to find any part of the management of retail operations that is not deeply touched by Big Data. In fact, it is already clear that to survive in the Amazon era, all retailers will have to rely heavily on Big Data to help them stock the right merchandise, at the right times, in the right quantity, and at the right price. Those that ignore it will die.

This does not mean that only the large companies that have the resources to exploit Big Data will thrive while small companies will die. Small companies will be able to harness the power of Big Data – but they are likely to do so through niche consulting firms that have developed the professional and technical skills, hired a stable of experts, built the computing platforms, and acquired access to the massive data required to operate effectively in this field. Smaller retail outfits that buy their services will find them expensive – but the benefits should far outweigh those costs.

The biggest costs will likely not appear on the company’s financial ledger. The biggest costs will be the time, focus, and energy that senior and middle management will need to invest to come to grips with how to leverage Big Data and how to build investment arguments that make sense. This last point proved to be a major stumbling block when general data processing began to make inroads into large and then medium-sized companies some 40 years ago. It is quite likely to prove to be a stumbling block in the application of Big Data as well.

Online and Store General Merchandisers

With regard to department stores, the most promising use of Big Data is recommendation engines.

Recommendation engines — Recommendation engines use the historical purchasing decisions of customers to predict future purchases and recommend other products customers may be interested in. Big Data powering these engines has the potential to generate accurate product recommendations before customers even leave the webpage. Amazon, for example, sees a 30%-60% revenue uplift due to these recommendations alone. Recommendation engines are a widely used way of incorporating Big Data into department stores because they are easy to implement and have an immediate positive impact on revenue: recommendation engines have been shown to have the potential to boost revenue by 24% on average.
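As a rough illustration, the sketch below implements a tiny item-based recommender: it scores products by cosine similarity over a customer-by-product purchase matrix and recommends the closest item a customer hasn’t bought yet. The products and purchase matrix are invented, and production engines are far more sophisticated.

```python
# Minimal sketch of an item-based recommendation engine: score items by cosine
# similarity over a customer-by-product purchase matrix, then recommend the
# closest item a customer hasn't bought yet. The purchase matrix is invented.
import numpy as np

products = ["lamp", "rug", "sofa", "desk"]
# Rows = customers, columns = products, 1 = purchased.
purchases = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
])

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Item-to-item similarity based on which customers bought each product.
sim = np.array([[cosine_sim(purchases[:, i], purchases[:, j])
                 for j in range(len(products))] for i in range(len(products))])

def recommend(customer_row: np.ndarray) -> str:
    scores = sim @ customer_row          # affinity of each item to this customer
    scores[customer_row == 1] = -np.inf  # don't recommend what's already owned
    return products[int(np.argmax(scores))]

print(recommend(np.array([1, 0, 0, 0])))  # customer who bought only the lamp
```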

Trend Forecasting — The second biggest use of Big Data in department stores is predicting trends and forecasting demand. Trend forecasting algorithms comb social media and web browsing habits to find what products and services are causing buzz. These algorithms also analyze ads to see what products marketing departments are pushing. The algorithms then compare the data gathered from social media with the data gathered from current ads to accurately predict what the top selling products for a given quarter will be, how to better market products, and how to develop more cost-effective marketing strategies.

These predictive algorithms help retailers make better-informed decisions about stocking and product ordering. This capability is particularly helpful during the holiday season when shopping rates increase – machine learning can use historical shopping data to forecast future purchasing and revenue outcomes. This kind of predictive analysis in department stores is anticipated to grow from a $2.7 billion global market in 2015 to $9.2 billion by 2020, a CAGR of around 27%. In the US alone, predictive analysis from Big Data is expected to reach a $3.6 billion market by 2020. As of 2015, less than 25% of department stores had adopted predictive analytics. Between 2018 and 2020 this is anticipated to grow to 70%.

After identifying trends, Big Data (particularly customer economic and geographic information) can be used to understand where and when this demand will come from. This helps businesses generate effective marketing and advertising campaigns. For example, Ozon.ru (Russia’s first online retailer) found that demand for books rises when it gets colder during the winter months, and so it increases the number of book ads its customers see. The ability to accurately forecast demand that comes with using Big Data helps lower a business’s costs, since keeping excess inventory on shelves is expensive, while having too little stock drives down revenue and decreases customer engagement and loyalty.

Price Optimization — The third main use for Big Data in department store retail is optimizing pricing. In retail, Big Data can help determine when prices should be dropped (markdown optimization) or when they can be raised without customer dissatisfaction (reflected in a reluctance to purchase). Previously, before the advent of Big Data, markdowns occurred at the end of the buying season, with stores hugely discounting their remaining merchandise. The problem with this approach is that demand is already gone by the time the markdowns occur. Big Data analytics demonstrate that what is actually most effective in increasing revenue is to gradually lower prices once demand first begins to decrease. When the US retailer Stage Stores employed this technique, it increased its traditional end-of-season sales revenue over 90% of the time.
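A minimal sketch of the gradual-markdown idea: once weekly demand begins to fall, cut the price in small steps rather than waiting for one deep end-of-season discount. The demand figures and markdown rule are invented for illustration.

```python
# Minimal sketch of gradual markdown optimization: once weekly demand starts to fall,
# cut the price in small steps instead of waiting for one deep end-of-season discount.
# The demand figures and markdown rule are invented for illustration.
weekly_units_sold = [120, 118, 121, 110, 95, 80, 60, 40]
price = 50.0
markdown_step = 0.10   # 10% cut per week of declining demand
prices = [price]

for prev, current in zip(weekly_units_sold, weekly_units_sold[1:]):
    if current < prev:                 # demand is starting to slip
        price = round(price * (1 - markdown_step), 2)
    prices.append(price)

print(prices)  # the price trails demand downward week by week
```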

Weather Optimization — Big Data is particularly helpful in optimizing prices in accordance with weather conditions. The Weather Company (part of IBM) has found that “weather is one of the largest swing factors for economic and business performance” – 60% of shoppers change their behaviors when it is either raining or hotter than average outside. A 1°F drop in temperatures below 60°F produces a 2%-3% drop in apparel sales. Approximately 60%-70% of a retailer’s excess expenses are due to weather-impacted supply chain costs (e.g., trucks held up by poor weather conditions). In the UK, if temperatures reach over 65°F there is a 22% rise in fizzy drink sales, a 20% rise in juice sales, and a 90% rise in lawn furniture sales. In the US, temperatures below 64°F increase sales of soup, porridge, and lip care. Food, drink, pharmaceutical, and apparel sales are the categories most affected by weather.

Targeting Individual Consumers — The final use of Big Data in general retail is identifying individual customers and how to most effectively market to and target them specifically – whether through email, text, or location-based alerts. Retailers, for example, can install sensors in their stores to identify customers’ locations through their smartphones. If a customer’s smartphone’s WiFi is turned on, it will attempt to connect with the store’s internet and this is how a customer’s location can be sensed and tracked. Retailers can then track what specific stores she visited, what departments she visited, and what products she purchased at what time and on what date. This information can be used to better understand each customer’s movements and patterns when it comes to shopping. Retailers can then use this information to reorganize their stores to optimize customers’ shopping experience and even to offer special deals and coupons to bring further business to their stores.

General Online Retailers

In addition to the Big Data applications listed above, there are four other applications that apply specifically to online retailers.

Dynamic Pricing — Dynamic pricing is Big Data at its finest. Dynamic pricing is highly responsive to external factors such as consumer demand and competitors’ prices. It collects trend data about which products are being bought and automatically adjusts prices, gradually increasing prices on items that are popular and discounting items that are less popular. Dynamic pricing is key to increasing online retailers’ overall revenue.
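Here is a minimal sketch of a dynamic pricing rule: nudge the price up when recent demand is strong, pull it down when demand is weak, and keep it from drifting too far above the cheapest competitor. The thresholds and inputs are invented and far simpler than any production repricer.

```python
# Minimal sketch of a dynamic pricing rule: nudge the price up when recent demand is
# strong, down when it is weak, and cap the drift above the cheapest competitor.
# Thresholds, caps, and inputs are invented for illustration.
def reprice(current_price: float, units_sold_24h: int,
            competitor_prices: list[float]) -> float:
    price = current_price
    if units_sold_24h > 100:           # strong demand: small increase
        price *= 1.05
    elif units_sold_24h < 20:          # weak demand: small discount
        price *= 0.95
    cheapest_rival = min(competitor_prices)
    if price > cheapest_rival * 1.10:  # don't drift more than 10% above the market
        price = cheapest_rival * 1.10
    return round(price, 2)

print(reprice(24.99, units_sold_24h=150, competitor_prices=[23.50, 26.00]))
```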

Individual Customer Experience — Big Data analysis gives sellers insights about customer behavior and demographics and provides customers a personalized experience. For instance, customer data can be used to create buyer-specific e-mails for promotional campaigns. Amazon’s “Customers who bought this item also bought…” recommendation feature, for example, increased sales nearly 30% when it was first implemented. This is a simple and remarkably effective way to keep customers on a retail site and keep them buying. Consumers might have reservations about their favorite retailers knowing intimate details about their lives, but they’re going to love the results in practice. Sharing all those personal tidbits is helping companies like CNA identify fraud and prevent customers from having their identities compromised. Retailers can use information from live transactions and other sources (such as social feeds and geo data from apps) to prevent credit card fraud in real time.

Better Quality of Products — Amazon is the e-commerce standard when it comes to smart, effective pricing. It can easily access its competitors’ pricing data and respond quickly with its own deals — changing some items’ prices up to 10 times a day. The industry-wide shift to dynamic pricing means that companies will no longer be competing on price alone. They will now need to establish a reputation for offering their customers the best value and the best experience.

Reduce incidents of shopping cart abandonment — Companies can also use cross-device tracking to reduce shopping cart abandonment rates. eBay research found that the average consumer uses as many as three to five devices or platforms during the course of her buying journey. Mapping this journey with data allows retailers to help their customers transition from one device to the next and complete their purchases.

Big Data is transforming the Auto Industry


The next few years are going to see an explosion in the rate at which detailed data is collected about the moment-by-moment operation of nearly all new cars. This data will be stored and collated in centralized databases that make Big Data analyses possible. McKinsey published a report in 2014 that estimated that the global market for connectivity components and services for cars was $38 billion that year. The report went on to project that the data-driven connected car industry would grow to $215 billion.

Big Data Will Transform Fleet Management

Fleet management will enjoy the greatest benefit from this Big Data analysis. Auto makers will now be able to determine which settings and features drivers actually use. This will help them improve their marketing. It will also identify the features that drivers really care about; this will focus the auto makers’ on-going R&D efforts.

Further, automakers can easily monitor their cars, identify potential problems, and issue maintenance calls. This will help maintain their fleets in peak performance. They will be able to identify drivers who are abusing their cars; they can issue advisories based on that information. All of these efforts are geared to minimizing the maintenance costs and maximizing the performance of their fleets.

Big Data Is Transforming At Least Five Other Auto Practices

City Planning — City planners and engineers can use this same data to improve their plans for roadways and traffic flows.

Onboard Navigation — Navigation systems can use real time driving data to discover and display the fastest routes based on current traffic patterns.

Insurance Rates — Insurance companies can access the Big Data collected from connected cars to monitor each driver’s performance and, potentially, use this information to adjust rates and to determine what really happened in accidents.

Auto Dealer Marketing Campaigns — Dealers can use this Big Data to assist in planning their marketing campaigns. For example, Bullseye Prospecting is a product that helps dealers and their marketing agencies automate their marketing campaigns by leveraging third party and internal data on consumer behavior, incentives, and vehicle equity/valuation. This prospecting tool can cut the $600-$800 average per-car cost of sale by about 30%. It also helps dealers by sending a detailed, personalized message to their best customers at precisely the right time to prompt sales and services.

Used Car Valuations and Inventory Management — The 2016 Black Book survey indicated that nearly two-thirds of dealers are using 30%-50% more data than they did in 2014 to establish vehicle valuations. This data also helps them set regional pricing, determine the appropriate supply of cars, and assess each vehicle’s history to manage their inventories. Some 69% of these dealers say the data is giving them better insights on pricing and profitability. 58% say Big Data is providing better insight into managing their inventory procurement. The majority of dealers believe they can avoid a market catastrophe similar to the one in 2008 because the data allows them to make more accurate decisions.

Cheat Sheet: Everything You Need to Know about Big Data and the Mortgage Industry


It’s common knowledge that Big Data has arrived in the Mortgage Industry. One of the most important questions leaders in our industry need to ask themselves, of course, is “Where is it all going?”  We’re going to give you our take on this issue in just a moment. But first, let me give a short synopsis of what Big Data is for those who are new to this field.

 

What is Big Data?

Historically, all the data computers used was set up in highly structured databases. In other words, we had separate fields for each piece of data, and we spent a lot of time and effort making sure all the data was clean and accurate. Big Data does away with that. Big Data reads data that was never meant to be analyzed by a computer. This includes everything from Tweets and Facebook postings to newspaper clippings. All of these were written for human consumption, not for computer processing.

Big Data cuts through that. Big Data is able to read all of this unstructured, messy material that was never meant for computers and then make sense of it. In other words, it can read Tweets, Facebook postings, and data from hundreds of different sources that are written in incompatible styles and assign meaning to what it’s reading. In the mortgage industry, this means that we can now tap into huge reservoirs of information that were always available to us – data that is in the public domain – but that we could never get a computer to work with.

Now let’s take a look at where Big Data is going to take the mortgage industry.


Big Data can be used to improve existing mortgage processes.

  1. Pre-populate mortgage applications

We believe that Big Data is going to pre-populate mortgage applications. In other words, Big Data will mine bank records, publicly available databases, social media sites, and other sources to collect all or nearly all the information required for a mortgage application. This will leave the applicant with the option of either clicking to ratify the pre-populated application as accurate or editing a few fields here and there to fine-tune it.

Another approach is for prospective homeowners to complete their mortgage applications as they always have and then have the mortgage company’s computers compare the pre-populated versions with the applicants’ versions to identify discrepancies.

In either case, the objective will be to enhance the accuracy of the data in the applications while reducing the burden on the applicants.
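To make the discrepancy check concrete, here is a minimal sketch, assuming both versions of an application arrive as simple field/value dictionaries; the field names and values are illustrative, not any lender’s real schema.

```python
# Sketch: compare a pre-populated application against the applicant's own
# version and flag discrepancies for review. Field names are illustrative.

def find_discrepancies(prepopulated: dict, submitted: dict) -> dict:
    """Return {field: (prepopulated_value, submitted_value)} for fields that differ."""
    diffs = {}
    for field in sorted(set(prepopulated) | set(submitted)):
        left, right = prepopulated.get(field), submitted.get(field)
        if left != right:
            diffs[field] = (left, right)
    return diffs

if __name__ == "__main__":
    mined = {"annual_income": 84000, "employer": "Acme Corp", "years_at_address": 6}
    entered = {"annual_income": 90000, "employer": "Acme Corp", "years_at_address": 6}
    for field, (mined_val, entered_val) in find_discrepancies(mined, entered).items():
        print(f"{field}: mined={mined_val!r} vs applicant={entered_val!r}")
```

Flagged fields would go back to the applicant or to an underwriter rather than silently overriding either version.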

  2. Computer algorithms to score mortgage applications

We can also see mortgage applications being scored by machine-learning algorithms. These algorithms will approve or deny the applications immediately. Approved applications may be forwarded for processing right away, while rejected applications will qualify for a human review if the applicants don’t feel they have been scored properly. The goal of this instant evaluation will be to eliminate the delays in the current manual evaluation process – delays that are often measured in weeks.
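As a rough illustration of how such scoring and routing might work – not any particular lender’s model – here is a sketch using a toy logistic-regression classifier; the features, training data, and approval threshold are all invented for the example.

```python
# Sketch: score applications with a simple logistic-regression model and route
# the result. Features, training data, and threshold are purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: [debt_to_income, credit_score / 850, loan_to_value]
X_train = np.array([[0.2, 0.9, 0.60], [0.5, 0.6, 0.95], [0.3, 0.8, 0.80], [0.6, 0.5, 0.97]])
y_train = np.array([1, 0, 1, 0])  # 1 = repaid as agreed, 0 = defaulted

model = LogisticRegression().fit(X_train, y_train)

def route_application(features, threshold=0.7):
    """Approve instantly above the threshold; otherwise queue for human review on appeal."""
    prob_repay = model.predict_proba([features])[0][1]
    decision = "approve" if prob_repay >= threshold else "human_review_on_appeal"
    return decision, prob_repay

print(route_application([0.25, 0.85, 0.70]))
```

A production model would be trained on far richer data and audited for fairness, but the approve-or-route-to-a-human flow is the part the paragraph above describes.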

We can see that Big Data will be instrumental in projecting the number of applications for new mortgages or refinanced mortgages in specific geographies and specific time frames. Further, Big Data will project the total value of these mortgages. These projections will help mortgage companies reposition their people and processing power based on projected market demand.  These projections would be based on the current mortgage portfolio the industry has in place in various geographic areas coupled with scenarios about shifts in mortgage interest rates.
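The projection idea can also be sketched very simply. The regions, baseline volumes, and rate-sensitivity figure below are illustrative assumptions, not market data.

```python
# Sketch: project refinance application volume by region under interest-rate
# scenarios. Regions, baselines, and the elasticity figure are illustrative.

baseline_apps = {"Northeast": 12000, "Southeast": 18000, "West": 15000}  # current quarterly volume
rate_scenarios = {"rates -0.5%": -0.5, "rates unchanged": 0.0, "rates +0.5%": 0.5}
ELASTICITY = -0.8  # assumed fractional change in applications per +1% rate move

for scenario, rate_change in rate_scenarios.items():
    print(scenario)
    for region, apps in baseline_apps.items():
        projected = apps * (1 + ELASTICITY * rate_change)
        print(f"  {region}: {projected:,.0f} projected applications")
```

Staffing and processing capacity could then be shifted toward the regions and scenarios with the highest projected volume.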

  3. Big Data analysis of non-monetary defaults

We can expect Big Data analysis of non-monetary defaults on mortgages to become more common, if not universal. Here, I’m talking about flagging accounts that shift from payments made early with an extra principal payment to payments made on time with no extra payment. Or we will find homeowners whose homeowners association is suing them. Or maybe the local government put a lien on the property on the grounds that the property is uninhabitable. Or the couple is getting divorced. These are all early warning signs that Big Data will track as a matter of course.
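A rule-based early-warning pass over servicing records might look something like the following sketch; the account fields and rules are illustrative assumptions, not a servicer’s actual data model.

```python
# Sketch: rule-based early-warning flags for non-monetary signals on a
# mortgage account. Field names and rules are illustrative assumptions.

def early_warning_flags(account: dict) -> list:
    flags = []
    if account.get("extra_principal_stopped"):
        flags.append("extra principal payments have stopped")
    if account.get("hoa_lawsuit"):
        flags.append("homeowners association lawsuit filed")
    if account.get("habitability_lien"):
        flags.append("local government lien for uninhabitable property")
    if account.get("divorce_filing"):
        flags.append("divorce filing on record")
    return flags

account = {"extra_principal_stopped": True, "hoa_lawsuit": False,
           "habitability_lien": False, "divorce_filing": True}
for flag in early_warning_flags(account):
    print("EARLY WARNING:", flag)
```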

  4. More Objective Residential Property Appraisals

Residential property appraisals will become more objective and more accurate. Big Data will propose the most appropriate neighborhood comparables. It will develop appraisals using industry standards driven by an algorithm. MReport claimed that, “More than 30 percent of loans fall short of the collateral valuation agreed to between customer and loan officer.” Big Data will help fix that.
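In its simplest form, a comparables-driven estimate is just a price-per-square-foot calculation over recent nearby sales, as in the sketch below; the sales data is invented, and real appraisal models adjust for many more factors.

```python
# Sketch: a comparables-driven appraisal estimate using the median price per
# square foot of recent nearby sales. Sales data and fields are illustrative.
from statistics import median

recent_sales = [  # (sale_price, square_feet) for comparable homes nearby
    (310000, 1550), (295000, 1480), (340000, 1700), (325000, 1620),
]

def estimate_value(subject_sqft: float, comps: list) -> float:
    price_per_sqft = median(price / sqft for price, sqft in comps)
    return subject_sqft * price_per_sqft

print(f"Estimated value: ${estimate_value(1600, recent_sales):,.0f}")
```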

Big Data Is Bringing Big Changes


How is the Business of Big Data Affecting the Mortgage Industry? 

  1. Increase in spending on Big Data

Spending on Big Data applications and technology will soar.  In 2014, 2015, and 2017, we’ve seen Big Data spending in the mortgage industry at $2.6 billion, $2.8 billion, and $3.2 billion respectively. We are going to see spending on Big Data continue to climb as the number of success stories grows.

  2. Increased need for Big Data analysts within the mortgage industry

The mortgage industry is going to suffer a severe shortage of Big Data analysts who know how to manipulate the huge and ever-growing quantities of data that will become available. We are going to need professionals who can manage the enquiries in ways that lead to highly defensible conclusions.  The growth in the demand for Big Data analysts is going to outstrip the supply.

  3. Increase in consultants

We are going to see the rapid growth of specialized firms that help mortgage companies plan for and implement Big Data projects. This function is going to be outsourced rather than treated as a core competence for several reasons. First, most mortgage companies will find it far too expensive to build their own in-house facilities. Second, the process of building those in-house facilities will take too long and is liable to run into many dead ends. Third, they will not be able to attract the talent they need at a price they can afford. Fourth, the management in existing mortgage companies will need to go through a steep learning curve that is best handled by a specialized firm. Over time, we can expect mortgage companies to build teams of in-house Big Data talent while leaving the technologies to cloud-based firms.

As a result, small mortgage companies that cannot afford to buy the necessary technologies will be squeezed out of business. Larger companies will buy them.

  4. Automation and Big Data will be an important pair

Mortgage companies are going to focus increasingly on building higher quality portfolios with fewer staff. The only way to have a smaller staff complement and a larger mortgage portfolio is through automation. That should be obvious. Automation in general and Big Data in particular are the way of the future.


Warning: New Players

We are going to see many new, non-traditional players in the mortgage industry. They will spring from places like Silicon Valley, and they will offer better service at lower costs than banks and traditional mortgage companies. For example, Lending Club facilitated $3.6 billion in loans in the first six months of 2015. Likewise, Prosper is growing fast.


How Does Big Data Help the Mortgage Industry Keep up with New Regulations and Laws?

We can expect the Federal Housing Administration to develop a growing number of regulations that the mortgage industry must comply with. Many of these regulations will apply to a company’s portfolio of mortgages rather than any given mortgage. Mortgage processors will continue to ensure that they comply with application-specific compliance issues, but they cannot be expected to deal with portfolio-wide compliance issues. In fact, it is unlikely that doing so is humanly possible. This means that mortgage companies will necessarily embrace Big Data to do that job for them. Failure to do so means that they will face stiff penalties. It is far better for these companies to catch compliance failures on their own and take action than to face their regulators in court.
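Portfolio-wide rules are exactly the kind of check a machine can run continuously. Here is a minimal sketch of one such check – an assumed cap on the share of high loan-to-value mortgages; the loans and thresholds are invented for illustration, not an FHA rule.

```python
# Sketch: a portfolio-wide compliance check that no single-loan review would
# catch, e.g. a cap on the share of high loan-to-value mortgages.
# Loans and thresholds are illustrative assumptions.

portfolio = [  # (loan_id, loan_to_value)
    ("A1", 0.97), ("A2", 0.80), ("A3", 0.95), ("A4", 0.70), ("A5", 0.92),
]
HIGH_LTV = 0.90
MAX_HIGH_LTV_SHARE = 0.35  # assumed portfolio-level cap

high_ltv_share = sum(1 for _, ltv in portfolio if ltv > HIGH_LTV) / len(portfolio)
if high_ltv_share > MAX_HIGH_LTV_SHARE:
    print(f"NON-COMPLIANT: {high_ltv_share:.0%} of the portfolio exceeds {HIGH_LTV:.0%} LTV "
          f"(cap {MAX_HIGH_LTV_SHARE:.0%}) - fix it before the regulator finds it")
else:
    print("Portfolio within the assumed concentration limit")
```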

Carl Pry, a managing director at Treliant Risk Advisors, said, “It’s in every bank’s best interest to get one step ahead of the regulators and understand what that regulator is going to know and find. They need to resolve any discrepancies [and] do any file review analysis needed to be able to explain any disparities before the regulators find them.”

Here are a few more examples of how Big Data helps keep Mortgage companies out of legal trouble:

  • New regulations and compliance issues are making the appraisal process increasingly difficult. That, coupled with the fact that the number of qualified appraisers is not keeping up with the demand, means that the industry must necessarily rely on broad based, sophisticated tools like Big Data. This trend will continue.
  • Big Data is going to prove instrumental in flagging potentially fraudulent mortgage transactions. The FBI and other law enforcement agencies are developing increasingly sophisticated techniques to identify potential abuses. Big Data systems will incorporate these fraud detection techniques and trigger pre-emptive enquiries.


Big Data, The Mortgage Industry, and the Mortgage Buyer: How Relations Can Be Vastly Improved

Decades ago, the local bank manager knew his customers well and was in a position to make an informed judgment call about the amount of credit to be extended. Today, those decisions are rarely made in retail bank branches and mortgage companies; rather, they are made by a committee – often in another city. We need to bring some humanity back into the decision-making process, and incorporating social media will go a long way in that direction.

The mortgage approval process is going to become more transparent. At the moment, borrowers only know whether they are approved or rejected, but they rarely have an idea why they were slotted where they were.  In the future, mortgage companies will be in a position to coach their applicants very specifically about what they need to do to be approved.

Additionally, Big Data is going to help reduce the risk in mortgage lending. Big Data will help brokers advise their clients about school performance and community crime rates. This will help the buyers make better-informed decisions and, ideally, lead to lower risk mortgages.

Warning: Potential Future Issue – Privacy

Privacy is going to become a big issue in Big Data. Although what Big Data practitioners do is generally legal, the act of mining social media on a wholesale basis was never contemplated when social media sites were first introduced. We are going to see some interesting and instructive debates on ethical issues over the next decade before a consensus emerges. Any legislation passed before those ethical debates come to closure will prove to be ill-conceived and counterproductive.


Conclusion

Just to wrap up, I want to make it clear that Big Data is already having an impact on how the mortgage industry operates and we are still at the early stages. We are going to be in for a very interesting ride over the next few years.

If you want to learn more about this, feel free to get in touch with me directly.  I’m Eskinder Assefa, CEO of SOMAmetrics in Berkeley, California. We work with mortgage companies to help them realize their full business potential by improving their sales and marketing strategies and leveraging emerging technologies that have an impact on the bottom line.

4 Ways Big Data Is Getting Mortgage Companies the Information They Need


The problem

In every other industry besides the mortgage industry, buyers know exactly what they are buying before they lay their cash on the table. Car buyers can read Consumer Reports and drive the car around the block. Camera and computer buyers can pull up YouTube reviews of any product on the market in less than 30 seconds. Mortgage originators do their best to collect all the information they can to determine whether a prospective mortgage will be paid as agreed. They have their standard checklists of questions, and they are free to ask more questions as the application process goes on. But once the mortgage is put in place, the only way to see if the payments are made on time is to track actual payments. No one can tell the future. No one can tell if a mortgage holder is going to stop paying. Or at least that used to be the case. Big Data is changing that picture. Big Data can help us look into the future with some degree of certainty. But before we get into how that works, let me give you a brief rundown on what Big Data is.

What is Big Data?

Historically, all the data computers used was set up in highly structured databases. In other words, we had separate fields for each piece of data, and we spent a lot of time and effort making sure all the data was clean and accurate. Big Data does away with that requirement. Big Data reads data that was never meant to be analyzed by a computer. This includes everything from Tweets and Facebook postings to newspaper clippings. All of these were written for human consumption, not for computer processing. Big Data cuts through that. It is able to read all of this unstructured, messy material that was never meant for computers and then make sense of it. In other words, it can read Tweets, Facebook postings, and data from hundreds of different sources written in incompatible styles and assign meaning to what it’s reading. In the mortgage industry, this means we can now tap into huge reservoirs of information that were always available to us in the public domain but that we could never get a computer to work with.

Big Data is the Solution the Mortgage Industry Needs

Today, Big Data can tell mortgage companies whatever they want to know about the people who hold mortgages with them.  Big Data can operate as a kind of “distant early warning system” for account servicers.

1. Spending Analysis

Big Data can look at the shops where your mortgage applicants buy their clothes and watches. Then it can determine whether that spending is in line with their stated incomes or amounts to a splurge. That’s not to say there is anything wrong with an occasional splurge, but if someone consistently spends beyond her earnings, then something is wrong.
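A very simple version of that check – comparing observed monthly spending against stated income and flagging persistent overspending – could look like the sketch below; the figures and the three-month rule are illustrative assumptions.

```python
# Sketch: compare an applicant's observed monthly spending against stated
# income and flag persistent overspending. Data and the 3-month rule are illustrative.

stated_monthly_income = 5200
monthly_spending = [4100, 5600, 5900, 6100]  # recent months, aggregated from transactions

months_over = sum(1 for spend in monthly_spending if spend > stated_monthly_income)
if months_over >= 3:
    print(f"FLAG: spending exceeded stated income in {months_over} recent months")
else:
    print("Spending broadly consistent with stated income")
```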

2. Social Media Analysis

We all know the old adage that “birds of a feather flock together.” So, when you know who someone’s friends are, you know a lot about that person. And where can you find out who someone’s friends are more easily than on Facebook? Big Data can collect a list of your applicants’ friends, build profiles, and assess applicants. That assessment could accelerate the application approval or be instrumental in quashing it. Knowing the applicants’ friends can offer a second-order benefit: if the company approves an applicant’s mortgage, it can approach each of her friends as well. This can be particularly lucrative for subprime mortgages.

3. Website Analysis

Even knowing the websites your applicants visit is fair game. Applicants who say they want to settle down and build a career but have recently spent a lot of time on overseas travel and airline websites present some sort of discontinuity. It’s better to discover that earlier rather than later.

4. Holistic Customer Account Analysis

Big Data can look at the actual spending patterns of mortgage applicants and see if they are in line with their stated income. If their spending is too high, they might prove to be good prospects for a subprime mortgage at higher interest rates.

Banks have historically operated in a highly siloed way. What I mean is that the department that handles checking and savings accounts knows nothing about their customers’ mortgage accounts, car loans, or children’s tax-deferred education savings programs. Big Data can pull this data together across the bank’s own internal databases without violating any confidentiality agreements. This enables bank agents to make offers to their customers that are right on target. Imagine a customer who has been surfing new car websites for several weeks but has not asked for a loan – yet. When she stops into the bank on another matter, the teller could raise the question of a car loan, tell her the extent to which she has been preapproved, and direct her to the office that has already prepared the paperwork.
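Pulling those silos together can be as simple as joining the bank’s own systems on a customer identifier, as in the hypothetical sketch below; the table contents, field names, and the cross-sell rule are all invented for illustration.

```python
# Sketch: join a bank's own siloed systems into one customer view so an agent
# can see a timely cross-sell opportunity. All data and fields are illustrative.

checking = {"cust-42": {"avg_balance": 6800}}
auto_loans = {"cust-42": None}  # no existing car loan on file
web_activity = {"cust-42": {"new_car_site_visits_30d": 14}}

def cross_sell_prompt(cust_id: str):
    visits = web_activity.get(cust_id, {}).get("new_car_site_visits_30d", 0)
    has_loan = auto_loans.get(cust_id) is not None
    balance = checking.get(cust_id, {}).get("avg_balance", 0)
    if visits > 5 and not has_loan and balance > 2000:
        return f"Offer a pre-approved car loan to {cust_id} at the next branch visit"
    return None

print(cross_sell_prompt("cust-42"))
```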

So what’s the hold-up?

In spite of these advantages, only 38% of banks in 2013 were using Big Data that way, according to a survey Celent conducted that year. There is no doubt that percentage has increased during the last four years.

Some see the collection of this online data as an invasion of privacy – and perhaps it is. The jury is still out. But as long as this information is in the public domain, it is hard to justify the argument that there is anything underhanded going on here. Nevertheless, customers who want to guard their data more carefully are free to limit access to their social media data to their “friends.” They can also instruct their browsers not to keep histories or store cookies. This carries a cost, of course. It’s often very handy for a computer user to rely on her browser to maintain user names and passwords to accelerate logins.

Full disclosure of web activity does not necessarily hurt customers, either. A bank could notify a user by email when someone is using her debit card to make a purchase that is out of character with her routine spending patterns. If there is no cause for alarm, she could simply ignore the alert. But if it is a threat, she could act immediately. By having a full picture of each customer’s browsing behavior as well as online and offline spending patterns, banks and other financial organizations can tailor offers that are genuinely appropriate for each customer.
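The “out of character” alert in the last paragraph is a textbook anomaly check. Here is a minimal sketch using the mean and standard deviation of recent purchases; the amounts and the three-standard-deviation cut-off are illustrative assumptions.

```python
# Sketch: flag a debit-card purchase that is out of character with routine
# spending using a simple mean/standard-deviation rule. Figures are illustrative.
from statistics import mean, stdev

recent_purchases = [24.50, 61.20, 18.99, 45.00, 32.75, 27.40, 55.10, 12.00]
new_purchase = 980.00

avg, spread = mean(recent_purchases), stdev(recent_purchases)
if new_purchase > avg + 3 * spread:
    print(f"ALERT: ${new_purchase:.2f} is far outside the usual "
          f"${avg:.2f} +/- ${spread:.2f} pattern - email the customer")
else:
    print("Purchase looks routine")
```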

The Future of Mortgage and Big Data

In the future, we can expect mortgage companies to use Big Data to access an ever-wider range of publicly available information to build an increasingly comprehensive profile of each customer. They will integrate arrest records, bankruptcy records, credit records, court judgments, property ownership, and even library fines drawn from public online databases. We can also expect companies in the business of buying existing mortgages to handle their own due diligence using Big Data. Each mortgage for sale may become more or less attractive over time depending on the recent behavior of its mortgage holders.

If you want to learn more about this, feel free to get in touch with me directly. I’m Eskinder Assefa, CEO of SOMAmetrics in Berkeley, California. We work with mortgage companies to help them realize their full business potential by improving their sales and marketing strategies and leveraging emerging technologies that have an impact on the bottom line.

The Future of Big Data in the Mortgage Industry

mortgage industry

It’s common knowledge that Big Data has arrived in the mortgage industry. One of the most important questions leaders in our industry need to ask themselves, of course, is “Where is it all going?” We’re going to give you our take on this issue in just a moment. But first, let me give a short synopsis of what Big Data is for those who are new to this field.

Historically, all the data computers used was set up in highly structured databases. In other words, we had separate fields for each piece of data, and we spent a lot of time and effort making sure all the data was clean and accurate. Big Data does away with that requirement. Big Data reads data that was never meant to be analyzed by a computer. This includes everything from Tweets and Facebook postings to newspaper clippings. All of these were written for human consumption, not for computer processing.

Big Data cuts through that. It is able to read all of this unstructured, messy material that was never meant for computers and then make sense of it. In other words, it can read Tweets, Facebook postings, and data from hundreds of different sources written in incompatible styles and assign meaning to what it’s reading. In the mortgage industry, this means we can now tap into huge reservoirs of information that were always available to us – data that is in the public domain but that we could never get a computer to work with.

Now let’s take a look at where Big Data is going to take the mortgage industry.

Big Data in the Mortgage Industry

One important Big Data application is pre-populating mortgage applications. In other words, Big Data will mine bank records, publicly available databases, social media sites, and other sources to collect all or nearly all the information required for a mortgage application. This will leave the applicant with the option of either clicking to ratify the pre-populated application as accurate or editing a few fields here and there to fine-tune it.

Another approach is for prospective homeowners to complete their mortgage applications as they always have and then have the mortgage company’s computers compare the pre-populated versions with the applicants’ versions to identify discrepancies.

In either case, the objective will be to enhance the accuracy of the data in the applications while reducing the burden on the applicants.

We can also see mortgage applications being scored by machine-learning algorithms. These algorithms will approve or deny the applications immediately. Approved applications may be forwarded for processing right away, while rejected applications will qualify for a human review if the applicants don’t feel they have been scored properly. The goal of this instant evaluation will be to eliminate the delays in the current manual evaluation process – delays that are often measured in weeks.

We can see that Big Data will be instrumental in projecting the number of applications for new mortgages or refinanced mortgages in specific geographies and specific time frames. Further, Big Data will project the total value of these mortgages. These projections will help mortgage companies reposition their people and processing power based on projected market demand. These projections would be based on the current mortgage portfolio the industry has in place in various geographic areas coupled with scenarios about shifts in mortgage interest rates.

Spending on Big Data applications and technology will soar. In 2014, 2015, and 2017, we’ve seen Big Data spending in the mortgage industry at $2.6 billion, $2.8 billion, and $3.2 billion respectively. We are going to see spending on Big Data continue to climb as the number of success stories grows.

The mortgage industry is going to suffer a severe shortage of Big Data analysts who know how to manipulate the huge and ever-growing quantities of data that will become available. We are going to need professionals who can manage the enquiries in ways that lead to highly defensible conclusions. The growth in the demand for Big Data analysts is going to outstrip the supply.

We are going to see the rapid growth of specialized firms that help mortgage companies plan for and implement Big Data projects. This function is going to be outsourced rather than treated as a core competence for several reasons. First, most mortgage companies will find it far too expensive to build their own in-house facilities. Second, the process of building those in-house facilities will take too long and is liable to run into many dead ends. Third, they will not be able to attract the talent they need at a price they can afford. Fourth, the management in existing mortgage companies will need to go through a steep learning curve that is best handled by a specialized firm. Over time, we can expect mortgage companies to build teams of in-house Big Data talent while leaving the technologies to cloud-based firms.

Privacy is going to become a big issue in Big Data. Although what Big Data practitioners do is generally legal, the act of mining social media on a wholesale basis was never contemplated when social media sites were first introduced. We are going to see some interesting and instructive debates on ethical issues over the next decade before a consensus emerges. Any legislation passed before those ethical debates come to closure will prove to be ill-conceived and counterproductive.

Decades ago, the local bank manager knew his customers well and was in a position to make an informed judgment call about the amount of credit to be extended. Today, those decisions are rarely made in retail bank branches and mortgage companies; rather, they are made by a committee – often in another city. We need to bring some humanity back into the decision-making process, and incorporating social media will go a long way in that direction.

The mortgage approval process is going to become more transparent. At the moment, borrowers only know whether they are approved or rejected, but they rarely have an idea why they were slotted where they were. In the future, mortgage companies will be in a position to coach their applicants very specifically about what they need to do to be approved.

We are going to see many new, non-traditional players in the mortgage industry. They will spring from places like Silicon Valley, and they will offer better service at lower costs than banks and traditional mortgage companies. For example, Lending Club facilitated $3.6 billion in loans in the first six months of 2015. Likewise, Prosper is growing fast.

We can expect the Federal Housing Administration to develop a growing number of regulations that the mortgage industry must comply with. Many of these regulations will apply to a company’s portfolio of mortgages rather than any given mortgage. Mortgage processors will continue to ensure that they comply with application-specific compliance issues, but they cannot be expected to deal with portfolio-wide compliance issues. In fact, it is unlikely that doing so is humanly possible. This means that mortgage companies will necessarily embrace Big Data to do that job for them. Failure to do so means that they will face stiff penalties. It is far better for these companies to catch compliance failures on their own and take action than to face their regulators in court.

Carl Pry, a managing director at Treliant Risk Advisors, said, “It’s in every bank’s best interest to get one step ahead of the regulators and understand what that regulator is going to know and find. They need to resolve any discrepancies [and] do any file review analysis needed to be able to explain any disparities before the regulators find them.”

Big Data is going to help reduce the risk in mortgage lending. Big Data will help brokers advise their clients about school performance and community crime rates. This will help the buyers make better informed decisions and, ideally, lead to lower risk mortgages.

Big Data is going to prove instrumental in flagging potentially fraudulent mortgage transactions. The FBI and other law enforcement agencies are developing increasingly sophisticated techniques to identify potential abuses. Big Data systems will incorporate these fraud detection techniques and trigger pre-emptive enquiries.

Residential property appraisals will become more objective and more accurate. Big Data will propose the most appropriate neighborhood comparables. It will develop appraisals using industry standards that will be driven by an algorithm. MReport claimed that, “More than 30 percent of loans fall short of the collateral valuation agreed to between customer and loan officer.” Big Data will help fix that.

New regulations and compliance issues are making the appraisal process increasingly difficult. That, coupled with the fact that the number of qualified appraisers is not keeping up with the demand, means that the industry must necessarily rely on broad based, sophisticated tools like Big Data. This trend will continue.

We can expect federal compliance regulations in the mortgage industry to be applied ever more strictly. In the last few years we’ve seen fines and settlement agreements that were even more onerous than Dodd-Frank required. In these cases, the government targeted the big boys like Wells Fargo Bank, Bank of America, CitiBank, PNC Bank, EverBank, JP Morgan Chase, One West, Santander Bank, and U.S. National Bank. Given the government’s practice of starting with the big companies and working down to smaller companies, it is not hard to see what is liable to happen.

We can expect Big Data analysis of non-monetary defaults on mortgages to become more common, if not universal. Here, I’m talking about flagging accounts that shift from payments made early with an extra principal payment to payments made on time with no extra payment. Or we will find homeowners whose homeowners association is suing them. Or maybe the local government put a lien on the property on the grounds that the property is uninhabitable. Or the couple is getting divorced. These are all early warning signs that Big Data will track as a matter of course.

Small mortgage companies that cannot afford to buy the necessary technologies will be squeezed out of business. Larger companies will buy them.

Mortgage companies are going to focus increasingly on building higher quality portfolios with fewer staff. The only way to have a smaller staff complement and a larger mortgage portfolio is through automation. That should be obvious. Automation in general and Big Data in particular are the way of the future.

Just to wrap up, I want to make it clear that Big Data is already having an impact on how the mortgage industry operates and we are still at the early stages. We are going to be in for a very interesting ride over the next few years.

Big Data Case Studies in Education


Big Data Case Studies with Proven Results

Big Data Case Studies: Coursera

Coursera provides education from leading universities around the world, delivered over the internet. The instruction is handled through streaming video. Coursera tracks how its students watch those courses. Students might “rewind” to watch a section a second time. Or they might fast-forward, skipping material they think they already know. Or they might go over the same course several times. Or they might just quit and walk away. Whatever they do, Coursera tracks it on a student-by-student basis. The company learns from this experience. It learns what works and what doesn’t. Occasionally it throws in a pop quiz to see how well the students are learning. But there’s another reason, too: the company wants to see how well it’s doing. It’s a kind of self-evaluation. When the course designers realize that the learning process is not going as they had expected, they can go back and rework their material based on real-world feedback.
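The raw material behind that feedback loop is simply a stream of per-student interaction events. Here is a hypothetical sketch of how such events might be aggregated to spot trouble spots in a lecture; the event names and data are invented, not Coursera’s schema.

```python
# Sketch: summarize per-student video interaction events (rewind, skip, quit)
# so course designers can spot trouble spots. Events and data are illustrative.
from collections import Counter, defaultdict

events = [  # (student_id, lecture_id, event)
    ("s1", "week2-video3", "rewind"), ("s1", "week2-video3", "rewind"),
    ("s2", "week2-video3", "quit"),   ("s3", "week2-video3", "skip_ahead"),
    ("s2", "week1-video1", "complete"),
]

per_lecture = defaultdict(Counter)
for student, lecture, event in events:
    per_lecture[lecture][event] += 1

for lecture, counts in per_lecture.items():
    trouble = counts["rewind"] + counts["quit"]
    note = "<- review this segment" if trouble >= 2 else ""
    print(lecture, dict(counts), note)
```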

Big Data Case Studies: Arizona State University

Arizona State University, like many universities across the country, has its fair share of freshman students who are genuinely challenged in mathematics. One-third of its freshmen earned less than a C in math. Interestingly, this one score has been a reliable indicator of whether students would eventually graduate and collect their degrees – or drop out. To deal with this, ASU worked with Knewton to apply its adaptive learning techniques. In just two years, from 2009 to 2011, the pass rate in this course jumped from 64% to 75%, while the dropout rate fell by 50%.

Big Data Case Studies: West Virginia University

Simon Diaz, a professor at West Virginia University, was curious about why so many students who enrolled in online classes dropped out. One of the key rationales for providing online classes with streaming video at times convenient for the students was that the students wouldn’t feel shackled to a schedule that was incompatible with the daily realities of their lives. Using Big Data analytics, he looked at 33 variables for more than one million students. These variables ranged from things you would expect, like age and gender, to things you wouldn’t, like military service and class size. What he discovered had not been obvious to anyone before: the more classes students took at any one time, the more likely they were to drop out. Simply reducing the number of courses students enrolled in at any one time would increase retention rates. But financial grants to students require those students to take a minimum number of courses. In other words, public policy was at odds with good educational practice – a conundrum no one had uncovered before, stemming from a policy that had probably never been tested against empirical evidence. Another win for Big Data in education.
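The core of that finding is a cross-tabulation of dropout rates by concurrent course load, which a sketch like the one below can illustrate; the student records here are invented, not the study’s data.

```python
# Sketch: compare dropout rates across concurrent course loads.
# The student records are invented for illustration.
from collections import defaultdict

students = [  # (courses_enrolled_at_once, dropped_out)
    (1, False), (1, False), (2, False), (2, True), (3, True),
    (3, False), (4, True), (4, True), (5, True),
]

by_load = defaultdict(lambda: [0, 0])  # load -> [dropouts, total]
for load, dropped in students:
    by_load[load][0] += int(dropped)
    by_load[load][1] += 1

for load in sorted(by_load):
    dropouts, total = by_load[load]
    print(f"{load} concurrent courses: {dropouts / total:.0%} dropout rate (n={total})")
```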

Big Data Case Studies: Kent State

Kent State uses analytics to track student activity and project the likelihood of success. It tracks students over a ten-year period, collecting data about their majors, classes, demographics, and other factors. Its system highlights at-risk students with red, yellow, and green indicators, and the reports help advisors focus their efforts on problem areas. Steven Antalvari, Kent State’s director of academic engagement and degree completion, said, “This data has helped us peel away certain layers faster, allowing us to spend the bulk of our time together working on the student’s purpose, goals, and career development.”
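The traffic-light display itself is just a thresholding step applied to whatever risk score the underlying model produces. A hypothetical sketch, with thresholds invented for illustration rather than taken from Kent State’s system:

```python
# Sketch: map a model's risk score to the red/yellow/green indicator an advisor
# sees. The thresholds are illustrative assumptions.

def risk_indicator(risk_score: float) -> str:
    """risk_score: estimated probability the student will not complete the degree."""
    if risk_score >= 0.6:
        return "RED - reach out now"
    if risk_score >= 0.3:
        return "YELLOW - monitor and check in"
    return "GREEN - on track"

for student, score in [("Avery", 0.72), ("Blake", 0.41), ("Casey", 0.12)]:
    print(student, "->", risk_indicator(score))
```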