As Yogi Berra, American baseball legend, once said, “You can observe a lot by just watching.” There has been a lot to observe in the world of data privacy and protection recently. Advances in artificial intelligence and mobile computing promise to automate repetitive, error-prone claims and underwriting tasks, from sifting through physician notes to scrutinizing heart traces. At the same time, insurers can better access masses of data to develop more granular assessments of risk.
Impressive, yes. But also concerning. Consumer advocacy groups have begun to raise the alarm about data misuse and theft. Indeed, according to Forbes magazine, the average data theft in the U.S. cost $7.91 million in 2018 in legal fees, consumer reimbursements, and other losses, the highest annual amount on record. Regulators have taken notice. The European Union’s proposal of the General Data Protection Regulation (GDPR) in 2012 spurred a spate of similar policies across the globe.
As sports analogies go, baseball seems oddly appropriate to describe the shifting state of data protection in insurance today. In baseball, the defense is given control of the ball, and the offense must decide whether to swing or let an opportunity pass by. Similarly, while insurers today have good reason for caution, many also have access to invaluable data and the knowledge to protect it. How can insurers both address data privacy concerns and capitalize on competitive opportunities? Perhaps the lessons every insurer needs to learn can be found on a baseball field.
Going on Offense: Data Analytics in Risk Assessment
The question is urgent as carriers face mounting volumes of data. Basic demographic details, such as age, gender, education, and occupation, are only a small percentage of the possible insights now available. Data scientists can now build highly accurate predictive models based on behavioral, socioeconomic, and biometric information.
Collected together, data from pharmacy, wellness, motor vehicle, and credit sources can help carriers construct a far more complete profile of each applicant or claimant. The possibilities seem endless. New rating factors can enable underwriters to better segment risk and protect against anti-selection, nondisclosure, and fraud. Post-sale, carriers can perform more accurate and comprehensive multivariate experience analysis to support better in-force management and uncover untapped distribution, cross-selling, and upselling opportunities. And, at claims time, more insight can deliver greater accuracy in adjudication.
But like any hitter up at bat, insurers make a series of tactical decisions to try for a hit, and risk a miss or even a strikeout. Players who step back from the plate could risk losing market share to more agile competitors. And yet, unauthorized or merely lax data usage can contribute to reputational damage, loss of business, and substantial fines. Success will only come to those carriers that rapidly develop both effective and protective business practices.
Playing Defense: Data Protection Strategies
The good news? Adopt the right strategies, and an insurer can gain the competitive advantage. The bad news? Each option brings positives and negatives:
- Create a data catalog – Companies can create a single source that gives users a view of all data available in an organization, from origin to usage. This eases information sharing and visibility and can help carriers more easily identify gaps; it also magnifies risk if the catalog itself is inadequately protected.
- Detect and classify data – By correctly sorting sensitive data based on clear definitions, carriers can more easily identify and protect personally identifiable information and satisfy regulatory and auditor requests. On the other hand, misclassifying or mishandling this data due to vague definitions or process problems can draw hefty regulatory fines and reduce productivity.
- Protect data – Insurers can protect sensitive data through a variety of techniques, including anonymization, pseudonymization, encryption, or redaction. These allow only authorized users to see sensitive data elements, minimizing the risk of unintended disclosure and supporting compliance. However, removal of data fields can impede analysis and raise integration risks, particularly in large, complex organizations with multiple touchpoints. Also, no technical solution can be protective if the definitions of sensitive data are too lenient.
- Set a master person index – Carriers can establish a master person index by comparing two or more data records that contain the same or similar data elements to determine whether they refer to the same individual, who can then be assigned an alias across different partners and data sets. This index can empower a carrier to better manage jumbo risks and retention limits, reduce the need for time-consuming manual comparisons, and increase data quality – but only if the aliases for each individual are correctly linked. Done poorly, the practice can lead to data redundancy, selection bias, and incorrect linkages that degrade data quality.
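To make the master person index idea concrete, here is a minimal Python sketch. It assumes a deliberately simple, deterministic matching rule (normalized name plus date of birth); the field names, the alias format, and the sample records are all hypothetical. Real record linkage usually relies on probabilistic or fuzzy matching across many more fields, and a rule this naive is exactly the kind that can produce the incorrect linkages described above.

```python
import re
import unicodedata
from itertools import count


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return " ".join(text.split())


def match_key(record):
    """Hypothetical matching rule: same normalized name and same date of birth."""
    return (normalize(record["name"]), record["dob"])


def build_index(records):
    """Assign one alias per distinct match key and attach it to every record."""
    aliases = {}
    ids = count(1)
    linked = []
    for record in records:
        key = match_key(record)
        if key not in aliases:
            aliases[key] = f"P{next(ids):06d}"
        linked.append({**record, "alias": aliases[key]})
    return linked


# Illustrative records from three hypothetical sources.
records = [
    {"source": "pharmacy", "name": "José O'Neil", "dob": "1980-04-02"},
    {"source": "motor_vehicle", "name": "jose  oneil", "dob": "1980-04-02"},
    {"source": "credit", "name": "Jose O'Neil", "dob": "1975-11-30"},
]

linked = build_index(records)
print([r["alias"] for r in linked])  # ['P000001', 'P000001', 'P000002']
```

The first two records collapse to one alias despite differences in spelling and spacing, while the third, with a different date of birth, is treated as a separate person. Two distinct people who share a name and birth date would be wrongly merged under this rule, which is why the quality of the linkage logic matters so much.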
Complicating this, protection technology itself is evolving, opening new choices and fresh risks. Consider the widespread practice of data anonymization. Insurers routinely anonymize or withhold identifying factors to protect personal information from analysts. While this approach is highly effective in isolation, it often proves too static when insurers must compare multiple sources of data. Alternate data streams cannot be easily merged with the anonymized data, and overreliance on this technique can impede deeper understanding of risk.
Carriers have responded to this problem by pursuing a variety of protection methods:
- Tokenization protects the identity of the individual through substitution. A token is attached to a data set and unlocked or relocked as information flows through different systems and users. The problem? Such tokens are often inconsistent for the same data subject coming from different data lineages.
- Pseudonymization answers the need to compare and enrich data sets by enabling the merging of de-identified data sets via common pseudonyms for individuals. This offers a dynamic ability to merge new sources into existing data sets over time, but comes with a significant administrative cost to manage, including role-based separation of duties to minimize re-identification.
- Encryption renders the data unintelligible via an algorithm, but allows the data to be accessed with the correct decryption key. This enables authorized users to decrypt the data when needed for an authorized business process. An organization must therefore take care to protect the decryption key to ensure protection of the data.
- Differential privacy adds random noise to a returned query via a mathematical function with a specific privacy parameter, thereby preventing an adversary from using a process of deduction by elimination to discover sensitive information about an individual. This approach minimizes the risk of identifying individuals by stitching together disparate pieces of information, but the protection weakens as the number of queries and analyses grows.
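Two of these methods lend themselves to a short illustration. The Python sketch below shows one possible way to derive a common pseudonym (a keyed hash, HMAC-SHA256, which is an assumption here and not a method the article prescribes) so that two de-identified data sets can be merged, followed by a Laplace-noise counting query as one standard form of differential privacy. The key, identifiers, field names, and epsilon value are all hypothetical.

```python
import hashlib
import hmac
import random


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for an identifier via keyed hashing (HMAC-SHA256)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a counting query (sensitivity 1).

    A smaller epsilon (the privacy parameter) means more noise.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical key; in practice it would be held by a separate custodian.
key = b"demo-key-not-for-production"

# Two de-identified data sets keyed by the same pseudonym can be merged
# without either one carrying the raw identifier.
pharmacy = {pseudonymize("POLICYHOLDER-0001", key): {"statin": True}}
credit = {pseudonymize("POLICYHOLDER-0001", key): {"score_band": "B"}}
merged = {p: {**pharmacy[p], **credit[p]} for p in pharmacy.keys() & credit.keys()}
print(list(merged.values()))  # [{'statin': True, 'score_band': 'B'}]

# A noisy answer to "how many claimants match this filter?"
rng = random.Random(0)
print(noisy_count(120, epsilon=0.5, rng=rng))
```

Anyone holding the key can recompute pseudonyms from known identifiers, which is why separation of duties matters. On the differential privacy side, each additional noisy answer about the same data reveals a little more, so the number of queries has to be budgeted.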
After examining this complex landscape, perhaps the only clear conclusion is that there is no single path to achieving data protection. Each privacy strategy should be customized to a carrier’s unique operational status and market objectives.
At the same time, all data protection trends seem to follow a few shared governing principles. These are proactive solutions, designed to help insurers take action in advance of an incident. Systems and technologies are also being developed to prioritize privacy via “Privacy by Design”¹ without reliance on any single analyst’s or administrator’s judgment. There is growing awareness that carriers must consider end-to-end security, from the moment data enters a system through its destruction and at all steps in between. Visibility and transparency are also essential; insurers are seeking to ensure data use is subject to ongoing compliance and the consent of stakeholders.
Finally and perhaps most critically, insurers are approaching the challenge of data protection not as a trade-off between privacy and utility, but as a means to achieve both. In other words, strong data science practice requires equally solid privacy protections. This goes beyond simply adhering to data protection legislation; it also requires meeting the expectations of consumers, who rightfully demand that insurers be conscientious custodians of their data. Trust and transparency are essential, so data usage and transmission should not extend past mutually understood applications.
Carriers who seek to delay the future, rather than invest in it, forsake their chance to hit a home run, while thoughtless use of data will almost inevitably result in striking out.