Data Science in the Insurance Industry
By Kat Campise, Data Scientist, Ph.D.
Data science as applied within the insurance industry is still at an emerging stage. Although actuaries use statistical methods for their risk calculations, and predictive analytic techniques are already used within the industry, insurance companies haven’t embraced data science as quickly as other industries. But why?
The short answer is regulation. Although insurance companies are privately owned and operated, their decisions have a widespread impact on the public. Health insurance is a prime example of the public and private intermingling, even though the insurance policy is a private contract between the policyholder and the insurance company. Policyholders pay a monthly premium and/or agree to meet a deductible so that, ideally, they have a safety net if a drastic event occurs, such as needing heart surgery. Moreover, there may be thousands, tens of thousands, or hundreds of thousands of policyholders who rely on the insurance company’s decisions.
If the insurance company fails to meet the agreed-upon financial obligation — and they’ve devised massive legal documents that state what they will and will not cover, and when — then a ripple effect is generated. Should the policyholder have a heart attack, they are not going to merely wait for death. No, instead they’ll be rushed to the hospital and treated. However, if they cannot pay, then the hospital now has the responsibility to recoup the money from elsewhere. Depending on the U.S. state, either the state remits payment or the cost is passed on to existing and future patients. If the state pays, then the money is replenished through taxes. As much as many may believe that medical services should be free, doctors, nurses, and other health care providers also need to be paid, as do the vendors of the medical equipment and pharmaceutical companies.
So, everyone has skin in the game when it comes to insurance: all policyholders, the hospital, the physicians and nurses, the insurance company, and non-policyholders who are residents of the particular U.S. state. Consequently, insurance companies are regulated at the state level, which includes licensing, overseeing financial durability, and monitoring the insurance company’s actions to ensure fair and reasonable market practices.
Also, keep in mind that insurance companies need a larger population of policyholders that don’t generate frequent claims, whether large or small. In the case of health insurance, for the insurance company to remain financially viable and meet its obligations to all of its policyholders, the healthy population paying into the monetary pool must be greater than the policyholders who are more likely to need ongoing medical treatment. As such, policy pricing is based on statistical assessments of policyholder risk. Smokers with a history of heart disease present a higher risk of financial demands on other policyholders, which in turn can increase the costs of insurance and medical care for everyone else. Calculating these factors is the realm of the actuary.
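The pooling logic described above can be sketched in a few lines of Python. This is a minimal illustration rather than any insurer's actual method: the two risk groups, their claim probabilities, the average claim costs, and the 20% loading for expenses are all hypothetical numbers chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class RiskGroup:
    name: str
    members: int              # number of policyholders in the group
    claim_probability: float  # assumed chance of a claim in a given year
    average_claim: float      # assumed average cost of one claim


def expected_claims(group: RiskGroup) -> float:
    """Expected yearly claim cost for the whole group."""
    return group.members * group.claim_probability * group.average_claim


def pooled_premium(groups: list[RiskGroup], loading: float = 0.20) -> float:
    """Flat yearly premium per member covering expected claims plus a loading."""
    total_members = sum(g.members for g in groups)
    total_claims = sum(expected_claims(g) for g in groups)
    return total_claims * (1 + loading) / total_members


# Hypothetical pool: mostly low-risk members plus a smaller high-risk group.
pool = [
    RiskGroup("low risk", members=9_000, claim_probability=0.02, average_claim=10_000),
    RiskGroup("high risk", members=1_000, claim_probability=0.20, average_claim=30_000),
]

print(round(pooled_premium(pool), 2))
```

Increasing the size of the high-risk group while holding the low-risk group constant raises the flat premium for every member, which is why the balance of the pool matters so much to the insurer.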
Actuarial Science vs. Data Science
Actuarial science and data science have a primary skillset in common: advanced math education and statistical expertise. But the path to becoming a data scientist is, for now, less rigorous than the path to becoming an actuary. For instance, if you’re interested in actuarial science, you’ll need to complete an academic course of study that includes the following:
- Calculus
- Probability and Statistics
- Linear Algebra
- Actuarial Science
- Business coursework
- Finance and Economics
Attaining your Bachelor’s degree is only the beginning. You may get your foot in the door as an actuary intern, but to rise through the ranks towards earning the median pay of about $105,900 per year (and get closer to the $206,820 or higher salary the top 10% of actuaries earn), you’ll need to pass between 6 and 10 exams to become a Fellow. After you successfully pass the first 7 exams, the Associate level is reached (as a general rule). Insurance employers will usually fund your exams, which can save you thousands of dollars in exam fees. But, you’ll still need to spend roughly 8 years studying and passing the exams, along with performing your daily duties as an actuary, if you want to attain Fellow status.
Two organizations provide exams and certification, and each focuses on a particular type of insurance:
- Casualty Actuarial Society (CAS): auto, home, workers’ compensation, medical malpractice
- Society of Actuaries (SOA): investments, finance, health, life insurance, retirement; notably, the SOA has added a Predictive Analytics component, where candidates apply predictive modeling to a business case through the use of R. This represents an incorporation of data science processes into the insurance industry.
The job outlook for actuaries is bright: 21% projected growth through the year 2031 according to the Bureau of Labor Statistics (BLS). However, the BLS also estimates that data science employment will grow by 36% in the same time span. So actuaries who dive into data science roles could very well ride the Big Data wave into long, successful careers.
Comparatively, data scientists often start out higher on the pay scale, although their BLS-reported median yearly income of $100,910 is slightly below the actuarial median cited above. An ambitious actuary with a data science background may have the skills necessary to close any early-career pay gap.
The Data Science Point of Departure
As actuarial science candidates toil away at passing exams, the expectation for data scientists is that they’ve earned at least a master’s degree in a STEM field. Depending on the industry, data scientists aren’t generally shackled to an extreme regulatory environment. They have more breathing room in terms of building, deploying, and monitoring their predictive models. In short, data scientists approach business problems from a research design perspective. Errors are drawn out through an iterative process that involves a specific set of stakeholders, e.g., internal departments and consumer-facing systems and processes. There is some oversight, but not at the same level that actuaries experience.
For instance, let’s say that a health technology company (not an insurance company) asks their data scientist to build a recommendation system that ingests data from internal and external data sources which may be structured, semi-structured, and unstructured. The algorithm would then produce a predictive output and a series of recommendations for the next course of action. This can be consumer-facing, such as listing insurance pricing comparisons (an estimate), or internal, e.g., predicting customer churn for a subscription service. They may have a team consisting of a lead data scientist, a data engineer, a data analyst, IT, and a manager or C-level executive collaborating with them. Minimum viable products (MVPs) are frequently launched to the public and then fine-tuned via additional iterations.
Within an insurance context, this process is layered with internal and external oversight. Releasing an MVP isn’t an option in the insurance industry due to strict regulatory requirements. The same recommendation system produced by a data scientist (or an actuary with advanced data science skills or training) in the insurance industry is likely to be examined and monitored by internal regulatory departments and audited by an external regulatory team prior to launch. Furthermore, each stage of the audit has specific protocols that cannot be avoided, and these significantly constrain the hypothesis-testing approach that is essential to data science.
How Data Scientists Can Help Improve the Insurance Industry
Nonetheless, data science practices are being merged into the insurance industry. As previously stated, the SOA has released a Predictive Analytics exam that focuses on model building, codifying the underlying statistical algorithm in the R programming language, and then assessing the results of the model. Data science moves the insurance industry toward analyzing a wider variety of impact factors for risk mitigation and pricing. Insurance as a one-size-fits-all approach only functions when the pooled risk is constrained, as in the case of employer-provided insurance. When insurance is expanded to a larger risk pool, such as a population of over 300 million (the Affordable Care Act is an apt example here), then risk and pricing tend to increase.
Risk and Pricing via AI
We now have more data available than at any other time in human history. We have also made great strides in using machine learning to capture a multitude of data (including qualitative data) and to predict the likelihood of an event occurring. With regard to the health insurance industry, we can make better predictions as to the policyholders who are more likely to need a larger return on their monthly insurance or premium payments vs. those who are essentially financing that need.
The above leads us to better customer segmentation. Policyholders are, after all, customers. Each has particular circumstances that don’t always fit a Generalized Linear Model fitted to (and extrapolated across) a larger population. For example, some areas of a state have a higher probability of flooding or wildfires. Alternatively, for auto insurance, you may live in a city where there’s a higher likelihood of your car being stolen or your being involved in a collision. But you’re a conscientious car owner and driver, and neither has ever happened to you.
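For readers unfamiliar with the term, a Generalized Linear Model for claim frequency is commonly written in the log-link form sketched below. This is a generic textbook formulation, not a specific insurer's model. The symbols are introduced here purely for illustration: \mu_i is the expected number of claims for policyholder i, e_i is the exposure (for example, years insured), x_{ij} are the rating factors such as territory or vehicle type, and \beta_j are the fitted coefficients.

```latex
\log \mu_i = \log e_i + \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}
\quad\Longleftrightarrow\quad
\mu_i = e_i \, \exp(\beta_0) \prod_{j=1}^{p} \exp(\beta_j x_{ij})
```

Every policyholder with the same rating factors receives the same prediction, so a careful driver in a high-theft city is priced at the average for that city. That is the limitation the paragraph above describes.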
Rather than paying a higher price to cover others who aren’t as mindful on the road (reckless drivers), you could benefit from a well-designed machine learning protocol that automatically adjusts your pricing based on more than just the increased risk of where you live and how much you drive your vehicle. This can be supported by digital data that the auto insurance company collects, perhaps from a dash cam or an app that uploads your driving data (or other car-related data) to your insurance company’s database. From there, the risk and pricing algorithm produces the adjustment. Admittedly, this is a highly simplified example. The point here is that insurance pricing and product offerings can be individualized, and data science provides the means for this to be a reality.
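The adjustment step can be sketched as follows. This is a deliberately simple, rule-based stand-in for a learned pricing model, and every name and number in it is an assumption made for illustration: the hard-braking metric, the benchmark of 2 events per 100 miles, the 5% step per event, and the 25% caps on discounts and surcharges.

```python
def adjusted_premium(
    base_premium: float,
    territory_factor: float,
    hard_brakes_per_100_miles: float,
    max_discount: float = 0.25,
    max_surcharge: float = 0.25,
) -> float:
    """Scale a territory-rated premium by a bounded driving-behavior factor."""
    # Assumed benchmark: 2 hard-braking events per 100 miles counts as average.
    benchmark = 2.0
    # Assumed sensitivity: each event above or below the benchmark moves the price by 5%.
    raw_adjustment = 0.05 * (hard_brakes_per_100_miles - benchmark)
    # Clamp so behavior data can never move the price beyond the stated caps.
    adjustment = max(-max_discount, min(max_surcharge, raw_adjustment))
    return base_premium * territory_factor * (1 + adjustment)


# Same city and base rate, different driving records (hypothetical values).
careful = adjusted_premium(1_000.0, territory_factor=1.3, hard_brakes_per_100_miles=0.5)
reckless = adjusted_premium(1_000.0, territory_factor=1.3, hard_brakes_per_100_miles=9.0)
print(round(careful, 2), round(reckless, 2))
```

Both drivers share the same territory factor, yet the careful driver pays less than the territory-rated price and the reckless driver pays more. In practice the behavior factor would come from a fitted model rather than a fixed rule, and any such pricing rule would be subject to the regulatory review described earlier.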
The same can be applied to health insurance: the policyholder uses an agreed-upon health app and receives discounts if they perform activities that lessen the risk of injury or disease. Naturally, the question of data privacy arises, as it should. Although we, in the U.S., haven’t yet adopted data protections as pervasive as the EU’s General Data Protection Regulation (GDPR), something similar that goes beyond HIPAA could move us toward decreasing the insane costs of health insurance while improving health outcomes for the insured.
Claims fraud in the U.S., health insurance notwithstanding, costs taxpayers $400 billion per year. Whether subsidized through the government or via policyholder payments, insurance fraud hurts everyone. With continued advancements in AI, which can weight and assimilate the most relevant data from far more data points than humans can, claims fraud can be detected sooner and mitigated more quickly.
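As a very rough illustration of automated claim screening, the sketch below flags claims whose amounts sit far from the typical claim, using a robust outlier score (the modified z-score based on the median and the median absolute deviation). This is a simple statistical stand-in, not the AI approach referred to above, which would draw on many more features than the claim amount. The sample amounts and the 3.5 threshold are assumptions for the example.

```python
from statistics import median


def flag_unusual_claims(amounts: list[float], threshold: float = 3.5) -> list[int]:
    """Return the indexes of claims whose amount is far from the median.

    Uses the modified z-score: 0.6745 * |amount - median| / MAD.
    """
    center = median(amounts)
    mad = median(abs(a - center) for a in amounts)
    if mad == 0:
        # At least half the amounts equal the median, so the score is undefined.
        return []
    return [
        i for i, a in enumerate(amounts)
        if 0.6745 * abs(a - center) / mad > threshold
    ]


# Hypothetical claim amounts in dollars; one is far larger than the rest.
claims = [1_200.0, 950.0, 1_100.0, 1_300.0, 1_050.0, 18_000.0, 990.0]
print(flag_unusual_claims(claims))
```

A flagged claim is only a candidate for human review. An unusually large amount is not evidence of fraud by itself.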
Becoming an Insurance Industry Data Scientist
To become a data scientist in the insurance industry, it’s important for you to understand actuarial science and the insurance regulatory complexities. This doesn’t mean that you need to be an actuary prior to entering the industry. So, unless you’re someone who loves studying and passing exams, you don’t need to follow the actuary exam path described above.
There is, however, a slow movement towards actuaries taking on more data-science-type activities. Eventually, the industry may require a similar learning path for both its actuaries and its data scientists. Thus, coursework in actuarial science, business, economics, and finance should be added to your data science learning queue. To take actuarial coursework, you’ll need to have completed a series of math prerequisites (calculus 1 through 3, linear algebra, differential equations; each university has its own requirements). Those of you who’ve already majored in math or have completed the math requirements may find that edX’s “Introduction to Actuarial Science” will give you enough exposure to get started in the industry.
2021 US Bureau of Labor Statistics salary and employment figures for actuaries and data scientists reflect national data, not school-specific information. Conditions in your area may vary. Data accessed January 2023.