Nine projects awarded funding through American Family Insurance partnership with UW-Madison Data Science Institute

MADISON—The American Family Insurance Data Science Institute (DSI) at UW-Madison announced nine campus research projects selected for funding through the fourth round of the American Family Funding Initiative.

The Funding Initiative is a partnership between DSI and American Family Insurance that provides UW-Madison faculty and staff with support for data science and artificial intelligence research relevant to the insurance industry. Launched in 2020, the initiative has provided 27 researchers and teams with nearly $4 million for their work and is expected to position UW-Madison faculty to launch and advance cutting-edge data science research.

Funds are awarded through a competitive application process administered by DSI, and American Family Insurance provides selected projects with funding and mentoring from its data scientists. Both organizations evaluate applications on their novelty, potential impact on data science and alignment with topics of interest to American Family.

American Family Insurance has committed $10 million over 10 years to this unique research partnership with UW-Madison. The Round 5 call for proposals will be announced in January 2023.

Projects awarded funding in Round 4 include:

Multi-Modal Analytics for Unbiased Estimation of Driving Behavior: Understanding driving behavior is central to efficient, safe transportation and associated insurance mechanisms. Suman Banerjee (Computer Sciences) seeks to create an unbiased system for evaluating driving behavior that will use multi-modal signals, especially from audio-visual sensors, to learn contextual information about why certain behaviors happen.

Quasi-Experimental Designs for Learning Systems: A growing number of systems, including hospitals and insurance companies, aim to derive knowledge from internal data to improve their day-to-day operations. Amy Cochran (Mathematics and Population Health Sciences), Gabriel Zayas-Caban (Industrial and Systems Engineering) and Brian Patterson (Berbee Walsh Department of Emergency Medicine) will develop a causal inference framework for estimating the effects of interventions on these systems and provide algorithms to guide how these systems use risk prediction models in practice, with the goal of improving services and reducing costs.
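
The announcement does not spell out the estimators involved, but a classic quasi-experimental design such a framework could build on is difference-in-differences. The sketch below, on entirely synthetic data with invented numbers, shows the kind of intervention-effect estimate this design produces:

```python
# Minimal difference-in-differences sketch (illustrative only; the project's
# actual causal-inference framework is not described in this announcement).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic operational metric (e.g., patient wait time) before and after an
# intervention, for units that did and did not receive it.
n = 500
treated = rng.integers(0, 2, n)            # 1 = unit adopted the intervention
post = rng.integers(0, 2, n)               # 1 = observation after rollout
true_effect = -4.0                         # intervention shortens wait times
wait = (30 + 5 * treated + 2 * post
        + true_effect * treated * post
        + rng.normal(0, 3, n))

def mean(mask):
    return wait[mask].mean()

# (treated post - treated pre) minus (control post - control pre)
did = ((mean((treated == 1) & (post == 1)) - mean((treated == 1) & (post == 0)))
       - (mean((treated == 0) & (post == 1)) - mean((treated == 0) & (post == 0))))

print(f"Estimated intervention effect: {did:.2f} (true: {true_effect})")
```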

Fairness Guarantees for Learners Without Explicit Access to Demographics: The most advanced sample complexity bounds in fair machine learning are multicalibration convergence bounds, which specify the number of samples required to achieve performance parity across many population demographics. Under the leadership of Kassem Fawaz (Electrical and Computer Engineering), this research will yield new insight into both algorithmic fairness and multicalibration error convergence bounds, and it will enable machine learning practitioners to easily understand the convergence behavior of multicalibration error across a wide range of classifier architectures.
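
As a rough illustration of the quantity such bounds control (not the project's own definitions), the sketch below measures the worst calibration gap across synthetic demographic subgroups and prediction bins:

```python
# Illustrative multicalibration-style check: for each subgroup and each
# prediction bin, compare the model's average predicted probability to the
# observed outcome rate. All data here are synthetic.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
group = rng.integers(0, 3, n)                          # synthetic demographic attribute
p_true = rng.uniform(0, 1, n)
y = rng.binomial(1, p_true)                            # binary outcomes
p_hat = np.clip(p_true + 0.05 * (group == 2), 0, 1)    # model miscalibrated on group 2

bins = np.linspace(0, 1, 11)
worst_gap = 0.0
for g in np.unique(group):
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (group == g) & (p_hat >= lo) & (p_hat < hi)
        if mask.sum() < 50:                            # skip tiny cells
            continue
        gap = abs(p_hat[mask].mean() - y[mask].mean())
        worst_gap = max(worst_gap, gap)

print(f"Worst subgroup/bin calibration gap: {worst_gap:.3f}")
```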

Counterfactual Evaluation of Sequential Decision Policies: One way to evaluate AI-based decision-making policies and autonomous systems before they are deployed is to take data from a previously used policy and answer the counterfactual question, “What would have happened if the new policy had been making decisions instead of the older policy?” Through this project, Josiah Hanna (Computer Sciences) will introduce novel methods for counterfactual policy evaluation in sequential decision-making, where even small changes in how decisions are made can lead to drastically different outcomes over time.
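
A standard building block for answering that counterfactual question is importance-sampling-based off-policy evaluation. The minimal sketch below illustrates the idea for a single decision step with made-up policies; it is not a description of Hanna's methods, and real sequential settings multiply such ratios over time steps.

```python
# Importance-sampling estimate of a new policy's value from data logged under
# an older "behavior" policy. One-step, two-action toy example.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
actions = np.array([0, 1])

behavior_probs = np.array([0.7, 0.3])      # policy that generated the logged data
new_probs = np.array([0.4, 0.6])           # policy we want to evaluate counterfactually

logged_actions = rng.choice(actions, size=n, p=behavior_probs)
rewards = np.where(logged_actions == 1,
                   rng.normal(1.0, 1.0, n),    # action 1 pays more on average
                   rng.normal(0.5, 1.0, n))

# Reweight logged rewards by how much more (or less) likely the new policy
# would have been to take each logged action.
weights = new_probs[logged_actions] / behavior_probs[logged_actions]
estimate = np.mean(weights * rewards)
true_value = 0.4 * 0.5 + 0.6 * 1.0

print(f"Counterfactual estimate: {estimate:.3f}  (true value: {true_value:.3f})")
```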

Auto-labeling Foundations: While crowdsourcing is a popular way to collect labeled training data for machine learning, hand-labeling each data point is expensive and time-consuming. Systems that automatically label data points while actively learning a model perform well in practice, but there is no theoretical understanding of what performance guarantees can be expected from these systems, or whether the potentially biased datasets they produce can be trusted. Ramya Korlakai-Vinayak (Electrical and Computer Engineering) and Fred Sala (Computer Sciences) aim to close this gap by developing theoretical foundations for characterizing the performance of auto-labeling systems.
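
For intuition only, the sketch below shows a bare-bones auto-labeling loop: train on a small hand-labeled seed set, auto-label only high-confidence points and route the rest to humans. The threshold and model are arbitrary choices, not the systems this project will analyze.

```python
# Confidence-thresholded auto-labeling on synthetic 2-D data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 2_000
X = rng.normal(size=(n, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

seed = 100                                    # hand-labeled seed set
model = LogisticRegression().fit(X[:seed], y[:seed])

probs = model.predict_proba(X[seed:])
confident = probs.max(axis=1) > 0.95          # auto-label only confident points
auto_labels = probs.argmax(axis=1)[confident]

accuracy = (auto_labels == y[seed:][confident]).mean()
print(f"Auto-labeled {confident.mean():.0%} of the pool with accuracy {accuracy:.3f}; "
      f"the rest would go to human annotators.")
```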

Contrastive Language-Image Learning for Out-of-distribution Detection: A major issue that prevents machine learning algorithms from being deployed to real-world problems is the safe handling of anomalous data that differs from the training distribution. Prior research on out-of-distribution (OOD) detection has been driven primarily by image recognition tasks. Under the leadership of Sharon Li (Computer Sciences), this project will pioneer new directions in contrastive image-language learning for OOD detection. Li will explore how language and vision can provide complementary sources of information to better estimate uncertainty in claim fraud detection and other risk scenarios.
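
As a toy illustration of the general idea (with random vectors standing in for the embeddings a real contrastive vision-language model would produce), the sketch below scores each input by its best match to known class labels and flags weak matches as OOD:

```python
# Score inputs by their best cosine similarity to "text" embeddings of known
# class labels; low scores are treated as out-of-distribution. Embeddings are
# random placeholders, not outputs of an actual vision-language model.
import numpy as np

rng = np.random.default_rng(4)
dim, n_classes = 64, 5

def unit(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

label_emb = unit(rng.normal(size=(n_classes, dim)))             # known class labels
in_dist = unit(label_emb[rng.integers(0, n_classes, 200)]
               + 0.3 * rng.normal(size=(200, dim)))             # near known classes
ood = unit(rng.normal(size=(200, dim)))                         # unrelated inputs

def score(x):
    return (x @ label_emb.T).max(axis=1)                        # best label match

threshold = np.quantile(score(in_dist), 0.05)                   # keep 95% of in-distribution
print(f"Flagged as OOD: {(score(ood) < threshold).mean():.0%} of anomalous inputs")
```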

Expanding Knowledge Graphs with Humans in the Loop: Knowledge graphs encode human expertise in a structured manner to enhance the performance of recommender, forecasting and other machine learning systems. As new concepts emerge in a domain, knowledge graphs need to be expanded to include them, but doing so manually is infeasible at scale. Emaad Manzoor and Jordan Tong (Wisconsin School of Business) propose automatic knowledge-graph expansion methods designed from the ground up to operate with humans in the loop.
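
One simple way such a pipeline could look (purely illustrative, with invented concept names and embeddings) is to rank candidate links for a new concept by similarity and queue only the most promising ones for human review:

```python
# Rank candidate knowledge-graph links for a new concept and send the top
# candidates to a human reviewer. Names and vectors are made up.
import numpy as np

rng = np.random.default_rng(5)
existing = ["auto policy", "collision coverage", "roadside assistance", "home policy"]
node_emb = rng.normal(size=(len(existing), 8))
new_concept_emb = node_emb[1] + 0.1 * rng.normal(size=8)   # resembles "collision coverage"

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

scores = [cosine(new_concept_emb, e) for e in node_emb]
ranked = sorted(zip(existing, scores), key=lambda t: -t[1])

print("Candidate links for human review:")
for name, s in ranked[:2]:
    print(f"  link new concept -> {name!r} (similarity {s:.2f})")
```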

Optimal Features for Heterogeneous Matrix Completion: Matrix completion, or filling in the unknown entries of a matrix, is one of the most fundamental problems in data science, yet existing models are incompatible with some types of data. Daniel Pimentel-Alarcon (Biostatistics and Medical Informatics), Jeff Linderoth and Jim Luedtke (Industrial and Systems Engineering) will develop a new model and algorithms specifically tailored to completing matrices with heterogeneous data, with important applications in recommender systems, computer vision systems that process and analyze images, data inference and outlier detection.
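
For readers unfamiliar with the problem, the sketch below completes a synthetic low-rank matrix with alternating least squares; it illustrates the standard homogeneous setting, not the heterogeneous models this project will develop:

```python
# Low-rank matrix completion via alternating least squares on synthetic data.
import numpy as np

rng = np.random.default_rng(6)
m, n, rank = 50, 40, 3
M = rng.normal(size=(m, rank)) @ rng.normal(size=(rank, n))    # true low-rank matrix
mask = rng.random((m, n)) < 0.3                                # 30% of entries observed

U = rng.normal(size=(m, rank))
V = rng.normal(size=(n, rank))
lam = 0.1 * np.eye(rank)                                       # ridge term for stability

for _ in range(30):
    for i in range(m):                                         # update each row factor
        obs = mask[i]
        U[i] = np.linalg.solve(V[obs].T @ V[obs] + lam, V[obs].T @ M[i, obs])
    for j in range(n):                                         # update each column factor
        obs = mask[:, j]
        V[j] = np.linalg.solve(U[obs].T @ U[obs] + lam, U[obs].T @ M[obs, j])

rmse = np.sqrt(np.mean((U @ V.T - M)[~mask] ** 2))
print(f"RMSE on unobserved entries: {rmse:.3f}")
```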

Doing More with Linear Transformers: Machine learning and computer vision methods that drive applications such as object detection, image recognition, language understanding and voice recognition rely on models known as “transformers” that can require weeks or months to train. Through this research, Vikas Singh (Biostatistics and Medical Informatics) will significantly extend the capabilities of current models through algorithmic and implementation improvements. Singh will focus on ultra-long temporal and spatio-temporal sequences drawn from a broad variety of applications and study the key challenges that must be overcome to enable efficient training and deployment of transformer models.
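
To convey why “linear” transformers matter for very long sequences, the sketch below contrasts standard softmax attention, which forms an n-by-n matrix, with a kernelized linear-attention approximation that never does. It is a generic illustration, not the methods under development in this project.

```python
# Standard (quadratic) attention vs. a kernelized linear-attention approximation.
import numpy as np

rng = np.random.default_rng(7)
n, d = 1_000, 32                                   # sequence length, feature dimension
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))

# Standard attention: materializes an n x n score matrix (quadratic in n).
scores = Q @ K.T / np.sqrt(d)
w = np.exp(scores - scores.max(axis=1, keepdims=True))
softmax_out = (w / w.sum(axis=1, keepdims=True)) @ V

# Linear attention: apply a feature map and reassociate the matrix products so
# only d x d quantities are ever formed (linear in n).
phi = lambda X: np.maximum(X, 0) + 1e-6            # simple positive feature map
num = phi(Q) @ (phi(K).T @ V)                      # (n,d) @ ((d,n) @ (n,d))
den = phi(Q) @ phi(K).sum(axis=0)                  # per-position normalization
linear_out = num / den[:, None]

print("softmax attention output:", softmax_out.shape,
      "| linear attention output:", linear_out.shape)
```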