American Family Funding Initiative Awards

Sixteen teams of UW-Madison faculty and collaborators have been awarded nearly two million dollars through the American Family Funding Initiative for data science projects addressing topics such as machine learning, user location privacy protection and student entrepreneurship.

Data science, computing and artificial intelligence are rapidly growing fields, motivated by the increasing availability of masses of unstructured data, scalable cloud computing and new modeling tools for advancing scientific discovery. American Family Insurance has partnered with UW through the American Family Insurance Data Science Institute (DSI) to offer mini grants of $75-150K per year for data science research at UW-Madison.

The goal of the American Family Funding Initiative is to stimulate and support highly innovative, groundbreaking research. Launched in spring 2020, this initiative is expected to position UW-Madison faculty to launch and further cutting-edge data science research, and be more competitive when applying for extramural research funding. A third round of funding will be announced in 2021.

Fall 2020 awards

Machine Learning Approaches for Metadata Standardization

Principal Investigator: Colin Dewey (colin.dewey@wisc.edu), Professor of Biostatistics and Medical Informatics.
Co-Principal Investigator: Mark Craven, Biostatistics and Medical Informatics

Researchers and businesses are increasingly using large data sets, compiled from many sources, for training machine learning systems and performing statistical analyses. A major bottleneck arises from the fact that compiled data sets often contain unstandardized, unstructured metadata that describe each record. Manual standardization of metadata is labor intensive and often requires substantial expertise in the field of study.

To mitigate this issue, this project will develop machine learning approaches for automating the task of metadata standardization in large, heterogeneous data sets. The researchers will use state-of-the-art natural language processing models and develop active learning algorithms, which facilitate identification of records that would most benefit from human expert input. They will demonstrate the performance of these methods on the Sequence Read Archive—a vast repository of public biological sequence data.

Adaptive Operations Research and Data Modeling for Insurance Applications

Principal Investigator: Michael Ferris (ferris@cs.wisc.edu), Professor of Computer Sciences.

Uncertainty abounds in decision problems, and optimization, drawing on the power of data science, is a key tool for mitigating its effects. This project will deploy a new approach that separates strategic decision making from operational modeling, in the context of a claim adjustment problem in the insurance industry. In this setting, random accidents occur across a large service area, requiring agents to deploy to the site to assess, document and determine appropriate courses of action. Our approach differentiates normal workload from crisis situations. It will inform an operational model that schedules resources over time, both to service routine workloads in a cost-effective manner and to enable the company to react efficiently to crisis situations. The model can be applied to problems as diverse as disaster recovery, chemical spill mitigation and electricity planning for extreme weather events.

A Deep Learning Approach to User Location Privacy Protection

Principal Investigator: Song Gao (song.gao@wisc.edu), Assistant Professor of Geography.
Co-Principal Investigator: Jerry Zhu, Computer Sciences.

User location information is a key component of both research and business intelligence. With the increasing availability of mobile devices and popularity of mobile apps, users on social network platforms actively share rich information about their locations, the places they go and the activities they engage in. Those location-based profiles provide an invaluable source of information. However, mobility data is among the most sensitive data being collected by mobile apps, and users increasingly raise privacy concerns. The proposed research aims to develop a deep learning architecture that will protect users’ location privacy while keeping the capability for location-based business recommendations. The algorithms developed through this research may be applied in usage-based insurance (UBI) and other location intelligence domains.

GAN-mixup: A New Approach to Improve Generalization in Machine Learning

Principal Investigator: Kangwook Lee (kangwook.lee@wisc.edu), Assistant Professor of Electrical and Computer Engineering.
Co-Principal Investigator: Dimitris Papailiopoulos, Electrical and Computer Engineering.

The recent successes of machine learning hinge on the ability of predictive models to generalize, or adapt well to previously unseen data. Data augmentation, the process of injecting artificial data points into a training set, is widely employed for improving generalization. One of the most prominent data augmentation algorithms is mixup, which helps achieve state-of-the-art generalization performance across several benchmark tasks.
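As a rough illustration of the idea, the sketch below shows the standard form of mixup, in which two labeled examples are blended with a random weight drawn from a Beta distribution. It is not the GAN-based variant this project proposes; the function names, the `alpha` default and the toy data are assumptions made for the example.

```python
import random


def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two labeled examples into one synthetic example (standard mixup)."""
    # Mixing weight in [0, 1]; small alpha keeps most blends close to one parent.
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


def mixup_batch(features, labels, alpha=0.2, seed=0):
    """Mix every example in a batch with a randomly chosen partner."""
    rng = random.Random(seed)
    partners = list(range(len(features)))
    rng.shuffle(partners)
    mixed = [
        mixup_pair(features[i], labels[i], features[j], labels[j], alpha, rng)
        for i, j in enumerate(partners)
    ]
    return [m[0] for m in mixed], [m[1] for m in mixed]


# Hypothetical toy batch: 2-D features with one-hot labels.
toy_x = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
toy_y = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
aug_x, aug_y = mixup_batch(toy_x, toy_y, seed=1)
```

Each synthetic label is a soft blend of the two original labels, so a classifier trained on the augmented batch is pushed to behave smoothly between training points.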

While mixup algorithms are useful for improving generalization for a wide class of tasks, they have a few critical limitations. Mixup sometimes degrades generalization, which restricts its applicability. Moreover, current mixup algorithms do not have any theoretical performance guarantees. To address these challenges, the researchers will develop a computationally efficient mixup algorithm based on a generative adversarial network (GAN). They will also develop a theoretical framework for analyzing the performance of various mixup algorithms. This research will provide a new approach to improve generalization, with provable performance guarantees.

Integer Programming for Mixture Matrix Completion

Principal Investigator: Jeff Linderoth (linderoth@wisc.edu), Professor of Industrial and Systems Engineering.
Co-Principal Investigators: Jim Luedtke, Industrial and Systems Engineering; Daniel Pimentel-Alarcon, Biostatistics and Medical Informatics.

Matrix completion, or filling in the unknown entries in a matrix, is one of the most fundamental problems in data science. Matrix completion is used in applications such as recommender systems that predict the rating a user would give to an item, such as a movie or product, and then make recommendations to the user. This project will develop algorithms for solving a mixture matrix completion problem (MMCP), which has important applications not only in recommender systems, but also in computer vision systems for processing and analyzing visual images, data inference, and outlier detection.

Key to this research will be the development and application of advanced algorithmic techniques from integer programming, a powerful mathematical tool for solving optimization problems involving discrete choices. The work will pave the way towards the application of integer programming for a broad class of large-scale data science problems.

Developing a State-of-the-Science Regional Weather Forecasting System

Principal Investigator: Michael Morgan (mcmorgan@wisc.edu), Professor of Atmospheric and Oceanic Sciences.
Co-Principal Investigator: Brett Hoover, Space Science and Engineering Center.

This project will develop an ensemble weather prediction system for American Family Insurance that will provide high-resolution weather forecasting run entirely in cloud computing infrastructure. This project will improve the accuracy of forecasting hazardous weather by producing many realizations of the same forecast from slightly varying initial conditions.

The probabilistic forecasts will provide advance warning of not only hazards including hail, wind gusts, and hurricane impacts in targeted regions, but also the uncertainty associated with the predictability of these hazards. This novel research will provide a state-of-the-science technique in regional weather modeling.

Model Recycling: Accelerating Machine Learning by Re-using Past Computations

Principal Investigator: Shivaram Venkataraman (shivaram@cs.wisc.edu), Assistant Professor of Computer Sciences.
Co-Principal Investigator: Dimitris Papailiopoulos, Electrical and Computer Engineering.

Data scientists train machine learning models that are used in a wide range of domains, from drug discovery to recommendation engines. Training a machine learning model, and fine-tuning the parameters that control how well a model performs, take significant time and resources. The process of incremental fine-tuning is often manual and involves retraining models from scratch. This project will automate and accelerate this process of fine-tuning by reusing and sharing past computations from prior training jobs, using a technique called model recycling. The researchers will develop a software framework that can help data scientists accelerate model fine-tuning, along with an intelligent predictor that can automatically save prior computation results based on their importance.

Question Asking with Differing Knowledge and Goals (continuation from Spring 2020)

Principal Investigator: Joe Austerweil (austerweil@wisc.edu), Assistant Professor of Psychology

People spend a significant proportion of their time asking each other questions to gather information. Entire professions, such as academia and customer service, are dedicated to asking and answering questions. Despite tremendous progress in machine learning, automated methods that answer a person’s questions are still inferior to answers from people.

Why are people better at answering questions? One reason is that question-askers leave out information that those answering the questions can fill in from their rich knowledge of language and the world. A recent machine learning method addresses this issue by asking multiple, reformulated versions of a human question, providing multiple answers, and learning to select the answer that is most likely to satisfy a person. However, this is done purely from data and does not incorporate psycholinguistic research demonstrating that people prefer simpler answers that are tailored to their personal goals and knowledge.

This project investigates whether incorporating psycholinguistic factors can improve automated question-answering methods. If so, then researchers can test novel, potential psycholinguistic factors and learn more about the underlying mechanisms that enable people to answer questions.

Lightweight Natural Language and Vision Algorithms for Data Analysis (continuation from Spring 2020)

Principal Investigator: Vikas Singh (vsingh@biostat.wisc.edu), Professor of Biostatistics and Medical Informatics

Collaborators: Zhanpeng Zeng (Computer Sciences), Shailesh Acharya and Glenn Fung (American Family Insurance)

Natural language processing is a form of artificial intelligence that helps computers read and understand human language. Efficient and accurate natural language processing models are central to various applications but have a significant computational footprint.
The overarching goal of this project is to accelerate the time it takes to train and test these models by developing alternative solutions that are based on much faster image processing primitives.

Spring 2020 awards

Question Asking with Differing Knowledge and Goals

Principal Investigator: Joe Austerweil (austerweil@wisc.edu), Assistant Professor of Psychology

People spend a significant proportion of their time asking each other questions to gather information. Entire professions, such as academia and customer service, are dedicated to asking and answering questions. Despite tremendous progress in machine learning, automated methods that answer a person’s questions are still inferior to answers from people.

Why are people better at answering questions? One reason is that question-askers leave out information that those answering the questions can fill in from their rich knowledge of language and the world. A recent machine learning method addresses this issue by asking multiple, reformulated versions of a human question, providing multiple answers, and learning to select the answer that is most likely to satisfy a person. However, this is done purely from data and does not incorporate psycholinguistic research demonstrating that people prefer simpler answers that are tailored to their personal goals and knowledge.

This project investigates whether incorporating psycholinguistic factors can improve automated question-answering methods. If so, then researchers can test novel, potential psycholinguistic factors and learn more about the underlying mechanisms that enable people to answer questions.

Using Data to Foster Entrepreneurship and Innovation in the Madison Ecosystem

Principal Investigator: Jon Eckhardt (jon.eckhardt@wisc.edu), Associate Professor of Business

Collaborators: Brent Goldfarb (U Maryland), Molly Carnes (WISELI)

Entrepreneurship is an important path for upward mobility and wealth creation. Student entrepreneurship matters, in part, because student startups are not necessarily modest endeavors. In 1979, recent UW-Madison graduate Judy Faulkner founded the electronic medical records company Epic, which today employs over 10,000 people. Research indicates that student entrepreneurship at UW-Madison is surprisingly prevalent.
Despite the impact of student entrepreneurship, little is known about what drives entrepreneurial intentions and activity amongst students, such as an interest in starting a company. Further, female students are less than half as likely as male students to self-report entrepreneurial intentions or actions.

The goal of this project is to support the work of the Academic Entrepreneurship Study Team at UW-Madison. This team is using data analysis techniques to enhance the impact and management of entrepreneurship programs at UW-Madison and other U.S. universities. Insights from this research will support the creation of evidence-based interventions to increase the prevalence and effectiveness of student entrepreneurship.

Machine Learning for Usage-Based Insurance

Principal Investigator: Robert Holz (reholz@ssec.wisc.edu), Senior Scientist, Space Science and Engineering Center
Co-PI: Willem Marais (Space Science and Engineering Center)
Collaborator: Rebecca Willett (University of Chicago)

Usage-based insurance (UBI) is a type of vehicle insurance where the costs depend on the user’s type of vehicle, distance travelled, speed and driving behavior. The goals of UBI are to enable insurers to promote safer driving behavior, reduce the frequency and magnitude of auto accidents, and help reduce costs to insurers and drivers.

Data collected for UBI primarily consist of GPS locations collected from smartphones. Additionally, ancillary datasets provide information on speed restrictions, lane information, points of interest and functional road classifications. Together, these data can be used to classify driving behaviors at different risk levels.

This project investigates machine learning methods that analyze very large UBI datasets in order to produce a measure of driver risk and safety. A key technical question of the investigation is how to accurately model UBI data that will allow for an effective and robust measure.

Optimizing Question and Answer Systems via User Feedback

Principal Investigator: Robert Nowak (nowak@engr.wisc.edu), Wisconsin Institute for Discovery and Professor of Electrical and Computer Engineering

Question-and-Answer (Q&A) systems are online software systems that aim to answer questions asked by users. Such systems are increasingly common throughout business, industry and healthcare. This project aims to develop new theory and methods for optimizing Q&A systems based on user feedback.

This project will begin with text embeddings that map words, sentences and whole documents into numerical representations that find similarities and connections in language. The research will draw on recent advances in the field of multi-armed bandit problems—a modeling approach that balances the choice of acquiring new knowledge with the competing choice of relying only on existing knowledge—to explore new approaches for Q&A systems. The research team will develop scalable algorithms for these systems with attention to search optimization and computation time, as human users of Q&A systems will not tolerate large delays in receiving answers to questions.
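The trade-off between acquiring new knowledge and relying on existing knowledge can be illustrated with an epsilon-greedy bandit, one of the simplest strategies in this family. The sketch below is a generic textbook example and not the project's method; the `reward_probs` values are hypothetical stand-ins for how often users mark each candidate answer as helpful.

```python
import random


def epsilon_greedy(reward_probs, rounds=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit over arms with Bernoulli rewards."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)    # times each arm was chosen
    values = [0.0] * len(reward_probs)  # running mean reward per arm
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: try a random arm to gather new knowledge.
            arm = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(len(values)), key=values.__getitem__)
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values


# Hypothetical setup: two candidate answers, the second is helpful far more often.
counts, values = epsilon_greedy([0.1, 0.9])
```

In this toy setup the second arm ends up chosen far more often, while the small exploration rate keeps the estimate for the first arm from going stale.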

Improving Traffic Safety Outcomes Through Data Science

Principal Investigator: David Noyce (danoyce@wisc.edu), Professor and Associate Dean, College of Engineering

While advances during the last 40 years in vehicle design, traffic engineering and driver behavior have led to significant improvements in transportation safety, recent trends have shown a leveling—and in some cases an increase—in the number of traffic crash fatalities. Emerging data provide new opportunities for incentives and technologies that move the trend towards zero fatalities once again. However, there are vital research questions about which technologies hold the most promise and how these different solutions work together to help drivers make informed, safe decisions.

The vision for this research is to translate advances in automotive technology and data science into tools that will improve driver safety and bolster the safety performance of emerging technologies, such as advanced driver assistance systems and automated vehicles. The researchers will conduct collaborative data science research, including machine learning and other approaches, to develop algorithms focused on incentivizing positive driver behavior. Researchers will also quantify the safety performance of emerging technologies, filling information gaps for automated vehicle developers, insurance companies, policy makers and the public.

Learning Causal Relationships from Data

Principal Investigator: Irene Ong (irene.ong@wisc.edu), Assistant Professor of Obstetrics and Gynecology and Biostatistics and Medical Informatics, School of Medicine and Public Health

Co-PI: Aubrey Barnard (Biostatistics and Medical Informatics)

Humans naturally develop an understanding of cause and effect by exploring the world. But causality is not nearly so easy for machines to learn. As a result, causal understanding is often missing from artificially intelligent systems, as you may have noticed when your digital assistant goes awry. To help improve the causal reasoning abilities of such systems, this research project develops an algorithm for learning causal relationships from data, one that is more efficient, accurate and robust than similar algorithms. These characteristics make causal learning more usable and likely to be incorporated into systems like your digital assistant in the future.

For the time being, the causal learning algorithm will be applied to discovering the environmental factors that prevent or cause asthma, and to identifying relationships in electronic health data that will help prevent severe drug reactions and improve patient care by tailoring it to each individual patient.

3D Capture and Scanning Technology for Insurance Documentation

Principal Investigator: Kevin Ponto (kbponto@wisc.edu), Associate Professor, School of Human Ecology

Insurance claims adjusters constantly face the challenge of inspecting and assessing a scene to understand potential risk, or what took place after an event. They typically do this using tools such as digital photography. Recent advances in 3D capture technologies have created new ways to digitize the world around us. The overall goal of this project is to design and implement a system that utilizes 3D scanning and capture technology for automated documentation of scenes. This has the potential to reduce disputes between insurance companies and their clients, saving money and time for both parties.

As the utilization of 3D capture technology in this area is quite novel, and upcoming technological changes may create new directions of inquiry, the project will focus on research and design of an automated inventory system. This work will provide foundational knowledge for how 3D capture technologies may benefit the insurance industry.

Lightweight Natural Language and Vision Algorithms for Data Analysis

Principal Investigator: Vikas Singh (vsingh@biostat.wisc.edu), Professor of Biostatistics and Medical Informatics

Collaborators: Zhanpeng Zeng (Computer Sciences), Shailesh Acharya and Glenn Fung (American Family Insurance)

Natural language processing is a form of artificial intelligence that helps computers read and understand human language. Efficient and accurate natural language processing models are central to various applications but have a significant computational footprint.
The overarching goal of this project is to accelerate the time it takes to train and test these models by developing alternative solutions that are based on much faster image processing primitives.

Ultra-Fast Training for the Third Wave of Artificial Intelligence: Novel Categories in Text Classification

Principal Investigator: Jerry Zhu (jerryzhu@cs.wisc.edu), Professor of Computer Sciences
Co-PI: Ara Vartanian (Computer Sciences)

The first wave of artificial intelligence (AI) emerged in the 1980s as expert systems that apply rules to deduce new facts. In the 2000s, the second wave of AI emerged as statistical machine learning, including deep learning. Second-wave AI networks are trained on enormous data sets labeled to recognize patterns. The yet-to-come third wave of AI is predicted to combine and supersede the first two waves. Third-wave AI systems will require far fewer data items for training, and will apply rules in ways that are more similar to human cognition.

This project aims to take a step toward the third wave of AI by allowing data scientists to train a classifier (the “brain of AI”) using intuitive data transformation rules. This contrasts with second-wave AI, where the data scientists must label training data. We expect that providing rules instead of labels will achieve faster, better training. This project will focus on text classification used by businesses, with the aim of producing more agile text classifiers with fewer human resources.