Audio and video analysis reveals social and political behavior

Using high-performance computing (HPC) tools to characterize C-SPAN clips and other sources, Bryce Dietrich probes the predictive power of motion, vocal pitch, and more.


Bryce Dietrich thinks audio and video files hold untapped information about the workings of the courts, Congress, and government agencies.


He’s found, for example, that analyzing vocal pitch during U.S. Supreme Court arguments can predict how justices will rule, and that tracking motion in C-SPAN videos can expose degrees of partisan polarization in the U.S. House of Representatives.


“Nonverbal elements tend to be understudied in political science,” Dietrich says. “Audio analysis and other methods can reveal a lot of information about emotional states and expressions.”


These methods also demand substantial computing resources, which is one reason Dietrich decided to bring his research program to the University of Iowa in 2016.


Since then, he’s been using the university’s high-performance computing clusters to download files, run audio and video analysis software, and store data.


Dietrich, today an assistant professor of sociology and political science, first delved into audio and video analysis as a doctoral student at the University of Illinois. He knew immediately that the work would require plenty of processing power and data storage.


As a visiting professor at the University of Missouri, he turned to Amazon Web Services. “The workflow was kind of cumbersome, but I thought it was amazing,” he recalls. The University of Iowa’s Neon cluster, however, marked a giant leap forward for his work.


Sai Ramadugu, computational science and training facilitator with ITS Research Services, scripted a sample job to show what the cluster could do. “Now I use it exclusively for all kinds of tasks,” Dietrich says.


Video analysis, in particular, would be virtually impossible without high-performance computing. Consider Dietrich’s study of what C-SPAN “ant-farm” shots (wide views that show elected officials scurrying about after floor votes) reveal about partisanship.


“Video motion indicates social interaction,” Dietrich explains. “We wanted to see how often members of Congress literally cross the aisle.”


His team evaluated more than 6,500 C-SPAN videos covering nearly 20 years. They used HPC resources to characterize ant-farm shots, automatically isolate the relevant frames from hundreds of hours of video, and measure aggregate motion by assessing the differences between consecutive frames.
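To make the frame-differencing idea concrete, here is a minimal sketch, not the team’s actual pipeline. It assumes a clip has already been decoded into equally sized grayscale frames (rows of pixel intensities) and scores motion as the mean absolute per-pixel change between consecutive frames, averaged over the clip. The article does not specify the exact metric Dietrich’s team used; the function names frame_difference and aggregate_motion are hypothetical.

```python
from typing import List, Sequence

# Hypothetical representation: a grayscale frame as rows of pixel intensities.
Frame = Sequence[Sequence[float]]


def frame_difference(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("frames must have the same dimensions")
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def aggregate_motion(frames: List[Frame]) -> float:
    """Average frame-to-frame difference over a clip; higher means more motion."""
    if len(frames) < 2:
        return 0.0  # a single frame (or none) shows no motion
    diffs = [frame_difference(f0, f1) for f0, f1 in zip(frames, frames[1:])]
    return sum(diffs) / len(diffs)


# Usage: a static shot scores 0.0, while a flickering one scores higher.
still = [[[10, 10], [10, 10]]] * 3
moving = [[[0, 0], [0, 0]], [[4, 4], [4, 4]], [[0, 0], [0, 0]]]
print(aggregate_motion(still))   # 0.0
print(aggregate_motion(moving))  # 4.0
```

Under this assumed metric, a quieter ant-farm shot, with fewer members moving around the chamber, would produce a lower aggregate score.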


The team found that video clips with less motion tend to follow especially partisan votes. Less post-vote motion also seems to predict greater division on subsequent votes.


Projects like this prompted the Shorenstein Center on Media, Politics and Public Policy at Harvard University’s Kennedy School and Harvard’s Institute for Quantitative Social Science to name Dietrich a fellow. He credits the College of Liberal Arts and Sciences, and especially Executive Associate Dean Raul Curto, for making the appointment possible.


In other work, Dietrich and colleagues are developing a project to characterize tense police-citizen encounters using body-camera data. Identifying vocal patterns and other signs that a situation is about to spin out of control could inform early-intervention systems.


He’s also teaming up with the Internet Archive on tools that use audio fingerprinting to search cable news databases for specific speeches, helping ordinary archive users interact with vast quantities of data.


High-performance computing and machine-learning techniques remain relatively uncommon in social-science research. For Dietrich, they bring together two driving interests.


“I’m interested in the dynamics of powerful people interacting with each other,” Dietrich says, “and I like figuring out how to use data in new ways.”