Mega metadata on political ads: how we collect it, what you can do with it
May 12, 2016
The Political TV Ad Archive is rich in metadata about political ads. After tracking more than 20 markets across 11 key primary states, we now have more than 171,400 rows of data about airings and have archived more than 1,295 ads for posterity. Already, journalists have plumbed these riches to create a video game, charts, visualizations, mashups, and interactive apps. (For a full list, see our press page.) But there are many more creative things that could be done.
Here’s a quick guide to our metadata: how we collect it, what it means, and a few ideas about how to take it further. A detailed data dictionary can be found on our data page; this post gives the big picture.
Where, when, and what. We collect TV through the TV News Archive and use the Duplitron 5000, built by Dan Schultz, to count political ads; Schultz goes into detail here about how we do it. The results are poured into data fields giving details on where ads aired, when, and on what TV programs and stations. These data can be downloaded by pressing the blue button on the data page.
So, for example, in the snippet of data below, pulled from this ad sponsored by Bernie Sanders, “network” refers to the call letters of the TV station. “Location” details the market where the ad aired, which corresponds to Nielsen “designated market areas.” “Program” is the name of the TV program; “program_type” is whether the Internet Archive classifies this program as “news” or “not news.” The “start_time” and “end_time” refer to the specific date and time when this particular airing of the ad began and ended. These are rendered in UTC, or “coordinated universal time.” “Archive_id” is a unique id for an ad, which is why all the values in the illustration below are identical.
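Because the times are in UTC, an evening airing on the East Coast can carry the next day's date. Here is a minimal Python sketch of shifting a timestamp to local time. The timestamp layout in FORMAT, the sample value, and the helper name utc_to_local are our assumptions for illustration, so check them against the actual download. A fixed offset is used for simplicity and does not account for daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Assumed timestamp layout; adjust FORMAT to match the actual download.
FORMAT = "%Y-%m-%d %H:%M:%S"


def utc_to_local(stamp: str, utc_offset_hours: int) -> datetime:
    """Parse a UTC timestamp string and shift it to a fixed-offset local zone."""
    stamp = stamp.replace(" UTC", "").strip()  # tolerate a trailing "UTC" label
    utc_time = datetime.strptime(stamp, FORMAT).replace(tzinfo=timezone.utc)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return utc_time.astimezone(local_zone)


# Hypothetical airing, shifted to US Eastern standard time (UTC-5).
local = utc_to_local("2016-02-02 01:30:00 UTC", -5)
print(local.strftime("%Y-%m-%d %H:%M"))
```

In this made-up case, an airing stamped 01:30 UTC on Feb. 2 actually ran at 8:30 p.m. on Feb. 1 local time, which matters when counting airings per day.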
Each row equals one airing of this one particular ad. Researchers can use a program such as Excel to produce counts of how many times an ad or group of ads has aired in a particular location, or on a certain channel, or whether the ad has appeared more on news shows or shows that are not news, and so on. For example, we’ve seen journalists visualize which TV ads are favored by which candidates. Others have shown how often ads appear to viewers over a particular period of time.
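The same counts can be produced with a few lines of code instead of a spreadsheet. The Python sketch below is one way to do it, not the only one. It assumes the download is a CSV whose header uses the lowercase field names described above; the sample rows, station call letters, and market names are made up for illustration.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for the downloaded airings file; real values will differ.
SAMPLE = """network,location,program,program_type,archive_id
WXXX,Market A,Local News at Six,news,ad_001
WXXX,Market A,Game Show Hour,not news,ad_001
KYYY,Market B,Morning News,news,ad_001
KYYY,Market B,Evening News,news,ad_002
"""


def count_airings(rows, field):
    """Count rows (one row = one airing) grouped by the given column."""
    return Counter(row[field] for row in rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Keep only the airings of a single ad, identified by its archive_id.
one_ad = [row for row in rows if row["archive_id"] == "ad_001"]

by_location = count_airings(one_ad, "location")
by_type = count_airings(one_ad, "program_type")
print(by_location.most_common())
print(by_type.most_common())
```

To run this on real data, replace io.StringIO(SAMPLE) with an open file handle for the downloaded CSV.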
Who is behind the ad? From our partners at the Center for Responsive Politics (CRP), we gather information on who is behind the ads. Internet Archive researchers review each ad as we identify it, and record the name of the sponsor as it appears in that ad, such as “Donald J. Trump For President.”
That sponsor name is linked via an API to CRP’s data on what type of entity that sponsor is. Is it a super PAC, which can take unlimited contributions from donors, such as Priorities USA Action? Is it a nonprofit 501(c) group, such as American Freedom Fund, often referred to as a “dark money” group because it doesn’t report its donors publicly?
In the snippet of data below, downloaded from this Trump ad, we’re looking at Trump’s candidate committee–his official committee registered with the Federal Election Commission (FEC), which as such must disclose donors who give more than $200 and is limited in how much it is permitted to accept from each donor.
From CRP we get the “sponsor_affiliation,” if there is one–the candidate whom the sponsor supports or opposes. If there is a sponsor affiliation, CRP also indicates whether the sponsor supports or opposes that candidate. “Race” provides information about which federal race the ad is part of–the presidential race, or a U.S. House or Senate race. For House and Senate races, the “race” field includes the district as well. “Cycle” refers to the election cycle. In this case, we are in the 2016 election cycle, which covers 2015-2016.
Using these data, researchers can analyze which campaigns are benefiting the most from dark money advertising, or where a particular super PAC has aired the most ads in markets we track.
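As a rough Python sketch of both questions: the snippet below tallies airings by affiliated candidate for one type of sponsor, and by market for one sponsor. Only “sponsor_affiliation” and “location” are field names described in this post. The “sponsor” and “sponsor_type” column names, the “501c” and “super PAC” labels, and all the rows are assumptions made up for illustration, so check the data dictionary for the real names and values.

```python
from collections import Counter

# Hypothetical airing rows. "sponsor" and "sponsor_type" are assumed column
# names, and every value here is invented for the example.
airings = [
    {"sponsor": "Group One", "sponsor_type": "501c",
     "sponsor_affiliation": "Candidate A", "location": "Market A"},
    {"sponsor": "Group One", "sponsor_type": "501c",
     "sponsor_affiliation": "Candidate A", "location": "Market B"},
    {"sponsor": "Group Two", "sponsor_type": "501c",
     "sponsor_affiliation": "Candidate B", "location": "Market A"},
    {"sponsor": "PAC Three", "sponsor_type": "super PAC",
     "sponsor_affiliation": "Candidate B", "location": "Market B"},
    {"sponsor": "PAC Three", "sponsor_type": "super PAC",
     "sponsor_affiliation": "Candidate B", "location": "Market B"},
]


def airings_by_affiliation(rows, sponsor_type):
    """Count airings per affiliated candidate for one type of sponsor."""
    return Counter(
        row["sponsor_affiliation"]
        for row in rows
        if row["sponsor_type"] == sponsor_type and row["sponsor_affiliation"]
    )


def markets_for_sponsor(rows, sponsor):
    """Count airings per market for a single sponsor."""
    return Counter(row["location"] for row in rows if row["sponsor"] == sponsor)


dark_money = airings_by_affiliation(airings, "501c")
pac_markets = markets_for_sponsor(airings, "PAC Three")
print(dark_money.most_common())
print(pac_markets.most_common(1))
```

Note that an affiliation alone does not say whether the sponsor supports or opposes the candidate, so a fuller analysis would also split these counts by that support-or-oppose indicator.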
What’s the ad about? Internet Archive researchers record additional information that can help researchers figure out what an ad is about. For example, consider the data snippet below about this ad sponsored by Right to Rise USA, the super PAC that supported Jeb Bush’s candidacy.
An Internet Archive researcher watched the ad and entered the names of the candidates mentioned in it. In this case, it was the two other GOP governors that Bush was competing against for the presidency: Chris Christie of New Jersey and John Kasich of Ohio. The researcher also selected subjects covered by the ad, pulling from this index from one of our fact checking partners, PolitiFact. The researcher entered the “message” of the ad–pro, con, or mixed. In this case, the researcher selected “mixed,” since the ad contrasted the records of Christie and Kasich with Bush’s record.
Finally, the researcher notes if this is a “campaign” ad, which focuses on candidates, or an “issue” ad, which focuses on a “national legislative issue of public importance.” Federal Communications Commission (FCC) rules require that TV stations disclose ad buy contracts for both types of ads; therefore the Political TV Ad Archive includes such ads in this collection. Example: this ad on Puerto Rico debt.
There’s one more source for information on what political ads are about. Our data page also has the following download, which contains a list of all ads archived for the project.
This provides information about all the ads in our archive, including those that we have not found on TV. This might be because the ad is airing in a market we’re not tracking, or because it’s an ad that appears exclusively on social media. As can be seen in the snippet of data below, for most ads, we have transcripts.
At the time of this posting, we don’t know of any journalists or researchers who have done visualizations based on the transcripts, or on the subjects of the ads–it’s an area crying out to be explored.
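As one possible starting point, here is a minimal Python sketch that counts the most common words across a set of transcripts. The transcripts below are invented, the stopword list is a tiny stand-in, and the helper name top_words is ours; in practice the text would be read from the transcript column of the ads download, whose exact name should be checked against the data dictionary.

```python
import re
from collections import Counter

# A very small stopword list for illustration; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "i",
             "this", "on", "our", "we", "that", "it"}


def top_words(transcripts, n=5):
    """Return the n most frequent non-stopword words across all transcripts."""
    counts = Counter()
    for text in transcripts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in STOPWORDS)
    return counts.most_common(n)


# Invented transcripts standing in for the real transcript text.
transcripts = [
    "I approve this message because jobs matter.",
    "Jobs and taxes: the record on jobs is clear.",
]
print(top_words(transcripts, 1))
```

The resulting word counts could feed a word cloud, a bar chart, or a comparison between sponsors.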
We also have a field called “reference.” If this number is greater than 0, that means we have a fact- or source-check about that ad from one of our journalism partners. You can see these checks on the page for the ad’s video on our website. For example, the first ad listed above is here; if you scroll down the page, there are four fact- and source-checks from our partners on this one ad, which criticized Kasich on his record as governor of Ohio. See screenshot below.
We hope researchers will dig into this metadata and invent new ways to understand the barrage of political ads that are hitting voters this election season. Contact us with your questions and creations @PoliticalAdArchive or email@example.com.