We scraped Fox News transcripts from here. In all, the original scrape yielded about 24k transcripts.
We scraped the data again in 2025; the breakdown by year is as follows:
| year | count |
|---|---|
| 2003 | 450 |
| 2004 | 365 |
| 2005 | 431 |
| 2006 | 411 |
| 2007 | 304 |
| 2008 | 418 |
| 2009 | 425 |
| 2010 | 314 |
| 2011 | 523 |
| 2012 | 1019 |
| 2013 | 777 |
| 2014 | 866 |
| 2015 | 890 |
| 2016 | 821 |
| 2017 | 1259 |
| 2018 | 1752 |
| 2019 | 5865 |
| 2020 | 5995 |
| 2021 | 5400 |
| 2022 | 6782 |
| 2023 | 9585 |
| 2024 | 8256 |
| 2025 | 1474 |
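
As a quick sanity check on the table above, the per-year counts can be totaled with a few lines of Python. This is only a sketch: the dictionary is hand-copied from the table, and the names `COUNTS_BY_YEAR` and `summarize` are illustrative, not part of the dataset or its scraping scripts.

```python
# Per-year transcript counts, copied by hand from the table above.
COUNTS_BY_YEAR = {
    2003: 450, 2004: 365, 2005: 431, 2006: 411, 2007: 304,
    2008: 418, 2009: 425, 2010: 314, 2011: 523, 2012: 1019,
    2013: 777, 2014: 866, 2015: 890, 2016: 821, 2017: 1259,
    2018: 1752, 2019: 5865, 2020: 5995, 2021: 5400, 2022: 6782,
    2023: 9585, 2024: 8256, 2025: 1474,
}


def summarize(counts):
    """Return (total transcripts, year with the most transcripts)."""
    total = sum(counts.values())
    peak_year = max(counts, key=counts.get)
    return total, peak_year


total, peak_year = summarize(COUNTS_BY_YEAR)
print(f"total transcripts: {total}")
print(f"peak year: {peak_year} ({COUNTS_BY_YEAR[peak_year]} transcripts)")
```

The counts in the table sum to 54,382 transcripts, with 2023 the largest single year.
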

The final dataset, including the HTML files, is posted on Harvard Dataverse.

Related datasets:

- notnews/msnbc_transcripts - MSNBC Transcripts: 2003--2022
- notnews/cnn_transcripts - CNN Transcripts 2000--2025
- notnews/stanford_tv_news - Stanford Cable TV News Dataset
- notnews/nbc_transcripts - NBC transcripts 2011--2014
- notnews/archive_news_cc - Closed Caption Transcripts of News Videos from archive.org 2014--2023