This repository contains the TikTok video IDs of content related to the 2024 Presidential Election. The data was collected with TikTok's Research API. To comply with TikTok's terms of service, we are publicly releasing only the IDs of the collected videos. We will continue updating this repository with more data, so stay tuned for updates!
- `data/`: Contains a CSV file of video IDs collected.
- `scripts/`: Contains the Python code used for data collection.
- `supplementary_files/`: Lists the keywords and hashtags applied in each phase of data collection.
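As a minimal sketch of how the ID file in `data/` might be read, the snippet below loads one column of a CSV and removes duplicates. The column position (`id_column=0`) and the presence of a header row are assumptions, not something this README specifies, so check the actual file before relying on them.

```python
import csv
from pathlib import Path


def load_video_ids(csv_path, id_column=0, has_header=True):
    """Return the unique video IDs from one column of a CSV file, in file order.

    id_column and has_header are assumptions about the file layout.
    """
    seen = set()
    ordered_ids = []
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row
        for row in reader:
            if len(row) <= id_column:
                continue  # skip blank or short rows
            video_id = row[id_column].strip()
            # Keep IDs as strings: they are large integers, and a float
            # conversion (for example in a spreadsheet) would corrupt them.
            if video_id and video_id not in seen:
                seen.add(video_id)
                ordered_ids.append(video_id)
    return ordered_ids
```

For example, `ids = load_video_ids("data/TikTok_IDs_v3.csv")` followed by `len(ids)` gives the number of distinct IDs in that file.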
- 6/29/2024
  - Initial release of collected video IDs.
  - Published keywords/hashtags used in query and scripts for data collection (with future modifications planned).
- 7/20/2024
  - Updated metadata collection script (`metadata_collection.py`).
- 11/1/2024
  - Added additional metadata in `/data/TikTok_IDs_v2.csv` (covering publication dates 11/1/2023 - 10/24/2024).
  - Filled gaps in `TikTok_IDs_Memo1.csv`.
  - Total IDs: 3,176,949
- 02/13/2025
  - Added additional metadata in `/data/TikTok_IDs_v3.csv` (covering publication dates 11/1/2023 - 01/20/2025).
  - Total IDs: 4,069,908
  - The IDs are only posted on GitHub, but all the transcripts generated by OpenAI's Whisper are in this Google Drive folder: https://drive.google.com/drive/folders/1DsFYkBjIw4Ve0ngKYT-OF96LsMlxLgyW?usp=sharing.
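If you have already processed an earlier release, you may only need the IDs that a newer file adds. The sketch below compares two ID files under the assumption that IDs sit in the first column after a header row. Both function names are hypothetical and are not part of this repository's scripts.

```python
import csv


def iter_ids(csv_path, id_column=0):
    """Yield the ID in each data row of a CSV file (header row skipped)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # assumption: the first row is a header
        for row in reader:
            if len(row) > id_column and row[id_column].strip():
                yield row[id_column].strip()


def ids_added(old_csv, new_csv, id_column=0):
    """Return IDs present in new_csv but not in old_csv, in new_csv order."""
    old_ids = set(iter_ids(old_csv, id_column))
    added, seen = [], set()
    for video_id in iter_ids(new_csv, id_column):
        if video_id not in old_ids and video_id not in seen:
            seen.add(video_id)
            added.append(video_id)
    return added
```

For example, `ids_added("data/TikTok_IDs_v2.csv", "data/TikTok_IDs_v3.csv")` would list the IDs that appear only in the later file.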
To fully utilize the resources in this repository, please ensure you have access to:
- TikTok Research API: For reposted videos and metadata collection via `scripts/metadata_collection.py`.
- Douyin API: For collecting comments.
- TikTok Unofficial API: For collecting user information.
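To give a sense of what a Research API video query involves, here is a minimal sketch that builds request bodies for a hashtag search. It is not the code in `scripts/metadata_collection.py`. The hashtag value is a placeholder, and the 30-day window size reflects our reading of the Research API documentation, so treat it as an assumption and verify it against the current documentation.

```python
from datetime import date, timedelta

# Video query endpoint of the TikTok Research API.
VIDEO_QUERY_URL = "https://open.tiktokapis.com/v2/research/video/query/"


def date_windows(start, end, max_days=30):
    """Split [start, end] into consecutive windows of at most max_days days."""
    windows = []
    cursor = start
    while cursor <= end:
        window_end = min(cursor + timedelta(days=max_days - 1), end)
        windows.append((cursor, window_end))
        cursor = window_end + timedelta(days=1)
    return windows


def build_query(hashtags, start, end, max_count=100):
    """Build the JSON body for one video query over a single date window."""
    return {
        "query": {
            "and": [
                {
                    "operation": "IN",
                    "field_name": "hashtag_name",
                    "field_values": list(hashtags),
                }
            ]
        },
        # The API expects dates formatted as YYYYMMDD.
        "start_date": start.strftime("%Y%m%d"),
        "end_date": end.strftime("%Y%m%d"),
        "max_count": max_count,
    }


# Hypothetical usage: one request body per window.
bodies = [
    build_query(["examplehashtag"], s, e)  # placeholder hashtag
    for s, e in date_windows(date(2023, 11, 1), date(2023, 12, 31))
]
```

Each body would then be sent as a POST request to `VIDEO_QUERY_URL` with a bearer token in the `Authorization` header and a `fields` query parameter naming the metadata to return. Authentication and pagination are omitted here.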
For any inquiries, please reach out to Gabriela Pinto at gpinto@usc.edu.
For an in-depth analysis of the dataset, check out our memo on arXiv: https://arxiv.org/abs/2407.01471. An updated version will be available soon, so stay tuned!