Disclaimer (January 7th, 2021)
At the beginning of December 2020, TikTok changed its API structure
and its security measures to control the traffic of metadata. As a
result, requests made with tiktokr are blocked very often, if not
systematically (error when parsing the JSON data structure).
After trying minor patches, we concluded that tiktokr needs to be completely rewritten to fit the new infrastructure of TikTok. Because none of the authors currently has the time to rewrite the package, we are putting it on hold for now and apologize for the resulting inconvenience. If you are interested in taking over the challenge, we are glad to share the knowledge that we have accumulated during the development of tiktokr.
The goal of tiktokr is to provide a scraper for the video-sharing
social networking service TikTok.
While writing this library, we were broadly inspired by the Python
module davidteather/TikTok-Api.
You will need Python 3.6 or Docker to use tiktokr. If you want to use
Docker, check out the guide for that here.
Many thanks go to Vivien Fabry for creating the hexagon logo.
Overview
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("benjaminguinaudeau/tiktokr")
Load library
library(tiktokr)
Make sure to use your preferred Python installation
library(reticulate)
use_python(py_config()$python)
Install necessary Python libraries
tk_install()
In November 2020, TikTok tightened its security protocol. It now frequently shows a captcha, which is easily triggered after a few requests. This can be solved by specifying the cookie parameter. To get a cookie session:
- Open a browser and go to “http://tiktok.com”
- Scroll down a bit to make sure that you don’t get any captcha
- Open the JavaScript console (in Chrome: View > Developer > JavaScript Console)
- Run document.cookie in the console. Copy the entire output (your cookie).
- Run tk_auth() in R and paste the cookie.
Click on image below for screen recording of how to get your TikTok cookie:
The tk_auth function saves the cookie (and user agent) as environment
variables in your .Renviron file. You only need to run it once to use
tiktokr, or again whenever you want to update your cookie or user agent.
tk_auth(cookie = "<paste here the output from document.cookie>")
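Before pasting the copied string into tk_auth(), it can help to check that it really looks like a cookie string. The following is a minimal sketch in base R; parse_cookie() is a hypothetical helper that is not part of tiktokr, and the cookie names in the example are made-up placeholders, not the names TikTok actually uses.

```r
# Hypothetical helper (not part of tiktokr): split the raw output of
# document.cookie into a named character vector of cookie values.
parse_cookie <- function(cookie) {
  stopifnot(is.character(cookie), length(cookie) == 1)
  # Individual cookies are separated by ";" and have the form name=value
  parts <- trimws(strsplit(cookie, ";", fixed = TRUE)[[1]])
  # Position of the first "=" in each piece; values may contain "=" themselves
  eq_pos <- as.integer(regexpr("=", parts, fixed = TRUE))
  # Drop pieces that have no "=" or no name in front of it
  keep <- eq_pos > 1
  parts <- parts[keep]
  eq_pos <- eq_pos[keep]
  values <- substring(parts, eq_pos + 1)
  names(values) <- substring(parts, 1, eq_pos - 1)
  values
}

# Example with a placeholder string (not a real TikTok cookie)
raw <- "first_cookie=123; second_cookie=abc=def"
names(parse_cookie(raw))
```

If the result is empty, the copied text was probably not the full output of document.cookie.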
TikTok requires API queries to be identified with a unique hash. To get
this hash, tiktokr runs a puppeteer-chrome session in the background.
Apparently, puppeteer sometimes causes issues on some operating
systems, so we also created a Docker image that can be run on any
computer with Docker installed. Note: if you run tiktokr with Docker,
you won’t need a Python installation.
To find out if you are experiencing puppeteer problems, run:
library(tiktokr)
# Make sure Docker mode is switched off so that puppeteer is used
Sys.setenv("TIKTOK_DOCKER" = "")
tk_auth(cookie = "<your_cookie_here>")
tk_init()
# Request a test signature through puppeteer
out <- get_signature("test")
# A valid signature is longer than 16 characters
if(stringr::str_length(out) > 16){
  message("Puppeteer works well on your computer")
} else {
  message("Puppeteer does not work, please consider using Docker")
}
If you experience problems try to install Docker as outlined in the steps below.
If you have either a Mac, Linux (for example Ubuntu) or Windows 10 Professional / Education / Enterprise operating system, simply install Docker (click on respective hyperlinks).
If you only have Windows 10 Home, the installation of Docker requires more steps.
- Follow the steps to install Windows Subsystem for Linux
- Follow the steps to install Docker on Windows Home
To run tiktokr with Docker, you need to use tk_auth() with docker = TRUE,
which sets the necessary environment variable.
tk_auth(docker = TRUE)
Now run tk_init() to set up the Docker container.
tk_init()
You can check whether your Docker container is working correctly by running the following code:
if(stringr::str_length(get_docker_signature("")) > 16){
  message("Signature successful. Your Docker container is working.")
} else {
  message("Unable to get signature")
}
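The checks above only compare the length of the returned signature with 16 characters. If the signature call itself fails with an error, the check stops instead of printing a message. A minimal sketch of a more defensive variant follows; signature_ok() is a hypothetical helper (not part of tiktokr) that takes the signature call as a function and reuses the 16-character threshold from the checks above.

```r
# Hypothetical helper (not part of tiktokr): run a signature function and
# report whether the result looks usable, without stopping on errors.
signature_ok <- function(signature_fun, min_length = 16) {
  # Treat any error raised by the signature call as a missing signature
  sig <- tryCatch(signature_fun(), error = function(e) NA_character_)
  is.character(sig) && length(sig) == 1 && !is.na(sig) &&
    nchar(sig) > min_length
}

# Usage with tiktokr (needs a working setup, so it is left commented out):
# signature_ok(function() get_docker_signature(""))
# signature_ok(function() get_signature("test"))
```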
Now try running the examples below.
For every session involving tiktokr, you will need to initialize the
package with tk_init(). Once it is initialized, you can run as many
queries as you want.
tk_init()
Returns a tibble with trends.
# Trend
trends <- tk_posts(scope = "trends", n = 200)
user_posts <- tk_posts(scope = "user", query = "willsmith", n = 50)
Note: A hashtag query only returns 2k hits, which are neither drawn randomly nor ordered by most recent post date, but are rather some mix of recent and popular TikToks.
hash_post <- tk_posts(scope = "hashtag", query = "maincharacter", n = 100)
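Because a hashtag query returns a mix of recent and popular posts rather than a fixed set, results collected at different times may overlap. A minimal sketch of combining several result sets and dropping duplicates is shown below. merge_posts() is a hypothetical helper, not part of tiktokr; it assumes that every result has the same columns and an id column that identifies a post (the id column is also used in the download example of this README). The data frames in the example are mock data.

```r
# Hypothetical helper (not part of tiktokr): stack several query results
# and keep only the first occurrence of every post id.
merge_posts <- function(...) {
  batches <- list(...)
  # rbind() assumes that all batches share the same columns
  combined <- do.call(rbind, batches)
  combined[!duplicated(combined$id), , drop = FALSE]
}

# Mock data standing in for two hashtag queries made at different times
batch_1 <- data.frame(id = c("1", "2"), desc = c("a", "b"),
                      stringsAsFactors = FALSE)
batch_2 <- data.frame(id = c("2", "3"), desc = c("b", "c"),
                      stringsAsFactors = FALSE)
all_posts <- merge_posts(batch_1, batch_2)
nrow(all_posts)
```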
user_posts <- tk_posts(scope = "user", query = "willsmith", n = 50)
music_post <- tk_posts(scope = "music", query = user_posts$music_id[1], n = 100)
With tk_dwnl you can download videos from TikTok.
From Trends:
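library(magrittr) # provides the %>% pipe used below
# fs::dir_create("video")
trends <- tk_posts(scope = "trends", n = 5)
# Split the tibble into one-row pieces and download each video to video/<id>.mp4
trends %>%
  split(1:nrow(.)) %>%
  purrr::walk(~{tk_dwnl(.x$video_downloadAddr, paste0("video/", .x$id, ".mp4"))})
# fs::dir_delete("video")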
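When downloading many videos, a single failed request stops the whole loop above. The following is a minimal sketch of a wrapper that skips files that already exist and records failures instead of stopping. download_posts() is a hypothetical helper, not part of tiktokr; the download function is passed in as an argument and is assumed to take a URL and a target path, which is how tk_dwnl is called above.

```r
# Hypothetical helper (not part of tiktokr): download one file per post,
# skip files that are already on disk and record failures.
download_posts <- function(urls, ids, dir = "video", download_fun) {
  stopifnot(length(urls) == length(ids))
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  paths <- file.path(dir, paste0(ids, ".mp4"))
  ok <- logical(length(urls))
  for (i in seq_along(urls)) {
    if (file.exists(paths[i])) {
      # Already downloaded in an earlier run
      ok[i] <- TRUE
      next
    }
    # Record FALSE when the download function raises an error
    ok[i] <- tryCatch({
      download_fun(urls[i], paths[i])
      TRUE
    }, error = function(e) FALSE)
  }
  data.frame(id = ids, path = paths, ok = ok, stringsAsFactors = FALSE)
}

# Usage with tiktokr (needs a working setup, so it is left commented out):
# res <- download_posts(trends$video_downloadAddr, trends$id,
#                       download_fun = tk_dwnl)
# res[!res$ok, ]  # posts whose download failed
```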