How to scrape all posts from a subreddit for a specific period of time

I have a function to scrape all posts in the bitcoin subreddit between 2014-11-01 and 2015-10-31.

However, it only retrieves about 990 posts, going back only as far as October 25th. I don't understand what's going on. I added a Sys.sleep of 15 seconds between each retrieval, following the API rules at https://github.com/reddit/reddit/wiki/API , but to no avail.

I also tried scraping another subreddit (fitness), but it likewise returned only about 900 posts.

    require(jsonlite)
    require(dplyr)

    getAllPosts <- function() {
        url <- "https://www.reddit.com/r/bitcoin/search.json?q=timestamp%3A1414800000..1446335999&sort=new&restrict_sr=on&rank=title&syntax=cloudsearch&limit=100"
        extract <- fromJSON(url)
        posts <- extract$data$children$data %>%
            dplyr::select(name, author, num_comments, created_utc, title, selftext)
        after <- posts[nrow(posts), 1]
        url.next <- paste0("https://www.reddit.com/r/bitcoin/search.json?q=timestamp%3A1414800000..1446335999&sort=new&restrict_sr=on&rank=title&syntax=cloudsearch&after=", after, "&limit=100")
        extract.next <- fromJSON(url.next)
        posts.next <- extract.next$data$children$data
        # keep paginating as long as the response still contains rows
        while (!is.null(nrow(posts.next))) {
            posts.next <- posts.next %>%
                dplyr::select(name, author, num_comments, created_utc, title, selftext)
            posts <- rbind(posts, posts.next)
            after <- posts[nrow(posts), 1]
            url.next <- paste0("https://www.reddit.com/r/bitcoin/search.json?q=timestamp%3A1414800000..1446335999&sort=new&restrict_sr=on&rank=title&syntax=cloudsearch&after=", after, "&limit=100")
            Sys.sleep(15)
            extract <- fromJSON(url.next)
            posts.next <- extract$data$children$data
        }
        posts$created_utc <- as.POSIXct(posts$created_utc, origin = "1970-01-01")
        return(posts)
    }

    posts <- getAllPosts()

Does reddit have some limit that I'm hitting?

1 answer

Yes, all reddit listings (posts, comments, etc.) are capped at 1000 items; for performance reasons they are essentially cached lists rather than live queries.

To get around this, you need to search in smaller chunks based on timestamps: instead of paginating with "after" past the 1000-item cap, repeatedly shrink the upper bound of the timestamp range to just below the created_utc of the oldest post retrieved so far, then issue a fresh search.
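
A minimal sketch of that approach in R, reusing the cloudsearch endpoint and the field selection from your question, might look like the following. The function name getPostsByWindow and the window-shrinking logic are illustrations under those assumptions, not a tested implementation:

    library(jsonlite)
    library(dplyr)

    getPostsByWindow <- function(from = 1414800000, to = 1446335999) {
        base <- "https://www.reddit.com/r/bitcoin/search.json?q=timestamp%3A"
        opts <- "&sort=new&restrict_sr=on&syntax=cloudsearch&limit=100"
        all.posts <- NULL
        upper <- to
        repeat {
            url <- paste0(base, from, "..", upper, opts)
            batch <- fromJSON(url)$data$children$data
            # stop when the current window returns no more rows
            if (is.null(nrow(batch)) || nrow(batch) == 0) break
            batch <- batch %>%
                dplyr::select(name, author, num_comments, created_utc, title, selftext)
            all.posts <- rbind(all.posts, batch)
            # shrink the window: the new upper bound sits just below the
            # oldest post seen so far (posts sharing that exact second could
            # be missed; drop the "- 1" and dedupe by name to be safe)
            upper <- floor(min(batch$created_utc)) - 1
            if (upper < from) break
            Sys.sleep(15)  # stay within reddit's API rate limits
        }
        if (!is.null(all.posts)) {
            all.posts$created_utc <- as.POSIXct(all.posts$created_utc, origin = "1970-01-01")
        }
        all.posts
    }

    posts <- getPostsByWindow()

Each iteration here is a fresh search rather than a continuation of a cached listing, so the 1000-item cap never applies across the whole run.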

