^Q (Ctrl-Q, U+0011) is the XON software flow-control character, and it should not appear in HTML. I suspect that its unexpected presence confuses both Nokogiri and Heroku, but in different ways.
HTML documents from wild places on the Internet can be corrupted in any number of ways. I have seen all kinds of garbage in them, and when I couldn't fix it with iconv or Unicode transliteration, I would resort to a quick global search and replace to remove anything outside the normal ASCII range before processing.
In Ruby, global search and replace uses String#gsub:
doc = Nokogiri::HTML(html.gsub("\u0011", ''))
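If more than one stray character turns up, the same idea can be widened. The following is only a rough sketch: the helper names strip_control_chars and ascii_only are made up for illustration (they are not part of Nokogiri or Ruby), and it assumes the input is a valid UTF-8 string.

```ruby
# Hypothetical helpers for illustration; not part of Nokogiri or Ruby core.

# Remove ASCII control characters (and DEL) but keep tab, LF, and CR,
# which are legitimate whitespace in HTML.
def strip_control_chars(text)
  text.gsub(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/, '')
end

# More aggressive: drop everything that is not printable ASCII,
# tab, LF, or CR. This also discards legitimate non-ASCII text.
def ascii_only(text)
  text.gsub(/[^\x20-\x7E\t\n\r]/, '')
end

html = "<p>Hello\u0011 w\u00F6rld</p>\n"
puts strip_control_chars(html)  # keeps the o-umlaut, drops ^Q
puts ascii_only(html)           # drops both ^Q and the o-umlaut
```

Either result can then be passed to Nokogiri::HTML as in the line above. The first helper is probably the safer default, since the second one throws away accented letters and other valid characters along with the garbage.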