Ruby on Rails "invalid byte sequence in UTF-8" due to bot

I am getting errors caused by a Chinese bot, http://www.easou.com/search/spider.html, when it crawls my sites.

All my applications run Ruby 1.9.3 and Rails 3.2.x.

Here's the stacktrace:

An ArgumentError occurred in listings#show: invalid byte sequence in UTF-8 rack (1.4.5) lib/rack/utils.rb:104:in `normalize_params' ------------------------------- Request: ------------------------------- * URL : http://www.my-website.com * IP address: XXXX * Parameters: {"action"=>"show", "controller"=>"listings", "id"=>"location-t7-villeurbanne--58"} * Rails root: /.../releases/20140708150222 * Timestamp : 2014-07-09 02:57:43 +0200 ------------------------------- Backtrace: ------------------------------- rack (1.4.5) lib/rack/utils.rb:104:in `normalize_params' rack (1.4.5) lib/rack/utils.rb:96:in `block in parse_nested_query' rack (1.4.5) lib/rack/utils.rb:93:in `each' rack (1.4.5) lib/rack/utils.rb:93:in `parse_nested_query' rack (1.4.5) lib/rack/request.rb:332:in `parse_query' actionpack (3.2.18) lib/action_dispatch/http/request.rb:275:in `parse_query' rack (1.4.5) lib/rack/request.rb:209:in `POST' actionpack (3.2.18) lib/action_dispatch/http/request.rb:237:in `POST' actionpack (3.2.18) lib/action_dispatch/http/parameters.rb:10:in `parameters' ------------------------------- Session: ------------------------------- * session id: nil * data: {} ------------------------------- Environment: ------------------------------- * CONTENT_LENGTH : 514 * CONTENT_TYPE : application/x-www-form-urlencoded * HTTP_ACCEPT : text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1 * HTTP_ACCEPT_ENCODING : gzip, deflate * HTTP_ACCEPT_LANGUAGE : zh;q=0.9,en;q=0.8 * HTTP_CONNECTION : close * HTTP_HOST : www.my-website.com * HTTP_REFER : http://www.my-website.com/ * HTTP_USER_AGENT : Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html) * ORIGINAL_FULLPATH : / * PASSENGER_APP_SPAWNER_IDLE_TIME : -1 * PASSENGER_APP_TYPE : rack * PASSENGER_CONNECT_PASSWORD : [FILTERED] * PASSENGER_DEBUGGER : false * PASSENGER_ENVIRONMENT : production * PASSENGER_FRAMEWORK_SPAWNER_IDLE_TIME : -1 * 
PASSENGER_FRIENDLY_ERROR_PAGES : true * PASSENGER_GROUP : * PASSENGER_MAX_REQUESTS : 0 * PASSENGER_MIN_INSTANCES : 1 * PASSENGER_SHOW_VERSION_IN_HEADER : true * PASSENGER_SPAWN_METHOD : smart-lv2 * PASSENGER_USER : * PASSENGER_USE_GLOBAL_QUEUE : true * PATH_INFO : / * QUERY_STRING : * REMOTE_ADDR : 183.60.212.153 * REMOTE_PORT : 52997 * REQUEST_METHOD : GET * REQUEST_URI : / * SCGI : 1 * SCRIPT_NAME : * SERVER_PORT : 80 * SERVER_PROTOCOL : HTTP/1.1 * SERVER_SOFTWARE : nginx/1.2.6 * UNION_STATION_SUPPORT : false * _ : _ * action_controller.instance : listings#show * action_dispatch.backtrace_cleaner : #<Rails::BacktraceCleaner:0x000000056e8660> * action_dispatch.cookies : #<ActionDispatch::Cookies::CookieJar:0x00000006564e28> * action_dispatch.logger : #<ActiveSupport::TaggedLogging:0x0000000318aff8> * action_dispatch.parameter_filter : [:password, /RAW_POST_DATA/, /RAW_POST_DATA/, /RAW_POST_DATA/] * action_dispatch.remote_ip : 183.60.212.153 * action_dispatch.request.content_type : application/x-www-form-urlencoded * action_dispatch.request.parameters : {"action"=>"show", "controller"=>"listings", "id"=>"location-t7-villeurbanne--58"} * action_dispatch.request.path_parameters : {:action=>"show", :controller=>"listings", :id=>"location-t7-villeurbanne--58"} * action_dispatch.request.query_parameters : {} * action_dispatch.request.request_parameters : {} * action_dispatch.request.unsigned_session_cookie: {} * action_dispatch.request_id : 9f8afbc8ff142f91ddbd9cabee3629f3 * action_dispatch.routes : #<ActionDispatch::Routing::RouteSet:0x0000000339f370> * action_dispatch.show_detailed_exceptions : false * action_dispatch.show_exceptions : true * rack-cache.allow_reload : false * rack-cache.allow_revalidate : false * rack-cache.cache_key : Rack::Cache::Key * rack-cache.default_ttl : 0 * rack-cache.entitystore : rails:/ * rack-cache.ignore_headers : ["Set-Cookie"] * rack-cache.metastore : rails:/ * rack-cache.private_headers : ["Authorization", "Cookie"] * 
rack-cache.storage : #<Rack::Cache::Storage:0x000000039c5768> * rack-cache.use_native_ttl : false * rack-cache.verbose : false * rack.errors : #<IO:0x000000006592a8> * rack.input : #<PhusionPassenger::Utils::RewindableInput:0x0000000655b3a0> * rack.multiprocess : true * rack.multithread : false * rack.request.cookie_hash : {} * rack.request.form_hash : * rack.request.form_input : #<PhusionPassenger::Utils::RewindableInput:0x0000000655b3a0> * rack.request.form_vars :    W "  陷q B  )     F  PZ  8   & G\y P  u T ed  . % mxEAẳ\ d* Hg   C賳 lj      U 1  ]pgt P  Ɗ   c"    LX  D   HR y  p`6 l   lN P  l S    `V4y  c    X2  &JO!  *p  l  - U  w }g ԍk   (  FJ   q : 5G Jh pί    ࡃ]  z h     d } } * rack.request.query_hash : {} * rack.request.query_string : * rack.run_once : false * rack.session : {} * rack.session.options : {:path=>"/", :domain=>nil, :expire_after=>nil, :secure=>false, :httponly=>true, :defer=>false, :renew=>false, :coder=>#<Rack::Session::Cookie::Base64::Marshal:0x000000034d4ad8>, :id=>nil} * rack.url_scheme : http * rack.version : [1, 0] 

As you can see, there is no invalid UTF-8 in the URL, only in rack.request.form_vars. I get about a hundred of these errors a day, and they all look like this one.

So I tried to force UTF-8 on rack.request.form_vars with something like this:

    class RackFormVarsSanitizer
      def initialize(app)
        @app = app
      end

      def call(env)
        if env["rack.request.form_vars"]
          env["rack.request.form_vars"] = env["rack.request.form_vars"].force_encoding('UTF-8')
        end
        @app.call(env)
      end
    end

And I register it in my application.rb:

 config.middleware.use "RackFormVarsSanitizer" 

This does not work: I still get the errors. The problem is that I cannot test it in development mode, because I do not know how to set rack.request.form_vars.
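One likely reason the attempt has no effect is that force_encoding only changes the encoding label of a string and leaves its bytes untouched. The sketch below illustrates this with made-up bytes (not the bot's actual payload) and a hypothetical helper, replace_invalid_bytes, that performs a real conversion. The round trip through UTF-16BE is an assumption chosen because it also runs on Ruby 1.9.3; on Ruby 2.1 or later, String#scrub is the simpler option.

```ruby
# Made-up bytes for illustration; 0xFF can never appear in valid UTF-8.
raw = "q=\xFF\xFEabc".dup.force_encoding("ASCII-8BIT")

# force_encoding only relabels the string; the bytes are unchanged.
relabelled = raw.dup.force_encoding("UTF-8")
puts relabelled.valid_encoding?

# A regexp match on the relabelled string typically raises ArgumentError,
# much like the parameter parsing in the backtrace.
begin
  relabelled =~ /\Aq=/
rescue ArgumentError => e
  puts e.message
end

# Hypothetical helper: replaces invalid bytes with "?" via a real conversion.
def replace_invalid_bytes(str)
  str.dup.force_encoding("UTF-8")
     .encode("UTF-16BE", :invalid => :replace, :undef => :replace, :replace => "?")
     .encode("UTF-8")
end

clean = replace_invalid_bytes(raw)
puts clean.valid_encoding?
puts clean
```

Only the converted string is safe to hand to code that runs regular expressions over it.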

I also installed the utf8-cleaner gem, but it didn't fix anything.

Does anyone have an idea how to fix this, or how to trigger it in development?
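For reproducing the problem without a real bot, one option is to build the Rack env by hand and pass it through a middleware. As far as I can tell, Rack only fills rack.request.form_vars when the body is first parsed, so the sketch below puts the invalid bytes into rack.input instead. FormBodySanitizer, the byte values, and the stand-in inner_app are all hypothetical; this is not the implementation of any gem mentioned here.

```ruby
require "stringio"

# Hypothetical middleware for illustration: rewrites a form-encoded body so that
# it contains only valid UTF-8 before anything downstream parses it.
class FormBodySanitizer
  FORM_TYPE = "application/x-www-form-urlencoded"

  def initialize(app)
    @app = app
  end

  def call(env)
    if env["CONTENT_TYPE"].to_s.start_with?(FORM_TYPE) && env["rack.input"]
      body = env["rack.input"].read
      clean = body.force_encoding("UTF-8")
                  .encode("UTF-16BE", :invalid => :replace, :undef => :replace, :replace => "")
                  .encode("UTF-8")
      env["rack.input"] = StringIO.new(clean)
      env["CONTENT_LENGTH"] = clean.bytesize.to_s
    end
    @app.call(env)
  end
end

# Hand-built env that mimics the bot's request: a GET carrying a binary "form" body.
bad_body = "\xC3\x28name=\xFFvalue".dup.force_encoding("ASCII-8BIT")
env = {
  "REQUEST_METHOD" => "GET",
  "PATH_INFO"      => "/",
  "CONTENT_TYPE"   => "application/x-www-form-urlencoded",
  "CONTENT_LENGTH" => bad_body.bytesize.to_s,
  "rack.input"     => StringIO.new(bad_body)
}

# Stand-in for the Rails app: reports whether the body it receives is valid UTF-8.
inner_app = lambda do |e|
  body = e["rack.input"].read.dup.force_encoding("UTF-8")
  [200, { "Content-Type" => "text/plain" }, [body.valid_encoding?.to_s]]
end

status, _headers, response = FormBodySanitizer.new(inner_app).call(env)
puts status
puts response.first
```

Calling inner_app directly with a fresh copy of the env shows the unsanitized case, which makes it easy to compare both paths in a test.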

+50
ruby ruby-on-rails ruby-on-rails-3 utf-8
Jul 09 '14 at 7:54
3 answers

So that you don't have to dig through the comments on the other answer, this is what I am doing now. I have not seen any errors for 24 hours, so it looks very promising:

Add rack-utf8_sanitizer to your Gemfile:

 gem 'rack-utf8_sanitizer' 

and run

 bundle 

Put this middleware in app/middleware/handle_invalid_percent_encoding.rb and rename the class to HandleInvalidPercentEncoding (because ExceptionApp is too general).

In the config block of config/application.rb, do:
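The middleware referred to is not reproduced in this answer. Purely as an assumption about the general idea (rescue encoding-related errors, re-raise everything else), a minimal sketch could look like the following. The class name HandleInvalidEncodingSketch, the matched messages, and the 400 response are all hypothetical choices, not the linked code.

```ruby
# Hypothetical sketch, not the middleware the answer links to: turn encoding-related
# ArgumentErrors into a 400 response and re-raise everything else.
class HandleInvalidEncodingSketch
  ENCODING_ERROR = /invalid byte sequence|invalid %-encoding/

  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  rescue ArgumentError => e
    raise unless e.message =~ ENCODING_ERROR
    [400, { "Content-Type" => "text/plain" }, ["Bad request"]]
  end
end

# Stand-in app that fails the way the backtrace in the question does.
failing_app = lambda { |_env| raise ArgumentError, "invalid byte sequence in UTF-8" }
status, _headers, body = HandleInvalidEncodingSketch.new(failing_app).call({})
puts status
puts body.first
```

Matching on the message keeps unrelated ArgumentErrors visible instead of silently turning them into client errors.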

    require "#{Rails.root}/app/middleware/handle_invalid_percent_encoding.rb"

    # NOTE: These must be in this order relative to each other.
    # HandleInvalidPercentEncoding just raises for encoding errors it doesn't cover,
    # so it must run after (= be inserted before) Rack::UTF8Sanitizer.
    config.middleware.insert 0, HandleInvalidPercentEncoding
    config.middleware.insert 0, Rack::UTF8Sanitizer  # from a gem

Deploy. Done.

(app/middleware happens to be where middleware lives in the project I'm working on, but I would prefer lib. Either should work.)
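The ordering note can be checked with a plain array standing in for the middleware stack. This is only an illustration of how two successive inserts at index 0 behave; the two existing entries are placeholder names.

```ruby
# Plain array standing in for the middleware stack; the existing entries are placeholders.
stack = ["ExistingMiddlewareA", "ExistingMiddlewareB"]

# Same two inserts, in the same order, as in the config block.
stack.insert(0, "HandleInvalidPercentEncoding")
stack.insert(0, "Rack::UTF8Sanitizer")

# The entry inserted last at index 0 ends up first, so it sees the request first.
puts stack
```

In a Rails 3.2 app, running rake middleware prints the real stack, which is a quick way to confirm the resulting order.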

+29
Jul 13 '14 at 10:00

Add this line to your Gemfile, then run bundle in your terminal:

 gem "handle_invalid_percent_encoding_requests" 

This solution is based on Henrik's solution, packaged as a Rails Engine gem.

+9
Jul 22 '14 at 17:53

There is an issue in the gem's repo that links to someone's possible solution. They say it works for them, but they are not sure whether it is a good solution.

I have yet to try it, but I think I will.

0
Jul 09 '14 at 13:30