Classifier gem: "invalid encoding symbol" error

Everything seemed to work fine while I was adding items to my Postgres database. Then, without my changing anything, my Rails application began to fail whenever it instantiated Madeleine anywhere in the application:

 EncodingError in EventsController#update
 invalid encoding symbol
 app/controllers/events_controller.rb:137:in `update'

Line 137 is the problem:

 135  def update
 136    @event = Event.find(params[:id])
 137    m = SnapshotMadeleine.new("bayes_data")
        ....
      end

I can still classify things in the console, though, which is part of what confuses me. In the console, this works fine:

 m = SnapshotMadeleine.new("bayes_data") {}
 => #<Madeleine::DefaultSnapshotMadeleine:0x000...
 m.system
 => #<Classifier::Bayes:0x000...
 m.system.classify "test"

I use the latest Classifier gem with Madeleine.

I figured that something was corrupted in the snapshot it was trying to load, so I deleted it, but that did not solve the problem. Here is my second-to-last snapshot (now the most recent):

 a = File.read('bayes_data/000000000000000000041.snapshot')
 a.encoding
 => #<Encoding:UTF-8>
 a.valid_encoding?
 => true

Not sure what is going on here. I saw that some people with Ruby 1.9.3-p125 had similar problems, so I upgraded to the latest stable version 1.9.3-p194, but that didn't help either.

Here is a link to the documentation for the classifier, which mentions how to use Madeleine: http://classifier.rubyforge.org/

I would really like to understand what is happening here. Thanks!

+7
3 answers

I had a similar problem with the rails_admin gem, caused by a MySQL adapter that was not encoding-aware. Maybe check whether your Postgres adapter is, and if it isn't, try another one.

0

I don’t know why the standard Marshal class does not work, but I had good results using YAML as the marshaller:

 m = SnapshotMadeleine.new("bayes_data", YAML) do
   b = Classifier::Bayes.new "Positive", "Negative"
 end

and then

 m = SnapshotMadeleine.new("bayes_data", YAML) 

Did something break with Marshal? Not sure.
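To show what swapping the marshaller amounts to, here is a minimal sketch using only the standard library. It assumes Madeleine only needs the second argument to respond to `dump(object, io)` and `load(io)`, which I have not verified against the Madeleine source. The `round_trip` helper and the sample hash are made up for illustration.

```ruby
require "yaml"
require "stringio"

# Hypothetical helper: write an object through a marshaller into an
# in-memory IO, then read it back through the same marshaller.
def round_trip(marshaller, object)
  io = StringIO.new
  marshaller.dump(object, io)
  io.rewind
  marshaller.load(io)
end

# Made-up stand-in for classifier data (plain strings and integers only).
sample = { "Positive" => { "good" => 2 }, "Negative" => { "bad" => 1 } }

via_marshal = round_trip(Marshal, sample)
via_yaml    = round_trip(YAML, sample)
```

Both calls go through the same two methods, so if the assumption holds, YAML should work as a drop-in replacement. Note that existing Marshal snapshots would not be readable as YAML, so you would probably need to start from an empty snapshot directory.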

0

You must ensure that the strings you use to train the classifier are UTF-8 encoded. If you are training on a data set, an ugly hack is to put

 Encoding.default_external = Encoding::UTF_8
 Encoding.default_internal = Encoding::UTF_8

in the script.
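If you would rather not change the global defaults, another option is to normalise each string before training. This is only a sketch: `to_utf8` is a hypothetical helper name, and it assumes that untagged binary input is meant to be UTF-8 and that replacing bad bytes with "?" is acceptable for your data.

```ruby
# Hypothetical helper: return a valid UTF-8 copy of the given text.
def to_utf8(text)
  str = text.to_s.dup
  # Binary strings carry no encoding information; assume the bytes are UTF-8.
  str.force_encoding(Encoding::UTF_8) if str.encoding == Encoding::ASCII_8BIT
  return str if str.encoding == Encoding::UTF_8 && str.valid_encoding?
  # Transcode through UTF-16 so invalid or unmappable bytes are replaced
  # instead of raising an exception.
  str.encode(Encoding::UTF_16BE, invalid: :replace, undef: :replace, replace: "?")
     .encode(Encoding::UTF_8)
end
```

You would then pass `to_utf8(text)` to the classifier instead of the raw string wherever you train it. On Ruby 2.1 and later, `String#scrub` does the replacement step more directly, but it is not available on 1.9.3.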

0
