Why does running a PHP script from CRON cause problems with character encoding?

I have a PHP script that I run from the terminal. Here is what it does:

  • fetches a row of data from the database (the table stores JSON strings for processing by this script);
  • converts the JSON string to an array and prepares the data that needs to be inserted into the database;
  • inserts the required data into the database.

Here is the script:

    #!/usr/bin/php
    <?php
    // Script used to parse tweets we have gathered from the Twitter streaming API
    mb_internal_encoding("UTF-8");
    date_default_timezone_set('UTC');

    require './config/config.php';
    require './libs/db.class.php';
    require './libs/tweetReadWrite.class.php';
    require './libs/tweetHandle.class.php';
    require './libs/tweetPrepare.class.php';
    require './libs/pushOver.class.php';
    require './libs/getLocationDetails.class.php';

    // Instantiate our classes
    $twitdb  = new db(Config::getConfig("twitterDbConnStr"), Config::getConfig("twitterDbUser"), Config::getConfig("twitterDbPass"));
    $pushOvr = new PushOver();                    // push error messages to my phone
    $tweetPR = new TweetPrepare();                // prepares tweet data
    $geoData = new getLocationDetails($pushOvr);  // reverse geolocation using the Google Maps API
    $tweetIO = new TweetReadWrite($twitdb, $tweetPR, $pushOvr, $geoData); // read and write tweet data to the database

    /* Grab a cached JSON row from the Oracle database.
     *
     * The JSON string is brought back in multiple parts because
     * PDO doesn't handle CLOBs very well and most of the time the JSON string
     * is larger than 4000 chars. It's a hack, but it works.
     *
     * The following SQL selects a test row which has characters like €$£ etc.
     */
    $sql = "
        SELECT a.tjc_id
             , dbms_lob.substr(tweet_json, 4000, 1)     part1
             , dbms_lob.substr(tweet_json, 8000, 4001)  part2
             , dbms_lob.substr(tweet_json, 12000, 8001) part3
          FROM twtr_json_cache a
         WHERE a.tjc_id = 8368
    ";

    $sth = $twitdb->prepare($sql);
    $sth->execute();
    $data = $sth->fetchAll();

    // Join the JSON string back together
    $jsonRaw = $data[0]['PART1'] . $data[0]['PART2'] . $data[0]['PART3'];

    // Shouldn't need to do this; it doesn't affect the outcome anyway
    $jsonRaw = mb_convert_encoding($jsonRaw, "UTF-8");

    // Convert the JSON object to an array
    $data = json_decode($jsonRaw, true);

    // Prepare the data (grab what I need from the JSON object, do some
    // validation etc.) and finally submit it to the database
    $result = $tweetIO->saveTweet($data); // returns BOOL

    echo $result;
    ?>

Now, if I run it from the terminal using ./proc_json_cache.php or php proc_json_cache.php, it works fine: the data goes into the UTF-8 database and looks like this: £ $@ € < test .

If I call this script via cron, it still saves the data, but special characters such as € and £ show up as squares, and the data in the database looks like this: $@ < test .

So far I have tried adding the following lines to my crontab:

    TERM=xterm
    SHELL=/bin/bash

This is consistent with the environment of my current shell session. I also added the following to a bash script that calls my PHP script:

    export NLS_LANG="ENGLISH_UNITED KINGDOM.AL32UTF8"
    export LANG="en_GB.UTF-8"

Again, this matches my current shell environment, but I still get a character encoding problem when the script is run from cron rather than directly in the terminal.
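
One way to narrow this down is to compare the environment the script actually sees in each case. The following is a minimal sketch, assuming a POSIX shell; the `locale_env` helper name and the list of variables it filters on are choices made for this illustration, not part of the setup described above.

```shell
#!/bin/sh
# locale_env: hypothetical helper that lists the locale-related variables
# visible to the current process, sorted so that two dumps can be diffed.
locale_env() {
    env | grep -E '^(LANG|LANGUAGE|LC_[A-Z_]+|NLS_LANG|TERM|SHELL)=' | sort
}

locale_env
```

Running this once from the terminal and once from a cron entry, with each run redirected to its own file, and then diffing the two files should show which variables differ between the two environments.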

Has anyone had a similar problem and can shed some light on how to fix this? Thanks in advance.

EDIT:

Here is some more server information:

OS: SUSE Linux Enterprise Server 11
PHP: 5.2.14

2 answers

So, after several hours of working on the problem, it looks like it is caused by shell environment variables not being passed to the PHP script.

One thing that I forgot to mention is that the script is not called directly by the cron task, but by another PHP script that checks whether the script is already running and, if not, uses pcntl_exec() to call it.

Because I did not pass the environment as the third parameter, none of the shell environment settings I had set in the crontab were passed on to my script (which takes over the current process space).

So what I was actually doing was this:

    pcntl_exec($script, $args); // script takes over the process space,
                                // but the shell env settings are not carried over

What I should have done was this:

    $a = get_defined_vars();
    pcntl_exec($script, $args, $a['_SERVER']); // script takes over the process space,
                                               // with the shell env settings carried over

See the php.net manual entry for pcntl_exec() for more information.
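
The general effect, a child process seeing only the environment it is handed, can be reproduced in a shell without PHP. The sketch below is an analogy only and does not reproduce pcntl_exec() itself: `env -i` starts a command with an empty environment, standing in for a launcher that hands nothing over, while the second call passes the variable explicitly. The `child_lang` helper name is made up for this illustration.

```shell
#!/bin/sh
# child_lang: hypothetical helper. Starts a child shell through env(1) with
# ONLY the variables given as arguments and reports the LANG it sees.
child_lang() {
    env -i "$@" /bin/sh -c 'echo "LANG=${LANG:-(unset)}"'
}

child_lang                     # nothing handed over
child_lang LANG=en_GB.UTF-8    # LANG handed over explicitly
```

The same check can be pointed at a real launcher by having the launched script print its own environment to a log file.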


Try adding the following to the bash script that calls your PHP script:

    unset LANG LANGUAGE LC_CTYPE
    export LANG=en_GB.UTF-8 LANGUAGE=en LC_CTYPE=en_GB.UTF-8

See: Re: Crontab charset not in utf-8
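
Put together, a wrapper along these lines might look like the sketch below. It assumes a POSIX shell; the `run_with_utf8_env` helper name is hypothetical, the NLS_LANG value is the one given in the question, and unsetting LC_ALL is an extra step not present in the lines above.

```shell
#!/bin/sh
# run_with_utf8_env: hypothetical helper that runs the given command with an
# explicit UTF-8 locale, regardless of what the caller (e.g. cron) provided.
run_with_utf8_env() {
    (
        # Clear inherited values first. LC_ALL is an addition to the list
        # above: when set, it overrides both LANG and LC_CTYPE.
        unset LANG LANGUAGE LC_CTYPE LC_ALL
        export LANG=en_GB.UTF-8 LANGUAGE=en LC_CTYPE=en_GB.UTF-8
        # Oracle client character set, as given in the question
        export NLS_LANG="ENGLISH_UNITED KINGDOM.AL32UTF8"
        # Replace the subshell with the requested command
        exec "$@"
    )
}

# Example: show what a child process would see
run_with_utf8_env /bin/sh -c 'echo "$LANG / $NLS_LANG"'
```

In a real wrapper the last line would instead launch the PHP interpreter with the script's path as its argument. Because the variables are set in a subshell, the calling shell's own environment is left untouched.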
