I have a PHP script that I run from the terminal. Here is what it does:
- captures a row of data from the database (the table stores JSON strings for processing by this script);
- converts the JSON string to an array and prepares the data that needs to be inserted into the database;
- inserts the required data into the database.
Here is the script:
#!/usr/bin/php
<?php
// script used to parse tweets we have gathered from the Twitter streaming API
mb_internal_encoding("UTF-8");
date_default_timezone_set('UTC');

require './config/config.php';
require './libs/db.class.php';
require './libs/tweetReadWrite.class.php';
require './libs/tweetHandle.class.php';
require './libs/tweetPrepare.class.php';
require './libs/pushOver.class.php';
require './libs/getLocationDetails.class.php';

// instantiate our classes
$twitdb  = new db(Config::getConfig("twitterDbConnStr"), Config::getConfig("twitterDbUser"), Config::getConfig("twitterDbPass"));
$pushOvr = new PushOver();                   // push error messages to my phone
$tweetPR = new TweetPrepare();               // prepares tweet data
$geoData = new getLocationDetails($pushOvr); // reverse geolocation using the Google Maps API
$tweetIO = new TweetReadWrite($twitdb, $tweetPR, $pushOvr, $geoData); // read and write tweet data to the database

/* grab a cached JSON row from the Oracle database
 *
 * the reason the JSON string is brought back in multiple parts is because
 * PDO doesn't handle CLOBs very well and most of the time the JSON string
 * is larger than 4000 chars - it's a hack but it works
 *
 * the following SQL specifies a test row to work with which has
 * characters like €$£ etc.
 */
$sql = "
SELECT a.tjc_id
     , dbms_lob.substr(tweet_json, 4000, 1)     part1
     , dbms_lob.substr(tweet_json, 8000, 4001)  part2
     , dbms_lob.substr(tweet_json, 12000, 8001) part3
  FROM twtr_json_cache a
 WHERE a.tjc_id = 8368
";
$sth = $twitdb->prepare($sql);
$sth->execute();
$data = $sth->fetchAll();

// join the JSON string back together
$jsonRaw = $data[0]['PART1'].$data[0]['PART2'].$data[0]['PART3'];

// shouldn't need to do this; it doesn't affect the outcome anyway
$jsonRaw = mb_convert_encoding($jsonRaw, "UTF-8");

// convert the JSON string to an array
$data = json_decode($jsonRaw, true);

// prepares the data (grabs the data I need from the JSON object, does some
// validation etc., then finally submits it to the database)
$result = $tweetIO->saveTweet($data); // returns BOOL

echo $result;
?>
Now, if I run it from the terminal using ./proc_json_cache.php or php proc_json_cache.php, it works fine: the data goes to the UTF-8 database, and all is well. The data in the database looks like this: £ $@ € < test .
If I call this script via cron, it still saves the data, but special characters like € and £ come out as squares, and the data in the database looks like: $@ < test .
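To narrow down what differs between the two runs, I can dump the locale-related environment each run sees with a small debug snippet like this near the top of the script, then compare the output of a terminal run against a cron run (the log path is just an example):

// debug sketch: record the locale-related environment this process sees,
// so a cron run can be compared against an interactive run
$info = array(
    'LANG'     => getenv('LANG'),
    'LC_ALL'   => getenv('LC_ALL'),
    'NLS_LANG' => getenv('NLS_LANG'),
    'locale'   => setlocale(LC_ALL, '0'),            // passing "0" queries the current locale
    'mb_internal_encoding' => mb_internal_encoding(),
);
file_put_contents('/tmp/env_dump.txt', print_r($info, true), FILE_APPEND);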
So far I have tried adding the following lines to my crontab:
TERM=xterm
SHELL=/bin/bash
These values match my current interactive shell environment. I also tried adding the following to a bash script that calls my PHP script:
export NLS_LANG="ENGLISH_UNITED KINGDOM.AL32UTF8"
export LANG="en_GB.UTF-8"
again to match my current shell environment settings, but I still get the character encoding problem when the script is run from cron rather than directly in the terminal.
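One thing I have not tried yet is forcing the environment from inside the script itself rather than relying on whatever cron provides, something like this near the top of proc_json_cache.php (untested sketch; NLS_LANG in particular may need to be set before the Oracle client library initialises, so this may do nothing):

// untested sketch: set the locale environment inside the script instead
// of relying on the calling environment
putenv('NLS_LANG=ENGLISH_UNITED KINGDOM.AL32UTF8'); // may need to happen before the Oracle client loads
putenv('LANG=en_GB.UTF-8');
setlocale(LC_ALL, 'en_GB.UTF-8');
mb_internal_encoding('UTF-8'); // already in the script, repeated here for completeness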
Has anyone run into a similar problem who can shed light on how to fix this? Thanks in advance.
EDIT:
Here is some more information about the server:
OS: SUSE Linux Enterprise Server 11
PHP: 5.2.14