Learning Python: how can I make it more Pythonic?

I am a PHP developer exploring the world outside PHP, and I have decided to start learning Python.

The script below is my first attempt at porting a PHP script to Python. Its job is to fetch tweets from a Redis store. The tweets come from the Twitter Streaming API and are stored as JSON objects. The needed information is then extracted and written to a CSV file, which is imported with LOAD DATA LOCAL INFILE into a MySQL database hosted on another server.

So the question is: now that I have a first working Python script, how can I make it more Pythonic? Do you have any suggestions for making it better? Tricks I should know about? Constructive criticism?

Update: Having incorporated all the suggestions so far, here is the updated version.
Update 2: I ran the code through pylint. It now scores 9.89 / 10. Any other suggestions?

    # -*- coding: utf-8 -*-
    """Redis IO Loop for Tweelay Bot"""
    from __future__ import with_statement

    import simplejson
    import re
    import datetime
    import time
    import csv
    import hashlib

    # Bot Modules
    import tweelay.red as red
    import tweelay.upload as upload
    import tweelay.openanything as openanything

    __version__ = "4"

    def process_tweets():
        """Processes 0-20 tweets from Redis store"""
        data = []
        last_id = 0
        for i in range(20):
            last = red.pop_tweet()
            if not last:
                break

            t = TweetHandler(last)
            t.cleanup()
            t.extract()

            if t.get_tweet_id() == last_id:
                break

            tweet = t.proc()
            if tweet:
                data = data + [tweet]
                last_id = t.get_tweet_id()

            time.sleep(0.01)

        if not data:
            return False

        ch = CSVHandler(data)
        ch.pack_csv()
        ch.uploadr()

        source = "http://bot.tweelay.net/tweets.php"
        openanything.openAnything(
            source,
            etag=None,
            lastmodified=None,
            agent="Tweelay/%s (Redis)" % __version__
        )

    class TweetHandler:
        """Cleans, Builds and returns needed data from Tweet"""
        def __init__(self, json):
            self.json = json
            self.tweet = None
            self.tweet_id = 0
            self.j = None

        def cleanup(self):
            """Takes JSON encoded tweet and cleans it up for processing"""
            self.tweet = unicode(self.json, "utf-8")
            self.tweet = re.sub('^s:[0-9]+:["]+', '', self.tweet)
            self.tweet = re.sub('\n["]+;$', '', self.tweet)

        def extract(self):
            """Takes cleaned up JSON encoded tweet and extracts the datas we need"""
            self.j = simplejson.loads(self.tweet)

        def proc(self):
            """Builds the datas from the JSON object"""
            try:
                return self.build()
            except KeyError:
                if 'delete' in self.j:
                    return None
                else:
                    print ";".join(["%s=%s" % (k, v) for k, v in self.j.items()])
                    return None

        def build(self):
            """Builds tuple from JSON tweet"""
            return (
                self.j['user']['id'],
                self.j['user']['screen_name'].encode('utf-8'),
                self.j['text'].encode('utf-8'),
                self.j['id'],
                self.j['in_reply_to_status_id'],
                self.j['in_reply_to_user_id'],
                self.j['created_at'],
                __version__
            )

        def get_tweet_id(self):
            """Return Tweet ID"""
            if 'id' in self.j:
                return self.j['id']

            if 'delete' in self.j:
                return self.j['delete']['status']['id']

    class CSVHandler:
        """Takes list of tweets and saves them to a CSV file to be inserted into MySQL data store"""
        def __init__(self, data):
            self.data = data
            self.file_name = self.gen_file_name()

        def gen_file_name(self):
            """Generate unique file name"""
            now = datetime.datetime.now()

            hashr = hashlib.sha1()
            hashr.update(str(now))
            hashr.update(str(len(self.data)))

            hash_str = hashr.hexdigest()
            return hash_str+'.csv'

        def pack_csv(self):
            """Save tweet data to CSV file"""
            with open('tmp/'+self.file_name, mode='ab') as ofile:
                writer = csv.writer(
                    ofile, delimiter=',',
                    quotechar='"',
                    quoting=csv.QUOTE_MINIMAL)
                writer.writerows(self.data)

        def uploadr(self):
            """Upload file to remote host"""
            url = "http://example.com/up.php?filename="+self.file_name
            uploadr = upload.upload_file(url, 'tmp/'+self.file_name)
            if uploadr[0] == 200:
                print "Upload: 200 - ("+str(len(self.data))+")", self.file_name
                print "-------"
                #os.remove('tmp/'+self.file_name)
            else:
                print "Upload Error:", uploadr[0]

    if __name__ == "__main__":
        while True:
            process_tweets()
            time.sleep(1)

+4
7 answers

Instead of:

    i=0
    end=20
    last_id=0
    data=[]
    while(i<=end):
        i = i + 1
        ...

write:

    last_id=0
    data=[]
    for i in xrange(1, 22):
        ...

This has the same semantics, but is more compact and more Pythonic.

Instead of

    if not last or last == None:

just write

    if not last:

since None is falsy anyway (so not last is True when last is None). In general, when you want to check whether something is None, write is None, not == None.
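To illustrate why is None is the safer check, here is a minimal sketch. The class AlwaysEqual is hypothetical and exists only for this demonstration: == can be overridden by the object being compared, whereas is compares identity and cannot be.

```python
class AlwaysEqual(object):
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

# == asks the object, which may answer however it likes.
equals_none = (obj == None)  # deliberately written with == for illustration

# `is` compares identity and cannot be overridden.
is_none = (obj is None)

print(equals_none, is_none)
```

The object is clearly not None, yet the == comparison reports that it is.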

In

    if(j['id'] <> last_id):

lose the extra parentheses and replace the deprecated <> operator with !=, so that it reads

    if j['id'] != last_id:

and remove the extra parentheses from the other if statements as well.

Instead of:

    if len(data) == 0:

write:

    if not data:

since any empty container is falsy.
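As a small sketch of the truthiness rule (the function describe and its messages are made up for illustration), the same test covers lists, tuples, dictionaries and None alike:

```python
def describe(batch):
    """Return a short status string for a batch of rows."""
    if not batch:  # true for [], (), {}, '' and None
        return "nothing to do"
    return "%d row(s)" % len(batch)


print(describe([]))
print(describe([("alice", "hello")]))
```

One caveat: 0 and the empty string are falsy too, so when zero is a legitimate value an explicit comparison is the clearer choice.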

In place of

    hash_str = str(hash.hexdigest())

write

    hash_str = hash.hexdigest()

since the method already returns a string, making the str call redundant.
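A possible sketch of the file-name generation without the extra str call. The function signature and the name hasher are assumptions made for this example; calling the variable hasher rather than hash also avoids shadowing the built-in hash function. The encode calls are needed only because this sketch targets Python 3, where hashlib expects bytes.

```python
import datetime
import hashlib


def gen_file_name(row_count):
    """Build a .csv file name from the current time and a row count."""
    hasher = hashlib.sha1()
    # Python 3's hashlib wants bytes, hence the explicit encode();
    # under Python 2 a plain str is accepted as-is.
    hasher.update(str(datetime.datetime.now()).encode("utf-8"))
    hasher.update(str(row_count).encode("utf-8"))
    return hasher.hexdigest() + ".csv"  # hexdigest() is already a str


name = gen_file_name(3)
print(name)
```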

Instead of:

    for item in data:
        writer.writerow(item)

use

    writer.writerows(data)

which runs the loop on your behalf.

Instead of

    ofile = open('tmp/'+file_name, mode='ab')
    ...
    ofile.close()

use the with statement (available in Python 2.6 or later; in 2.5, start the module with

    from __future__ import with_statement

to "import from the future" the with statement feature):

    with open('tmp/'+file_name, mode='ab') as ofile:
        ...

which guarantees that the file is closed for you (even if an exception is raised).
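Putting the with statement and writerows together, a minimal self-contained sketch might look like this. The function save_rows, the sample rows and the temporary path are all invented for the example, and it is written for Python 3, where the csv documentation recommends opening the file in text mode with newline=''.

```python
import csv
import os
import tempfile


def save_rows(path, rows):
    """Write all rows to a CSV file, closing it even if writing fails."""
    # Under Python 2 the file would be opened in binary mode ('wb' or 'ab').
    with open(path, "w", newline="") as ofile:
        writer = csv.writer(ofile, quoting=csv.QUOTE_MINIMAL)
        writer.writerows(rows)


rows = [(1, "alice", "hello, world"), (2, "bob", 'say "hi"')]
path = os.path.join(tempfile.mkdtemp(), "tweets.csv")
save_rows(path, rows)

# Read the file back to check that quoting round-trips.
with open(path, newline="") as ifile:
    print(list(csv.reader(ifile)))
```

Note that the csv module returns every field as a string when reading, so the integer ids come back as '1' and '2'.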

Instead of

    print "Upload Error: "+uploadr[0]

use

    print "Upload Error:", uploadr[0]

and do the same for the other print statements: the comma inserts a space for you and, unlike +, also works when the value is not a string (an integer status code, for example).

I am sure there are more small things like these, but these are the ones that jumped out at me when I looked at your code.

+19

Pythonic Python rarely uses integers for flow control. The idiom is almost always for item in container:. In addition, I would use a class to hold the "user object". It will be much nicer to work with than simple container types such as lists and dictionaries (and it organizes your code in a more OO style). You can also compile the regexes beforehand for better performance.

    class MyTweet(object):
        def __init__(self, data):
            # ...process json here
            # ...
            self.user = user

    for data in getTweets():
        tweet = MyTweet(data)

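A runnable sketch of these ideas, under stated assumptions: the two patterns are taken from the question's cleanup() method, while the Tweet class, the constant names and the sample input strings are invented for illustration. The standard-library json module stands in for simplejson here.

```python
import json
import re

# Compiled once at import time instead of on every call.
PREFIX_RE = re.compile(r'^s:[0-9]+:["]+')
SUFFIX_RE = re.compile(r'\n["]+;$')


class Tweet(object):
    """Hypothetical wrapper around one raw tweet string."""

    def __init__(self, raw):
        cleaned = SUFFIX_RE.sub('', PREFIX_RE.sub('', raw))
        self.data = json.loads(cleaned)
        self.tweet_id = self.data.get('id')


# Made-up sample input, shaped like what the patterns expect.
raw_tweets = ['s:9:"{"id": 1}\n";', 's:9:"{"id": 2}\n";']
for raw in raw_tweets:  # iterate over the items, no index needed
    tweet = Tweet(raw)
    print(tweet.tweet_id)
```

The re module keeps its own cache of recently used patterns, so the speed-up from precompiling is usually modest; the larger benefit is that the patterns get names and live in one place.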
+6
    # Bot Modules
    import red     # Simple Redis API functions
    import upload  # pycurl script to upload to remote server

If your application is going to be used and maintained, it is better to bundle all these modules into a package.

+2

Instead of...

    i=0
    end=20
    last_id=0
    data=[]
    while(i<=end):
        i = i + 1

you can use...

    for i in range(20):

but overall, it is not very clear where the 20 comes from. A magic number?
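One way to address the magic number is to give it a name. In the sketch below, the constant MAX_TWEETS_PER_BATCH, the function take_batch and the in-memory queue are all hypothetical stand-ins; the real script pops tweets from Redis instead.

```python
MAX_TWEETS_PER_BATCH = 20  # hypothetical name for the bare 20


def take_batch(pop, limit=MAX_TWEETS_PER_BATCH):
    """Collect up to `limit` items from pop(), stopping at the first falsy one."""
    batch = []
    for _ in range(limit):
        item = pop()
        if not item:
            break
        batch.append(item)
    return batch


# A plain list stands in for the Redis store in this example.
queue = ["t%d" % n for n in range(25)]
batch = take_batch(lambda: queue.pop(0) if queue else None)
print(len(batch))
```

With the limit named and passed as a parameter, the reason for the number is documented at the point of definition, and it can be changed or overridden in one place.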

+2

If you have a method that does not fit on one screen, you really want to shorten it, to 15 lines or so, say. I see at least three candidate methods: print_tweet, save_csv and upload_data. It is a little hard to say exactly what they should be called, but there seem to be three distinct sections of code that you should try to break out.

+2

Run your code through pylint.

+2
  • Every variable name I have ever seen in Python has been all lowercase, with no underscores. (I do not think this is a requirement, and it may not be standard practice.)
  • You should break the logic into several single-purpose methods.
  • Take it a step further and create some classes to encapsulate related methods.
+1