I am having problems with strings in Python not being == when I think they should be, and I believe this has something to do with how they are encoded. Basically, I am parsing some comma-separated values that are stored in zip archives (GTFS feeds, specifically, for those who are interested).
I use `ZipFile` from Python's `zipfile` module to open specific files inside the zip archives and then compare their text with some known values. Here is an example file:
```
agency_id,agency_name,agency_url,agency_phone,agency_timezone,agency_lang
ARLC,Arlington Transit,http:
```
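For reference, here is a minimal sketch of reading one file from such an archive as text. It assumes Python 3, where `ZipFile.open()` returns a binary file object that has to be wrapped in `io.TextIOWrapper` to get `str` lines. The helper name `read_member_lines` and the in-memory sample archive are hypothetical, for illustration only.

```python
import io
import zipfile


def read_member_lines(zip_source, member_name, encoding="utf-8"):
    """Return the decoded text lines of one file inside a zip archive.

    zip_source may be a path or a file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(zip_source, "r") as archive:
        # ZipFile.open() yields a binary stream; wrap it to decode into str.
        with archive.open(member_name, "r") as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            # Drop the line terminators so fields compare cleanly.
            return [line.rstrip("\r\n") for line in text]


# Build a tiny GTFS-like archive in memory (hypothetical sample data).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("agency.txt", "agency_id,agency_name\nARLC,Arlington Transit\n")
buffer.seek(0)

lines = read_member_lines(buffer, "agency.txt")
print(lines)  # ['agency_id,agency_name', 'ARLC,Arlington Transit']
```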
The code tries to find the position of the "agency_id" column in the first (header) line, so that I can use the corresponding value in all subsequent lines. Here is the code snippet:
```python
from zipfile import ZipFile

zipped_feed = ZipFile(feed_name, "r")
agency_file = zipped_feed.open("agency.txt", "r")
line_num = 0
agencyline = agency_file.readline()
while agencyline:
    if line_num == 0:
        # this is the header, all we care about is the agency_id
        lineparts = agencyline.split(",")
        position = -1
        counter = 0
        for part in lineparts:
            part = part.strip()
            if part == "agency_id":
                position = counter
            counter += 1
        line_num += 1
        agencyline = agency_file.readline()
    else:
        ...  # handling of the data lines (omitted)
```
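The header search in the loop amounts to finding the index of one field in a comma-separated line. A compact equivalent, sketched with a hypothetical helper `find_column` and the sample header shown earlier:

```python
def find_column(header_line, column_name):
    """Return the index of column_name in a comma-separated header, or -1."""
    # Strip whitespace (including the trailing newline) from every field,
    # mirroring the part.strip() call in the loop.
    fields = [field.strip() for field in header_line.split(",")]
    try:
        return fields.index(column_name)
    except ValueError:
        return -1


header = "agency_id,agency_name,agency_url,agency_phone,agency_timezone,agency_lang\n"
print(find_column(header, "agency_id"))    # 0
print(find_column(header, "agency_lang"))  # 5
```

Using `list.index` with a fallback keeps the same `-1` sentinel as the original loop while avoiding the manual counter.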
This code works for some zip archives, but not for others. I did some digging and printed `repr(part)`, and I got `'\xef\xbb\xbfagency_id'` instead of `'agency_id'`. Does anyone know what is going on here and how I can fix it? Thanks for the help!
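The bytes `\xef\xbb\xbf` are the UTF-8 byte order mark (BOM), which some tools write at the very start of a text file. Because they are attached to the first field, `strip()` does not remove them and the comparison with `"agency_id"` fails. A minimal sketch, assuming Python 3 (the output in the question looks like Python 2, where the header is still a byte string), showing the effect and the standard `utf-8-sig` codec, which drops a leading BOM:

```python
import codecs

raw_header = b"\xef\xbb\xbfagency_id,agency_name\n"

# The first three bytes are exactly the UTF-8 byte order mark.
print(raw_header.startswith(codecs.BOM_UTF8))  # True

# Plain UTF-8 decoding keeps the BOM as the character U+FEFF,
# which str.strip() does not remove, so the comparison still fails.
plain = raw_header.decode("utf-8")
print(plain.split(",")[0].strip() == "agency_id")  # False

# "utf-8-sig" removes a leading BOM if present and otherwise behaves like UTF-8.
clean = raw_header.decode("utf-8-sig")
print(clean.split(",")[0].strip() == "agency_id")  # True
```

When reading straight from the archive, the same codec can be passed as `io.TextIOWrapper(raw, encoding="utf-8-sig")`, so files with and without a BOM are handled alike.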