I have the following text and I would like to clear the data items and save them in excel. Is there a way to do this in Python?
text = """ ANNUAL COMPENSATION LONG-TERM COMPENSATION --------------------------------------- ------------------------------------- AWARDS PAYOUTS -------------------------- ---------- SECURITIES OTHER RESTRICTED UNDERLYING ALL OTHER NAME AND PRINCIPAL ANNUAL STOCK OPTIONS/ LTIP COPMPENSA- POSITION YEAR SALARY ($) BONUS ($) COMPENSATION ($) AWARD(S) ($)(1) SAR (#) PAYOUTS($) TION($)(3) ------------------ ---- ---------- --------- ---------------- --------------- ---------- ---------- ---------- JOHN W. WOODS 1993 $595,000 $327,250 There is no $203,190.63 18,000 $ 29,295 Chairman, President, & 1992 $545,000 $245,250 compensation 166,287.50 18,825 (2) Not $ 29,123 Chief Executive Officer 1991 $515,000 $283,251 required to be 45,000 Applicable of AmSouth & AmSouth disclosed in Bank NA this column. C. STANLEY BAILEY 1993 $266,667(4) $133,333 117,012.50 4,500 $ 11,648 Vice Chairman, AmSouth 1992 $210,000 $ 84,000 42,400.00 4,800 $ 12,400 & AmSouth Bank NA 1991 $186,750 $ 82,170 161,280.00 9,750 C. DOWD RITTER 1993 $266,667(4) $133,333 117,012.50 4,500 $ 13,566 Vice Chairman, AmSouth 1992 $210,000 $ 84,000 42,400.00 4,800 $ 12,920 & AmSouth Bank NA 1991 $188,625 $ 82,995 161,280.00 9,750 WILLIAM A. POWELL, JR. 1993 $211,335 $ 95,101 11,000 $124,548 President, AmSouth 1992 $330,000 $132,000 98,050.00 11,100 $ 22,225 and Vice Chairman, 1991 $308,000 $169,401 24,000 AmSouth Bank NA Retired in 1993 A. FOX DEFUNIAK, III 1993 $217,000 $ 75,950 52,971.88 4,500 $ 11,122 Senior Executive Vice 1992 $200,000 $ 62,000 42,400.00 4,800 $ 11,240 President, Birmingham 1991 $177,500 $ 78,100 161,280.00 9,750 Banking Group, AmSouth Bank NAEW STEPHENSON, JR. 1993 $177,833 $ 71,133 52,971.88 3,400 $ 9,256 Senior Executive Vice 1992 $150,000 $ 45,000 27,825.00 3,150 $ 8,560 President, AmSouth 1991 $140,000 $ 52,488 107,520.00 6,750 and Chairman & Chief Executive Officer, AmSouth Bank of Florida """
Right now, I'm just trying to get it in csv style format with '|' character to separate data items, and then manually extract data to succeed:
tmp = open('tmp.txt','w') tmp.write(text) tmp.close() data1 = [] for line in open('tmp.txt'): line = line.lower() if 'SALARY' in line: line = line.replace(' ','|') line = line.replace('--', '') line = line.replace('- -', '') line = line.replace('- -', '') line = line.replace('(1)', '') line = line.replace('(2)', '') line = line.replace('(3)', '') line = line.replace('(4)', '') line = line.replace('(5)', '') line = line.replace('(6)', '') line = line.replace('(7)', '') line = line.replace('(8)', '') line = line.replace('(9)', '') line = line.replace('(10)', '') line = line.replace('(11)', '') line = line.replace('(S)', '') line = line.replace('($)', '') line = line.replace('(#)', '') line = line.replace('$', '') line = line.replace('-0-', '0') line = line.replace(')', '|') line = line.replace('(', '|-') line = re.sub(r'\s(\d)', '|\\1', line) line = line.replace(' ', '') line = line.replace('||', '|') data1.append(line) data = ''.join(data1)
The problem is that I have to do this thousands of times, and I’ll always have to go through each table and save the elements I need. Is there a way to create a dictionary that will track things like year, salary, bonus, other annual compensation, etc. For each person listed in the leftmost column?