I have an input file with a list of strings.
I iterate over every fourth line, starting from line 2.
From each of these lines I build a new string from its first 6 and last 6 characters, and I write that string to the output file only if it has not appeared before.
The code I wrote for this works, but I work with very large deep sequencing files; it has been running all day and has not made much progress. I am therefore looking for any suggestions to make it much faster, if possible. Thanks.
def method():
    target = open(output_file, 'w')
    with open(input_file, 'r') as f:
        lineCharsList = []
        for line in f:
            #Make string from first and last 6 characters of a line
            lineChars = line[0:6]+line[145:151]
            if not (lineChars in lineCharsList):
                lineCharsList.append(lineChars)
                target.write(lineChars + '\n') #If string is unique, write to output file
            for skip in range(3): #Used to step through four lines at a time
                try:
                    check = line #Check for additional lines in file
                    next(f)
                except StopIteration:
                    break
    target.close()
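One likely cause of the slowdown is the test `lineChars in lineCharsList`: membership in a list is a linear scan, so the total work grows roughly with the square of the number of unique strings. A `set` gives average constant-time membership instead. Below is a minimal sketch of that idea, not a drop-in replacement. The names `unique_end_tags` and `write_unique_end_tags` and their parameters are made up for illustration. It assumes the wanted lines really are every fourth line starting from line 2, as described in the text (the posted loop actually starts at the first line; pass `start=0` for that), and it takes the true last 6 characters of each line rather than the fixed slice `[145:151]`.

```python
from itertools import islice


def unique_end_tags(lines, start=1, step=4, n=6):
    """Yield first-n + last-n characters of every step-th line, once each.

    `start` is a 0-based index, so start=1 means "begin at line 2".
    Keys are yielded in the order they are first seen.
    """
    seen = set()  # set membership is O(1) on average, unlike a list
    for line in islice(lines, start, None, step):
        seq = line.rstrip("\r\n")  # drop the line ending before slicing
        key = seq[:n] + seq[-n:]
        if key not in seen:
            seen.add(key)
            yield key


def write_unique_end_tags(input_path, output_path, start=1):
    """Stream the unique keys from input_path into output_path."""
    with open(input_path) as src, open(output_path, "w") as dst:
        dst.writelines(key + "\n" for key in unique_end_tags(src, start=start))
```

Usage would be something like `write_unique_end_tags("reads.fastq", "tags.txt")`, where both file names are placeholders. If the reads contain only A, C, G and T, there are at most 4^12 (about 16.8 million) distinct 12-character keys, so the set should stay at a manageable size even for very large inputs.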