This is a follow-up to a previous question about improving the time performance of a Python function: I need an efficient way to split my text file.
I have the following text file (over 32 GB), which is not sorted:
.................... 0 274 593869.99 6734999.96 121.83 1, 0 273 593869.51 6734999.92 121.57 1, 0 273 593869.15 6734999.89 121.57 1, 0 273 593868.79 6734999.86 121.65 1, 0 272 593868.44 6734999.84 121.65 1, 0 273 593869.00 6734999.94 124.21 1, 0 273 593868.68 6734999.92 124.32 1, 0 274 593868.39 6734999.90 124.44 1, 0 275 593866.94 6734999.71 121.37 1, 0 273 593868.73 6734999.99 127.28 1, .............................
The first and second columns are the identifier (e.g. 0 273) of the grid cell containing the point x, y, z.
def point_grid_id(x, y, minx, maxy, distx, disty):
    """Give the id (row, col) of the grid cell containing (x, y)."""
    col = int((x - minx) / distx)
    row = int((maxy - y) / disty)
    return (row, col)
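For example, with a hypothetical grid origin and 1 m tiles (these values are illustrative, not taken from the real dataset), a point from one of the records above maps to a tile like this:

# Illustrative values only; the real minx, maxy, distx, disty come from the grid definition.
minx, maxy = 593800.0, 6735000.0
distx, disty = 1.0, 1.0

# x, y taken from the first record of the sample data
print(point_grid_id(593869.99, 6734999.96, minx, maxy, distx, disty))  # -> (0, 69)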
(minx, maxy) is the origin of my grid and (distx, disty) is the tile size. The tile ids are:
tiles_id = [j for j in np.ndindex(ny, nx)]
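For completeness, this is roughly how nx and ny could be derived from the grid extent; maxx and miny are assumptions here (they are not shown in the question), and the numbers are made up:

import math
import numpy as np

# Hypothetical grid bounds and tile size (not given in the question).
minx, miny = 593800.0, 6734900.0
maxx, maxy = 593900.0, 6735000.0
distx, disty = 10.0, 10.0

nx = int(math.ceil((maxx - minx) / distx))  # number of columns -> 10
ny = int(math.ceil((maxy - miny) / disty))  # number of rows    -> 10

tiles_id = [j for j in np.ndindex(ny, nx)]  # [(0, 0), (0, 1), ..., (9, 9)]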
I need to split the 32 GB file into n (= len(tiles_id)) files.
I can do this without sorting, but then I would have to read the whole file n times. For this reason, I want an efficient splitting method that starts from tile (0, 0) (= tiles_id[0]); after that, I only need to read each split file once.
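For reference, here is a minimal sketch of the kind of single-pass split I have in mind. It assumes one whitespace-separated record per line, uses the point_grid_id function above, and keeps one file handle open per tile; the output naming scheme tile_<row>_<col>.txt is made up:

def split_by_tile(path, minx, maxy, distx, disty):
    """Read the big file once and append each record to its tile's file."""
    handles = {}  # (row, col) -> open file handle
    try:
        with open(path) as src:
            for line in src:
                fields = line.split()
                if len(fields) < 6:
                    continue  # skip malformed lines
                x, y = float(fields[2]), float(fields[3])
                # The first two columns could be used directly if they already hold the tile id.
                tile = point_grid_id(x, y, minx, maxy, distx, disty)
                fh = handles.get(tile)
                if fh is None:
                    # Hypothetical output naming scheme: tile_<row>_<col>.txt (append mode)
                    fh = open("tile_%d_%d.txt" % tile, "a")
                    handles[tile] = fh
                fh.write(line)
    finally:
        for fh in handles.values():
            fh.close()

One caveat with this approach is that keeping a handle open per tile can hit the operating system's limit on open files if n is large, in which case the handles would have to be closed and reopened in batches.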