I am trying to use a Freebase data dump, but it turns out I have problems reading the files with Python. It seems that my program cannot read all the lines.
    def test2():
        count = 0
        for line in open(FREEBASE_TOPIC):
            count += 1
        return count

    def test3():
        count = 0
        for line in open(FREEBASE_QUAD):
            count += 1
        return count

    if __name__ == "__main__":
        print "FREEBASE TOPIC - NR LINES:", test2()
        print "FREEBASE QUAD - NR LINES:", test3()
Results in this:
    FREEBASE TOPIC - ITR TIME: 1.21000003815
    FREEBASE TOPIC - NR LINES: 1643010
    FREEBASE QUAD - ITER TIME: 0.797000169754
    FREEBASE QUAD - NR LINES: 3155131
That can't be everything. Files that contain the whole of Freebase should have far more lines than that. And I don't see how you can iterate over one 33 GB file and another 5 GB file in 2 seconds.
What's wrong? I would download the files again if something went wrong during the download, but that takes ages on my connection, so I am asking whether there is anything else I can try first. The file size is correct, and I printed some lines and they look right.
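One way to narrow this down without downloading anything again is to check whether the read actually reaches the end of the file. The sketch below is only an illustration: the helper names `count_lines_and_bytes` and `read_is_complete` are made up for this example, and it assumes the loop stops early because the file is opened in text mode (on some platforms, text mode can treat a control byte inside the data as end-of-file, which the post does not confirm). It reads in binary mode and compares the number of bytes consumed with the size reported by the file system.

```python
import os

def count_lines_and_bytes(path, chunk_size=1 << 20):
    """Count lines and bytes by reading the file in binary mode, chunk by chunk."""
    lines = 0
    total = 0
    last_byte = b""
    with open(path, "rb") as f:  # "rb": no newline translation, no text-mode EOF handling
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
            lines += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # A final line without a trailing newline still counts as a line.
    if total > 0 and last_byte != b"\n":
        lines += 1
    return lines, total

def read_is_complete(path):
    """Return (lines, bytes_read, True if bytes_read matches the size on disk)."""
    lines, total = count_lines_and_bytes(path)
    return lines, total, total == os.path.getsize(path)
```

Calling `read_is_complete(FREEBASE_QUAD)` returns the line count, the bytes read, and whether the two sizes agree. If the binary-mode line count is much larger than the one from the text-mode loop, the file itself is probably fine and the way it is opened is the more likely culprit.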