I am trying to read a folder of text documents using Python 3. The data is a modified version of the LingSpam email spam dataset. I expect my code to return the names of all 1893 text documents, but it returns only the first 420 file names. I do not understand why it stops short of the total number of files. Any ideas?
import glob
import os

# Download and unpack the dataset only if the 'train' folder is missing.
if not os.path.exists('train'):
    from urllib.request import urlretrieve
    import tarfile
    urlretrieve('http://cs.iit.edu/~culotta/cs429/lingspam.tgz', 'lingspam.tgz')
    tar = tarfile.open('lingspam.tgz')
    tar.extractall()
    tar.close()

# Collect the name of every .txt file in 'train'.
abc = []
for f in glob.glob("train/*.txt"):
    print(f)
    abc.append(f)
print(len(abc))
I tried to change the glob parameters, but still did not succeed.
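One sanity check that might help narrow this down is to count the files in two independent ways and print the absolute path being read. The sketch below is only a diagnostic under assumptions: the helper name `compare_listings` is hypothetical, and it assumes the files of interest end in `.txt` in any letter case. If the two counts disagree, the pattern is the problem. If both say 420, the folder on disk really holds only 420 matching files, for example because an older `train` folder already existed and the download step was skipped.

```python
import glob
import os


def compare_listings(directory):
    """Count .txt files in `directory` via glob and via os.scandir.

    Hypothetical helper for debugging only; not part of the original code.
    """
    via_glob = glob.glob(os.path.join(directory, "*.txt"))
    # os.scandir returns every entry, so names that glob's pattern would
    # skip (other letter case, leading dot) are still counted here.
    with os.scandir(directory) as entries:
        via_scandir = [
            entry.name
            for entry in entries
            if entry.is_file() and entry.name.lower().endswith(".txt")
        ]
    return {
        "path": os.path.abspath(directory),
        "glob": len(via_glob),
        "scandir": len(via_scandir),
    }


if __name__ == "__main__":
    # 'train' is the folder name used in the question.
    if os.path.isdir("train"):
        print(compare_listings("train"))
```

The printed path also shows which `train` folder is actually being read, which matters when the script is run from a different working directory than expected.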
Edit: Apparently, my code works for everyone but me. Here is my conclusion