Python (NLTK) - UnicodeDecodeError: 'ascii' codec can't decode byte

I am new to NLTK and I get the error below. I searched for encoding/decoding problems, and UnicodeDecodeError in particular, but this error seems to come from inside the NLTK source code itself.

Here's the error:

    Traceback (most recent call last):
      File "A:\Python\Projects\Test\main.py", line 2, in <module>
        print(pos_tag(word_tokenize("John big idea isn't all that bad.")))
      File "A:\Python\Python\lib\site-packages\nltk\tag\__init__.py", line 100, in pos_tag
        tagger = load(_POS_TAGGER)
      File "A:\Python\Python\lib\site-packages\nltk\data.py", line 779, in load
        resource_val = pickle.load(opened_resource)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xcb in position 0: ordinal not in range(128)

How do I solve this error?

This is what causes the error:

    from nltk import pos_tag, word_tokenize
    print(pos_tag(word_tokenize("John big idea isn't all that bad.")))
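Because the answers below depend on which Python and NLTK versions are installed, it helps to check both first. A minimal sketch, not from the original thread; the helper name `environment_report` is hypothetical and is not part of NLTK:

```python
import sys


def environment_report():
    """Collect the interpreter and NLTK versions the fixes depend on."""
    try:
        import nltk  # imported only to read its version string
        nltk_version = nltk.__version__
    except ImportError:
        nltk_version = None  # NLTK is not installed for this interpreter
    return {
        "python": "%d.%d.%d" % sys.version_info[:3],
        "nltk": nltk_version,
    }


print(environment_report())
```

Run it with the same interpreter that runs main.py, since each Python installation has its own site-packages.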
4 answers

Try this (NLTK 3.0.1 with Python 2.7.x):

    import io
    f = io.open(txtFile, 'r', encoding='utf-8')  # txtFile: path to your text file; 'r' already uses universal newlines
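`io.open` with an explicit `encoding` decodes a file's bytes as UTF-8 instead of relying on a default codec. Note that this governs how your own text files are read, not how NLTK loads its tagger pickle. A small self-contained sketch; the temporary file and its contents are made up for illustration and stand in for `txtFile`:

```python
import io
import os
import tempfile

# Hypothetical sample text with a non-ASCII character, standing in for txtFile.
sample = u"Caf\u00e9 owner's big idea isn't all that bad.\n"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Write the text as UTF-8 bytes.
with io.open(path, "w", encoding="utf-8") as out:
    out.write(sample)

# Read it back, decoding explicitly as UTF-8.
with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()

print(text == sample)
```

The same call works on Python 2.7 and Python 3, which is why `io.open` is preferred over the built-in `open` when code has to run on both.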

I had the same problem. I am using Python 3.4 on Windows 7.

I had installed "nltk-3.0.0.win32.exe" (from here) and got this error. When I installed "nltk-3.0a4.win32.exe" (from here) instead, the problem with nltk.pos_tag was resolved. Check it out.

EDIT: If the second link does not work, you can look here.


Duplicate: NLTK 3 POS_TAG throws UnicodeDecodeError

In short: older NLTK releases (2.x) are not compatible with Python 3. You need to use NLTK 3, which currently still seems somewhat experimental.
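The traceback shows the failure inside `pickle.load`, not in your input text. That pattern is consistent with a pickle written by Python 2 being read by Python 3: Python 3's `pickle.load`/`pickle.loads` decode Python 2 `str` data with `encoding="ASCII"` by default, so any byte above 0x7F fails. The sketch below reproduces the same message with a hand-built, hypothetical Python-2-style pickle (not the actual NLTK tagger file) and shows that passing `encoding="latin1"` lets it load:

```python
import pickle

# Hand-built protocol-2 pickle of a Python 2 byte string containing 0xCB:
# PROTO 2, SHORT_BINSTRING of length 1, the byte 0xCB, STOP.
# This is a hypothetical stand-in for a pickle written under Python 2.
PY2_PICKLE = b"\x80\x02U\x01\xcb."

error_message = None
try:
    pickle.loads(PY2_PICKLE)  # default encoding="ASCII" rejects byte 0xCB
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)

# Telling Python 3 how to decode Python 2 str objects avoids the error.
value = pickle.loads(PY2_PICKLE, encoding="latin1")
print(repr(value))  # the one-character str '\xcb'
```

In practice the fix is the one this answer gives (a Python 3 compatible NLTK release with its matching data); the `encoding` argument only matters if you load such pickles yourself.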


Try using the "textclean" module. Install it from the command line (not from the Python prompt):

    pip install textclean

Python code:

    from nltk import pos_tag, word_tokenize
    from textclean.textclean import textclean
    text = textclean.clean("John big idea isn't all that bad.")
    print(pos_tag(word_tokenize(text)))
