Removing stop words using NLTK and Pandas

Question

I have some problems with Pandas and NLTK. I am new to programming, so excuse me if my question has an easy answer. I have a CSV file with 3 columns (Id, Title, Body) and about 15,000 rows.

My goal is to remove stop words from this CSV file. Lowercasing and splitting the text work fine, but the stop words are not removed and I cannot find my mistake. What am I missing?

    import pandas as pd
    from nltk.corpus import stopwords

    pd.read_csv("test10in.csv", encoding="utf-8")
    df = pd.read_csv("test10in.csv")
    df.columns = ['Id','Title','Body']
    df['Title'] = df['Title'].str.lower().str.split()
    df['Body'] = df['Body'].str.lower().str.split()
    stop = stopwords.words('english')
    df['Title'].apply(lambda x: [item for item in x if item not in stop])
    df['Body'].apply(lambda x: [item for item in x if item not in stop])
    df.to_csv("test10out.csv")

+6

python pandas csv nltk stop-words

slm Oct 20 '15 at 19:47

2 answers

 df.replace(stop,regex=True,inplace=True)

-1

176coding Apr 05 '17 at 14:24

Abtpst · Accepted Answer · 2015-10-20T20:15:58+0000

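You are expecting an in-place replacement, but apply returns a new Series and leaves the original column unchanged. You have to assign the result back to the column: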

    df['Title'] = df['Title'].apply(lambda x: [item for item in x if item not in stop])
    df['Body'] = df['Body'].apply(lambda x: [item for item in x if item not in stop])
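For anyone who wants to try the fix without the original CSV, here is a minimal sketch of the same idea on a tiny in-memory DataFrame. The sample rows, the helper name remove_stop_words, and the short stop set are all made up for illustration; the stop set stands in for stopwords.words('english') so the snippet runs without downloading the NLTK corpus. With the real list, converting it to a set first should also make the membership checks faster on 15,000 rows.

```python
import pandas as pd

# Stand-in for nltk.corpus.stopwords.words('english'): a small hypothetical
# subset so this sketch runs without the NLTK corpus being installed.
stop = {"the", "is", "a", "of", "and", "to", "in"}

# Made-up sample data with the same three columns as in the question.
df = pd.DataFrame({
    "Id": [1, 2],
    "Title": ["The Art of Pandas", "A Guide to NLTK"],
    "Body": ["Pandas is a library", "NLTK is in the toolbox"],
})


def remove_stop_words(text):
    """Lowercase, split on whitespace, and drop tokens found in `stop`."""
    return [word for word in text.lower().split() if word not in stop]


# apply returns a new Series, so the result is assigned back to the column.
for column in ["Title", "Body"]:
    df[column] = df[column].apply(remove_stop_words)

print(df)
```

Note that each cell now holds a Python list, so to_csv will write it in list form (brackets and quotes included). If plain text is wanted in the output file, joining the tokens back with " ".join before saving is one option.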
