I am having trouble with pandas and NLTK. I am new to programming, so apologies if this turns out to be easy to resolve. I have a CSV file with 3 columns (Id, Title, Body) and about 15,000 rows.
My goal is to remove stop words from this CSV file. Lowercasing and splitting the text into words both work, but the stop words are not removed and I cannot find my mistake. What am I missing?
import pandas as pd
from nltk.corpus import stopwords

pd.read_csv("test10in.csv", encoding="utf-8")
df = pd.read_csv("test10in.csv")
df.columns = ['Id','Title','Body']
df['Title'] = df['Title'].str.lower().str.split()
df['Body'] = df['Body'].str.lower().str.split()
stop = stopwords.words('english')
df['Title'].apply(lambda x: [item for item in x if item not in stop])
df['Body'].apply(lambda x: [item for item in x if item not in stop])
df.to_csv("test10out.csv")
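One likely cause, sketched under assumptions: `Series.apply` returns a new Series and does not modify the column in place, so the two filtered results in the code above are discarded before `to_csv` runs. The sketch below assigns the result back to the column. The sample rows and the short `stop` set are made up for illustration so that it runs without the NLTK corpus; with NLTK data installed, `set(stopwords.words('english'))` would presumably take its place.

```python
import pandas as pd

# Hypothetical stand-in for the NLTK list so the sketch runs without
# downloaded corpus data; a set also makes membership checks fast.
stop = {"the", "is", "a", "of", "and", "to", "in"}

# Made-up sample rows with the same three columns as the CSV in the question.
df = pd.DataFrame({
    "Id": [1, 2],
    "Title": ["The Title of a Post", "Pandas and NLTK"],
    "Body": ["This is the body", "How to remove stop words in a column"],
})

for col in ["Title", "Body"]:
    # Lowercase and split each cell into a list of words.
    tokens = df[col].str.lower().str.split()
    # apply() returns a new Series; assign it back so the change is kept.
    df[col] = tokens.apply(lambda words: [w for w in words if w not in stop])

print(df["Title"].tolist())
```

With the assignment in place, the DataFrame written by `to_csv` would contain the filtered word lists rather than the unfiltered ones.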