TypeError: argument of type 'float' is not iterable

Question


I am new to Python and TensorFlow. I recently started working through the TensorFlow examples and came across this tutorial: https://www.tensorflow.org/versions/r0.10/tutorials/wide_and_deep/index.html

Running it gives me the error TypeError: argument of type 'float' is not iterable, and I believe the problem is with the following line of code:

    df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)

(income_bracket is the label column of the census dataset; '>50K' is one of the possible label values and '<=50K' is the other. The dataset is read into df_train. The documentation explains the line above as follows: "Since the task is a binary classification problem, we'll construct a label column named 'label' whose value is 1 if the income is over 50K, and 0 otherwise.")

If someone could explain what exactly is happening and how to fix it, that would be great. I tried both Python 2.7 and Python 3.4, so I don't think the problem is with the language version. Also, if anyone knows of good tutorials for people new to TensorFlow and pandas, please share the links.

Full program:

    import pandas as pd
    import urllib
    import tempfile
    import tensorflow as tf

    gender = tf.contrib.layers.sparse_column_with_keys(column_name="gender", keys=["female", "male"])
    race = tf.contrib.layers.sparse_column_with_keys(column_name="race", keys=["Amer-Indian-Eskimo", "Asian-Pac-Islander", "Black", "Other", "White"])
    education = tf.contrib.layers.sparse_column_with_hash_bucket("education", hash_bucket_size=1000)
    marital_status = tf.contrib.layers.sparse_column_with_hash_bucket("marital_status", hash_bucket_size=100)
    relationship = tf.contrib.layers.sparse_column_with_hash_bucket("relationship", hash_bucket_size=100)
    workclass = tf.contrib.layers.sparse_column_with_hash_bucket("workclass", hash_bucket_size=100)
    occupation = tf.contrib.layers.sparse_column_with_hash_bucket("occupation", hash_bucket_size=1000)
    native_country = tf.contrib.layers.sparse_column_with_hash_bucket("native_country", hash_bucket_size=1000)

    age = tf.contrib.layers.real_valued_column("age")
    age_buckets = tf.contrib.layers.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
    education_num = tf.contrib.layers.real_valued_column("education_num")
    capital_gain = tf.contrib.layers.real_valued_column("capital_gain")
    capital_loss = tf.contrib.layers.real_valued_column("capital_loss")
    hours_per_week = tf.contrib.layers.real_valued_column("hours_per_week")

    wide_columns = [
        gender, native_country, education, occupation, workclass, marital_status, relationship, age_buckets,
        tf.contrib.layers.crossed_column([education, occupation], hash_bucket_size=int(1e4)),
        tf.contrib.layers.crossed_column([native_country, occupation], hash_bucket_size=int(1e4)),
        tf.contrib.layers.crossed_column([age_buckets, race, occupation], hash_bucket_size=int(1e6))]

    deep_columns = [
        tf.contrib.layers.embedding_column(workclass, dimension=8),
        tf.contrib.layers.embedding_column(education, dimension=8),
        tf.contrib.layers.embedding_column(marital_status, dimension=8),
        tf.contrib.layers.embedding_column(gender, dimension=8),
        tf.contrib.layers.embedding_column(relationship, dimension=8),
        tf.contrib.layers.embedding_column(race, dimension=8),
        tf.contrib.layers.embedding_column(native_country, dimension=8),
        tf.contrib.layers.embedding_column(occupation, dimension=8),
        age, education_num, capital_gain, capital_loss, hours_per_week]

    model_dir = tempfile.mkdtemp()
    m = tf.contrib.learn.DNNLinearCombinedClassifier(
        model_dir=model_dir,
        linear_feature_columns=wide_columns,
        dnn_feature_columns=deep_columns,
        dnn_hidden_units=[100, 50])

    COLUMNS = ["age", "workclass", "fnlwgt", "education", "education_num", "marital_status",
               "occupation", "relationship", "race", "gender", "capital_gain", "capital_loss",
               "hours_per_week", "native_country", "income_bracket"]
    LABEL_COLUMN = 'label'
    CATEGORICAL_COLUMNS = ["workclass", "education", "marital_status", "occupation",
                           "relationship", "race", "gender", "native_country"]
    CONTINUOUS_COLUMNS = ["age", "education_num", "capital_gain", "capital_loss", "hours_per_week"]

    train_file = tempfile.NamedTemporaryFile()
    test_file = tempfile.NamedTemporaryFile()
    urllib.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", train_file.name)
    urllib.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test", test_file.name)

    df_train = pd.read_csv(train_file, names=COLUMNS, skipinitialspace=True)
    df_test = pd.read_csv(test_file, names=COLUMNS, skipinitialspace=True, skiprows=1)
    df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
    df_test[LABEL_COLUMN] = (df_test['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)

    def input_fn(df):
        continuous_cols = {k: tf.constant(df[k].values) for k in CONTINUOUS_COLUMNS}
        categorical_cols = {k: tf.SparseTensor(
            indices=[[i, 0] for i in range(df[k].size)],
            values=df[k].values,
            shape=[df[k].size, 1])
            for k in CATEGORICAL_COLUMNS}
        feature_cols = dict(continuous_cols.items() + categorical_cols.items())
        label = tf.constant(df[LABEL_COLUMN].values)
        return feature_cols, label

    def train_input_fn():
        return input_fn(df_train)

    def eval_input_fn():
        return input_fn(df_test)

    m.fit(input_fn=train_input_fn, steps=200)
    results = m.evaluate(input_fn=eval_input_fn, steps=1)
    for key in sorted(results):
        print("%s: %s" % (key, results[key]))

Thanks.

PS: Full stack trace for the error:

    Traceback (most recent call last):
      File "/home/jaspreet/PycharmProjects/TicTacTensorFlow/census.py", line 73, in <module>
        df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
      File "/usr/lib/python2.7/dist-packages/pandas/core/series.py", line 2023, in apply
        mapped = lib.map_infer(values, f, convert=convert_dtype)
      File "inference.pyx", line 920, in pandas.lib.map_infer (pandas/lib.c:44780)
      File "/home/jaspreet/PycharmProjects/TicTacTensorFlow/census.py", line 73, in <lambda>
        df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
    TypeError: argument of type 'float' is not iterable


python pandas tensorflow

jaspreet kaur bassan Aug 30 '16 at 22:29


2 answers

If you inspect the test data, you will see that the first row has NaN in the income_bracket field. pandas represents NaN as a float, which is why the `in` test inside the lambda fails with this TypeError.
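To see why a single missing value is enough to trigger the error, here is a minimal sketch. The three-row Series and the helper `label_or_error` are made up for illustration and are not part of the tutorial; the NaN simply stands in for a row whose income_bracket field is empty.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the income_bracket column; the NaN stands in
# for a row whose field could not be filled in when the file was read.
income = pd.Series([np.nan, "<=50K", ">50K"])

def label_or_error(value):
    """Run the tutorial's membership test, reporting a TypeError instead of raising it."""
    try:
        return ">50K" in value
    except TypeError as exc:
        return "TypeError: %s" % exc

results = [label_or_error(v) for v in income]
for value, result in zip(income, results):
    print(repr(value), "->", result)
```

The string rows evaluate to True or False as expected, while the NaN row fails because `in` needs a container or string on its right-hand side and NaN is a plain float.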

I also confirmed that this is the only row containing NaN by running:

    ib = df_test["income_bracket"]
    t = type('12')
    for idx, i in enumerate(ib):
        if type(i) != t:
            print idx, type(i)

Result: 0 <type 'float'>
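The same check can probably be written more compactly with pandas itself. This is only a sketch on a made-up three-row frame (`df_demo` is a hypothetical stand-in for df_test), using `isnull()` to locate the missing entries:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_test; only the column of interest is included.
df_demo = pd.DataFrame({"income_bracket": [np.nan, "<=50K.", ">50K."]})

# isnull() marks missing entries; indexing the frame's index with that mask
# gives the positions of the offending rows.
missing_rows = df_demo.index[df_demo["income_bracket"].isnull()].tolist()
print(missing_rows)
```

On the real data you would replace `df_demo` with df_test (or df_train) and look at whichever row numbers come back.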

So you can simply skip that row when reading the file:

    df_test = pd.read_csv(file_test, names=COLUMNS, skipinitialspace=True, skiprows=1)
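If you would rather not depend on knowing which row is bad, a NaN-tolerant way of building the label might look like the sketch below. The frame `df_demo` is invented for illustration, and both options are assumptions on my part rather than anything the tutorial prescribes: option A drops rows whose label is missing, option B keeps them and treats a missing value as "not >50K".

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing label, mimicking the problematic row.
df_demo = pd.DataFrame({"income_bracket": [np.nan, "<=50K", ">50K", "<=50K"]})

# Option A: drop rows with a missing label, then apply the tutorial's lambda.
cleaned = df_demo.dropna(subset=["income_bracket"]).copy()
cleaned["label"] = cleaned["income_bracket"].apply(lambda x: ">50K" in x).astype(int)

# Option B: keep every row; na=False makes missing values count as "no match".
labels = df_demo["income_bracket"].str.contains(">50K", regex=False, na=False).astype(int)

print(cleaned["label"].tolist())
print(labels.tolist())
```

Option A is closer in spirit to skipping the row; option B silently assigns label 0 to missing rows, which may or may not be what you want for training.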


Microos Oct 24 '16 at 16:28


jaspreet kaur bassan · Accepted Answer · 2016-09-02T08:19:45+0000

The program works verbatim with the latest version of pandas, i.e. 0.18.1.
