I am working with the Enron email dataset and I am trying to delete the email addresses that do not contain "@enron.com" (i.e. I would like to keep only Enron addresses). When I tried to delete the addresses without "@enron.com", some of them were simply skipped for some reason. Below is a small graph whose vertices are email addresses, in GML format:
```
Creator "igraph version 0.7 Sun Mar 29 20:15:45 2015"
Version 1
graph
[
  directed 1
  node [ id 0 label "csutter@enron.com" ]
  node [ id 1 label "steve_williams@eogresources.com" ]
  node [ id 2 label "kutner.stephen@enron.com" ]
  node [ id 3 label "igsinc@ix.netcom" ]
  node [ id 4 label "dbn@felesky.com" ]
  node [ id 5 label "cheryltd@tbardranch.com" ]
  node [ id 6 label "slover.eric@enron.com" ]
  node [ id 7 label "alkeister@yahoo.com" ]
  node [ id 8 label "econnors@mail.mainland.cc.tx.us" ]
  node [ id 9 label "jafry@hotmail.com" ]
  edge [ source 5 target 5 weight 1 ]
]
```
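As a quick sanity check on the expected result, here is a minimal plain-Python sketch over the ten labels in the graph above. The labels are copied by hand, so it assumes they match the file exactly; it just counts the addresses that lack "@enron.com".

```python
# Labels copied by hand from the GML graph (assumed to match the file exactly).
labels = [
    "csutter@enron.com",
    "steve_williams@eogresources.com",
    "kutner.stephen@enron.com",
    "igsinc@ix.netcom",
    "dbn@felesky.com",
    "cheryltd@tbardranch.com",
    "slover.eric@enron.com",
    "alkeister@yahoo.com",
    "econnors@mail.mainland.cc.tx.us",
    "jafry@hotmail.com",
]

# Every label without "@enron.com" is a vertex that should be deleted.
non_enron = [label for label in labels if "@enron.com" not in label]
print(len(non_enron))  # 7
```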
My code is:
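```python
import igraph as ig

G = ig.read("enron_email_filtered.gml")

# Walk the vertices and delete every one whose label lacks "@enron.com".
for v in G.vs:
    print(v['label'])
    if '@enron.com' not in v['label']:
        G.delete_vertices(v.index)
        print('Deleted')
```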
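A minimal plain-Python sketch of what this loop does, with a list standing in for `G.vs`. It assumes that the vertex iterator steps forward by index over the shrinking graph; the helper name `delete_while_iterating` is hypothetical. Each deletion shifts the later items down by one while the index still advances, so the item right after a deleted one is never examined.

```python
def delete_while_iterating(labels):
    """Mimic the loop on a plain list: advance an index and delete in place,
    the way the igraph loop deletes by v.index (hypothetical helper)."""
    labels = list(labels)  # work on a copy
    deleted = []
    i = 0
    while i < len(labels):  # the length is re-checked as the list shrinks
        if "@enron.com" not in labels[i]:
            deleted.append(labels.pop(i))  # later items shift down by one...
        i += 1  # ...but the index still advances, so one item is skipped
    return labels, deleted


labels = [
    "csutter@enron.com", "steve_williams@eogresources.com",
    "kutner.stephen@enron.com", "igsinc@ix.netcom", "dbn@felesky.com",
    "cheryltd@tbardranch.com", "slover.eric@enron.com", "alkeister@yahoo.com",
    "econnors@mail.mainland.cc.tx.us", "jafry@hotmail.com",
]

remaining, deleted = delete_while_iterating(labels)
print(len(deleted))  # 5
```

In this simulation `dbn@felesky.com` and `econnors@mail.mainland.cc.tx.us` survive because each sits right after an address that was just removed.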
In this graph, 7 addresses should be deleted. However, the code above deletes only 5 of them.
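One common way to avoid the skipping, sketched here on a plain list rather than on the igraph graph (the helper name `keep_enron_only` is hypothetical): decide what to delete in a first pass without mutating anything, then delete in a separate step.

```python
def keep_enron_only(labels):
    """Return (kept labels, indices that were deleted); hypothetical helper."""
    # First pass: collect the indices to delete without changing the list.
    to_delete = [i for i, label in enumerate(labels) if "@enron.com" not in label]
    # Second pass: delete from the highest index down, so earlier deletions
    # never shift the positions of items that are still waiting to be removed.
    kept = list(labels)
    for i in sorted(to_delete, reverse=True):
        del kept[i]
    return kept, to_delete


labels = [
    "csutter@enron.com", "steve_williams@eogresources.com",
    "kutner.stephen@enron.com", "igsinc@ix.netcom", "dbn@felesky.com",
    "cheryltd@tbardranch.com", "slover.eric@enron.com", "alkeister@yahoo.com",
    "econnors@mail.mainland.cc.tx.us", "jafry@hotmail.com",
]

kept, to_delete = keep_enron_only(labels)
print(kept)  # ['csutter@enron.com', 'kutner.stephen@enron.com', 'slover.eric@enron.com']
```

With python-igraph the same two-pass idea would be to build `to_delete = [v.index for v in G.vs if '@enron.com' not in v['label']]` and then call `G.delete_vertices(to_delete)` once, since `delete_vertices` accepts a list of vertex IDs.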
python graph igraph graph-modeling-language
user1894963