Creating adjacency matrix in python from csv dataset

Question

I have data that comes in a format like this:

    eventid  mnbr
    20       1
    26       1
    12       2
    14       2
    15       3
    14       3
    10       3
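If it helps, here is a minimal sketch of reading rows in this shape into Python as (eventid, mnbr) integer tuples with the standard csv module. It assumes a comma-separated file whose first row is the header eventid,mnbr. The actual delimiter is not visible in the sample, so pass delimiter=... to csv.reader if yours differs. The read_observations helper, the SAMPLE text and the data.csv filename are illustrative, not from the question.

```python
import csv
import io

# Hypothetical file contents mirroring the sample above; a real program would
# use open("data.csv", newline="") (hypothetical filename) instead of io.StringIO.
SAMPLE = """eventid,mnbr
20,1
26,1
12,2
14,2
15,3
14,3
10,3
"""

def read_observations(f):
    """Return a list of (eventid, mnbr) integer tuples from a CSV file object."""
    reader = csv.reader(f)
    next(reader)  # skip the header row
    return [(int(eventid), int(mnbr)) for eventid, mnbr in reader]

observations = read_observations(io.StringIO(SAMPLE))
print(observations)
```

The result is a plain list of tuples, which is the input format the accepted answer assumes.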

eventid is an event at which a participant was present. The data is panel data: as you can see, each participant attends several events, and several members can attend the same event. My goal is to create an adjacency matrix that shows:

    mnbr  1  2  3
    1     1  0  0
    2     0  1  1
    3     0  1  1

where there is a 1 when two members attend the same event. I was able to read the two columns of the CSV file into two separate 1D numpy arrays, but I am not sure how to proceed from there. What is the best way to create a matrix indexed by column 2 (the members), and how can I then use column 1 (the events) to populate its values? I know I did not post any code and I don't expect a complete solution, but I would be very grateful for an idea of how to approach the problem efficiently. I have about 3 million observations, so creating too many intermediate variables would be problematic. Thanks in advance.

I received a notification that my question is a potential duplicate; however, my problem is in processing the data, not in creating an adjacency matrix as such.
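One compact way to express this goal with the two numpy arrays is through an incidence matrix. This is a swapped-in technique, not the approach taken in the answer, and a sketch under the assumption that a dense members × events array fits in memory; the variable names are illustrative. B[m, e] is 1 when member m attended event e, and (B @ B.T)[i, j] counts the events members i and j share, so thresholding at zero gives the 0/1 matrix described in the question.

```python
import numpy as np

# The two columns from the sample, as 1D arrays.
eventid = np.array([20, 26, 12, 14, 15, 14, 10])
mnbr = np.array([1, 1, 2, 2, 3, 3, 3])

# Map the raw labels to 0-based row/column indexes (labels come back sorted).
members, member_idx = np.unique(mnbr, return_inverse=True)
events, event_idx = np.unique(eventid, return_inverse=True)

# Incidence matrix: B[m, e] = 1 if member m attended event e.
B = np.zeros((len(members), len(events)), dtype=np.int64)
B[member_idx, event_idx] = 1

# (B @ B.T)[i, j] = number of shared events; > 0 means members i and j are adjacent.
A = (B @ B.T > 0).astype(int)

print(members)  # row/column labels of A
print(A)
```

Row and column i of A correspond to members[i]. For the full 3 million rows a dense B would likely be too large; building B as a scipy.sparse matrix and taking the same product is the usual route, though that variant is not shown here.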

python numpy adjacency-matrix csv

thyde Apr 22 '15 at 4:48

1 answer

Arthur vaïsse · Accepted Answer · 2015-04-22T06:37:40+0000

Here is one solution. It does not give you the requested adjacency matrix directly, but it gives you what you need to build it yourself.

    # Assume every line of the input is stored as an (eventid, mnbr) tuple.
    observations = [(20, 1), (26, 1), (12, 2), (14, 2), (15, 3), (14, 3), (10, 3)]

    # Build an event link dictionary, i.e. map every event to all of its members.
    eventLinks = {}
    for (eventid, mnbr) in observations:
        # If this event has never been encountered, create a new entry for it.
        if eventid not in eventLinks:
            eventLinks[eventid] = []
        eventLinks[eventid].append(mnbr)

    # Collect the distinct members.
    mnbrs = set(mnbr for (eventid, mnbr) in observations)

    # Build a member link dictionary, mapping each member to the members linked to it.
    mnbrLinks = {mnbr: set() for mnbr in mnbrs}
    for mnbrList in eventLinks.values():
        # For each member of this event, add every member involved in the same event.
        for mnbr in mnbrList:
            mnbrLinks[mnbr].update(mnbrList)

    print(mnbrLinks)

Executing this code gives the following result:

    {1: {1}, 2: {2, 3}, 3: {2, 3}}

This is a dictionary that maps each mnbr to the set of mnbrs adjacent to it (including itself, since a member always shares its own events). It is actually an adjacency list, which is a compressed form of an adjacency matrix. You can expand it into the matrix you asked for by using the dictionary keys as row indexes and the values as column indexes.
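A minimal sketch of that expansion, starting from the printed dictionary. Sorting the keys and building a label-to-index dict (the names labels and index are illustrative, not from the answer) means member numbers do not have to run from 1 to n.

```python
# The adjacency list printed above.
mnbrLinks = {1: {1}, 2: {2, 3}, 3: {2, 3}}

# Assign each member label a 0-based index; sorting keeps rows/columns in label order.
labels = sorted(mnbrLinks)
index = {label: i for i, label in enumerate(labels)}

# Start from an all-zero n x n matrix and set a 1 for every (member, neighbour) pair.
n = len(labels)
matrix = [[0] * n for _ in range(n)]
for member, neighbours in mnbrLinks.items():
    for other in neighbours:
        matrix[index[member]][index[other]] = 1

print(labels)
print(matrix)
```

Row i and column i both correspond to labels[i], so the result reads the same way as the matrix in the question.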

Hope this helps. Arthur.

EDIT: I introduced the adjacency list approach so that you can implement your own adjacency matrix construction. But if your data is sparse, you should seriously consider using this data structure directly. See http://en.wikipedia.org/wiki/Adjacency_list
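As a sketch of keeping the adjacency list instead of a matrix, the same construction can be written with collections.defaultdict, and adjacency can be tested directly on the sets. Memory then grows with the number of linked pairs rather than with the square of the number of members. The are_adjacent helper is hypothetical, mirroring the areAdjacent method of the class in EDIT 2.

```python
from collections import defaultdict

observations = [(20, 1), (26, 1), (12, 2), (14, 2), (15, 3), (14, 3), (10, 3)]

# Group members by event in a single pass.
eventLinks = defaultdict(list)
for eventid, mnbr in observations:
    eventLinks[eventid].append(mnbr)

# Each member's neighbours are the union of the member lists of its events.
mnbrLinks = defaultdict(set)
for members in eventLinks.values():
    for mnbr in members:
        mnbrLinks[mnbr].update(members)

def are_adjacent(links, i, j):
    """Adjacency test on the list itself; .get avoids creating entries for unknown members."""
    return j in links.get(i, ())

print(dict(mnbrLinks))
print(are_adjacent(mnbrLinks, 1, 2), are_adjacent(mnbrLinks, 2, 3))
```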

EDIT 2: added code to convert the adjacency list into a small "smart" adjacency matrix class:

    adjacencyList = {1: {1}, 2: {2, 3}, 3: {2, 3}}

    class AdjacencyMatrix:
        def __init__(self, adjacencyList, label=""):
            """Create an adjacency matrix from an adjacency list.

            Graph vertices are assumed to be labeled with numbers from 1 to n.
            """
            self.matrix = []
            self.label = label
            n = len(adjacencyList)
            # Create an n x n matrix of zeros.
            for i in range(n):
                self.matrix.append([0] * n)
            # Set a 1 for every (vertex, neighbour) pair; labels are 1-based, indexes 0-based.
            for key in adjacencyList:
                for value in adjacencyList[key]:
                    self[key - 1][value - 1] = 1

        def __str__(self):
            # Returning self.__repr__() is another possibility that just prints the list of lists;
            # see the Python docs on the difference between __str__ and __repr__.
            # Header line: the label followed by the column numbers.
            string = self.label + "\t"
            for i in range(len(self.matrix)):
                string += str(i + 1) + "\t"
            string += "\n"
            # One line per matrix row: the row number followed by the row's values.
            for row in range(len(self.matrix)):
                string += str(row + 1) + "\t"
                for column in range(len(self.matrix)):
                    string += str(self[row][column]) + "\t"
                string += "\n"
            return string

        def __repr__(self):
            return str(self.matrix)

        def __getitem__(self, index):
            """Return row `index`, which allows matrix[i][j] access."""
            return self.matrix[index]

        def __setitem__(self, index, item):
            """Replace row `index`, i.e. matrix[i] = row."""
            self.matrix[index] = item

        def areAdjacent(self, i, j):
            return self[i - 1][j - 1] == 1

    m = AdjacencyMatrix(adjacencyList, label="mbr")
    print(m)
    print("m.areAdjacent(1,2) :", m.areAdjacent(1, 2))
    print("m.areAdjacent(2,3) :", m.areAdjacent(2, 3))

This code gives the following result:

    mbr  1  2  3
    1    1  0  0
    2    0  1  1
    3    0  1  1

    m.areAdjacent(1,2) : False
    m.areAdjacent(2,3) : True
