Analyze WhatsApp Session Log

I am trying to write a parser for a WhatsApp conversation log. A minimal log file is at the end of the question.

There are two types of messages in this log: regular, where the syntax is

date time: Name: Message

As you can see in the sample log, Message can continue on a new line, and Name can contain a colon (:).

The second type of message is the "event" message, which can be of the following types:

date time: Name joined
date time: Name left
date time: Name was removed
date time: Name changed the subject to "GroupName"
date time: Name changed the group icon

I tried to write a regular expression, but ran into several difficulties: how to handle multi-line messages; how to parse the Name field (splitting on : does not work); how to build a regular expression that only recognizes messages from senders who are currently in the group; and finally, how to parse the event messages (for example, just checking whether the last word is joined is not a good idea, because a regular message can end with the same word).

How can I parse such a log file and move everything to a dictionary?

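Ideally, the result would be a dict keyed by sender, where each sender has an "Events" list (joined, left, etc.) and a "Messages" list. For example: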

>>> datab[Sender1]['Events']
[('Joined', date1, time1), ('Left', date2, time2)]

>>> datab[Sender2]['Messages']
[(date1, time1, Message1), (date2, time2, Message2)]


29/03/14 15:48:05: John Smith changed the subject to "Test"

29/03/14 16:10:39: John Smith joined

29/03/14 16:10:40: Person:2 joined

29/03/14 16:10:40: John Smith: Hello!

29/03/14 16:11:40: Person:2: some random words,

29/03/14 16:12:40: Person3 joined

29/03/14 16:13:40: John Smith: Hello!Test message with newline
Another line of the same message
Another line.

29/03/14 16:14:43: Person:2: Test message using as last word joined

29/03/14 16:15:57: Person3 left

29/03/14 16:17:16: Person3 joined

29/03/14 16:18:21: Person:2 changed the group icon

29/03/14 16:19:16: Person3 was removed 

29/03/14 16:20:43: Person:2: Test message using as last word left

You can use this pattern:

(?P<datetime>\d{2}\/\d{2}\/\d{2} \d{2}:\d{2}:\d{2}): (?P<name>\w+(?::\s*\w+)*|[\w\s]+?)(?:\s+(?P<action>joined|left|was removed|changed the (?:subject to "\w+"|group icon))|:\s(?P<message>(?:.+|\n(?!\n))+))

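Note that this relies on entries being separated by a blank line, as in your sample: the lookahead after \n lets a message continue across a single line break but stops it at an empty line.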
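To get from the pattern to the requested dict, one option is to iterate over the matches with re.finditer and sort them into events and messages. The following is only a minimal sketch, assuming entries are separated by blank lines as in the sample log. The names LOG_PATTERN, SAMPLE_LOG and parse_log are made up for illustration, and the action text is stored as matched (for example 'joined') rather than capitalized.

```python
import re
from collections import defaultdict

# The pattern from the answer, split into raw-string pieces for readability.
LOG_PATTERN = re.compile(
    r'(?P<datetime>\d{2}\/\d{2}\/\d{2} \d{2}:\d{2}:\d{2}): '
    r'(?P<name>\w+(?::\s*\w+)*|[\w\s]+?)'
    r'(?:\s+(?P<action>joined|left|was removed|'
    r'changed the (?:subject to "\w+"|group icon))'
    r'|:\s(?P<message>(?:.+|\n(?!\n))+))'
)

# A shortened copy of the sample log from the question.
SAMPLE_LOG = '''29/03/14 15:48:05: John Smith changed the subject to "Test"

29/03/14 16:10:39: John Smith joined

29/03/14 16:10:40: Person:2 joined

29/03/14 16:10:40: John Smith: Hello!

29/03/14 16:13:40: John Smith: Hello!Test message with newline
Another line of the same message

29/03/14 16:14:43: Person:2: Test message using as last word joined

29/03/14 16:15:57: Person3 left
'''


def parse_log(text):
    """Group events and messages by sender name."""
    datab = defaultdict(lambda: {'Events': [], 'Messages': []})
    for match in LOG_PATTERN.finditer(text):
        date, time = match.group('datetime').split(' ')
        sender = match.group('name')
        action = match.group('action')
        if action is not None:
            # Event line: only the action group took part in the match.
            datab[sender]['Events'].append((action, date, time))
        else:
            # Regular message; drop a trailing newline at the end of input.
            message = match.group('message').rstrip('\n')
            datab[sender]['Messages'].append((date, time, message))
    return dict(datab)


datab = parse_log(SAMPLE_LOG)
print(datab['John Smith']['Events'])
print(datab['Person:2']['Messages'])
```

A defaultdict keeps the loop short because a sender's 'Events' and 'Messages' lists are created the first time that sender appears.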


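The answer from @Casimir dates from 2014. WhatsApp has changed its export format since then, so here is an updated pattern (timestamp format, separator, etc.).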

(?<datetime>\d{1,2}\/\d{1,2}\/\d{1,4}, \d{1,2}:\d{1,2}( (?i)[ap]m)*) - (?<name>.*(?::\s*\w+)*|[\w\s]+?)(?:\s+(?<action>joined|left|was removed|changed the (?:subject to "\w+"|group icon))|:\s(?<message>(?:.+|\n(?!\d{1,2}\/\d{1,2}\/\d{1,4}, \d{1,2}:\d{1,2}( (?i)[ap]m)*))+))
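As written, this pattern will not compile with Python's re module: named groups have to be spelled (?P<name>...), and from Python 3.11 an inline (?i) is rejected unless it is at the start of the pattern. Below is a hedged Python adaptation rather than a drop-in copy. It spells the AM/PM part as [AaPp][Mm], makes that suffix a single optional group, and changes the name group to a lazy .+? so that a message ending in a word like joined is not taken for an event. The sample lines and the names TIMESTAMP, NEW_PATTERN and parse_new_log are assumptions for illustration, not taken from a real export.

```python
import re
from collections import defaultdict

# Timestamp of the newer layout, e.g. "3/29/14, 4:10 PM" or "29/03/14, 16:10".
TIMESTAMP = r'\d{1,2}\/\d{1,2}\/\d{1,4}, \d{1,2}:\d{1,2}(?: [AaPp][Mm])?'

NEW_PATTERN = re.compile(''.join([
    r'(?P<datetime>', TIMESTAMP, r') - ',
    r'(?P<name>.+?)',  # lazy: stop at the first point where the rest matches
    r'(?:\s+(?P<action>joined|left|was removed|',
    r'changed the (?:subject to "\w+"|group icon))',
    # A message continues over a newline unless a new timestamp follows it.
    r'|:\s(?P<message>(?:.+|\n(?!', TIMESTAMP, r'))+))',
]))

# Hypothetical lines in the newer layout, without blank lines between entries.
SAMPLE_NEW_LOG = (
    '3/29/14, 4:10 PM - John Smith joined\n'
    '3/29/14, 4:11 PM - Person:2: some random words,\n'
    'second line of the same message\n'
    '3/29/14, 4:14 PM - Person:2: Test message using as last word joined\n'
    '3/29/14, 4:15 PM - Person3 left\n'
)


def parse_new_log(text):
    """Group events and messages by sender name."""
    datab = defaultdict(lambda: {'Events': [], 'Messages': []})
    for match in NEW_PATTERN.finditer(text):
        date, time = match.group('datetime').split(', ', 1)
        sender = match.group('name')
        action = match.group('action')
        if action is not None:
            datab[sender]['Events'].append((action, date, time))
        else:
            message = match.group('message').rstrip('\n')
            datab[sender]['Messages'].append((date, time, message))
    return dict(datab)


new_datab = parse_new_log(SAMPLE_NEW_LOG)
print(new_datab['Person:2']['Messages'])
```

Because the continuation lookahead tests for the next timestamp instead of a blank line, this variant does not need empty lines between entries.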