How to strip the color codes used by mIRC users?

Question


I am writing an IRC bot in Python using irclib, and I am trying to log messages on certain channels. The problem is that some mIRC users and some bots write using color codes.
Any idea how I can remove these parts and leave only a clean, plain ASCII text message?


python irc

daniels Jun 09 '09 at 14:55


7 answers

The second and subsequent answers are flawed because they look for digits after any of the control characters, not just after the color code character (\x03).

I improved on and combined all the posts, with the following results:

the reverse character is removed;
color codes are removed without leaving their digits behind in the text.

Solution:

    import re
    regex = re.compile(r"\x1f|\x02|\x12|\x0f|\x16|\x03(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE)
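A quick way to check the two properties listed above is to run the pattern on a couple of sample messages. This is only a sketch: the wrapper name `strip_mirc` and the sample strings are illustrative assumptions, not part of the answer.

```python
import re

# frederik's pattern, written as a raw string so that \d reaches the regex engine unchanged.
MIRC_CODES = re.compile(r"\x1f|\x02|\x12|\x0f|\x16|\x03(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE)

def strip_mirc(text):
    """Remove mIRC formatting characters and ^C colour codes from text (hypothetical helper)."""
    return MIRC_CODES.sub("", text)

# Digits right after ^C belong to the colour code and are removed ...
print(strip_mirc("\x0304,12red on blue\x03 done"))  # red on blue done
# ... but digits after other control characters are ordinary text and survive.
print(strip_mirc("\x02\x1612 apples\x0f"))  # 12 apples
```

Because the digit group is attached only to the `\x03` alternative, the "12" after bold and reverse is kept as text.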


frederik Aug 17 '10 at 15:21


    import re
    p = re.compile(r"\x03\d+(?:,\d+)?")
    text = p.sub('', text)


chaos Jun 09 '09 at 15:02


As I found this question helpful, I decided to contribute.

I added a few characters to the regex:

    import re
    regex = re.compile(r"\x1f|\x02|\x03|\x16|\x0f(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE)

\x16 removes the "reverse" character, and \x0f removes the reset (plain text) character, which switches off bold and other formatting.
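The flaw that frederik's answer points out can be reproduced with this pattern: the optional digit group is attached only to the last alternative, `\x0f`, so digits after `\x03` are left in the text while digits after `\x0f` are eaten. A small sketch with a made-up message, comparing it with the same characters but with the digit group moved onto `\x03`:

```python
import re

# Xorlev's pattern: the (?:\d...) group binds only to \x0f, the last alternative.
xorlev = re.compile(r"\x1f|\x02|\x03|\x16|\x0f(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE)
# Same characters, but the digit group now follows \x03.
fixed = re.compile(r"\x1f|\x02|\x16|\x0f|\x03(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE)

msg = "\x0304alert\x03 (\x0f42 users)"
print(xorlev.sub("", msg))  # 04alert ( users)
print(fixed.sub("", msg))   # alert (42 users)
```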


Xorlev Mar 16 '10 at 1:12


AutoDl-irssi had a very good one written in Perl; here it is in Python:

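    import re

    def stripMircColorCodes(line):
        line = re.sub(r"\x03\d\d?,\d\d?", "", line)  # colour code with a background
        line = re.sub(r"\x03\d\d?", "", line)        # colour code, foreground only
        line = re.sub(r"[\x01-\x1F]", "", line)      # any remaining control character
        return line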
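One side effect of the last substitution: the class `[\x01-\x1F]` deletes every C0 control character, which includes tab, line feed and carriage return, not only mIRC codes. If those need to survive, a narrower class works. This is a sketch under the assumption that tab, LF and CR are the only control characters worth keeping; `strip_mirc_keep_whitespace` is a hypothetical name.

```python
import re

def strip_mirc_keep_whitespace(line):
    # Same first two steps as stripMircColorCodes: colour codes with and without a background.
    line = re.sub(r"\x03\d\d?,\d\d?", "", line)
    line = re.sub(r"\x03\d\d?", "", line)
    # Assumption: keep tab (\x09), LF (\x0a) and CR (\x0d); drop every other C0 control.
    line = re.sub(r"[\x01-\x08\x0b\x0c\x0e-\x1f]", "", line)
    return line

print(repr(strip_mirc_keep_whitespace("\x02bold\x02\t\x0304,01red\x03\n")))  # 'bold\tred\n'
```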


sparks Mar 26 '15 at 13:30


I also had to add '\x0f', whatever its purpose is:

    import re
    regex = re.compile(r"\x0f|\x1f|\x02|\x03(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE)
    msg = regex.sub('', msg)


Loeschme Oct 20 '09 at 1:12


I know I wrote that I wanted to use a regex because it would be cleaner, but I also created a non-regex solution that works fine:

    def colourstrip(data):
        """Remove mIRC colour codes and simple formatting characters."""
        find = data.find('\x03')
        while find > -1:
            data = data[:find] + data[find+1:]  # drop the ^C itself
            # Up to two foreground digits follow ^C.
            digits = 0
            while digits < 2 and data[find:find+1].isdigit():
                data = data[:find] + data[find+1:]
                digits += 1
            # ",NN" is a background colour only after a foreground number
            # and only if a digit follows the comma.
            if digits and data[find:find+1] == ',' and data[find+1:find+2].isdigit():
                data = data[:find] + data[find+1:]  # drop the comma
                digits = 0
                while digits < 2 and data[find:find+1].isdigit():
                    data = data[:find] + data[find+1:]
                    digits += 1
            find = data.find('\x03')
        # Strip bold, reset, reverse, italic and underline characters.
        for code in ('\x02', '\x0f', '\x16', '\x1d', '\x1f'):
            data = data.replace(code, '')
        return data

    datastring = '\x0312,4This is colour \x032,4This is too\x03'
    print(colourstrip(datastring))

Thanks for helping everyone.


baudsmoke Apr 15 '15 at 8:19


Smerity · Accepted Answer · 2009-06-09T15:17:27+0000

Regular expressions are your cleanest bet, in my opinion. If you haven't used them before, this is a good resource. For the full details on Python's regex library, go here.

    import re
    regex = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE)

The regex searches for ^C (which is \x03 in ASCII; you can confirm this by running chr(3) at the Python interactive prompt), then optionally for one or two digits [0-9], which may in turn be followed by a comma and another one or two digits [0-9].

(?:...) makes a group non-capturing, so whatever it matched is not saved (we don't need it); ? means match 0 or 1 times; and {n,m} means match the previous item n to m times. Finally, \d matches a digit, [0-9] (for str patterns in Python 3 it also matches other Unicode decimal digits).
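To map that explanation onto the one-line pattern, the same regex can be written with `re.VERBOSE`, which ignores whitespace in the pattern and allows `#` comments, so each piece sits next to its description. A sketch; it should behave exactly like the compact version above.

```python
import re

# The accepted answer's pattern, spelled out with re.VERBOSE.
regex = re.compile(
    r"""
    \x03              # ^C, the mIRC colour character, chr(3)
    (?:               # optional non-capturing group for the colour numbers
        \d{1,2}       # foreground colour: one or two digits
        (?:,\d{1,2})? # optional background: a comma and one or two digits
    )?
    """,
    re.UNICODE | re.VERBOSE,
)

print(regex.sub("", "blabla \x035,12to be colored text and background\x03 blabla"))
# blabla to be colored text and background blabla
```

In Python 3, `str` patterns are Unicode-aware by default, so `re.UNICODE` is redundant there but harmless.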

You can decipher the rest using the links I referred to above.

    >>> regex.sub("", "blabla \x035,12to be colored text and background\x03 blabla")
    'blabla to be colored text and background blabla'


chaos' answer is similar, but it may end up consuming more than the maximum of two digits, and it also won't remove any loose ^C characters that may be lying around (such as the one that closes the color command).
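Both differences show up in a side-by-side run. This sketch uses made-up messages and assumes, as the answer does, that an mIRC colour number never has more than two digits, so in "\x03042021" only "04" is the colour.

```python
import re

chaos = re.compile(r"\x03\d+(?:,\d+)?")
accepted = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?")

# \d+ has no upper bound, so chaos' pattern also swallows the year that follows colour 04.
year = "\x03042021 was a good year"
print(repr(chaos.sub("", year)))     # ' was a good year'
print(repr(accepted.sub("", year)))  # '2021 was a good year'

# A bare ^C (the one that closes the colour) has no digits, so chaos' pattern leaves it.
closed = "\x0304red\x03 plain"
print(repr(chaos.sub("", closed)))     # 'red\x03 plain'
print(repr(accepted.sub("", closed)))  # 'red plain'
```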

