This program:
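
from __future__ import print_function  # same print() behaviour on Python 2.7 and 3.x
import re

tests = (
    '(c) 2012 DC Comics',
    'DC Comics. 2012',
    'DC Comics, (c) 2012.',
    'DC Comics, Copyright 2012',
    '(c) 2012 10 DC Comics',
    '10 DC Comics. 2012',
    '10 DC Comics , (c) 2012.',
    '10 DC Comics, Copyright 2012',
    'Warner Bros, 2011',
    'Stanford and Sons, Ltd. Inc. (C) 2011. All Rights Reserved.',
)

# Verbose pattern: whitespace and comments inside it are ignored.
PATTERN = r'''
    (?P<lead> (?: \S .*? \S )?? )         # shortest possible leading text, may be empty
    [\s.,]*                               # separators before the year
    (?: (?: \( c \) | copyright ) \s+ )?  # optional "(c)" or "Copyright" marker
    (?P<year> (?:19|20)\d\d )             # four-digit year, 1900 to 2099
    [\s.,]?                               # at most one separator after the year
'''

for text in tests:
    print("<", text)
    # Rewrite only the first match as "<year>. <lead>", case-insensitively.
    output = re.sub(PATTERN, r"\g<year>. \g<lead>", text,
                    count=1, flags=re.I | re.X)
    print(">", output, "\n")
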
When run under Python 2.7 or 3.2, it produces this output:
< (c) 2012 DC Comics
> 2012. DC Comics

< DC Comics. 2012
> 2012. DC Comics

< DC Comics, (c) 2012.
> 2012. DC Comics

< DC Comics, Copyright 2012
> 2012. DC Comics

< (c) 2012 10 DC Comics
> 2012. 10 DC Comics

< 10 DC Comics. 2012
> 2012. 10 DC Comics

< 10 DC Comics , (c) 2012.
> 2012. 10 DC Comics

< 10 DC Comics, Copyright 2012
> 2012. 10 DC Comics

< Warner Bros, 2011
> 2011. Warner Bros

< Stanford and Sons, Ltd. Inc. (C) 2011. All Rights Reserved.
> 2011. Stanford and Sons, Ltd. Inc All Rights Reserved.
Most likely, this is what you were looking for.
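If you need this in more than one place, one option (a sketch, not the only way to do it) is to compile the same pattern once and wrap it in a function. The names `YEAR_PATTERN`, `year_first` and `parts` are hypothetical, chosen only for this example. `parts` uses `search()` and `groupdict()` to show what the `lead` and `year` groups capture, which can help when a new input format doesn't come out the way you expect.

```python
import re

# The pattern from the program above, compiled once with IGNORECASE and VERBOSE.
# YEAR_PATTERN, year_first and parts are hypothetical names for this sketch.
YEAR_PATTERN = re.compile(r'''
    (?P<lead> (?: \S .*? \S )?? )         # shortest possible leading text, may be empty
    [\s.,]*                               # separators before the year
    (?: (?: \( c \) | copyright ) \s+ )?  # optional "(c)" or "Copyright" marker
    (?P<year> (?:19|20)\d\d )             # four-digit year, 1900 to 2099
    [\s.,]?                               # at most one separator after the year
''', re.I | re.X)


def year_first(text):
    """Move the first 19xx/20xx year to the front as '<year>. <lead>'."""
    return YEAR_PATTERN.sub(r"\g<year>. \g<lead>", text, count=1)


def parts(text):
    """Return the named groups of the first match, or None if no year is found."""
    m = YEAR_PATTERN.search(text)
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(year_first('DC Comics, (c) 2012.'))
    print(parts('DC Comics, (c) 2012.'))
    print(year_first('No year here'))
```

For example, `parts('DC Comics, (c) 2012.')` gives `lead` = `'DC Comics'` and `year` = `'2012'`: because the `lead` group is lazy (`??` on the group and `.*?` inside it), it stops at the shortest text that still lets the separators, the optional marker and a year follow. A string with no 19xx/20xx year is returned unchanged by `sub()`.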