I'm currently trying to process several data feeds that I have no control over, using regular expressions in C# to extract information.
The feed provider pulls the underlying row data from their database (for example, product name, price, etc.) and then formats that data into English text strings. In each line, part of the text is static and repeated, and part is generated dynamically from the database.
For example:
Panasonic TV with a FREE Blu-ray player
Sony TV with free DVD player + Box Office DVD
Kenwood Hi-Fi Unit with $20 Amazon MP3 Voucher
Thus, the format in this case is: PRODUCT with FREEGIFT.
PRODUCT and FREEGIFT are the dynamic parts of each line, and the text “with” is static. Each feed has about 2,000 lines.
Creating a regular expression to extract the dynamic parts is trivial.
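To make that concrete, here is a minimal sketch of the kind of regex I mean for the "PRODUCT with FREEGIFT" layout. The class name, the pattern, and the group names `product` and `gift` are illustrative only, not my production code, and the sketch assumes the product name itself never contains the word "with".

```csharp
using System;
using System.Text.RegularExpressions;

class FeedLineDemo
{
    // Illustrative pattern: lazily capture everything before the first
    // " with " as the product, and the rest of the line as the gift.
    static readonly Regex LinePattern =
        new Regex(@"^(?<product>.+?) with (?<gift>.+)$");

    static void Main()
    {
        string[] lines =
        {
            "Panasonic TV with a FREE Blu-ray player",
            "Sony TV with free DVD player + Box Office DVD",
            "Kenwood Hi-Fi Unit with $20 Amazon MP3 Voucher"
        };

        foreach (string line in lines)
        {
            Match m = LinePattern.Match(line);
            if (!m.Success)
            {
                // A line that does not fit the expected layout.
                Console.WriteLine("No match: " + line);
                continue;
            }

            Console.WriteLine(m.Groups["product"].Value + " | " + m.Groups["gift"].Value);
        }
    }
}
```

The static text is baked into the pattern, which is exactly why every change to the wording breaks it.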
The problem is that the marketing people who control the data feed keep changing the structure of the static text, usually once every two weeks, so this week I might get:
New Panasonic TV and FREE Blu-ray Player if you order today
New Sony TV and free DVD player + Box Office DVD if you order today
New Kenwood Hi-Fi unit and $20 Amazon MP3 Voucher if you order today
And next week it will probably be something else, so I have to keep changing my regexes ...
How would you handle this?