Automated journalism

A web application called StatSheet received funding today (August 4, 2010):

http://techcrunch.com/2010/08/04/former-crunchies-finalist-statsheet-recieves-1-3-million-in-series-a/

They are engaged in "automated journalism" - they use computers to write reports on sports games based on the games' statistics.

http://www.guardian.co.uk/media/pda/2010/mar/30/digital-media-algorithms-reporting-journalism

Does anyone know what approaches or algorithms are used for this, and how it could be reproduced?

1 answer

Details on such projects are a bit scarce, but it looks like the Stats Monkey baseball summarizer consists of:

  • Statistical model. They build a model of how baseball games usually unfold, most likely by looking at how certain variables (for example, runs, at-bats, etc.) change over the course of a game or differ from what you would expect to see (for example, a no-name team outscoring a heavily favored one). How well a given game fits, or fails to fit, this model gives them an idea of what might be interesting about it (for example, key plays or players).

  • Text generation. Given a library of pre-written narrative arcs (such as a back-and-forth game, a come-from-behind victory, etc.), they use the “interesting information” from the game model to build a summary of the game. I'm not sure, but it looks like they use a decision tree, driven by information from the model, to select one of these arcs.

  • Other glue. This is not mentioned in their write-up, but I would guess that there are quite a few hard-coded rules that “stick together” the main narrative arcs into a single cohesive story.
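
To make the three pieces above more concrete, here is a rough Python sketch of how such a pipeline might fit together. It is not Stats Monkey's or StatSheet's actual code: the `Game` class, the feature names, the thresholds, the arc names and the template sentences are all assumptions made up for illustration, and simple hand-written features stand in for a real statistical model. It also assumes the game does not end in a tie.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import List


@dataclass
class Game:
    """Hypothetical input: runs scored per inning by each team."""
    home: str
    away: str
    home_runs: List[int]
    away_runs: List[int]


def extract_features(game):
    """Stand-in for the statistical model: derive a few 'interesting' facts."""
    home_total = list(accumulate(game.home_runs))
    away_total = list(accumulate(game.away_runs))
    # Home lead after each inning (negative when the away team is ahead).
    diffs = [h - a for h, a in zip(home_total, away_total)]
    home_won = diffs[-1] > 0  # assumes no tie at the end
    # Count how often the lead passed from one team to the other, ignoring ties.
    leaders = [d > 0 for d in diffs if d != 0]
    lead_changes = sum(1 for prev, cur in zip(leaders, leaders[1:]) if prev != cur)
    # Largest deficit the eventual winner faced at the end of any inning.
    winner_diffs = diffs if home_won else [-d for d in diffs]
    max_deficit = max(0, -min(winner_diffs))
    return {
        "winner": game.home if home_won else game.away,
        "loser": game.away if home_won else game.home,
        "winner_score": max(home_total[-1], away_total[-1]),
        "loser_score": min(home_total[-1], away_total[-1]),
        "lead_changes": lead_changes,
        "max_deficit": max_deficit,
        "margin": abs(diffs[-1]),
    }


def choose_arc(features):
    """A tiny hand-written decision tree; the thresholds are arbitrary guesses."""
    if features["lead_changes"] >= 3:
        return "back_and_forth"
    if features["max_deficit"] >= 3:
        return "come_from_behind"
    if features["margin"] >= 5:
        return "blowout"
    return "routine"


# The "glue": one hard-coded sentence template per narrative arc.
TEMPLATES = {
    "back_and_forth": "{winner} edged {loser} {winner_score}-{loser_score} "
                      "in a game that saw {lead_changes} lead changes.",
    "come_from_behind": "{winner} rallied from {max_deficit} runs down "
                        "to beat {loser} {winner_score}-{loser_score}.",
    "blowout": "{winner} routed {loser} {winner_score}-{loser_score}.",
    "routine": "{winner} beat {loser} {winner_score}-{loser_score}.",
}


def write_summary(game):
    """Run the whole pipeline: features, then arc selection, then template."""
    features = extract_features(game)
    return TEMPLATES[choose_arc(features)].format(**features)


if __name__ == "__main__":
    # Made-up teams and line score, purely for demonstration.
    game = Game("Hawks", "Owls",
                home_runs=[0, 0, 0, 0, 0, 2, 0, 3, 1],
                away_runs=[3, 0, 1, 0, 0, 0, 0, 0, 0])
    print(write_summary(game))
```

Run on the made-up line score at the bottom, this prints "Hawks rallied from 4 runs down to beat Owls 6-4." A real system would presumably replace `extract_features` with a model learned from many games and use far more arcs and templates, but the overall shape (features, then arc selection, then templated text) matches the three components described above.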

The Stats Monkey authors have done a significant amount of research in related fields, such as web summarization and automatic content aggregation and generation. Here are some articles that might be interesting:
