Hadoop and Stata

Does anyone have experience using Stata with Hadoop? Stata 13 now has a Java Plugin API, so I think it should be simple to get them to play well together.

I am particularly interested in being able to process weblog data to get it into a form suitable for statistical analysis.

This question was originally asked on Statalist, but there was no answer, so I thought I would try it here, where the audience is more likely to have experience with this technology.

+7
hadoop hive apache-pig stata
2 answers

Demetrius

I think it would be easier to do something like this using the ELK Stack (http://www.elastic.co). Logstash (the middle layer) has several parsers/tokenizers/analyzers built on the Apache Lucene engine for cleaning and formatting log data, and it can push the resulting data into Elasticsearch, which provides an HTTP API that you can easily query to get the results (for example, use insheetjson, pass the HTTP GET request as a URL, and the data should import into Stata without much trouble).
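
As a rough illustration of the last step, here is a minimal Python sketch that flattens an Elasticsearch search response into CSV, which Stata can then read. It relies only on the standard response layout (documents under `hits.hits[]._source`). The function name `hits_to_csv`, the sample documents, and the field names `clientip`, `verb`, `response`, and `bytes` are assumptions made for the example, not something taken from the question.

```python
import csv
import io
import json


def hits_to_csv(response_text, fields):
    """Flatten an Elasticsearch search response (JSON text) into CSV text.

    Only the listed fields are kept; fields missing from a document
    are written as empty cells.
    """
    response = json.loads(response_text)
    hits = response.get("hits", {}).get("hits", [])
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for hit in hits:
        # Each hit carries the indexed document under "_source".
        writer.writerow(hit.get("_source", {}))
    return buffer.getvalue()


# Hypothetical response body, shaped like the output of a _search request.
sample = json.dumps({
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"clientip": "203.0.113.5", "verb": "GET",
                                     "response": 200, "bytes": 1024}},
            {"_id": "2", "_source": {"clientip": "198.51.100.7", "verb": "POST",
                                     "response": 404}},
        ]
    }
})

csv_text = hits_to_csv(sample, ["clientip", "verb", "response", "bytes"])
print(csv_text)
```

Saving `csv_text` to a file and loading it with Stata's `import delimited` would be one alternative to reading the JSON directly with insheetjson.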

I am trying to put together a program that uses the Jackson JSON library to provide more robust JSON I/O capabilities from within Stata, and I definitely wouldn't mind working with others on this.

Hope this helps, Billy

+1

I'll take an (un?)educated stab at this. From the point of view of the Java API, the caller seems to treat Stata essentially as a data store. If that is the case, then I would suggest that Stata fits into the Hadoop world as a database, accessed through its own InputFormat and OutputFormat. In your particular case, I would suggest writing a StataOutputFormat, which your reducer would use to write out the processed data. The only downside seems to be, as the comments you link to note, that Stata applications tend to be I/O bound, so I don't know how much using Hadoop will really help you, given that

  • you still have to write all this data, and
  • that write will be I/O bound regardless of whether you use Hadoop or not.
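
A real StataOutputFormat would be a Java class written against Hadoop's OutputFormat API and Stata's Java plugin API, and it is not sketched here. As a lighter-weight stand-in (a different technique from the one proposed above), the sketch below shows the reducer side of a Hadoop Streaming job in Python: it sums counts per key and emits tab-delimited text that Stata can import. The function names and the sample key/value lines are hypothetical, and the input is assumed to arrive sorted by key, as Hadoop Streaming delivers it to a reducer.

```python
import io
from itertools import groupby


def parse(lines):
    """Yield (key, value) pairs from tab-separated 'key<TAB>value' lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, _, value = line.partition("\t")
        yield key, value


def reduce_counts(lines):
    """Sum integer values per key; the input must already be sorted by key."""
    for key, group in groupby(parse(lines), key=lambda kv: kv[0]):
        yield key, sum(int(value) for _, value in group)


def run(stdin, stdout):
    """Write one 'key<TAB>total' line per key."""
    for key, total in reduce_counts(stdin):
        stdout.write(f"{key}\t{total}\n")


# Local demo with made-up mapper output. In a streaming job the script
# would instead call run(sys.stdin, sys.stdout).
mapper_output = io.StringIO("/index.html\t1\n/index.html\t1\n/about.html\t1\n")
reducer_output = io.StringIO()
run(mapper_output, reducer_output)
print(reducer_output.getvalue())
```

The reducer writes one row per key rather than one row per log line, so the file Stata has to read is the aggregated output and not the raw logs. Whether that eases the I/O concern raised above depends on how much the aggregation shrinks the data.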
0