Streaming API with languages

Is there any way to get only English tweets using the Twitter streaming API? It seems that about 60-70 percent of the results from “sample” or “filter” are non-English tweets.

thanks

Joel

+6
twitter
5 answers

I did not find a good solution for this, so I worked around it as follows:

1) Filter on the lang attribute being equal to "en".

2) I found that tweets in several non-English languages still come through labeled as English. So I downloaded Spanish, Dutch and Indonesian word lists and counted the non-English words in each tweet. If a tweet had more than one, I classified it as non-English.

3) I think I need to filter out Portuguese as well; I still need to look into this.
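A minimal sketch of steps 1 and 2, assuming each tweet has already been decoded from JSON into a dict with "lang" and "text" keys. The tiny word lists and the names looks_english and max_foreign are made up for illustration; the approach described above used full downloaded word lists.

```python
import re

# Illustrative stand-ins for the downloaded word lists (assumption:
# real lists would be much larger and loaded from files).
NON_ENGLISH_WORDS = {
    "es": {"que", "pero", "porque", "gracias", "donde"},
    "nl": {"niet", "maar", "ook", "geen", "omdat"},
    "id": {"yang", "tidak", "dengan", "untuk", "saya"},
}
ALL_FOREIGN = set().union(*NON_ENGLISH_WORDS.values())

# Runs of letters only (Unicode-aware); digits and underscores are excluded.
WORD_RE = re.compile(r"[^\W\d_]+")


def looks_english(tweet, max_foreign=1):
    """Return True if the tweet is tagged "en" and contains at most
    max_foreign words from the non-English word lists."""
    # Step 1: trust the lang attribute as a first filter.
    if tweet.get("lang") != "en":
        return False
    # Step 2: count words that appear in any non-English list.
    words = WORD_RE.findall(tweet.get("text", "").lower())
    foreign = sum(1 for w in words if w in ALL_FOREIGN)
    return foreign <= max_foreign
```

With the default threshold, a tweet with a single listed word still passes, and a second one gets it rejected, which matches the "more than one" rule above.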

+7

Filtering out only the English-language messages from the Twitter stream is an active area of research. You can use an off-the-shelf language identification system to process the stream locally and select only the messages in English. One such system is langid.py. Full disclosure: I am the author of langid.py.

Another system I know of is ldig by Nakatani Shuyo. I have not yet had the opportunity to experiment with it, but it is specifically designed to identify the language of Twitter messages.
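As a rough sketch of the local-filtering idea with langid.py: langid.classify(text) returns a (language code, score) pair, so keeping English messages comes down to checking the first element. The helper name english_only and its classify argument are hypothetical; the argument exists only so a different classifier (or a stub) can be swapped in.

```python
def english_only(texts, classify=None):
    """Yield only the texts that the classifier identifies as English.

    classify is any callable returning a (language_code, score) pair.
    It defaults to langid.classify, which requires the third-party
    langid package (pip install langid).
    """
    if classify is None:
        import langid  # imported lazily so a custom classifier needs no install
        classify = langid.classify
    for text in texts:
        lang, _score = classify(text)
        if lang == "en":
            yield text
```

Usage would look like list(english_only(tweet_texts)). Short, informal tweets are hard for any language identifier, so some misclassification should be expected.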

+6

Twitter will soon release a new (or updated) attribute for exactly this purpose! See their blog post, Introducing New Metadata for Tweets:

The new lang attribute indicates the language in which the Tweet text was written, identified by Twitter's machine language detection algorithms.

At the time of writing, the lang attribute and the language parameter have not yet appeared. Check the API Change Calendar to see when they plan to release them (currently only “2013” is indicated).

Update 3/30/2013:

The lang attribute was added to the Streaming API on March 26, 2013. It also became available in the REST API on March 6, 2013.

+3

In the Twitter streaming API, language is now a request parameter:

https://dev.twitter.com/docs/streaming-apis/parameters#language

So, for English, you add 'language=en' to the query string.
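For illustration, the query string could be assembled like this. The endpoint URL and the track parameter are assumptions based on the v1.1 statuses/filter endpoint of that era, build_filter_url is a made-up helper name, and the OAuth signing that a real request needs is left out.

```python
from urllib.parse import urlencode

# Assumed v1.1 streaming endpoint; check the current docs before relying on it.
STREAM_FILTER_URL = "https://stream.twitter.com/1.1/statuses/filter.json"


def build_filter_url(track, language="en"):
    """Return the filter URL with the track keywords and the language
    parameter encoded into the query string."""
    query = urlencode({"track": track, "language": language})
    return f"{STREAM_FILTER_URL}?{query}"
```

Using urlencode rather than string concatenation keeps spaces and other special characters in the track keywords properly escaped.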

+1

Twitter just shipped it! See the API calendar:

https://dev.twitter.com/calendar

On March 26, 2013, the lang attribute and the language parameter were rolled out to the streaming API, as announced in the blog post.

The Twitter API rocks!!

0