Scalable plotting of data from Elasticsearch

Suppose I have a set of documents stored in an Elasticsearch index, where each document has the following (simplified) form:

{
  "timestamp": N,
  "val": X
}

where N is a long integer representing a Unix timestamp, and X is some float.

My goal is to plot the behavior of val over time; in other words, to get a graph where the x axis is time (timestamp) and the y axis is val.

Small-to-medium number of documents

If the number of documents stored in the index is moderate, then with the help of Python I could do the following. Scan the documents using, for example, the scan helper, and build a list of JSON documents. Then convert the list to a pandas.DataFrame and sort its rows by timestamp. Finally, I can easily plot the data as described above. Here is a minimal example:

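import pandas
from elasticsearch.helpers import scan

docs = scan(
            es, # instance of es-client, created elsewhere
            index = 'myIndex',
            doc_type = 'myDocType') # mapping types are gone in Elasticsearch 8; drop this argument there
docsList = []
for doc in docs:
    # each hit wraps the stored document under '_source'
    docsList.append(doc['_source'])
dfDocs = pandas.DataFrame(docsList)
# DataFrame.sort was removed from pandas; sort_values replaces it
dfDocsSorted = dfDocs.sort_values(by='timestamp')
dfDocsSorted.plot(x='timestamp', y='val')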

Here's what the output looks like for some sample dataset:

[image: plot of val against timestamp for the sample dataset]

I think this is a fairly clean and accurate solution, given that the number of documents is limited.
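
To make the flatten-and-sort step concrete without a running cluster, here is a minimal sketch in plain Python. The function name hits_to_points and the hand-made sample hits are my own illustration, not part of any library, and I am assuming that timestamp holds seconds (not milliseconds) since the Unix epoch. It relies only on the fact that each hit yielded by the scan helper carries the stored document under its "_source" key.

```python
from datetime import datetime, timezone


def hits_to_points(hits):
    """Flatten scan-style hits into (time, val) pairs sorted by time.

    hits_to_points is a hypothetical helper written for this example.
    """
    points = []
    for hit in hits:
        src = hit["_source"]  # the stored document sits under "_source"
        # Assumption: "timestamp" is in seconds since the Unix epoch.
        when = datetime.fromtimestamp(src["timestamp"], tz=timezone.utc)
        points.append((when, src["val"]))
    # Sort chronologically so the x axis runs forward in time.
    points.sort(key=lambda p: p[0])
    return points


# Hand-made hits standing in for what scan() would yield.
sample_hits = [
    {"_id": "b", "_source": {"timestamp": 1400000060, "val": 2.5}},
    {"_id": "a", "_source": {"timestamp": 1400000000, "val": 1.0}},
]
points = hits_to_points(sample_hits)
```

The resulting pairs can be handed to pandas.DataFrame(points, columns=['time', 'val']) and plotted with x='time', which gives readable dates on the x axis instead of raw integers.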

A large number of documents

"" , , ? , , scan " ". , , ( ) .

, Elasticsearch? , ?
