Zeppelin: What's the best way to query and work with SQL?

Question

I want to use Zeppelin for database queries. Currently I see two possibilities, but neither of them is sufficient for me:

1. Set up a database connection as an "interpreter", name it, for example, "sql1", use it in a paragraph, run the SQL query and use the useful built-in charting tools. All the tutorials and tips seem to cover this, but then the documentation suddenly stops! I want to do more with the data: filter and process it. If I want to look at it again with other restrictions, I have to execute the query again, which can take seconds or even minutes (see my other Zeppelin SQL question: reusing data from a query without another interpreter or a new query).
2. Use Spark with Python, Scala or similar. But the documentation only seems to load CSV data, put it into a DataFrame and then query that DataFrame with SQL. In other words, it never shows how to read data from an SQL database. What is the best way for me to access SQL data? Can I use an already configured "interpreter" (database connection)?

python sql mysql apache-zeppelin

tardis Jul 11 '17 at 8:48

2 answers

You can use the Zeppelin API to retrieve paragraph data:

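    // Fetch the paragraph's JSON from the Zeppelin REST API (XXXXX is the Zeppelin host).
    val buffer = scala.io.Source.fromURL("http://XXXXX:9995/api/notebook/2CN2QP93H/paragraph/20170713-092810_1633770798").mkString
    // Parse the JSON into a DataFrame and keep only the paragraph's source text.
    val df = sqlContext.read.json(sc.parallelize(buffer :: Nil)).select("body.text")
    // The SQL query text of that paragraph.
    df.first.getAs[String](0)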

These Spark Scala lines retrieve the SQL query used by the paragraph. I think you can do the same to get its results.
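For anyone working in the `%python` interpreter instead, here is a rough Python equivalent of the Scala lines above. It is a sketch, not part of the original answer: it assumes the paragraph endpoint returns JSON with the source under `body.text` (as the Scala `select("body.text")` implies) and that no authentication is needed. For the results, it further assumes the output sits under `body.results.msg` with `TABLE` data as tab-separated text, as in Zeppelin 0.7-era responses; check a real response from your Zeppelin version before relying on that part. The function names are hypothetical.

```python
import json
import urllib.request

# Placeholder host, as in the answer; replace XXXXX with your Zeppelin host.
ZEPPELIN = "http://XXXXX:9995"


def paragraph_url(base, note_id, paragraph_id):
    """Build the same REST URL the Scala snippet uses."""
    return f"{base}/api/notebook/{note_id}/paragraph/{paragraph_id}"


def fetch_paragraph(base, note_id, paragraph_id):
    """Download the raw JSON for one paragraph (no authentication handling)."""
    with urllib.request.urlopen(paragraph_url(base, note_id, paragraph_id)) as resp:
        return resp.read().decode("utf-8")


def extract_query_text(payload):
    """Return body.text, i.e. the paragraph's source such as its SQL query."""
    return json.loads(payload)["body"]["text"]


def extract_table_rows(payload):
    """Assumption: output lives under body.results.msg, and TABLE output is
    tab-separated text with the header line first. Returns [] if none found."""
    msgs = json.loads(payload)["body"].get("results", {}).get("msg", [])
    for msg in msgs:
        if msg.get("type") == "TABLE":
            lines = msg["data"].strip("\n").split("\n")
            return [line.split("\t") for line in lines]
    return []
```

Usage with the note and paragraph IDs from the answer: `payload = fetch_paragraph(ZEPPELIN, "2CN2QP93H", "20170713-092810_1633770798")`, then `extract_query_text(payload)` for the SQL and `extract_table_rows(payload)` for the rows.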

Thomas decaux Jul 17 '17 at 10:44

tardis · Accepted Answer · 2017-07-12T11:52:06+0000

I couldn't find a solution for 1., but I made a short solution for 2. that works in Zeppelin using Python (2.7), SQLAlchemy (SQL toolkit), MySQLdb (MySQL driver) and pandas. Make sure these packages are installed; all of them are available in Debian 9. I wonder why I did not find such a solution earlier...

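    %python
    from sqlalchemy import create_engine
    import pandas as pd

    # Replace user, password, host and database with your own connection details.
    engine = create_engine('mysql+mysqldb://user:password@host:3306/database')
    sql = "select col1, col2 from table limit 10"
    # Run the query once and load the result into a pandas DataFrame.
    df = pd.read_sql(sql, engine.connect())
    # Render the DataFrame with Zeppelin's built-in table and chart display.
    z.show(df)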
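Because `df` stays in the Python interpreter's session (as long as the interpreter is not restarted), later `%python` paragraphs can filter it again without re-running the query, which is the cost complained about in point 1. The sketch below shows that "query once, filter many times" pattern with the standard library only. It assumes an in-memory `sqlite3` database as a stand-in for MySQL, and the `measurements` table, its rows and the `run_query` helper are hypothetical.

```python
import sqlite3
from functools import lru_cache

# Stand-in database (assumption): an in-memory SQLite table instead of MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 1), ("b", 5), ("a", 7), ("c", 3)],
)

query_count = 0  # counts how often the database is actually hit


@lru_cache(maxsize=None)
def run_query(sql):
    """Execute each distinct SQL string once; repeat calls return the cached rows."""
    global query_count
    query_count += 1
    return tuple(conn.execute(sql).fetchall())


rows = run_query("SELECT col1, col2 FROM measurements")

# Apply different restrictions in memory instead of querying again.
only_a = [r for r in rows if r[0] == "a"]
large = [r for r in rows if r[1] > 2]
```

With the pandas DataFrame from the answer, the equivalent is simply keeping `df` and writing something like `z.show(df[df.col2 > 2])` in a later paragraph.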

If you want to connect to another database, such as DB2 or Oracle, you need other Python driver packages, and you must change the dialect/driver prefix (`mysql+mysqldb`) of the URL passed to `create_engine`.
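Since only that prefix changes between databases, a small helper can build the URL. This is a hedged sketch: the Oracle, DB2 and PostgreSQL prefixes are common SQLAlchemy choices (`cx_Oracle`, `ibm_db_sa` for DB2, `psycopg2`) rather than something from the answer, each needs its driver package installed, and the meaning of the path segment differs per dialect (for Oracle it is the SID by default). The `engine_url` helper and the `DIALECTS` table are hypothetical.

```python
from urllib.parse import quote_plus

# Dialect+driver prefixes. The MySQL one is from the answer; the others are
# assumptions and require their own driver packages to be installed.
DIALECTS = {
    "mysql": "mysql+mysqldb",            # MySQLdb (mysqlclient)
    "oracle": "oracle+cx_oracle",        # cx_Oracle
    "db2": "db2+ibm_db",                 # ibm_db_sa / ibm_db
    "postgresql": "postgresql+psycopg2",  # psycopg2
}


def engine_url(backend, user, password, host, port, database):
    """Build the string passed to create_engine(); only the prefix varies.
    The password is URL-escaped so characters like '@' or ':' stay inside it."""
    return "{}://{}:{}@{}:{}/{}".format(
        DIALECTS[backend], user, quote_plus(password), host, port, database
    )
```

For example, `create_engine(engine_url("mysql", "user", "password", "host", 3306, "database"))` builds the same URL as the answer. Escaping the password matters because SQLAlchemy parses the string as a URL, and its documentation asks for special characters to be URL-encoded.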
