How do I get the IPython built-in magic commands to work in the PySpark kernel for Jupyter Notebook?

I am using the PySpark kernel installed through Apache Toree in a Jupyter Notebook with Anaconda v4.0.0 (Python 2.7.11). After retrieving a table from Hive, I want to use matplotlib/pandas to plot it in a notebook cell, following this guide:

    %matplotlib inline
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    # Set some Pandas options
    pd.set_option('display.notebook_repr_html', False)
    pd.set_option('display.max_columns', 20)
    pd.set_option('display.max_rows', 25)

    normals = pd.Series(np.random.normal(size=10))
    normals.plot()
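For reference, the same snippet runs as a plain Python script if the %matplotlib magic is replaced by an explicit backend choice (the magic only controls where figures are displayed, not whether pandas can plot). A minimal standalone sketch using the headless Agg backend and saving to a hypothetical file name:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend in place of the %matplotlib magic
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same Pandas options as in the notebook snippet
pd.set_option('display.notebook_repr_html', False)
pd.set_option('display.max_columns', 20)
pd.set_option('display.max_rows', 25)

normals = pd.Series(np.random.normal(size=10))
ax = normals.plot()                # pandas draws onto a matplotlib Axes
ax.figure.savefig("normals.png")   # save instead of displaying inline
```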

I got stuck at the first step: trying to use %matplotlib inline shows

    Name: Error parsing magics!
    Message: Magics [matplotlib] do not exist!
    StackTrace:

Looking at Toree's Magic and MagicManager, I realized that %matplotlib invokes Toree's MagicManager instead of the IPython built-in magic command.

Is it possible for Apache Toree - PySpark to use the IPython built-in magic commands?

1 answer

I made a workaround so that PySpark and the magic commands work together: instead of installing the Toree PySpark kernel, I run PySpark directly in a Jupyter Notebook.

  • Download and install Anaconda2 4.0.0

  • Download Spark 1.6.0 for Hadoop 2.6

  • Append the following lines to ~/.bashrc and run source ~/.bashrc to update the environment variables

    # Added to launch Spark
    export PATH="{your_spark_dir}/spark/sbin:$PATH"
    export PATH="{your_spark_dir}/spark/bin:$PATH"

    # Added to launch a Spark application in cluster mode
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre

    # The next 2 lines are optional, needed only for a Spark cluster
    export HADOOP_CONF_DIR={your_hadoop_conf}/hadoop-conf
    export YARN_CONF_DIR={your_hadoop_conf}/hadoop-conf

    # Added by Anaconda2 4.0.0 installer
    export PATH="{your_anaconda_dir}/Anaconda/bin:$PATH"

    # Added to run PySpark in Jupyter Notebook
    export PYSPARK_DRIVER_PYTHON={your_anaconda_dir}/Anaconda/bin/jupyter
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='0.0.0.0' --NotebookApp.port=8888"
    export PYSPARK_PYTHON={your_anaconda_dir}/Anaconda/bin/python
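These variables work because the pyspark launcher script simply executes $PYSPARK_DRIVER_PYTHON with $PYSPARK_DRIVER_PYTHON_OPTS as its arguments, so Jupyter itself becomes the driver front-end. A minimal Python sketch of that dispatch (simplified; the real logic lives in Spark's bin/pyspark script, and the Anaconda paths here are hypothetical):

```python
import shlex

def build_driver_command(env):
    # PYSPARK_DRIVER_PYTHON overrides PYSPARK_PYTHON for the driver;
    # both fall back to plain "python" (simplified view of bin/pyspark).
    driver = env.get("PYSPARK_DRIVER_PYTHON") or env.get("PYSPARK_PYTHON", "python")
    opts = shlex.split(env.get("PYSPARK_DRIVER_PYTHON_OPTS", ""))
    return [driver] + opts

# Hypothetical environment mirroring the ~/.bashrc entries above
env = {
    "PYSPARK_DRIVER_PYTHON": "/opt/anaconda2/bin/jupyter",
    "PYSPARK_DRIVER_PYTHON_OPTS": "notebook --NotebookApp.open_browser=False "
                                  "--NotebookApp.port=8888",
    "PYSPARK_PYTHON": "/opt/anaconda2/bin/python",
}
print(build_driver_command(env))
# → ['/opt/anaconda2/bin/jupyter', 'notebook', '--NotebookApp.open_browser=False', '--NotebookApp.port=8888']
```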

Launch the Jupyter Notebook

  • pyspark --master=yarn --deploy-mode=client to start the PySpark notebook against the YARN cluster

  • Open a browser and enter IP_ADDRESS_OF_COMPUTER:8888

Disclaimer
This is just a workaround, not a real fix for the problem. Please let me know if you find a way to make the IPython built-in magic commands, such as %matplotlib notebook, work in Toree PySpark.

