How to disable INFO logging in Spark?

I installed Spark using the AWS EC2 guide, and I can launch the program with the bin/pyspark script, get to the Spark prompt, and successfully work through the Quick Start exercises.

However, I cannot for the life of me figure out how to stop all of the verbose INFO logging after each command.

I have tried nearly every possible variation of the code below (commenting lines out, setting levels to OFF) in my log4j.properties file in the conf folder where I launch the application, and also on every node, and nothing works. I still get INFO statements printed after every command I execute.

I am very confused about how this should work.

 #Set everything to be logged to the console
 log4j.rootCategory=INFO, console
 log4j.appender.console=org.apache.log4j.ConsoleAppender
 log4j.appender.console.target=System.err
 log4j.appender.console.layout=org.apache.log4j.PatternLayout
 log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

 # Settings to quiet third party logs that are too verbose
 log4j.logger.org.eclipse.jetty=WARN
 log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
 log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

Here is my full classpath when I use SPARK_PRINT_LAUNCH_COMMAND:

 Spark command: /Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home/bin/java -cp :/root/spark-1.0.1-bin-hadoop2/conf:/root/spark-1.0.1-bin-hadoop2/conf:/root/spark-1.0.1-bin-hadoop2/lib/spark-assembly-1.0.1-hadoop2.2.0.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell --class org.apache.spark.repl.Main

spark-env.sh content:

 #!/usr/bin/env bash

 # This file is sourced when running various Spark programs.
 # Copy it as spark-env.sh and edit that to configure Spark for your site.

 # Options read when launching programs locally with
 # ./bin/run-example or ./bin/spark-submit
 # - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
 # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
 # - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
 # - SPARK_CLASSPATH=/root/spark-1.0.1-bin-hadoop2/conf/

 # Options read by executors and drivers running inside the cluster
 # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
 # - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
 # - SPARK_CLASSPATH, default classpath entries to append
 # - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
 # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos

 # Options read in YARN client mode
 # - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
 # - SPARK_EXECUTOR_INSTANCES, Number of workers to start (Default: 2)
 # - SPARK_EXECUTOR_CORES, Number of cores for the workers (Default: 1).
 # - SPARK_EXECUTOR_MEMORY, Memory per Worker (eg 1000M, 2G) (Default: 1G)
 # - SPARK_DRIVER_MEMORY, Memory for Master (eg 1000M, 2G) (Default: 512 Mb)
 # - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
 # - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: 'default')
 # - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
 # - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.

 # Options for the daemons used in the standalone deploy mode:
 # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
 # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
 # - SPARK_MASTER_OPTS, to set config properties only for the master (eg "-Dx=y")
 # - SPARK_WORKER_CORES, to set the number of cores to use on this machine
 # - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (eg 1000m, 2g)
 # - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
 # - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
 # - SPARK_WORKER_DIR, to set the working directory of worker processes
 # - SPARK_WORKER_OPTS, to set config properties only for the worker (eg "-Dx=y")
 # - SPARK_HISTORY_OPTS, to set config properties only for the history server (eg "-Dx=y")
 # - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (eg "-Dx=y")
 # - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers

 export SPARK_SUBMIT_CLASSPATH="$FWDIR/conf"
+122
python scala hadoop yarn apache-spark pyspark
Aug 07 '14 at 22:48
15 answers

Just run this command in the spark directory:

 cp conf/log4j.properties.template conf/log4j.properties 

Edit log4j.properties:

 # Set everything to be logged to the console
 log4j.rootCategory=INFO, console
 log4j.appender.console=org.apache.log4j.ConsoleAppender
 log4j.appender.console.target=System.err
 log4j.appender.console.layout=org.apache.log4j.PatternLayout
 log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

 # Settings to quiet third party logs that are too verbose
 log4j.logger.org.eclipse.jetty=WARN
 log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
 log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
 log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

Replace the first line:

 log4j.rootCategory=INFO, console 

with:

 log4j.rootCategory=WARN, console 

Save the file and restart the shell. This works for me with Spark 1.1.0 and Spark 1.5.1 on OS X.

+141
Sep 30 '14 at 14:36

Inspired by pyspark/tests.py, I did:

 def quiet_logs(sc):
     logger = sc._jvm.org.apache.log4j
     logger.LogManager.getLogger("org").setLevel(logger.Level.ERROR)
     logger.LogManager.getLogger("akka").setLevel(logger.Level.ERROR)

Calling this right after creating the SparkContext reduced the stderr lines logged for my test from 2647 to 163. However, creating the SparkContext itself still logs 163 lines, up to

 15/08/25 10:14:16 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0 

and it's unclear how to configure this programmatically.
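For reference, a minimal sketch of how quiet_logs can be wired in right after the context is created (the master and app name below are placeholders, not the settings from my test):

 from pyspark import SparkConf, SparkContext

 # Placeholder configuration; adjust master/app name to your own setup
 conf = SparkConf().setMaster("local[*]").setAppName("quiet-logs-demo")
 sc = SparkContext(conf=conf)

 # Silence org.* and akka.* INFO chatter from this point onwards
 quiet_logs(sc)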

+48
Aug 25 '15 at 15:46

Modify the conf/log4j.properties file and change the following line:

  log4j.rootCategory=INFO, console 

to

  log4j.rootCategory=ERROR, console 

Another approach would be to:

Fire up spark-shell and type in the following:

 import org.apache.log4j.Logger
 import org.apache.log4j.Level

 Logger.getLogger("org").setLevel(Level.OFF)
 Logger.getLogger("akka").setLevel(Level.OFF)

After that you will not see any logs.

+34
Jan 07 '15 at 8:44
 >>> log4j = sc._jvm.org.apache.log4j
 >>> log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)
+32
Dec 28 '15 at 5:09

For PySpark, you can also set the log level in your scripts using sc.setLogLevel("FATAL"). From the docs:

Control our logLevel. This overrides any user-defined log settings. Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
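For example, a minimal sketch of what this looks like in a PySpark script (the app name is just a placeholder):

 from pyspark import SparkContext

 sc = SparkContext(appName="quiet-app")  # placeholder app name

 # Only FATAL messages are logged from here on; startup INFO output
 # emitted before this call will still appear.
 sc.setLogLevel("FATAL")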

+24
Apr 26 '16 at 23:52

In Spark 2.0, you can also configure it dynamically for your application using setLogLevel :

 from pyspark.sql import SparkSession

 spark = SparkSession.builder.\
     master('local').\
     appName('foo').\
     getOrCreate()

 spark.sparkContext.setLogLevel('WARN')

The pyspark console will already have a default spark session available.

+22
Nov 09 '16 at 10:07

This may be due to how Spark computes its classpath. My hunch is that Hadoop's log4j.properties file appears ahead of Spark's on the classpath, preventing your changes from taking effect.

If you run

 SPARK_PRINT_LAUNCH_COMMAND=1 bin/spark-shell 

then Spark will print the full class path used to start the shell; in my case, I see

 Spark Command: /usr/lib/jvm/java/bin/java -cp :::/root/ephemeral-hdfs/conf:/root/spark/conf:/root/spark/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:/root/spark/lib/datanucleus-api-jdo-3.2.1.jar:/root/spark/lib/datanucleus-core-3.2.2.jar:/root/spark/lib/datanucleus-rdbms-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path=:/root/ephemeral-hdfs/lib/native/ -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell --class org.apache.spark.repl.Main 

where /root/ephemeral-hdfs/conf is at the head of the class path.
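If you want to confirm which log4j.properties actually wins, here is a rough check you can run from the PySpark shell (a sketch, not part of the original diagnosis; it assumes sc is available and simply asks the driver JVM's context classloader which copy it resolves first):

 # Ask the driver JVM which log4j.properties it sees first on the classpath
 url = sc._jvm.java.lang.Thread.currentThread() \
              .getContextClassLoader() \
              .getResource("log4j.properties")
 print(url.toString() if url is not None else "no log4j.properties on the classpath")

If the printed path points under /root/ephemeral-hdfs/conf rather than Spark's conf directory, that copy is the one taking precedence.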

I have opened [SPARK-2913] to fix this in the next release (I should have a patch out soon).

In the meantime, here are a couple of workarounds:

  • Add export SPARK_SUBMIT_CLASSPATH="$FWDIR/conf" to spark-env.sh .
  • Delete (or rename) /root/ephemeral-hdfs/conf/log4j.properties .
+13
Aug 08 '14 at 0:11

You can use setLogLevel

 val spark = SparkSession
   .builder()
   .config("spark.master", "local[1]")
   .appName("TestLog")
   .getOrCreate()

 spark.sparkContext.setLogLevel("WARN")
+10
Oct 26 '18 at 11:55

Spark 1.6.2:

 log4j = sc._jvm.org.apache.log4j
 log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)

Spark 2.x:

 spark.sparkContext.setLogLevel('WARN') 

(spark is SparkSession)

Alternatively, the old way:

Rename conf/log4j.properties.template to conf/log4j.properties in the Spark directory.

In log4j.properties change log4j.rootCategory=INFO, console to log4j.rootCategory=WARN, console

Different log levels are available:

  • OFF (most specific, no logging)
  • FATAL (most specific, little data)
  • ERROR - log only in case of errors
  • WARN - log only in case of warnings or errors
  • INFO (default)
  • DEBUG - log detailed steps (and all the levels above)
  • TRACE (least specific, lots of data)
  • ALL (least specific, all data)
+7
Mar 18 '18 at 10:51

This is how I did it on Amazon EC2 with 1 master and 2 slaves, running Spark 1.2.1.

 # Step 1. Change the config file on the master node
 nano /root/ephemeral-hdfs/conf/log4j.properties

 # Before
 hadoop.root.logger=INFO,console
 # After
 hadoop.root.logger=WARN,console

 # Step 2. Replicate this change to the slaves
 ~/spark-ec2/copy-dir /root/ephemeral-hdfs/conf/
+5
Mar 04 '15 at 15:49

This is how I do it:

In the location where I run the spark-submit script, do:

 $ cp /etc/spark/conf/log4j.properties .
 $ nano log4j.properties

change INFO to whatever logging level you want, and then run spark-submit.

+1
Apr 28 '16 at 22:24

Just add the following parameter to your spark-submit command

 --conf "spark.driver.extraJavaOptions=-Dlog4jspark.root.logger=WARN,console" 

This temporarily overrides the system value for this job only. Check the exact property name (log4jspark.root.logger here) from the log4j.properties file.

Hope this helps, cheers!

+1
Jul 27 '18 at 8:44

The following is a snippet of code for Scala users:

Option 1:

You can add the snippet below at the file level.

 import org.apache.log4j.{Level, Logger}

 Logger.getLogger("org").setLevel(Level.WARN)

Option 2:

Note: this will apply to all applications that use the Spark session.

 import org.apache.spark.sql.SparkSession

 private[this] implicit val spark = SparkSession.builder().master("local[*]").getOrCreate()
 spark.sparkContext.setLogLevel("WARN")

Option 3:

Note: this configuration needs to be added to your log4j.properties file (for example /etc/spark/conf/log4j.properties, wherever Spark is installed, or a log4j.properties in your project folder), since the change is made at the module level. It will apply to all applications.

 log4j.rootCategory=ERROR, console 

IMHO, option 1 is the wiser approach, since it can be switched off at the file level.

+1
May 10 '19 at 23:19

If you want to keep using logging (the logging module for Python), you can try splitting the configurations for your application and for Spark:

 LoggerManager()
 logger = logging.getLogger(__name__)
 loggerSpark = logging.getLogger('py4j')
 loggerSpark.setLevel('WARNING')
0
May 03 '16 at

Programmatic way

 spark.sparkContext.setLogLevel("WARN") 

Available Options

 ERROR WARN INFO 
0
Jul 08 '19 at 16:14


