I want my Spark driver program written in Python to output some basic logging information. There are three ways to do this:
- Using the PySpark py4j bridge to access the Java log4j logging tool used by Spark:

  ```python
  log4jLogger = sc._jvm.org.apache.log4j
  LOGGER = log4jLogger.LogManager.getLogger(__name__)
  LOGGER.info("pyspark script logger initialized")
  ```
- Just use standard console printing.
- The `logging` module from the Python standard library. This seems like the ideal and most Pythonic approach; however, at least out of the box, it doesn't work, and logged messages don't seem to be recoverable. Of course, it can be configured to log to py4j -> log4j and/or to the console (a sketch of the console configuration follows below).
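For concreteness, here is a minimal sketch of what I mean by configuring the standard library's `logging` module to log to the console from the driver. It assumes the configuration runs once in the driver process; the format string and logger names are just illustrative, and this only covers the driver, not code executing on the workers:

```python
import logging
import sys

# Attach a StreamHandler writing to stderr; without an explicit handler,
# logging has nowhere to send records and driver-side messages are lost.
logging.basicConfig(
    stream=sys.stderr,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

logger = logging.getLogger(__name__)
logger.info("pyspark driver logger initialized")
```

Alternatively, a custom `logging.Handler` could presumably forward records to the JVM log4j logger through the (private) `sc._jvm` bridge, so Python messages end up in the same place as Spark's own logs.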
The official programming guide (https://spark.apache.org/docs/1.6.1/programming-guide.html) does not mention logging at all, which is disappointing. There should be a standard, documented, recommended way to log from the Spark driver program.
I searched for this problem and found the following question: "How do I log from my Python Spark script". However, the content of that thread was unsatisfactory.
In particular, I have the following questions:
- Am I missing the standard way to log from a PySpark driver program?
- Are there any pros/cons to logging via py4j -> log4j versus logging to the console?