Apache Spark 2.x Troubleshooting Guide
https://www.slideshare.net/jcmia1/a-beginners-guide-on-troubleshooting-spark-applications
https://www.slideshare.net/jcmia1/apache-spark-20-tuning-guide
Check your cluster UI to ensure that workers are registered and have sufficient resources
This usually means the --executor-memory you specified exceeds the memory available on the workers. The Spark Master UI at http://localhost:8080/ shows how much memory each worker has. If the workers have different amounts of usable memory, Spark will only pick the workers whose available memory is greater than --executor-memory.
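A minimal sketch, assuming a standalone cluster whose workers show about 8 GB of usable memory in the Master UI (the numbers are made up): request executors that fit within that limit.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical values: each worker offers ~8g, so ask for executors that fit,
// leaving some headroom. This mirrors --executor-memory on spark-submit.
val spark = SparkSession.builder()
  .appName("fit-executor-memory")
  .config("spark.executor.memory", "6g")   // must not exceed a worker's available memory
  .config("spark.executor.cores", "2")
  .getOrCreate()
```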
SparkContext was shut down
The executor probably ran out of memory (OOM).
Container exited with a non-zero exit code 56 (or some other number)
The executor probably ran out of memory (OOM), so the container was killed; see the memory-tuning sketch after the ref.
ref:
http://stackoverflow.com/questions/39038460/understanding-spark-container-failure
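A minimal sketch of the usual remedies, assuming a YARN deployment (the values are made up): give the executors more heap and/or more off-heap overhead so the container stays within its memory limit.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("container-memory-tuning")
  .config("spark.executor.memory", "8g")                 // executor JVM heap
  .config("spark.yarn.executor.memoryOverhead", "2048")  // off-heap overhead in MB (Spark <= 2.2)
  // On Spark 2.3+ the equivalent key is spark.executor.memoryOverhead, e.g. "2g".
  .getOrCreate()
```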
Exception in thread "main" java.lang.StackOverflowError
Solution: this typically shows up in iterative jobs (e.g. ALS) where a very long RDD lineage makes DAG serialization recurse too deeply. Checkpoint periodically to truncate the lineage; unlike persist()/cache(), checkpointing writes the data out and drops the lineage that produced it. A sketch follows the refs.
ref:
https://stackoverflow.com/questions/31484460/spark-gives-a-stackoverflowerror-when-training-using-als
https://stackoverflow.com/questions/35127720/what-is-the-difference-between-spark-checkpoint-and-persist-to-a-disk
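A minimal sketch of the checkpointing fix from the references above, using spark.ml ALS; the `spark` session, the `ratings` DataFrame and its column names are assumptions.

```scala
import org.apache.spark.ml.recommendation.ALS

// `spark` is an existing SparkSession and `ratings` an existing DataFrame
// with userId/itemId/rating columns (both assumed for this sketch).
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")  // use HDFS/S3 in production

val als = new ALS()
  .setMaxIter(50)
  .setRank(64)
  .setCheckpointInterval(5)   // truncate the lineage every 5 iterations
  .setUserCol("userId")
  .setItemCol("itemId")
  .setRatingCol("rating")

val model = als.fit(ratings)
```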
Randomness of hash of string should be disabled via PYTHONHASHSEED
Solution: make the driver and every executor use the same hash seed, e.g. by setting spark.executorEnv.PYTHONHASHSEED (or exporting PYTHONHASHSEED in spark-env.sh) to a fixed value. Python 3.3+ randomizes str hashing per process, which breaks operations that rely on consistent hashing across executors.
ref:
https://issues.apache.org/jira/browse/SPARK-13330
It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation
This is because spark.sparkContext can only be used in the driver program, not on the workers (the lambda functions or UDFs you pass to RDD operations, for example, run on the workers); a sketch follows the refs.
ref:
https://spark.apache.org/docs/latest/rdd-programming-guide.html#passing-functions-to-spark
https://engineering.sharethrough.com/blog/2013/09/13/top-3-troubleshooting-tips-to-keep-you-sparking/
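A minimal sketch of the pitfall; the `spark` session and the file paths are made up. Anything inside map/foreach/a UDF runs on the executors, which cannot use the driver's SparkContext.

```scala
// `spark` is an existing SparkSession (assumed). `paths` lives on the driver.
val sc = spark.sparkContext
val paths = sc.parallelize(Seq("/data/a.txt", "/data/b.txt"))

// WRONG: the function passed to map runs on the executors, where
// spark.sparkContext cannot be used -> this fails at runtime.
// val counts = paths.map(p => spark.sparkContext.textFile(p).count())

// Instead, keep all SparkContext calls in the driver program:
val counts = paths.collect().map(p => sc.textFile(p).count())
```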
Spark automatically creates closures:
- for functions that run on RDDs at workers,
- and for any global variables that are used by those workers.
One closure is sent to each worker for every task. Closures are one-way: from the driver to the worker.
ref:
https://gerardnico.com/wiki/spark/closure
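A minimal sketch of what this means in practice (the classic counter example from the RDD programming guide): each task gets its own copy of the captured variable, so mutations on the executors never reach the driver; use an accumulator when the driver needs to see the result.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("closure-capture").master("local[2]").getOrCreate()
val sc = spark.sparkContext
val rdd = sc.parallelize(1 to 100)

// WRONG: `counter` is shipped to the executors inside the closure; the driver's
// copy is never updated in cluster mode (and the behaviour is undefined locally).
var counter = 0
rdd.foreach(x => counter += x)
println(s"counter = $counter")

// Use an accumulator (or reduce) for values the driver needs to see.
val sum = sc.longAccumulator("sum")
rdd.foreach(x => sum.add(x))
println(s"accumulator = ${sum.value}")
```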
Unable to find encoder for type stored in a Dataset
Solution: import spark.implicits._ in the scope where the Dataset is created (it provides encoders for primitive types, tuples and case classes), define your case classes at the top level rather than inside a method, and supply an explicit encoder (e.g. Encoders.kryo) for types that have no built-in one.
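A minimal sketch of both fixes (the class names are made up):

```scala
import org.apache.spark.sql.{Encoders, SparkSession}

// Case classes used in Datasets should be defined at the top level,
// not inside a method (or inside a notebook cell's local scope).
case class Person(name: String, age: Int)

val spark = SparkSession.builder().appName("encoders").master("local[*]").getOrCreate()
import spark.implicits._   // encoders for primitives, tuples and case classes

val people = Seq(Person("Ada", 36), Person("Grace", 45)).toDS()

// A type with no built-in encoder: provide one explicitly (Kryo as a fallback).
class Legacy(val payload: String)
implicit val legacyEncoder = Encoders.kryo[Legacy]
val legacy = spark.createDataset(Seq(new Legacy("a"), new Legacy("b")))
```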
Task not serializable
This usually means a closure references an object from the driver program. Spark serializes that referenced object and ships it to the worker nodes along with the task, so if the object (or its class) cannot be serialized, you get this error; a sketch follows the refs.
ref:
https://www.safaribooksonline.com/library/view/spark-the-definitive/9781491912201/ch04.html#user-defined-functions
http://www.puroguramingu.com/2016/02/26/spark-dos-donts.html
https://stackoverflow.com/questions/36176011/spark-sql-udf-task-not-serialisable
https://stackoverflow.com/questions/22592811/task-not-serializable-java-io-notserializableexception-when-calling-function-ou
https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/javaionotserializableexception.html
https://mp.weixin.qq.com/s/BT6sXZlHcufAFLgTONCHsg
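A minimal sketch of the usual cause and two common fixes (the class and field names are made up):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("task-not-serializable").master("local[*]").getOrCreate()
val sc = spark.sparkContext
val rdd = sc.parallelize(Seq("a", "b", "c"))

// WRONG: Helper is not Serializable, but the closure drags the whole instance
// (via the `prefix` field) to the executors -> Task not serializable.
class Helper(val prefix: String) {
  def tag(data: RDD[String]): RDD[String] = data.map(s => prefix + s)
}

// Fix 1: make the class Serializable.
class SerializableHelper(val prefix: String) extends Serializable {
  def tag(data: RDD[String]): RDD[String] = data.map(s => prefix + s)
}

// Fix 2: copy just the needed value into a local val, so only that value
// (not the enclosing object) is captured by the closure.
class LocalCopyHelper(val prefix: String) {
  def tag(data: RDD[String]): RDD[String] = {
    val p = prefix            // local copy; the closure no longer references `this`
    data.map(s => p + s)
  }
}
```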
If you only hit this error in a Databricks notebook, it is because notebooks wrap your code slightly differently from a regular Spark application; try defining the offending classes in a package cell (sketch after the ref).
ref:
https://docs.databricks.com/user-guide/notebooks/package-cells.html
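A minimal sketch of a package cell (the package name is made up); the cell should contain only the package statement and the definitions that need to be serializable.

```scala
// Put this in its own notebook cell; Databricks compiles it as a standalone package.
package com.example.helpers

case class Record(id: Long, value: String)

object TextUtils extends Serializable {
  def normalize(s: String): String = s.trim.toLowerCase
}
```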
java.lang.IllegalStateException: Cannot find any build directories.
A likely cause is that SPARK_HOME is not set, or that your launch script does not see that environment variable; make sure SPARK_HOME points at your Spark installation and is exported in the environment of whatever launches the application.