I am getting the following error while running an AWS Glue job with 40 workers, processing about 40 GB of data:
Caused by: org.apache.spark.memory.SparkOutOfMemoryError: error while calling spill() on org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@5fa14240 : No space left on device
How can I optimize my PySpark job to avoid this error?
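For context, the job follows roughly the shape below. This is a simplified sketch, not my exact script: the paths, table names, and columns are placeholders, but the job does read Parquet from S3, do a join plus an aggregation (both shuffle-heavy, which is where the UnsafeExternalSorter spill seems to occur), and write the result back to S3.

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# ~40 GB of input data (placeholder paths)
events = spark.read.parquet("s3://my-bucket/events/")
customers = spark.read.parquet("s3://my-bucket/customers/")

# Join + aggregation: both trigger shuffles, and the spill-to-disk during
# these stages is roughly where the "No space left on device" error appears.
result = (
    events.join(customers, on="customer_id", how="inner")
          .groupBy("customer_id", "event_date")
          .agg(
              F.count("*").alias("event_count"),
              F.sum("amount").alias("total_amount"),
          )
)

result.write.mode("overwrite").parquet("s3://my-bucket/output/")  # placeholder path
job.commit()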
Here is a screenshot of the Glue job metrics (glue_metrics), showing Job Execution (Active Executors, Completed Stages & Maximum Needed Executors) and Data Shuffle Across Executors. I found no CloudWatch metrics for the job.