There are several problems with your task.
Hadoop does not deal with images natively, as you have seen. But you can export all the file names and paths into a text file and run some map function over it, so calling ImageMagick on files that sit on the local disk should not be a big deal.
But how do you deal with data locality?
You cannot run ImageMagick on files in HDFS (there is only the Java API, and FUSE mounts are not stable), and you cannot predict the task scheduling. So, for example, a map task can be scheduled to a host where the image does not exist.
Of course you could just use a single machine and a single task. But then you gain nothing; you would only have the Hadoop overhead.
There is also a memory problem when forking processes from a Java task. I wrote a blog post about it [1].
You wrote that it "should be able to be done using Bash".
That is the next problem: you would at least have to write the map task yourself. You need a ProcessBuilder to call ImageMagick with a specific path and operation.
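To make that concrete, here is a minimal sketch of such a map task, assuming the job feeds the mapper one local image path per input line. The class name, the resize operation, and the ".resized.jpg" output suffix are invented for the example, not anything from your setup.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch: each input line is a path to an image on the node's local disk,
// and we fork ImageMagick's "convert" for it. The image must already exist
// on whichever host the task gets scheduled to (the locality problem above).
public class ConvertImageMapper
    extends Mapper<LongWritable, Text, Text, NullWritable> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String imagePath = value.toString().trim();
    if (imagePath.isEmpty()) {
      return;
    }

    // Hypothetical operation: resize and write a new file next to the old one.
    String outputPath = imagePath + ".resized.jpg";
    ProcessBuilder pb = new ProcessBuilder(
        "convert", imagePath, "-resize", "50%", outputPath);
    pb.redirectErrorStream(true);

    Process process = pb.start();
    // Drain the child's output so it cannot block on a full pipe.
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(process.getInputStream()))) {
      while (reader.readLine() != null) {
        // discard ImageMagick's output
      }
    }
    int exitCode = process.waitFor();

    // Emit the path and the exit code, just for bookkeeping.
    context.write(new Text(imagePath + "\t" + exitCode), NullWritable.get());
  }
}
```

That is quite a bit of plumbing just to shell out to a command-line tool.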
You also say you cannot find anything about this workflow with Hadoop: start with a set of files, perform the same action on each of the files, and then write out each result as its own new file.
Guess why? :D Hadoop is not suited for this task.
Basically, I would recommend splitting your images manually across multiple hosts on EC2 and running a bash script over them. That is less hassle and faster. To parallelize on a single host, split the files into one folder per core and run a bash script over each folder. This should utilize your machine quite well, and better than Hadoop ever could.
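If you would rather stay in Java than write the bash scripts, the same per-core idea can be sketched with a plain thread pool. This is only an illustration of the approach: the /data/images default directory, the resize arguments, and the output naming are assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: convert every image in a local directory with one worker per core,
// no Hadoop involved. Paths and convert arguments are made up for the example.
public class LocalImageConverter {

  public static void main(String[] args) throws InterruptedException {
    File inputDir = new File(args.length > 0 ? args[0] : "/data/images");
    File[] images = inputDir.listFiles();
    if (images == null) {
      System.err.println("No such directory: " + inputDir);
      return;
    }

    int cores = Runtime.getRuntime().availableProcessors();
    ExecutorService pool = Executors.newFixedThreadPool(cores);

    for (File image : images) {
      pool.submit(() -> {
        if (!image.isFile()) {
          return;
        }
        try {
          // One ImageMagick process per image; the pool caps concurrency
          // at the number of cores.
          Process p = new ProcessBuilder(
              "convert", image.getPath(), "-resize", "50%",
              image.getPath() + ".resized.jpg")
              .inheritIO()
              .start();
          p.waitFor();
        } catch (IOException | InterruptedException e) {
          System.err.println("Failed on " + image + ": " + e);
        }
      });
    }

    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.DAYS);
  }
}
```

Either way, bash or a small Java program, you skip the scheduling and locality problems entirely because the files never leave the local disk.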
[1] http://codingwiththomas.blogspot.com/2011/07/dealing-with-outofmemoryerror-in-hadoop.html