Image processing using Hadoop

I'm new to Hadoop, and I'm going to develop an application that processes multiple images using Hadoop and shows users the results in real time while they are being calculated. The basic approach is to distribute the executable and the set of images, and then collect the results.

Can I get results interactively during the calculation process?

Are there any alternatives to Hadoop Streaming for this kind of use?

How can I distribute the executable files to the nodes? I cannot find any examples other than feeding data to them through stdin.

3 answers

For image processing on Hadoop, the best way to organize the computation is:

  • Store the images in a sequence file: the key is the image name or its identifier, the value is the binary image data. This way you will have a single file with all the images you need to process. If images are added to your system dynamically, consider combining them into daily sequence files. I don't think you should use compression for this sequence file, since general-purpose compression algorithms do not work well on images. (A minimal sketch of building such a file is shown after this list.)
  • Process the images. Here you have several choices. The first is to use Hadoop MapReduce and write a program in Java: with Java you can read the sequence file and get the value directly at each map step, where the value is the binary image data, and run any processing logic on it. The second option is Hadoop Streaming. Its limitation is that all data goes to your application's stdin and the result is read from stdout, but you can work around this by writing your own InputFormat in Java that serializes the binary image data from the sequence file as a Base64 string and passes it to your streaming application. A third option would be to use Spark to process this data, but again you are limited in programming languages: Scala, Java, or Python. (A sketch of the Java MapReduce option follows this list.)
  • Hadoop was designed to simplify the batch processing of large amounts of data, and Spark is no exception here: at its core it is also a batch tool, which means you cannot get results before all the data has been processed. Spark Streaming is a slightly different case: there you work with micro-batches of 1-10 seconds and process each of them separately, so in general you could make it work for your scenario.
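
To make the first bullet concrete, here is a minimal sketch (not from the original answer) of packing a local directory of images into a sequence file with the Hadoop Java API; the /local/images and /data/images.seq paths are just placeholders:

```java
import java.io.File;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ImagesToSequenceFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path output = new Path("/data/images.seq");          // assumed HDFS location

        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(output),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class),
                // no compression: generic codecs gain little on already-compressed images
                SequenceFile.Writer.compression(SequenceFile.CompressionType.NONE));
        try {
            for (File image : new File("/local/images").listFiles()) {   // assumed local source dir
                byte[] bytes = Files.readAllBytes(image.toPath());
                // key = image file name, value = raw image bytes
                writer.append(new Text(image.getName()), new BytesWritable(bytes));
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}
```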
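
And a minimal sketch of the Java MapReduce option from the second bullet, assuming the sequence file built above; SequenceFileInputFormat hands each mapper the image name as the key and the raw bytes as the value, and processImage is a stand-in for whatever CV logic you run:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ImageProcessingJob {

    public static class ImageMapper extends Mapper<Text, BytesWritable, Text, IntWritable> {
        @Override
        protected void map(Text imageName, BytesWritable imageData, Context context)
                throws IOException, InterruptedException {
            byte[] bytes = imageData.copyBytes();        // raw image bytes from the sequence file
            int result = processImage(bytes);            // placeholder for your CV logic
            context.write(imageName, new IntWritable(result));
        }

        private int processImage(byte[] bytes) {
            return bytes.length;                         // dummy "result" for the sketch
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "image-processing");
        job.setJarByClass(ImageProcessingJob.class);
        job.setMapperClass(ImageMapper.class);
        job.setNumReduceTasks(0);                        // map-only job
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        SequenceFileInputFormat.addInputPath(job, new Path("/data/images.seq"));
        TextOutputFormat.setOutputPath(job, new Path("/data/results"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```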

I do not know your complete case, but one possible solution is Kafka + Spark Streaming. Your application would put the images in binary form into a Kafka queue, while Spark consumes and processes them in micro-batches on the cluster, updating the users through some third component (at the very least by putting the image processing status into Kafka for another application to pick up).
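
If that direction fits your case, the Spark Streaming side might look roughly like the following Java sketch, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, and processImage call are placeholders, not part of the original suggestion:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class ImageStreamJob {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("image-stream");
        // 5-second micro-batches: results become visible batch by batch, not per image
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "kafka:9092");          // assumed broker address
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", ByteArrayDeserializer.class);
        kafkaParams.put("group.id", "image-processors");

        // key = image id, value = raw image bytes published by your application
        JavaInputDStream<ConsumerRecord<String, byte[]>> stream = KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, byte[]>Subscribe(Arrays.asList("images"), kafkaParams));

        stream.foreachRDD(rdd -> rdd.foreach(record -> {
            String status = processImage(record.key(), record.value()); // placeholder CV logic
            // publish `status` to another Kafka topic here so the UI can pick it up
        }));

        jssc.start();
        jssc.awaitTermination();
    }

    private static String processImage(String id, byte[] bytes) {
        return id + ": " + bytes.length + " bytes processed";
    }
}
```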

But in general, the information you have provided is not enough to recommend a good architecture for your particular case.


As 0x0FFF says in the other answer, the question does not provide enough detail to recommend a specific architecture. Although this question is old, I am adding the research I have done on this topic in case it helps anyone else.

Spark is a great way to do processing on distributed systems, but it does not have a strong community working on OpenCV. Storm is another free and open source distributed real-time computation system from Apache. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing.

StormCV is an extension of Apache Storm specifically designed to support the development of distributed video processing pipelines. StormCV enables the use of Storm for video processing by adding computer vision (CV) specific operations and data models. The platform uses OpenCV for most of its CV operations, and it is relatively easy to use this library for other functions.

There are several examples of using Storm with OpenCV on the official GitHub page. You might want to take a look at this face detection example and try adapting it for person detection: https://github.com/sensorstorm/StormCV/blob/master/stormcv-examples/src/nl/tno/stormcv/example/E2_FacedetectionTopology.java


In fact, you can build your own logic on top of the Apache Storm framework. You can easily integrate the functionality of any particular computer vision library and distribute it across the bolts of this framework. In addition, Storm has a great extension called the DRPC server, which lets you expose your logic as simple RPC calls. You can find a simple example of how to process video files through Storm with OpenCV face detection in my article Consuming OpenCV through Hadoop Storm DRPC Server from .NET.
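
As a rough illustration (not taken from the article), here is a minimal sketch of a linear DRPC topology in Java, assuming the Storm 1.x API; FaceCountBolt and detectFaces are hypothetical stand-ins for your OpenCV code:

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.LocalDRPC;
import org.apache.storm.drpc.LinearDRPCTopologyBuilder;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class FaceDetectionDrpcTopology {

    // The bolt wraps whatever CV routine you have; detectFaces is a placeholder.
    public static class FaceCountBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            Object requestId = input.getValue(0);      // DRPC request id, must be passed through
            String imagePath = input.getString(1);     // argument sent by the DRPC client
            int faces = detectFaces(imagePath);        // plug your OpenCV face detection in here
            collector.emit(new Values(requestId, String.valueOf(faces)));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("id", "result"));
        }

        private int detectFaces(String imagePath) {
            return 0;                                  // dummy value for the sketch
        }
    }

    public static void main(String[] args) throws Exception {
        LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("detect-faces");
        builder.addBolt(new FaceCountBolt(), 4);       // 4 parallel bolt instances

        // Local test run; on a real cluster you would submit the remote topology
        // and call the DRPC server (default port 3772) from any client, e.g. .NET.
        LocalDRPC drpc = new LocalDRPC();
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("cv-drpc", new Config(), builder.createLocalTopology(drpc));

        System.out.println(drpc.execute("detect-faces", "/data/img001.jpg"));

        cluster.shutdown();
        drpc.shutdown();
    }
}
```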
