(I am new to computer vision.)
My goal is to reconstruct the full map of a game level using image stitching or any other method. Someone's playthrough of the level is recorded on video, and the frames of that video are the input.
Expected result (level 4-4 of Super Mario Bros. (SMB), from http://www.vgmaps.com/):
This is my first attempt at solving this problem with OpenCV (EmguCV). So far the results are excellent, but since my input is strictly 2D, I wonder whether there are more suitable methods.
I am open to trying another framework or technique, as long as it is not too complicated.
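One reason a simpler method might suit strictly 2D input: if the camera only pans horizontally (an assumption that fits a side-scroller like SMB 4-4), consecutive frames differ by a pure horizontal shift, so a full homography-based stitcher is more general than needed. The sketch below is hypothetical and not the OpenCV Stitcher's method: the functions `best_shift` and `stitch_horizontal` are illustrative names, frames are assumed to be equal-sized grayscale images given as lists of rows, and the HUD and moving sprites are assumed to be cropped out or negligible. It tries every candidate shift, keeps the one with the lowest mean absolute difference over the overlap, and pastes frames onto a wide canvas at the accumulated offsets. In OpenCV itself, `cv2.matchTemplate` or `cv2.phaseCorrelate` would be the natural building blocks for the same idea.

```python
def best_shift(prev, curr, max_shift):
    """Hypothetical helper: return the horizontal shift dx in [0, max_shift]
    such that column x of `curr` best matches column x + dx of `prev`,
    scored by mean absolute difference over the overlapping region."""
    h, w = len(prev), len(prev[0])
    best_dx, best_cost = 0, float("inf")
    for dx in range(max_shift + 1):
        overlap = w - dx
        if overlap <= 0:
            break  # no overlapping columns left to compare
        cost = 0
        for y in range(h):
            row_p, row_c = prev[y], curr[y]
            for x in range(overlap):
                cost += abs(row_p[x + dx] - row_c[x])
        cost /= overlap * h  # normalize so smaller overlaps are not favored
        if cost < best_cost:
            best_cost, best_dx = cost, dx
    return best_dx


def stitch_horizontal(frames, max_shift):
    """Hypothetical helper: assumes the camera only scrolls right, estimates
    each frame's offset from its predecessor, and pastes all frames onto one
    canvas (later frames overwrite earlier ones where they overlap)."""
    h, w = len(frames[0]), len(frames[0][0])
    offsets = [0]
    for prev, curr in zip(frames, frames[1:]):
        offsets.append(offsets[-1] + best_shift(prev, curr, max_shift))
    canvas = [[0] * (offsets[-1] + w) for _ in range(h)]
    for frame, off in zip(frames, offsets):
        for y in range(h):
            canvas[y][off:off + w] = frame[y]
    return canvas


if __name__ == "__main__":
    # Synthetic "level" 20 pixels wide; three 8-pixel-wide frames at offsets 0, 3, 7.
    level = [[x * 10 + y for x in range(20)] for y in range(3)]
    frames = [[row[o:o + 8] for row in level] for o in (0, 3, 7)]
    print(stitch_horizontal(frames, max_shift=5) == [row[:15] for row in level])
```

Because the search is one-dimensional, it stays cheap and avoids the feature-matching failures a general stitcher can hit on repetitive tiles; the trade-off is that it breaks if the game also scrolls vertically or if large sprites dominate the overlap.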
Here are the source images:

Result of stitching the first 7 images (for some reason, OpenCV's Stitcher would not accept all 10 at once):

Result of stitching the last 3 images:
