Much of visual processing occurs at a subconscious level. To interpret a scene quickly, our brains rely on certain hardwired assumptions (heuristics). One example is the Crater Illusion. We evolved with light sources (the Sun and Moon) overhead, so whether we perceive an object as concave or convex depends on the direction from which it is lit. By taking advantage of other assumptions, we can also fool our perception of color and movement.
Photography can produce dramatic pictures by masking or altering the visual cues present in a scene. While perusing your friends' Facebook or Instagram feeds you may see humorous pictures that take advantage of parallax errors and forced perspective. For example, by carefully lining up the camera, the subject and a distant object can be made to appear to be in the same plane and of similar scale. This can make people look much bigger or smaller, depending on the positioning and identity of the other object. This trick has been used in films such as Lord of the Rings to allow actors of average height to play hobbits. Part of this effect is achieved by using a wide depth of field, meaning that most of the scene is in focus. A narrow depth of field leaves only a limited portion of the scene in focus and the rest blurry. This out-of-focus blur, known as "bokeh", can be used to draw the eye to the subject of the picture. The technique is often employed in portraiture and mimics our vision when we are looking directly at an individual.
The ability of our eyes to selectively focus is limited, particularly close up. With age this can become even more difficult, requiring the use of reading glasses or bifocals. The takeaway is that the closer a group of objects is, the harder it will be to have them all in focus. With objects in the distance, such as a landscape, our eyes or the camera can focus to infinity, providing a wide depth of field. Therefore, what and how much of a scene is in focus provides clues as to relative size.
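To put rough numbers on this, here is a minimal sketch using the standard thin-lens depth-of-field approximation. The defaults (a 50 mm lens at f/8 and a 0.03 mm circle of confusion) and the function names are illustrative assumptions, not measurements of the human eye or of any particular camera.

```python
import math

def hyperfocal(focal_mm, f_number, coc_mm):
    """Hyperfocal distance in mm: focus here and everything out to infinity is acceptably sharp."""
    return focal_mm ** 2 / (f_number * coc_mm) + focal_mm

def focus_limits(subject_mm, focal_mm=50.0, f_number=8.0, coc_mm=0.03):
    """Return (near, far) limits of acceptable sharpness, in mm, for a given focus distance."""
    h = hyperfocal(focal_mm, f_number, coc_mm)
    near = subject_mm * (h - focal_mm) / (h + subject_mm - 2 * focal_mm)
    if subject_mm >= h:
        # Focused at or beyond the hyperfocal distance: sharp all the way to infinity.
        far = math.inf
    else:
        far = subject_mm * (h - focal_mm) / (h - subject_mm)
    return near, far

def depth_of_field(subject_mm, **lens):
    """Total depth of the in-focus zone, in mm."""
    near, far = focus_limits(subject_mm, **lens)
    return far - near

if __name__ == "__main__":
    # Compare a close subject, a mid-distance subject, and a distant one.
    for distance_mm in (500, 5000, 20000):
        print(distance_mm, focus_limits(distance_mm))
```

Under these assumed settings the in-focus zone at half a meter is only a few centimeters deep, while focusing far enough away keeps everything out to infinity sharp, which is the cue the brain appears to use.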
Using specialized lenses or post-processing, a scene can be altered to give the impression that one is looking at a miniature. These tilt-shift lenses are horrendously expensive and of limited usefulness. Fortunately, programs like Photoshop and Instagram include filters designed to mimic the effect.
If you've been following along so far, you can probably guess why this miniaturization effect works. In the top picture the entire scene is in focus, just as it would be if we were staring off into the distance. The bottom picture has the top and bottom blurred, with only a small area in the center in focus. This is how we would see a diorama on a table in front of us. Even armed with the knowledge of an object's true size, it can be hard to "see" past our brain's initial interpretation.
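The post-processing version of the trick can be sketched in a few lines. This is a toy illustration, not how Photoshop or Instagram implement their filters: it assumes a grayscale image stored as a list of rows, uses a simple box blur as a stand-in for a proper lens blur, and the names `box_blur` and `fake_miniature` are made up for the example.

```python
def box_blur(image, radius):
    """Replace each pixel with the mean of its neighbourhood (window truncated at the borders)."""
    height, width = len(image), len(image[0])
    blurred = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += image[yy][xx]
                    count += 1
            blurred[y][x] = total / count
    return blurred

def fake_miniature(image, focus_row, band_half_height, radius=3):
    """Keep a horizontal band sharp and fade toward a blurred copy above and below it."""
    height = len(image)
    blurred = box_blur(image, radius)
    # How far each row lies outside the sharp band (0 for rows inside it).
    beyond = [max(0, abs(y - focus_row) - band_half_height) for y in range(height)]
    farthest = max(max(beyond), 1)
    result = []
    for y in range(height):
        weight = beyond[y] / farthest  # 0 = fully sharp, 1 = fully blurred
        result.append([(1 - weight) * sharp + weight * soft
                       for sharp, soft in zip(image[y], blurred[y])])
    return result
```

Calling `fake_miniature(image, focus_row=len(image) // 2, band_half_height=2)` leaves the middle rows untouched and blends in progressively more blur toward the top and bottom edges, imitating the narrow depth of field we would expect from a nearby diorama.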
Facial recognition is very important to humans. For children, being able to identify their parents is critical to their survival. Before they can walk, infants can recognize when their mother or father enters a room. Later in life, the ability to distinguish friend from foe is equally necessary for survival. Because of this, evolution has led to dedicated neural processing of faces. This means we have a bias toward detecting faces... even when they aren't actually there. Examples of this misidentification (pareidolia) include the Face on Mars and the countless variations on "Jesus is on my grilled cheese". Certain individuals suffer from prosopagnosia, the inability to recognize faces; famously, Brad Pitt has suggested he was born with the condition. The late, great neurologist Oliver Sacks chronicled the difficulties of a man with a degenerative brain disorder that slowly robbed him of his ability to identify objects (The Man Who Mistook His Wife for a Hat). Brain damage can also induce this condition suddenly.