Do we speak about what we see, or do we see what we speak about? Visual perception has been shown to involve both top-down and bottom-up processing. Top-down processes guide the eye towards the information needed to fulfill a given task. They also block irrelevant, potentially distracting information from being processed. This cognitive mechanism is at play, for example, when you search for your keys or your glasses. At the same time, bottom-up processes attract attention. Objects that move in an otherwise static environment, for example, are automatically looked at and processed. The same is true for things that stand out through a stark contrast against the background.
Recent findings show that language influences, at least to some extent, how visual information is processed. Visual attention patterns vary with the linguistic task at hand and with the language used to fulfill it. Language must therefore be considered one of the factors that modulate top-down processing.
How exactly language shapes visual behavior has mainly been studied using eye tracking. Eye tracking makes it possible to measure which region a subject fixates at a given point in time during an experimental trial (i.e., in a fairly controlled environment). This methodology thus shows where overt attention is directed at each point in time. However, there are two severe problems: (1) visual information uptake is possible without directly fixating the region that contains the relevant information (parafoveal processing), and (2) fixations reflect scene comprehension, object identification, and information retrieval from the mental lexicon alike, which makes it challenging to relate fixation patterns to specific phases of the verbalization process.
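To make the measurement concrete, the following minimal Python sketch shows how a fixation record might be mapped to a region of the stimulus at a given point in time. The names `AOI`, `Fixation`, and `fixated_region`, the rectangular regions, and the millisecond timestamps are illustrative assumptions, not the format of any particular eye tracker or of this project.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AOI:
    """Hypothetical rectangular area of interest, in screen pixels."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass(frozen=True)
class Fixation:
    """Hypothetical fixation record: time window and gaze position."""
    onset_ms: float
    offset_ms: float
    x: float
    y: float


def fixated_region(
    fixations: Sequence[Fixation], aois: Sequence[AOI], t_ms: float
) -> Optional[str]:
    """Return the name of the AOI fixated at time t_ms, or None.

    None covers two cases: no fixation is ongoing at t_ms (e.g. during
    a saccade), or the ongoing fixation falls outside every AOI.
    """
    for fix in fixations:
        if fix.onset_ms <= t_ms < fix.offset_ms:
            for aoi in aois:
                if aoi.contains(fix.x, fix.y):
                    return aoi.name
            return None
    return None
```

A lookup of this kind only reports where overt attention is directed. It cannot show what is taken up parafoveally or which processing stage a fixation reflects, which are the two problems named above.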
The aim of this project is to develop and evaluate an experimental procedure that addresses the two problems pointed out above. The basic idea is to trigger the cognitive processes involved in describing a visual stimulus (visual perception, scene apprehension, object identification, planning and constructing the linguistic representation) and to modulate features of the visual stimulus while these processes are ongoing. The point in time of the modulation should then be informative about which processes have been completed up to that point, which are currently ongoing, and which have not yet started.
This idea is based on the fact that people often miss changes in their visual environment when their cognitive capacity is taken up by high-level processing.
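One way to relate the point in time of a modulation to whether the change is noticed is to compute the detection rate per onset window. The Python sketch below is an illustration under stated assumptions, not the project's analysis procedure: the function name `detection_rate_by_onset`, the trial format (change onset in milliseconds after stimulus onset, plus a detected flag), and the 200 ms bin width are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def detection_rate_by_onset(
    trials: Iterable[Tuple[float, bool]], bin_ms: int = 200
) -> Dict[int, float]:
    """Proportion of detected changes per change-onset time bin.

    Each trial is (change_onset_ms, detected). Keys of the result are
    the start times of the bins in ms, in ascending order.
    """
    counts = defaultdict(lambda: [0, 0])  # bin start -> [detected, total]
    for onset_ms, detected in trials:
        start = int(onset_ms // bin_ms) * bin_ms
        counts[start][0] += int(detected)
        counts[start][1] += 1
    return {start: hits / total for start, (hits, total) in sorted(counts.items())}
```

Under the reasoning above, a time window with a low detection rate would be a candidate for a phase in which cognitive capacity is taken up by high-level processing.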
By combining online manipulation of stimulus features with eye tracking, it should be possible to evaluate previous claims about the time course of visual perception, about the time course of speech planning and preparation, and about the way the two are interleaved.
As a test case for our method, we will test hypotheses derived from linguistic typology in the domain of motion events.