Eye Tracking Pilot: #betterposter Design v2 Enabled Faster and Easier Understanding of Results

Abstract

Every academic must stay on top of trends and advances in their field and identify the findings and researchers that can help their own work progress, and vice versa. One important way to do so is to scan the posters at academic conference poster sessions efficiently and effectively. By applying eye tracking technology to various academic poster designs, this exploratory experimental study aims to reveal how design choices foster or hinder information collection during poster sessions.

The eye tracking experiment was conducted using an EyeLink 1000 Pro. Eight academic posters were presented on screen, for 20 seconds each, in one block but in randomized order. The participants were thirteen academics. The time needed to search for the desired information, the findability of relevant messages, and the cognitive load during the information collection process were examined.

Putting all limitations aside (e.g. n = 13, arbitrary stimuli selection and AOI definition, missing baseline corrections, large individual differences), the results show that the reduction of content inherent to #betterposter design drew the gaze to the main parts of the poster. However, only the landscape layout of the #betterposter v2 design attracted all participants to look at all relevant parts, and it was the “Presenter Mode” of the #betterposter v2 design that best matched the gaze sequence established in Western culture (starting at the top left). Both findings are supported by correspondingly good ratings in the questionnaire. The #betterposter v2 “Presenter Mode” is therefore recommended for use.

This pilot study expanded knowledge on information collection and cognitive load, which is of growing importance given the increasing amount of research produced. It also enabled the author to learn dos and don’ts, compile recommendations, and collect points to consider when conducting the next eye tracking experiments on academic poster design.

Keywords: eye tracking, betterposter design, experimental study, usability

Video results

[Video: Eye tracked scientific posters]

Introduction

As Mike Morrison has pointed out (Morrison, 2020), the way academic posters are presented has not changed during the last 30 years, even though research has advanced. To tackle our global problems, help research advance, and prevent research from being ineffective and ignored (traditional scientific posters received “an average of 6.4 visitors, according to presenters’ own subjective count”; Morrison, Merlo, & Woessner, 2020), we need new ways to communicate research results and ideas (Morrison, 2020; Morrison, Merlo, & Woessner, 2020; Oronje et al., 2022; Rowe & Ilic, 2015; Ilic & Rowe, 2013). That was and is the motivation of Mike Morrison, who in 2019 created and presented the first #betterposter design and sparked a movement (Morrison, 2019).

By applying digital tools (e.g. QR codes pointing to a full paper that documents the research process and/or links to all resources necessary to reproduce the study, as required, for example, by the Guidelines for Good Scientific Practice of the Austrian Agency for Research Integrity, 2015), the content of academic posters may be reduced to the most important messages (Morrison et al., 2020). By further applying findings from UI/UX design and from psychological research on communication and information foraging theory (Pirolli & Card, 1999; Mayer & Moreno, 2003), academic posters may be designed that are faster and easier to understand and thus better serve their purpose of spreading knowledge.

By means of eye tracking technology, this exploratory experimental study aims to reveal how design choices foster or hinder information collection. Various academic poster designs are examined regarding the time and effort needed to search for the desired information, as well as the cognitive load exhibited during the information collection process. The findings shall inform the next #betterposter design, supporting research and academics with user experience (UX) tested communication and sharing facilities that are better adjusted to the increasing amount of research produced.


Methods

Eye tracking technology is used to locate and follow a person’s gaze while they look at objects. Since it is difficult to look at one thing and think about another, the focus of the eye is often equated with the focus of the mind (Krasich et al., 2020). This can be observed in the changing focus and altered gaze traces that occur when the tasks given to participants change (Buswell, 1935, p. 136; Yarbus, 1967, p. 174).

In user experience testing, the think-aloud approach[1], observation, questionnaires, eye tracking technology, and combinations thereof are used (Bojko, 2013, p. 106). Eye tracking is used to a smaller degree, since it requires expensive equipment and trained personnel, and it delivers high-precision measurements that are not necessary for most applications in UX testing (Bojko, 2013, p. 44). The current study relies on eye tracking technology because cognitive load[2] can be reliably measured via the participants’ pupil size and average fixation duration, which was key in the context of this study.

Study Design

Following Aga Bojko’s UX procedure (Bojko, 2013, p. 124ff), mental workload and cognitive processes (providing insights into how easy or difficult the message was to convey) were measured by pupil diameter and average fixation duration; effective target identification (disclosing how well topic and results are presented) was determined by the percentage of participants who fixated on the areas of interest (AOIs); and efficiency was determined by the number of fixations and the timespan before the first fixation on any of the targets (AOIs) took place.

As proposed by Aga Bojko (Bojko, 2013, p. 80), a within-subjects[3] approach was chosen; carryover effects were controlled for by presenting the stimuli in randomized order.
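To illustrate the randomization step, here is a minimal R sketch (not the original experiment code) that draws an independent presentation order for each participant; the poster IDs are those later used in Table 1.

```r
# Minimal sketch: one independent random stimulus order per participant,
# to control carryover effects in a within-subjects design.
set.seed(42)  # for reproducibility of this example only
stimuli <- c("trad_p04", "trad_p08", "bpv1_p03", "bpv1_p05",
             "bv1t_p06", "bv1t_p07", "bpv2_p01", "bpv2_p02")
orders <- lapply(1:13, function(i) sample(stimuli))
names(orders) <- paste0("participant_", 1:13)
orders$participant_1  # inspect the randomized order for one participant
```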

Experiment Setup

The experiment took place at the MediaLab of the University of Vienna on the afternoon of November 23, 2022. It was conducted in a separate, quiet room with two persons present: the experimenter, an experienced member of the MediaLab who supervised the whole procedure including the calibration, and the participant.

Each sat in front of a computer; the two desks were separated, with no visual contact between the persons present.

The participants’ heads were not fixed (no chinrest) but freely movable. Three different blocks of stimuli were presented to every participant, each stimulus for 20 seconds. The data collection process (including introduction and calibration) took about 10 minutes per participant. Afterwards, the participants filled in a questionnaire in a separate room, with the stimuli presented as printouts, recording their ratings of understandability and of how much they liked the overall appearance of the various posters (Likert scale, 1–5).

The eye tracking experiment was conducted using an EyeLink 1000 Pro with a sampling rate of approximately 1000 data points per second. In total, more than two million data points were collected and analyzed.

The data sets derived from the EyeLink 1000 Pro eye tracking system were compiled, analyzed, and visualized in R (version 4.2.2 (2022-10-31 ucrt), “Innocent and Trusting”), as provided by the R Core Team, together with the documentation of the research process, using the RStudio/Posit IDE (RStudio Team, 2020; Posit team, 2022) and R Markdown.
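As a rough sketch of the first step of such a pipeline, the following R code reads a fixation report exported from the eye tracker into a data frame. The file name and the column names (participant_id, poster_id, duration) are illustrative assumptions, not the study’s actual export format.

```r
library(dplyr)

# Sketch under assumptions: fixation events exported to a CSV file;
# the file name and column names below are hypothetical.
fix <- read.csv("fixation_report.csv")

fix_clean <- fix %>%
  mutate(subject = factor(participant_id),
         poster  = factor(poster_id)) %>%
  filter(duration >= 80)  # drop implausibly short fixations
                          # (a common cleaning step, not specified in the paper)

summary(fix_clean$duration)  # quick sanity check of fixation durations (ms)
```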

Measurements

In this experiment, academic posters were investigated using seven measurements associated with four categories: cognitive load, efficiency, effectivity, and subjective ratings/opinions.

Signs of cognitive load were measured by the average fixation duration and by pupil size (Bojko, 2013, pp. 36, 96, 135; Holmqvist et al., 2011, p. 381ff). Targeting relevant information (effectivity) was measured as the percentage of participants who fixated at least once on all relevant parts of the posters, predefined as areas of interest (AOIs) (Bojko, 2013, p. 127ff). Efficiency was measured by counting the fixation steps to target and the time to target (Bojko, 2013, p. 126f). Opinions and ratings were collected via a questionnaire.
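Continuing with the hypothetical fix_clean data frame from the sketch above, the following R code shows how these measures could be computed. The single rectangular AOI per poster and the column names (x, y, start_time, pupil) are illustrative assumptions; the study’s actual AOI definitions are not reproduced here.

```r
library(dplyr)

# Point-in-rectangle test for a single AOI (hypothetical coordinates).
aoi_hit <- function(x, y, aoi) {
  x >= aoi$x0 & x <= aoi$x1 & y >= aoi$y0 & y <= aoi$y1
}
example_aoi <- list(x0 = 100, x1 = 800, y0 = 50, y1 = 400)

fix_aoi <- fix_clean %>%
  mutate(in_aoi = aoi_hit(x, y, example_aoi))

metrics <- fix_aoi %>%
  group_by(poster, subject) %>%
  arrange(start_time, .by_group = TRUE) %>%
  summarise(
    avgFixDur    = mean(duration),    # average fixation duration (ms)
    avgPupilSize = mean(pupil),       # EyeLink units, no physical unit
    first_hit    = which(in_aoi)[1],  # index of first fixation inside the AOI
    steps2target = first_hit - 1,     # fixations before the first AOI hit
    time2target  = if (is.na(first_hit)) NA_real_
                   else sum(duration[seq_len(first_hit - 1)]),  # ms to first hit
    .groups = "drop"
  )

# Effectivity: percentage of participants who hit the AOI at least once.
perc_on_aoi <- fix_aoi %>%
  group_by(poster, subject) %>%
  summarise(hit = any(in_aoi), .groups = "drop") %>%
  group_by(poster) %>%
  summarise(percOnAOI = 100 * mean(hit))
```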

Participants

Participants were four members of the MediaLab and nine attendees of the university course “Introduction to Eye Tracking”, 13 in total, all academics. They were introduced to the experiment setting, completed a calibration run, and watched the stimuli with no task assigned.

Their experience with the #betterposter initiative and with posters and poster sessions at conferences differed greatly: 11 participants had never heard of the #betterposter design, 1 was unsure, and 1 (the author) was familiar with it; 7 had already created poster(s) themselves; 4 participants had never been to an academic conference, 7 had attended 1–5 conferences, and 2 had attended more than 5 conferences.

Stimuli

Eight academic posters were used as stimuli (Figure 2): two in the traditional poster layout, two in #betterposter version 1 design, two in text-only #betterposter version 1 design, and two in #betterposter version 2 design, all in landscape format. They stem from various fields of research.

Results

The averages of the measures across the 13 participants are displayed in tabular form (Table 1) and shown in overview as a parallel coordinates chart in Figure 3. The data in this chart are normalized, a procedure that allows the comparison of data with disparate sizes and units, as produced by eye tracking. The results support the assumption that the most recent version of the #betterposter design (posters 1 and 2, Figure 2) works best in comparison.
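As a worked example of this normalization, the following R snippet rescales the time2target column of Table 1 to the interval [0, 1] (min-max normalization, as also used for Figure 3):

```r
# Min-max normalization: the smallest value maps to 0, the largest to 1.
normalize01 <- function(x) (x - min(x)) / (max(x) - min(x))

# time2target values (ms) from Table 1, in the table's row order
time2target <- c(1095.00, 32.08, 85.62, 12.08, 59.15, 92.15, 495.15, 359.62)
round(normalize01(time2target), 2)
# 0.00 = fastest poster (bpv2_p02), 1.00 = slowest (bpv1_p03)
```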

Cognitive Load

A larger pupil diameter and longer fixation durations are associated with higher cognitive load (Bojko, 2013, pp. 36, 96, 135; Galley et al., 2015; Holmqvist et al., 2011, p. 381ff). Low measures in both aspects for #betterposter version 2 indicate information that is easy to grasp and easy to process.

Regarding fixation duration, the #betterposter version 2 designs are ranked first and third (Table 1, Figure 3); regarding pupil size, they are ranked first and fourth (Table 1, Figure 3).

Efficiency

Efficiency, measured as steps2target (the number of fixations) and time2target (in milliseconds) until the first fixation within any area of interest takes place, results in the #betterposter version 2 designs being ranked first and fourth (the two measurements are highly correlated). Overall, five out of eight posters score quite low in this aspect (Table 1, Figure 3), indicating that at least one area of interest is looked at quite quickly (Figure 4 suggests this to be the title text).

Effectivity

Regarding effectivity, only the #betterposter version 2 designs attracted all participants to look at all relevant parts of the posters (Table 1, Figure 3). Although all participants looked at all posters undisturbed for 20 seconds, some relevant parts of the other posters were not even looked at (Figure 4).

Questionnaire

The results derived from analyzing the questionnaire likewise show high scores for the #betterposter version 2 designs: regarding the understandability of the results, they were ranked first and fourth; regarding overall appearance, they were ranked first and third (Table 1, Figure 3).

Discussion

As Aga Bojko (Bojko, 2013, p. 123) reminds us, “interpretation depends on goals and stimuli”. Therefore, we have to mention that one (of the many) limitations of this pilot study[4] is associated with poor experiment design, e.g. not providing the participants with tasks (what to watch for), something that significantly changes the way an image is looked at, as already pointed out by Alfred Yarbus (Yarbus, 1967, pp. 171 ff.). As the raincloud plots for average fixation duration and pupil size show (Figure 5 and Figure 6), the measures of the 13 participants are widespread; this holds true for the other measures, including the questionnaire, as well. This is an additional hint that we may not generalize from this small sample to a larger population.

A non-exhaustive list of things to consider for the next study is given in the section Conclusion and Next Steps.

Regarding the treatment and interpretation of the measures derived from the EyeLink 1000 Pro, some points of criticism follow:

The averages of fixation duration might not be very meaningful, as they are (almost all) situated within the normal range for reading (200–250 ms according to Bojko, 2013, p. 135; 200 ms for light fiction and 260 ms for texts on biology and physics according to Holmqvist et al., 2011, p. 382). However, looking at the gaze signatures (Figure 7), which depict the number of fixations and their durations as timelines, we see spikes of up to 500–700 ms (with even much higher spikes for other posters). There is also evidence (Holmqvist et al., 2011, p. 383) that shorter fixation durations may occur under high stress levels; an additional measure, e.g. the NASA Task Load Index (TLX), needs to be applied to make the distinction.

Pupil size, a highly idiosyncratic measure (Bojko, 2013, p. 131; Holmqvist et al., 2011, p. 393), needs a special experiment setup (scrambled pictures for brightness adjustments) and a different treatment of the resulting data: instead of merging all participants’ averages into one grand average, individual baselines should be calculated, differences measured, and rankings built.
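A minimal sketch of this per-participant baseline treatment, again using the hypothetical fix_clean data frame from the Methods sketches and assuming a neutral baseline screen had been recorded for each participant:

```r
library(dplyr)

# Individual pupil baselines from a (hypothetical) neutral baseline screen.
baselines <- fix_clean %>%
  filter(poster == "baseline_screen") %>%
  group_by(subject) %>%
  summarise(base_pupil = mean(pupil))

# Per-poster ranking of baseline-corrected pupil change, instead of
# merging all participants' raw averages into one grand average.
pupil_ranked <- fix_clean %>%
  filter(poster != "baseline_screen") %>%
  left_join(baselines, by = "subject") %>%
  mutate(pupil_diff = pupil - base_pupil) %>%
  group_by(poster) %>%
  summarise(mean_pupil_diff = mean(pupil_diff)) %>%
  arrange(mean_pupil_diff)  # smallest pupil increase first
```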

Steps2target and time2target also show widespread data; it is recommended to measure these for all AOIs of a poster separately (not just for the one that is looked at first).

Effectivity is the only category for which this study provides a clear and unquestionable outcome: only the #betterposter version 2 designs attract all participants to look at all areas of interest (Figure 4). This result is most probably caused by the radical de-cluttering of the version 2 posters, an effect that can be observed by comparing the heatmaps of all posters (Figure 8). However, although the #betterposter version 2 designs are extremely reduced, there are still parts outside the AOIs that are looked at, as shown by the participants’ gaze signatures for poster n°2 (Figure 7). Effectivity measures shall include dwell time in future experiments.

The questionnaire data again show large individual differences, providing no basis for generalization.

Conclusion and Next Steps

This exploratory experimental pilot study shows promising ways to answer pressing questions regarding better research communication with academic posters, and it has detected and explicated many shortcomings in experiment design, analysis, and interpretation of results.

Figures & Tables

Table 1: Measurements

postID      avgFixDur   avgPupilSize   steps2target   time2target   percOnAOI   quest_results   quest_appear
bpv1_p03    233.63      399.84         5.31           1095.00       69.25       2.92            2.77
bpv1_p05    232.86      425.64         0.31           32.08         96.00       1.92            2.08
bpv2_p01    217.25      369.19         0.62           85.62         100.00      3.54            2.85
bpv2_p02    208.64      384.18         0.31           12.08         100.00      4.46            3.69
bv1t_p06    213.91      414.33         0.46           59.15         92.50       4.23            2.85
bv1t_p07    253.60      407.40         0.69           92.15         66.67       3.69            2.77
trad_p04    231.48      376.38         2.54           495.15        74.33       1.85            2.38
trad_p08    225.06      380.21         2.15           359.62        88.50       2.77            2.38

The table of measurements shows that the differences are sometimes quite small and of questionable significance. Note: Average fixation duration (avgFixDur) is the mean of all fixation timespans in milliseconds (not to be confused with dwell time!). The EyeLink 1000 Pro provides the average pupil size (avgPupilSize) as a value not related to any unit. Steps2target is derived by counting the number of fixations until the first AOI (any) is reached. Time2target is the average of all participants’ summed-up fixation durations, in milliseconds, until an AOI is reached. Percentage on AOI (percOnAOI) denotes the percentage of participants who looked at each AOI of the poster (100% = all relevant parts were seen). quest_results is an assessment (given in a separate questionnaire) of the understandability of the results on a Likert scale (1 = difficult, 5 = very easy). quest_appear rates the overall appearance of the poster (how much the participants liked it) on a Likert scale (1 = dislike, 5 = like very much).

Different poster designs, numbered.

Figure 2: The Stimuli: Poster Material (8+4: traditional poster designs; 1+2: #betterposter version 2 designs; 3+5: #betterposter version 1 designs; 7+6: #betterposter version 1 designs, text-only). Note: The original posters are colored and are presented here in black and white only. Posters 1 and 2, the “winner” poster designs, are marked red and orange in graphs and tables throughout this paper.

All measures in one chart

Figure 3: Parallel Coordinates Chart Depicting All Measurements Shown in Table 1 in Comparison. Note: Parallel coordinates charts show the values of the data table (Table 1), with all normalized values of one variable lined up vertically and the observations (the posters) connected horizontally by wave-like lines. This connectedness makes it easier to assess ranks and relative positions. The data are normalized between min = 0 and max = 1; the normalization procedure was provided by the R library GGally.
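For illustration, a chart of this kind can be reproduced roughly as follows; the data frame is built from Table 1, and scale = "uniminmax" applies the min-max normalization described in the note above (a sketch, not the paper’s original plotting code):

```r
library(GGally)

# Table 1 data, one row per poster.
metrics_table <- data.frame(
  postID        = c("bpv1_p03", "bpv1_p05", "bpv2_p01", "bpv2_p02",
                    "bv1t_p06", "bv1t_p07", "trad_p04", "trad_p08"),
  avgFixDur     = c(233.63, 232.86, 217.25, 208.64, 213.91, 253.60, 231.48, 225.06),
  avgPupilSize  = c(399.84, 425.64, 369.19, 384.18, 414.33, 407.40, 376.38, 380.21),
  steps2target  = c(5.31, 0.31, 0.62, 0.31, 0.46, 0.69, 2.54, 2.15),
  time2target   = c(1095.00, 32.08, 85.62, 12.08, 59.15, 92.15, 495.15, 359.62),
  percOnAOI     = c(69.25, 96.00, 100.00, 100.00, 92.50, 66.67, 74.33, 88.50),
  quest_results = c(2.92, 1.92, 3.54, 4.46, 4.23, 3.69, 1.85, 2.77),
  quest_appear  = c(2.77, 2.08, 2.85, 3.69, 2.85, 2.77, 2.38, 2.38)
)

ggparcoord(metrics_table,
           columns     = 2:8,          # the seven measurement columns
           groupColumn = 1,            # one colored line per poster
           scale       = "uniminmax")  # min-max normalize each variable to [0, 1]
```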

Bar chart

Figure 4: Bar Chart Depicting the Number and Percentage of Participants Who Looked at the Posters’ AOIs. Note: All participants looked at all AOIs of the #betterposter version 2 designs. The numbers of participants are shown within the bars, the percentages outside the bars. This graph does not indicate, however, how long the participants looked at the areas of interest.

Raincloud plot

Figure 5: Raincloud Plot Depicting the Average Fixation Duration of Participants as Dots, Centrality Measures, and Density Distributions. Note: The mean value is marked with an “x”. The participants’ values are spread out (except those for poster n°2), and there is an outlier present that perhaps should have been removed.

Raincloud plot of pupil size

Figure 6: Raincloud Plot Depicting the Average Pupil Size of Participants as Dots, Centrality Measures, and Density Distributions. Note: The mean value is marked with an “x”. The participants’ values are spread out, and there are outliers present that perhaps should have been removed.
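A raincloud plot of this kind can be sketched in R with ggplot2 and the ggdist package, assuming the per-participant metrics data frame from the Measurements sketch; this is an approximation, not the paper’s original plotting code:

```r
library(ggplot2)
library(ggdist)

# Raincloud: density "cloud", boxplot for centrality measures, and
# jittered raw per-participant points as "rain".
ggplot(metrics, aes(x = poster, y = avgPupilSize)) +
  stat_halfeye(adjust = 0.6, width = 0.5, justification = -0.2,
               .width = 0, point_colour = NA) +    # the density cloud
  geom_boxplot(width = 0.12, outlier.shape = NA) + # centrality measures
  geom_jitter(width = 0.05, alpha = 0.5) +         # one dot per participant
  coord_flip() +
  labs(x = NULL, y = "Average pupil size (EyeLink units)")
```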

Gaze signatures

Figure 7: Gaze Signatures of Participants Watching Poster n°2, Based on Fixation Duration Timelines. Note: Orange marks fixations within AOIs, black those outside. The duration of each fixation is marked as a line (spike = longer duration). This graph was meant to reveal patterns in how easy or difficult the poster content was to digest. The idea was that good poster design should produce no spikes (shorter fixation durations are associated with lower cognitive load) and should be mostly orange (participants looking mostly at the relevant parts of the poster). Although poster n°2 (considered the best #betterposter design) does produce long, unspiked orange line segments (especially in comparison with other posters), it remains unclear (to the author) how to interpret the results.
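A gaze-signature plot of this kind could be sketched as follows, using the hypothetical fix_aoi data frame from the Measurements sketch (the column names are assumptions):

```r
library(ggplot2)

# Fixation-duration timeline for one poster, one panel per participant;
# orange segments mark fixations inside AOIs, black those outside.
poster2 <- subset(fix_aoi, poster == "bpv2_p02")

ggplot(poster2, aes(x = start_time, y = duration, colour = in_aoi, group = 1)) +
  geom_line() +
  scale_colour_manual(values = c("FALSE" = "black", "TRUE" = "orange"),
                      labels = c("outside AOI", "inside AOI")) +
  facet_wrap(~ subject) +
  labs(x = "Time (ms)", y = "Fixation duration (ms)", colour = NULL)
```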

#betterposter heatmaps

(a) #betterposter heatmaps

Traditional poster heatmaps

(b) Traditional poster heatmaps

Figure 8: Heatmaps Depicting the Focal Points of All Poster Designs in Comparison. Note: Red = areas that are looked at most often; no color = nobody looked at this part of the poster. Compare, e.g., poster n°4 on the lower left side (esp. the 3rd column: few to no views in the results section) with poster n°1 on the lower right side (all parts visited quite equally).

Author’s Note

Footnotes
  1. “In a concurrent verbal protocol (CVP, also known as the “think-aloud protocol”), participants articulate their thoughts in real time during the execution of a task.” (Bojko, 2013, p. 106ff)

  2. Cognitive load can be assessed by performance measures (e.g. time spent on task and scores achieved), subjective ratings of task difficulty, and psychophysiological tools such as eye tracking, with longer fixation durations and increased pupil dilation as signs of higher cognitive load (Katona, 2022; Holmqvist et al., 2011, p. 393ff).

  3. In within-subjects study designs all participants are exposed to all tested stimuli (Bojko, 2013, p. 79).

  4. For a non-exhaustive list of the shortcomings of this study and necessary improvements for subsequent studies, see the sections “Discussion” and “Conclusion and Next Steps”.

References
  1. Morrison, M., Merlo, K., & Woessner, Z. (2020). How to boost the impact of scientific conferences. Cell, 182(5), 1067–1071. https://doi.org/10.1016/j.cell.2020.07.029
  2. Oronje, B., Morrison, M., Suharlim, C., Folkman, K., Glaude-Hosh, A., & Jeisy-Scott, V. (2022). A step in the right direction: Billboard-style posters preferred overall at two conferences, but should include more methods and limitations. https://doi.org/10.32388/p7n5bo
  3. Rowe, N., & Ilic, D. (2015). Rethinking poster presentations at large-scale scientific meetings – is it time for the format to evolve? The FEBS Journal, 282(19), 3661–3668. https://doi.org/10.1111/febs.13383
  4. Ilic, D., & Rowe, N. (2013). What is the evidence that poster presentations are effective in promoting knowledge transfer? A state of the art review. Health Information & Libraries Journal, 30(1), 4–12. https://doi.org/10.1111/hir.12015
  5. Pirolli, P., & Card, S. (1999). Information foraging. Psychological Review, 106(4), 643–675. https://doi.org/10.1037/0033-295x.106.4.643