The Future of VR and AR: How These Technologies Will Change Our Lives
Augmented reality and virtual reality displays: emerging technologies and future perspectives
Unlike VR displays, which have a relatively fixed optical configuration, AR displays come in a vast number of architectures. Therefore, instead of organizing the discussion around different challenges, a more appropriate way to review AR displays is to introduce each architecture separately and discuss its associated engineering challenges. An AR display usually consists of a light engine and an optical combiner. The light engine serves as the display image source, while the combiner delivers the displayed images to the viewer's eye and, at the same time, transmits the environment light. Some performance parameters, like frame rate and power consumption, are mainly determined by the light engine. Parameters like FoV, eyebox and MTF depend primarily on the combiner optics. Moreover, attributes like image brightness, overall efficiency, and form factor are influenced by both the light engine and the combiner. In this section, we first discuss the light engine, where the latest advances in micro-LED on chip are reviewed and compared with existing microdisplay systems. Then, we introduce the two main types of combiners: free-space combiners and waveguide combiners.
Light engine
The light engine determines several essential properties of the AR system, such as image brightness, power consumption, frame rate, and basic etendue. Several types of microdisplays have been used in AR, including micro-LED, micro-organic-light-emitting-diode (micro-OLED), liquid-crystal-on-silicon (LCoS), digital micromirror device (DMD), and laser beam scanning (LBS) based on micro-electromechanical systems (MEMS). We first describe the working principles of these devices and then analyze their performance. Readers more interested in the final performance parameters than in the details can refer to Table 1, which provides a comprehensive summary.
Working principles
Micro-LED and micro-OLED are self-emissive display devices. They are usually more compact than LCoS and DMD because no illumination optics is required. The fundamentally different material systems of LED and OLED lead to different approaches to achieving full-color displays. Because of the green gap in LEDs, red LEDs are manufactured on a different semiconductor material from green and blue LEDs. Therefore, achieving full color in microdisplays with a high resolution density is a major challenge for micro-LEDs. Among the several solutions under research, there are two main approaches. The first is to combine three separate red, green and blue (RGB) micro-LED microdisplay panels106. The three single-color micro-LED microdisplays are manufactured separately through flip-chip transfer technology. The projected images from the three panels are then integrated by a trichroic prism (Fig. 7a).
Another solution is to assemble color-conversion materials like quantum dots (QDs) on top of blue or ultraviolet (UV) micro-LEDs107,108,109 (Fig. 7b). The quantum dot color filter (QDCF) on top of the micro-LED array is mainly fabricated by inkjet printing or photolithography110,111. However, the display performance of color-conversion micro-LED displays is restricted by low color-conversion efficiency, blue light leakage, and color crosstalk. Extensive efforts have been made to improve QD-micro-LED performance. To boost QD conversion efficiency, structures like nanorings112 and nanoholes113,114 have been proposed, which utilize the Förster resonance energy transfer mechanism to transfer excess excitons in the LED active region to the QDs. To prevent blue light leakage, color filters or reflectors like a distributed Bragg reflector (DBR)115 or a CLC film116 placed on top of the QDCF have been proposed. Compared to color filters, which absorb blue light, DBR and CLC films help recycle the leaked blue light to further excite the QDs. Other methods to achieve full-color micro-LED displays, like vertically stacked RGB micro-LED arrays61,117,118 and monolithic wavelength-tunable nanowire LEDs119, are also under investigation.
Micro-OLED displays can generally be categorized into RGB OLED and white OLED (WOLED). RGB OLED displays have separate sub-pixel structures and optical cavities, which resonate at the desired wavelength in the R, G, and B channels, respectively. To deposit organic materials onto the separated RGB sub-pixels, a fine metal mask (FMM) that defines the deposition area is required. However, high-resolution RGB OLED microdisplays still face challenges due to the shadow effect during deposition through the FMM. To break this limitation, a silicon nitride film with a small shadow has been proposed as a mask for high-resolution deposition above 2000 PPI (9.3 μm)120.
WOLED displays use color filters to generate color images. Because no patterned deposition of organic materials is needed, a resolution density of up to 4000 PPI has been achieved121 (Fig. 7c). However, compared to RGB OLED, the color filters in WOLED absorb about 70% of the emitted light, which limits the maximum brightness of the microdisplay. To improve the efficiency and peak brightness of WOLED microdisplays, in 2019 Sony proposed applying newly designed cathodes (InZnO) and microlens arrays to OLED microdisplays, which increased the peak brightness from 1600 nits to 5000 nits120. In addition, OLEDWorks has proposed a multi-stacked OLED122 with optimized microcavities whose emission spectra match the transmission bands of the color filters. The multi-stacked OLED shows a higher luminous efficiency (cd/A) but also requires a higher driving voltage. Recently, by using meta-mirrors as bottom reflective anodes, patterned microcavities with more than 10,000 PPI have been obtained123. The high-resolution meta-mirrors generate different reflection phases in the RGB sub-pixels to achieve the desired resonant wavelengths. The narrow emission spectra from the microcavities help reduce the loss from color filters or even eliminate the need for color filters.
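Resolution density (PPI) and pixel pitch are used interchangeably above; the conversion is simply pitch = 25.4 mm / PPI. The minimal Python sketch below applies it to the densities quoted, assuming the figures refer to full-pixel pitch (the function names are illustrative). It also shows that a 9.3 μm pitch corresponds to roughly 2700 PPI, consistent with "above 2000 PPI".

```python
MICRONS_PER_INCH = 25_400.0  # 1 inch = 25.4 mm


def pitch_um_from_ppi(ppi: float) -> float:
    """Pixel pitch in micrometres for a given resolution density (pixels per inch)."""
    return MICRONS_PER_INCH / ppi


def ppi_from_pitch_um(pitch_um: float) -> float:
    """Resolution density (pixels per inch) for a given pixel pitch in micrometres."""
    return MICRONS_PER_INCH / pitch_um


if __name__ == "__main__":
    for ppi in (2000, 4000, 10_000):
        print(f"{ppi:>6} PPI -> {pitch_um_from_ppi(ppi):5.2f} um pitch")
    print(f"9.3 um pitch -> {ppi_from_pitch_um(9.3):.0f} PPI")
```

At 10,000 PPI the pitch drops to about 2.5 μm, close to visible wavelengths, which helps explain why nanophotonic structures such as meta-mirrors become relevant at such densities.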
LCoS and DMD are light-modulating displays that generate images by controlling the reflection of each pixel. For LCoS, light modulation is achieved by manipulating the polarization state of the output light through independent control of the liquid crystal reorientation in each pixel124,125 (Fig. 7d). Both phase-only and amplitude modulators have been employed. DMD is an amplitude modulation device; the modulation is achieved by controlling the tilt angle of bistable micromirrors126 (Fig. 7e). To generate an image, both LCoS and DMD rely on an illumination system with an LED or laser light source. For LCoS, color images can be generated either by RGB color filters on the LCoS (with white LEDs) or by color-sequential addressing (with RGB LEDs or lasers). However, LCoS requires a linearly polarized light source. For an unpolarized LED light source, a polarization recycling system127 is usually implemented to improve the optical efficiency. For a single-panel DMD, color images are mainly obtained through color-sequential addressing. In addition, DMD does not require polarized light, so it generally exhibits a higher efficiency than LCoS when an unpolarized light source is employed.
MEMS-based LBS128,129 uses micromirrors to directly scan RGB laser beams to form two-dimensional (2D) images (Fig. 7f). Different gray levels are achieved by pulse width modulation (PWM) of the laser diodes. In practice, 2D scanning can be achieved either with a 2D scanning mirror or with two 1D scanning mirrors plus an additional focusing lens after the first mirror. The small size of the MEMS mirror offers a very attractive form factor. At the same time, the output image has a large depth of focus (DoF), which is ideal for projection displays. One shortcoming, though, is that the small system etendue often hinders its application in some traditional display systems.
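For a raster-scanning LBS engine, the fast-axis mirror must sweep every image line within one frame period. The Python sketch below estimates the required fast-axis frequency and the average pixel rate. The simplifying assumptions are ours, not the source's: bidirectional scanning (two lines per mirror period), a hypothetical vertical active fraction `active_fraction` that accounts for flyback (0.6 is an arbitrary illustrative value), and uniform pixel timing (a real resonant mirror moves sinusoidally, so edge pixels dwell longer than centre pixels).

```python
def fast_axis_frequency_hz(frame_rate_hz: float, lines_per_frame: int,
                           active_fraction: float = 0.6,
                           bidirectional: bool = True) -> float:
    """Rough fast-axis mirror frequency needed to draw every line once per frame.

    active_fraction is a hypothetical share of the frame period spent drawing
    lines (the remainder is vertical flyback/blanking).
    """
    lines_per_second = frame_rate_hz * lines_per_frame / active_fraction
    # A bidirectional scan draws one line on each half of the mirror period.
    lines_per_mirror_period = 2 if bidirectional else 1
    return lines_per_second / lines_per_mirror_period


def mean_pixel_rate_hz(frame_rate_hz: float, pixels_per_line: int,
                       lines_per_frame: int, active_fraction: float = 0.6) -> float:
    """Average rate at which the lasers must switch to address every pixel."""
    return frame_rate_hz * pixels_per_line * lines_per_frame / active_fraction


if __name__ == "__main__":
    for fps in (60, 120):
        f_mirror = fast_axis_frequency_hz(fps, 1000)
        pixel_rate = mean_pixel_rate_hz(fps, 1000, 1000)
        print(f"{fps} Hz: fast axis ~{f_mirror / 1e3:.0f} kHz, "
              f"mean pixel rate ~{pixel_rate / 1e6:.0f} MHz")
```

With these assumed values, 60 Hz and about 1000 lines land in the tens of kHz, the same order as the ~50 kHz figure quoted in the frame-rate discussion of this section. The key point is the scaling: doubling either the frame rate or the line count doubles both the mirror frequency and the laser modulation rate.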
Comparison of light engine performance
There are several important parameters for a light engine, including image resolution, brightness, frame rate, contrast ratio, and form factor. The resolution requirement (>2K) is similar for all types of light engines, and resolution is usually improved through the manufacturing process. Thus, here we focus on the remaining parameters.
Image brightness usually refers to the measured luminance of a light-emitting object. This measurement, however, may not be accurate for a light engine, as the light from the engine only forms an intermediate image that is not directly viewed by the user. Moreover, focusing solely on the brightness of a light engine can be misleading for a wearable display system like AR. Data projectors with thousands of lumens are available today, but their power consumption is too high for a battery-powered wearable AR display. Therefore, a more appropriate way to evaluate a light engine's brightness is luminous efficacy (lm/W), obtained by dividing the final output luminous flux (lm) by the input electric power (W). For a self-emissive device like micro-LED or micro-OLED, the luminous efficacy is determined directly by the device itself. However, for LCoS and DMD, the overall luminous efficacy should take into account the luminous efficacy of the light source, the efficiency of the illumination optics, and the efficiency of the employed spatial light modulator (SLM). For a MEMS LBS engine, the efficiency of the MEMS mirror can be considered unity, so the luminous efficacy basically equals that of the employed laser sources.
As mentioned earlier, each light engine has a different scheme for generating color images. Therefore, we list the luminous efficacy of each scheme separately for a more inclusive comparison. For micro-LEDs, the situation is more complicated because the EQE depends on the chip size. Based on previous studies130,131,132,133, we separately calculate the luminous efficacy for RGB micro-LEDs with a chip size of about 20 μm. For the direct combination of RGB micro-LEDs, the luminous efficacy is around 5 lm/W. For QD conversion with blue micro-LEDs, the luminous efficacy is around 10 lm/W, assuming 100% color conversion efficiency, which has been demonstrated using structure engineering114. For micro-OLEDs, the calculated luminous efficacy is about 48 lm/W120,122. However, the lifetime and EQE of blue OLED materials depend on the driving current, and continuously displaying an image with brightness higher than 10,000 nits may dramatically shorten the device lifetime. We compare the light engines at 10,000 nits because it is highly desirable to obtain 1000 nits for the displayed image in order to keep ACR>3:1 with a typical AR combiner, whose optical efficiency is lower than 10%.
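The 10,000-nit comparison point follows from a short luminance budget. The Python sketch below assumes that the combiner efficiency acts as a simple luminance transfer ratio and uses the common see-through form ACR = (image + background) / background, ignoring the engine's own contrast; both are simplifying assumptions, and the function names are illustrative.

```python
def required_engine_luminance_nits(target_image_nits: float,
                                   combiner_efficiency: float) -> float:
    """Light-engine luminance needed for the image to reach target_image_nits."""
    return target_image_nits / combiner_efficiency


def max_background_nits(image_nits: float, target_acr: float) -> float:
    """Brightest see-through background that still meets target_acr,
    using ACR = (image + background) / background."""
    return image_nits / (target_acr - 1.0)


if __name__ == "__main__":
    engine = required_engine_luminance_nits(1000.0, 0.10)
    background = max_background_nits(1000.0, 3.0)
    print(f"engine must deliver >= {engine:,.0f} nits")
    print(f"ACR >= 3:1 holds for backgrounds up to {background:.0f} nits")
```

A 1000-nit image behind a 10%-efficient combiner thus needs a 10,000-nit engine, and under this ACR form it keeps 3:1 only against backgrounds up to about 500 nits; a less efficient combiner raises the engine requirement proportionally.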
For an LCoS engine using a white LED as the light source, the typical optical efficiency of the whole engine is around 10%127,134. The engine luminous efficacy is then estimated to be 12 lm/W with a 120 lm/W white LED source. For a color-sequential LCoS using RGB LEDs, the absorption loss from color filters is eliminated, but the luminous efficacy of the RGB LED source also decreases to about 30 lm/W due to the lower efficiency of red and green LEDs and the higher driving current135. Therefore, the final luminous efficacy of the color-sequential LCoS engine is also around 10 lm/W. If RGB linearly polarized lasers are employed instead of LEDs, the LCoS engine efficiency can be quite high due to the high degree of collimation. The luminous efficacy of an RGB laser source is around 40 lm/W136. Therefore, the laser-based LCoS engine is estimated to have a luminous efficacy of 32 lm/W, assuming an engine optical efficiency of 80%. For a DMD engine with RGB LEDs as the light source, the optical efficiency is around 50%137,138, which leads to a luminous efficacy of 15 lm/W. When switching to laser light sources, the situation is similar to LCoS, with a luminous efficacy of about 32 lm/W. Finally, for a MEMS-based LBS engine, there is basically no loss from the optics, so the final luminous efficacy is 40 lm/W. Detailed calculations of luminous efficacy can be found in the Supplementary Information.
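Each engine estimate above is a product of the source efficacy and the engine's optical throughput. The Python sketch below simply tabulates that product with the figures quoted. Two assumptions are ours: the DMD + RGB LED case reuses the 30 lm/W RGB LED figure from the color-sequential LCoS case, and the DMD + laser case reuses the 80% optical efficiency assumed for the laser LCoS engine. The color-sequential LCoS case is omitted because its optical efficiency is not stated here.

```python
# (source luminous efficacy in lm/W, engine optical efficiency) as quoted in the text.
ENGINE_CASES = {
    "LCoS + white LED": (120.0, 0.10),
    "LCoS + RGB lasers": (40.0, 0.80),
    "DMD + RGB LEDs": (30.0, 0.50),
    "DMD + RGB lasers": (40.0, 0.80),  # assumed similar to laser LCoS
    "MEMS LBS": (40.0, 1.00),  # mirror loss treated as zero
}


def engine_efficacy_lm_per_w(source_efficacy: float, optical_efficiency: float) -> float:
    """Engine luminous efficacy = source efficacy x optical throughput of the engine."""
    return source_efficacy * optical_efficiency


if __name__ == "__main__":
    for name, (source, eta) in ENGINE_CASES.items():
        print(f"{name:<18} {engine_efficacy_lm_per_w(source, eta):5.1f} lm/W")
```

The tabulation makes the design lesson visible: a very efficient source (white LED) is largely wasted by a lossy engine, while lasers win mainly through the high optical throughput their collimation allows.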
Another aspect of a light engine is the frame rate, which determines the volume of information it can deliver per unit time. A high information volume is vital for constructing a 3D light field to solve the VAC issue. For micro-LEDs, the device response time is around several nanoseconds, which allows for visible light communication with bandwidth up to 1.5 Gbit/s139. For OLED microdisplays, a fast OLED with ~200 MHz bandwidth has been demonstrated140. Therefore, for both micro-LED and OLED, the frame rate is limited by the driving circuits. Another consideration for the driving circuit is the tradeoff between resolution and frame rate, as a higher-resolution panel means more scanning lines in each frame. So far, an OLED display with a 480 Hz frame rate has been demonstrated141. For LCoS, the frame rate is mainly limited by the LC response time. Depending on the LC material used, the response time is around 1 ms for nematic LC or 200 μs for ferroelectric LC (FLC)125. Nematic LC allows analog driving, which accommodates gray levels, typically with 8-bit depth. FLC is bistable, so PWM is used to generate gray levels. DMD is also a binary device. Its binary (1-bit) frame rate can reach 30 kHz, mainly constrained by the response time of the micromirrors. For MEMS-based LBS, the frame rate is limited by the scanning frequency of the MEMS mirrors. A frame rate of 60 Hz with around 1K resolution already requires a resonance frequency of around 50 kHz, with a Q-factor up to 145,000128. A higher frame rate or resolution requires a higher Q-factor and a larger laser modulation bandwidth, which may be challenging.
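Because DMD and FLC panels are binary, gray levels must be built from a sequence of 1-bit frames. The Python sketch below gives a rough estimate under stated assumptions: plain binary-weighted PWM, where the shortest bit plane lasts one binary frame, so an n-bit gray scale needs 2^n - 1 binary-frame slots. For color, it assumes three color-sequential fields with no dithering or bit splitting. The 5 kHz FLC rate is our approximation of 1 / 200 μs.

```python
def gray_scale_frame_rate_hz(binary_frame_rate_hz: float, bit_depth: int,
                             color_fields: int = 1) -> float:
    """Full-gray-scale frame rate of a binary modulator using binary-weighted PWM.

    An n-bit gray scale needs 2**n - 1 binary-frame time slots when the
    shortest (least significant) bit plane lasts one binary frame.
    """
    slots_per_field = 2 ** bit_depth - 1
    return binary_frame_rate_hz / (slots_per_field * color_fields)


DMD_BINARY_RATE_HZ = 30_000.0  # binary frame rate quoted for DMD
FLC_BINARY_RATE_HZ = 5_000.0   # approximately 1 / (200 us FLC response time)

if __name__ == "__main__":
    for name, rate in (("DMD", DMD_BINARY_RATE_HZ), ("FLC", FLC_BINARY_RATE_HZ)):
        mono = gray_scale_frame_rate_hz(rate, 8)
        rgb = gray_scale_frame_rate_hz(rate, 8, color_fields=3)
        print(f"{name}: 8-bit mono ~{mono:.0f} Hz, 8-bit RGB sequential ~{rgb:.0f} Hz")
```

The numbers show how quickly bit depth and color fields consume a high binary frame rate. Practical controllers use more elaborate bit-plane sequencing, so treat these as rough figures rather than device specifications.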
Form factor is another crucial aspect of light engines for near-eye displays. Among self-emissive displays, both micro-OLEDs and QD-based micro-LEDs can achieve full color with a single panel, so they are quite compact. A micro-LED display with separate RGB panels naturally has a larger form factor. In applications requiring a direct-view full-color panel, the extra combining optics may also increase the volume. It should be pointed out, however, that the combining optics may not be necessary for some applications like waveguide displays, because the EPE process makes the system insensitive to the spatial positions of the input RGB images. Therefore, the form factor of using three RGB micro-LED panels is medium. For LCoS and DMD with RGB LEDs as the light source, the form factor is larger due to the illumination optics. Still, if a lower luminous efficacy can be accepted, a smaller form factor can be achieved by using simpler optics142. If RGB lasers are used, the collimation optics can be eliminated, which greatly reduces the form factor143. For MEMS-LBS, the form factor can be extremely compact due to the tiny size of the MEMS mirror and laser module.
Finally, contrast ratio (CR) also plays an important role in the observed images8. Micro-LEDs and micro-OLEDs are self-emissive, so their CR can be >10⁶:1. A laser beam scanner can also achieve a CR of 10⁶:1 because the laser can be turned off completely in the dark state. On the other hand, LCoS and DMD are reflective displays, and their CR is around 2000:1 to 5000:1144,145. It is worth pointing out that the CR of a display engine plays a significant role only in a dark ambient. As the ambient brightness increases, the ACR is mainly governed by the display's peak brightness, as previously discussed.
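Why engine CR matters only in the dark can be seen from a simple model. The Python sketch below assumes that the engine's dark state leaks L_peak / CR and that the see-through background adds L_bg to both the bright and dark states, giving ACR = (L_peak + L_bg) / (L_peak / CR + L_bg). This is one common see-through form, used here for illustration; the background levels are arbitrary.

```python
def ambient_contrast_ratio(peak_nits: float, engine_cr: float, background_nits: float) -> float:
    """ACR = (L_peak + L_bg) / (L_peak / CR + L_bg) for a see-through display."""
    dark_nits = peak_nits / engine_cr  # residual luminance of the engine's dark state
    return (peak_nits + background_nits) / (dark_nits + background_nits)


if __name__ == "__main__":
    for background in (0.0, 1.0, 500.0):  # illustrative: dark, dim, bright surroundings
        self_emissive = ambient_contrast_ratio(1000.0, 1e6, background)
        reflective = ambient_contrast_ratio(1000.0, 2000.0, background)
        print(f"background {background:>5} nits: CR 1e6 -> ACR {self_emissive:,.1f}; "
              f"CR 2000 -> ACR {reflective:,.1f}")
```

With no background the ACR equals the engine CR, so self-emissive engines win by orders of magnitude. Against a bright background, both engines converge to nearly the same ACR, which is then set by the peak luminance.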
The performance parameters of different light engines are summarized in Table 1. Micro-LEDs and micro-OLEDs have similar levels of luminous efficacy, but micro-OLEDs still face burn-in and lifetime issues when driven at high current, which to some extent hinders their use as a high-brightness image source. Micro-LEDs are still under active development, and their luminous efficacy can be expected to improve as the fabrication process matures. Both devices have nanosecond response times and can potentially achieve high frame rates with a well-designed integrated circuit; the frame rate of the driving circuit ultimately determines the motion picture response time146. Their self-emissive nature also leads to a small form factor and high contrast ratio. LCoS and DMD engines have similar luminous efficacy, form factor, and contrast ratio. In terms of light modulation, DMD can provide a higher 1-bit frame rate, while LCoS can offer both phase and amplitude modulation. MEMS-based LBS exhibits the highest luminous efficacy so far. It also exhibits an excellent form factor and contrast ratio, but the presently demonstrated 60 Hz frame rate (limited by the MEMS mirrors) could cause image flicker.
Free-space combiners
The term free-space generally refers to the case in which light propagates freely in space, as opposed to a waveguide, which traps light through TIR. The combiner can be a partial mirror, as commonly used in AR systems based on traditional geometric optics. Alternatively, it can be a reflective HOE. The strong chromatic dispersion of HOEs necessitates a laser source, which usually leads to a Maxwellian-type system.
Traditional geometric designs
Several systems based on geometric optics are illustrated in Fig. 8. The simplest design uses a single freeform half-mirror6,147 to directly collimate the displayed images to the viewer's eye (Fig. 8a). This design can achieve a large FoV (up to 90°)147, but the limited design freedom of a single freeform surface leads to image distortions, also called pupil swim6. The placement of the half-mirror also results in a relatively bulky form factor. Another design, using so-called birdbath optics6,148, is shown in Fig. 8b. Compared to the single-combiner design, the birdbath design has extra optics on the display side, which provides room for aberration correction. The integrated beam splitter folds the optical path, which reduces the form factor to some extent. Another way to fold the optical path is to use a TIR prism. Cheng et al.149 designed a freeform TIR-prism combiner (Fig. 8c) offering a diagonal FoV of 54° and an exit pupil diameter of 8 mm. All the surfaces are freeform, which offers excellent image quality. To cancel the optical power for the transmitted environmental light, a compensator is added to the TIR prism. The whole system has a well-balanced performance between FoV, eyebox, and form factor. To free the space in front of the viewer's eye, relay optics can be used to form an intermediate image near the combiner150,151, as illustrated in Fig. 8d. Although this design offers more optical surfaces for aberration correction, the extra lenses also add to system weight and form factor.
Regarding approaches to solving the VAC issue, the most straightforward way is to integrate a tunable lens, like a liquid lens152 or an Alvarez lens99, into the optical path to form a varifocal system. Alternatively, integral imaging153,154 can be used by replacing the original display panel with the central depth plane of an integral imaging module. Integral imaging can also be combined with the varifocal approach to overcome the tradeoff between resolution and depth of field (DoF)155,156,157. However, the inherent tradeoff between resolution and view number still exists in this case.
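How much power a varifocal element must add can be estimated with thin-lens optics. The Python sketch below is a textbook model, not the configuration of any cited system. It assumes a tunable lens in contact with an eyepiece of focal length f, a display at distance d, a virtual image at distance D, the real-is-positive convention, and eye relief neglected, so that 1/d - 1/D = 1/f + P_tunable (powers in diopters, distances in metres). The 50 mm eyepiece is hypothetical.

```python
import math


def tunable_lens_power_diopters(display_dist_m: float, eyepiece_focal_m: float,
                                image_dist_m: float) -> float:
    """Thin-lens power a tunable lens (in contact with the eyepiece) must add.

    Real-is-positive convention: 1/d - 1/D = P_eyepiece + P_tunable, with the
    display at distance d and a virtual image at distance D on the display side.
    """
    return 1.0 / display_dist_m - 1.0 / image_dist_m - 1.0 / eyepiece_focal_m


if __name__ == "__main__":
    f = 0.05  # hypothetical 50 mm eyepiece, display placed at its focal plane
    for image_dist in (math.inf, 2.0, 1.0, 0.5, 0.25):
        power = tunable_lens_power_diopters(f, f, image_dist)
        print(f"virtual image at {image_dist} m -> tunable lens {power:+.2f} D")
```

With the display at the eyepiece focal plane, the tunable lens needs a power of -1/D, so pulling the image from infinity to 25 cm requires about 4 D of tuning range.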
Overall, AR displays based on traditional geometric optics have a relatively simple design with a decent FoV (~60°) and eyebox (8 mm)158. They also exhibit reasonable efficiency. To measure the efficiency of an AR combiner, an appropriate metric is the output luminance (unit: nit) divided by the input luminous flux (unit: lm), which we refer to as combiner efficiency. For a fixed input luminous flux, the output luminance, or image brightness, is related to the FoV and exit pupil of the combiner system. If we assume the combiner system wastes no light, the maximum combiner efficiency for a typical diagonal FoV of 60° and an exit pupil of 10 mm × 10 mm is around 17,000 nit/lm (Eq. S2). To estimate the combiner efficiency of geometric combiners, we assume a half-mirror transmittance of 50% and an efficiency of 50% for the other optics. The final combiner efficiency is then about 4200 nit/lm, which is high compared with waveguide combiners. Nonetheless, further shrinking the system or improving its performance ultimately runs into etendue conservation. In addition, AR systems with traditional geometric optics can hardly achieve a configuration resembling normal flat glasses, because the half-mirror has to be tilted to some extent.
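The ~17,000 nit/lm ceiling can be approached with a lossless, etendue-limited estimate. If a flux Φ fills an exit pupil of area A and a field solid angle Ω uniformly, the luminance is L = Φ/(AΩ), so the efficiency in nit/lm is 1/(AΩ). We do not claim this is the exact form of Eq. S2. The Python sketch assumes a square FoV whose 60° diagonal is measured in the tangent plane, a 10 mm × 10 mm exit pupil, and the solid angle of a rectangular field, Ω = 4·arcsin(sin α · sin β), with half-angles α and β.

```python
import math


def rect_fov_solid_angle_sr(h_deg: float, v_deg: float) -> float:
    """Solid angle (sr) of a rectangular FoV with full angles h_deg x v_deg."""
    a = math.radians(h_deg) / 2.0
    b = math.radians(v_deg) / 2.0
    return 4.0 * math.asin(math.sin(a) * math.sin(b))


def square_fov_side_deg(diagonal_deg: float) -> float:
    """Full side angle of a square FoV whose corner-to-corner angle is diagonal_deg."""
    tan_side = math.tan(math.radians(diagonal_deg) / 2.0) / math.sqrt(2.0)
    return math.degrees(2.0 * math.atan(tan_side))


def lossless_combiner_efficiency(pupil_area_m2: float, solid_angle_sr: float) -> float:
    """Combiner efficiency in nit/lm if the flux fills the pupil and FoV uniformly."""
    return 1.0 / (pupil_area_m2 * solid_angle_sr)


if __name__ == "__main__":
    side = square_fov_side_deg(60.0)
    omega = rect_fov_solid_angle_sr(side, side)
    ideal = lossless_combiner_efficiency(0.010 * 0.010, omega)
    geometric = ideal * 0.5 * 0.5  # 50% half-mirror x 50% for the other optics
    print(f"square FoV side ~{side:.1f} deg, solid angle ~{omega:.3f} sr")
    print(f"lossless: ~{ideal:,.0f} nit/lm; geometric combiner: ~{geometric:,.0f} nit/lm")
```

Under these assumptions the lossless value is about 17,400 nit/lm. Applying the 25% factor gives roughly 4,400 nit/lm, consistent in magnitude with the ~4200 nit/lm quoted, which applies the same factor to a rounded 17,000.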
Maxwellian-type systems
The Maxwellian view, proposed by James Clerk Maxwell (1860), refers to imaging a point light source in the eye pupil159. If the light beam is modulated during imaging, a corresponding image can be formed on the retina (Fig. 9a). Because the image of the point source is much smaller than the eye pupil, the retinal image is always in focus irrespective of the eye lens focus. For AR displays, the point source is usually a laser with narrow angular and spectral bandwidths. LED light sources can also be used to build a Maxwellian system by adding an angular filtering module160. Regarding the combiner, although in theory a half-mirror can also be used, HOEs are generally preferred because they allow an off-axis configuration that places the combiner in a position similar to that of eyeglass lenses. In addition, HOEs reflect less environment light, which gives the user a more natural appearance behind the display.
To modulate the light, an SLM like LCoS or DMD can be placed in the light path, as shown in Fig. 9b. Alternatively, an LBS system can be used (Fig. 9c), where the intensity modulation occurs in the laser diode itself. Besides operating in a normal Maxwellian view, both implementations offer additional degrees of freedom for light modulation.
For an SLM-based system, there are several options for arranging the SLM pixels143,161. Maimone et al.143 demonstrated a Maxwellian AR display with two modes, offering either a large-DoF Maxwellian view or a holographic view (Fig. 9d), which is often referred to as computer-generated holography (CGH)162. To show an always-in-focus image with a large DoF, the image can be displayed directly on an amplitude SLM, or amplitude encoding can be used on a phase-only SLM163. Alternatively, if a 3D scene with correct depth cues is to be presented, optimization algorithms for CGH can be used to generate a hologram for the SLM. The generated holographic image exhibits the natural focus-and-blur effect of a real 3D object (Fig. 9d). To better understand this feature, we again invoke the concept of etendue. The laser light source can be considered to have a very small etendue due to its excellent collimation, so the system etendue is provided by the SLM. The micron-sized pixel pitch of the SLM sets a maximum diffraction angle, which, multiplied by the SLM size, equals the system etendue. By varying the display content on the SLM, the final exit pupil size can be changed accordingly. In the large-DoF Maxwellian view, the exit pupil is small and accompanied by a large FoV. In the holographic display mode, the reduced DoF requires a larger exit pupil, with a dimension close to that of the eye pupil, but the FoV is reduced accordingly due to etendue conservation. Another common concern with CGH is computation time: achieving a real-time CGH rendering flow with excellent image quality is quite a challenge. Fortunately, with recent advances in algorithms164 and the introduction of convolutional neural networks (CNNs)165,166, this issue is being addressed at an encouraging pace. Lately, Liang et al.166 demonstrated a real-time CGH synthesis pipeline with high image quality. The pipeline comprises an efficient CNN model that generates a complex hologram from a 3D scene and an improved encoding algorithm that converts the complex hologram to a phase-only one. An impressive frame rate of 60 Hz has been achieved on a desktop computing unit.
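The etendue argument for SLM-based systems can be made concrete with a one-dimensional sketch in Python. We assume a grating-equation bound, sin θ_max = λ/(2p), on the diffraction half-angle of an SLM with pixel pitch p. We treat the laser's own etendue as zero and write the 1D etendue as G = W · 2 sin θ_max for a panel of width W. Conserving G, an exit pupil of width E then supports a full FoV satisfying 2 sin(FoV/2) = G/E. The panel size, pitch and wavelength are hypothetical values chosen only for illustration.

```python
import math


def slm_etendue_1d_mm(width_mm: float, pitch_um: float, wavelength_um: float) -> float:
    """1D etendue G = W * 2 sin(theta_max), with sin(theta_max) = lambda / (2 p)."""
    sin_theta_max = wavelength_um / (2.0 * pitch_um)
    return width_mm * 2.0 * sin_theta_max


def fov_deg_for_exit_pupil(etendue_mm: float, exit_pupil_mm: float) -> float:
    """Full FoV allowed by etendue conservation for a given exit-pupil width."""
    s = min(etendue_mm / (2.0 * exit_pupil_mm), 1.0)  # clamp: etendue no longer limiting
    return math.degrees(2.0 * math.asin(s))


if __name__ == "__main__":
    # Hypothetical panel: 3840 pixels x 4 um pitch, 532 nm green laser.
    G = slm_etendue_1d_mm(width_mm=3840 * 4e-3, pitch_um=4.0, wavelength_um=0.532)
    for pupil_mm in (2.0, 3.0, 4.0, 6.0):
        print(f"exit pupil {pupil_mm} mm -> FoV ~{fov_deg_for_exit_pupil(G, pupil_mm):.1f} deg")
```

Widening the exit pupil toward eye-pupil size, as the holographic mode requires, shrinks the FoV in proportion. This is the tradeoff described above; in the small-pupil Maxwellian mode, other elements such as the combiner aperture end up limiting the FoV.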
For an LBS-based system, the additional modulation can be achieved by integrating a steering module, as demonstrated by Jang et al.167. The steering mirror can shift the focal point (viewpoint) within the eye pupil, effectively expanding the system etendue. When the steering is fast and the image content is updated synchronously, correct 3D cues can be generated, as shown in Fig. 9e. However, there is a tradeoff between the number of viewpoints and the final image frame rate, because the total frames are divided equally among the viewpoints. Boosting the frame rate of MEMS-LBS systems by a factor equal to the number of views (e.g., 3 × 3) may be challenging.
Maxwellian-type systems offer several advantages. The system efficiency is usually very high because nearly all the light is delivered into the viewer's eye. The system FoV is determined by the f/# of the combiner, and a large FoV (~80° horizontally) can be achieved143. The VAC issue can be mitigated with an infinite-DoF image that removes the accommodation cue, or completely solved by generating a true 3D scene as discussed above. Despite these advantages, one major weakness of Maxwellian-type systems is the tiny exit pupil, or eyebox. A small deviation of the eye pupil from the viewpoint results in the complete disappearance of the image. Therefore, expanding the eyebox is considered one of the most important challenges in Maxwellian-type systems.
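The link between combiner f/# and FoV can be sketched with simple geometry. The Python sketch assumes that the eye pupil sits at the focal point of the combiner lens, so that the full combiner aperture D at focal length f subtends the field: FoV = 2·arctan(D/2f) = 2·arctan(1/(2N)), with N = f/D. Aberrations and off-axis geometry are ignored.

```python
import math


def maxwellian_fov_deg(f_number: float) -> float:
    """Full FoV subtended by a combiner aperture D at focal length f (N = f / D)."""
    return math.degrees(2.0 * math.atan(1.0 / (2.0 * f_number)))


def f_number_for_fov(fov_deg: float) -> float:
    """Inverse of maxwellian_fov_deg: f/# needed for a target full FoV."""
    return 1.0 / (2.0 * math.tan(math.radians(fov_deg) / 2.0))


if __name__ == "__main__":
    for n in (2.0, 1.0, 0.6):
        print(f"f/{n}: FoV ~{maxwellian_fov_deg(n):.1f} deg")
    print(f"80 deg FoV needs about f/{f_number_for_fov(80):.2f}")
```

Under this model an ~80° field needs roughly f/0.6, which illustrates why very low f/# combiners such as LCHOE lenses are attractive for Maxwellian-type systems.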
Pupil duplication and steering
Methods to expand the eyebox can generally be categorized into pupil duplication168,169,170,171,172 and pupil steering9,13,167,173. Pupil duplication simply generates multiple viewpoints to cover a large area. In contrast, pupil steering dynamically shifts the viewpoint position depending on the pupil location. Before reviewing detailed implementations of these two methods, it is worth discussing some of their general features. In pupil duplication, the multiple viewpoints usually divide the total light intensity equally. In each time frame, however, it is preferable that only one viewpoint enters the user's eye pupil to avoid ghost images. This requirement therefore reduces the total light efficiency, and it also requires the viewpoint separation to be larger than the pupil diameter. At the same time, the separation should not be too large, to avoid gaps between viewpoints. Since the human pupil diameter changes in response to environment illuminance, the design of the viewpoint separation needs special attention. Pupil steering, on the other hand, produces only one viewpoint in each time frame. It is therefore more light-efficient and free from ghost images. But determining the viewpoint position requires information on the eye pupil location, which demands a real-time eye-tracking module9. Another observation is that pupil steering inherently accommodates multiple viewpoints. Therefore, a pupil steering system can often be easily converted into a pupil duplication system by generating all available viewpoints simultaneously.
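The separation constraint for pupil duplication can be checked with an idealized one-dimensional model. The Python sketch treats viewpoints as points on a line at spacing s and the pupil as a segment of diameter d. If d > s, some pupil positions contain two viewpoints (ghost); if d < s, some positions contain none (gap); boundary cases are ignored. The 4 mm spacing is hypothetical, and the pupil diameters span the roughly 2 to 8 mm range of the human eye between bright and dark conditions.

```python
def viewpoint_check(separation_mm: float, pupil_mm: float) -> str:
    """Idealised 1D check for a periodic row of point-like viewpoints.

    A pupil wider than the separation can hold two viewpoints at once (ghost);
    a narrower one can fall between viewpoints (gap). Boundary cases are ignored.
    """
    if pupil_mm > separation_mm:
        return "ghost risk"
    if pupil_mm < separation_mm:
        return "gap risk"
    return "ok"


if __name__ == "__main__":
    separation = 4.0  # hypothetical viewpoint pitch in mm
    for pupil in (2.0, 3.0, 4.0, 6.0, 8.0):
        print(f"pupil {pupil} mm: {viewpoint_check(separation, pupil)}")
```

Only a pupil exactly matching the separation avoids both problems. A fixed viewpoint spacing therefore cannot suit a pupil whose size changes with illuminance, which is why the separation needs special attention.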
To generate multiple viewpoints, one can modulate either the incident light or the combiner. Recall that a viewpoint is the image of the light source; duplicating or shifting the light source therefore achieves pupil duplication or steering, respectively, as illustrated in Fig. 10a. Several light modulation schemes are depicted in Fig. 10b–e. An array of light sources can be generated with multiple laser diodes (Fig. 10b); turning on all of the sources or only one of them achieves pupil duplication or steering, respectively. A light source array can also be produced by projecting light onto an array-type PPHOE168 (Fig. 10c). Apart from directly adjusting the light sources, modulating the light along its path can also effectively steer or duplicate the light sources. A mechanical steering mirror can deflect the beam167 (Fig. 10d), which is equivalent to shifting the light source position. Other devices, like a grating or beam splitter, can also serve as a ray deflector or splitter170,171 (Fig. 10e).
Nonetheless, one problem with the light source duplication/shifting methods for pupil duplication/steering is that the aberrations in peripheral viewpoints are often severe168,173. The HOE combiner is usually recorded at one incident angle; at incident angles that deviate significantly from it, considerable aberrations occur, especially in an off-axis configuration. To solve this problem, the modulation can be applied to the combiner instead. While mechanically shifting the combiner9 can achieve continuous pupil steering, integrating it into an AR display with a small form factor remains a challenge. Alternatively, the versatile functions of HOEs offer possible solutions for combiner modulation. Kim and Park169 demonstrated a pupil duplication system with a multiplexed PPHOE (Fig. 10f), in which the wavefronts of several viewpoints are recorded into one PPHOE sample. Three viewpoints with a separation of 3 mm were achieved. However, slight ghost images and gaps can be observed during viewpoint transitions. For a PPHOE to achieve pupil steering, the multiplexed PPHOE needs to record different focal points at different incident angles. If the holograms have no angular crosstalk, the viewpoint can then be steered with an additional device that changes the incident angle of the light. Alternatively, Xiong et al.173 demonstrated a pupil steering system with LCHOEs in a simpler configuration (Fig. 10g). The polarization-sensitive nature of LCHOEs allows a polarization converter (PC) to control which LCHOE functions. When the PC is off, the incident RCP light is focused by the right-handed LCHOE. When the PC is turned on, the RCP light is first converted to LCP light and passes through the right-handed LCHOE; it is then focused by the left-handed LCHOE into another viewpoint. Adding more viewpoints requires stacking more pairs of PCs and LCHOEs, which can be done compactly with thin glass substrates. In addition, pupil duplication only requires stacking multiple low-efficiency LCHOEs. For both PPHOEs and LCHOEs, because the hologram for each viewpoint is recorded independently, the aberrations can be eliminated.
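The polarization logic of the LCHOE pupil-steering stack can be captured in a small idealized model. A switchable polarization converter flips the circular handedness when on, and each LCHOE focuses only light of its own handedness into its viewpoint while passing the other. The class and function names are hypothetical. The model ignores diffraction efficiency, reflective versus transmissive geometry, and any handedness change on reflection.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass
class PolarizationConverter:
    """Switchable converter: swaps RCP <-> LCP when on, passes light unchanged when off."""
    on: bool = False


@dataclass
class LCHOELens:
    """Idealised LCHOE: focuses light of matching handedness, fully transmits the other."""
    handedness: str  # "R" or "L"
    viewpoint: int


Element = Union[PolarizationConverter, LCHOELens]


def trace(stack: Sequence[Element], polarization: str = "R") -> Optional[int]:
    """Return the viewpoint index the beam is focused into, or None if it passes through."""
    for element in stack:
        if isinstance(element, PolarizationConverter):
            if element.on:
                polarization = "L" if polarization == "R" else "R"
        elif element.handedness == polarization:
            return element.viewpoint
    return None


if __name__ == "__main__":
    pc = PolarizationConverter()
    # Two-viewpoint configuration described in the text: PC, right-handed, left-handed.
    stack = [pc, LCHOELens("R", 0), LCHOELens("L", 1)]
    for state in (False, True):
        pc.on = state
        print(f"PC on={state}: viewpoint {trace(stack)}")
```

Adding converter/lens pairs extends the stack. Which handedness each added lens needs depends on the converter states you plan to use, and `trace` can check a proposed layout before fabrication.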
Regarding system performance, in theory the FoV is not limited and can reach a large value, such as 80° in the horizontal direction143. The definition of eyebox differs from that of traditional imaging systems. For a single viewpoint, the eyebox has the same size as the eye pupil diameter, but thanks to the viewpoint steering/duplication capability, the total system eyebox can be expanded accordingly. The combiner efficiency of pupil steering systems can reach 47,000 nit/lm for a FoV of 80° × 80° and a pupil diameter of 4 mm (Eq. S2). At such a high brightness level, eye safety could be a concern174. For a pupil duplication system, the combiner efficiency is divided by the number of viewpoints; with a 4-by-4 viewpoint array, it can still reach 3000 nit/lm. Despite the potential gain of pupil duplication/steering, the situation becomes much more complicated when the rotation of the eyeball is considered175. A perfect pupil steering system requires 5D steering, which poses a challenge for practical implementation.
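The same lossless estimate, L = Φ/(AΩ), also lands close to the steering and duplication figures. The Python sketch is not a reproduction of Eq. S2. It assumes that all the flux goes into circular 4 mm viewpoints spread uniformly over an 80° × 80° field, and that duplication splits the flux evenly among N viewpoints. The solid angle of the rectangular field is again Ω = 4·arcsin(sin α · sin β), with half-angles α and β.

```python
import math


def rect_fov_solid_angle_sr(h_deg: float, v_deg: float) -> float:
    """Solid angle (sr) of a rectangular FoV with full angles h_deg x v_deg."""
    a = math.radians(h_deg) / 2.0
    b = math.radians(v_deg) / 2.0
    return 4.0 * math.asin(math.sin(a) * math.sin(b))


def lossless_combiner_efficiency(pupil_diameter_m: float, h_deg: float, v_deg: float,
                                 n_viewpoints: int = 1) -> float:
    """nit/lm when the flux is split evenly over n circular viewpoints (L = phi / (A * omega))."""
    area_m2 = math.pi * (pupil_diameter_m / 2.0) ** 2
    return 1.0 / (area_m2 * rect_fov_solid_angle_sr(h_deg, v_deg) * n_viewpoints)


if __name__ == "__main__":
    steering = lossless_combiner_efficiency(4e-3, 80, 80)
    duplication = lossless_combiner_efficiency(4e-3, 80, 80, n_viewpoints=16)
    print(f"pupil steering: ~{steering:,.0f} nit/lm")
    print(f"4 x 4 pupil duplication: ~{duplication:,.0f} nit/lm")
```

The estimate gives roughly 46,700 nit/lm for steering and about 2,900 nit/lm for a 4 × 4 duplication array. The 1/N penalty is the price of showing every viewpoint at once instead of tracking the pupil.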
Pin-light systems
Recently, another type of display closely related to the Maxwellian view, called the pin-light display148,176, has been proposed. The general working principle of the pin-light display is illustrated in Fig. 11a. Each pin-light source is a Maxwellian view with a large DoF. When the eye pupil is no longer placed near the source point, as it is in the Maxwellian view, each image source can only form an elemental view with a small FoV on the retina. However, if the image source array is arranged properly, the elemental views can be integrated to form a large FoV. Depending on the specific optical architecture, pin-light displays can be implemented in different forms. In the initial feasibility demonstration, Maimone et al.176 used a side-lit waveguide plate as the point light source (Fig. 11b). The light inside the waveguide plate is extracted by etched divots, forming a pin-light source array. A transmissive SLM (LCD) placed behind the waveguide plate modulates the light intensity to form the image. The display has an impressive FoV of 110° thanks to the large range of scattering angles. However, placing the LCD directly in front of the eye brings the issues of insufficient resolution density and diffraction of the background light.
To avoid these issues, architectures using pin-mirrors177,178,179 have been proposed. In these systems, the final combiner is an array of tiny mirrors178,179 or gratings177, in contrast to their counterparts using large-area combiners. An exemplary system with a birdbath design is depicted in Fig. 11c. In this case, the pin-mirrors replace the original beam-splitter in the birdbath, which can shrink the system volume while providing large-DoF pin-light images. Nonetheless, such a system may still face the etendue conservation issue. Meanwhile, the pin-mirrors cannot be too small; otherwise, diffraction degrades the resolution density. Because of this lower bound on their size, their influence on the see-through background should also be considered in the system design.
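To see why the pin-mirrors cannot shrink indefinitely, the sketch below treats each pin-mirror as a circular aperture and evaluates the Airy-disk angular radius 1.22λ/D. The 550 nm wavelength and the 1-arcmin target (a common figure for normal visual acuity) are illustrative assumptions, and the rest of the optics, including the eye pupil, is ignored.

```python
import math

ARCMIN_PER_RAD = 180 / math.pi * 60  # about 3437.75


def airy_blur_arcmin(aperture_mm: float, wavelength_nm: float = 550.0) -> float:
    """Angular radius of the Airy disk, 1.22 * lambda / D, in arcminutes."""
    return 1.22 * wavelength_nm * 1e-9 / (aperture_mm * 1e-3) * ARCMIN_PER_RAD


def min_aperture_mm(target_arcmin: float = 1.0, wavelength_nm: float = 550.0) -> float:
    """Smallest circular aperture whose Airy radius stays within target_arcmin."""
    target_rad = target_arcmin / ARCMIN_PER_RAD
    return 1.22 * wavelength_nm * 1e-9 / target_rad * 1e3


for d in (0.5, 1.0, 2.0, 3.0):
    print(f"D = {d:.1f} mm: Airy radius ~{airy_blur_arcmin(d):.2f} arcmin")
print(f"D needed for 1 arcmin at 550 nm: ~{min_aperture_mm():.2f} mm")
```

Under these assumptions a 1 mm pin-mirror blurs a point to about 2.3 arcmin, and an aperture of roughly 2.3 mm is needed to reach 1 arcmin at 550 nm; a larger mirror, in turn, obstructs more of the see-through view.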
To overcome the etendue conservation limit and improve see-through quality, Xiong et al.180 proposed another type of pin-light system exploiting the etendue expansion property of the waveguide, which is also referred to as the scanning waveguide display (SWD). As illustrated in Fig. 11d, the system uses an LBS as the image source. The collimated scanned laser rays are trapped in the waveguide and encounter an array of off-axis lenses. Upon each encounter, the lens out-couples the laser rays and forms a pin-light source. SWD has the merits of good see-through quality and large etendue. A large FoV of 100° was demonstrated with the help of an ultra-low f/# lens array based on LCHOE. However, some issues like insufficient image resolution density and image non-uniformity remain to be overcome. Further improvement of the system may require optimization of the Gaussian beam profile and an additional EPE module180.
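A back-of-the-envelope link between lens f/# and FoV helps explain the need for an ultra-low f/#. If each out-coupling lens focuses the collimated laser beam to a pin-light point and the FoV is taken as the full cone angle of the diverging light, then FoV = 2·arctan(1/(2N)) for an f-number N. This single-lens relation is our simplifying assumption, not a description of the actual SWD design.

```python
import math


def cone_fov_deg(f_number: float) -> float:
    """Full cone angle (deg) of light focused by a lens of the given f/#."""
    return math.degrees(2 * math.atan(1 / (2 * f_number)))


def f_number_for_fov(fov_deg: float) -> float:
    """Inverse of cone_fov_deg: the f/# whose full cone angle equals fov_deg."""
    return 1 / (2 * math.tan(math.radians(fov_deg) / 2))


print(f"f/# needed for a 100 deg cone: {f_number_for_fov(100):.2f}")
for n in (2.0, 1.0, 0.5):
    print(f"f/{n}: cone ~{cone_fov_deg(n):.0f} deg")
```

In this model a 100° FoV corresponds to roughly f/0.42, well below the f/1 to f/2 range typical of conventional refractive lenses.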
Overall, pin-light systems inherit the large DoF from the Maxwellian view. With an adequate number of pin-light sources, the FoV and eyebox can be expanded accordingly. Nonetheless, despite the different forms of implementation, a common issue of pin-light systems is image uniformity. The overlapped region of elemental views has a higher light intensity than the non-overlapped region, and the situation becomes even more complicated when the dynamic change of pupil size is considered. In theory, the displayed image can be pre-processed to compensate for the optical non-uniformity. But that would require knowledge of the precise pupil location (and possibly size) and therefore an accurate eye-tracking module176. Regarding the system performance, pin-mirror systems modified from other free-space systems generally share similar FoV and eyebox with the original systems. The combiner efficiency may be lower due to the small size of the pin-mirrors. SWD, on the other hand, shares the large FoV and DoF with the Maxwellian view, and the large eyebox with waveguide combiners. Its combiner efficiency may also be lower due to the EPE process.
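The pre-processing idea can be illustrated with a 1D toy model: each elemental view is a top-hat window on the retina, the overlap count at each point is summed, and the displayed image is scaled by the reciprocal of that count. The layout, window width, and function names are hypothetical, and the pupil is held fixed; in a real system the windows move and resize with the pupil, which is why eye tracking is needed.

```python
def overlap_counts(centers, half_width, xs):
    """Number of elemental views (1D top-hat windows) covering each sample x."""
    return [sum(1 for c in centers if abs(x - c) <= half_width) for x in xs]


def compensation_weights(counts):
    """Per-sample gain that flattens the summed intensity; 0 where no view reaches."""
    return [1.0 / n if n else 0.0 for n in counts]


# Hypothetical 1D layout: three elemental views, each 2 units wide, with centers
# 1.5 units apart, so neighboring views overlap by 0.5 units.
centers = [0.0, 1.5, 3.0]
xs = [i * 0.25 for i in range(-4, 17)]  # sample points from -1.0 to 4.0
counts = overlap_counts(centers, 1.0, xs)
weights = compensation_weights(counts)
perceived = [n * w for n, w in zip(counts, weights)]
print(counts)     # 2 in the overlap bands, 1 elsewhere
print(perceived)  # flat after compensation
```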
Waveguide combiner
Besides free-space combiners, another common architecture in AR displays is the waveguide combiner. The term waveguide indicates that the light is trapped in a substrate by TIR. One distinctive feature of a waveguide combiner is the EPE process, which effectively enlarges the system etendue. In the EPE process, a portion of the trapped light is repeatedly coupled out of the waveguide at each TIR bounce, so the effective eyebox is enlarged. According to the features of the couplers, we divide waveguide combiners into two types, diffractive and achromatic, as described below.
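The TIR trapping and the repeated out-coupling can be quantified with two textbook relations: the critical angle arcsin(1/n) and the lateral pitch 2t·tanθ between successive bounces on the same face of a slab of thickness t. The sketch below evaluates both; the thickness and angles are illustrative values, and the function names are hypothetical.

```python
import math


def critical_angle_deg(n: float) -> float:
    """TIR critical angle (deg, from the surface normal) for a substrate in air."""
    return math.degrees(math.asin(1 / n))


def bounce_pitch_mm(thickness_mm: float, angle_deg: float) -> float:
    """Lateral distance between successive hits on the same waveguide face.

    A ray at angle theta from the normal crosses the slab twice per round trip,
    advancing t * tan(theta) each time, so out-coupling opportunities repeat
    every 2 * t * tan(theta) along the out-coupler.
    """
    return 2 * thickness_mm * math.tan(math.radians(angle_deg))


for n in (1.5, 1.8, 2.0):
    print(f"n = {n}: critical angle {critical_angle_deg(n):.1f} deg")
for theta in (45, 60, 75, 85):
    print(f"t = 1 mm, theta = {theta} deg: pitch {bounce_pitch_mm(1.0, theta):.1f} mm")
```

The pitch grows rapidly as θ approaches 90°, so rays guided near grazing angles are out-coupled at widely separated points.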
Diffractive waveguides
As the name implies, diffractive waveguides use diffractive elements as couplers. The in-coupler is usually a diffractive grating, and the out-coupler in most cases is also a grating with the same period as the in-coupler, though it can also be an off-axis lens with a small curvature to generate an image with finite depth. Three major diffractive couplers have been developed: SRGs, photopolymer gratings (PPGs), and liquid crystal gratings (grating-type LCHOE, also known as polarization volume gratings (PVGs)). Some general guidelines for coupler design are that the in-coupler should have a relatively high efficiency and the out-coupler should have a uniform light output. A uniform light output usually requires a low-efficiency coupler, with extra degrees of freedom for local modulation of the coupling efficiency. Both the in-coupler and out-coupler should have an adequate angular bandwidth to accommodate a reasonable FoV. In addition, the out-coupler should be optimized to avoid undesired diffractions, including the outward diffraction of TIR light and the diffraction of environment light into the user's eyes, which are referred to as light leakage and rainbow, respectively. Suppression of these unwanted diffractions should be considered in the optimization of the waveguide design, along with performance parameters like efficiency and uniformity.
The basic working principles of diffractive waveguide-based AR systems are illustrated in Fig. 12. For the SRG-based waveguides6,8 (Fig. 12a), the in-coupler can be a transmissive-type or a reflective-type181,182. The grating geometry can be optimized for coupling efficiency with a large degree of freedom183. For the out-coupler, a reflective SRG with a large slant angle to suppress the transmission orders is preferred184. In addition, a uniform light output usually requires a gradient efficiency distribution in order to compensate for the decreased light intensity in the out-coupling process. This can be achieved by varying the local grating configurations like height and duty cycle6. For the PPG-based waveguides185 (Fig. 12b), the small angular bandwidth of a high-efficiency transmissive PPG prohibits its use as an in-coupler. Therefore, both the in-coupler and out-coupler are usually reflective. The gradient efficiency can be achieved by space-variant exposure to control the local index modulation186 or local Bragg slant angle variation through freeform exposure19. Due to the relatively small angular bandwidth of PPG, achieving a decent FoV usually requires stacking two187 or three188 PPGs together for a single color. The PVG-based waveguides189 (Fig. 12c) also prefer reflective PVGs as in-couplers because transmissive PVGs are much more difficult to fabricate due to the LC alignment issue. In addition, the angular bandwidth of transmissive PVGs in the Bragg regime is not large enough to support a decent FoV29. For the out-coupler, the angular bandwidth of a single reflective PVG can usually support a reasonable FoV. To obtain a uniform light output, a polarization management layer190 consisting of an LC layer with spatially variant orientations can be utilized. It offers an additional degree of freedom to control the polarization state of the TIR light. The diffraction efficiency can therefore be locally controlled due to the strong polarization sensitivity of PVG.
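All three coupler families aim at the same target: a local out-coupling efficiency that rises along the propagation direction. A minimal sketch of that target profile is shown below, assuming an idealized lossless guide with N discrete out-coupling events and complete extraction at the last one (real designs leave residual light and have absorption); the function names are hypothetical.

```python
def uniform_outcoupling_efficiencies(n_events: int) -> list[float]:
    """Local out-coupling efficiency giving equal output at each of n_events.

    Assumes a lossless guide and full extraction at the last event: before
    event k (1-based) a fraction (n - k + 1)/n remains, so extracting 1/n of
    the input requires a local efficiency of 1/(n - k + 1).
    """
    return [1.0 / (n_events - k + 1) for k in range(1, n_events + 1)]


def outputs(efficiencies: list[float]) -> list[float]:
    """Fraction of the in-coupled power leaving the guide at each event."""
    remaining, out = 1.0, []
    for eta in efficiencies:
        out.append(remaining * eta)
        remaining *= 1 - eta
    return out


graded = uniform_outcoupling_efficiencies(5)
print([round(e, 3) for e in graded])              # [0.2, 0.25, 0.333, 0.5, 1.0]
print([round(o, 3) for o in outputs(graded)])     # five equal outputs of 0.2
print([round(o, 3) for o in outputs([0.2] * 5)])  # constant 20%: decaying output
```

The constant-efficiency case decays geometrically (0.2, 0.16, 0.128, ...), which is the non-uniformity that a graded height, duty cycle, exposure, or polarization profile is meant to remove.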
The above discussion describes the basic working principle of 1D EPE. Nonetheless, for 1D EPE to produce a large eyebox, the exit pupil of the original image in the unexpanded direction must be large. This poses design challenges for the light engine. Therefore, 2D EPE is favored for practical applications. To extend EPE in two dimensions, two consecutive 1D EPEs can be used191, as depicted in Fig. 13a. The first 1D EPE occurs in the turning grating, where the light is duplicated in the y direction and then turned into the x direction. Then the light rays encounter the out-coupler and are expanded in the x direction. To better understand the 2D EPE process, the k-vector diagram (Fig. 13b) can be used. For light propagating in air with wavenumber k0, its possible k-values in the x and y directions (kx and ky) fall within the circle with radius k0. When the light is trapped by TIR, kx and ky lie outside the circle with radius k0 and inside the circle with radius nk0, where n is the refractive index of the substrate. kx and ky stay unchanged in the TIR process and change only in each diffraction process. The central red box in Fig. 13b indicates the possible k values within the system FoV. At the in-coupler, the grating k-vector is added to the k values, shifting them into the TIR region. The turning grating then applies another k-vector and shifts the k values to near the x-axis. Finally, the k values are shifted by the out-coupler and return to the free-propagation region in air. One observation is that the size of the red box is mostly limited by the width of the TIR band. To accommodate a larger FoV, the outer boundary of the TIR band needs to be expanded, which amounts to increasing the waveguide refractive index. Another important fact is that when kx and ky are near the outer boundary, the uniformity of the output light becomes worse. This is because the light propagation angle in the waveguide is then near 90°. The spatial distance between two consecutive TIRs becomes so large that the out-coupled beams are spatially separated to an unacceptable degree. The range of usable k values for practical applications is therefore further reduced.
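The FoV limit set by the TIR band can be checked numerically. The sketch below assumes a single in-coupler whose normalized grating vector (g, 0) = (λ/Λ, 0) shifts each field angle of a rectangular FoV, and tests whether every shifted k lands strictly inside the annulus 1 < |k|/k0 < n. It ignores the turning grating and any extra margin for the large-angle uniformity issue, so it gives an optimistic upper bound; all names and parameter values are illustrative.

```python
import math


def inplane_k(h_deg: float, v_deg: float) -> tuple[float, float]:
    """Normalized in-plane wavevector (kx, ky)/k0 of a ray in air.

    The ray direction is (tan h, tan v, 1), i.e. h and v are the horizontal
    and vertical field angles of a rectangular FoV.
    """
    tx, ty = math.tan(math.radians(h_deg)), math.tan(math.radians(v_deg))
    norm = math.sqrt(1 + tx * tx + ty * ty)
    return tx / norm, ty / norm


def fits_tir_band(n: float, fov_h_deg: float, fov_v_deg: float,
                  g: float, samples: int = 41) -> bool:
    """True if every sampled field angle, shifted by the in-coupler vector
    (g, 0), lands strictly inside the TIR annulus 1 < |k|/k0 < n."""
    for i in range(samples):
        for j in range(samples):
            h = -fov_h_deg / 2 + fov_h_deg * i / (samples - 1)
            v = -fov_v_deg / 2 + fov_v_deg * j / (samples - 1)
            kx, ky = inplane_k(h, v)
            if not 1 < math.hypot(kx + g, ky) < n:
                return False
    return True


def max_fov_1d_deg(n: float) -> float:
    """Largest FoV along the grating direction (zero vertical FoV): need
    g - sin(h) >= 1 and g + sin(h) <= n, hence sin(h) <= (n - 1)/2."""
    return math.degrees(2 * math.asin((n - 1) / 2))


for n in (1.5, 1.8, 2.0):
    print(f"n = {n}: 1D FoV limit ~{max_fov_1d_deg(n):.1f} deg")
print(fits_tir_band(2.0, 59, 0, g=1.5), fits_tir_band(2.0, 61, 0, g=1.5))  # True False
```

For a FoV along the grating direction only, the best choice g = (n + 1)/2 gives the closed-form limit 2·arcsin((n - 1)/2): about 29° for n = 1.5 and 60° for n = 2.0, which shows directly why a higher-index substrate widens the FoV.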
Aside from two consecutive 1D EPEs, the 2D EPE can also be directly implemented with a 2D grating192. An example using a hexagonal grating is depicted in Fig. 13c. The hexagonal grating can provide k-vectors in six directions. In the k-diagram (Fig. 13d), after the in-coupling, the k values are distributed into six regions due to multiple diffractions. The out-coupling occurs simultaneously with pupil expansion. Besides a concise out-coupler configuration, the 2D EPE scheme offers more degrees of design freedom than two 1D EPEs because the local grating parameters can be adjusted in a 2D manner. The higher design freedom has the potential to reach a better output light uniformity, but at the cost of a higher computation demand for optimization. Furthermore, the unslanted grating geometry usually leads to a large light leakage and possibly low efficiency. Adding slant to the geometry helps alleviate the issue, but the associated fabrication may be more challenging.
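The k-diagram argument for the hexagonal grating can be replayed numerically. The sketch below assumes that the six first-order vectors of the hexagonal grating have the same normalized length g = λ/Λ as the in-coupler vector, traces only the FoV center, and explores by breadth-first search which in-plane k values stay guided (1 < |k|/k0 < n) and which return to air (|k|/k0 < 1). The substrate index, g, and all names are hypothetical.

```python
import math


def hex_vectors(g: float) -> list[tuple[float, float]]:
    """First-order vectors of a hexagonal grating, |K|/k0 = g, 60 deg apart."""
    return [(g * math.cos(math.radians(a)), g * math.sin(math.radians(a)))
            for a in range(0, 360, 60)]


def _key(k: tuple[float, float]) -> tuple[float, float]:
    """Round a k value so numerically equal states compare equal in a set."""
    return (round(k[0], 9), round(k[1], 9))


def hex_diffraction_states(n: float, g: float, max_steps: int = 4):
    """Breadth-first search over repeated hexagonal diffractions.

    Starts at the FoV center after an in-coupler that adds (g, 0). States with
    1 < |k|/k0 < n stay guided; |k|/k0 < 1 leaves the guide into air; states
    with |k|/k0 >= n cannot propagate in the substrate and are discarded.
    """
    start = (g, 0.0)
    guided = {_key(start)}
    outcoupled = set()
    frontier = [start]
    for _ in range(max_steps):
        next_frontier = []
        for kx, ky in frontier:
            for gx, gy in hex_vectors(g):
                k = (kx + gx, ky + gy)
                r = math.hypot(*k)
                if r < 1:
                    outcoupled.add(_key(k))
                elif 1 < r < n and _key(k) not in guided:
                    guided.add(_key(k))
                    next_frontier.append(k)
        frontier = next_frontier
    return guided, outcoupled


# Hypothetical values: substrate n = 1.8, grating and in-coupler |K|/k0 = 1.2.
guided, outcoupled = hex_diffraction_states(1.8, 1.2)
for kx, ky in sorted(guided, key=lambda k: math.atan2(k[1], k[0])):
    print(f"guided: |k| = {math.hypot(kx, ky):.2f}, "
          f"direction {math.degrees(math.atan2(ky, kx)):.0f} deg")
print("out-coupled:", outcoupled)
```

With these values the guided light occupies six k regions 60° apart, and every diffraction path that brings k back to the origin leaves the waveguide, which mirrors the simultaneous expansion and out-coupling described above. A full FoV would turn each point into a box.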
Finally, we discuss the generation of full-color images. One important point to clarify is that although diffractive gratings are used here, the final image generally has no color dispersion even with a broadband light source like an LED. This is easily understood in the 1D EPE scheme: the in-coupler and out-coupler have opposite k-vectors, so their color dispersions cancel each other. In 2D EPE schemes, the k-vectors always form a closed loop from the in-coupled light to the out-coupled light, so the color dispersion likewise vanishes. The actual issue with using a single waveguide for full-color images lies in FoV and light uniformity. The different propagation angles of different colors result in different out-coupling conditions for each color. More specifically, if the red and blue channels use the same in-coupler, the propagation angle of the red light is larger than that of the blue light. The red light in the peripheral FoV is therefore more susceptible to the large-angle non-uniformity issue mentioned above. To acquire a decent FoV and light uniformity, two or three waveguide layers with different grating pitches are usually adopted.
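The closed-loop argument can be verified numerically. The sketch below uses a hypothetical 400 nm period and a hypothetical 2D EPE layout (in-coupler along +y, turning grating at -45° with period Λ/√2, out-coupler along -x, so the three grating vectors sum to zero) and traces one field point through all three gratings at three wavelengths. It also reports the propagation angle inside an assumed n = 1.8 substrate right after in-coupling.

```python
import math


def grating_k(wavelength_nm: float, period_nm: float, angle_deg: float):
    """Normalized grating vector K/k0; its length is wavelength/period."""
    m = wavelength_nm / period_nm
    a = math.radians(angle_deg)
    return m * math.cos(a), m * math.sin(a)


def trace(k_in, gratings, wavelength_nm):
    """Apply each (period_nm, angle_deg) grating in turn to the in-plane k/k0.

    Returns the list of states: input, after in-coupler, ..., final output.
    """
    states = [k_in]
    kx, ky = k_in
    for period_nm, angle_deg in gratings:
        gx, gy = grating_k(wavelength_nm, period_nm, angle_deg)
        kx, ky = kx + gx, ky + gy
        states.append((kx, ky))
    return states


def propagation_angle_deg(k, n):
    """Angle from the waveguide normal of a guided ray with in-plane k/k0 = k."""
    return math.degrees(math.asin(math.hypot(*k) / n))


PERIOD = 400.0  # nm, hypothetical in-coupler/out-coupler period
# Hypothetical 2D EPE loop: in-coupler along +y, turning grating at -45 deg with
# period PERIOD/sqrt(2), out-coupler along -x. The three vectors sum to zero.
loop = [(PERIOD, 90.0), (PERIOD / math.sqrt(2), -45.0), (PERIOD, 180.0)]
k_in = (0.10, 0.05)  # an off-axis field point in air (normalized in-plane k)
for lam in (460.0, 530.0, 630.0):
    states = trace(k_in, loop, lam)
    out = states[-1]
    theta = propagation_angle_deg(states[1], n=1.8)
    print(f"{lam:.0f} nm: output k = ({out[0]:.6f}, {out[1]:.6f}), "
          f"guided angle after in-coupler = {theta:.1f} deg")
```

The output k equals the input for every wavelength, so the image shows no color dispersion, while the guided angle after in-coupling is larger for red (about 65°) than for blue (about 42°), which is why the red channel reaches the non-uniform large-angle region first.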
Regarding the system performance, the eyebox is generally large enough (~10 mm) to accommodate different users' IPDs and alignment shifts during operation. A parameter of significant concern for a waveguide combiner is its FoV. From the k-vector analysis, we can conclude that the theoretical upper limit is determined by the waveguide refractive index. But the light/color uniformity also limits the effective FoV, beyond which the degradation of image quality becomes unacceptable. Current diffractive waveguide combiners generally achieve a FoV of about 50°. To further increase the FoV, a straightforward method is to use a waveguide with a higher refractive index. Another is to tile the FoV by directly stacking multiple waveguides or using polarization-sensitive couplers79,193. As for optical efficiency, a typical value for a diffractive waveguide combiner is around 50–200 nit/lm6,189. In addition, waveguide combiners with grating out-couplers generate an image with a fixed depth at infinity, which leads to the VAC issue. To tackle VAC in waveguide architectures, the most practical way is to generate multiple depths and use a varifocal or multifocal driving scheme, similar to those mentioned for VR systems. But adding more depths usually means stacking multiple waveguide layers together194. Considering the additional waveguide layers for the RGB colors, the final waveguide thickness would undoubtedly increase.
Other parameters specific to waveguides include light leakage, see-through ghost, and rainbow. Light leakage refers to out-coupled light that travels outward to the environment, as depicted in Fig. 14a. Aside from decreasing efficiency, the leakage also gives the user an unnatural bright-eye appearance and raises privacy issues. Optimizing the grating structure, such as the geometry of the SRG, may reduce the leakage. See-through ghost is formed by consecutive in-coupling and out-coupling caused by the out-coupler grating, as sketched in Fig. 14b. After this process, a real object at finite depth may produce a ghost image shifted in both FoV and depth. Generally, an out-coupler with higher efficiency suffers more from see-through ghost. Rainbow is caused by the diffraction of environment light into the user's eye, as sketched in Fig. 14c. Color dispersion occurs in this case because there is no cancellation of k-vectors. Using the k-diagram, we can obtain deeper insight into the formation of rainbow. Here, we take the EPE structure in Fig. 13a as an example. As depicted in Fig. 14d, after diffraction by the turning grating and the out-coupler grating, the k values are distributed in two circles shifted from the origin by the grating k-vectors. Some diffracted light can enter the see-through FoV and form rainbow. To reduce rainbow, a straightforward way is to use a higher-index substrate. With a higher refractive index, the outer boundary of the k-diagram is expanded, which can accommodate larger grating k-vectors. The enlarged k-vectors push these two circles outwards, decreasing their overlap with the see-through FoV. Alternatively, an optimized grating structure can also help reduce the rainbow effect by suppressing unwanted diffraction.
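The geometric part of this argument can be captured in a toy model. We assume a single grating rather than the turning and out-coupler pair of Fig. 14d, first-order diffraction only, ambient light filling the unit disk |k|/k0 ≤ 1, and a see-through FoV approximated as a circular cone of half-angle α. Diffraction shifts the ambient disk by g = λ/Λ, and it meets the see-through disk of radius sin α whenever g < 1 + sin α. The periods, wavelengths, and function names are hypothetical.

```python
import math


def rainbow_overlaps(g: float, half_angle_deg: float) -> bool:
    """True if first-order diffracted ambient light can enter the see-through cone.

    Ambient light fills the unit disk |k|/k0 <= 1; a grating vector of length g
    shifts that disk by g, and it meets the see-through disk of radius
    sin(half angle) whenever g < 1 + sin(half angle).
    """
    return g < 1 + math.sin(math.radians(half_angle_deg))


def rainbow_free(period_nm: float, half_angle_deg: float, wavelengths_nm):
    """Wavelengths for which this grating period keeps first-order ambient
    light out of the see-through cone (g = wavelength / period)."""
    return [lam for lam in wavelengths_nm
            if not rainbow_overlaps(lam / period_nm, half_angle_deg)]


colors = [460, 530, 630]  # nm, hypothetical blue/green/red
for period in (400, 300):  # nm, hypothetical grating periods
    print(f"period {period} nm: rainbow-free at {rainbow_free(period, 30, colors)} nm")
```

In this toy model the 400 nm grating lets blue and green rainbow into a ±30° see-through cone, while the 300 nm grating pushes all three colors out. Its larger k-vector, however, would need a higher-index substrate to keep guiding the display light (roughly n > λ/Λ), in line with the argument above.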
Achromatic waveguide
Achromatic waveguide combiners use achromatic elements as couplers. They have the advantage of realizing full-color images with a single waveguide. A typical example of an achromatic element is a mirror. A waveguide with partial mirrors as the out-coupler is often referred to as a geometric waveguide6,195, as depicted in Fig. 15a. The in-coupler in this case is usually a prism, because a diffractive in-coupler would introduce unnecessary color dispersion. The mirrors couple out the TIR light consecutively to produce a large eyebox, as in a diffractive waveguide. Thanks to the excellent optical properties of mirrors, the geometric waveguide usually exhibits superior MTF and color uniformity compared with its diffractive counterparts. Still, the spatially discontinuous configuration of the mirrors also results in gaps in the eyebox, which may be alleviated by using a dual-layer structure196. Wang et al. designed a geometric waveguide display with five partial mirrors (Fig. 15b). It exhibits a remarkable FoV of 50° by 30° (Fig. 15c) and an exit pupil of 4 mm with 1D EPE. To achieve 2D EPE, architectures similar to that in Fig. 13a can be used by integrating a turning mirror array as the first 1D EPE module197. Unfortunately, the k-vector diagrams in Fig. 13b, d cannot be used here, because the k values in the x-y plane are no longer conserved in the in-coupling and out-coupling processes. But some general conclusions remain valid, such as a higher refractive index leading to a larger FoV and a gradient out-coupling efficiency improving light uniformity.
The fabrication process of the geometric waveguide involves coating mirrors on cut-apart pieces and integrating them back together, which may result in a high cost, especially for the 2D EPE architecture. Another way to implement an achromatic coupler is to use a multiplexed PPHOE198,199 to mimic the behavior of a tilted mirror (Fig. 16a). To understand the working principle, we can use the diagram in Fig. 16b. The law of reflection states that the angle of reflection equals the angle of incidence. Translated into k-vector language, this means the mirror can apply a k-vector of any length along its surface normal direction, while the k-vector length of the reflected light always equals that of the incident light. This requires the k-vector triangle to be isosceles, and a simple geometric deduction shows that this leads to the law of reflection. The behavior of a general grating, however, is very different. For simplicity, we only consider the main diffraction order. The grating can only apply a k-vector with a fixed kx due to the basic diffraction law. For light with a different incident angle, it needs to apply a different kz to produce diffracted light with the same k-vector length as the incident light. For a grating with a broad angular bandwidth like an SRG, the range of kz is wide, forming a long vertical line in Fig. 16b. For a PPG with a narrow angular bandwidth, the line is short and resembles a dot. If multiple such dots are distributed along the oblique line corresponding to a mirror, the multiplexed PPGs can imitate the behavior of a tilted mirror. Such a PPHOE is sometimes referred to as a skew-mirror198. In theory, to better imitate the mirror, a large number of multiplexed PPGs, each with a small index modulation Δn, is preferred. But this poses a bigger challenge for device fabrication. Recently, Utsugi et al. demonstrated an impressive skew-mirror waveguide based on 54 multiplexed PPGs (Fig. 16c, d). The display exhibits an effective FoV of 35° by 36°. In the peripheral FoV, some non-uniformity still exists (Fig. 16e) due to the out-coupling gap, which is an inherent feature of flat-type out-couplers.
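The k-vector argument can be checked directly: for each incidence direction, compute the mirror-reflected wavevector and the grating vector K = k_out - k_in that a single PPG would need to produce it. The geometry (a 2D x-z cross-section, a mirror normal 30° from the x axis, three guided-ray angles, wavevectors normalized to unit length in the medium) is hypothetical; the point is only that every required K lies along the mirror normal while its length varies with angle.

```python
import math


def reflect(k, normal):
    """Specular reflection of wavevector k about a mirror with unit normal."""
    d = sum(a * b for a, b in zip(k, normal))
    return tuple(a - 2 * d * b for a, b in zip(k, normal))


def mirror_grating_vector(k_in, normal):
    """Grating vector K = k_out - k_in that reproduces mirror reflection.

    Equals -2 (k_in . n) n: it always points along the mirror normal, but its
    length changes with the incidence angle.
    """
    k_out = reflect(k_in, normal)
    return tuple(o - i for o, i in zip(k_out, k_in))


# Hypothetical geometry in the x-z plane, wavevectors normalized to |k| = 1.
phi = math.radians(30)
normal = (-math.cos(phi), -math.sin(phi))  # unit normal facing the guided rays
results = []
for theta_deg in (50, 60, 70):             # guided-ray angles from the z axis
    t = math.radians(theta_deg)
    k_in = (math.sin(t), -math.cos(t))
    K = mirror_grating_vector(k_in, normal)
    k_out = reflect(k_in, normal)
    cross = K[0] * normal[1] - K[1] * normal[0]  # 0 when K is parallel to the normal
    results.append((theta_deg, math.hypot(*K), cross, math.hypot(*k_out)))
    print(f"theta = {theta_deg} deg: |K| = {math.hypot(*K):.3f}, "
          f"|k_out| = {math.hypot(*k_out):.3f}")
```

In this example the required |K| grows from about 0.68 to 1.29 across the three angles while its direction stays fixed along the normal, so the corresponding dots line up along one oblique line; a single narrow-band PPG can supply only one of these lengths.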
Finally, it is worth mentioning that metasurfaces are also promising for delivering achromatic gratings200,201 for waveguide couplers, owing to their versatile wavefront shaping capability. The mechanism of achromatic gratings is similar to that of the achromatic lenses discussed previously. However, the development of achromatic metagratings is still in its infancy. Much effort is needed to improve the in-coupling optical efficiency, control the higher diffraction orders to eliminate ghost images, and enable large-size designs for EPE.
Generally, achromatic waveguide combiners exhibit a FoV and eyebox comparable to those of diffractive combiners, but with a higher efficiency. For a partial-mirror combiner, the combiner efficiency is around 650 nit/lm197 (2D EPE). For a skew-mirror combiner, although the efficiency of the multiplexed PPHOE is relatively low (~1.5%)199, the final combiner efficiency of the 1D EPE system is still high (>3000 nit/lm) due to multiple out-couplings.
Table 2 summarizes the performance of different AR combiners. Combining the luminous efficacy in Table 1 with the combiner efficiency in Table 2 gives a comprehensive estimate of the total luminance efficiency (nit/W) for different types of systems. Generally, Maxwellian-type combiners with pupil steering have the highest luminance efficiency when partnered with laser-based light engines like laser-backlit LCoS/DMD or MEMS-LBS. Geometric optical combiners have well-balanced image performance, but further shrinking the system size remains a challenge. Diffractive waveguides have a relatively low combiner efficiency, which can be remedied by an efficient light engine like MEMS-LBS. Further development of couplers and EPE schemes would also improve system efficiency and FoV. Achromatic waveguides have a decent combiner efficiency, and their single-layer design also enables a smaller form factor. With advances in fabrication processes, they may become strong contenders to the presently widely used diffractive waveguides.
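The combination described above is a simple product, nit/W = (lm/W) × (nit/lm). The sketch below applies it to the combiner efficiencies quoted in this section. The 10 lm/W light-engine efficacy and the 1000 nit target are placeholders, not Table 1 values, and the 3000 nit/lm skew-mirror figure is the lower bound quoted above; the function names are hypothetical.

```python
def total_luminance_efficiency(efficacy_lm_per_w: float,
                               combiner_nit_per_lm: float) -> float:
    """nit/W = (lm/W from the light engine) x (nit/lm from the combiner)."""
    return efficacy_lm_per_w * combiner_nit_per_lm


def power_for_luminance_w(target_nit: float, efficacy_lm_per_w: float,
                          combiner_nit_per_lm: float) -> float:
    """Electrical power (W) needed to reach a target luminance at the eye."""
    return target_nit / total_luminance_efficiency(efficacy_lm_per_w,
                                                   combiner_nit_per_lm)


# Combiner efficiencies quoted in the text; 10 lm/W is a placeholder efficacy.
combiners = {"diffractive waveguide (low end)": 50,
             "diffractive waveguide (high end)": 200,
             "geometric waveguide, 2D EPE": 650,
             "skew-mirror waveguide, 1D EPE (lower bound)": 3000}
for name, nit_per_lm in combiners.items():
    p = power_for_luminance_w(1000, 10, nit_per_lm)
    print(f"{name}: {total_luminance_efficiency(10, nit_per_lm):,.0f} nit/W, "
          f"{p * 1000:.1f} mW for 1000 nit")
```

With the placeholder efficacy, moving from 50 to 3000 nit/lm cuts the power for 1000 nit from about 2 W to about 33 mW, which is why both factors matter when pairing a light engine with a combiner.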