Color Management

From Qt Wiki

This document contains personal draft notes on color management topics and cannot be considered a reference on the topic.

The CIE 1931 XYZ color space

CIE 1931 chromaticity diagram

The CIE 1931 XYZ color space lays the foundation for understanding color management concepts such as color spaces, gamut, white point, and transfer functions. Specifically, the CIE color space defines a generic color coordinate system that is used to define other color spaces. This coordinate system consists of three axes, with coordinates x, y, and z. Usually, the coordinates are projected along the z axis onto the x-y plane, which gives the characteristic shark fin illustrated in the figure 'CIE 1931 chromaticity diagram'. Along its curved boundary, the locus, we find the colors that correspond to an ideal monochromatic light source as it is swept from 400 nm to 700 nm.

The point of the CIE 1931 XYZ color space is that it gives every visible color a well-defined (x, y) coordinate. This is important when we derive other color spaces, such as the sRGB color space, because concepts such as primary colors, gamut, and white point are defined based upon the corresponding (x, y) coordinate in the CIE coordinate system. Having a common reference becomes particularly important when we need to convert between different color spaces.

Derivation of the CIE XYZ coordinate system [1]

CIE 1931 matching functions

The idea behind the CIE 1931 XYZ color space is that to be able to reproduce colors, a camera needs three different sensors, each responding to different wavelengths of light. Since light exists across a spectrum of wavelengths, each sensor needs to respond to a wide range of wavelengths, although its sensitivity is highest at a specific wavelength and falls off toward shorter and longer wavelengths. The CIE 1931 matching functions X(λ), Y(λ) and Z(λ) (see picture, downloadable as CSV from [2]) were established based on the responses of several observers to monochromatic light stimuli across the visible spectrum. These responses were then averaged and normalized to create the standard observer functions, which represent the eye's sensitivity to different wavelengths of light. The resulting standard observer functions are known as the CIE 1931 color matching functions. They describe how the human eye perceives color under standard viewing conditions, and they define the CIE color space.

We find the response of a single sensor as the integral, over all wavelengths, of the product of the light source's spectrum and the sensor's matching function. For example, if a source emits light in a spectrum between 560 and 660 nm, a sensor with the characteristics of the X(λ) or Y(λ) matching functions will get a response, whereas the sensor following the Z(λ) matching function will have almost no response.

Given an input spectrum of light (colors), we calculate the CIE tristimulus values X, Y, and Z by accumulating its emission power at each discrete wavelength, weighted by the corresponding CIE 1931 matching function X(λ), Y(λ) and Z(λ). In computer graphics, we are not aiming at reproducing the full scene brightness, and it is beneficial to remove the absolute power from the output values by normalizing them against the total power from all the sensors:

x = X / (X + Y + Z), y = Y / (X + Y + Z), and z = Z / (X + Y + Z)

The x, y, and z values are known as CIE chromaticity, and the x and y coordinates correspond to the axes of the CIE 1931 chromaticity diagram. In practice z is redundant because x + y + z = 1. Also note that the x, y, and z chromaticity coordinates are abstract values, and have no direct physical interpretation[1]. The outline of the shark fin shaped chromaticity diagram can be plotted from the CIE 1931 color matching functions by calculating x and y at each wavelength, and plotting y as a function of x.
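As a minimal sketch of the two steps described above, the following Python accumulates a sampled spectrum against sampled matching functions and then normalizes the result to chromaticity. The function names and the three-sample arrays are hypothetical illustration values, not real CIE 1931 data; a real matching function table, such as the CSV in [2], has many more samples.

```python
def tristimulus(spectrum, xbar, ybar, zbar):
    """Accumulate spectral power weighted by each matching function.

    All four sequences must be sampled at the same wavelengths.
    """
    X = sum(p * w for p, w in zip(spectrum, xbar))
    Y = sum(p * w for p, w in zip(spectrum, ybar))
    Z = sum(p * w for p, w in zip(spectrum, zbar))
    return X, Y, Z


def chromaticity(X, Y, Z):
    """Normalize tristimulus values so that x + y + z = 1."""
    total = X + Y + Z
    if total == 0:
        raise ValueError("chromaticity is undefined for zero total power")
    return X / total, Y / total, Z / total


# Toy numbers for illustration only (NOT real CIE 1931 data).
spectrum = [0.0, 1.0, 2.0]
xbar = [0.3, 0.4, 1.0]
ybar = [0.1, 1.0, 0.6]
zbar = [1.7, 0.1, 0.0]

X, Y, Z = tristimulus(spectrum, xbar, ybar, zbar)
x, y, z = chromaticity(X, Y, Z)
print(x, y, z)
```

Scaling the spectrum by a constant changes X, Y, and Z, but leaves x, y, and z unchanged, which is the purpose of the normalization.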

It is worth noting that the locus of the CIE 1931 chromaticity diagram represents the coordinates of pure monochromatic light at a certain wavelength. This is an ideal light source that is impossible to build[3], and in practice any light source emits a spectrum of wavelengths. Consider for example backlit LCD screens. The back light is wide spectrum light, and color filters are placed between the light source and the observer to select the desired color. If we were able to construct a monochromatic filter, almost no light would be let through. This means that most light sources, including individual pixel elements on a screen, will emit a mixture of different wavelengths. The corresponding color will therefore not be a pure single-wavelength color, and will be found somewhere inside the locus of the shark fin. Although LED screens use LEDs that can have a narrower bandwidth than the filters of an LCD screen, they are not purely monochromatic, and cannot reproduce every possible color[4].

Luminance

Among the three CIE tristimulus values, Y is interesting because it represents the physical definition of luminance and has the unit cd/m² (nits). It is a linear light quantity that is loosely coupled with what we think of as brightness or lightness. Mid-range green wavelengths contribute more to the response than red and blue, but all visible colors contribute. For example, if we have three light sources, red, green, and blue, and they each emit the same power, the green light source will appear the brightest, while the blue light source will appear the darkest[1]. This means that formulas that calculate luminance from R, G, and B values will put more weight on green than on the other primaries, as shown in Rec. ITU-R BT.709[5], which standardizes some classes of displays:

709Y = 0.2126 R + 0.7152 G + 0.0722 B

Here, luminance gets 21% of its power from red, 72% of its power from green, and 7% of its power from blue[1]. Note that luminance is typically not used in video processing, because we rarely intend to reproduce the absolute luminance of the actual scene[1]. Instead, the unit of X, Y, and Z is arbitrarily chosen so that Y = 1 or Y = 100 is the brightest white that a color display supports. In this case, Y is a relative luminance [6]. Relative luminance is a unitless quantity, and is proportional to the scene luminance up to the maximum luminance of the screen/device. Luminance must not be confused with Luma (Y'), which is calculated from gamma corrected R', G', and B' values, and is not linear.
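The weighting can be sketched in a few lines of Python. The function name `relative_luminance` is an assumption made for this example; the inputs are linear-light R, G, and B values in [0, 1], not gamma-corrected values.

```python
# BT.709 luminance weights for linear-light R, G, and B.
KR, KG, KB = 0.2126, 0.7152, 0.0722


def relative_luminance(r, g, b):
    """Relative luminance Y of linear (not gamma-corrected) RGB in [0, 1]."""
    return KR * r + KG * g + KB * b


white = relative_luminance(1.0, 1.0, 1.0)
red = relative_luminance(1.0, 0.0, 0.0)
green = relative_luminance(0.0, 1.0, 0.0)
blue = relative_luminance(0.0, 0.0, 1.0)
print(white, red, green, blue)
```

Because the three weights sum to 1, reference white maps to Y = 1, and a full green primary is more than three times as luminous as a full red primary.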

Luma and Chroma [1]

Instead of transmitting or storing video images as RGB images, video systems often use a different representation of pixel values using luma and chroma. Luma represents lightness, and chroma represents the color, disregarding lightness. This representation is used because human vision is less sensitive to variations in color than to variations in lightness. We can therefore reduce the data size by sub-sampling chroma samples without visible loss of image quality, as long as the luma resolution is kept unchanged.

The idea of using non-linear luma instead of relative luminance comes from the sensitivity characteristics of human vision. The eye is less sensitive to variations/noise in bright colors than in dark colors. If we used 8 bit numbers to represent luminance directly, most of the available code values would be spent on describing bright colors that we can't clearly distinguish. Therefore, relative luminance is transformed using a transfer function, typically in the form of a gamma curve with gamma ~0.4, before it is translated into 8 bit values. A perfect transfer function would make the luma values perceptually uniform, so that a change in luma value from level 10 to level 11 gives a similar change in perceived lightness as a change from level 210 to level 211.
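To illustrate how a gamma curve redistributes code values, the sketch below counts how many of the 256 8-bit codes describe the darkest 10% of the relative luminance range, first with a linear encoding and then with an encoding gamma of 0.4. The 10% threshold and all function names are choices made for this example.

```python
def codes_below(threshold, decode, levels=256):
    """Count code values whose decoded relative luminance is <= threshold."""
    top = levels - 1
    return sum(1 for code in range(levels) if decode(code / top) <= threshold)


def linear_decode(v):
    """Linear encoding: the code value is the relative luminance."""
    return v


def gamma_decode(v, encoding_gamma=0.4):
    """Inverse of the encoding v = luminance ** encoding_gamma."""
    return v ** (1.0 / encoding_gamma)


dark_linear = codes_below(0.1, linear_decode)
dark_gamma = codes_below(0.1, gamma_decode)
print(dark_linear, dark_gamma)
```

With these assumptions, the linear encoding spends 26 codes on the darkest tenth of the range, while the gamma encoding spends 102 codes there, leaving fewer codes for the bright values where the eye is less discriminating.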

Note that in practice, we do not gamma correct the luminance signal, but instead apply gamma correction on the linear R, G, and B values before calculating luma as a weighted sum of the resulting nonlinear R', G', and B' values.

For old-style SD TV, the coefficients for calculating luma are standardized in BT.601 as:

601Y' = 0.299 R' + 0.587 G' + 0.114 B'

In HD video, the luma is calculated using a different set of coefficients as specified in BT.709:

709Y' = 0.2126 R' + 0.7152 G' + 0.0722 B'

Generally, luma is calculated as a weighted sum of the gamma corrected R', G', and B' values using the weights KR, KG, and KB:

Y' = KR * R' + KG * G' + KB * B'

Usually, the weights are chosen to sum to 1, so that Y' is in the range [0, 1] when R', G', and B' are. Luma represents the lightness of a color, or how light we perceive a color if we were to compare it against a white surface. In addition to a color's luma, we also need to represent its color. We obtain luma-independent color information through the color differences B' - Y' and R' - Y', which are the color's chroma. Instead of using the color differences directly, we normalize the values to get a convenient range of [-0.5, 0.5]:

PB = 0.5 * (B' - Y') / (1 - KB)

PR = 0.5 * (R' - Y') / (1 - KR)

Ideally, Y' now only contains information about a color's lightness, while the chroma values PB and PR only contain information about the color, without carrying any lightness information.
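The luma and chroma formulas above can be combined into one small function. This is a sketch rather than a reference implementation: the function name is hypothetical, the defaults are the BT.709 weights, and KG is derived by assuming that the three weights sum to 1.

```python
def rgb_to_ypbpr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert gamma-corrected R'G'B' in [0, 1] to Y'PBPR.

    Defaults are the BT.709 weights; pass kr=0.299, kb=0.114 for BT.601.
    """
    kg = 1.0 - kr - kb  # assumption: the three weights sum to 1
    y = kr * r + kg * g + kb * b
    pb = 0.5 * (b - y) / (1.0 - kb)
    pr = 0.5 * (r - y) / (1.0 - kr)
    return y, pb, pr


print(rgb_to_ypbpr(1.0, 1.0, 1.0))  # white: full luma, no chroma
print(rgb_to_ypbpr(0.0, 0.0, 1.0))  # blue primary: PB at its maximum
```

White and all grays give PB = PR = 0, while the blue and red primaries reach PB = 0.5 and PR = 0.5 respectively, which is the reason for the normalization by (1 - KB) and (1 - KR).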

Quantization and scaling

Each component of a (possibly subsampled) Y'CBCR signal is often represented by an 8 bit number in consumer applications, which gives a reasonable compromise between image quality and data size. In digital video, we either use full range values, where each Y'CBCR component is represented by values in the range [0, 255], or video range values, where the Y' component is scaled and offset to the range 16-235, which gives an effective range of 219 steps. PB and PR chroma values are scaled and offset to the range 16-240, giving a range of 224 steps. Assuming that the Y' signal is in the range [0, 1], the quantized 219Y' value is given as:

219Y' = Y' * 219 + 16 (video range) or 255Y' = Y' * 255 (full range)

Similarly, PBPR values in the range [-0.5, 0.5] can be quantized into CBCR values as:

CB = PB * 224 + 128 and CR = PR * 224 + 128

or in the full range case:

CB = PB * 255 + 128 and CR = PR * 255 + 128
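A minimal sketch of both quantization variants follows. The function names are hypothetical, and rounding to the nearest integer and clamping to [0, 255] are assumptions of this example; the text above only gives the scale and offset.

```python
import math


def _to_code(value):
    """Round to the nearest integer and clamp to the 8 bit range."""
    return max(0, min(255, math.floor(value + 0.5)))


def quantize_video_range(y, pb, pr):
    """Y' in [0, 1] and PB, PR in [-0.5, 0.5] to video range Y'CBCR."""
    return (
        _to_code(y * 219 + 16),
        _to_code(pb * 224 + 128),
        _to_code(pr * 224 + 128),
    )


def quantize_full_range(y, pb, pr):
    """Y' in [0, 1] and PB, PR in [-0.5, 0.5] to full range Y'CBCR."""
    return (
        _to_code(y * 255),
        _to_code(pb * 255 + 128),
        _to_code(pr * 255 + 128),
    )


print(quantize_video_range(1.0, 0.0, 0.0))
print(quantize_full_range(1.0, 0.0, 0.0))
```

In the full range case, PB = 0.5 maps to 255.5, which does not fit in 8 bits, so the clamp is needed at the upper end of the chroma range.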

Chroma subsampling

After calculating the chroma values, we have three values for each pixel: Y', CB, and CR. In a video frame, the luma and chroma values can be stored as packed Y'CBCR values, similar to how pixel values are packed in RGB images. It is also common to represent a single image frame using two or three image planes: one plane with luma, and either one plane with packed CB and CR samples or two planes with separate CB and CR values. Keeping luma samples separate from chroma samples allows us to down-sample the chroma images horizontally, or both horizontally and vertically.
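As an illustration of down-sampling a chroma plane both horizontally and vertically, the sketch below replaces each 2x2 block of samples with its average. Plain averaging is an assumption made for simplicity, and the function name is hypothetical; real video pipelines may use other filters and sample positions.

```python
def downsample_chroma_2x2(plane):
    """Halve a chroma plane horizontally and vertically by 2x2 averaging.

    `plane` is a list of equal-length rows with even width and height.
    """
    height, width = len(plane), len(plane[0])
    if height % 2 or width % 2:
        raise ValueError("this sketch requires even dimensions")
    return [
        [
            (plane[r][c] + plane[r][c + 1]
             + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


cb = [
    [100, 104, 200, 200],
    [100, 104, 200, 200],
]
print(downsample_chroma_2x2(cb))
```

The 4x2 CB plane shrinks to 2x1 samples, so each chroma plane needs a quarter of the original storage while the luma plane keeps its full resolution.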

TBD

The sRGB color space as defined within a CIE XYZ color space

Color spaces

Now that we have a reference color space, we can start defining other color spaces, which are subsets of the CIE XYZ color space. Rec. 709 defines a standard for HDTV screens, including their color space.

This standard defines the white point to be at x = 0.31271 and y = 0.32902, which is known as D65. This means that a color at this coordinate is considered the 'reference' white on HDTV screens, and is engineered such that equal amounts of R, G, and B primaries will appear white for a reference observer in a reference environment[7].

In addition, the standard defines the color primaries, or primaries for short. The primary blue color is at x = 0.15 and y = 0.06, red is at x = 0.64 and y = 0.33, while green is at x = 0.30 and y = 0.60, as illustrated in the CIE chromaticity diagram. The primaries denote the most saturated red, green, and blue that the screen can display. The triangle spanned by the three primaries is called the gamut, and the screen can only display colors within this triangle. Any color within the gamut is created by adding different amounts of the primary colors.
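Whether a chromaticity can be displayed can be checked with a point-in-triangle test against the three primaries. The sketch below uses the Rec. 709 coordinates quoted above; the function names and the sign-based test are choices made for this example.

```python
# Rec. 709 primaries and white point as (x, y) chromaticities.
RED, GREEN, BLUE = (0.64, 0.33), (0.30, 0.60), (0.15, 0.06)
D65 = (0.31271, 0.32902)


def _cross(o, a, b):
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_gamut(p, r=RED, g=GREEN, b=BLUE):
    """True if chromaticity p lies inside or on the triangle r-g-b."""
    d1, d2, d3 = _cross(r, g, p), _cross(g, b, p), _cross(b, r, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # p is inside when it is on the same side of all three edges.
    return not (has_neg and has_pos)


print(in_gamut(D65))         # the white point
print(in_gamut((0.1, 0.8)))  # a chromaticity far beyond the green primary
```

The white point lies inside the triangle, while the second chromaticity lies outside the gamut, so a Rec. 709 screen cannot reproduce it.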

Gamma correction and transfer functions

Gamma compression and expansion at gamma 2.2

Gamma correction, or just 'gamma', is a nonlinear operation used to encode and decode luminance or tristimulus values in video or still image systems[1]. Its origin comes from the way CRT screens worked, where the luminance produced at the face of the display is a nonlinear function of each (R', G', and B') voltage input. By coincidence, this is beneficial because human vision is more sensitive to differences in dark colors than in bright colors. In digital imaging we can utilize gamma compression to make better use of our bits by converting input values into a perceptually uniform space. In a perceptually uniform space, the perceived difference in lightness between RGB values (10, 10, 10) and (20, 20, 20) should be equal to the difference between RGB values (210, 210, 210) and (220, 220, 220) after expansion through the decoding gamma.

Gamma correction can be expressed as:

Vout = A * Vin^gamma

If the inputs and outputs are in the range [0...1], A = 1. A gamma < 1 is sometimes called an encoding gamma, and encoding a signal with this gamma is called gamma compression (Vcomp in the picture). Gamma compression was originally introduced on the imaging side to counteract the expansion made on the display side. A gamma > 1 is referred to as decoding gamma, and decoding a signal with a decoding gamma is called gamma expansion[8] (Vexp in the picture).

Note: The input to gamma compression is tristimulus values (linear light) as captured by an imaging device. Such values are denoted R, G, and B. The output of the gamma compression is a gamma-corrected video signal, which is denoted with a prime symbol and written R', G' and B' [1]. If a letter related to color management has the prime symbol, it means that the value is non-linear. For example, if we see the symbol Y', this means that this is the luma derived from a gamma compressed signal, not the luminance of the input tristimulus values. In computer graphics, this is an important distinction. For example, (incorrectly) averaging two compressed pixel values does not yield the same result as (correctly) averaging two uncompressed tristimulus values.
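The difference between averaging compressed and uncompressed values can be shown with a pure power-law gamma of 2.2 and A = 1. This is a sketch under those assumptions; the function and constant names are hypothetical.

```python
def gamma_correct(v_in, gamma, a=1.0):
    """Power-law transfer: v_out = a * v_in ** gamma, for inputs in [0, 1]."""
    return a * v_in ** gamma


DECODING_GAMMA = 2.2
ENCODING_GAMMA = 1.0 / DECODING_GAMMA


def compress(linear):
    """Gamma compression of a linear-light value."""
    return gamma_correct(linear, ENCODING_GAMMA)


def expand(encoded):
    """Gamma expansion of a gamma-compressed value."""
    return gamma_correct(encoded, DECODING_GAMMA)


# Average a black and a white pixel.
black, white = 0.0, 1.0

# Correct: average the linear values, then encode the result.
correct = compress((black + white) / 2)

# Incorrect: average the already compressed values.
incorrect = (compress(black) + compress(white)) / 2

print(round(correct, 3), round(incorrect, 3))
```

The correct encoded average is about 0.73, while averaging the compressed values gives 0.5, which is noticeably darker once it is expanded on the display.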

Luma, denoted Y' , is calculated as a weighted sum of gamma corrected R', G', and B' components[1]. Luma is therefore not a linear quantity.

Transfer functions map numerical coordinates to and from a color space. The transfer functions can be linear, or non-linear. Typical examples of non-linear transfer functions are Gamma 2.2 and the non-linear transfer function defined by sRGB [7].
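For comparison with a pure gamma 2.2 curve, the following sketch implements the piecewise sRGB transfer function using the constants commonly quoted from the sRGB specification. The function names are hypothetical, and inputs are assumed to be in the range [0, 1].

```python
def srgb_encode(linear):
    """Linear light in [0, 1] to a non-linear sRGB value in [0, 1]."""
    if linear <= 0.0031308:
        return 12.92 * linear  # linear segment near black
    return 1.055 * linear ** (1.0 / 2.4) - 0.055


def srgb_decode(encoded):
    """Non-linear sRGB value in [0, 1] back to linear light."""
    if encoded <= 0.04045:
        return encoded / 12.92
    return ((encoded + 0.055) / 1.055) ** 2.4


def gamma22_encode(linear):
    """Pure power-law encoding with gamma 1 / 2.2."""
    return linear ** (1.0 / 2.2)


for value in (0.001, 0.01, 0.18, 0.5, 1.0):
    print(value, round(srgb_encode(value), 4), round(gamma22_encode(value), 4))
```

The two curves stay close over most of the range, but they differ near black, where sRGB uses a short linear segment instead of the power law.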

References