Management of the transmission and storage of digital images is critical to the adoption of IP-Surveillance. Images digitised to standards such as CCIR 601* can consume as much as 165 Mbps (165 million bits every second). In most networked surveillance systems such bandwidth is not available.
For example, a raw or uncompressed image from a network camera or video server with a resolution of 640x480 is over 600 kilobytes (2 bytes per pixel). At 25 frames per second this represents more than 15 Mbytes of data per second.
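The raw-bandwidth figures above can be verified with a few lines of arithmetic. This sketch simply multiplies out the resolution, pixel depth and frame rate quoted in the text:

```python
# Raw-bandwidth arithmetic for the 640x480 example above
# (2 bytes per pixel, 25 frames per second -- figures from the text).
WIDTH, HEIGHT = 640, 480
BYTES_PER_PIXEL = 2
FPS = 25

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL   # 614,400 bytes (~600 KB)
bytes_per_second = frame_bytes * FPS             # 15,360,000 bytes (~15 MB/s)
bits_per_second = bytes_per_second * 8           # ~123 Mbps

print(f"One frame: {frame_bytes / 1024:.0f} KB")
print(f"Uncompressed stream: {bytes_per_second / 1_000_000:.1f} MB/s "
      f"({bits_per_second / 1_000_000:.0f} Mbps)")
```

Even this modest resolution produces well over 100 Mbps uncompressed, which makes clear why compression is unavoidable on a shared network.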
High quality digital images contain a large amount of redundant information that can be eliminated without a great loss in perceived picture quality. The technique of reducing image size (and bit rate) is generally referred to as compression. The ability of a compression algorithm to reduce image size is quantified by the compression ratio: the higher the compression ratio, the smaller the resultant image. However, increasing compression causes increasing degradation of the image.
Image Data Reduction
The techniques for compressing video images have their origin in the storage of still images on computers. The following criteria are commonly employed to reduce the size of image data:
Reduce the colour resolution with respect to the prevailing light intensity
Remove small, invisible parts, of the picture
In the case of a video sequence, make use of redundancy between adjacent frames
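The first criterion, reducing colour resolution, is commonly realised by chroma subsampling, for example the widely used 4:2:0 scheme, in which the two colour (chroma) planes are stored at half resolution in each dimension while brightness (luma) is kept at full resolution. The scheme name is an illustrative assumption here, not something the text specifies, but the arithmetic shows how much data such a reduction saves:

```python
# Illustration of the first criterion: reducing colour resolution.
# With 4:2:0 chroma subsampling the two colour planes are stored at
# half resolution horizontally and vertically; brightness stays full.
def frame_size_bytes(width, height, subsampled=True):
    luma = width * height                             # full-resolution brightness plane
    if subsampled:
        chroma = 2 * (width // 2) * (height // 2)     # two quarter-size colour planes
    else:
        chroma = 2 * width * height                   # two full-size colour planes
    return luma + chroma

full = frame_size_bytes(640, 480, subsampled=False)   # 921,600 bytes
sub = frame_size_bytes(640, 480, subsampled=True)     # 460,800 bytes
print(f"4:2:0 subsampling keeps {sub / full:.0%} of the raw frame data")
```

Halving the data before any further compression is applied, with little visible effect, because the eye is far less sensitive to colour detail than to brightness detail.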
All of these techniques are based on an acute understanding of how the human brain and eyes work together to form a complex visual system. A significant reduction in the file size for an image sequence is achievable with little adverse effect on the visual quality. The perceived degradation in image quality typically depends on the degree to which the chosen compression technique is applied. In general, each compression method is effective up to a certain point, or "knee", beyond which image quality degrades quickly.
Two basic standards: JPEG and MPEG
The two basic compression standards are JPEG and MPEG. In broad terms, JPEG is associated with still digital pictures, whilst MPEG is dedicated to digital video sequences. The JPEG and more recent JPEG 2000 image formats can be further subdivided into Motion JPEG and Motion JPEG 2000 formats, which are also appropriate for digital video.
The group of MPEG standards that include the MPEG-1, MPEG-2 and MPEG-4 formats, share some similarities as well as some notable differences.
JPEG stands for Joint Photographic Experts Group – the committee responsible for developing this standard and its successor, the JPEG 2000 standard. JPEG is the single most widespread compression format in use today. It was designed, as the name implies, to handle the compression of single still images and treats video output as captured still images. With JPEG compression the knee occurs at approximately 8:1 compression. It offers very high compression ratios with low picture quality, or lower compression ratios with good picture quality. If ‘artefacts’ appear in the image (sometimes called ‘blockiness’), this is an indication that the compression ratio is too high, as in the example given below right:
Over Compressed JPEG Image
JPEG 2000 uses a different compression algorithm that essentially eliminates the ‘blockiness’ of JPEG images at higher compression levels and replaces it with an overall fuzziness that many consider less disturbing to the eye.
Compressed JPEG 2000 Image
The compression ratio for JPEG 2000 is higher than for JPEG, and the difference increases at higher compression ratios. For moderate compression ratios, JPEG 2000 typically produces pictures about 25% the size of JPEG at equal picture quality. The price to pay is a far more complex compression technique.
Motion JPEG, or M-JPEG, offers a higher compression ratio and is specifically designed for moving images or video. Essentially, M-JPEG treats digital video output as a series of JPEG pictures. The advantages of this standard are the same as for JPEG, as the compression techniques are the same. Both standards are relatively straightforward in terms of the technology they use to compress data, and as such cost-effective encoding and decoding engines can be built using them. Motion JPEG 2000 is also available. The main disadvantage of Motion JPEG is that, since it uses only a series of still images, it is unable to make use of “video streaming” techniques. With M-JPEG compression the knee occurs at approximately 15:1 compression.
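Combining the ~15:1 knee quoted above with the earlier 640x480 raw-frame figures gives a rough feel for the bandwidth and storage an M-JPEG stream demands. This is back-of-envelope arithmetic only; real frame sizes vary with scene content:

```python
# Back-of-envelope storage for an M-JPEG stream at the ~15:1 "knee",
# using the 640x480 / 2-bytes-per-pixel raw frame from earlier in the text.
RAW_FRAME_BYTES = 640 * 480 * 2       # 614,400 bytes uncompressed
COMPRESSION_RATIO = 15                # approximate knee for M-JPEG, per the text
FPS = 25

compressed_frame = RAW_FRAME_BYTES / COMPRESSION_RATIO   # ~40 KB per frame
bytes_per_second = compressed_frame * FPS
gbytes_per_hour = bytes_per_second * 3600 / 1e9

print(f"Compressed frame: ~{compressed_frame / 1024:.0f} KB")
print(f"Stream: ~{bytes_per_second * 8 / 1e6:.1f} Mbps, "
      f"~{gbytes_per_hour:.1f} GB per hour")
```

Roughly 8 Mbps and several gigabytes per camera-hour, which is workable on a LAN but explains why inter-frame schemes such as MPEG, discussed next, remain attractive for continuous recording.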
MPEG stands for Motion Picture Experts Group. This committee was formed in the late 1980s in order to create a standard for coding of moving pictures and audio. It has since produced MPEG-1, MPEG-2 and MPEG-4. The next generation of standards will include MPEG-7 and MPEG-21.
MPEG compression is based on the principle that temporal and spatial redundancy in motion pictures makes up the majority of the visual information that humans perceive. MPEG makes use of the redundancy between adjacent frames. By comparing changes from frame to frame and removing as much of the redundant (“same”) information as possible a high degree of compression is achieved.
MPEG contains three types of encoded frames. Intra-coded frames (I-frames) are single compressed frames that contain all the spatial information required to make a complete image. Predictive frames (P-frames) are computed from the nearest preceding I- or P-frame. P-frames are more highly compressed than I-frames and provide a reference for the calculation of B-frames. Bi-directional predictive frames (B-frames) are generated using both previous and future frames to calculate the compressed frame data. A complete sequence is made up of a succession of the three different frame types. This process is known as inter-frame correlation and allows compression ratios of 100:1 to be achieved. A standard sequence of (I-B-B-P-B-B-P-B-B-P-B-B-P-B-B-I) would have a “Group of Pictures” (GOP) length of 15, with 15 representing the interval at which I-frames repeat.
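The frame pattern described above can be sketched as a short function that lays out one Group of Pictures. The spacing parameters mirror the example sequence in the text; real encoders choose them per stream:

```python
# Sketch of the GOP structure described above: an I-frame starting each
# group of 15 frames, P-frames every third frame, B-frames in between.
def gop_pattern(gop_length=15, p_spacing=3):
    """Return the frame-type sequence for one Group of Pictures."""
    frames = []
    for i in range(gop_length):
        if i == 0:
            frames.append("I")    # intra-coded: self-contained picture
        elif i % p_spacing == 0:
            frames.append("P")    # predicted from the nearest preceding I/P
        else:
            frames.append("B")    # bi-directionally predicted
    return frames

print("-".join(gop_pattern()))    # I-B-B-P-B-B-P-B-B-P-B-B-P-B-B
```

The next group then begins with a fresh I-frame, reproducing the (I-B-B-P-...-B-B-I) sequence given in the text. Only the I-frames are decodable on their own, which is why random access and editing in MPEG streams happen at GOP boundaries.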
MPEG-1 was the first public standard and was released in 1993. MPEG-1 took video compression techniques developed for the JPEG standard and added further techniques for efficient coding of video sequences. With MPEG-compressed video, only the new parts of a video sequence are included, while the parts of the image that are unchanged are simply reused, so that less data needs to be compressed. MPEG-1 is the standard used to store digital video onto CDs, for example. The focus is on compression ratio rather than picture quality, and many consider its output, despite being digital, to be of VCR quality. Normally MPEG-1 gives a performance of 352x240 pixels at 30 fps (NTSC) or 352x288 pixels at 25 fps (PAL).
MPEG-2 was approved as a standard in 1994 and was designed for high quality digital video (DVD), digital high-definition TV (HDTV), interactive storage media (ISM), digital broadcast video (DBV), and cable TV (CATV). The MPEG-2 project focused on extending the MPEG-1 compression technique to cover larger pictures and higher quality, at the expense of a lower compression ratio and higher bit rate. MPEG-2 also provides additional tools to enhance the video quality at the same bit rate, producing very high image quality when compared to other compression technologies. The frame rate is locked at 30 (NTSC)/25 (PAL) fps, just as in MPEG-1, but performance is normally 720x480 (NTSC) or 720x576 (PAL) or better. Only modern computers can decode MPEG-2, as it requires a lot of computing power.
MPEG-4 supports even lower-bandwidth applications, such as mobile phones and PDAs, but also caters for high quality images and almost unlimited-bandwidth applications. Films are now being compressed using MPEG-4, and video mobile phones also use this standard. Both MPEG-2 and MPEG-4 cover a range of picture sizes, picture rates, and bandwidth usages.
Despite the high compression ratios achievable with MPEG compression, there are a number of disadvantages. Firstly, in order for MPEG to achieve high compression, the video signal must not change abruptly from frame to frame. Camera multiplexing is therefore not possible, as the rapid change from frame to frame as cameras are switched defeats the inter-frame correlation technique used in MPEG compression. Secondly, MPEG compression requires much more electronics than JPEG, making the application of the technique much more expensive.
* Note: CCIR 601 – A standard for digital video with an image size of 720 × 485 at 60 interlaced images per second or 720 × 576 at 50 interlaced images per second.
H.264 or MPEG-4 Part 10/AVC
H.264, also known as MPEG-4 Part 10/AVC (for Advanced Video Coding), is the latest MPEG standard for video encoding. H.264 is expected to become the video standard of choice in the coming years. This is because an H.264 encoder can, without compromising image quality, reduce the size of a digital video file by more than 80% compared with the Motion JPEG format, and by as much as 50% more than with the MPEG-4 standard. This means that much less network bandwidth and storage space are required for a video file. Or, seen another way, much higher video quality can be achieved for a given bit rate.
H.264 was jointly defined by standardization organizations in the telecommunications (ITU-T’s Video Coding Experts Group) and IT industries (ISO/IEC Moving Picture Experts Group), and is expected to be more widely adopted than previous standards. In the video surveillance industry, H.264 will most likely find the quickest traction in applications where there are demands for high frame rates and high resolution, such as in the surveillance of highways, airports and casinos, where the use of 30/25 (NTSC/PAL) frames per second is the norm. This is where the economies of reduced bandwidth and storage needs will deliver the biggest savings.
H.264 is also expected to accelerate the adoption of megapixel cameras since the highly efficient compression technology can reduce the large file sizes and bit rates generated without compromising image quality. There are tradeoffs, however. While H.264 provides savings in network bandwidth and storage costs, it will require higher performance network cameras and monitoring stations.
Axis’ H.264 encoders use the baseline profile, which means that only I- and P-frames are used. This profile is ideal for network cameras and video encoders since low latency is achieved because B-frames are not used. Low latency is essential in video surveillance applications where live monitoring takes place, especially when PTZ cameras or PTZ dome cameras are used.