A Robust High Capacity Information Hiding Algorithm Based on DCT High Frequency Domain

XIE Jianquan 1,2, XIE Qing 2, HUANG Dazu 1,2

(1. School of Information Science and Engineering, Central South University, Changsha 410083; 2. Department of Information Management, Hunan Finance and Economics College, Changsha 410205)

Abstract - Improving hiding capacity and resistance to compression is an important problem in information hiding applications. In this paper, the invariance property of JPEG compression is analyzed first. Then, based on this invariance, an information hiding algorithm that embeds information in the middle- and high-frequency DCT coefficients is proposed. The embedding capacity is determined adaptively by the smoothness of each subblock, so hiding capacity can be increased while imperceptibility is still satisfied. The algorithm is strongly robust against lossy compression down to a predetermined quality factor. In addition, experiments show that the algorithm has a certain degree of robustness to noise.

Keywords- information hiding; hiding capacity; DCT transformation; invariance of JPEG compression

I. INTRODUCTION

Because information hiding algorithms in the DCT domain are compatible with compression standards (JPEG, MPEG, H.261/H.263), using DCT coefficients as the host signal has become one of the main choices in image information hiding since Cox et al. proposed the spread spectrum method in the DCT domain [1]. Because the human visual system is much more sensitive to low-frequency signals than to high-frequency ones, hiding information in the low-frequency DCT coefficients gives better robustness, while hiding in the middle and high frequencies gives better imperceptibility. In the standard quantization matrix recommended by JPEG, the high frequencies have larger quantization values, so information embedded in the high frequencies is easily filtered out by JPEG compression. Moreover, because rounding error exists in the inverse DCT, embedded information may be destroyed even when the image carrying it has not undergone JPEG compression. In addition, experiments show that noise easily destroys information embedded in the high frequencies. Therefore many works choose middle-frequency coefficients [2, 3], low-frequency coefficients [1], or even the DC component [4] as the host sequence. However, because the human visual system is more sensitive to changes in the middle- and low-frequency coefficients, the capacity and strength of information embedded there cannot be too large; this has become the biggest deficiency of algorithms based on the DCT domain [5]. What is more, many DCT-domain information hiding algorithms need the original carrier image for detection, while the capacity of those that do not need the carrier image is much lower than that of spatial-domain algorithms such as LSB. Hence DCT-domain algorithms have been confined to areas such as copy protection and are hard to use in other areas such as secret storage and secret communication.

Many scholars have worked on increasing the hiding capacity of DCT-domain information hiding algorithms in order to extend them to wider areas of application. For example, Chang et al. [6] proposed a DCT-domain hiding algorithm based on adjusting the quantization table. Yu et al. [7] proposed a DCT-domain hiding algorithm that adjusts the sign of DCT coefficients with small absolute values to represent 1 and 0. Liu Guangjie et al. [8] proposed an improved quantization embedding algorithm that uses the steepest descent method to choose the control parameter of the embedding algorithm. These algorithms have much higher capacity than the classical DCT-domain algorithms [1, 9]. Nevertheless, they share an apparent deficiency: they cannot extract all the embedded information exactly after lossy JPEG compression. In practical applications, however, lossy compression that does not visibly degrade the image is routinely applied in order to reduce transmission or storage cost. How to extract all of the hidden information accurately after lossy compression is therefore a problem awaiting solution. This paper proposes a DCT-domain hiding algorithm that resists JPEG compression and embeds information in the middle and high frequencies. It uses the invariance of DCT coefficients under JPEG compression [10]. The algorithm resists JPEG compression well and has higher embedding capacity; what is more, it needs neither the original image nor other auxiliary parameters for information extraction.

II. INVARIANCE ATTRIBUTE OF JPEG COMPRESSION

JPEG achieves compression by first dividing the DCT coefficients by the corresponding entries of a quantization table and then rounding. The quantization table used in compression is determined jointly by the standard quantization matrix Q (shown in Table 1) and the quality factor q. The quality factor scales the standard quantization table by a certain ratio to form a new quantization table. For example, the JPEG implementation provided by the Independent JPEG Group (IJG) uses an integer in [1, 100] as the quality factor: 100 represents the best quality and 1 the worst. IJG converts the quality factor into a scaling coefficient and multiplies it with the standard quantization table to form a new quantization table, in order to achieve different compression effects. Suppose quality factor is q (1

Hence 0

TABLE 1. STANDARD LUMINANCE QUANTIZATION TABLE RECOMMENDED BY JPEG

16  11  10  16  24   40   51   61
12  12  14  19  26   58   60   55
14  13  16  24  40   57   69   56
14  17  22  29  51   87   80   62
18  22  37  56  68  109  103   77
24  35  55  64  81  104  113   92
49  64  78  87  103 121  120  101
72  92  95  98  112 100  103   99
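As an illustration of how a quality factor turns Table 1 into a working quantization table, the sketch below follows the scaling rule used in the IJG libjpeg reference code. Treating formula (1) as equivalent to this rule is an assumption; the names `ijg_scale_percent` and `scaled_table` are hypothetical, and the scaling parameter k in the text would correspond to the returned percentage divided by 100.

```python
# Table 1: standard JPEG luminance quantization table.
TABLE1 = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def ijg_scale_percent(quality):
    """Map a quality factor in [1, 100] to a percentage scale (IJG rule)."""
    quality = max(1, min(100, quality))
    if quality < 50:
        return 5000 // quality
    return 200 - 2 * quality


def scaled_table(base, quality):
    """Scale every entry of `base`, round, and clamp to the range [1, 255]."""
    scale = ijg_scale_percent(quality)
    return [[min(255, max(1, (v * scale + 50) // 100)) for v in row]
            for row in base]


q_m = scaled_table(TABLE1, 50)  # quality factor 50 leaves Table 1 unchanged
```

Under this rule a quality factor of 50 reproduces Table 1, and higher quality factors give smaller quantization steps.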

Suppose F(u,v) is the DCT coefficient matrix of a non-overlapping 8×8 subblock of an image X, and Q_m is the JPEG lossy compression quantization table corresponding to a predetermined quality factor. For any u, v ∈ {0, 1, …, 7}, define

$$\hat{F}(u,v) = \mathrm{round}\left(\frac{F(u,v)}{Q_m(u,v)}\right) Q_m(u,v)$$

and

$$\tilde{F}(u,v) = \mathrm{round}\left(\frac{\hat{F}(u,v)}{Q(u,v)}\right) Q(u,v).$$

If $Q(u,v) \le Q_m(u,v)$, then the following equation always holds:

$$\mathrm{round}\left(\frac{\tilde{F}(u,v)}{Q_m(u,v)}\right) Q_m(u,v) = \hat{F}(u,v) \tag{2}$$

The proof of formula (2) is as follows. From $\tilde{F}(u,v) = \mathrm{round}\left(\hat{F}(u,v)/Q(u,v)\right) Q(u,v)$ we get

$$\tilde{F}(u,v) - \frac{1}{2} Q(u,v) \le \hat{F}(u,v) \le \tilde{F}(u,v) + \frac{1}{2} Q(u,v) \tag{3}$$

namely

$$\hat{F}(u,v) - \frac{1}{2} Q(u,v) \le \tilde{F}(u,v) \le \hat{F}(u,v) + \frac{1}{2} Q(u,v) \tag{4}$$

When $Q(u,v) \le Q_m(u,v)$, we get

$$\hat{F}(u,v) - \frac{1}{2} Q_m(u,v) \le \tilde{F}(u,v) \le \hat{F}(u,v) + \frac{1}{2} Q_m(u,v)$$

so formula (2) is established.
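Formula (2) can also be checked numerically. The following is a minimal sketch, assuming integer quantization steps and round-half-up rounding (the paper does not state a tie rule); the function names are illustrative only.

```python
import math


def rnd(x):
    """Round to the nearest integer, with halves rounded up (assumed tie rule)."""
    return math.floor(x + 0.5)


def quantize(value, step):
    """Quantize and dequantize in one go: round(value / step) * step."""
    return rnd(value / step) * step


def survives_requantization(f, q_m, q):
    """True if the first-pass value is recovered after a second pass with step q."""
    f_hat = quantize(f, q_m)      # first pass with the predetermined step Q_m
    f_tilde = quantize(f_hat, q)  # later compression with step Q
    return quantize(f_tilde, q_m) == f_hat  # left-hand side of formula (2)


def holds_for_all_smaller_steps(max_step=24):
    """Check formula (2) for every pair Q <= Q_m up to max_step."""
    coeffs = [x * 0.83 - 250 for x in range(600)]  # sample real-valued coefficients
    return all(survives_requantization(f, q_m, q)
               for q_m in range(1, max_step + 1)
               for q in range(1, q_m + 1)
               for f in coeffs)
```

A larger second step breaks the property: with F = 10, Q_m = 10 and Q = 16, the coefficient becomes 16 after the second pass and is then mapped to 20 rather than 10, so `survives_requantization(10.0, 10, 16)` is false.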

Formula (2) indicates the following: if a predetermined quantization step Q_m is used to quantize the DCT coefficients F(u,v), giving the coefficient matrix $\hat{F}(u,v)$, and a quantization matrix Q whose steps are no larger than those of Q_m is then used to quantize again, $\hat{F}(u,v)$ can still be reconstructed exactly. The reconstruction divides the compressed coefficients by the same quantization step Q_m, rounds the result, and multiplies by Q_m again. The higher the quality factor, the smaller the quantization step.

That is, if an image has undergone JPEG lossy compression with some predetermined quality factor, its quantized DCT coefficients can be recovered unchanged after every subsequent JPEG lossy compression with a higher quality factor; if the quality factor of the subsequent JPEG compression is lower than the predetermined one, the DCT coefficients of the originally quantized image cannot be reconstructed.

III. MIDDLE AND HIGH FREQUENCY COEFFICIENT INFORMATION HIDING ALGORITHM BASED ON INVARIANCE OF JPEG COMPRESSION

According to the invariance of JPEG compression, if secret information is embedded on the basis of quantization with a predetermined quantization step, the secret information can resist JPEG compression whose steps are smaller than that quantization step. In other words, whichever coefficient the information is embedded in, it can be extracted after JPEG compression. Hence the middle and high frequencies can be used for embedding, and more information can be embedded while a certain degree of robustness is ensured. Suppose the carrier image is I = {f(x,y), x,y = 0,1,…,N-1} and the information to be hidden is W = {w_j, j = 0,1,…,L-1}. The algorithm is described as follows:

(1) Choose a quality factor q. Generally it is set to the lowest quality factor the human eye can accept; that is, every image compressed with a lower quality factor is unacceptable. The acceptable quality factor recommended by the JPEG standard is from 50 to 75. In this algorithm, q can be set according to the lowest compression quality factor the algorithm has to withstand, but it should not be lower than 50; otherwise perceptible distortion will be caused when too much information is embedded.

(2) Obtain the scaling parameter k of the quantization table from the chosen quality factor q and formula (1).

(3) Multiply each entry of Table 1 by the scaling parameter k to obtain the predetermined quantization table Q_m.

(4) Partition the image I into 8×8 subblocks.

(5) Embed information in every subblock. The procedure is described below:

step 1: Apply the DCT to the subblock and obtain the DCT coefficient matrix S = {s(u,v), u,v = 0,1,…,7}.

step 2: Arrange the DCT coefficients in reverse zig-zag order and choose the first t coefficients for information embedding. The larger t is, the more information can be hidden, but the lower the imperceptibility. To ensure good imperceptibility, the value of t should be decided by the smoothness of the subblock: t should be smaller for a smoother subblock and can be larger for a rougher one. That is, to embed more information with better imperceptibility, t should adapt to the smoothness of the subblock. Because most DCT coefficients of a smooth area are small while those of a rough area are large, the smoothness can be judged from the distribution of the subblock's DCT coefficients, and the value of t can be decided from that distribution. Experiments show that it is suitable to set t to half the number of nonzero coefficients that remain after the subblock's DCT coefficients are quantized using Table 1.

step 3: Directly modify the high-frequency coefficients chosen for embedding so that they carry the information bits w_j (1 bit per coefficient). The modification is as follows:

$$s(u,v) = \begin{cases} 0, & w_j = 0 \\ Q_m(u,v), & w_j = 1 \end{cases} \tag{6}$$

One point to emphasize is that the DCT coefficients not chosen for embedding should not be quantized. This reduces the image degradation caused by quantization.

step 4: Apply the inverse DCT to the modified DCT coefficient matrix of the subblock to obtain the subblock containing secret information.

(6) Reassemble the subblocks in which information has been embedded to obtain the image containing secret information.
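Steps 1 to 4 for a single subblock, together with threshold decoding at half the quantization step, can be sketched as follows. This is a minimal illustration rather than the authors' implementation. It assumes an orthonormal 8×8 DCT, takes Q_m equal to Table 1, receives the number of bits t from the caller instead of deriving it from the subblock's smoothness, and rounds pixels to integers without clipping to [0, 255]. All function names are hypothetical.

```python
import math

N = 8
TABLE1 = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]
# Orthonormal DCT-II basis: C[k][n] = a(k) * cos((2n + 1) * k * pi / 16).
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
     for k in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(block):
    """Forward 2-D DCT: S = C * B * C^T."""
    return matmul(matmul(C, block), transpose(C))


def idct2(coef):
    """Inverse 2-D DCT: B = C^T * S * C."""
    return matmul(matmul(transpose(C), coef), C)


def zigzag_positions():
    """(row, col) pairs in JPEG zig-zag order, lowest frequency first."""
    return sorted(((u, v) for u in range(N) for v in range(N)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))


def embed_block(block, bits, q_m):
    """Write one bit per coefficient, starting from the highest frequency."""
    coef = dct2(block)                                  # step 1
    targets = zigzag_positions()[::-1][:len(bits)]      # step 2, t = len(bits)
    for (u, v), w in zip(targets, bits):
        coef[u][v] = float(q_m[u][v]) if w else 0.0     # step 3, formula (6)
    return [[round(p) for p in row] for row in idct2(coef)]  # step 4


def extract_block(block, t, q_m):
    """Read t bits back by thresholding s(u,v) / Q_m(u,v) at 0.5."""
    coef = dct2(block)
    targets = zigzag_positions()[::-1][:t]
    return [1 if coef[u][v] / q_m[u][v] >= 0.5 else 0 for u, v in targets]
```

Rounding the 64 pixels to integers perturbs each DCT coefficient by at most 4 under an orthonormal transform, which is well below half of the high-frequency steps in Table 1, so in this sketch the bits survive the return to integer pixels.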

The extraction procedure is essentially the same as the embedding procedure. First, obtain the predetermined quantization table Q_m from the quality factor q chosen at embedding and Table 1. Second, partition the image I into 8×8 subblocks. Then extract the embedded information from every subblock. When extracting from a subblock, the same method as in the embedding process is used to determine which DCT coefficients contain information; however, when counting how many high-frequency coefficients carry embedded information, only the middle- and low-frequency coefficients are considered. The following formula is then used to recover the bit embedded in the corresponding DCT coefficient:

$$w_j = \begin{cases} 0, & s(u,v)/Q_m(u,v) < 0.5 \\ 1, & s(u,v)/Q_m(u,v) \ge 0.5 \end{cases} \tag{7}$$

IV. SIMULATION EXPERIMENTS AND DISCUSSION

According to the guidance provided with JPEG for lossy compression, if the quality factor is above 75, the degradation of the image is imperceptible to the human eye; if the quality factor is from 50 to 75, the image is still acceptable; if the quality factor is below 50, the image is unacceptable. In information hiding applications, a quality factor below 50 would not be used for image compression, so we choose q = 50 for the experiments. The algorithm in this paper was used to embed at full capacity into the 512×512 Lena and mandrill images in fig.1(a) and fig.2(a). The capacity of embedded information is 28642 and 68025 bits respectively. The ratio of embedded bits to pixels is 10.93% and 25.96% respectively. The average capacity per subblock is 6.9952 bits and 16.6143 bits respectively. This embedding capacity is much higher than that of the algorithms proposed in [1-4, 9] and is close to that of [8]; however, the algorithm in [8] cannot resist compression. These data indicate that images of different smoothness have distinctly different capacities, which reflects the characteristic that the human visual system is sensitive to noise in smooth areas but insensitive to noise in rough areas. Embedding random noise at full capacity into fig.1(a) and fig.2(a) gives the results shown in fig.1(b) and fig.2(b). The PSNR between the original image and the image carrying information is 30.6817 dB and 32.5544 dB respectively. Both exceed 30 dB, which is recognized as the lowest value that satisfies the imperceptibility requirement. In fact, the naked eye cannot distinguish the original images from the images carrying information in fig.1 and fig.2. Under the measure used in [8], which uses structural similarity to weigh the distortion after information hiding, the images carrying information and the original images in fig.1 and fig.2 score exactly the same. This indicates that PSNR is a deficient measure of the imperceptibility of an information hiding algorithm.
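The PSNR values quoted above are not accompanied by a formula. A minimal sketch of the usual definition for 8-bit grayscale images follows, assuming a peak value of 255 and images given as equally sized nested lists of pixel values; `psnr` is an illustrative name, not one used by the authors.

```python
import math


def psnr(original, modified, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    flat_o = [p for row in original for p in row]
    flat_m = [p for row in modified for p in row]
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(flat_o, flat_m)) / len(flat_o)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

Because PSNR depends only on the mean squared pixel error, it does not distinguish changes in smooth regions from changes in textured regions, which is consistent with the limitation noted in the text.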

(a) original image (b) image carrying secret information

Figure 1. Comparison 1 of the original image and the image carrying secret information

After fig.1(b) and fig.2(b) are compressed with quality factors 70, 60, and 51 respectively, the hidden information is extracted from them. It can be extracted with 100% accuracy, which is consistent with the theoretical analysis. But after the two images are compressed with quality factors 40 and 20, almost all the embedded information is lost, which is also consistent with the theoretical analysis. That is, the algorithm is robust to compression with a quality factor higher than the predetermined value.

(a) original image (b) image carrying secret information

Figure 2. Comparison 2 of the original image and the image carrying secret information

The 128×128 binary image shown in fig.3(a) was used as secret information and embedded into fig.2(a) for anti-compression and anti-jamming experiments. The PSNR between the image carrying information and the original image is 36.6231 dB. Extraction after compression with a quality factor higher than the predetermined one gives the result shown in fig.3(b), which is identical to fig.3(a); the extraction accuracy is 100%. Extraction after compression with a quality factor lower than the predetermined one gives the result shown in fig.3(c). Salt-and-pepper noise and Gaussian noise were then added separately to the image carrying secret information before extraction; the results are shown in fig.3(d) and fig.3(e). If the noisy image is additionally compressed with a quality factor not lower than the predetermined one before extraction, the result is almost the same. That is, the algorithm proposed in this paper has a degree of robustness to noise interference.

Figure 3. Results of anti-compression and anti-jamming experiments

To compare the algorithm proposed in this paper with algorithms based on middle- and low-frequency coefficients, low-frequency coefficients were modified according to formula (6), with the same embedding strength as that used for the frequency coefficients when embedding into fig.2(a). After embedding fig.3(a), the image carrying information is shown in fig.4(a). The PSNR between fig.4(a) and fig.2(a) is 24.9564 dB. The imperceptibility requirement is not satisfied, and even the naked eye can perceive the blocking effect in the image. The extracted information is shown in fig.4(b). Although most of the content can be recognized, it differs visibly from fig.3(a); that is, even without any interference, the embedded information cannot be extracted correctly. The reason is that rounding error and interaction among the low-frequency coefficients occur in the inverse DCT. For the majority of DCT-domain algorithms based on low-frequency coefficients, this is a widespread problem when hiding a large quantity of information.

(a) image with hidden information (b) extracted information

Figure 4. Hiding effect based on low-frequency coefficients

V. CONCLUSIONS

Currently, information hiding in the DCT domain is the most widely used kind of transform-domain algorithm. For better robustness, such algorithms generally choose DCT middle-frequency coefficients, low-frequency coefficients, or even the DC component as the host sequence. Because human vision is more sensitive to changes in the middle and low frequencies, the capacity and strength of information embedded in those coefficients cannot be too large, so these algorithms are hard to use in areas such as secret storage and secret communication. Using the invariance of DCT coefficients under JPEG compression, this paper has proposed a DCT-domain hiding algorithm that embeds information into the middle- and high-frequency coefficients and supports blind extraction. Because it embeds information into the middle- and high-frequency coefficients, the embedded information has better imperceptibility, and the capacity of hidden information is high while good imperceptibility is kept. Because it uses the invariance of JPEG compression, all the embedded information can be extracted accurately after lossy compression with a quality factor higher than the predetermined one. This resolves the conflict between information hiding, which has to use the redundant space of multimedia information, and data compression, which tries to reduce that redundant space. In addition, the algorithm has some degree of robustness against noise interference. The deficiency of this algorithm is that, after JPEG compression, the image carrying information has more nonzero high-frequency coefficients than natural images have. Although this does not change human visual perception, it may create a security problem when steganalysis tools are used. The solution is to appropriately decrease the number of coefficients used in the high frequencies and increase the number used in the middle-to-high or middle frequencies. Under the same imperceptibility constraint, the embedding capacity will then decrease, while the invariance to JPEG compression is preserved.

ACKNOWLEDGMENT

Project supported by the Hunan Provincial Science and Technology Program (Grant No. 2009FJ3110).

REFERENCES

[1] Cox I. J., Kilian J., Leighton F. T., et al. Secure spread spectrum watermarking for multimedia [J]. IEEE Trans. on Image Processing, 1997, 6(12): 1673-1687.

[2] Kang X., Huang J., Zeng W. Improving robustness of Quantization-Based image watermarking via adaptive receiver [J]. IEEE Transactions on Multimedia, 2008, 10(6): 953-959.

[3] Tan L., Fang Z. J. An Adaptive middle frequency embedded digital watermark algorithm based on the DCT domain [A]. Proceedings of the 2008 International Conference on Management of e-Commerce and e-Government [C], Jiangxi, 2008, pp. 382-385.

[4] Huang J. W., Shi Y. Q., Cheng W. D. Image Watermarking in DCT: an Embedding Strategy and Algorithm [J]. Acta Electronica Sinica, 2000, 28(4): 57-60.

[5] Shih F. Y., Wu S. Y. T. Combinational Image Watermarking in the Spatial and Frequency Domains [J]. Pattern Recognition, 2003, 36(4): 969-975.

[6] Chang C. C., Chen T. S., Chung L. Z. A steganographic method based upon JPEG and quantization table modification [J]. Information Sciences, 2002, 141: 289-302.

[7] Yu P. F., Liu B. High capacity blind information hiding algorithm based on DCT [J]. Computer Applications, 2006, 26(4): 815-817.

[8] Liu G. J., Dai Y. W., Sun J. S., Wang Z. Q. High Capacity Information Hiding Scheme for JPEG Images [J]. Information and Control, 2007, 36(1): 102-106.

[9] Koch E., Zhao J. Toward robust hidden image copyright labeling [A]. Proceedings of IEEE Workshop on Nonlinear Signal and Image Processing [C], Neos Marmaras, Greece, 1995, pp. 452-455.

[10] iang X. M. Research on Several Foundational Theories and Key Technologies of Copyright Protection and Authentication for Digital Arts [D]. Wuhan: Wuhan University of Technology, 2007.

Figure 3 panels: (a) image to be embedded; (b) extraction after high-quality compression; (c) extraction after low-quality compression; (d) extraction after adding salt-and-pepper noise; (e) extraction after adding Gaussian noise