Skip to main content

tplogin.cn首页 > 新闻资讯 >

压缩文件为什么不能一层层压缩自身(2)

2012-07-03 00:58 浏览:

你会认为这是玩笑是吗?事实上,用谷歌地球上的数据库,不就拼成了一张这样的照片么?但是其代价,绝不是一张胶片能包含的信息量,而是上百个TB的数据。

这意味着什么?意味着信息的本质是离散而不是连续的,如果要想看出「地球是个圆的」,也许巴掌大的照片就够了,但是想要看到地球上更清晰的细节,则没有捷径可走,只有增加照片的信息量——换言之,信息量和能量一样,是守恒的。

那么压缩文件的原理是什么?

就是利用「一个文件的容量与它的信息量并不相等,而且它的容量肯定要大于它的信息量」的特点,使用算法「将一个文件所包含的同样的信息量用更少的容量来描述」。

举个简单的例子,一张大小为 3x3 的图片是纯黑色的,那么我们用两种方法来描述这张图片:

这张图片是3x3的,且他们的颜色分布为:{纯黑色,纯黑色,纯黑色, 纯黑色, 纯黑色, 纯黑色, 纯黑色, 纯黑色, 纯黑色}

这张照片是3x3的,且它们的颜色全部是{纯黑色}

可以明显看出,第二种描述的方法比第一种描述的方法占用的空间(容量)要小得多。无损图像压缩算法中的「行程算法」就部分借鉴了这样的思想。

再举一个例子,让我们来看看,在英语中为何会出现一种这样情况——使用频率越高的单词,其包含的字母越少,而使用频率越低的单词,其包含字母越多?

比如说在猫王的一首歌《she》中,我们把英语中的「她(she)」统一换成一个长一些的单词,比如说「轻浮的女人(flibbertigibbet,仅仅是随意找一个比较长的单词,并非故意「政治不正确」XD)」那么起后果就是这样的:

修改前:

she may be the face I can't forget , a trace of pleasure I regret, may be my treasure or the price I have to pay she may be the song that Summer sings, may be the chill that autumn brings, may be a hundred different things, within the measure of the day. she may be the beauty or the beast, may be the famine or the feast, may turn each day into heaven or a hell, she may be the mirror of my dream a smile reflected in a stream, she may not be what she may seem, inside her shell, she who always seems so happy'n proud, who's eyes can be so private and so proud, no one's allowed to see them when they cry, she may be the love that can and hope to last, may come to me from shadows of the past, that I remember till the day I die, she may be the reason I survive, the why and wherefor I'm alive the one I'll care for through the rough and rainy years, me I'll take her laughter and her tears, and make them all my souvenirs, for where she goes I got to be the meaning of my life is she she she

修改后:

flibbertigibbet may be the face I can't forget , a trace of pleasure I regret, may be my treasure or the price I have to pay flibbertigibbet may be the song that Summer sings, may be the chill that autumn brings, may be a hundred different things, within the measure of the day. flibbertigibbet may be the beauty or the beast, may be the famine or the feast, may turn each day into heaven or a hell, flibbertigibbet may be the mirror of my dream a smile reflected in a stream, flibbertigibbet may not be what flibbertigibbet may seem, inside her flibbertigibbetll, flibbertigibbet who always seems so happy'n proud, who's eyes can be so private and so proud, no one's allowed to see them when they cry, flibbertigibbet may be the love that can and hope to last, may come to me from shadows of the past, that I remember till the day I die, flibbertigibbet may be the reason I survive, the why and wherefor I'm alive the one I'll care for through the rough and rainy years, me I'll take her laughter and her tears, and make them all my souvenirs, for where flibbertigibbet goes I got to be the meaning of my life is flibbertigibbet flibbertigibbet flibbertigibbet

修改后与修改前的歌词相比,整整长了一行半。可是这两首歌词要表达的意思是一模一样的啊!那么用更短的字母来描述使用更频繁的含义是不是就意味着,这本书的厚度可以减少?你猜对了,这就是著名的无损压缩算法「霍夫曼编码」的基本原理。

说了这么多,就是想解释一下,压缩算法只不过是想将「文件容量大于其所含信息量的那一部分」当成海绵中的水挤出来。但是总不能挤到连海绵都被挤消失的地步。你要问怎样判断一个文件的信息量有多少?一个粗略的解释就是,将所有计算机上的文件都看作是数学上0和1的排序的话,通过数学方法就可以计算出这个数列的有序度。这个有序度就是指不管采用什么办法,能够翻译出这组序列最少最少,需要一个多长的序列,才有实力做到这一点。具体如何计算嘛...略微复杂,有兴趣的同学还是去看看相关的资料叭 XD

本文出自 “技术帝还是技术低” 博客.

相关文章

et文件怎么在电脑跟手机中打开

2022-04-15 08:40:33    浏览: 74

crx文件安装到edge浏览器教程

2022-04-14 12:21:03    浏览: 149

crx文件安装到谷歌浏览器教程

2022-04-14 12:22:27    浏览: 187

crx文件安装到火狐教程

2022-04-14 12:22:58    浏览: 177

crx文件安装到360浏览器教程

2022-04-14 12:23:28    浏览: 71