greebledness is an intuitive measure of an object's Kolmogorov complexity
the more greebled something is, the more incompressible (aka asymmetric) detail it possesses
as such, degreebling is the process of reducing the dimensionality of the vector space required to represent it
first, complexity: we mostly all possess an intuitive notion of what it means, but how can it be measured? the Russian mathematician Kolmogorov proposed a technique within the domain of information theory to do so, based on the idea of compression.
in short, all objects are made of information (location & composition of their components), & it's easier to discuss the complexity of information than of physical objects.
the Kolmogorov complexity of a piece of information can be denoted by "# of chars required to represent it"
consider a piece of information such as "aaaaaa"; naively, its K-complexity would seem to be 6, & were that "a" to be repeated 1k times, it would increase proportionally. but, this is suboptimal; is it really linearly more complex?
intuitively, no! but how do we measure that?
here a crucial concept is introduced: compression. what does it mean to compress information? to find a symmetry in it, and "fold" along it. to simplify it so it takes less space, in a way which can later be reversed.
how is this done? with a compression algorithm, of which there are myriad.
consider a v simple algo: "a char followed by 'xN;', where N is a positive integer, should be repeated N times". this algo, when applied, allows us to compress strings (sequences of chars) w/ many sequentially repeated chars, such as "aaaaa" (or its longer 1000-char version), quite a bit.
the longer version, represented thus, would appear as: "ax1000;", meaning "repeat 'a' 1000 times". this simplifies it quite a bit, requiring much fewer characters to express.
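if you want to poke at this yourself, here's a rough python sketch of the toy scheme; the function names & regexes are mine, just one possible reading of the rule:

```python
import re

def decompress(s: str) -> str:
    """expand every 'cxN;' group into N copies of the char c."""
    return re.sub(r"(.)x(\d+);", lambda m: m.group(1) * int(m.group(2)), s)

def compress(s: str) -> str:
    """collapse runs of a repeated char into 'cxN;' (only when it actually saves space)."""
    out = []
    for m in re.finditer(r"(.)\1*", s):
        run = m.group(0)
        encoded = f"{run[0]}x{len(run)};"
        out.append(encoded if len(encoded) < len(run) else run)
    return "".join(out)

print(compress("a" * 1000))                 # ax1000;
print(decompress("ax1000;") == "a" * 1000)  # True
```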
what is its K-complexity, however? naively, we'd say 7, but here's the rub: the compression algo counts.
how "complex" is the compression algo? if we count the chars, there are 85. so, the real K-complexity of the string in the previous tweet is 7+85=92. quite a bit less than 1000!
important to mention that K-complexity is not necessarily objective; there could be a better algo.
to recap, a string's K-complexity is its compressed length plus the length of the compression algorithm we're using; this means that the same string, with different algos, could have different complexities.
one could posit an absolute K-complexity measure using the theoretical best algo...
but in practice, this is not particularly useful, besides as a sort of efficiency measure.
another example: what is the K-complexity of the string described as "1000 'a's followed by 1000 'b's"? we'd write it as "ax1000;bx1000;", so, 7+7+85=99; much better than 2000!
and so on
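to make the bookkeeping concrete, here's a rough sketch of the tally (exact char counts depend on how you phrase the rule, so treat these as ballpark figures):

```python
# toy bookkeeping: K-complexity ~= compressed length + length of the algo's own description
ALGO = ("a char followed by 'xN;', where N is a positive integer, "
        "should be repeated N times")

def k_complexity(compressed: str, algo: str = ALGO) -> int:
    return len(compressed) + len(algo)

print(k_complexity("ax1000;"))         # 7 + len(ALGO), i.e. roughly the ~92 above
print(k_complexity("ax1000;bx1000;"))  # 14 + len(ALGO), i.e. roughly the ~99 above
```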
now, what if we want to measure the complexity of the string described by "1000 of every letter in the English alphabet, written sequentially"? we could certainly use the same algo, which would result in a K-complexity of 7*26+85=267, but we could prob do better, w/ a better algo
how could we improve this algo? let's go back to what compression is: finding symmetries in information & "folding" along them.
the initial symmetry we used is "this character is repeated N times", a sort of translational symmetry
what new symmetry does this last string have?
every 1000 chars, the same pattern repeats but with a different char, which can be predicted w/ knowledge of the English alphabet. as this is a form of rotational symmetry (imagine a decoder ring), we can posit a modulo algorithm, one which iterates through a known sequence, A-Z.
this algo takes a bit more effort to express, and requires some embedded information. in practice, this informal description likely wouldn't be sufficient, and we'd use an actual program in some language to express it properly. the next tweet spells it out:
"a char followed by 'xN;', where N is an positive integer, should be repeated N times. the substring '<modE(...)>' should be replaced by iterating thru the following list & substituting each char as the term M in the expression denoted by '...': abcdefghijklmnopqrstuvwxyz"
whew!
we'd use this new capability as follows: "<modE(Mx1000;)>"
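here's a rough python sketch of one way that modulo rule could be implemented (the regex & helper names are mine, not part of the description above):

```python
import re
import string

def expand_modE(s: str) -> str:
    """replace '<modE(expr)>' with expr written out once per letter a-z, substituting M."""
    def repl(m):
        expr = m.group(1)
        return "".join(expr.replace("M", letter) for letter in string.ascii_lowercase)
    return re.sub(r"<modE\((.*?)\)>", repl, s)

def decompress(s: str) -> str:
    """the original run-length rule: expand every 'cxN;' into N copies of c."""
    return re.sub(r"(.)x(\d+);", lambda m: m.group(1) * int(m.group(2)), s)

expanded = decompress(expand_modE("<modE(Mx1000;)>"))
print(expanded == "".join(c * 1000 for c in string.ascii_lowercase))  # True
```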
much shorter! but the algo is now longer. let's measure: the algo's length is 271, so 15+271=286. oof, so, turns out we actually did worse than with the previous algorithm! so, was all this work for nothing?
well, not quite
these example strings are pretty contrived, & not indicative of real-world strings we typically try to compress. imagine if we had one like "repeat the alphabet 1000 times, repeating each letter 1000 times in a row". in that case, this new algo would do much better! it's all relative
in practice, one does not typically customize the compression algo per string, as there are optimized algos available which can be used generally, & it's often good enough to compare relative complexity using the same algorithm, rather than chasing an objective measure.
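for instance, a minimal sketch using python's off-the-shelf zlib to compare our example strings under the same general-purpose algo:

```python
import string
import zlib

# same off-the-shelf compressor for every string, so the comparison is relative, not absolute
samples = {
    "1000 a's":               "a" * 1000,
    "1000 a's then 1000 b's": "a" * 1000 + "b" * 1000,
    "1000 of each letter":    "".join(c * 1000 for c in string.ascii_lowercase),
}
for name, s in samples.items():
    print(f"{name}: {len(s)} chars -> {len(zlib.compress(s.encode()))} bytes compressed")
```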
so, now we more or less understand K-complexity, but what's that got to do w/ greebles?
well, consider this: what if we had a string of 10000 'a's, with a single 'b' every 1000 chars? that's pretty complex! but what if we.... simply remove the 'b's, simplifying it to: "ax10000;"
now that, pals, is a degreebling. we've removed information from a string in such a way that exposes underlying symmetries & makes it quantifiably less complex! (in technical terms this is called a lossy encoding)
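a quick sketch of that degreebling, reusing the toy run-length compressor from earlier (I'm reading the string as 10 blocks of 1000 'a's, each followed by a lone 'b'):

```python
import re

def compress(s: str) -> str:
    """the toy run-length compressor from earlier: collapse runs into 'cxN;' when it saves space."""
    out = []
    for m in re.finditer(r"(.)\1*", s):
        run = m.group(0)
        encoded = f"{run[0]}x{len(run)};"
        out.append(encoded if len(encoded) < len(run) else run)
    return "".join(out)

greebled = ("a" * 1000 + "b") * 10       # 10000 a's with a lone b every 1000 chars
degreebled = greebled.replace("b", "")   # lossy: throw the greebles away

print(compress(greebled))    # ax1000;bax1000;b... : the b's keep breaking the runs
print(compress(degreebled))  # ax10000; : one clean fold
print(len(compress(greebled)), len(compress(degreebled)))  # 80 vs 8
```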
so far, we've just been dealing with strings. what about objects?
well, one of the key insights of information theory is that everything is information, expressed in a variety of ways. the arrangement of atoms comprising some object is information, & could be used to reconstruct it (the reconstruction procedure playing the role of the compression algo)
now we can see how this applies to anything
to summarize, remember this: if you really want to annoy one of your online friends, simply change your phrasing from "I'm going to degreeble you" to "I'm going to use a lossy encoding to expose your underlying symmetries & reduce your Kolmogorov complexity"
how fun!