The Folly of Infinite Compression is one that crops up all too often in the world of Data Compression. The man-hours wasted on comp.compression arguing about magic compressors is now approaching that of the Manhattan Project.
In a nutshell, a magic compressor would be one that violates the Abraham Lincoln’s little-known compression invariant: “You can’t compress all of the files all of the time”. It’s trivial to prove, and I won’t do it here, that no single compressor can losslessly and reversibly compress every file.
The easiest way to foil most compressors is with a so-called “Random File”. In the world of Information Theory, Randomness is a tricky thing to define, but I like to fall back on Kolmogorov complexity, where we define the complexity or randomness of a file as the Komogorov complexity w/r/t a Turing Machine.
Several years ago I posted a challenge on comp.compression that I hoped could be used to silence any proponents of Magic Compression, and I’m reposting it here so I have a permanent location I can point to.
How does it work? I took a well-known file of random digits created by the RAND group (no doubt at taxpayer expense), and converted that file into binary format, which squeezed out all the representational fluff.
The result was AMillionRandomDigits.bin, 415,241 bytes of incompressible, dense data.
The challenge is to create a decompressor + data file that when executed together create a copy of this file.
The size of the decompressor and data file added together must be less than 415,241 bytes.
So far nobody has been able to do this. I wouldn’t have been surprised to see somebody get a byte or two, but that hasn’t happened.
The only real rule here is that we have to negotiate exactly how to measure the size of the program plus data file. I’m willing to exclude lots of stuff. For example, if you wrote the program in C++, how would we measure the size of the program?
In this case, I would measure it as length of the source file, after passing through a resaonble compactor. The run time libraries and operating system libraries could be reasonably excluded. But that’s just one example, the rest are all subject to public interpretation.
Really, the only rule we need is that the executable program can’t have free access to a copy of the million random digits file. For example, you can’t create a new programming language called JG++ that includes a copy of this file in its Run Time library. You can’t hide the digits in file names. And so on.
As long as those rules are obeyed, the challenge will never be met.
Recursive Compression
This challenge has a special place for the variant of Magic Compressors known as Recursive Compressors. Some savants will claim that they have a compressor that can compress any file by a very small amount, say 1%. The beauty of this is that of course they can repeately use the output of one cycle as the input to another, compressing the file down to any size they wish.
The obvious absurdity to this is that if we compress every file to a single bit, it’s going to be kind of hard to represent more than two files using this algorithm. So most people in this subspecialty will claim that their process peters out around some fixed lower limit, say 512 bytes.
For those people, a working program should be able to meet the challenge quite easily. If their compressed data file is a mere 512 bytes, that leaves 400K of space for a decompressor that can be called repeatedly until the output is complete.
The Payoff
The first person to achieve the challenge, while staying within the spirit of the challenge, will receive a prize of $100.
157 users commented in " The Million Random Digit Challenge Revisited "
Follow-up comment rss or Leave a Trackback*Cracks knuckles*
*fires up compiler*
*scratchs head dumbly - wonders how long this might take*
Wow - you’ll give me $100 for doing that - um ok. I’ll get back to you - later.
It’s not just the $100. You’d be famous!
Hello!
When it comes to recursive magic, I think you might need to explicitly include the requirement that the number of cycles be included along with the actual decompressor and compressed data. Otherwise, the challenge would be easy. (But you already know this, of course, but I wouldn’t be surprised if the magic-doers don’t realise this, with all that that could entail.)
The way I’d do it is to treat each file as a string of bits, add an extra one on one end, and interpret the result as being a nonzero natural number in binary (with that extra one being the most significant bit). Then, my compressor would subtract one from that number, unless that number is already one, in which case it cannot be compressed any further. As the result must be at least one, it’s going to have a most significant one in its binary representation. Chop that one off, and take the remaining string of bits as the ‘compressed’ file.
Most files don’t compress with just one compression cycle, but, if enough cycles are applied, all files (of finite length) end up compressed to zero length.
To recursively decompress such a file, you just have to apply the (obvious) decompressor as many times as the compressor was applied. But how many times is that? “Oh!” I can imagine the magic-doers saying, “You never said the cycle count counted, too! You’re cheating by changing the rules! It’s a CONSPIRACY!!!1! Yet more proof that the ‘experts’ won’t admit that they’ve been wrong all these years!…”
:-)
I think the challenge as is is safe against this. Basically, your method requires hiding information in the compression cyle, and then using it in the decompression cycle.
I’m trying to avoid most of those tricks by asking the user to create a decompressor only - a single file, that when executed, creates the random file. So even if the user counted up the cycles while compressing the data, there’s no place to store them.
So for example, if you created a shell script or batch file that executed your decompressor, it would have one line that said:
for i = 0 ; i < big_big_number ; i )
decompress
Because that number is so big, the shell script itself would have to be bigger than the million random digit file.
You could try to finesse this by hiding the number in a huge package of 0 length files, each with a name that makes up part of the number. But we’ll nip that in the bud!
Sachin Garg has a post on c10n.info that highlights a classic magic compressor here. Jeff Fries seems to be claiming spooky compression powers that would enable him to get a slam-dunk on this million digit test - however, there’s a bit of a problem in that I think he needs some hardware to make it happen.
fifty quid? hardly worth getting out of bed.
i’ll keep the file though as it could be a good test.
>fifty quid?
fifty of your quid plus worldwide fame.
of course, you can always wait for the exchange rate to improve.
Don’t tease or Peter St. George and Hamby of Zeosync will come swirling out of the sky on giant bat wings and roost everywhere.
- truth
@truth:
I try not to tease, but it is kind of fun to watch the challenge roll on for years and years with (of course) nobody even able to submit a challenge!
Hey, Mark.
I thought you would remember when the “truth” brought Zeosync down. Seems to me they started to fold about three weeks after Sam Costello’s article in PC World.
- truth
Hi everyone!
If someone manages to compress the million random digits file by a few bytes, would it have any significance? Would it be an accomplishment?
Wouldn’t it simply mean that the specific sequence of digits wasn’t completely random after all?
@Emil:
A few people did some analysis on the file and have found some minor weaknesses in it. In theory one could squeeze a few bytes out of it.
But of course, to win the challenge, you’re decompressor would have to be vanishingly small. This will be tough to do.
So… if you did create a decompressor + data file that was smaller then the challenge file, but only by a few bytes, you will be lauded for your accomplishment, but you won’t be breaking any laws.
The flaw in the idea of counting up to the number represented in the file’s bits is, as stated, flawed because inserting that number into the source code (or data file) is exactly as big as the original.
But I like the tactic. Are there ways around this limitation? Assuming infinite computer time: factor out the primes, and write each in an efficient form to the data file. The decompressor would then multiply the numbers. Obviously, that’s extremely vast division and multiplication, but the algorithm for doing so with bit streams as the operand is probably not TOO large.
Assuming the decompressor also has unlimited runtime memory, it opens a few more options. Say instead of just multiplication we also allow for certain meta-instructions. Each entry in the table is either a bitstream or a manipulation of one of the other (preceding) entries. With addition, subtraction, multiplication, maybe division, left shift, right shift, and/or/xor, you could start to get pretty expressive.
As the size of the compressor is not in question, and the efficiency on both ends is not part of the question, the success of this approach begins to depend less on the algorithm - over an arbitrarily large enough file even a very small compression ratio would overcome the size of such an algorithm. Instead, we’d have to ponder whether or not the data file, with its instructions *and the separation of the component bitstreams (factors)* would actually be any smaller than the source set, or smaller than any source set, or smaller than only very large source sets, etc.
My suspicion is that it would not. Even merely the storage of any given prime factor from the original number requires overhead. Where the source file is considered to contain the number and nothing but the number, we must be able to store multiple streams of arbitrary length in ours.
I don’t think you could demark them in a header to the data file because the bit positions are ALSO arbitrarily long. Would you store them in 4 bits? 8 bits? 32 bits? 128 bits?
Instead, each bitstream should be preceded by its length. The length itself must have a length; to store the length, you could count each 1 up to the first 0, then use that as the number of bits to read for the length, and use that as the number of bits to read for the string. If you wanted to meta it out further, after each ‘length’ string you could insert 1 for ‘the stream begins next’ or ‘0′ for read a new length string with a length of the last length. :D
I did ramble a bit. It caught my eye. Any thoughts?
The idea that there are shortcuts to counting things is appealing, but ultimately fails to bear fruit. Counting any random thing will lead to some numbers that compress, but many, many more that don’t.
*shrugs* Alright, how about this idea (I’m no expert in compression or codeing, just throwing an idea)
Alright assuming these are numbers. 0-9
why not have it search for the inevitable patterns that arise. Find the most common ones (say if 12345 appears 100 times for example) of as long of string as possible, and re-assign that to a different character
so for example:
123453726482912345375482
would be
X37264829X375482
Find the most common repeating patterns, and longest running ones. Then you have, for example, 1 saved (12345) and 100 X’s.
If the system finds there’s 10 occurances of a 250 character instance, that shortens things by 2000 characters or so no? (including your new X’s and one copy of the 250 char key)
Patterns WILL occur in any random number, especially at a million characters.
0-9 only uses a few of the veriations of binary, leaving you a lot of characters to convert them into.
Now if the program could be made small enough to fit into it, and left to run for a LONG ass time to find the patterns for all lengths and combo’s… (but processing time wasn’t a factor)… that should do it?… no?
@Digital:
You’ve done a good job of describing a class of common compression algorithms, sometimes known as the Macro Replacement Model.
It works great when some patterns appear more frequently than others. But if this is not the case, if patterns appear randomly, it takes more bits to describe and replace them than it did in the first place.
Think about it: let’s say that the string 1234567890 only appears once in the million digit file. To use macro replacement, I have to insert a macro definition in the file, something like:
X=123456789
Then I replace that occurence in the file with ‘X’.
At a minimum, that takes two chacters plus change more space in the file than it was using before. Fail.
If the string occurs more than once, you win. And when long sequences start showing up with some regularity, we describe that file as not being random.
@ Mark
Figures something that simple would have been thought of already, haha
I would be interested to see how well it would work on this, though I’m sure it’s been tried. A million is a large number, large enough that odd things start happening with probability. I would think that there would have to be quite a few 3-4 character matches, and at least a few larger ones.
You’d also have to have the program try all possible combo’s, and THAT’s the processor eater.
00…01…02… ~~~ 99…000…001…002
after each one, check number of replacements vs length of string to check total saved space. That would have to go up to 500,000 characters looping to verify every combo has been included. (now that I think about it, go backwards, starting at 999999, 999998, 999997… hate to kill a larger combo by finding a smaller one… although with unlimited time, run it both ways to be sure)
It would have to keep a record of every single one of them with a ’space saved’ tag next to them…
Now that I think about it… since time isn’t an issue here, after it goes thru all combos, run that sucker again! who knows, maybe we’ll get to compress our new ‘letters’ we’re using as place holders… as long as we keep track of what order we compressed them nothing should be lost.
Of course, before finishing, all the non-used combo’s could be deleted out, so you don’t have a dozen strings of 0 matches, or 1 matches that aren’t used to compress.
___
bah, now I’m rambling again. I’m sure all of this is already done and been rendered useless by the next generation of compression. Still interesting to re-invent it in your own head :P
@mark
I’m going to read up on “Macro Replacement Model” some, I’m sure it will be interesting to see the completed idea. ty for the info
@Digital:
>I would be interested to see how well it would work on
>this, though I’m sure it’s been tried. A million is
>a large number, large enough that odd things start
>happening with probability. I would think that there
>would have to be quite a few 3-4 character matches,
>and at least a few larger ones.
I encourage you to work through the problem, I think you’ll find it instructive. A lot of people run into the trap of coming up with ideas like this and then not following through with actual tests to see what the real costs are.
@mark
the extent of my programing knowlage is VBA unfortunatly… and though I think it would be possible in VBA it would be,… lets just say very very ugly. haha
Reading up on different compression methods, there’s a lot of interesting stuff in there. Of course now I’m trapped in the ‘wiki-hop’ of reading every artical that’s linked to it, so I’m sure I’ll be sucked in for a few days.
You really have not received a *single* challenge? That’s amazing. And pretty convincing. Is the challenge “real”? Is it still available (July, 2008)? Maybe potential challengers don’t think that you really intend to give out the money.
Of course, your prize is “only” $100. Is Mike Goldman still offering the $5,000 prize that I read about years ago?
@Ak3la:
No, after all these years no serious challenge.
And I think it unlikely that there will be. Some hard workers on comp.compression identified a tiny bit of redundancy in RAND’s million digits, but the amount is so small that I’d consider it impossible for even the most skillful coder to take advantage of.
Are those Dollars are American Dollars or Canadian?
I think Canadian is worth more these days.
I’m still trying to make it happen.
I have another idea so I’m working on it, again.
I was curious as to the status of th project so I did a search for million digits and here is this nice site.
I should know something in about a month.
Wish me luck.
Hi Mark,
I came across this challenge while casually browsing the internet. Is this challenge still on? What’s the best compression reported to date? I tried WinZip and Winrar, both gave compressed files bigger than the original file, so I realize this is not a trivial challenge.
Anash
@Anash:
I don’t think anyone has bothered to report “best compression” on this - it’s kind of an all or nothing deal. So far nobody has been able to come up with a good algorithm for compressing this random data. Nor is it likely that anyone ever will.
Hmm…
There’s no mention of isolating the decompressor’s output to a single file. Nothing says the decompressor is responsible for identifying which output is the match, only that it’s been created. If your script simply created 415,241 files with the bytes incrementing from all 0’s, this random data should be in there, yes?
On the other hand, this sort of “letter of the law” vs spirit has inspired me to look at different techniques for hiding information; thanks A Million Random Digit Challenge.
Len
@Len:
Stop a minute and think about how many files you’d have to create in order to be sure the random file was included… that would be 2^415,241, not 415,241.
That would be a lot of files.
Alright well a month may be a little bit of an underestimate.
Still trying new ideas.
Thought I had a winner the other day but I’ve learned:
Great results… Magic compression? “There is a mistake someplace.”
I hope that make ya snicker…
Anyway Happy new Year to all of ya still trying to compress million digit file.
Ernst
Hi Mark
Here is my recipe (awnfull pseudocode).
mydigest = Digest the file with md5 (as a example)
Decompresión
iterate a from 0 to max_length
{
if md5(a) == mydigest then ouput(”File found: a”) ; exit }
outpur “error!!”
As md5 can collide maybe a two digest aproach adds more against finding a collision on the way..
Well this is akind of joke but you never said the algo has to be deterministic…
Regards Angel
@Angel:
Nice idea, if only it worked!
- Mark
Anyone else working with recoding the file?
The number of collisions of a hash function on a file storing number around 10^1,000,000 is going to be around 10^1,000,000 divided by 2^(size of hash in bits). The count of the number collisions to reach the orignal data plus the size of the hash is going to be about equal to the origninal size of the data.
If you use multiple hash functions it is more difficult to determine the number of collisions. However it would take near infinite time to encrypt or decrypt by going through every possiblity and calculating the hashes.
You could hash the orignal file, drop every Nth bit or byte, and calculate a hash(es) of the dropped data. This may (or may not) reduce the number of collisions significantly, but it would still take near infinite time to reconstruct the original file.
Never give up. Never surrender.
Hey everyone.
Helping a friend move this weekend and I needed a break so I thought to post some here and there.
Happy Valentines day all you ladies.
I’ve been working on the Million Digit challenge these past few weeks. It’s funny how looking for work lets a man focus his mind.
I’ve had a few interesting results but nothing works all the way yet.
This next week I will try a new algorithm I worked out and see if it does any better.
What did they say in the original Hitchhiker guide to the galaxy? “Keep banging those rocks together.”
http://en.wikiquote.org/wiki/The_Hitchhiker%27s_Guide_to_the_Galaxy
Good luck to all.
Ernst
How about...
Yay! I win! Ok, I know that this is cheating really...
@mike40033:
Yeah, kind of violates this rule:
>the executable program can’t have free access to a
>copy of the million random digits file
- Mark
Hmm... that comment didn't quite look like I meant it to. I guess I didn't read the instructions for the iG:Syntax Hiliter plugin properly........
The rule about "not inventing new programming languages" should perhaps be tightened to not allow new programming languages at all, or to require that the compiler/interpreter for a non-standard language be included in the bit count.
Otherwise, for example, I could invent a programming language as follows. The programming language is called "MikesMillionMunger". As the rules stand, the compiler for MikesMillionMunger doesn't count towards the number of bits. The language doesn't contain the file in its runtime libraries.
1) Any sequence of bits is a valid program. The two instructions are '1' and '0'.
2) When the program starts, it outputs the bits corresponding to hexadecimal 9b1b.
3) The instruction '1' means 'output the bit 1'
4) The instruction '0' means 'output the bit 0'.
Now that I've defined a programming language, I need a program to run on it. The program I will use will be the last 415239 bytes of 'AMillionRandomDigits.bin'
My program is 2 bytes less than the file, size of the input is 0 bytes, and I can argue (against the spirit, but within the letter) of the contest that my implementation of my language won't count towards the bit-count.
Mike H...
@mike40033:
As I've said before, the whole point of this is not to finesse the rules, but actually achieve the goal, so I'm not going to go crazy on definitions.
Anyone who really things they have won the prize should be able to have a program that works on the real file, then works again on the same file encoded with a randomly chosen key using DES or whatever.
- Mark
But you only require a decompressor. To make it work on the encoded file needs a compressor as well.
If I understand the challenge correctly, the whole point of this is not to 'achieve the goal' but to silence people who don't understand the pigeonhole principle. I would suggest that this requires closing loopholes in the rules.
However, it is *your* challenge, so I shall respectfully defer to your interpretation.
Mike H...
Oh it's fun and the history behind it all as well.
It's a tough challenge. Fun for someone like me.
I will be very impressed if I or anyone else does well.
Best to have a bit of humor about it all.
I'll place a side bet the winning method will be clever.
Ernst
Maybe someone here can help me remember something I've been trying to find for years...
A long time ago (early 90's or so) there was actually an ad in Byte magazine for a company that claimed to have a magic compressor that could compress anything.
If I recall correctly, they got some interesting letters to the editor and then quietly went bankrupt.
It's the only time I've ever seen this in print in a reasonable magazine, and I'd love to see it again. Anyone remember this ad, or the name of the company?
David
@deg:
Search for the comp.compression FAQ, and when you find it, I think you'll find that the company was called WEB Technologies or something like that.
- Mark
Thanks Mark, that was it. Details in answer #9 of that FAQ.
Turns out that the story played out over three years --1992-95. But, in an ironic mental compression, my memory had reduced the story down to just a few consecutive issues of Byte.
David
I am assuming that someone has already thought of this, but I would be interested in learning about why this doesn't work. Could you use a similar algorithm to the Macro Replacement Model, but choose macros that fit in with Benfords law? That is, random numbers will tend to have more 1's than 2's, etc. Could you use this to your advantage when writing a compressor?
not sure if this way has been mentioned (as i didn't read the entire list of comments)
encode the data using a prime factor algorithm:
for a string of digits "102310"
you would compute 2^1 * 3^0 * 5^2 * 7^3 * 11^1 * 13^0
you could then split this number into larger factors that you would be able to store in a much smaller filesize (IE: 450^345 + 6344^231 or something (completely made up)
This would of course be an extremely inefficient way of compressing the data and would require hardware to prime-factor numbers with multiplicands of the millionth prime number (assuming a file of 1 million digits)
I just remember reading about this code (can't remember the name now) in a scifi book in middle school.
@aloishis89:
Benford's law applies to data that is in a non-linear distribution. The random million digits should be linear.
No way this is going to be any help.
@soda:
You've described a different way of encoding the data. Is there any reason to think that it results in compression? Simply finding a new way to represent a number is not enough, it must also be a more compact representation.
Mark mentioned that a few minor weaknesses have been identified in the data file, and in theory it might be compressible by a few bytes. The problem is of course that's is hard to write a decompresser that's only a a few bytes long. For fun I tried to see how small a script could be while still doing anything meaningful.
Suppose for a moment that the data file happened to contain the sequence '134113410686196639649008076861966396490080729' starting at location 531441.
By coincidence 134113410686196639649008076861966396490080729 can be constructed/compressed using the ruby expression 7**25 (which calculates 7 to the power of 25), and 531441 can be compressed using the expression 9**6.
I'm feeling lucky! Now I can write the following decompresser in Ruby:
The program reads the file 'f', which is a copy of the million digits file, but with the sequence removed. It then inserts the sequence at location 531441, which recreates the original data. The inserted sequence is 45 chars long, and my decompresser is only 28. I saved 17 bytes!
From this litle experiment, it seems that to do anything meaningful, a ruby script needs to be about 30 chars, with 10 chars for just reading the file. I guess the problem here is that any 30 char sequence that can be calculated simply would represent an island of order in the sea or randomness, and be a significant weakness, probably far beyond the actual weakness identified in the file.
In any case achieving a few bytes of compression would not prove anything. It would simply be neat. Wether it's possible depend really on the exact nature and location of the weakness in this specific file. If the weakness is only one or two bytes, I would say it's impossible.
I was wondering if there's a way to calculate the absolute value of randomness/orderliness of a specific integer?
take 0. what's the orderliness of zero? probably undefined. or low, probably 0.
1: seems no different. what the chance of accidentally hitting on the number 1? depends on the range!
2: we might still be at 0?
3: hmm maybe we're not exactly at 0 anymore.
4: high. here it seems some level of orderliness can already be identified. 2+2=4, a kind of symmetry.
5. low. it's a prime. all primes must have a low orderliness value?
6. high, maybe higher than 4? less of a coincidence than 4, since 6 is bigger than 4.
seems the bigger the number, the more order it can contain, which seems right since it becomes more unlikely to hit on the number by chance.
is there a meassure for this? is there a way to calcualte the absolute value or it?
A question about the rules. Suppose I run the a program using:
> ruby -n million.rb random.txt >> output.txt
The program million.rb is run with the file random.txt as input, and the result is saved to output.txt.
How would the size of the program be meassured? As the number of byte inside the million.rb file?
(btw, I know the data is binary not text...)
@Emil:
Like I said in the rules, we are looking for the total size of the program plus the data file. So in the case shown above, I suppose it would be the length of million.rb plus the length of random.txt.
Creating a build of ruby that stores the data in the runtime would obviously not count.
Someone suggested a while ago factorising the number and just storing the primes. I can't see that working, since storing the primes will require as much space as storing the number (it's the same amount of information, after all). However, it did give me an idea:
The file given is random in terms of sequences of digits. It may not be so random in terms of primes. If the number is not square-free (and we have no reason to expect it to be), then you can reduce the space required to store the primes by not storing the duplicates (just storing the power required).
Unfortunately, I can't test to see if this works because the largest numbers that can be reasonably factored by modern clusters are in the 100s of bits, far smaller than this number. I think the idea may have merit, though - whenever you see the word "random" your first thought should always be "what's the distribution?" It may be a uniform distribution from one point of view, but heavily skewed from another. If you provide a somewhat smaller random number (100 bits, maybe), I'll give it a go, though. (Although it may turn out to be too small to have a large enough repeated prime to absorb the overhead of the decompresser, as small as it would be.)
if it turned out that the million random digit number was simply the factor of a few primes, you could express it by the ordinal numbers of those primes, and then indeed you would have great compression.
but imagine the luck involved for that to be true!
the folks who created this file spent a lot of time making sure it was random in every way they could imagine, but they couldn't have done a factoring check too easily.
so it could happen, but the odds are incredibly high against it.
an interesting experiment would be to start cranking out a list of numbers that *can* be compressed by prime factorization, and seeing what percentage actually meet the cut.
A number theory guy might be able to help with the answer.
I forgot to come back and check for replies, sorry!
For my method to work, the number doesn't need to be the product of a small number of primes, it just needs to not be square-free (and have the repeated prime be fairly large). The vast majority of numbers are not square-free.
It's not actually necessary to factorise the whole number, just to find a (large) square factor, so perhaps this is tractable. I have exams coming up (one of which happens to be number theory!), but I'll give it a go in a few weeks.
Having actually looked it up, rather than guessing (I should know better than to guess about such things!), I need to correct myself - it's not a majority of numbers than aren't square free, it's actually a slight minority. So, the odds of this compression method working is not great... If it weren't for the limits of computing power, it would still be reasonable, but even if this number has a repeated prime factor it will probably be difficult to find... Oh well...
Hi Mark,
Is the challenge still open? I'd like to try,
So all you want is a file (exe) smaller than this one, that outputs this file. Obv, as per the spirit of the competition.
What i am asking, is , I wont give any code /source at this stage.
Thanx
Waiting for a promt reply.
JITENDER BEDWAL
New Delhi
India
@Jitender, sure, there's no requirement that you disclose any source code to win this challenge.
Hi Nelson,
Is there any limit to the acceptable time need to be taken for the task to complete ? (AmillionRandomDigits Challenge) eg, if it acceptable if the decompression need 3 weeks time. (joke).
Regards,
Ray
@Ray:
You take as long as you want, Ray.
Hi Mark,
Can u tell me, how do we know, how much a file is compressible to, Or , how exactly do we calculate the 'Entropy' of it.
I mean, that if i consider its entropy as composed of Symbols of the 256 symbol alphabet (read 8 bit byte alphabet), it comes out something, and if i consider it as the 256*256 symbol alphabet then its different.... So, how do we exactly get to know ( any STANDARD or absolute scale), that i can say, this file is compressible to 'this much'.
Thanks.
Jitender Bedwal.
@JitenderJB:
There is no such thing as an absolute measure of compressibility.
Any measure of compressibility is *always* relative to some specific model.
And for any file, there is some model that can compress that file down to 1 bit.
So when you hear people talk about the entropy of a file, you have to know how they are measuring the entropy.
Even something like Kolmogorov Complexity is not an adequate way to define an absolute measure of the complexity of a sequence - because it is going to differ depending on what sort of machine you choose to run it on. For example, a machine that has a single instruction that can generate the million-random-digit file will have super-low Kolmogorov Complexity on that machine, but may be very complex on some other machine.
- Mark
Mark,
What i am asking is, on what scale (or basis) [if any], did the RAND people, rate the 'A Million Random Digits' as random.
Jitender
@JitenderJB:
There are a few places you can read up on the history of these digits, but the most important thing is that they generated the digits using an unbiased, memoryless source. For their purposes, this qualifies as random.
Or maybe, is it exhaustive manual scrutiny, of the 'Tests of Randomness'.
Mark,
Is there any function, which does the reverse of what BigInteger did to the Random Digits file?
QUERY ABOUT SUBMISSION PROCEDURE
--------------------------------
I have been trying to compress THE file for a while no, I hav got 'a little' (but still some) success. Please elaborate the possible combination of the contents of the final file to be submitted.
So far, I hav a .CPP file, a .JAVA file, and a .BIN file, which when run Create the EXACT File u are providing. And the Total Size of the above contents is 232Bytes less than Ur File.
I hav tried to remain strictly in accordance with the spirit of the Challenge.
Thanks again..
JITENDER BEDWAL
@Jitender:
If you can mail those files, along with build procedures, and I can reproduce your results, you get $100, it's that simple.
- Mark
Thank you very much Mark.
I'll publish (or mail) them very soon. I am not actually that concerned about the patents or something. And I am not in any way afraid of posting my code (or the THEORY ). But still, Mark, please guide me, what should be a BETTER way, (like for example Publishing it in some magazine/paper/website or something so that i don't loose the credit for it, in long run) to do it. In short of course.
And btw, i hav gone through numerous threads on Comp.Compression. I know there hav been a lot of Pranks there.
So if you like, we can end this discussion for now, and resume it later (in a week or so, when i Publish the files), but still some advice would be greatly appreciated.
Thank (yet again)
Jitender
Waiting here... for another 10 minutes.
looking forward to seeing jitender's claimed solution.
i'm curious what code was used to convert the digits to binary format?
@Emil:
See this post on comp.compression for the full Java source code:
http://groups.google.com/group/comp.compression/msg/e219c4f18a9f546e
thanks for the link to the original java encoder. for fun i tried to write the shortest decoder i could come up with:
p gets.unpack('B*')[0].to_i(2)
it's ruby, takes about 40 min to complete on my mac, and outputs the one million original digits when given the binary file as input (and disabling the record separator using -0777):
the script is 30 bytes long. can anyone beat that? :-)
Is there a restriction on the OS used? I have an idea, but it will only work correctly on Linux.
@AtomicCheese:
No restrictions on the O/S, but of course, it helps if it is something I can run to verify things.
If you have an idea, and it will only work on Linux, I am pretty skeptical.
- Mark
Is this competition still open?
I would like to give a try.
But I feel prize money of $100 is too low.
Hi Mark,
A few questions about the contest:
1. When measuring the size of the compressor / decompressor, I assume we are talking about the length of the source code in bytes, not the binary. Can you confirm this is the case?
2. Also, when including the length of the source code, whether binary or source code, is the packed size (with something standard, zip, or winrar) of the source / binary used or the uncompressed size? If uncompressed that would mean we need to use one letter variables, macros to hell, etc.
3. Is this offer really still good and nobody has won it?
@Parlance:
I'll be happy to measure your submission by the size of the source, even the size of the ZIP or RAR file that holds the source. I agree that it would be annoying to have to make the source unreadable.
Yes, the offer is still good, nobody has won, and I don't think anyone ever will. If it was me, I'd be trying to get the file inserted into the Linux kernel so I could use that to snag a back door victory.
- Mark
@keshavshetty:
Contest is certainly still open.
Winning it would of course give you much fame and acclaim, way more than $100 worth. As Matt Mahoney has pointed out, if you find a general purpose way to recompress compressed data, you would actually be eligible for at least one million dollar math challenge.
Obviously the money is simply a token - nobody is going to do this for cash value.
- Mark
i just found this Improbable Research video about the million random digits , i think its extra funny cuz im trying to compress that crazyness =)
njoy
http://www.youtube.com/watch?v=0y8Wa10Lm7c
@haggo:
That video is simply awesome! Thanks for posting.
Does anyone still visit this place?
I have a few questions.
Is this challenge impossible?
Has anyone successfully completed this challenge?
What about this:
http://cs.fit.edu/~mmahoney/compression/barf.html
Hi Noodlesgc,
About barf, reality can be found at
http://nikhilsheth.blogspot.com/2007/09/barf-compression-holy-grail-or-goose.html
Meh... If you are so confident that this will never be achieved, why is the reward only a measly $100.00? Why not a million? By offering such a piss-poor reward, you've pretty much guaranteed that no one with anything valuable will ever take part in your challenge. If someone has a real working program, their eyes will be more focused on the millions they can make by marketing their little compression program. Your cowardly offer will not capture the interest of anyone who is serious.
@NotImpressed:
First, I don't have a million dollars, sorry.
Second, the people who pursue impossible compression claims are going to do it regardless of whether I offer any prize or not. This prize is simply a stake in the ground that they can choose to shoot for.
The primary reason I created the challenge was to provide a benchmark I could use on comp.compression every time someone made a fantastic claim. It has done very well in that role. When someone says they have a great new algorithm, I just point them to the challenge and tell them to let me know when they are done.
And despite what you say, it has caught the attention of many, many people who are quite serious. Of course, none of them have managed to meet the challenge. Nor will they.
Sorry I disappointed you.
It it still open? I fully do understand the chellange:
decompressor executable + compressed file MUST BE LESS THAN ORIGINAL AND MUST DECOMPRESS TO ORIGINAL WHEN RUN.
I ma not after prize even if it is 1 million, am just interested, I have an algorithm (not yet published anywhere) that can bring it down, which I use as post compression technique to reduce data to almost 1/3rd and have tried it already, but rightnow seems a little tricky with your binary file, but yes I am 95% use it is possible and I would love to proove it with an example (only if it is still open). Machines can never ever be more powerful than human brain.
regards
Khan
@Khan:
Most certainly, the challenge is still open. Please give it a shot!
- Mark
There appears to be just one way to crack this problem, which is to find something about the data that's non-random.
But if you succeed in that you still haven't solved the actual problem (although you'd win the $100), since the data file wasn't fully random.
Therefore, as long as the data is as near to fully random every way you analyze it then this problem cannot be solved.
@Robin:
First, there are many people who think that they can write compressors that work on all datasets, regardless of randomness. This challenge is their opportunity to prove it - of course no one has done so.
As for a sequence like the million random digits - regardless of how random it looks, there may be a short program that can generate the sequence - it is not possible to refute this due to the halting problem. So no matter how random it looks, there is always a chance that it could be compressed.
- Mark
Been a while.. I worked a long time and then again fell into the hole of hopelessness..
You know i should get it all going again..
This last round gave me three unique encoders so it was worth the effort.
What I love about this is when i get within striking range, and I have been there a few times, it seems I run out of storage space.
I'm always stuck with one or more bits of information I need to store that i have no place for.
Now that I think of it.. i did have a new idea on on old encoder I didn't work out yet..
But nice to say Hi! I will never quit trying I'm sure..
@Ernst:
It might be impossible to solve, but this is unprovable, so it is not necessarily a fool's errand to continue. Kolmogorov shows that we cannot prove that a given program is the shortest one to produce a given sequence, so there is always hope.
Needless to say, though, I don't have much faith that one will be found.
- Mark
Has there been any real submission for the challenge, other then scams??
@Zoran:
I received one submission, but it was just a waste of my time - I guess the guy didn't really understand the contest. He submitted a program that did some kind of compression, but it did not meet the requirements to win the prize.
I don't ever expect to see a valid submission.
- Mark
[...] (Before you read further I suggest you to read Mark Nelson “The Million Random Digit Challenge“) [...]
Loss less random data compression – Is it possible?
The article I posted in my blog - http://blog.adityon.com/2009/12/random-data-compression-is-it-possible/
Sorry I couldn't post here for 2 reasons.
1. Blog is too long
2. I want the user comments to be available at my blog.
New article added at http://blog.adityon.com/2009/12/random-data-compression-is-it-possible-part-2/
Well Guys.. It's a real mental challenge and I believe that is why I phase back into it from time to time.
I am now working on an encoder that has some promise in recoding the data. I have a ways to go since I decided to start from scratch and prove each step as I go.
Well if there is a boundary we cannot cross to information then that is the way it is. I've been within one bit of one bit a few times. I know what that boundary is like first hand. Thanks to the great spirit of this challenge. I have enjoyed the time spent.
Hey Mark I have been checking pattern with your ARI from the BWT program. Is that as good as any? Do you recommend a different one?
I figure is there is pattern that ARI program will see it.
Well to all who are still smacking the keys in 2010 on this challenge I salute you! I will be at it for a few months with this one.
I have coined the term Dark Information to reference any information that can be assumed in a system.
This was inspired by the idea of an observer in quantum system terms and one encoder with a special state that has no spare bits if I want it to "compress" I need that external reference to signal the state of the system. It is an one to one but I wanted it to be Special :)
Well this is the basic representation of million digit data. Not that this form of it is anything special; it's not. It's the kind of reality I see over and over again in all the ways I have expressed the data of the million digit file.
Not any savings to brag about so far.
1 : 829609
2 : 415167
3 : 207993
4 : 104024
5 : 52223
6 : 25717
7 : 12968
8 : 6475
9 : 3197
10 : 1627
11 : 769
12 : 420
13 : 212
14 : 92
15 : 37
16 : 29
17 : 7
18 : 5
19 : 3
20 : 1
21 : 1
Any Ideas?
I've posted a short article on Dobb's Code Talk dealing with this challenge. Some of you might find some interesting ideas there:
http://dobbscodetalk.com/index.php?option=com_myblog&show=The-Million-Random-Digit-Challenge.html&Itemid=29
Thanks Mark..
Imagination is more important than knowledge...
Albert Einstein
US (German-born) physicist (1879 - 1955)
I realise this is a dumb question, but.
Don't you need 4 bits per 0-9 digit?
1million * 4 bits = 500,000 bytes.
So how come the .bin file is 415,241 bytes?
@thicky:
There is a form of coding numbers called BCD. In BCD, you encode two decimal digits in a single byte, just like you describe.
The million digit file does not use BCD coding - it is a single long string of binary digits. If you do the math, you will see that a single decimal digit takes approximately 3.32 bits to encode. So a million decimal digits takes 3.32M bits, or 415K bits.
If this doesn't make sense, start Googling for computer numeric formats.
- Mark
@thicky: if you use Binary-coded decimals (http://en.wikipedia.org/wiki/Binary-coded_decimal), you would be right. But converting a decimal number to a binary number is a bit different. Let's convert 11 in decimal to binary: 11 = 8*1 + 4*0 + 2*1 + 1*1 = 2^3 * 1 + 2^2 *0 + 2^1 * 1 + 2^0 * 1, so 11 in decimal is 1011 in binary. In BCD it would be 00010001 More information on http://en.wikipedia.org/wiki/Binary_numeral_system
@Mark @Ben
Thanks - makes sense now I think. The file contains the binary representation of a million digit number, not the binary representation of a million digits!
Happy New Year!
Hey.. I tested the new encoder and no real surprise it turns any file into random data as far as Zip is concerned.
LOL it is a one to one encoder but still no magic compression.
That must be Codex number 25. At least this one doesn't have any unexpected states like one I have does.
There are systems out there. Keep banging those rocks together!
Mark - in response to your article at Dobb's Code Talk: The normality of pi has yet to be proven. That is, it has yet to proven (although many mathematicians think it is probably true) that pi contains every sequence (actually, normality is a little stronger than that, but I don't think the weaker version has been proven either). Your Magic Function, therefore, may not work (but it probably will).
For more information, see:
http://en.wikipedia.org/wiki/Normal_number
@Tango:
As I said, I am obviously not a mathematician! Thinking it is true and proving it is true are obviously two different things. Thanks for the pointer!
- Mark
Pi is an data set then until we know more.
That works for me.
Hey Mark are folks looking for that Holy Grail of Data Compression Numerologists then?
Oh I'm hooked on trying that is for sure. Anything that helps the imagination.. Thumbs up..
There was a story I remember about a substitute teacher who teaches one Kindergarten class one day and a High School senior class the next.
Kindergarten kids when asked to tell the teacher what a dot of chalk on the black board was. They replied with enthusiasm one answer after another. A Sun, the top of a telephone poll.. a... a... and the answers poured in.
The High school seniors when asked the same question were mostly silent with a lone condescending voice or it's a chalk dot dweeb! Coming from the back of the class.
The moral of the story is love your inner chalk dot.
Hey Did I tell you guys I can compress any binary number by one bit?
Drum roll....
@Ernst:
There are numerous ways to do as you are describing, but they all rely on hiding information. You of course can't compress any stream by one bit without actually storing that bit elsewhere.
- Mark
Well.. This is meant to be on the Data Compression humor side.
So If I blunder on the terms or other proper defines please correct me gently.
Again try to see this as Data compression humor and not a Big Brag.
I need to say that all bit strings except zero value ie 0 or 00000000
-----------------
If we take a string and I will just type and make up a data
1010101 -- We can show how value and symbol can be exchanged and encoded.
1010101 Starting with a limit of length of MSSB ( remember this is humor ) Ask yourself what must we subtract at the location of MSSB - 1 to get the MSSB to reset?
In this case only a single set bit at the MSSB - 1 location resets the MSSB.
1010101
-0100000
-------------
so that is 0110101 now.. Choose a parity in this case a simple 0 = 1 and 1 = 2 works so the first symbol is a zero.
repeat and you will end with the value of one which is Dark Information we assume when decoding the Symbol string. we start with a value of one and not a value of zero
So, in humor, I can reduce all greater than 0 binary values by one bit.
:)
it's 1010101 = 010101.. Where 0 and 1 are symbols rather than value.
Poo you say just drop the MSSB? well if I change the symbol meanings then it's 101010..
It's okay.. Just a little Data compression humor. I compressed all greater than one values by one bit..
> 0 duhh.. LOL
Wow Tough crowd..
Hey maybe you guys see a way to manage that parity flip issue?
Oh well I will log into Random Data compression and chat.
Happy new Year all..
And again:
Here's my algorithm. It relies on parallel processing. Well, parallel universes to be precise.
Theoretically, that should do it in some universe (because the data file was only read in universe0 and probably doesn't exist in the one which returns the file). Of course your interpreter or executable would have to exist outside the universes, which could make it hard to invoke!
I have an official update and I'd like to ask your position on this, Mark.
I have had the good fortune to discover a bijective encoding algorithm.
The size of the decoder is under 8000 bytes.
The down side is to recode a packet the size of million digit file, which is over 3 million bits, it takes nearly 48 hours using a single core at 100% This version wouldn't divide over multi-cores.
My question is: If I find a file "some place down or up the line" sequentially that will compress, at least that 8000 bytes, with a publicly available data compressor will that qualify as compressing the million digit file?
Thanks
Ernst
@Ernst:
I'm not sure I know what you mean. If you are suggesting that you might find a sequence in the file that compresses by 8000 bytes, then you should be able to construct a decompressor that meets the challenge.
- Mark
@danielnash:
I'll let you know when I am able to test your code. Don't hold your breath!
Hey Mark.
I am able to encode or decode the entire file recursively here. I'm confident it can "go" many iterations in both encode and decode directions but, sequentially only.
D3, D2, D1... E1, E2, E3...
There may be a diamond in the rubble but, it's worthless if you don't buy it.
Just wanted to see where you stand on this.
A little "Kobayashi Maru" Cheat on the file but it's in-line with my personal theory that there is a "quantum" of information involved and the symbolism representing it is non-permanent.
The new files are exactly the same size as the original million digit binary you provided. I am looking for a way to compress but the best hope at the moment is recoding.
I can learn to write a traditional data compressor but I will be learning from the bottom up. Why reinvent the wheel right?
So, my question, relating to recoding the file is; how would we score that. Would it be 8000 bytes + the data compressor? Maybe we might waive the size of programs all together since the codex is bijective and encodes or decodes any binary data and a publicly available data compressor is common enough? Wouldn't that be nice...
Files don't have to be encoded with "Wave", the codex code name, first to be decoded.
Maybe recoding the file is out of bounds?
You are fair so I respect your rules.
Ernst
Hey everyone.
I have an update: No luck on finding any exploit on the million digit challenge however I am pleased to announce I discovered dynamic Unary encoding. It is the basis for a bijective codex I code-named "Wave" since Wave encodes down to and back from, two bits.
I don't know if this is a self discovery or a discovery in general so I could use some feedback on the concept.
I have posted in the Beginners thread of the Usenet group alt.comp.compression.
Mark I placed this here because of two reasons. 1.) I have your challenge to thank for the focal point and 2.)alt.comp.compression is really slow these days. No one is even BS'ing. I hope to make use of common knowledge of those in the know. :)
So Questions; the concept is dynamic encoding where data is in motion rather than static and, has anyone said or read anything on this general concept of recoding bijectively?
Just what to look for will be helpful and I'll bet it's more appropriate to reply in the alt.comp.compression group to the thread Beginners Thread.
Thanks Mark.. I am trying to find a way and sharing what I find. It's a challenge and I like that.
Ernst
Hello I want to stay anonymous and I want to explain why no one who is serious will actually do this. To begin I was reading over some old AIM (Instant Messenger) logs from 1999 based on what I and this german kid used to talk about.. now I am 25 years old and still don't understand half of the things he used to write to me but it all comes down to this, at the age of 15 as he states and checked by his DOB profile, I cannot get in touch with him anymore but he lived a crazy life don't know if he is in jail now or dead. So let me explain a few things he was a social outcast he used to talk about how easy he could control girls without even talking to him just by clapping his hands and staring at them he would get them to have anal sex with him. Don't stop reading because of that part it's funny I haven't understood what anal sex even met when I was 15 haha. He was socially awkward to talk to becasue of his vocabulary I can't say I was the best influence on him either when ever he used to talk in big words I would say just thats awesome and laugh and thats how it conversations ended. He used help me in magicially turning any shareware program into a freeware and tell me to enjoy it =]. Now I figured out he was a hacker. He also used to talk about how he takes some kind of drugs which induced attention deficit disorder which was half of the key and the other half was empirically (still have problems understanding what this ment) he needed to become into a millionaire by the age of 17 he never logged in AIM after that point he did mention he was rich. I don't know what made him into a hacker but he could take any program apart and recreate the source code to it in less then a week thats what he did to turn RICH and I was a big part of him becoming rich as I told him about this cool software which can detect letters in pictures like a human brain back to text now called OCR (optical character recognition) it was pretty primitive at that time and was a shareware, he quickly made it a freeware for me and continnued to turn it back to source code and work on improving it, later patenting his improvement at a VERY young age i may say 16!. Not sure exactly what he patented it since it was already patented I believe? but he started to license it to websites which were for blind/color blind people, which he later said after I asked him that blind people were definetly not using them it was for automated mail hackers. Thats pretty much the story I have logs here which are still VERY cryptic for me to understand and I'm pretty dang mature. Most of them talk about compression and life and a bunch of other things about space and aliens and things I found useless. He states some funny things he must of been a bit immature back then like he writes that life is just a closed circuit with in a cycled loops. I think he ment environmental evolution? rofl If I could still talk to him I'd comment about that by saying yes life is a cycle :P but its no dang circuit its but its a process of SEX, baby then and age which gives wisdom lol. His compression theories talked about how in life the compression take many cycles to unpack and it may take years for it to fully unpack things that could be done in just a month using a CPU. Haha yah he was pretty nutty talked about evolution I am guessing. I won't go into much details here because I'm still trying to understand half of his theories maybe I could patent some of them and get rich myself.
But I wrote all this to tell you that hackers could turn programs into source code so don't risk it even though hackers may be crazy like this guy but they exist.
I'm learning Visual Basc 6 and I may be able to recreate what this german guy was talking about near the future at the moment I'm struggling with bitwise stuff and loops at the moment it's progress.
If lets say someone did submit a program that qualified for the challenge, what are the steps you would take to submit it to the scientific community for verification, and so on??
@joe:
Verification would be done by me, maybe with whatever help I can get from the folks at comp.compression. Verification is not a matter of science, more of forensics.
Now, if you choose to disclose the algorithm, that opens the door to more interest from other people.
- Mark
Wow we took a turn for the Surreal.
Hey Whatever on the long Story.. I can get it but I doubt anyone else here would be able to get the story..
Anyway.
Update.
I now understand how to recode all binary strings 1 to 1 under simple integer count. I'll be looking at one exploit that came to mind, next. I only hope I don't become homeless here too soon. The Job market is still oppressing economic opportunities.
Does anyone recognize this integer sequence? 13,128,23,217... ? It's my Q0-B.
Just a shout out to all you other would be Nut Cases trying this challenge.. Catch me if you can!
Ernst
Hey Mark,
where i have to go to get my 100,- dollar if i had a solution for this problem?
@lilalula:
You don't have to go anywhere, I'll get you the $100 wherever you are - at least if you take Paypal!
- Mark
Hey Mark,
OK. What is about my idea?
I think i can compress any kind of data(even the AMillionRandomDigits.bin file). Well not endless but to a smaller amount. The compression is effective only on files with a certain minimum size. The bigger the file the better is the compression rate. If you're interested let me know. I'll give you some more detailed infos on this.
@lilalula:
Hey, the rules of the contest are right out there - don't tell me how you are going to do it, just do it.
You write a program that can compress any file without resorting to various cheating tactics and you'll be famous, don't worry about it.
- Mark
Hey Mark,
How are you today? So i'm back and i found a Way to do it. I was paying arround with the "AMillionRandomDigits.bin" file and it worked fine. Also i tryed to run some more tests with my program and i found this here:
http://www.stat.fsu.edu/pub/diehard/
This is a cd with 4.8 billion random bits in sixty 10-megabyte files. You can go there and download this files to see for your self.
OK. here it is:
I can save 16 MB(16777216 Bytes) data(ANY data) in a file with a size of 11296 Bytes. My program don't compress the original data. All i store is the information in the original data. It is mutch like tranforming it to something else. It is like tranforming energy from one to another form. Like making heat out of wood or electricity out of wind.
During my work on this project i realized that this is more than 100,- Dollar worth. So you can keep your bugs. If you want proof for my program than look out for a project with name ALICE. Also i can meet you to show you how it works. I hope you understand that i can't send you this program.
cu
lilalula
PS: for more questions: you got my email.
@lilalula:
It is rather trivial to prove that you cannot save any 16 MB file in an 11K container. Sorry, the pigeonhole principle shows that you are incorrect without even breaking a sweat.
The purpose of the challenge is not so much to get $100, it is to prove to the world that you can do it. I don't think anyone is going to believe your boast until you actually prove it.
- Mark
Hey Mark,
Well basicaly you are right it is not possible to save 16 MB in an 11KB container. As i sayed this is not what i'm doing. I'm not trying to save the 16MB content somehow in a smaller container. All i save is the information from the 16MB.
OK, i will make you an "dekompressor"(i don't like this word because it's not right) and i send it to you. You can use it ONLY to restore the original file. After you have the program you send me a file. I'll send the container file back to you. you can use the programm and restore your original file. Would you be happy with this as proof?
Hey Mark,
It's me again. You know that everybody was 100% sure that the earth is the center of the universe? Until 1610 a Guy in Italy(Galileo Galilei) profed that this is not right. He almost got killed for this. This is just one example of our way to understand things. It is often just simpler to belive what we don't know.
@lilalula:
Yes, your proposed proof is fine with me.
>Until 1610 a Guy in Italy(Galileo Galilei)
>profed that this is not right
This is a nice little story, but of course it was Copernicus who was credited with advancing the notion of heliocentrism, not Galileo.
- Mark
looks like somebody did homework...
Send me an e-mail so i get your e-mail.
Found another claim for recursive compression:
Even "explains" how it works :p
http://www.recursiveware.com/
@TonioRoffo:
Awesome site, thanks for the link. Looks like the guy is having trouble getting the final implementation complete though... Think I'll contact him.
- Mark
Well, I looked at this. Curiously, as I wrote pictorial representations of the phenomena described, I came up with a pattern that fitted the concept "a million random digits" and "so-many bytes of (supposedly) -incompressible 'random' 'data'" BEFORE I saw the direction Mark's description was going. This suggests my method so far is consistent with what Mark describes.
I found a very interesting pattern. I then intuitively found a candidate method to fulfill the "The challenge is to create a decompressor + data file that when executed together create a copy of this file."
I then altered my method to allow for the requirement that the information be in the form of "bytes".
Curiously, BEFORE I READ of the notion of "negotiating how to measure the result etc.", I produced the notion "so you get an approximation of bytes (cannot measure it))".
My intuition suggests "if write the program in C++" one could measure it with (say at this stage) any other 2 programs...?
Of course, the technical details of what I found could do with sponsorship more than $100 as I could do with a freehold house!
The alleged problem of "recursive compression" is apparently "solvable" ("what problem? " one may ask......?)
by a simple but very clever technique- true there is a natural "limit" but this is very low it appears.
The general subject area of ultra "compression" is what I call "space computing", it has possible ramifications for the study of astronomy.
Dr. Wong seems clued up;
5-d floating characters
nice work I think
yes Sir.
I can go much further apparently (If higher rate of data storage minimisation of container ( ) required)
with hyperspace bypass technology
entropy "vanishes" in a very delightful 'way'
!
Is this challenge still on? I love the idea. However, $100 is a very small incentive for a standing challenge 10 years old. The Clay Math Institute (http://www.claymath.org/millennium/) offers a million dollars. How about submitting this conjecture to them? Or, maybe you would be willing to up the incentive to say $10,000?
@jjdawson7:
Yes, the challenge is still on. Many have tackled the problem, but of course, nobody has even come close to defeating it.
I would love to offer a million dollar prize, but of course, I don't have a million dollars. And the Clay Math Institute would never have a challenge like this.
Anyone who manages to beat this problem will certainly get fame and acclaim, and I will do my best to help them.
- Mark
The Clay Math Institute wouldn't have this kind of challenge, but James Randi might accept a solution to this problem as winning his "One Million Dollar Paranormal Challenge" (http://www.randi.org/site/index.php/1m-challenge.html). One would need to compress it by a significant amount, though, not just a token reduction as you require.
Quote: "...nobody has even come close to defeating it."
I doubt this is accurate; I quite likely solved your challenge, and many others (such as all seven Clay Institute Millenium problems, and "the Goldbach Conjecture"- all these connect to a revolution in science that may be as profound and far-reaching as the Copernican Revolution.)
It is time "skeptics" stopped being so quick to discount these possibilities- to realise "the stone that the builder rejected possibly has become "the corner stone", as the saying .... ............... "
The reason why the Clay Institute problems are so "hard" is perhaps the manner in which the construction of these problems interacts with how mathematicians do math.
As my living conditions are very difficult in several ways, I would prefer to find a sponsor or commercial avenue to releasing what I have (apparently) discovered.
I now have maybe over 14 different ways to look at this discovery- if it was in error it wouldn't sustain so many perspectives?
I need a salesperson to find me a deal
.
@Alan:
> I quite likely solved your challenge
You "quite likely" solved it? Alan, either you did, in which case you could provide proof, or you didn't. There is no maybe about it.
The challenge is still open for you to solve, but until you do so, perhaps you should stop posting comments and get to work.
- Mark
I worked the apparent 'solution' on a piece of paper. That is the "proof". For intellectual property reasons, and severe living discomfort reasons, I consider it unwise to reveal the contents of the piece of paper without using a non-disclosure agreement and dealing with a genuine potential sponsor or licensee.
Surely you knew this would likely happen if someone solved it? Who can afford to give away that potential new technology?
I can tell you, the piece of paper includes drawings, i.e. patterns.
It was a rather strange problem, but was it was nice to find what appears to be the resolution.
@Alan:
Sorry Alan, you are wrong. You did not solve it, you will never demonstrate it, and you are living in a world of delusion. I've seen too many like you to be fooled.
You will always have some excuse for not demonstrating your results, and I expect 20 years from now you will still be whining and begging for money like you are now.
- Mark
Mark Nelson, EMI said that "guitar bands have had their day", and did not sign the band known as "The Beatles".
You have no evidence for your claim that I did not solve (or whatever) your challenge. You are apparently not facing reality- that people do not necessarily give away this sort of knowledge.
You appear to be living a self-fulfilling prophesy of destructive skepticism....... ???
A large pool of knowledge now exists, already many pages have been written for a scientist who has signed a non-disclosure agreement.
What I know is on the market.
Got to hand it to Mark. What he lacks in humility he makes up (in spades) with smarminess.
Alan, you seem to be misunderstanding the rules of the challenge. You don't need to reveal your method. Just send Mark the compressed file and the decoder (as an exe, no need for the source code), nothing else.
I don't know much about computers, this technology is (apparently) a massive jump in concept from how computer experts are thinking (though they get little bits of the idea at times)
If a file "writes its own (perfectly tailored) code", you automatically have a fantastically efficient use-of-space. The "decoder" and the ""file"" are one- a hyperspace dynamic.
This is (apparently) very new system (?)
The file was already a code- this an object-recognising system- you don't "decode" a file like this- you "upscale it"
All the space-efficiency-storage-enhanced file needs is "to see itself "in the mirror"" (laser interferometry)- one converts the file into space-efficient storage system; in doing so you create a 'run triangle' (inverted location) the file exists now "in 2 places at once".
To re-see the file as originally depicted, requires running it past 5 (or 6) directions more-or-less "simultaneously" - the easiest way to re-calibrate the file is a dot matrix calculator gizmo (like using a cross-word puzzle to unravel the location and links between each word (you have a jumble of words and the shape of the puzzle- now can find where each word fits)).
the prize is $ $ 100 or $ 100 million U.S. dollars?
@ademsam:
$100. I don't have $100 million.
- Mark
Not worth wasting time on a prize that I can Mediger up the street.
@ademsam:
>Not worth wasting time on a prize that I can
>Mediger up the street.
Not worth wasting time on a challenge that you have no chance of winning!
- Mark
I am still working on things Mark.
I found a way to drop bits but, I will have to work the program out to see if there is actual data reduction when all things are considered. It's possible that it could come in under the wire.
This design is a much more challenging structure then anything I have attempted before.
So it's a win-win situation for me. I get to improve my program writing skills and see if this algorithm can meet your requirements.
It is an extension of that dynamic unary encoding I was chatting up earlier.
Best of luck to everyone.
I'll check back in later this year.
Leave A Reply
You can insert source code in your comment without fear of too much mangling by tagging it to use the iG:Syntax Hiliter plugin. A typical usage of this plugin will look like this:[c]
Note that tags are enclosed in square brackets, not angle brackets. Tags currently supported by this plugin are: as (ActionScript), asp, c, cpp, csharp, css, delphi, html, java, js, mysql, perl, python, ruby, smarty, sql, vb, vbnet, xml, code (Generic). If you post your comment and you aren't happy with the way it looks, I will do everything I can to edit it to your satisfaction.int main()
{
printf( "Hello, world!\n" );
return 1;
}
[/c]