<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: The Million Random Digit Challenge Revisited</title>
	<atom:link href="http://marknelson.us/2006/06/20/million-digit-challenge/feed/" rel="self" type="application/rss+xml" />
	<link>http://marknelson.us/2006/06/20/million-digit-challenge/</link>
	<description>Programming, mostly.</description>
	<lastBuildDate>Sun, 20 May 2012 20:52:57 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Mark Nelson</title>
		<link>http://marknelson.us/2006/06/20/million-digit-challenge/comment-page-7/#comment-545740</link>
		<dc:creator>Mark Nelson</dc:creator>
		<pubDate>Sun, 20 May 2012 20:52:57 +0000</pubDate>
		<guid isPermaLink="false">/2006/06/20/million-digit-challenge/#comment-545740</guid>
		<description>I have a very good system for providing an initial test of your claims. It will not compromise your work in any way, and will give some indication that you are doing what you claim.

Oddly, every time I offer this to someone who claims to have solved the problem, they decline!

- Mark</description>
		<content:encoded><![CDATA[<p>I have a very good system for providing an initial test of your claims. It will not compromise your work in any way, and will give some indication that you are doing what you claim.</p>
<p>Oddly, every time I offer this to someone who claims to have solved the problem, they decline!</p>
<p>- Mark</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: KiwiCoder</title>
		<link>http://marknelson.us/2006/06/20/million-digit-challenge/comment-page-7/#comment-545169</link>
		<dc:creator>KiwiCoder</dc:creator>
		<pubDate>Sat, 19 May 2012 22:58:52 +0000</pubDate>
		<guid isPermaLink="false">/2006/06/20/million-digit-challenge/#comment-545169</guid>
		<description>hey guys,
 
It is done :) Tested and it works great; unfortunately, commercialization is the only practical option here. I&#039;m not posting this to gloat, but rather to tell you guys that the answer is out there staring you in the face. You just need to apply more imagination than math and think outside the box.

Thanks and Good Luck to all</description>
		<content:encoded><![CDATA[<p>hey guys,</p>
<p>It is done :) Tested and it works great; unfortunately, commercialization is the only practical option here. I&#8217;m not posting this to gloat, but rather to tell you guys that the answer is out there staring you in the face. You just need to apply more imagination than math and think outside the box.</p>
<p>Thanks and Good Luck to all</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ernst</title>
		<link>http://marknelson.us/2006/06/20/million-digit-challenge/comment-page-7/#comment-544449</link>
		<dc:creator>Ernst</dc:creator>
		<pubDate>Sat, 19 May 2012 01:24:27 +0000</pubDate>
		<guid isPermaLink="false">/2006/06/20/million-digit-challenge/#comment-544449</guid>
		<description>Hey Ashish,

 As far as I know it started as a way to silence those claiming fantastic algorithms that do the &quot;Impossible.&quot;

 In its long life it has been a point of humor for some. A source of ridicule for others.

 I may be the only one still working on it.  I seem to be the only one updating the blog somewhat regularly.

 Speaking of that.

 UPDATE:  

  I have been successful in generating a 16-gigabyte dataset from a 10kbyte program.
 I studied the dataset for a few weeks and have decided on the initial dimensions and layout of this Matrix.

 I am currently indexing the Million Digit file.  That will take some time the way I am doing it so I have time to ready other aspects of the encoder.
 A few posts back I wrote about learning and designing hash functions.  I also wrote about being sidetracked with new maths for me to check out.
 Well it has come full circle and that effort with the hash functions is again the focus. Good thing I did all that work already. This hashing effort will be more fun than work I believe.
 If I can hash some of the indexing then compression is possible.
Wishful thinking suggests storing a byte in 6.5 bits.
 What is really true is until I have the indexed data I cannot test any hashing.
 It won&#039;t be long now but still it will take a while.

I thought to stop in and add to the blog.

Good luck Challenge people!</description>
		<content:encoded><![CDATA[<p>Hey Ashish,</p>
<p> As far as I know it started as a way to silence those claiming fantastic algorithms that do the &#8220;Impossible.&#8221;</p>
<p> In its long life it has been a point of humor for some. A source of ridicule for others.</p>
<p> I may be the only one still working on it.  I seem to be the only one updating the blog somewhat regularly.</p>
<p> Speaking of that.</p>
<p> UPDATE:  </p>
<p>  I have been successful in generating a 16-gigabyte dataset from a 10kbyte program.<br />
 I studied the dataset for a few weeks and have decided on the initial dimensions and layout of this Matrix.</p>
<p> I am currently indexing the Million Digit file.  That will take some time the way I am doing it so I have time to ready other aspects of the encoder.<br />
 A few posts back I wrote about learning and designing hash functions.  I also wrote about being sidetracked with new maths for me to check out.<br />
 Well it has come full circle and that effort with the hash functions is again the focus. Good thing I did all that work already. This hashing effort will be more fun than work I believe.<br />
 If I can hash some of the indexing then compression is possible.<br />
Wishful thinking suggests storing a byte in 6.5 bits.<br />
 What is really true is until I have the indexed data I cannot test any hashing.<br />
 It won&#8217;t be long now but still it will take a while.</p>
<p>I thought to stop in and add to the blog.</p>
<p>Good luck Challenge people!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ashish</title>
		<link>http://marknelson.us/2006/06/20/million-digit-challenge/comment-page-7/#comment-542212</link>
		<dc:creator>Ashish</dc:creator>
		<pubDate>Wed, 16 May 2012 08:30:05 +0000</pubDate>
		<guid isPermaLink="false">/2006/06/20/million-digit-challenge/#comment-542212</guid>
		<description>The prize amount encourages people to try this. In a way, the challenge creator also seems doubtful about the impossibility of this work. Hence he seems not to have taken much risk ;)</description>
		<content:encoded><![CDATA[<p>The prize amount encourages people to try this. In a way, the challenge creator also seems doubtful about the impossibility of this work. Hence he seems not to have taken much risk ;)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ernst</title>
		<link>http://marknelson.us/2006/06/20/million-digit-challenge/comment-page-7/#comment-523607</link>
		<dc:creator>Ernst</dc:creator>
		<pubDate>Sat, 21 Apr 2012 04:22:41 +0000</pubDate>
		<guid isPermaLink="false">/2006/06/20/million-digit-challenge/#comment-523607</guid>
		<description>on April 11th, 2012 at 8:39 am, Tobbi said:

Ernst, here are some useful comments for your work on challenge

&gt;&gt; 617 and 673 is 415241

617 and 673 is 1290, not 415241

&gt;&gt; ... but since we are all interested in prime pairs ...

log(a * b) = log(a) + log(b)

Let million-digit-file = x * y, then
log(x) + log(y) = log(x * y) = log(mdf) =~ 3320000

&gt;&gt; Hint: Well try recursive modulus.

My hint would be: Well try math!

Pardon, Tobbi, those are the factors of 415241, as in 617 times 673 = 415241.

 I can understand your confusion and I promise to not try and trick you again.

 Be Well!</description>
		<content:encoded><![CDATA[<p>on April 11th, 2012 at 8:39 am, Tobbi said:</p>
<p>Ernst, here are some useful comments for your work on challenge</p>
<p>&gt;&gt; 617 and 673 is 415241</p>
<p>617 and 673 is 1290, not 415241</p>
<p>&gt;&gt; &#8230; but since we are all interested in prime pairs &#8230;</p>
<p>log(a * b) = log(a) + log(b)</p>
<p>Let million-digit-file = x * y, then<br />
log(x) + log(y) = log(x * y) = log(mdf) =~ 3320000</p>
<p>&gt;&gt; Hint: Well try recursive modulus.</p>
<p>My hint would be: Well try math!</p>
<p>Pardon, Tobbi, those are the factors of 415241, as in 617 times 673 = 415241.</p>
<p> I can understand your confusion and I promise to not try and trick you again.</p>
<p> Be Well!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ernst</title>
		<link>http://marknelson.us/2006/06/20/million-digit-challenge/comment-page-7/#comment-523606</link>
		<dc:creator>Ernst</dc:creator>
		<pubDate>Sat, 21 Apr 2012 04:17:43 +0000</pubDate>
		<guid isPermaLink="false">/2006/06/20/million-digit-challenge/#comment-523606</guid>
		<description>Hey Milcho!

 I see you found the challenge interesting!

 cool!  


 But hiding data is not in the spirit of the challenge for sure!

 Give it some time and maybe you will find a unique solution!

 Update :  

            I&#039;m working on bit-pattern matching using a constructed data-set.  Also I am learning about OpenMP and parallel processing along with this effort.
 So win or lose there are benefits for trying.
 

Ernst</description>
		<content:encoded><![CDATA[<p>Hey Milcho!</p>
<p> I see you found the challenge interesting!</p>
<p> cool!  </p>
<p> But hiding data is not in the spirit of the challenge for sure!</p>
<p> Give it some time and maybe you will find a unique solution!</p>
<p> Update :  </p>
<p>            I&#8217;m working on bit-pattern matching using a constructed data-set.  Also I am learning about OpenMP and parallel processing along with this effort.<br />
 So win or lose there are benefits for trying.</p>
<p>Ernst</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Milcho</title>
		<link>http://marknelson.us/2006/06/20/million-digit-challenge/comment-page-7/#comment-518803</link>
		<dc:creator>Milcho</dc:creator>
		<pubDate>Fri, 13 Apr 2012 20:08:42 +0000</pubDate>
		<guid isPermaLink="false">/2006/06/20/million-digit-challenge/#comment-518803</guid>
		<description>@Mark

It makes sense, the rules of the game are to actually compress the data, not shuffle it around to unseen places. It took me all of one hour to write the code of what I posted above, and that&#039;s just the simplest way to use the filesystem to hide data.

It&#039;s amazing how recent some of these claims for incredible compression are, and that there&#039;s still people who buy into it. Oh well, they may be wasting their time, but it&#039;s at least slightly amusing to read about it.</description>
		<content:encoded><![CDATA[<p>@Mark</p>
<p>It makes sense, the rules of the game are to actually compress the data, not shuffle it around to unseen places. It took me all of one hour to write the code of what I posted above, and that&#8217;s just the simplest way to use the filesystem to hide data.</p>
<p>It&#8217;s amazing how recent some of these claims for incredible compression are, and that there&#8217;s still people who buy into it. Oh well, they may be wasting their time, but it&#8217;s at least slightly amusing to read about it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Nelson</title>
		<link>http://marknelson.us/2006/06/20/million-digit-challenge/comment-page-7/#comment-518766</link>
		<dc:creator>Mark Nelson</dc:creator>
		<pubDate>Fri, 13 Apr 2012 19:17:51 +0000</pubDate>
		<guid isPermaLink="false">/2006/06/20/million-digit-challenge/#comment-518766</guid>
		<description>@Milcho:

One of the reasons that I have the rather vague comment at the end is because I don&#039;t want to get in a never-ending contest with people who find clever ways to hide their extra data. I can&#039;t anticipate every possible way to do this in advance, and I don&#039;t really have any interest in doing so.

For example, let&#039;s say I limit things to one file. Well, someone will come along and create a directory structure with 400,000 entries, and just one file.

Then I say no, it all has to go in a single directory. Then they create a file name with 400,000 characters in the name.

Really, who cares? This is not interesting and I don&#039;t want to deal with it.

And so far, this has worked okay, I haven&#039;t really had anyone suggest that they have met the challenge while staying within the rules.

- Mark</description>
		<content:encoded><![CDATA[<p>@Milcho:</p>
<p>One of the reasons that I have the rather vague comment at the end is because I don&#8217;t want to get in a never-ending contest with people who find clever ways to hide their extra data. I can&#8217;t anticipate every possible way to do this in advance, and I don&#8217;t really have any interest in doing so.</p>
<p>For example, let&#8217;s say I limit things to one file. Well, someone will come along and create a directory structure with 400,000 entries, and just one file.</p>
<p>Then I say no, it all has to go in a single directory. Then they create a file name with 400,000 characters in the name.</p>
<p>Really, who cares? This is not interesting and I don&#8217;t want to deal with it.</p>
<p>And so far, this has worked okay, I haven&#8217;t really had anyone suggest that they have met the challenge while staying within the rules.</p>
<p>- Mark</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Milcho</title>
		<link>http://marknelson.us/2006/06/20/million-digit-challenge/comment-page-7/#comment-518127</link>
		<dc:creator>Milcho</dc:creator>
		<pubDate>Thu, 12 Apr 2012 23:21:13 +0000</pubDate>
		<guid isPermaLink="false">/2006/06/20/million-digit-challenge/#comment-518127</guid>
		<description>I&#039;m curious why multiple files are allowed, I think that&#039;s a loophole.

From what I read, storing data in the filenames is not allowed. I read that someone &quot;beat&quot; a $5000 compression challenge this way before, so I can see what the rule is trying to avoid. (Notice &quot;beat&quot; in quotes.)

The only loophole I see is this: would enumerating files be considered storing data in them? I mean something like: 001.file, 002.file, 003.file, 004.file etc.

Because just the sheer presence of these files provides information that otherwise does not exist. For example if a certain bit pattern that repeats in the original data is found, then you could take that bit pattern out. Normally, you&#039;d have to store at which locations you took the pattern out of, so it can be re-inserted, and this would quickly add up to more info than you saved.

But, if instead you split the file each time you encountered this bit pattern and stored these pieces of the original file in enumerated sequential files, you wouldn&#039;t have to store the information of where to re-insert the bit pattern. The concatenation would simply be 001.file + bit pattern + 002.file + bit pattern + ... etc. 

The question is only to find such a bit pattern. This isn&#039;t difficult:
The file is 415,241 bytes. A single byte can have 256 separate states (meaning there are 256 unique bytes)

By the pigeonhole principle then, just under 415,000 of those bytes have to be a repeat of a previous byte that&#039;s already been seen.
In one extreme, each one of the 256 bytes would be repeated about 1,600 times, and in the other extreme, 1 of these bytes would be repeated the whole 415,241 times.
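
A quick check of those pigeonhole figures, as a minimal sketch (the constant names are made up for illustration; the sizes are the ones quoted above):

```python
import math

FILE_SIZE = 415_241      # bytes in the challenge file, as stated above
DISTINCT_VALUES = 256    # possible values of a single byte

# At most 256 bytes can be first occurrences; every other byte repeats an earlier one.
guaranteed_repeats = FILE_SIZE - DISTINCT_VALUES

# With a perfectly even spread, the most common byte still occurs
# at least ceil(415241 / 256) times.
min_count_of_most_common = math.ceil(FILE_SIZE / DISTINCT_VALUES)

print(guaranteed_repeats)        # 414985
print(min_count_of_most_common)  # 1623
```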

Since the data wouldn&#039;t be very random if only one byte repeated, the case here is probably closer to the first one, where the most any single byte is repeated is ~1,600 times. 

So, we find this most-often repeating byte, and split the data each time we encounter it, storing each new chunk in an enumerated file. 

The &quot;decompressor&quot; should simply store the value of this byte, and then concatenate these files together, each time inserting this byte between two files. 

So, basically, just by having enumerated files, we should save about 1.6 KB (in the worst case scenario). The source code for the concatenation program could potentially be written in under 1.6 KB - it is rather simple. 
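
For illustration only, the split-and-rejoin idea could be sketched roughly as follows. This is not a tested entry for the challenge: the function names are hypothetical, the chunks stay in memory instead of being written out as 001.file, 002.file and so on, and the space taken by file names and directory entries is not counted.

```python
from collections import Counter

def split_on_most_common_byte(data):
    """Return (marker, chunks): the most frequent byte value and the pieces between its occurrences."""
    if not data:
        raise ValueError("need at least one byte of input")
    marker = Counter(data).most_common(1)[0][0]
    # Each chunk would become one enumerated file; the marker byte itself is dropped.
    chunks = data.split(bytes([marker]))
    return marker, chunks

def rejoin(marker, chunks):
    """Rebuild the original data by putting the marker byte back between consecutive chunks."""
    return bytes([marker]).join(chunks)

sample = bytes([1, 2, 1, 3, 1, 1, 4])
marker, chunks = split_on_most_common_byte(sample)
bytes_dropped = len(sample) - sum(len(chunk) for chunk in chunks)
print(marker, len(chunks), bytes_dropped)
print(rejoin(marker, chunks) == sample)
```

The number of bytes dropped equals the number of marker occurrences, which is also the number of file boundaries, so the information has only moved into the file count and ordering.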

This process can also be repeated with nibbles (4 bits) and 2-bit patterns that repeat - there should be even more of those repeating.

But, even if this works, this is NOT quite in the spirit of the challenge, and isn&#039;t compressing the data, so much as just hiding the information elsewhere.

To be clear, I am indeed a programmer by profession, but I have NOT programmed this. I found this article today at work and the idea occurred to me when I read some of the comments later. 

I am NOT CLAIMING to be able to compress this (or any other) random data - only that there&#039;s a potential loophole in the rules (if Mark allows enumerated files). I am also well aware of the quite solid proof of the counting argument of why no method can compress all files.

If I do have some free time, I&#039;ll try to write a program to run this sort of data-hiding scheme, to see if it would really end up being smaller.</description>
		<content:encoded><![CDATA[<p>I&#8217;m curious why multiple files are allowed, I think that&#8217;s a loophole.</p>
<p>From what I read, storing data in the filenames is not allowed. I read that someone &#8220;beat&#8221; a $5000 compression challenge this way before, so I can see what the rule is trying to avoid. (Notice &#8220;beat&#8221; in quotes.)</p>
<p>The only loophole I see is this: would enumerating files be considered storing data in them? I mean something like: 001.file, 002.file, 003.file, 004.file etc.</p>
<p>Because just the sheer presence of these files provides information that otherwise does not exist. For example if a certain bit pattern that repeats in the original data is found, then you could take that bit pattern out. Normally, you&#8217;d have to store at which locations you took the pattern out of, so it can be re-inserted, and this would quickly add up to more info than you saved.</p>
<p>But, if instead you split the file each time you encountered this bit pattern and stored these pieces of the original file in enumerated sequential files, you wouldn&#8217;t have to store the information of where to re-insert the bit pattern. The concatenation would simply be 001.file + bit pattern + 002.file + bit pattern + &#8230; etc. </p>
<p>The question is only to find such a bit pattern. This isn&#8217;t difficult:<br />
The file is 415,241 bytes. A single byte can have 256 separate states (meaning there are 256 unique bytes)</p>
<p>By the pigeonhole principle then, just under 415,000 of those bytes have to be a repeat of a previous byte that&#8217;s already been seen.<br />
In one extreme, each one of the 256 bytes would be repeated about 1,600 times, and in the other extreme, 1 of these bytes would be repeated the whole 415,241 times.</p>
<p>Since the data wouldn&#8217;t be very random if only one byte repeated, the case here is probably closer to the first one, where the most any single byte is repeated is ~1,600 times. </p>
<p>So, we find this most-often repeating byte, and split the data each time we encounter it, storing each new chunk in an enumerated file. </p>
<p>The &#8220;decompressor&#8221; should simply store the value of this byte, and then concatenate these files together, each time inserting this byte between two files. </p>
<p>So, basically, just by having enumerated files, we should save about 1.6 KB (in the worst case scenario). The source code for the concatenation program could potentially be written in under 1.6 KB &#8211; it is rather simple. </p>
<p>This process can also be repeated with nibbles (4 bits) and 2-bit patterns that repeat &#8211; there should be even more of those repeating.</p>
<p>But, even if this works, this is NOT quite in the spirit of the challenge, and isn&#8217;t compressing the data, so much as just hiding the information elsewhere.</p>
<p>To be clear, I am indeed a programmer by profession, but I have NOT programmed this. I found this article today at work and the idea occurred to me when I read some of the comments later. </p>
<p>I am NOT CLAIMING to be able to compress this (or any other) random data &#8211; only that there&#8217;s a potential loophole in the rules (if Mark allows enumerated files). I am also well aware of the quite solid proof of the counting argument of why no method can compress all files.</p>
<p>If I do have some free time, I&#8217;ll try to write a program to run this sort of data-hiding scheme, to see if it would really end up being smaller.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tobbi</title>
		<link>http://marknelson.us/2006/06/20/million-digit-challenge/comment-page-7/#comment-517146</link>
		<dc:creator>Tobbi</dc:creator>
		<pubDate>Wed, 11 Apr 2012 14:39:50 +0000</pubDate>
		<guid isPermaLink="false">/2006/06/20/million-digit-challenge/#comment-517146</guid>
		<description>Ernst, here are some useful comments for your work on challenge

&gt;&gt; 617 and 673 is 415241

617 and 673 is 1290, not 415241

&gt;&gt; ... but since we are all interested in prime pairs ...

log(a * b) = log(a) + log(b)

Let million-digit-file = x * y, then 
log(x) + log(y) = log(x * y) = log(mdf) =~ 3320000
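
A small numeric check of the identity, as a sketch that assumes log means base 2 (the ~3320000 figure suggests so) and uses the 415241-byte file size mentioned in this thread:

```python
import math

# Factor pair quoted in the thread: their product is the file size in bytes.
a, b = 617, 673
product = a * b
print(product)  # 415241

# log(a * b) = log(a) + log(b), checked numerically in base 2.
lhs = math.log2(product)
rhs = math.log2(a) + math.log2(b)
print(math.isclose(lhs, rhs))  # True

# A file of 415241 bytes holds 415241 * 8 bits, so read as one big integer
# its base-2 logarithm cannot exceed that bit count.
file_bits = product * 8
print(file_bits)  # 3321928
```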

&gt;&gt; Hint: Well try recursive modulus.

My hint would be: Well try math!</description>
		<content:encoded><![CDATA[<p>Ernst, here are some useful comments for your work on challenge</p>
<p>&gt;&gt; 617 and 673 is 415241</p>
<p>617 and 673 is 1290, not 415241</p>
<p>&gt;&gt; &#8230; but since we are all interested in prime pairs &#8230;</p>
<p>log(a * b) = log(a) + log(b)</p>
<p>Let million-digit-file = x * y, then<br />
log(x) + log(y) = log(x * y) = log(mdf) =~ 3320000</p>
<p>&gt;&gt; Hint: Well try recursive modulus.</p>
<p>My hint would be: Well try math!</p>
]]></content:encoded>
	</item>
</channel>
</rss>

