
Ten years ago I issued a simple challenge to the compression community: reduce the size of roughly half a megabyte of random data – by as little as one byte – and be the first to actually have a legitimate claim to this accomplishment.
Ten years later, my challenge is still unmet. After making a small cake and blowing out the candles, I thought this would be a good time to revisit this most venerable quest for coders who think outside the box.
Some History
In George Dyson’s great book on the early history of electronic computers, Turing’s Cathedral, he describes how much of the impetus for computation in the 40′s and 50′s was from the US military’s urge to design better fission and fusion bombs. A powerful technique used in this design work was the Monte Carlo method, which relied on streams of random numbers to drive simulations.
The problem then, as now, is that coming up with random numbers is not always an easy task. John von Neumann was intimately involved in all this, and is famously quoted as having said:
any one who considers arithmetical methods of producing random digits is, of course, in a state of sin.
The book describes how the RAND corporation, flush with bomb money, took on the task of using physical processes to generate a handy bucket of random numbers, eventually published as A Million Random Digits with 100,000 Normal Deviates.
Since my tax dollars paid for those numbers, I thought it only fair that I make it the basis of my challenge. I took the decimal digits, converted them to a base two number, stored it in AMillionRandomDigits.bin, and challenged the world to compress it. (The original USENET post was followed with a more findable blog posting that contains a bit more detail.)
Ten years later, there have been no serious entrants, although there are a few dedicated souls who are continuing to attack the problem. Unfortunately for all who are working on it, it seems that those RAND scientists back in the 50′s did a really, really good job of scrubbing those numbers.
The 2012 Edition
For a few different reasons, it is time to close down the older versions of the contest and issue an updated 2012 challenge. Nothing much has changed, but over the years I’ve bumped into a few points of confusion that I can clear up, and in addition, I would like to slightly widen the contest’s scope. And most importantly, the comments section of the previous contest is way out of hand, and issuing an update lets me close that stream down and open a new one.
For the 2012 edition, I am actually giving entrants two possible ways to win the prize. The first challenge is essentially a reprise of the original, with minor tweaks and updates. The second poses a more difficult problem that is a superset of the first.
Meeting either challenge brings you worldwide fame, a cash prize of $100, and the knowledge that you have defeated a problem that many said was untouchable.
Likewise, both problems are governed by one meta-rule which seeks to implement an overarching principle: the point of this is to win algorithmically, not to game the contest. I will disqualify any entry that wins through means such as hiding data in filenames, environment variables, kernel buffers, or whatever. Someone can always find a way to win with monkey business like this, but that is beside the point of the contest. No hiding data.
Challenge Version 1 – A Kolmogorov Compressor
The original version of the challenge is basically unchanged. Your goal is to find the shortest program possible that will produce the million random digit file. In other words, demonstrate that its Kolmogorov complexity is less than its size. So the heart of Challenge 1 is the question of whether the file is compressible à la Kolmogorov and standard, general purpose computing machines.
The interesting part about this challenge is that it is only very likely impossible. Turing, and Gödel before him, made sure that we can’t state with any certainty that there is no program of size less than 415,241 bytes that will produce the file. All it takes is a lucky strike. Maybe the digits are a prime? Maybe they just happen to be nested in the expansion of some transcendental number? Or better yet, maybe the RANDians overlooked some redundancy, hidden somewhere in a fifth order curve, just waiting to be fit. There are no telling how many different ways you could hit the jackpot.
However, the dismal logic of The Counting Argument tells us that there are always going to be some files of size 415,241 bytes that are not compressible by the rules of Challenge 1. And of course, it’s actually much worse than that – when you cast a critical eye on the task, it turns out that nearly all files are incompressible. But for a given file, we don’t really have any way of proving incompressibility.
Rulings
I want everyone to have the best chance possible to win this challenge. The basic rule is that your program file, possibly combined with a data file, must be less than 415,241 bytes. Clarifications on questions that have come up in the past included:
- Programs written in C or some other compiled language can be measured by the length of their source, not their compiled product.
- Using external programs to strip comments, rename variables, etc. is all fine. It would be nice to have the unmangled source available as your input, with the mangling step part of the submission process.
- Programs that link to standard libraries included with the language don’t have to include the length of those libraries against their total. Hiding data in standard libraries is of course not allowed. (And don’t even think of hiding it in the kernel!)
- Source code can be submitted in a compressed container of your choice, and I will only count the bytes used in the container against you.
- Likewise, any data files can be submitted in a compressed container, and I will only count the bytes used in the container against you.
- You own the code, and just because you win, I don’t have the right to publish it. If you insist on an NDA I may be willing to comply.
- In general, you need to submit source code that I can build and execute in a relatively standard VM. If you are paranoid and insist on binaries only, we might be able to come to terms, but no guarantees.
- The nature of this contest is such that gaming the rules is pointless. You aren’t entering in a quest to beat the rules, you are entering in a quest to beat the data.
- Your program might take a long, long time to run, but we will have to draw the line somewhere.
If there is anyone who deserves to beat this file, it is Ernst Berg, who has been relentlessly attacking it from various directions for years now. Ernst doesn’t seem to be doing this to feed his ego – he’ll share his results with all comers, and is always willing to listen to someone’s new approach. I consider him to be the unofficial Sergeant-at-Arms of this enterprise.
But Ernst will also be the first to tell you what a harsh mistress the file can be – always taking, but never giving.
Challenge Version 2 – A General Purpose Random Compressor
Challenge 1 is interesting because it is nearly, but not assuredly impossible. Challenge 2 is more along the lines of troll bait, because it is patently impossible: create a system to compress and then decompress any file of size 415,241 bytes. In other words, create a compressed file that is smaller than the input, then use only that compressed file to restore the original data.
Unlike Challenge 1, there are no size limitations on Challenge 2. Your compressor and decompressor can be as large as you like. Because the programs have to be able to handle any input data, size is of no particular advantage – there is no data to hide.
This challenge is for the contestant who is sure that he or she has figured out a way to compress the million digit file, but finds that their program takes 100 MB of space. Okay, that’s fine, we shall first see if it can compress the file. It then must be able to correctly compress and decompress a file of the same size. Let’s say, on 1,000 different files.
To keep it simple, the files will be simple permutations of the million digit file – scrambled with an encryption system, or perhaps a random number generator, or whatever. The point is, they should have all the same numeric characteristics as the original file, just organized slightly differently.
Again, I will emphasize that Challenge 2 is at its heart provably impossible to beat. No program can compress all files of a given size, and the chances of any program being able to compress 1,000 different files of this length is so vanishingly small that it can safely be ruled out, even if every computer on earth was working on it from now until we are engulfed in flames during Sol’s red giant phase.
Conclusion
It seems unlikely that there will be any developments in lossless compression that change the terms of this contest. No doubt I’ll reissue the challenge in another ten years or so, but if it is beatable, the tools are already at hand. Good luck to those of you who are tackling the challenge. Beating this will not get you the fame associated with something like the Clay Millenium Prizes, but in the small world of data compression, you will sit alone on a throne of your own making, and deservedly so.
167 users commented in " The Random Compression Challenge Turns Ten "
Follow-up comment rss or Leave a Trackback> However, the dismal logic of The Counting Argument tells us that there are always going to be some files of size 415,241 bytes that are not compressible by the rules of Challenge 1
From my understanding the counting argument shows it’s clear that for a given compressor, there MUST be some files which are incompressible. Indeed, most, meaning definitely over 50% would be incompressible, since any compression must reduce by at least 1 bit, which cuts the number of possible representations in half.
But I think the counting argument doesn’t say that there’s a set of files which cannot be compressed by any given compressor.
More precisely I don’t think that the statement: “There exists a file F, such that for all possible functions C, size_of(C(F)) >= size_of(F)” is provable only with the counting argument.
Actually I guess the above statement, if true, would basically say that there exists a piece of information for which it’s Kolmogorov complexity is irreducible. I’m not entirely familiar with all the theorems in this field, so this could have already been disproven, or maybe proven, though it doesn’t sound like it could be proven at least.
Anyway, I’m not sure if that’s what you were saying in your post, but that line sort of sounded like that.
Regardless, great post, I especially like the v2 of the challenge, since I’m sure I’ll see many who’ll claim to have solved it :)
@Zen:
>But I think the counting argument doesn’t say that
>there’s a set of files which cannot be compressed
>by any given compressor.
No, you are right about that. I think that it would be fairly easy to disprove that idea.
Kolmogorov doesn’t care what your Turing Machine looks like, so in principle, I could have a machine that has a one byte instruction that then punches out the MRDF to the tape, showing that I have compressed that file down to 1 byte.
It’s sometimes frustrating for people to think that there are no absolute measures of information, but it ends up all being relative to the model.
I think you might be able to get into the area of absolute incompressibility if you start talking about something like Chaitin’s number omega, but it’s a little over my head I think.
- Mark
Just a reminder:
As far as I can currently ascertain, with limited time at hand, I may have solved, comprehensively, this challenge, in theory (I am not a computer programmer, but have worked out an apparent logical answer, that is sufficently highly defined that I have very precise descriptions of it, though it is an extraordinary thing, being very “holistic” in nature.)
Due to the possible commercial value of the large group of inter-related discoveries of which apparently solving this challenge is one, I am limited in what I can say. I can say that I have also apparently solved, or found insights into, each Clay Institute Millenium problem, and the Goldbach Conjecture, the Godel Incompleteness Theorem, and the Turing Halting problem.
What can I say, then? The million digits, however “random” it is claimed to be, has an Archilles heel- it has a limit on its size, and it has a very precise order. There are only so many ways the million digits can be arranged, and only so many higher-level structures it can contain, and for every such higher-level structure, “limits” occur on whatever other such structures can exist.
This is not a problem to necessarily be ‘solved’ by being restricted to “Shannon limit” or “probability” or the ‘counting argument”: the million digits is ALREADY a “counting” (a roll call of the digits you could say..) “argument” (alternating bias)- a varying density of zeros and ones- you don’t solve that with ‘pigeon holes’ – you solve this with un-pigeon holes (multiple possibilities juxtaposed in varying higher dimension ‘spaces’ – like a Sudoku puzzle on ice cold ginger beer- string theory going backwards and forwards (in time)) – you end out with a “solution” that only becomes “real” when all the digits are accounted for.
I found a new potential proof re: defeating Godel’s Incompleteness Theorem: if you consider the 6 quarks as associated with the phenomenon of conservation of measurement, then 6 x 6 gives quarks blur, but 6 x 7 gives quarks that are there at some scale but not at every scale.
That means “units” are defined (so mathematics- which depends on the counting of units) purely by means of a structural system that is fully integrated, and differentiated, with of its own content- it is a universal higher dimension space- one could say everything inside is potentially or partially “at right angles” to everything (else)- except its wholesomeness.
6 x 7 = 42, if you have 42 axioms, then each axiom partially may deconstruct the imaginary units, but in doing so, it must affect other such units in such away that the idea of “axiom” and the notion of “unit” become complementary over the whole thing (which forms a haphazard looking “square” floating as it were.
cheers
Alan
for what it is worth:
looking at Wikipedia, list of maths problems:
the 4th, to construct all metrics where lines are geodesics, appears solved by the apparent system to minimise space to store the million digits.
the 9th, to find most general law of reciprocity theorem in an algebraic number field, appears to be soved by Einstein Relativity
the 1st, to find a set strictly between integers and real numbers, ‘Continuum hypothesis’, appears to be such that it inviolves a floating set, or an actual ‘axiom of choice’ in that the rule is you get to make a choice (I.e. optional elements in sets)
the 6th, axiomise all of physics, is curious, in that if physics is really “imaginary mathematics”, what is there to axiomise?
Re; 8th, Riemann, and twin prime conjecture, e.g. the twin primes 41 and 43; well given the role of 42 (as neither prime nor not-prime (In higher dimensional space or math)), 43 minus 1 is like 42 plus 3 (Or 4 or 5 or?)- ‘priming’ and ‘counting’ become ‘interchangeable’ -so if you can count, you can still find primes, that ‘(as if) ‘count reverse’ ( 2 back as 1 forwards)?
chance that the 10th- re: Diophantine equation- that there is ‘one’ such algorithm, but applied to each scenario (e.g. the million digit challenge) gives a unique ‘algorithm’ for each scenario
@Alan:
Disproving the Continuum Hypothesis merely requires that you create a set with greater cardinality than aleph-0 and less than that of the reals. It’s always nice when all you need is a counter-example.
Proving it to be true is beyond the scope of my understanding.
Be careful though, thinking about transfinite sets may have contributed to Cantor’s mental illness.
- Mark
Cool Update Mark!
Wow what a nice comment about me. It is true that I started knowing nothing and have worked it like a “new boy should.” http://www.youtube.com/watch?v=k4VFFBCa5Aw : Heh :) I spend enough time with it.
My official status is still as it was; the file has been divided into 83,049 40-bit words and is currently being compressed when a match is found in a number generator I wrote. The number of bits to generate the 40-bit word from the Million Digit File by the number generator are fewer than 40-Bits. The fewer is variable but no less than 1 bit.
The Pace is slow with over 100 matches a day and a total number of candidates being Friday, October 12 2012 4385 @ 04:44 AM 5.280015% of 83,049
This number generator will be reviewed and I may wish to improve the number of matches by redesigning but the current reality is that while I have only a possible 5%, not counting duplicate codes for some same elements, I have a possibility to “re-compress” the “compressed” matches so far.
This possible re-compress could demonstrate what we have long thought impossible.
So, not knowing what maths are actually in play here, I want to see if I can do this “recursive-compression” and back it up with recursive-decompression.”
If so then it is time to “show and tell.” I trust after all these years you guys know I am serious.
So Props on the new forum and it’s nice to still be in the running. As always Good Luck Challenge People!
This Challenge is possible to win!
Oh my,
Challenge 2 is not about a Challenge it’s a Safety isn’t it Mark.
I understand the mechanics now. I have intimate insight.
What Challenge 2 is isn’t a new version it is Challenge 1 to a Power-Of.
Generating a set of files based on MDF is easy. I can show 5 million+ files for any file you like of that exact length.
If I am to write a general purpose encoder using what I am exploring now as to satisfy Challenge 2 I would need a team and fantastic resources.
I would believe something can be crafted and it may not be anything I can envision at this point. I have a glimpse of possible maths but not a command of such to reach that goal.
Hell You and all your friends cant even do MDF and you offer Challenge 2.
So I am still a Challenge 1 guy. I simply don’t see Challenge 2 as anything but a Safety for you. That is sad.
Sorry Mark, Joe Biden has influenced me. Here is a Video enjoy!
http://www.youtube.com/watch?feature=endscreen&NR=1&v=w-NshzYK9y0
Mark:
a jigsaw puzzle would appear to be a set whose content satisfies the requirements of the “Continuum hypothesis”….
I don’t know about “transfinite sets”- sounds like “across finity” – also a jigsaw it appears.
re: “Cantor’s “mental illness”"- It is worth noting that this is a metaphorical use of the concept “illness”- conflict, disagreement, dissaproved conduct, and conduct that annoys people, or conduct that “gets under someone’s skin, so-to-speak”, is not illness. Illness is either physical, or metaphorical. It was once claimed that slaves who tried to run away, were somehow not responsible for their personal desires, but that (acording to those who wished to think that they could somehow control them) they were allegedly suffering from an alleged ‘mental illness” called “drapetomania” (that supposedly made them run away!)
Of course, it is a myth, there are no such “illnesses” (except in the minds of people who try to coerce others and try to hide behind a rhetoric of “health”.)
(There is, however, such a thing as disagreement, and annoyance, and puzzlement by one person re: the actions of another).
Curiously, real (physical) illness appears to be all about minding (or person-environment communication), and so-called “mental illness” is all about people NOT minding (people trying to “bash” others up, usually out of fear of (re) discovering how much freedom there really is (both for them, and for others) in reality (which tends to jar with their own (often parent/society “programmed” (or influenced) subservience to so-called authority…?)
(For more information on fake medicine (i.e. what is known as “psychiatry” see http://www.szasz.com)
I must apologize;
I became frustrated whit what I see as the need for fantastic resources, ( computer power, storage and talented people assisting me ) over the Newer Second Challenge.
I was angry because I can see a possibility that a general compressor can be crafted; at-least in theory from my point of view. I do not see how I could accomplish such even with my nice 8-Core AMD and 32 GB of ram.
————–
So good luck anyone who attempts the second challenge. I want to but I don’t see how I can with my resources. Maybe that will change.
As to it being possible? We still don’t know what Dark-Matter and Dark-Energy are so can there be things we don’t know? Absolutely! Infinite compression could be one and the maths for it too.
I think there may be an answer to challenge 2, it is an integrated _________________ (censored).
Note: the data is ALREADY “compressed and decompressed” (I.e there will presumably be “wide bits” (lots of spread re: the zeroes and (the) ones, and “narrow bits” (bunching of zeroes (e.g. 0000 and ones (eg 1111)). (So to both compress and decompress it you need something called a “virtual computer”.
Re: “dark matter” and “dark energy”: I have a theory re: what these are- “dark matter” is “conserved disturbance” and “dark energy” is “conserved alternatives” (both relevant to the Million Digit Challenge.
These consepts are possibly from “continually projecting space”; if you look at a flat Atlas page, at the projection of the spherical Earth, Antarctica is typically too large, and India is too small. That is what “dark matter” and “dark energy” may be about. Astronomers are failing to accurately truly differentiate and integrate space (due to the real-time changes in the location of their instruments (how do you know where the planet Earth is, except by looking at the stars (and planets), but how do you know where these reference objects are, except by “triangulating” them from two locations on the Earth- but what about possible BOTH shifting..?
Re: Ernst’s recursive compression: if you partially differentiate hyperspace (two or more connected spaces), you will get the impression that “recursive compression” is possible, but I think you will hit a problem if you try to apply it to ALL the data. There is a way around this……
Mark:
re; “Continuum hypothesis”: I did more research on this.
First, I do not think there is such a thing as “the square root of 2″- think about it!: two quantities that supposedly match, and when juxtaposed alternately (i.e. multiplied, so creating “2 views”) create “2″? It would be less thahn 2, yet more than 2 (a hyperspace ‘object’ (or just plain “hyperspace”).
The so-called “real numbers”: claimed to be on a continuous line, which is divided. My impression is that such a line would tend to form a curve/ or circle. There is no ‘Cardinality” of the “real numbers” (number of elements in the so-called “set”), other than hyperspace quantisation (the error margin (depending on how many divisions you place on the line and where).
The cardinality of the “natural numbers” (counting numbers) is claimed to be Aleph zero (or I could call: a higher dimensional space times three): the non-(“error”) margin: the magnitude (or real-’time’ “size”) of the “set” (the number of elements in the “set” (of natural numbers) is (apparently) the number of “sets” in the (any) “element” (!!!!!)
So : the “set” of (locally) “infinitely” divisable numbers (i.e. cardinality of set of natural numbers = (local) real numbers. (How the natural numbers ‘mesh’ together (like in a jigsaw puzzle))
(The relatedness between the so-called numbers (their role in describing SPACE (or “real-time “objectivity”) (partial differential of time, full differentiating of space (or space within limits in some versions, I could say/ guess))).
The “cardinality of the Natural numbers ” is: the real “numbers” (the number of elements in the set of counting is (apparently) the number of ‘countings’ in the element (ultimate subdivisability) of ‘sets’ (assuming “real numbers” overlap (which they apparently do! )
If error margins and anti’error margins’ converge (on a magical “counting argument” “number”) you have “a transfinite set” (a set of “reals” that is (perfectly) “natural”). A limit on space (and a de-limit on time), or vice versa.
an OBJECTIVE definition of SPACE, as it were….????
Well, This is the last week of work. I cannot say if the last day is tomorrow, Thursday or Friday but it’s one of them.
I look forward to resting and coding.
The first thing is to craft the decoder and database programs. Don’t worry decode works fine I just need to update to fit this particular encoding configuration.
The Goal now is to gather up 1000 unique 40-bit matches and make the new file. 40,000 bits from the MDF.
Then once that is done I will share this file and the positional information of where these 40-bits are taken from MDF with Mark if he wants to bother with sharing it with everyone. Then I’ll attempt to reduce the first reduction again. Once that is successful then another level of compression will be added.
The decoding has to be universal so the same program that decompresses the first must also process the second and third.
This will be interesting to attempt. True I am blindly faithful to a conjecture that it will work with such a small set of data fast.
So the next couple of months should be interesting. The only drawback is I might land another job and then the time I have could be limited.
For those not familiar this “encoding” is not the traditional data compression. This is a number generator that looks for matches in the Million Digit File and can generate the target data with a code that uses fewer bits than the target data word size.
The hope is that all values can be found within the boundaries of those codes. The Counting argument gets a little bent with this approach since multiple systems are used concurrently with each having to obey it’s own counting argument reality. Together they can use the same “codes” and generate different results.
I trust I have shared enough to put many at ease as to the “claims” being made.
So I raise my coffee cup to the last week! Cheers!
I find the challenge interesting and I assume you still remember the $5000 Goldman Challenge. I still believe Patrick won.
Which brings me to this challenge using your data. I think you have to restrict the challenge to files in bytes only since to be fair using byte files you can not make the file 1 bit smaller. It has to be at least a integer multiple of 8.
Since I could map you file bijectively to a computer system where your byte file is actually smaller datawise in number of bits from scratch. Since on a bit file system you have not only the 256 files that the byte file system has for a file which is 1 byte long the bit files also has 128 file that are exactly 7 bits long. These 128 file would have to be longer than 1 byte on the normal file system. And so on for files 1 to 6 bits long.
I don’t believe and one will find a solution using bytes on a normal PC.
However I think there are several general bijective fair transforms that when applied in series to your data will result in a smaller data file. But these are all very long programs and the compression would only be a few bytes so there length alone would make them not in the running for a win. Yet if one can apply a simple series of general bijective transforms not really tuned to the data and still save a few bytes. It should raise some sort of flag than maybe the data file used was not the best since I don’t think for a truly random file (what ever that is) should be able to be compressed more than a few bytes using normal general bijective means. Anyway thats my thoughts on the subject and I belive your money safe.
It would indeed be interesting if you found a transform that saved a few bytes. It could be a start at Challenge 2, but of course it would be hard to
Back in the day some folks on comp.compression did find something like a couple of bytes worth of bias, alhtoughI don’t recall the details.
- Mark
The question is asked: “how many elements in the so-called “set of real numbers”?
What is a “real number”? I am inclined to think a “real number” is just that, a REAL number: i.e. e.g., the number of eggs in a carton (e.g. 12). A number that describes something real.
What distinguishes a real number also then is that it is “a number divisable another way” (e.g. by breaking eggs).
Hence it includes fractions.(?) Its so-called “continuity” (mathematics appears to define “real numbers”
as numbers on a claimed “continuous” line) suggests that such numbers are ‘real space numbers’- i.e. they always involve “some space” e.g. where are the limits in defining/ dividing an egg?
“Natural numbers” are apparently just that- “natural” as in numbers you use to count (with).
What then would a “real-natural” number be (a “hyper-space number”)? When you count, then it refers to something with “span”. There are 27 ways of arranging 3 beans in 3 boxes; the “27″ is I could say, a “transfinite set”, a real-natural number (you get 27 (“counting”), ONLY when you get “an argument” (a two-way bias) (Beans/boxes).
The “Continuum hypothesis” appears to be solved by such a number.
How many elements are there in the set “27 ways of fitting 3 beans in 3 boxes”?
Is the number of elements in this set, this (“trans finite set” ?), strictly between that of the integers and that of the real numbers?
The 27 ways all involve juggling beans with regard to boxes- pluses, and minuses (relative to the boxes).
But they are bigger than that (because each way creates interference in potential other ways) and smaller than that (Because each other way takes plus and minus possibilities from a particular way?). Overall, they are bigger because of “the continuum” (Beans plus boxes)?
So the set is “bigger than the (Localised) integers”?
So “the set of “trans-finite sets” is, by definition (by quantising higher dimension space), bigger than the set of integers? It counts more than just pluses and minuses, when it counts pluses, and minuses ?
The ‘real-ness” of the 27 ways comes from it describing something real, beans, and “boxes”. But in the 27, the beans and boxes finally merge? So become “less than real”- but only when fractionalised. A local limit is formed on how fractionalised the 27 can be. The set is like a dotted line? So lacks the continuity of a number line? So the set of transfinite sets always has less members than the set of real numbers?
But more than the set of integers?
Because the 27 ways of arranging 3 beans in 3 boxes is a real integer? A REAL plus and minus. A fitting together of objects in (a) space.
To solve the claimed million digit challenge, you need to find the hyperspace continuity of the (million) digits. How bunches of digits can form mutually rotating groups in hyperspace. I have apparently found how….. ??????????
Yes, there are bijective transforms. I work with such.
Nothing I know of “compresses” all files.
To encode smaller takes an external reference and that has been bit for bit in the “all files case.”
I cannot dismiss the possibility that there are some set of transforms that will do for Challenge 1.
Great thinking!
Looks like Friday for me…
Most of an email that I tried to send with Samsung’s website but it didn’t send:
Hello,
I am wondering if it is possible for an inventor to interest Samsung with apparent, theoretical, developments in information technology?
I have worked out, to at least a rough extent, many potential (un-tested, unproven)discoveries in information technology, including:
(1) possible breakthroughs in 3-D television (to be more natural)
(2) possible breakthroughs in forming holograms- using natural light
(3) possible breakthroughs in storing computer data in less space
(4) possible breakthroughs in finding hidden patterns in data (how to form “data holograms”, how to ‘rotate’ data and see it from different perspectives, potentially highlighting previously hidden patterns (such as patterns that may be hidden in the time. place. and activity data of seismometers, to potentially produce a “differential strain map” to possibly detect trends associated with eartrhquakes))
(5) possibility of breakthroughs in search engines
(6) possibility of breakthroughs in super-conductivity
(7) apparent insights or breakthroughs in each “Clay Institute Millenium problem in mathematics” (“P vs NP”, “Riemann hypothesis”, “Hodge conjecture”, Poincare Conjecture”, “Yang Mills existence and mass gap”,”Navier-Stokes existence and smoothness”,”Birch and Swinnerton-Dyer conjecture”) , and in: Turing Halting Problem, Goldbach Conjecture, Godel’s Incompleteness Theorem, Continuum hypothesis, and more, also with regards to Physics “Theory Of Everything”.
(8) possible breakthroughs in telecommunications including in: capability of copper, capability of fibre optic (including “wave division multiplexing in passive optical systems”), and many other apparent, un-tested, theoretical new technology ideas
License to use one or more of what I have discovered is something I wish to consider selling to technology companies, e.g. Samsung.
Usually a company signs a Non-Disclosure agreement re: the theoretical potential technology development they may be interested in.
Thank you,
Alan ——- (amateur theoretical physicist/ free-lance inventor)
I had a strange email this morning that Recursive compression has been done. I do see a few articles dated April 27 2012 but no major news such as Dr. Dobbs or other. Oh well.. It’s this if anyone wishes to look at the claim. http://www.prweb.com/releases/prweb2012/4/prweb9445241.htm
———————————–
My Update: A not so bold claim of random data compression but a modest claim of partial compression of the Million Digit File.
I am now off work as of this morning. I am having coffee and will get to coding tonight.
The goal is to extract at least 1000 unique 40-bit words from the MDF that have codes that compress the 40-bits to something like 36 bits I think. So If that 36 bits is the right length that means 4 bits reduction each word. 4000 bits for the subset of 1000 40-bit words. 500 bytes off of 5000 bytes.
So, it’s back On!!
I read the site on Recursive compression and the quote below is on it: from Q10 of the FAQ
START QUOTE
This will never happen, because after a file has gone through a large number of passes, it will first get into a phase of diminishing returns, and eventually reaches the optimum, when the effects of further passes would become negligible. Furthermore, even in the event two files become exactly the same at their optimum by sheer coincidence, the decompressed files will surely be different, because the two files would have gone through different numbers of passes to reach the same optimum.
END QUOTE
This is BS since if two files reach the same so called optimum where is the information stored that tells it how many times to decompress since it appears the number of times to do decompression passes differs for the two original different files that compress to same file.
Hell I can compress any file recursively to ZERO bits. Since it will decompress using my method to every finite file. The Trick is to know how many times to decompress since doing it a different number of times goes to a different file.
I will tell you for free how to do this first 256 decompressions go to every possible one byte file. the next 256*25 decompressions cover all 2 byte files. Then the next 256*256*256 decompression cover all files of 3 bytes and etc.
Recursive compression discoveries are BS. There’s a number of sources that describe the reasons why lossless recursive compression is impossible, as well as why making a general algorithm to compress *ANY* given ‘random’ data is impossible.
The arguments behind this are rock solid, in fact most of them can be covered by the counting argument. (which for some reason people often misunderstand, or just ignore)
That’s why anyone who comes on here saying they’ve solved the challenge by making some general compressor, or claiming to be able to re-compress their output is inevitably going to fail.
The only reason why trying to solve this challenge isn’t actually a pointless effort is because there’s a tiny, tiny chance that someone might find and exploit some incredibly well hidden pattern in the million digits, entirely specific to this file only, by which they can save some space. That’s the only approach that can win this challenge.
Well given something external to the source such as a number generator, there is a different realm of probability in my opinion.
I have a subset of the MDF and I will be looking for encoding for those integers that already encode the 40-bit word to say 36 bits.
So recursive compression may be possible at least in a limited sense.
So do we know everything ? Hell no..
Let us realize that we are part of “creation” and as such we observe not define.
So, that we dream of other worlds of reality my not be so “Crank” it may well be part of the multi-verse in principle.
Still if you claim it defend it.
Quoting Zen:
“Recursive compression discoveries are BS. There’s a number of sources that describe the reasons why lossless recursive compression is impossible, as well as why making a general algorithm to compress *ANY* given ‘random’ data is impossible.”
reply: the current secret to (apparently) solving the million digit challenge: via this method the _______________________________________________________________________________________________________
_____________________________________________________________________________________________________________________________________________________________________________________
The “compression” strangely enough, may actually be recursive IN HINDSIGHT i.e. AFTER it has been done like a constantly changing jigsaw that only fits at the very last (Or near last) piece
Quoting Zen:
“The arguments behind this are rock solid, in fact most of them can be covered by the counting argument. (which for some reason people often misunderstand, or just ignore)”
counting arguments can be ____________________________________ ____________________________________________________________________________________________
Quoting Zen:
“That’s why anyone who comes on here saying they’ve solved the challenge by making some general compressor, or claiming to be able to re-compress their output is inevitably going to fail.”
reply:
Any “recompression of so-called “output”" is hidden by the apparent solving technique, with all data and its arrangement non-set until the system has completed its activities (Like a Sudoku puzzle- it may be)
What is “output” and what is not “output” may be “undefined” exactly till the end of the higher-domensional processing technique…
Quoting Zen:
“The only reason why trying to solve this challenge isn’t actually a pointless effort is because there’s a tiny, tiny chance that someone might find and exploit some incredibly well hidden pattern in the million digits, entirely specific to this file only, by which they can save some space. That’s the only approach that can win this challenge
”
It can be said that may be all data contains patterns.
(Due to commercial sensitivity sections of this reply have been left blank)
A hypothetical but provable solution exists. Hypothetical only in that it cannot be done in a reasonable amount of time. It is based on a *working* prototype of a streaming OTP data encryption solution I developed.
Without giving too many details, the encryption program begins with an initial key k and plain text p. Encrypting p produces a cipher text c which is precisely 6 times the size of p, which results in key k1. Given the initial key k, one can stream decrypt (“compress”) c down to exactly 1/6th the size.
The statistical analysis ran by the diehard program of the encryption output shows the encryption method produces a cipher text c which is PRECISELY identical to any random data.
Meaning, given this encryption program, for every file of size n > 36 bytes, regardless of contents, there exists an initial key k of some size < 10,242 bytes which can stream decrypt ("compress") that file to ~ n/6 bytes if we view that file as a cipher text c.
IOW, given the random digits file of size 415,241 bytes, we can compress it, given an initial key k down to just 69,207 bytes. Resulting in key k1. Not by any means regressive compression, but a HELLOFALOT of compression nonetheless. The problem is finding the initial key k which is a very specific key that belongs to that file and that file only – OTP. That simply cannot be done in a reasonable amount of time (otherwise my encryption would be broken!) – BUT nonetheless mathematics has "an infinite amount of time" to work with… So one could argue that it is possible to compress any random data, however it is extremely unlikely that you would ever be able to find an initial key k which belongs to that file, to the tune of a keyspace of 256^10242! But there exists only ONE k and that's all that matters…
Oh, I forgot to mention, given an initial key k, the resulting cipher text c – even given the same exact input repeatedly, will result in a unique output every time. Meaning, the initial key k belongs to many different files of many different sizes. Pigeon hole, Key k combined with c, must produce k1.
@krishields:
There’s a problem with your method, in that it’s not only going to be very slow, but is most likely not going to work at all. Let me explain:
What you’re saying basically, is that you have a function E(encryption is just a type of function) that works like this (using your terminology where p = original data, k = key, and c = encrypted data)
E(p, k) = c
Now, c is 6 times the size of p, and from what you’ve said, k won’t exceed 10,242 bytes.
It’s safe to assume then, say for files p where the size of p is over 10,242 bytes, that
size_of(p) + size_of(k) < size_of(c) – where size_of(X) is the number of bits in X.
In fact, if you are going to claim to compress, this MUST be true.
Note that the MRD file is well over 10,242 bytes.
Your function then, has 2^(size_of(p) + size_of(k)) number of unique inputs. (remember, size_of is just the number of bits)
Given this, in the best case scenario, your function can then produce 2^(size_of(p) + size_of(k)) unique outputs.
However, the total number of possible bit-strings with the same number of bits as c is actually 2^(size_of(c))
Since we established that size_of(p) + size_of(k) < size_of(c), that means that 2^(size_of(p) + size_of(k)) < 2^(size_of(c))
What that means is that your encryption function will only be able to produce a tiny fraction of the actual possible strings of length size_of(c). In fact, each bit saved, halves the number of possible outputs – so if size_of(p) + size_of(k) is just 1 byte (8 bits) smaller than size_of(c), you'll only be able to produce 1/256th of all the possible strings with length size_of(c)
Why is this important? Because if you want to start with some random string c, and try to find a matching p and k so that E(p,k) = c, then you are very likely to find that no possible combination of p and k actually produces the string c, since all possible combinations of p and k only produce a fraction of the possible strings c.
This whole problem that I described is just a result of the counting argument, and in fact, your approach is commonly referred to as the "magic function theory", covered here:
http://www.faqs.org/faqs/compression-faq/part1/
in [9] Compression of random data
and more specifically here:
http://www.dogma.net/markn/FAQ.html#Q19
To everyone posting here, the first link to the compression FAQ should be something you've read and fully understood.
Zen:
apparently:
the so-called “counting argument” produces “area”, also “synthetic electro-magnetism”;
it results in “invisibility” (the million digits are not seen)
the secret “no-counting argument”, produces “volume”, “synthetic gravity” ;
it results in “visibility”, (the million digits are seen)
I have other names for “electro-magnetism” and “gravity”…
Your aproach appears to be based in (and limited via) statistics…
My approach involves something different
@Zen
I understand what you are saying.
BUT the common mistake is thinking that k must be static. The counting argument does not apply here. It isn’t, at all. It is modified every step of the way – for every 6 bytes, a new k is produced using information it gathers from those 6 bytes. Meaning, the program will use 69,207 very unique k. This means, if we stored the key at every interval in one big file, we have created a library that is 708,818,094 bytes long and stored it in a file that is just 10,242 bytes long and then combined it with a file that is 69,207 bytes long to produce a file that is 415,242 bytes long.
Put it this way, the encryption actually RELY’S on the injection of random data in c to be fully successful – and actually requires the use of that random data to decrypt the file. Think about that one for a moment… How is this possible?
I know it sounds like a load of crap, but it does work that way. I mean, how in hell can a guy look at random data in a file that was not produced from the RNG on his machine and actually use that to decrypt that file? Sounds Bizarre, but that’s the point. The problem IS bizarre, and requires a bizarre solution.
What this is saying is, a plain text such as this message, must also be the cipher text to some other plain text 1/6 the size of it. Weird, but true.
Again, the idea to inverse of my encryption to use it as a compressor is completely infeasible hence the strength of the encryption given a keyspace of 256^10242 but it is a working solution given an infinite amount of time to work on the problem…
@krishields
>The counting argument does not apply here.
I can’t tell you how many times I’ve seen people say this, as though they’ve somehow worked around the counting argument. Except that no one can work around it, because it’s just a very basic mathematical property of any sort of data.
It doesn’t matter if you use multiple keys. The counting argument applies to any type of data manipulation, because it only concerns the size of the data, not what method you use.
The number of possible files of a certain length is only dependant on the length of the data. Because of this, you simply cannot have enough combinations of smaller size data to represent ALL possible larger-size data.
It doesn’t matter what algorithm you use or what sort of trick you employ – if you take write any program that tries to take some file and outputs a smaller file, you will find that you can’t output a unique smaller file for every larger file, simply because there’s far less unique smaller files than there are larger files!
Please read the counting argument again! And again! It’s universal, it’s basic math, and you can’t bypass it!
@Zen
But you can because you are not concerned with producing the entire scope of the file, “in one interval”. You are only concerned about producing a tiny portion of it at any given interval….
While I cannot fit 15 pigeons into 16 holes all “at once”, I can stream them so that at some interval within the operation of the program all pigeons have gotten a chance to sit in one of them “for a moment”.
So I make for myself, another box to contain the pigeons that are waiting to sit in one of the holes.
Given a block size of 6 bytes, I am only concerned with being able to produce 256^6 possible combinations during that interval, and 256^6 possible combinations during the next and so on until we have enough intervals to cover the length of the file. Our box which contains the pigeons that are waiting to be sat is the most important aspect – we can re-arrange that box however we like until we find the pigeon that we want and then record its location.
If the algorithm is based in causation, the output given identical p and identical initial k are in fact, going to be unique and resulting k1 will also be unique, generally speaking anyways – notwithstanding the statistical “chance” of reproducing the same end result, even given different input.
Kris
Wait, that should read, “16 pigeons in 15 holes”.
@Kris:
>But you can …
Nope.
I was going to try to explain this again, but if you haven’t understood what the issue is, another explanation from me isn’t going to change that. I also think this is not the right place to have this discussion.
The problem is that to show the counter-argument to what Zen is saying is difficult to do while maintaining commercial secrecy.
There exists what I could call “the no-counting argument”.
It is too sensitive to say here.
Some comments though:
Zen claims that “The number of possible files of a certain length is only dependant on the length of the data. Because of this, you simply cannot have enough combinations of smaller size data to represent ALL possible larger-size data.”
The problems here: “possible” files, the word “possible” implies a statistical perspective. What about the number of “impossible” files? You are lucky to get this big a clue.
And re: “length” of the data, well, who is worried about
“length”?
“Cannot have enough combinations”… who needs “combinations”? Who said anything about representing “ALL”, “possible”, “larger-size” data?
These are statistics-based perspectives…
With hyperspace triangulation, you do not need to worry about “statistics” (and associated (relative) limitations)
You do not need to represent ALL possible combinations of “larger size” data with “smaller size” data.
You just need to find how, and in what way, the data is “like a blob of something sticking together like clay”.
It’s “large size” IS its “small size”, because it forms a blob- it has volume. It is no longer strung out.
@Zen
On the contrary, I do understand what the problem is.
Give me 8 bytes. If I analyze the file one byte at a time, I give you 256^8 possible combinations or files. I have to be able to come up with that.
Say I set my block size to 2 bytes. This gives me 4 runs each looking at 2 bytes at a time. Each 2 bytes gives me 256^2 possible combinations or files that I can produce.
(256^2)*(256^2)*(256^2)*(256^2) = (256^8)
I’ve successfully blocked out all possible combinations or files.
Now I have to represent all those in a smaller file, somehow.
But how am going to do that. I am going to represent them in two files. One that remains static, and one that is dynamic. But for this initial explanation, I will generate a static k so you can see the whole dynamic in one frame.
What I will do is set each two byte character in a key file, k. So I have a k that is 65536 bytes long.
We simply string search for our combination in k and assign the appropriate corresponding character, depending on our position in k to it, 0-255. However, this means that we are going to have at most, 4 duplicate characters in p. Thus conforming quite nicely to the counting argument and making it impossible to decompress. How do you account for that?
Simple! Make k *smaller* which brings out the requirement that it be *dynamic* rather than static. Instead of looking at 65536 bytes “at once”, k should only be 512 bytes for each of the 4 blocks so that you only have a single instance of a particular two byte sequence in k per block. That is, for this instance, ‘b’ = ‘ik’. Then shuffle k for the next block and so on so that you can have 4 ‘b’ in the target file, but each time ‘b’ might mean something different, ‘b”b”b”b’ = ‘ik”jh”uy”tr’.
So yes, you CAN get around the counting argument by breaking it into smaller pieces and streaming modifying k. How to stream modify k, determining what block size, k size and initial arrangements of k etc. are necessary to ‘make’ the file is the challenge… Can it be done, given an infinite amount of time, YES. Is it feasible, HELL NO, not in the slightest.
@Alan
Absolutely. We are not interested in producing ALL possible combinations at once. Only the ones which are necessary.
There is some mathematical rule that says, no file of any length can contain within it, all possible combinations for that length of file. So that is a moot point, really. We are only concerned with what is in that file. It matters not what the potential a particular sequence has to be in that file, only WHAT IS. That is, the program should not be responsible for producing WHAT COULD BE in the file, only WHAT IS in the file – for that particular file only.
Computation is bound by the same rules reality are.
Consider, every byte sequence could be a potential “program” in itself. If I had a program that sequentially generated every file of length n, then I have for myself, every program of length n! What those programs could do… who knows.
Hi. Interesting discussion here. Been dabbling with compression myself as a hobby for a couple decades and nice to come across people looking at compressing data from a different point of view, ie., not looking for redundancy or other regular methods. Looks like some of you have been at this challenge for a very long time!
I do have a comment about the post above about the Recursive site claiming the new lossless recursive technology. Looking over his method, the idea of splitting couplets into control/subordinate groups is interesting and certainly will provide some compression with certain data. The main flaw I see right away is with his examples. He didn’t take into account what happens when you have a couplet like 10. 00 01 and 11 will all work as he describes, 00 being the only one that will save 1 bit. Maybe I didn’t think it through enough to see how that would be split so it could be reconstructed on decompress but it doesn’t appear to have a solution the way I see it.
Anyway, just wanted to make a short comment and we’ll read through all these posts and maybe join in on some of these discussions when time permits.
Take care and good luck to you long timers.
Hey,
Been doing a hell of a lot of projects now that I have time.
It is not yet time for me to head off into the warmth of the cave and start working the data and writing code it’s been in the mid to upper 70′s for a while now and this weather is like spring here in Central California.. Obviously climate change but it’s Hawaiian like in my opinion of Hawaiian winters.
Just dropping in to shout out I have not forgot or slipped away.
@ the challenge. I will try to be clear.
This is not a race that we are going to sprint to the finish-line with the gold cup in hand but, I do think we are finding chinks in random-data’s armour.
I agree with the counting argument but what if we only encode 2^(n-1) words ( or less ) of n bit words with a number generator that generates all the words in “a file” with a n minus one bit length codeword?
The file will not contain all 2^n values so are we limited by the counting argument?
I would think that if a codeword can be input and a larger word retrieved then that is (de)compression.
I have over 1000 40 to 36-ish bit examples to work with later on..
Perhaps not the gold cup but it qualifies. Anyone else got anything?
Okay then the faster I get these chores done the sooner I can get to the fun of working on this challenge.
Thank you Kris, brilliant thinking as I was rethinking about what the counting argument may be and it does seem difficult from some perspectives i.e. how could 10 digits stand for 100- wouldn’t several identical-looking ten digits have to stand for different hundreds the counting argument might say?
But one could have non-data in different places so not all 0s and 1s would be data, that would introduce differences as could breaks in how the data is grouped.
But your point is superb- it doesn’t matter if theoretically a stored “compressed” version is one of a relatively limited range of optons to store data compared with all the possible larger versions of the data that could exist- so long as large data plus process becomes less-space required data and that it works.
What is stored is not really data but could be an entanglement between the original data and the process itself and the apparent stored version. (I call my system a dynamic data storage system)
Because the process and the original data are outside the set of possible combinations of the supposed compressed version- the secret is that THE PROCESS ITSELF IS PART OF IT- the process itself “Becomes entangled” with both sets- the original data and the what looks like “end-data”.
One could ask- how does the process “know” that the apparent “stored” “data” is different from what may appear to be similar-looking data? Simple: the “Process” IS NOT ABSOLUTELY FIXED: the process changes- the original data affects the process- there is an interference pattern between the original data and the process- you end out with a “compressed data” that could be seemingly identical to many other “compressed datas”, but the process has altered its patterns- the data is stored in a original-data altered process- hence the data IS ACTUALLY STORED IN HYPER SPACE (It is stored as a large group of sympathetic interference patterns like a hologram (hence: “data holograms”!)
What handles this data is probably not an algorithm in a conventional sense.
It is stored as a “wave” and as a “particle”, it is actually stored “like light” one may even say, it is up and down and sideways (like a wave), and part of something larger (it collides with a detector as “energy packets” (as “bunches of alternative possibiities”)).
One could suppose there is no absolutely fixed algorithm- the orginal data helps “create” the process that “stores” it- so suddenly you have vast numbers of combinations available because a process is time-spread, and time allows a lot to happen.
So tough as it was/ is, the “counting argument” is SHATTERED it appears.
As the data is partly stored “in time” not just “in space”.
All you need to do to defeat the counting argument is store a run time, then identical-looking zeroes and ones could stand for even identical-looking instructions and meta-data, but both would be entirely different!!!!!!!!!!!!!!!!!!!!!!
this of course only (apparently) works because of natural interference in all data (caused, naturally, by what is known in mathematics books as “the birthday paradox”)
You don’t have to worry about storing all combinations of a million digits, just a particular combination. If it works, it works. If you tried to store all combinations, you can store that very easily: “all combinations of a million zeroes and ones” is a sentence that “stores” it, you could say.
The sentence would be longer if you reduced the number of combinations to store- but as the sentence got longer, the number of particular storage systems needed to uniquely recreate each one decreases. At some point an efficient system is reached…
I have, apparently the answer
“All combinations of zeroes and ones” is “predictable”, you could say.
One combination of zeroes and ones is easily stored one may say by birthday-paradox-linked internal interference patterns.
At maximum improbability, it appears any combination can be stored via a particular apparent way.
Possibly, apparently:
all combinations are always stored (In the hyperspace continuum).
My method simply extracts a particular million digits from there
Wow, your post has attracted the crazies with crazy ideas. But in fact challenge 2 is solvable as posed. The following program can compress any file that consists of a bytewise permutation of AMillionRandomDigits.bin by exactly one byte and decompress it to its original contents. It requires AMillionRandomDigits.bin as a reference file (consider it to be part of the program).
In fact it would be possible to achieve a better compression factor because of the fact that the number of permutations of AMillionRandomDigits.bin is considerably smaller than the number of files of length 415241.
Please contact me privately to arrange payment of the $100.
I should have mentioned that the program is written for Python v2 and is run like
./random-compressor [--compress | --decompress] INPUT OUTPUT
where INPUT and OUTPUT are filenames.
I estimated the total possible compression based on lg(number of files of length 415241 / number of permutations of AMillionRandomDigits.bin) and using Stirling's approximation. If I haven't made a mistake, an optimal compressor could compress any file containing a permutation of AMillionRandomDigits.bin by 235 bytes.
@mhagger:
You would certainly win if the only input files were permutations of the million digit file. But of course the wording of the challenge is:
>Create a system to compress and then decompress any file of size 415,241 bytes.
Of course using the word "permutation" to describe sample files I might create is probably a bad idea, because it implies a formal reordering of the bytes, and you jumped on that. Because the problem definition said "any file", you can assume that when I use the verb "permute", I am using the English meaning from dictionary.com:
1. to alter; change.
Not the mathematical definition. But I should change to a more generic term like "scramble". And please, before you decide to engage in a protracted argument about the rules, remember the key point I tried to make:
>The nature of this contest is such that gaming the rules is pointless. You aren’t entering in a quest to beat the rules, you are entering in a quest to beat the data.
However, you certainly win the unstated contest, which is to find a way to win the contest if this were that kind of contest.
- Mark
;-)
I was just kidding about taking your $100.
I was more interested in pointing out how a tiny extra crumb of information can be used to effect a tiny bit of compression. I was initially surprised how little compression is made possible by knowledge of character frequency counts, though in retrospect it is pretty obvious:
N = 415241
Each byte value appears approximately N/256 = 1622 times with a variation of about ±sqrt(1622) = ±40
That fact that byte b appears n_b times therefore gives you (very roughly) lg(80) or 6.3 bits of information. Since we have independent counts for 255 bytes, that gives about 6.3 bits * 255 ≈ 200 bytes of information, which is not far off of the value of 235 that I computed.
The compressed size, 99.94% of the original file size, is nevertheless not very impressive.
@mhagger:
I've seen a lot of people get sucked into traps over the years when they think about compressing based on combinations and permutations. One guy actually had a formula for recoding the million digit file in big blocks, and he could show that he was going to save space. It took both of us a few days to figure out that his problem was evaluating a huge factorial using the online version of Mathematica. It turned out that when it gets to some really huge numbers, it switches from an exact evaluation to some sort of approximation, and even though it was flagging this, neither of us noticed it!
http://marknelson.us/2011/01/09/combinatorial-data-compression/
It is interesting to try to find those tiny crumbs. Every once in a while they will pay off.
- Mark
Suppose the million digits were all zeroes. They could be. Difficult to "compress"?
Suppose the million digits were all ones. They could be. Difficult to compress?
What if the million digits were 500,000 zeroes followed by 500,000 ones? They could be. Difficult to compress?
At what point do they become difficult, one may ask...?
To defeat the "Counting argument", one could say that there must be a way of increasing information content of say 100 digits, so it easily matches that of say 1,000 digits, and there must be information missing in a conventional 1,000 digits.
How can you represent any unique combination of 1,000 digits with say, 100 digits? What is missing in the 1,000 digits that could be included in the 100 digits?
One thing, you could run a list of 100 digits in many different orders, with the list itself ordered in a rotating pattern so that it easily covers a simple ordering of the combinations of 1,000 digits- that would be one way (so you go through a series of differing list orders re: the 100 digits that creates a much longer meta-list to correspond to just a simple listing of combinations of 1,000 digits.)
The question is whether Mark Nelson can be persuaded, without revealing (to him) actually how it is apparently done, that it is logically possible to counter the (so-called) "counting argument" and potentially, theoreticaly, if one knew how (which apparently I do....) solve the million digit challenge.
Regarding the claim that there are not enough combinations available to have a unique 'code' for any pattern of the million digits, well, something the milion digits do not have in their initial state, that a solution can have, is a break into two types of information: so-called 'decoder" and so-called "data". Hence you could have numerous semingly identical lines of zeroes and ones for different versions of the million digits, that are different vbecause that which is "so-called 'decoder"" and that which is "so-called "data"' can vary.
Another part of the lines of zeroes and ones can designate 'run time', and who knows which part of the lines of zeroes and ones refers to that? The computer can store the three roles differently ("Run time". "so-called 'de-coder"", and "so-called "data""), plus the "solution" files can be different lengths. With all these options, is there not enough to cover every combination of the million digits, with less zeroes and ones (If you knew how)(which, apparently, I do...............)
Mark Nelson's challenge: to refute this argument....?
Good work anyway mhagger!
I enjoyed reading the conversation.
Hey Alan.. It makes no sense to me.
I'm having to job hunt and get welfare since my UI benefits are not valid until January. LOL one of life's little twists but I will rejoin you and the challenge soon.
"Life is what happens to you while you are busy making other plans." John Lennon
I do have that potential recursive encoding so that is worth the effort and I will try to improve the number generator and maybe increase the number of matches.
Small Steps.
The problem with the contest is that the random file is to short. Also I feel there is a weakness in your rules you allow 2 files.
One the program file and the other a data file. You sum the program file and the data file to see if they are shorter than the random file.
THAT rule is a mistake. Since and two files will combine to an average length of which is greater than the sum of bytes in the two files see DSC at
http://bijective.dogma.net
and see pairing at
http://mathworld.wolfram.com/PairingFunction.html
To the current problem define a program N+1 bytes long where the first byte is either 0x00 or 0x80 this byte will be XORED with the first byte in the uncompressed file. The rest of data added to this one byte file making it thousands of bytes long is the data of so called random file skipping the first two bytes in the random file but only a few thousand bytes. The other the data file is made up of the rest of the original data file following where the previous stopped off.
The sum of the program plus the data file is exactly one byte less than the so called random file.
To get the first two bytes out of the random file when decompressing with as a two byte number the length of the first fragment as a two byte number then xor the first byte with the first byte of the program. Your done!
So this is shorter than the original file byte by one byte. Well suppose you could code such a program use real code it might be X bytes long but its finite. If you random file is long enough you get the first several byte for free that is the number of bytes that give the length of first fragment. When the random file is long enough you eventually get enough free space to write the program to do the combining.
Of course one should allow 2 files. But instead of adding the lengths of two files to N bytes and say get that should be less than the single file of N bytes. You should add to sum of the 2 files log(N)/(8*log(2)) bytes rounded up to next integer this would make the problem fair.
Or better yet is think to the Randomfile as a single number write how bytes it needs. In this case it the number of bits in the file itself so pretty easy.
Compare that with the number of bits in the program file say its X then look at number of bits in the data file.
Combine those 2 numbers using the pairing function to bet Z at wolfram and get the number of bits used. Require that number to be less than the number of bits in your random file. Which is why in the real game you only look at length of simple program that is run to get file instead of program file plus a data file.
@david:
First, there is certainly no requirement that a program in contest 1 include a data file - that is strictly optional. But I don't see how it causes any problems.
I'm not sure I follow your algorithm though.
- Mark
Gee Mark I was hoping you clear up what I wrote. I proof read it several times but many typos especially the last few lines
but here is an example of what I meant
take any file 2^16 + 2 bytes long assume its random suppose the first two are bytes 0x00 and 0x00 create a file that is
2^15 + 1 bytes long. the first byte is 0x80 and the next bytes are the next 2^16 bytes of file you compressing this is the program file its 2^16 + 1 bytes long the last file is rest of original file which is 2^16 bytes long you have two file that when run create the one file that is 2^16 + 2 bytes long so you save one byte.
Next same file as above but random file has first to byte as 0x80 and 0x00 the answer is the same except know the control byte is 0x00
You look at fist two bytes of the long random file read them as a number if the number is less then 2^15 then you use 0x80 and first word of program if its 2^15 or larger you use 0x00 as first byte of program. Then you tack on to the contorl byte the number of bytes that after the first two bytes. Then the rest of the file is just the bytes left
which is the lat file. When you combine you save one byte.
Extreme example I have a finite file such that the length N is so large log(N)/log(2) is several billion. I can cut the file into 3 pieces. The first short peace is is roughly a billion bytes long that file will be thrown away since it is can be recovered and is less than half the values of log(N)/log(2) so its still a billion bytes long. The next peace is much longer its length when write in bytes is actually the same as what the first fragment is. The last peace is the rest of the file so I have one short piece and two long piece. The there file when you glue then together you get the original file. And the sum of there there lengths is the length of original file. Well I can though the first piece away and since it at least a billion byte replce it by a simple program that is combined with the byte control work and second fragment call that the program its quite large. The data file used with the program is the third fragment. The two pieces the program and the last fragment should easily save you millions of bytes.
That is way you can't have the total of two files compared to a file by its self. You need a handicap based on the log of the length of the two files.
I hope this clearer. Feel free to change any typos you want I am sure there are many.
Okay friends I'm back at it.
It's rough to walk away from a "hot Project" and coming back cold.
The first order of business is to extract all the data and build a codec.
How are you doing Alan, mhagger, Mark and David?
I need to have a file that reports on where the data was taken from the Million Digit File and I need the subset of that data as it's own file.
I hope I warm back up soon.
here are some questions to ponder:
there are many arrangements of zeroes and ones that can be massively 'compressed', even a million digits, e.g. a million zeroes, a million ones, 500,000 of each, etc.
Agree?
Therefore, in storing all possible combinations of zeroes and ones in a million digits, it takes less space than just storing the fully written out combinations.
Agree?
If you allow combinations to interfere, you can transfer savings on 'simple' patterns of zeroes and ones (easily 'compressed') to more complicated patterns?
Hey all.
I have been unable to work on writing code here in a productive way.
I just started to work on the programs I need to extract the data and make a file of a subset (40,000+ bits) of the MDF.
On that issue : I hope my personal life settles down to a focus so I can get into writing code again.
I have a request. I am leaping into attempting to understand DE BRUIJN GRAPHS. I'm clueless about graph theory.
I was looking for URL's or a tutorial along the lines of Graph Theory for Dummies : DE BRUIJN GRAPHS explained or something reasonable along those lines.
Looks very interesting. I would like to understand.
I welcome (simple) tutelage on the topic of "DE BRUIJN GRAPHS and Data Structures explained for dummies" in general.
It looks like a way to index permutations and more.
Thanks.
@ Alan..
How is it going?
The way your post reads I believe segments of some file can be encoded smaller. Knowing what is what is an expense.
I'm taking the lazy approach. I am finding matches in MDF using a number generator that issues codewords of fewer bits than source length.
That's my subset of the MDF I claim to have compressed. I am not encoding where in the file those elements exist.
It would be a challenge to dynamically select what number generator goes best with what subset of the stream.
Like wise enumerating "compressed" segments along with uncompressed segments has an expense too.
I cannot suggest anything at this time.
there is a 'trick' to bypass the expense
if one looks at numbers at a ultra basic level the whole file becomes its own 'number generator', it 'generates the exact million digit file because it operates at the 'hyperspace boundary' where even what numbers are involves a 'limit' on space . (cross-referenced against its own 'uncertainty' 'parameters')
Good thinking re: "dynamically select what number generator........"
Well, this is my 1st post since the new rules... I haven't stepped aside, but I had (and still have) much to do, and didn't have the time to compress.
My number generator finally seemed to be periodic (probably it has a large cycle - several millions - but it is not enough to generate enough different numbers for the file), and that's not good. I think I'll resume to my classical "Puzzle" approach. Or I'll invent some new way to compress. Or if all else fail, I'll try to make a new number generator (I'll read a bit about randomness in general...) At least, it's finally winter (with temperatures under 23ºF), so my computer doesn't need to be stored away in the basement (but it doesn't hurt anyway)...
Someone making progress? Ernst? Anyone?
Same thing here.. Puttering around on the data extraction off and on.
Still aiming at encoding smaller the subset of MDF already compressed.
Recursive compression.
Also I have written to one of the men who wrote a maths proof related to a discovery I made in 2010. I need to get help in understanding that proof in order to see if I have something to add.
I believe at this point I have the Universe of that maths and I assume they have just a piece of it. But, I don't know exactly yet.
I hope he replies.
Wouldn't it be cool if all this bit-crunching adds to our understandings?
@ Alan
I use the same function but use different values for the "X and Y" for 16 instances of the program running at the same time.
The number of variable sets that can be used is infinite.
Any "word" from MDF can have more than one code in the same instance of the program or other instances of this program.
Some numbers show up more often than others.
I was seeing about 100 new numbers found with 16 programs running per day. I can only assume the number of repeat matches was very high. I believe the number of numbers generated not found in MDF was even higher.
The truth is that those 37 bits actually generates 20,480 bits but only the specific instance of one of the 512 40 bit values will match the source value.
There are more possible designs. I know I don't have command of the science.
Vacek Nules asks if anyone is making progress-
I apparently figured out a way to resolve this challenge ages ago. As I am fincially poor I cannot afford to give it away, and can only give very few clues.
Publishing a possible solution to Godel's Incompleteness theorem, and apparent answers re: the Turing Halting problem, the seven Clay Institute Millenium problems, and the Goldbach conjecture, and the physics "theory of everything", would not guarantee being able to eat well and buy a house (as far as I know).
THe Clay Institute require solutions are published in mathematical journals, and last two years of challenges, and generally be accepted by mathematicians, before they even consider their option of paying a prize.
If my apparent knowledge was added to the computer programming knowledge of someone here like Ernst, then the result could be worth hundreds of billions?
But a new company needs to be created????
>Publishing a possible solution to Godel’s Incompleteness theorem, and apparent answers re: the Turing Halting problem,
>the seven Clay Institute Millenium problems, and the Goldbach conjecture, and the physics “theory of everything”,
>would not guarantee being able to eat well and buy a house (as far as I know).
While it is true that the Clay Institute moves slowly, publishing a reasoned rebuttal of Godel or Turing's Halting problem would pretty much guarantee you a good job somewhere in academia right away, no matter how lowly your credentials. It would simply be too remarkable a feat for the world to overlook it. If it was done by someone outside mainstream academia, it would make you all the more interesting.
Your wording needs correction though. You describe a "possible solution to Godel's Incompleteness Theorem." This theorem is a well-buttressed logical statement with a proof that has not been seriously challenged for 80 years. There is no "solution" to it as it is not a problem, puzzle, or challenge. Really, if you are talking about doing something to Godel's work, it would be in the form of either showing that his proof is incorrect, or in showing some extrapolated framework in which it does not hold. Either would be remarkable. Neither would be a solution.
The same thing holds true for the halting problem. To make a rigorous proof that we cannot decide which programs halt takes a bit of work, but at the heart the logic behind it is quite simple. And again, it is proven. For you to do anything, you would have to find a flaw in the proof, which is not "a solution".
- Mark
Thank you for this, Mark.
I am not looking for a job in academia, I have other things to deal with that puts me in a situation where I need to be independent. The default situation is I in due course hope to get a document sufficiently ready to send to a scientist who has signed a non disclosure agreement, and hopefully negotiate a license for a specific application (involving "holographic data analysis and hidden pattern discovery re: seismometer raw data) of a collection of apparent discoveries all related (so includes data storage ideas).
If desperate I could contact investors who lost money on non-delivering data-compression technology investments, to offer a second chance.
Re: 'solution"- I was thinking about how Godel's Incompleteness theorem, and the Turing Halting problems, are it seems depicted as problems limiting mathematics or technology.
My Internet service has been on the Fritz friends. I am able to use between 2 and 7 minutes of Internet time before the line goes off again but I wanted to Wish everyone Happy Holidays as soon as I could. A bit kate but it's the thought that counts.
To all Million Digit folks the best of luck in the coming year.
For Christmas fun here I went to inventing. I invented a transform that can work with BWT and MTF. It maps the values of the set of object words (like 8 bit words) in a dataset to new values in a different way than MTF. There is no physical movement of elements like MTF involved just recoding in place. There are many possible applications. This was an exciting avenue to explore for the holiday.
Still.. I owe a report on compressing MDF. I am still needing to prove my long rant on compressing subsets of Million Digit File. I can cover my ass on that. I'm lazy but I'll get busy this weekend.
Hey @everyone. I am very willing to take a full time job at college so I could go back to school. I would do much better in a work/study mode in my old age than a waiting to die, slowly broadcasting updates mode.
I'm cool with waxing the hallways and cleaning the bathrooms...
Must help with Cat Friendly housing. Toga Parties optional but fun.
Again The Best of Luck to any and all Million Digit Challenge Folks!
Happy New Year.
Challenge #1 is interesting. Even if remains unsolved (most likely so), tackling it could advance theories on how we might be able to measure the absolute degree of order in any data set.
And challenge #2 is bogus and impossible, of course. But I'm sure there will always be people around to confuse the apparently impossible with the mathematically impossible :).
I have good news to start the MDF Challenge New year.
A subset of the Million Digit File of 5860 40 bit words 234,400 bits has been compressed 7.5% and decompressed.
The "compressed" data has been decoded (decompressed). That means a partial compression of Million Digit File has been achieved.
I plan on offering the subset and the compressed file as soon as possible to any who care to examine the data.
Publishing the code for the number generator is on hold pending information on the maths discovery I made in 2010. I have wrote someone who can answer my questions and have not received a reply yet.
Again I announce a subset has been compressed.
7.5% compression achieved.
Good Luck Challenge People.. This Can be done!
@ Alexis Naveros
You Wrote :
Challenge #1 is interesting. Even if remains unsolved (most likely so), tackling it could advance theories on how we might be able to measure the absolute degree of order in any data set.
---------------------------
Tell me about it! 10 years I have been tinkering around with this challenge. As Mark wrote I have come at this from many directions in that time.
I am hopeful I have something to offer on advancing theories as you wrote. If not it will advance my understanding.
I would say that no one is going to get far on compressing million digit file if they don't have tools to manipulate "ordering" and reassign elements into different sets.
I'm way over my education level here but I see some fantastic possibilities and I see mathematical objects to work with. I am needing to find out what exactly I'm romping around in.
The next phase here is a new number generator to see what those maths things have to offer.. It's exciting.
So yeah my friend; it's all as you say. I have to agree.
Happy new year!
Hi,
without directly going to my apparent technique re: the million digit challenge", I can say the "counting argument" can often seem very tough.
However, I just worked out an apparent simple answer, that also (apparently) "knocks out" (from very brief investigation) what Wikipedia calls "incomputability of Kolmogorov complexity" ("Kolmogorov complexity" is said to be the least number of characters required to describe a piece of data in some universal description language), and also (apparently) dissolves the "Counting Argument" (which is I gather saying that if you have "n" items and "m" pigeonholes, and if "n" is greater than "m", at last one pigeonhole must contain more than one item). (Hint: don't exactly count pigonholes...........)
I thought about saying both these results, as it doesn't on the face of it appear to involve giving detail of actual method, but will say some other things just now-
I assume that if one million digits, using these: 0,1,2,3,4,5,6,7,8,9, are written in binary, there are a fixed number of possibilities.
I assume that people recognise that many of these possibilities can easily be stored in very little space: e.g. one million "1"s, one million '2"s, one million of a huge range of patterns (one could say the anti-Kolmogorov complexity is list all easily reduced versions of the million digits)- I wonder just how many non-easily reduced versions there are?
It seems to be assumed that the remaining versions are going to require more characters, even no compression.
Yet overall it takes less space to write all possibilities than just writing them....
so there must be some inherent anti-Kolmogorov complexity....
The pigeonholes are not isolated, but are at least partly linked along with whatever is stored in them- i.e. they form (with the "data that could be in them") a higher-dimensional system-
It comes down to what goes where when-
a universal space-time difference
Another thing is: one approach of people is to look for any possible "weakness" in the million digit file, that may allow easier storing. The technical apparent discovery I found (apparently) allows one to find how to re-arrange the data so that it naturally 'fits togther' like a ball of clay, and to find "clumps" (or "objects" in the data).
Ernst writes: ".... tools to manipulate "ordering" and reassign elements into different sets"
true
Alexis Naveros wrote:
"Challenge #1 is interesting. Even if remains unsolved (most likely so), tackling it could advance theories on how we might be able to measure the absolute degree of order in any data set"
A way of apparently solving (i.e. of counter-arguing to) the claims of Godel's Incompleteness theorem and (If it is a claim) the Turing Halting problem, seems to adress this
(both answers involve a system with at least 42 axioms, allowing all data or whatever to be rotated in higher dimension space that it itself "projects" so to speak, which brings into the scenario the question of turbulence, its opposite 'teleportation', the Poincare conjecture, probably also Riemann hypothesis, Hodge conjecture, and other Clay Insitiute Millenium claimed problems in mathematics.
Its "halting" IS at the moment of its "incompleteness: a perfect ....... (a example of a theory of every thing)(a sort of mini uni verse subdivided into discrete bounded objects)(that (curiously!) "exchange" quanta of "energy")
I recently thought of about 25 ways (from maybe 50 or so have found) to describe, name or relate to the technique I found, but only a few have thought it o.k. to say (most too obvious re: the apparent discovery)
the ones I have said include: "theory of everything", "hyperspace bypass", "stating the obvious".
You Wrote.. without directly going to my apparent technique re: the million digit challenge", I can say the "counting argument" can often seem very tough.
--------------------
It's something to count on..
We have to enumerate codes to tell the difference between them. It doesn't mean that there can't be two or more different codes that map to the same uncompressed source.
Obviously if the weight of trying to encode all N elements in one go is lifted there are systems that can represent a subset of MDF.
Remember who is to say Mark's dataset is the best way to approach a solution to the underlying problem?
All we all can agree on is it's a tough climb. It's a Challenge.
The fun is in trying to climb it.
I'm holding out that you have clever ideas that will bare fruit.
I can honestly say I can see where I was in the yesterdays but I have to take the steps to be some place tomorrow.
How many characters does it take for me to describe all the possible combinations of a million digits?
This many: "all the possible combinations of a million digits". By my count that's 49 characters. See, that's what Kolmogorov complexity is.
Just because the complexity of some string is really low, doesn't mean that the complexity of all of its individual sub-strings is going to be lower.
Ernst, you still seem to be somewhat realistic, though some of your posts hinting at recursive compression are bordering the absurd. But take my advice - pay no attention to Alan's worthless rumblings - there is no mathematical basis behind what he says, he's simply making up things that aren't real and delusional in believing himself a genius.
Why not add a leader board to see how close people are getting to crossing the +% compression gain threshold? That would encourage more people to participate... and in the process something innovative may come out!
It's okay Zen.. Alan doesn't bother me. If he has issues with reality well haven't we all?
Alan you do seem strange.
1 Matches Found. FPos-Element 4563
The first match found from the "compressed" binary. if the binary file checks out then that is "compressing" the compressed.
Thanks for having a little faith in my sanity Zen. I know nothing about yours.
Hey all.
The most difficult thing to sync up has been me. I'm working on the stand alone decoder. Part of that is to rethink the whole flow of data. Rechecking what has already been rechecked. It never hurts to review.
It looks like the compressed binary is true. Having just parsed the first record I will assume that the following records are going to be at sequentially predictable intervals.
Now on to integrating the number generator and generating the first de-compression / decode for the stand alone version.
On "re-compressing:"
Here is a list of the first few matches found then; since the binary appears to be true.
1 Matches Found. FPos-Element 4563
2 Matches Found. FPos-Element 1901
3 Matches Found. FPos-Element 1649
4 Matches Found. FPos-Element 1900
5 Matches Found. FPos-Element 5251
6 Matches Found. FPos-Element 2635
7 Matches Found. FPos-Element 5273
8 Matches Found. FPos-Element 3934
9 Matches Found. FPos-Element 1595
10 Matches Found. FPos-Element 2625
11 Matches Found. FPos-Element 3060
12 Matches Found. FPos-Element 5011
13 Matches Found. FPos-Element 150
14 Matches Found. FPos-Element 4027
15 Matches Found. FPos-Element 3699
16 Matches Found. FPos-Element 2936
17 Matches Found. FPos-Element 4774
18 Matches Found. FPos-Element 4017
19 Matches Found. FPos-Element 1987
20 Matches Found. FPos-Element 1518
The above reports the compressing of the "compressed" subset of MDF.
The same ratio of compression 7.5% is still valid encoding smaller a second level.
There are many questions I have. Like will all 5421 40 bit words of the this binary be found under a "full load run?"
That answer and more has to wait until I load up and run a full load.
Could go either way. I am exploring after all.
It's exciting guys.
Good Luck Challenge people.
Ernst
Update:
Current Chill out music: http://www.youtube.com/watch?v=7apji-hg5j4
--------------
I Love that kind of chill and also pan flute but I digress.
I just extracted the first MDF subset source[0] 40-bit word from the binary with the stand alone version.
Looks good once again. The re-compression then has a green light.
Okay now to finish it and move on. I'll set up for a full load run on level 1.
Let me call the level we stop at Level[0] or some variant like Source[0]. It's time to introduce such concepts.
I am sure it's possible to generate a level[-1] dataset.
Haven't been around in awhile ...
Ernst, are you on the edge of greatness (winning the challenge).
If so, good on you!!!
I'm positive it can be done ... well, let's just say, "incredulously confident".
Thanks CWCunningham
Winning by encoding all that data, that is MDF, smaller is still a mountain I am yet to climb from my point of view.
It is still a really big challenge for me even though I've found codes for some of it.
I'm no where near claiming a win on Challenge 1 or Challenge 2 nor am I thinking I'll have a solution to either challenge any time soon.
However, it would seem to be true that a subset of the file is vulnerable.
At this point it will help us all to be clear on the progress I claim to have made.
The truth is that, it is a simple construct that demonstrates a codec that has an associated data set.
It's a dictionary method.
I can match any "word" in a file that is in that dataset and decode is simply a look-up table function.
The data is generated by a number generator, of my own design, and is not based on statistical aspects of the Million Digit File directly.
It is the year to share some good news of partial success' so good news Challenge friends. A codec of data-compression value has been found albeit a simple one. If I may say there are more wonders ahead. It's beautiful.
Please don't let this news hamper your efforts or darken your spirit. Mine is not the only path. Of that I have faith.
Update:
Synced up the stand alone decoder output binary file with the subset binary file. The "CMP" utility reported no differences. Good enough for me. How about you?
If there is more interest comment I guess.
I just started a "full load" run on the compressed "level1" binary.
That file did change. It had a header added to it.
Full load means all 16 instances searching the Level1 file for words that match the data generated. I have an 8-Core AMD Bulldozer CPU now running on all 8 100%
I will be moving on to the next design from here.
If anyone has questions feel free to ask. I'm rather relaxed at this point so it's all good.
I could write a manual decoder version. Allow anyone to type in 37 bits and generate a 40-bit output.
I can demonstrate generating the Level[0] file from the Level[1] file.
I don't expect to find all of the 40-bit words of the Level[1] file with this new search. I expect to see a subset again but, I could get lucky.
Zen: my "worthless rumblings" are like this because I cannot afford to tell people the actual apparent answer- it has potential commercial value. Hence i have to talk in vague terms.
As for "delusions" and claims of genius- I made no such claims.
Any tricky puzzle may seem impossible till you know how- e.g. the "string and ring" puzzle looks impossible.
The 7 Clay Institute Millenium problems in mathematics can potentially all have the same weakness- if you find that then they could solve.
Have you heard of Edward de Bono and "Lateral thinking"? Do you think his book "Six thinking hats" is strange?
Is being successful "strange"?
I have something- whether it is in error or not and whether it works or not re: the million digit challenge is not yet considered by commerce- if you think I do not have something then you are mistaken- the only question is how good is it?
there is no mathematical basis behind what he says, he's simply making up things that aren't real and delusional in believing himself a genius.
on January 8th, 2013 at 3:02 pm, Ernst said:It's okay Zen.. Alan doesn't bother me. If he has issues with reality well haven't we all?
Alan you do seem strange.
1 Matches Found. FPos-Element 4563
The first match found from the "compressed" binary. if the binary file checks out then that is "compressing" the compressed.
Thanks for having a little faith in my sanity Zen. I know nothing about yours.
on January 9th, 2013 at 9:02 pm, Ernst said:Hey all.
The most difficult thing to sync up has been me. I'm working on the stand alone decoder. Part of that is to rethink the whole flow of data. Rechecking what has already been rechecked. It never hurts to review.
It looks like the compressed binary is true. Having just parsed the first record I will assume that the following records are going to be at sequentially predictable intervals.
Now on to integrating the number generator and generating the first de-compression / decode for the stand alone version.
On "re-compressing:"
Here is a list of the first few matches found then; since the binary appears to be true.
1 Matches Found. FPos-Element 4563
2 Matches Found. FPos-Element 1901
3 Matches Found. FPos-Element 1649
4 Matches Found. FPos-Element 1900
5 Matches Found. FPos-Element 5251
6 Matches Found. FPos-Element 2635
7 Matches Found. FPos-Element 5273
8 Matches Found. FPos-Element 3934
9 Matches Found. FPos-Element 1595
10 Matches Found. FPos-Element 2625
11 Matches Found. FPos-Element 3060
12 Matches Found. FPos-Element 5011
13 Matches Found. FPos-Element 150
14 Matches Found. FPos-Element 4027
15 Matches Found. FPos-Element 3699
16 Matches Found. FPos-Element 2936
17 Matches Found. FPos-Element 4774
18 Matches Found. FPos-Element 4017
19 Matches Found. FPos-Element 1987
20 Matches Found. FPos-Element 1518
The above reports the compressing of the "compressed" subset of MDF.
The same ratio of compression 7.5% is still valid encoding smaller a second level.
There are many questions I have. Like will all 5421 40 bit words of the this binary be found under a "full load run?"
That answer and more has to wait until I load up and run a full load.
Could go either way. I am exploring after all.
It's exciting guys.
Good Luck Challenge people.
Ernst
on January 9th, 2013 at 11:19 pm, Ernst said:Update:
Current Chill out music: http://www.youtube.com/watch?v=7apji-hg5j4
--------------
I Love that kind of chill and also pan flute but I digress.
I just extracted the first MDF subset source[0] 40-bit word from the binary with the stand alone version.
Looks good once again. The re-compression then has a green light.
Okay now to finish it and move on. I'll set up for a full load run on level 1.
Let me call the level we stop at Level[0] or some variant like Source[0]. It's time to introduce such concepts.
I am sure it's possible to generate a level[-1] dataset.
on January 10th, 2013 at 4:04 am, CWCunningham said:Haven't been around in awhile ...
Ernst, are you on the edge of greatness (winning the challenge).
If so, good on you!!!
I'm positive it can be done ... well, let's just say, "incredulously confident".
on January 10th, 2013 at 5:54 pm, Ernst said:Thanks CWCunningham
Winning by encoding all that data, that is MDF, smaller is still a mountain I am yet to climb from my point of view.
It is still a really big challenge for me even though I've found codes for some of it.
I'm no where near claiming a win on Challenge 1 or Challenge 2 nor am I thinking I'll have a solution to either challenge any time soon.
However, it would seem to be true that a subset of the file is vulnerable.
At this point it will help us all to be clear on the progress I claim to have made.
The truth is that, it is a simple construct that demonstrates a codec that has an associated data set.
It's a dictionary method.
I can match any "word" in a file that is in that dataset and decode is simply a look-up table function.
The data is generated by a number generator, of my own design, and is not based on statistical aspects of the Million Digit File directly.
It is the year to share some good news of partial success' so good news Challenge friends. A codec of data-compression value has been found albeit a simple one. If I may say there are more wonders ahead. It's beautiful.
Please don't let this news hamper your efforts or darken your spirit. Mine is not the only path. Of that I have faith.
on January 11th, 2013 at 5:34 pm, Ernst said:Update:
Synced up the stand alone decoder output binary file with the subset binary file. The "CMP" utility reported no differences. Good enough for me. How about you?
If there is more interest comment I guess.
on January 11th, 2013 at 6:28 pm, Ernst said:I just started a "full load" run on the compressed "level1" binary.
That file did change. It had a header added to it.
Full load means all 16 instances searching the Level1 file for words that match the data generated. I have an 8-Core AMD Bulldozer CPU now running on all 8 100%
I will be moving on to the next design from here.
Zen wrote: "
If anyone has questions feel free to ask. I'm rather relaxed at this point so it's all good.
I could write a manual decoder version. Allow anyone to type in 37 bits and generate a 40-bit output.
I can demonstrate generating the Level[0] file from the Level[1] file.
I don't expect to find all of the 40-bit words of the Level[1] file with this new search. I expect to see a subset again but, I could get lucky.
Leave A Reply
Zen: my "worthless rumblings" are like this because I cannot afford to tell people the actual apparent answer- it has potential commercial value. Hence i have to talk in vague terms.
As for "delusions" and claims of genius- I made no such claims.
Any tricky puzzle may seem impossible till you know how- e.g. the "string and ring" puzzle looks impossible.
The 7 Clay Institute Millenium problems in mathematics can potentially all have the same weakness- if you find that then they could solve.
Have you heard of Edward de Bono and "Lateral thinking"? Do you think his book "Six thinking hats" is strange?
Is being successful "strange"?
I have something- whether it is in error or not and whether it works or not re: the million digit challenge is not yet considered by commerce- if you think I do not have something then you are mistaken- the only question is how good is it?
(re-posted as previous post somehow posted other peole's comments)
@ Alan
See I believe in you man.. I'm sure your ideas will bear fruit.
Hey, I have a plan on the next evolution of the number generator Alan.
So with clear direction I'm starting today.
Good luck Alan. I hope to taste the ambrosia from the fruits of your labor soon!
@ CWCunningham and all
191 matches found so far against the already compressed subset from the Million Digit File using the same software that compressed 5860 40 bit words or 29,300 bytes from the Million Digit File.
By using a simple estimate of output count so far it's reasonable to hope that all the 5422 new 40 bit words will be matched. It remains to be seen if that will hold true.
I think I am on to something here. It will require some work over a few months to get an encoder designed.
So CWCunningham and all I'm going to press on with an evolution of this maths I use now and see what that encoder offers.
The encoder software I envision will encode the whole million digit file that is a give you can trust but will it encode the whole file smaller? That is unknown even in theory at this time.
@Alan.. Sorry if you have been made uncomfortable. It does help to have something working to show like I do. I feel much better today than just a year ago. I can hold my head up and say to people who know of my dedication to this challenge and tell then I succeeded in encoding like I said I would last year.
Like I say it goes a long way to be able to say that and have the proof to show if any one cares which as Mark points out few do unless they get the source code to go with it.
I tell you guys I was wide eyed back ten years ago when Comp.Compression was wild with dreamers.
I remember being awestruck reading some of the ideas people had. I enjoyed all the banter about "Random Data compression." It, after all, fit into my programming interests then and still does now.
Some how over time the fires of creativity and BS have been tamped down to smouldering coals. Still remnants remain of those early days and the general rebuff of dense data claims in all of us.
I say to you fellow MDF people I have compressed. I am recompressing. These are truths.
Well,on the home front, my finances are recovering and I have to shut down my Internet for a couple of months to catch the bills up. So, I don't know what day is the cut off or when I will post again after that. I thought to check in, in case it is of concern.
On the recompression, I suspect 50 days of run time to find all 5422 words if they are all in this dataset. I have nothing to suggest they all are. I assume they are not. Still I have enough already to prove re-compression is a reality.
That's it really. It's been a long strange trip so far with miles to go.
I am not out of the running on MDF challenge at all. In fact I have to rule out the possibility of compressing MDF before I will be sure with this next evolution.
So get busy guys. You don't want me to win do you?
Ernst
I'll leave the winning to those who are actually working on the problem.
I had been working on an approach to find the weakest point to attack, when it occurred to me that I needed to get back to what I was *supposed* to be doing.
One day, I decided to look back at that code, and I saw that I could finish it up in just a few hours, which I did.
What I discovered was that those boys at Rand did an excellent job of producing a homogeneously random bunch of bits. This doesn't stop my approach to compression at all, but it does show that a successful approach will not be trivial, IE, there are no (obvious) shortcuts.
I can't think of any reason why you shouldn't win, Ernst. In fact, I'm pulling for you. If my approach has any validity, it will still be valid, whether it is the first to succeed, or the last. But I won't be snapping at your heels in the foreseeable future. So hit a home run soon, Alan has the secret to the universe and it's only a matter of time before Bill Gates decides to buy it for a few hundred million (and what a bargain).
On a related note, my proprietary idea which would be the key to my solution, may prove to be less 'security sensitive' than I thought. That will free me up to release the source, if, and when, the time comes. For now, I'm still preoccupied with other things.
I am currently at a low point in the economic year. In fact my Internet is whatever I can access wherver it's available for a few months.
Anyway, I Update nonetheless.
@CWCunningham
I understand. Thank You for your kind words.
The point of this (MDF) dataset is that it has none of the relative qualities we associate with classic data compression. That in itself is it's strength.
The, on my end, current focus is a number generation function generating a dataset which I can use to search files to be compressed for matches to that generated dataset encoding those data with fewer bits.
This new effort is read to run once the previous is done.
The status of the "re-compression" of the compressed subset of MDF is going well. I estamate 86% of the 5422 40-bit words will be found.
After that writing a program to allow manual input of 37 bit values and exporting the expanded 40-bit words is a good idea. I can polish up the compression and re-compression demonstration.
Since I am at the Job Center looking for work it's reasonable to point out that I am looking for oworka nd why not a job in R&D ??
I am compressing dense data so I have skills.
So I am open to making a quantum leap from labor class to the sciences. I have cats so that is my only requirement if I must relocate. I must earn enough to live on with cats.
About this next evolution of number generator.
The new effort will map any file to a 320 tebibyte virtual datset.
I will examine the mapping for exploits and hopefully be able to compress the whole MDF using that mapping data.
This is a long term project and I can use secure access to 1000+ CPUs to shorten the time it will take to generate a map.
So, I need a job. I have technology and I'm tossing that out.
Until next time.
Good Luck Challenge people!
PS my email is blocked right now so give me some time to get that bill paid.
Drop a note here if you have a job offer. I am very flexable on wages and hours.
Ernst I'm not sure I should say what you have possibly done (commercial interest potential relevance?...)
(I could say that you appear to have possibly made a "parallel universe" to the millon digit file and this is a hyperspace _________)
!
@Ernst
you stat
"I have good news to start the MDF Challenge New year.
A subset of the Million Digit File of 5860 40 bit words 234,400 bits has been compressed 7.5% and decompressed."
My question is this is your subset of the file a continuous subset of the file? If it is good work! If it is not then I can break the file into 2 subsets using one bit words. The one containing zeroes and the other one containing ones these two subsets can be compressed quite small.
Another point is this say using a large word size say 40 bits its a fact that not all the patterns will exist in the file that mark gave. If you make a universal bijective arithmetic coder with fixed weights for those patterns that exist and another for those that don't exist in this file
you have in worst case for a one million bit file 25000 40 bit words you could use:
-log((1)/25001)/log(2)= 14.609698181... bits each symbol this would easily make the file less than 1/2 the size of original file.
for other files. each symbol not used in the million bit file would expand so that each 40 bit symbol would be represented by 54.609.. bits so most other files might expand a little bit.
Does this win the first challenge. No I don't think so since somehow you have to represent the table that was used for the bijective arithmetic coder that caused this compression.
Mark
How does the rule "then decompress any file of size 415,241 bytes." will be verified, you aren't going to test the compressor/decompressor with every possible combination of files of this size.
Can you please clarify how are you planning to test it exactly, ie - how many files of size 415,241 actually are going to be used for testing ?
>Can you please clarify how are you planning to test it exactly, ie –
>how many files of size 415,241 actually are going to be used for testing ?
I could try to be funny and say I am going to test all of them :-)
But I don't really have a specific number in mind. I will say that if someone submits a file that compresses one file, I will surely test it against at least 10 more.
If my 10 other input files are simply versions of the original file scrambled with a reasonably strong crypto algorithm, I feel pretty good about them being a good test.
So let's say that the submission passes the original file, then ten randomly chosen others.
I would probably have to declare that person a winner at that point. It's very possible that the world could go on to problem that there were files that couldn't be compressed, but it is not for me to do the exhaustive analysis. I would be satisfied that the submission met the spirit of the contest and was at least a pretty good result.
So if you want a precise answer, how about if we say ten? I reserve the right to change my mind, but I won't be doing it just to hassle someone, there would need to be a compelling reason.
- Mark
Mark,
Thanks, that's the answer I was looking for, as it adds some room for making it actually work.
Even if somebody can come up with random data compression algorithm it sure can't work well with all the data every single time.
unless it Is "all the data" every time ??????
Hey everyone!
I am still very poor but in America we do have free WiFi spots so I am now sitting at one.
About the 7.5% compression. It is a subset of the million digit file and the "recursive" compression is a subset of that subset.
I have not compressed the whole(compressed)subset of the Million Digit file just part of that subset. I have compressed the subset of the Million Digit File 7.5%.
What I can show with that is the potential encoding of the whole file.
No claim of complete success compressing the Million Digit file is being made.
Let me see, long time no update.
I stopped running the "Number generator version 1" which gave the 7.5% compression of a subset.
I then spent time reworking the "Turing Engine" concept which doesn't compress all input but does allow for a stable system of modifying the stream. Could be useful in encryption. Possible data compression use too.
I then went on to rework the number generator effort and apply OpenMP coding to take advantage of all 8 cores of the AMD cpu I own. There are some interesting aspects to multi-processors and threads.
Version 2 of the number generator is now running and it is closely related in design to Version one. It is closely related because I had the right idea at the start. I did try different coding in the development of version 2 but found the version one concept the nicest.
So far I am matching to 8192 sets generated albeit slowly.
One cycle of all eight threads takes 21 hours and it is matching around 160 new 40-bit words of the Million Digit file a day. That is up from 100 a day with version one.
This effort aims at generating a data set to use in potentially cobbling together data to satisfy the challenge number 1.
What I am now working on is publishing a discovery I made in 2010. There is already a published work on this but that work lacks that actual transform believe it or not. I have the transform so it's a matter of learning to write a proper peer reviewed paper that will add to the previous work by an honoured and distinguished cryptographer.
I have little experience or training in writing a proper paper so I'm thinking I could use a buddy who has experience in doing this. I'm considering the local college or signing up to take the basic two algebra classes at the local J.C. to gain access to the maths faculty.
So that is what I plan.
Over all and again the time frame for the version 2 number generator run is 18 months. The actual generation can go to infinity but the data set I assume I'll have after 18 months will be enough to start with.
I simply did not find an exploit in any of the encodings (codecs) I have written over the years.
I now assume that simple matching of one value in file to a value in generator is the way to go for future efforts.
Perhaps in the future generating such a dataset as a 5 tebibyte set will only take a few seconds on the computers of the future but for now we have to brave the slough rather than take the free-way.
If you are interested in coaching me through a paper I will consider that. After all you are my data-compression friends. After all I am the common man attempting the (assumed) impossible so why not be humble and ask for guidence.
I do doubt that I will compress random data in real time but I know I will always try to find some exploit because it suits my nature.
What I can find solice in is that I have something to contribute because of these efforts having been made.
So for 2013-2014 I am wanting to publish a discovery that I know the Crypto-community will enjoy a lot. I assume that will aid the Data-compression community too.
I can see BWT being modified as well as other applications. It might be an advantage for some corporation if they take me on board. I still need a good job even if it's part time so I can go to school.
Never hurts to network.
Good Luck Challenge people!
My guess is that with your computer skills and bravery, if you were aware of what i have found, it may be commercialisable. You could form a company to do that?
There are only 27 ways of placing 27 beans in 27 boxes.
There are only so-many ways of fitting the "beans" of people's birthdays into the "boxes" of the days of the year.
It is said that if you go in to a room that has 23 people in it, you probably will find two of them share a birthday.
What if you went into a room and it had 364 people in it, and you never found a shared birthday even by the time you spoke to the 365th person?
(called "a large eddy" (or "electro ... magnet")
What if you went in to a room with 365 people in it, and you only came across a shared birthday every 52 people (what you have done, inadvertently ...?)
What if you went into a room with 10 people in it, and they all shared one birthday? (called the opposite of electro magnet: gravitational ("collapse") ?
(and what if you.... )
I have said quite a bit.....
error: was to write: 27 ways of placing 3 beans in 3 boxes
Gentlemen: I commend Mark for providing a venue for the discussion and allowing it to live in a very 'noisy' channel. I am sorry to observe that a couple of you have missed an important point: Challenges like this one stand up over time because people misunderstand the problem. You must provide an implementation that compresses an arbitrary file. Solutions like the 'mad professor' approach of hiding the data in hyperspace are nonsense because the potential value of accessing hyperspace is far in excess of any value of winning the challenge. It will be trivial for Mark to supply a file you can't compress.
With that said, I have to travel back to 'the other side' for a few millenia. Watch out for bald men in suits wearing similar hats. (I'm going to miss Fringe even though the stories were thinning out).
@Alan All of the solutions you describe about going in to a room with slightly more than 360 people in it and finding 'n' people with the same birthday are possible and have nothing to do with 'spooky action at a distance'. This is simply a demonstration of the mathematics of probability.
Each case you describe has no magical short-cut causal connection to cataclysmic events in the cosmos and your perception of it is influenced more by unusual brain chemistry than anything else.
I wish I could cure the side-effects but I can't and I wish there were a gentle way to persuade you that there are benefits to the medication but I can't. I hope your situation improves as you age as it has for my younger brother whose future was cut down by a tragic neuro-chemical development. None of that prevented him from saving the life of our youngest brother with his gift of bone marrow stem cells. There is a purpose to life and we all urge you to accept any help that is offered.
wow, John , you sure are trying to cause some type of annoying?
who is the "mad" professor? People on the street used to think that man would never walk on the moon. People used to think it was witchcraft to sail beyond the horizon on a ship?
People who had an agenda against those who were slaves tried to claim that slaves who run away were suffering friom a "disease" called "drapetomania".
Colonel Sanders received one thousand "no"s before someone said "yes" to his receipe for Kentucky fried chicken. EMI said "Guitar bands have had their day" and wouldn't sign up a new band called "The Beatles".
Are you afraid of losng your priviledged position in the world? Or are you just being a classic TROLL?
I have what I always said I have: an apparent, theoretical, potential answer to the so-called "million digit challenge" and a lot of other things. I object to any call to psychiatric fascism- conflict and disagreement are just that, conflict and disagreement. Those who have priveledge might not like a more free world- trying to medicalise and verbally abuse those who dare to challenge academic orthodoxy is a type of fallacy in argument (read: "The Uses Of Argument" by Stephen Toulmin, also anything by Thomas S. Szasz).
you wrote an extraordinary thing- apparently you claimed that any solutions involving hyperspace werre nonsense because the value of accessing hyperspace was far greater than the prize Mark was offering- um- how can something be "nonsense" because it is more valuable than a prize being offered?
That looks like what is known as a "Non sequitur".
Your conclusion is not warranted by your premise(s).
The claim is made that it wil be trivial for Mark to provide a file that I cannot compress. That is your opinion.
You refer to "spooky action at a distance". I did not use this phrase. I can say other things.
You refer to "mathematics of probabilty". Yeah- what about higher-dimensional mathematics, improbability, and so on?
You talk of "magical short cuts to cataclismic events in the cosmos"- I did not use this phrase; you may be refering to my referring to "electro-magnetism" and "gravity".
There appears to be a misunderstanding and/or lack of comprehension by you of what I am doing. Your lack of understanding and/or comprehension of what I am doing should not lead to verbal / moral attack.
I am looking for licensees for commercial use of apparent theoretical breakthroughs- so I have to say less than would reveal too much. If you want to know the missing bits- please assist me to find a licensee, so eventually the detail can be published with patent protection if need be.
accusations of "unusual brain chemistry" is just an abuse- fist, it matters not whether I or anyone else has "unusual brain chemistry"- it has been said that Nazis understood that being a Jew meant you could be discriminated against, similarly in western societies people who are accused of being different are often abused- in such societies it is tanatmount to calling for abuse of a person to claim that they are different. You then went on to abuse me by talking of "medication"- that has no more vailidity than calls for "curing" slaves that dared to run away.
Your post could be "trolling" - all that talk about stem cells- I'd say you are stirring.
Regrading apparent lack of understanding;
I often look at "minimum defining characteristics" of concepts. It so happens that I found that what is called "electro-magnetism" appears to match (as a pattern of information) another pattern. Similar with what is called "gravity".
It is sad if you are too scared to challenge those who would claim to have "the universe"- the physics and mathematics hierarchy. The Universe belongs to everyone.
@Alan on February 20th, 2013 at 3:55 am, :
You made an error again: placing 3 beans in 3 boxes
could be done in 6 ways, because this is a permutation problem
and 3! = 6.
Written out:
Box 1 Box 2 Box 3
1 2 3
1 3 2
2 3 1
2 1 3
3 2 1
3 1 2
--> 6 possibilities
http://en.wikipedia.org/wiki/Permutation
Not an error:
beans: A, B, C
each box can contain 3 beans, there are three boxes:
(I could call this a differentiated limit in 5 dimensions. I I got this beans/boxes notion from a Time-life book about Mathematics)
It only seems to adress the question of how many of each bean is in each box, with room in each box for three beans.
you could possibly call it "quantum entanglement"
there is likely an error here as I listed 28, must be a repeat entry somewhere:
(1) box 1: empty. box 2: empty. box 3: ABC
(2) box 1: empty. box 2: A . box 3: BC
(3) box 1: C . box 2: A . box 3: B
(4) box 1: A . box 2: B . box 3: C
(5) box 1: C . box 2: B . box 3: A
(6) box 1: empty. box 2: BC . box 3: A
(7) box 1: empty. box 2: AC . box 3: B
(8) box 1: empty. box 2: B . box 3: AC
(9) box 1: empty. box 2: ABC . box 3: empty
(10) box 1: ABC . box 2: empty. box 3: empty
(12) box 1: A . box 2: empty. box 3: BC
(13) box 1: A . box 2: BC . box 3: empty
(14) box 1: BC . box 2: A . box 3: empty
(15) box 1: B . box 2: AC . box 3: empty
(16) box 1: AC . box 2: B . box 3: empty
(17) box 1: C . box 2: AB . box 3: empty
(18) box 1: AB . box 2: C . box 3: empty
(19) box 1: empty. box 2: AB . box 3: C
(20) box 1: empty. box 2: C . box 3: AB
(21) box 1: BC . box 2: empty. box 3: A
(22) box 1: AC . box 2: empty. box 3: B
(23) box 1: B . box 2: empty. box 3: AC
(24) box 1: AB . box 2: empty. box 3: C
(25) box 1: C . box 2: empty. box 3: AB
(26) box 1: A . box 2: C . box 3: B
(27) box 1: B . box 2: A . box 3: C
(28) box 1: B . box 2: C . box 3: A
I found that there were no repeats, the numbering was missing 11, so the list does contain 27 entries
...
rather: it only seems to adress the question of "which bean is in which box", with room in each box for 3 beans
Gawd bless the noise John :)
After a decade of writing codecs related to MDF it's just fine with me to have (even this) social interaction with those doing the same. The Monks life of it sucks if one is too isolated.
Proof: The work is as I think it should be; full of reflection with every word, construct and example.
So far I read that it should be clear and understandable. The contrast between novice and professional demands careful reasoning.
Have any of you published a paper?
I will be watching Joel FeinStein's videos on proofs but I welcome advice on writing a paper for publishing.
Anyone work with OpenOffice? I have questions about how to use it to do formulas and graphics.
As it stands, I have nothing that can crack random data down to compression "in whole."
Sure a subset of the MDF has been compressed and then some of that compressed subset has been compressed but the challenge of encoding all files smaller is there.
With thye Number Generator V2; In time there will be a complete set of all the elements of the MDF file generated by the number generator effort here. Also in time each of the 8192 sets will find all the elements of the MDF individually. That is months and years with just an 8-Core cpu.
In time perhaps, some careful selection of encodings might qualify as a compressing the file.
It's all "maybe stuff" even after a decade of trying but, it is real.
Oh and @Alan..
I was thinking about you and hyperspace which has never seemed real to me in the first place but what about zero dimensional space?
Could space itself have qualities without mass?
Could there be a cycle or system in one dimensional space?
What kind of space exists just beyond the edge of our expanding Universe? Zero-Dimensional space? Could there be negative dimensional space?
@Alan..
Is what you are reasoning have anything to do with Hilbert Space?
If one wanted to go to extremes with the 3 beans 3 boxes there are far more solutions than 27 it's actually much higher
example even for your first entry
(1) box 1: empty. box 2: empty. box 3: ABC
Actually when you look at a box in this case
box 3: ABC could one not make an entry
box 1: empty. box 2: empty. box 3: BCA
if one considers thinking outside the box who
says you can't look into the box and see the
order of the beans.
Also who says the boxs are not totally different.
maybe you look at box 2 first so that
box 2: empty. box 1 empty. box 3: ABC
could be considered another entry. Of course
this could be extended to infinity if one
makes up enough rules such as where the boxes
are placed on table and etc...
In other words I think even Ben is as correct as
Alan it is just who makes the rules. Sure in this game
or any game like this one can also call the others
wrong with a few more rules.
What does this have to do with Marks challenge.
I think that if one is allowed TWO bit files. One a
pure data file and one a program file where the first
bit of the program file is a complete program to the
special computer saying to combine the bits of the rest
of the so called program file. With the pure data file.
It will create the random file mark used. And the length
of the TWO bit files will be shorter than the Fixed bit file of marks random data.
I still feel that Mark left a way out for the fixed case by allowing two files the so called decompression program files with an extra data file. This kind of arrangement does not violate the counting axiom.
@Alan
That maths of Hilbert space is something I have to learn. I understand it allows us to position satellites using 12 dimension state space.
Speaking of that what do you think of my series
{...-3/-2,2/-1,-1/-1,0/0,1/1,2/1,3/2 ...}
I have a possible explanation or at least a suggestion as to what 0/0,1/1,2/1 are about. One possible point of view anyway.
I realized a relationship in another famous challenge and wrote down that series because of that insight. It's based on something real.
Does it suggest anything to any of you guys? What would the series be called?
Update:
Version 2 of the number generator is running. There are about 4000 unique matches and about 200 more repeats at this time.
As time goes on and it gets closer to finding all 83049 40 bit words of MDF I am sure the repeats will go up. The Ideal is 8192 repeats for each value since I am searching 8192 sets for matches.
It would be fantastic to have 83,049*8192 matches to choose from in cobbling together a compressed file right this second.
I say give me a few thousand threads and I can get this done faster.
Anyone care to donate a super computer for a couple of months?
That's it for now. I will get to Internet as I can.
Oh also anyone care to work with me on BWT?
I know Mark wrote something in C++ but I am a C-Guy and I would like to start exploring BWT. I think having a partner would be really cool.
If anyone is interested in learning about BWT along with me I am open to the partnership. I'd love to have a code-friend.
I already worked MTF, I also have designed some that applies to MTF and BWT.
{...-3/-2,2/1,-1/-1,0/0,1/1,2/1,3/2 ...}
My Bad I made typos on posting my own series. LOL
{...-3/-2,-2/1,-1/-1,0/0,1/1,2/1,3/2 ...}
My Bad I made typos on posting my own series. LOL^2
http://www.youtube.com/watch?v=ITW5bjTipPA
This must be fake :)
John
Of course it's fake.
What's even worse is that this doesn't look like the work of a well-meaning individual who thinks he's stumbled onto a magical compression method.
This looks like the work of someone who has deliberately designed a video to deceive others.
http://www.youtube.com/watch?v=kQsWP6n03EU
Another fake from the same author
thoughts:
Earlier I was very surprised that Ben left empty the columns under "box 2" and "box 3".
Re: David's comments re: the rules: very interesting and not something I can say much about without perhaps saying too much. However, to create "the birthday paradox" one needs the rules that the Time-Life book on mathematics were using (any order of beans, 3 beans spread distributed over 1, 2, or 3 boxes, boxes remain in place once decided (if ------------------------------------you get ---------------the method used for what has been mentioned as project 7?)(rest censored)
Re: Project 7 and "hidden valley":
I do not know much about computers and computer programming, but I looked at this and have a theory of what they are doing and how it works (if it works). At the moment I have as much reason to think it is for real as not to- does anyone have any proof it is fake? There is hardly anything I can say about it here as it may be too revealing (could call this technology "a warp 'drive' )!
as far as I can see they appear to be say partly on to the apparent discovery I have found?
Hilbert space and 12 d state space:
If "Hilbert space" is an objective view of higher dimension space (so virtual 3-d as higher-d space could be regarded I think as "at least two spaces linked by a wall say" - to have any view of the two spaces requires the wall itself acts like a cushion - as if it contains a little space itself (the locations of the other two spaces being "objective" so not exactly located except via mutual location as it were)
then it is an un-stated space
12-d may also be a way to describe a (quantised) unstated space, a 12-d "state space" would be vector analysis in 5 (or 6) "dimensions" (or a Hilbert space quantisation "vector") , so if have "12-d state space" then "Hilbert space" if added to this would give an "uncertainty bias" in 5 (or more) dimensions- a ----------------------------- (Censored for commercial reasons) or the (possibly ???) more-or-less 'exact' 'location' of any object(s) in 'orbit (or statically -----------) around a sort of centre of 'mass') ?
in case you wondered what i just did- i instinctively seem at least to sense how patterns comnect- this can have the curious effect of writing something and a bit like hoping it will be comprehensible ; which on checking often seems it figures out to be true (idea is that:
one gets to trust ones instinct on what word or concept is right though may not know immediately why but after writing can check if it at least seems to be on a brief check as logically o.k. )
e.g. example: solve problem how to skip a stone on water many extra times instinct answer: don't skip it - continuously rotate it in 5 or more dimensions (hold it flat at a precise angle) - now one can look at this idea and see if it is looking like it is o.k. - it may be o.k. to some extent (i heard of a musician called Brian Eno who had a saying "honour thy error"- if he 'made a mistake' instead of throwing away a piece of paper with some music ideas on it, he would stop and think this may open an unexcpected door to a new discovery or potentiality, or something .....
Re: Valley and Project 7
Interesting story but everything in dutch except:
http://jansloot.telcomsoft.nl/Sources-1/Netwerk/2001_1.htm
http://en.wikipedia.org/wiki/Jan_Sloot
There is more but google translate fails :(
Has nothing to do with compression but neural data storage using reference table. Never saw it before.
Mustafa, Budapest
Very interesting video.
Well, I may (apparently, I guess) KNOW the "secret" to this- as what I figured out appears to allow for something that could be similar I suppose to this. I figured out what may have gone wrong and why (if the chip blew up, this may be because of the connection between this type of "technology" and "structure"- if his chip had a buffer zone this may protect it from such interactions with its surroundings that could muck it up). Related subject: solitons; how to utilise an environment to sustain data for very long distances via (the) data "surfing".......
My theoretical technology both compresses and expands the data (in higher-dimension space), rotating it in multiple dimensions till you have a perfect something- it seems to be related to how humans may "store" data- if you go to a very quiet place and listen- you will be able to hear I think a very faint "soundless sound" - like a Sunpak flashgun at the top of its warm-up cycle- so high pitched it is almost "transparent".
I discovered that this "soundless sound" may be like a "space-time doorway"- a sort of 'quantum' superposition of every thing you have heard, seen, touched, tasted, felt, and thought about etc.
It also happens that one description of this idea could be called '(the) theory of every thing". I also call it "a hyper-space bypass".
So who wants to be a billionaire? I sure need a house and some income....
@ Alan..
I wish to name the series {...,-1/-1,0/0,1/1,...} Fibonacci State-Space
Since the "object"is size Fibonacci number as the numerator.
The "division" by zero is interesting and I have to go with BramaGupta on how to handle it. I think I get the Guy and why he suggests we write it as a "pending" operation.
I define 0/0 as function without output.
What BhramaGupta means by Naught is something I am pondering but it seems he is referring to function of object with one and function on object with the other. Naught may be the object but 0/0, for example is function on object.
In my thoughts these days, I see 0/0 = 0 but any other value the answer is an infinity not a rational at all. The answer doesn't exist in finite state-space from my point of view.
I suppose if, in Fibonacci State-Space, then, say 6/8 the object is a 8-dimensional object. in 0/0 it is a 0-dimensional object.
Then again I need to understand why a six dimensional object is called a http://en.wikipedia.org/wiki/6-cube Hexerate
Hex seems to be base 16 to me.. lol...
What all this is suggesting is that functions in higher dimensions may offer some insight for us who attempt to invent new data compression. That's all I'm on about.
As for my traditional Update:
The Number Generator effort is going well.
I was able to write an "inelegant search" function and that increased matching 120 fold. I was more concerned in truth of results at first than speed.
As of today the total approaches 400,000 matches with less than 1500 unique matches to go before all 83,049 bit numbers have at least one vector in the dataset.
I plan to post the last unique number found when it's reported to me here.
Multiplicity of references is an asset rather than a detriment.
The more "matches" the more flexibility in attempting to index the dataset in a smaller binary State-Space.
After all, All Data Compression is a Game of Indexing.
I also believe I must attend college now. I am making the slow turn because going to school at 52 isn't about career it is about life-direction. If I go I should see it through to a degree.
Naturally I must start at the bottom and revisit the basics necessary to be successful such as English and the Maths. The Junior College seems a smart move. I amy have to "warm up" to a life of work and school but I have little else in my private life so adding school won't hurt me.
Also as it would be I am advancing to study BWT now. I hope all of the efforts to transform data I have made will apply when i am able to write and BWT Codec.
Any suggestions on reading or source code to study?
So, it's still going. I am not out of the game and I soundly believe compressing MDF is possible.
That's all I have now.. I hope to afford Internet service at home later this year. Also later this year I will attempt to index the matches of MDF made with this number generator effort. Later this year means after my "seasonal job" is over and The Winter is at hand.
Spring and Summer are outside months for me :) Lovely Spring this year.
So keep Banging those "Data-Compression" rocks together. There is a reason we believe it is possible to compress MDF. So, Good Luck Challenge folks.
Denominator not numerator.. LOL always making typos.
A is numerator B is denominator in A/B
Doh!
@Ernst
Hex, as a Greek word, means 'six' (6). But hex, as in 'base 16', is an abbreviation of 'hexadecimal', which means 16. It's just like 'six' in the english word 'sixteen'.
Hope this will clear your doubts.
@Alan LOL Dude..
I get out to Internet now once in a week or longer and I get a chuckle at the "_____________" Commercial sensitive.
Well at home the count of unique numerical matches continues. I believe it's accurate to say counting down to 200 left to go.
It is finding new mathces less frequently however I would be shocked if it cannot find all numbers of the values {0,...(2^40-1) }
Again I state I plan to let it run till Winter anyway but finding the proverbial "Last-One" will be celebrated.
The good in finding all the _________ Unique matches is that it will represent at least one vector for each 40 bit word in the MDF. For some "_________________" Reason I adopted 40-bits as my word size. I gave this curiosity some thought and I believe it comes from an odd exploration of a design, that proved to not be "all that I wished it to", but from that 40-bits seemed a reasonable length to work with numerically.
Truth be known, no matter what the distribution in the MDF it is rather even for all bytes. For example the current data set size I am working with, excluding the number generator and matching, is 4096 bytes. For that the "byte-symbol" set count is always 256. That means in 4096 bytes expect all 256 byte values to be used.
My latest "________________" sensitive effort is exploring reducing that symbol set down below 200 for each 4096 "block" of data.
This is in anticipation of exploring BWT, however, naturally; I will be sending it out to Bzip2 just to see if any magic happens.
So, I'm not there yet with the "__________" output however the math is solid already.
I plan to walk through the process and learn more so I can bring the construction into focus. It does cost o reduce the symbol set. It is revertible.
Now let me ask. How do we deal with mapping byte values?
I mean say I have this hypothetical "___________" transform. Now given that I can reduce a set of symbols with a modest expense, I then see a large cost in storing the symbol orders as related to frequency. In that I would craft a generic body of data for the whole file and strip the symbol mapping to that aforementioned symbol-data track.
So in that expense of byte order mapping I assume it's about expressing that data in whole and in full size. This is a bother since it inflates data size.
However, I can subject the data generated to the same "_____________" process so as to unify the data with the original data in the end. Still increasing the file size is not desirable naturally.
My restated question "in the clear" is what methods of representing byte-symbol order are understood? Are there clever algorithms I may benefit from the study of? Some clever referential system by chance?
Anyway!!! I am still reading on BWT and I think Mr. Scott and his friend are outstanding in how their "S" transform is written up. I am still pondering all the terms but as far as I can tell; for an exact description, David Scott and his friend are concise to a precision I am yet to master. I do, however, have a copy to digest so I'll be nibbling on it for a while. I believe it doesn't get more exact then that.
So, this brings me to a more general question: Assuming one can manipulate data in a commanding yet reversible way, what makes something compressible?
I read that BWT is a process that makes symbols frequently following others or preceding a symbol, more likely to be in "runs"n of same symbol. Is that the main benefit of BWT?
I believe that organization is considered as favourable input to some algorithms. But which?
I would also assume that fewer symbols in a set of symbols say 100 verses 256 means more repetition and that suggests to me compression.
I also have assumed that some "constructed-codes" for symbols also lend to compressibility along with error detection.
So, with respect and reverence I am asking you; if you could " change MDF" into data with qualities what qualities would you believe would lend to compression?
I'm starting to venture into this 'State-Space" from the number-milling I do and I find I am not prepared since my data set has been MDF almost exclusively for over a decade and I did little with proper data-compression works before that.
So what would you choose if you could "transform' data into other forms? What qualities would you seek?
In humorous terms What "White Rabbit" would you follow?
I am asking honestly. I have not followed traditional schools of thought in this Data-Compression hobby/challenge and now I am facing these questions. What is compressibility?
I have come to terms with "entropy" but see that even learned folks disagree on it's exact meaning. So context is king I assume.
Again what makes for compressible from uncompress-able.
Thanks.
On a personal note the employment season is about to start here so that is a positive. I plan to reclaim Internet service at home at some point but it's months away still.
So, friends.. I cannot put this "book" down however at some point I must return it to the library of life. Until then I do not see myself giving up on this quest.
I appreciate any input you can share.
@Vacek Nules
Thanks. I assume I could have search for an answer but conversation is rewarding too. Thanks.
Hilbert space is exciting. Robotics is indebted indeed.
Now if we can do for the process of thought and conscience the same then we will truly have State-Space. I do believe our minds work on mathematical principles.
Isn't it funny we seek to design machines that think when we are one already.
I have been exploring Plato so I'm on about such things.
Again thanks.
"So context is king I assume"
!
You deserve a gold medal Ernst but I am tooo poor to say why
You are asking the very awesome questions...........
Well Alan, I would say that context is King of data compression in the same light that Higher Arithmetic is the Queen of Mathematics.
I should also be more sincere in general here. I am being a Newton for sure with all my "__________"'ing
The works I am experimenting with is a Mathematics that allows us to transform some "data" we can call "object" to billions of other forms of data the same size in bits. I discovered this mathematics.
I am now considering David's S(.) transform because it promises to "bwt-style" sort a data without overhead.
Thus iterating David Scott's S(.) transform is realistic.
I'm at my local coffee shop right now using the Internet to see if there is existing code for David's S(.) transform. That would save me from reinventing the wheel although I shall continue to learn what I can about the BWT in general and to absorb the fantastic construction that is the S(.) function.
Imagine Alan that given an "Object" of say 4096 bytes, that I can present to you thousands of versions of that object for just a few bits of cost.
Imagine that from that set one of those instances (I call them eXpressions) is selected for some quality(s) that lend to construction of a data that will compress. We are now constructing a data rather than simply exploiting a data which as we know has a limit the MDF challenge is based on.
In general I think this is the path we are all wanting to walk. This mathematics is something useful and I must learn what I have to to properly report this discovery.
@ David Scott
Dear Mr. Scott. I am unsure as to if I have an email account I can use since I am still needing to pay my bill with the phone company but I intend to contact you privately.
In short, if I am reading things correctly, you S(.) transform requires no overhead. It costs us nothing.
The news then is I can iterate your S(.) and we can construct new data from old. Perhaps we can construct something compressible from that which isn't through iteration and modification.
This is Mathematics so it's not a fluke concept.
Okay guys I need to see if I have access to my Gmail still. I may have never written the PW down but the computer knows. Until I have recovered myself I will be checking this forum for news.
I wish to let David know that there may be something more exciting to come for his S(.) than what he has already done.
Also I could use working code if there is some. I have to check on that. I mean I could write my own code but it's a lot of work to do so I am sure. So Link me please to David's S(.) code fi there is any and I can start experimenting with this.
I have a good foundation with the mathematics of my eXpress() so I want to see some results of iterating S(eXpress(object).
It's that zero cost that is a gem David!! Well Done!
Oh and I don't see why I can't share the "Number Generator" results with everyone and they can work on it while I keep the mathematics behind it closeted a bit longer as I work to publish properly.
I have a lot to do and I think some Junior College classes are in order since I an 20+ years out of school and don't remember how to write or even how to solve a quadratic equation now.
Oh I am accepting donations to that end! I need money to go take classes. I'm poor and I will go if I can pay.
Okay then.. Till the next outing to find an Internet connection.
Until then I have no Email to use until I pay my phone bill I assume. That will happen but not as soon as I would like.
Ernst
Offering to make my encodings the new MDF file.
If you can compress my encodings you compress MDF.
Okay, Good News..
I can be Emailed at quintery747576@Gmail.com
Also I wrote David Scott.
Hey I have to do this and sooner is better than later so I'm now aiming in that direction. I mean I could die and no one would ever know if I don't right?
I welcome suggestions on how to proceed.
Does anyone have an OpenMP friendly version of Scott's BWST or S(.) ?
I started to modify David's code but I thought to ask if someone has done the work already.
-------------------
Other than that nothing much different.
The Number Generator is approaching 83,049 I think it was 83,005 the last time I looked.
I thought to be clear this is a swath of data ranging from 0 to 2^20-1 but it must be realized this as a parallel for loop so those iterations are divided between 8 threads.
By that division it cannot be said that these "last few" matches to complete a set of all 40-bit values that make up the MDF, are rare or even last. It is simply the results of a search order.
In addition to that it can be considered that given enough inputs there will be a reasonably uniform number of outputs generated for all sets.
Currently I am generating 8192 sets for each "input" and over time all sets will have set sizes approximately the same as all other sets. It just takes time to find them.
That's all. It seems to be a few days away before all 83,049 40-bit words are found. After that, I expect to let it run till winter so there will be millions, hopefully, of matches to the MDF.
The Goal after that is to find some indexing that allows each code to be represented with fewer than 40-bits. 39-bits each will do.
I figured that to compress MDF we need to be able to encode the whole file. By that I mean our encoder must first represent all possible values.
As it would be there will be many more codes for each 40-bit value of MDF than just one. I believe surjective is the correct term for this where given the set of 40-bit words that make up MDF there are codes that point to those elements No one code points to two 40-bit values of MDF but more than one code can point to just one 40-bit MDF value.
Oh hell, I'm learning as I go. It's coming along.
http://www.youtube.com/watch?v=hgDaKL_YpKM
james gates jr + Strange Computer Code Discovered Concealed In Superstring Equations!
Wouldn't it be cool if we will know the actual codec of all things in our lifetimes?
Just thought to toss Alan an Info-Bone..
A better version of that video..
http://www.youtube.com/watch?v=GZKmyMaG40o
also http://en.wikipedia.org/wiki/Adinkra_symbols_%28physics%29
I am back at the Coffee Shop to set up my Voice Mail.
I have never had a Cell Phone before so it's awkward.
I looked at the total unique matches of the number generator on the way out of the house and it had 28 more to go.
Again I say, this isn't an instance of least nor does it represent some frequency. This is simply a product of a shared iteration of a for loop among threads and the numerical range I am searching.
Over a large input set the outputs are more or less the same for all sets.
I will come and post when the proverbial "last" match for a complete set of the MDF using words of 40-bits in size when I have it.
I plan to let it run into the Winter of 2013 so I have many choices for each 40-bit Element of MDF.
This also is not a statement of magic compression.
If there is to be a result that satisfies the MDF challenge it will be based on clever indexing of those surjective codes the number generator has generated.
It's possible I will be busy with returning to school so it's possible all I will have time for is to verify and release the data and let others attempt to index the data.
If not then I may attempt finding some indexing myself.
What is important with what we call Random Data is that we need a codec that represents all the data and in a few days I hope to feel more confident that indeed such a thing is a reality. I'm rather confident the theory is sound.
So, Good Luck Challenge people. We can give up the day after we die :)
A "superstring equation" could be said to be a way of showing "the fifth dimension".
A "computer code" is a code about things "adding up". A "code" is an alternate view of a thing, one could say.....?
A "computer code" would appear to be "an alternative way that something can add up".
Therefore a "computer code" implies "an imaginary string", or alternatively: a "theory of a string" or it could be "just a super string" (a higher d string".....(???))
So "computer code" EQUALS (in minimum pattern definition (?????)) space) the information idea "superstring".....(???)
So to see both "superstring" AND "computer code" would result in "strangeness" in one or the other.
The fifth dimension allows SPACE to be seen, if there is any space in it, to actually identify at least some of this space would result in seeing "a strange computer code in a super string" (?)
Although you said "superstring equations" so that would give a mixture of space swaps (3 and 4 d 'random numbers')
off-topic (?) question
Yahoo! are closing their e-mail classic system.
I wish to copy and paste my data (e-mails and files) - it will take over 100 hours copying and pasting each file unless there is away to do this in bulk. They imply something called an "e-mail client" can do this via something caled "IMAP"- I looked these up on Wikipedia but where do you buy the appropriate program?
Alan,
I recommend you get Mozilla Thunderbird. It is free and you will find help readily available due to its wide use.
And of course, it is free.
A good IDE would be to suck all your email out if the Yahoo! account, then back up to a new gmail account. Keeping your email on a local PC is just not a very good idea. I think the only good reason is if you are engaged in criminal activity.
- Mark
Hi,
thanks, however I have looked at Mozilla Thunderbird before and it seems to require considerable expertise to use. If I could only buy a program to do this job I would pay for it...
I have been alerted to "Windows LIve" but it looks complex, something called "True Switch" looks more promising but they don't seem to say what it costs or how to pay for it.
If I was a multimilionaire I would consider paying US$10,000 to get this done...., if that's what it takes though it shouldn't be that hard
I don't know anything much about where my e-mail is kept, I assume it is on a remote server somewhere run by the e-mail company, I just use web-based e-mail like many people do. I have no interest in criminal activity, in fact I am missing out on a lot of things because I refuse to lie, I refuse to pretend to have read or agreed to the incredibly long, very complicated, and often impossibly or impracticably demanding "terms and conditions" that corporate lawyers try to force on to their would-be customers.
(Just as well I have aparently solved many major mathematics and technology and physics issues- I may be able one day to offer people lying-free technology. If companies are worried about frivolous lawsuits they should say that their terms and conditions only apply in-so-far-as customers use the service at their own risk and that the company may be protected from lawsuits by what is in the terms and conditions, instead of a mutual lying situation where users of facebook, linkedin, Yahoo!. Youtube, i'tunes, Gmail, and many more so-called services pretend to have read, checked in case of changes (re-read daily?) so-called "terms and conditions" and the company pretends that the customers have read and agreed to the so-called "terms and conditions". (I heard that the rather dodgy tv show "Southpark"
has looked at this issue- I am not alone in noticing this it appears)
There may be a usable e-mail service in Germany where their laws appear to be more honesty-friendly, otherwise I may be lost from this forum unless I can get dispensation by entering an e-mail address after it no longer functions.)
some comments re: so-called "random numbers" (they are not numbers....?)(they are ________________(Censored for commercial sensitivity reasons ))
This is censored for commercial sensitivity reasons or suchlike:
What is "random"?
How know a __________ is 'random'?
___________________________________________________________________________________________________________________________________________________
random numbers are continuously _____________________
______________________________________________________________________________________________________________________
= computer code
i.e. __________
______________________________________________________
random number 1; 1,2; _______________ implies imaginary ____
a fixed direction in space
non-random number ((implies))
a fixed direction in time
__________________________
3 or 4d random numbers: quarks and gluons
2d random number: ____________________________________
3d random number:______________________________________
5d random number: a probability width width
a probability depth
a __________________________________________________________
4d random number: a minimum path to connect a collection of large and or distended 'objects' in space
6d 'random' number: no such thing unless have 8 (?)
(table)
7d 'random' number: no such thing unless have 7
(periodic table)
8d - doesn't exist unless have 'elements'
then it could 'exist' if (only if?) the elements are (sub) divided into 7s and 8s
metals and non-metals (?)
if you have a periodic table of the elements then you can ____________________________________________________________
the 5th dimension quantise (to at least some degree)
(10d space time interferometer (???))
5d random number: quark _____________
Gluon ____________________
....
hey guys!
Couldn't take another hour at the keyboard at home so I had to get out for some wifi action. LOL If only a computer was a woman I would be popular with my keystrokings.
But It's not lol..
Lets see. the count seems slow with 12 to go still. I may be mistaken in how waiting to find them all isn't like finding something infrequent or rare.
Yesterday I assume it too all 24 hours to find a new match while the total matches is possibly 1,000,000 by now.
If it takes a day then 12 more days if it gets even more stretched out then that is that.
Did I ever mention the MDF is a real pain?
Well what is new... I reworked the BWTS(0 archive from David Scott's site so it is an in-line single call encode/decode with OpenMP action happening.
I bit of testing in application to do and it will be safe to share.
This version aims at the multiple "file" process or bulk packet processing.
I have 8 cores so I can provide 8 different datasets and it will churn away.
This function can be compiled without OpenMP and will function as a single thread function.
So, ahead is the stroke I will have waiting to see a complete set of MDF in 40 bits and in a surjective codec. Meaning that instead of finding a code that is based on source I have many codes that point to source to choose from for some possible compression scheme.
Other than the stroke from waiting I now want to interface the eXpress() function with this Bwts() and process thousands of "forms" (to borrow from the Great Plato) with it.
I'm giving thought to what qualities I want to search for.
What qualities would be ideal under Bwts()? I assume a run of length of data is ideal but that isn't reasonable.
I figure runs of same data are important. That something akin to RLE might be used to measure the quality of that expression.
Lets back up a bit. :
Imagine that processing any given data is possible by sending that data into a function that provides thousands of "versions" or "forms" of the same length string.
The data as viewed by characters then changes elements but not total bit length. That remains constant.
That these forms of the data then can be selected for a small cost of a few bits and can be use in place of the source suggest to me that for some string that has a quality of being made more compressible with Bwts() may have an element it that eXpress() set that is even more compressible.
I am learning what compressible means so please be kind.
So given the idea that we can select from (okay billions and billions and billions) of elements with modest processing of combines systems suggest to me that we have the ability to select a surrogate data that has the qualities that will compress.
That is where I am right now. I am starting that project of developing the software to generate those billions of surrogate data.
I mean to say interfacing the two functions already "off the shelf" Here.
i don't know what exactly to do to apply statistical analysis to this. I admit there is a limit to my formal education and thus my skills need developing but my math is solid. Well it is the math of the realm no isn't it..
So, friend! I am starting to work with Burrows-Wheeler transform Scottified. Bwts() and any suggestions of what makes a great output even more sexy will keep these fingers fingering my keyboard.
Some of the things I thought of to look for are runs bwts() generates as in the best score.
I ask myself what sort of scoring can be done.
I ask myself what makes data compressible.
I have the theoretical ability to iterate the process. With Bwts() there is no cost so I can apply that at will but with eXpress() there is some overhead of indexing so it can be reversed. very few bits.. Something like 2^16 for 4096 bytes not bad.. 2 bytes spent to pick among thousands of forms of data.
So, I figure I could be moving data towards some quality with iteration. It's possible. Then knowing, in a sense, that going this way over that way is better for example is good enough to experiment.
Hey you simply cannot keep trying for over a decade and not make progress so it's over when I am dead but it just keeps getting more exciting.
Again advice is welcome.
@ Alan
Hey,
What the String Theorists are lacking is a data type. PROOF.
I have it. I am working with it now. How else can I get a billion versions of one 4096 byte data then need a Bwts() that can process by the truck load looking for the Gem of a form with the small overhead of 16 bits to reverse things.
No one is going to take me serious when I say I found the "spinning strings" of String Theory in 2010 and am working with the mathematics now.
I am going to say it anyway because the real loser is the bingo player who knows they have a bingo but waits till someone else claims bingo after the next number to yell bingo.
So think of me as King-Kook but I have a piece of the String-Theory puzzle we are looking for with String-Theory.
I have the data-type and it runs of our computers already. The "Spinning-Strings" are real.
I hope to publish the Proof if I can manage to get some schooling going. I think I can do something with the books I have but it will help to take English and Maths again so I have an updated skill set.
I have given this some thought and I am in-between being a Newton with my solitude and my binary-alchemy, and, being a Tesla with grand-visions of application of theoretical works he most definitely proved he could "pull out of his mind like a magic rabbit."
I hold both Tesla and Newton is high esteem.
So I may be laughed at but "BINGO' I have a something to contribute to String Theory, data-compression and cryptography.
There I said it and let it be.
All I need is a job that pays enough to go to junior college and enough to live on so if anyone wants to take a chance on me I'm open to offers.
I come with technology you can't find any place else and 5 cats.
@Alan Well.. over 10 years at this you know I had to find something that kept me in the chair besides the graviton-action at a distance.
So again.. any suggestions of what makes a Diamond out of coal in the data-mine that is eXpress()+Bwts() is welcome.
I suppose I need to study statistics so I can measure frequencies of individual elements.
I told you guys I need 1000 CPU's with 16 cores each didn't I.. That an a Grant to pay the electric bill and the Pizza-Man
Well there it is.. As Kook as it Gets! String Theory data-Type and all.
Good Luck Challenge people!
All my typos make me read like I come from a non-English speaking country..
How funny.
Spinning-Strings anyone?
This looks interesting and so I thought to share..
http://en.wikipedia.org/wiki/Algorithmic_probability
It has some insights of examining data. I can use links like this.
"i don't know what exactly to do to apply statistical analysis to this"
sample 24?
Thanks for a kind reaction to a Stark-Boast.
I think what I was trying to do with the last trip out to use the Internet was divine what methods and tools I'm supposed to know already.
I found a You tube video that is helping. http://www.youtube.com/watch?v=Ba0zSNYkWtw.
Complexity is the topic I'm searching for now. What seems really cool is now that I have a foundation in understanding the structure of information I find I have confidence to "sort through." Bwts pun.
As to value of the "discovery", I believe it can be patented. It's a lot of Mathematics but there seems to be grounds for a patent. After all isn't our Patent system there to encourage people like me to share the news?
I am willing to join-up with established IT now. I need a Job and if I can work/study then that life would be better than the life I have now.
Update: 7 more matches to go on the surjective number-generator codes.
I'm still thinking I will let it run till Winter and have 10 million or so matches to pick from.
I suppose this technology could be implemented in real time. It's a code after all and if we don't have to do a brute force search from scratch it should be as fast as a look-up.
So, 7 more to go for a "complete" set of matches to the MDF. After that it's a question of how to select elements that construct a compressible file from the surjective codes (SCODES).
Thanks for reading. I think this, live this and dream this. I spend every day working at this so it has become my life.
Again I'm open to income opportunities if anyone wants to jump on this too. It never hurts to network and I am looking for work.
Now to find more text and videos to study. I'm working towards understanding how we use Complexity to understand information in a system. I think I am starting to understand what http://en.wikipedia.org/wiki/Kolmogorov_complexity is. I am still thinking about what makes a better data with Burrows-Wheeler Transform Scott'ified.
I am designing software that will give me 2^32 choices for each 4096 byte segment of MDF. Well that is 2^32 choices per iteration and I can employ Bwts() with no overhead so the cost per iteration is just 32 bits.
It's important that I understand what I am looking for or I need to join with others soon. Well, that and write a proof.
I seem to love doing things the hard way huh.. Why couldn't I just give up like most folks :) <--- Because I am a true Fringe. LOL
Again thanks for a kind reply.
another curious thing:
people think that the million digit file is very "random"....?
What could possibly "de-randomise" it- what "force" could be perhaps say holding all those 'random' digits together?
I found that the information pattern called "the force of gravity" appears to be "the natural computing of space".
It seems that this supposedly very "weak" 'force' is stronger than the so-called 'randomness' .....
Not surprising? That only a very 'weak' 'force' could be capable of re-organising all those 'digits' in a more noticeably "organised" (?) manner?
Those are ideas many of us think about all the time.
I find I even go back and rework things I came to have a reasonable confidence "didn't" work and it helps sometimes.
I intend to patent if possible since I do need money. I got in touch with my brother and I think I will open up and share what I am doing with them.
I was looking at the report and it looks like the number generator is in another slow stretch after having a "spert" So I will keep a general watch on how long between but we were down to 5 more to go.
Again I stress this is NOT compression this is an encoding where there are many codes pointing to a few objects (83049 40-bit words of MDF in this case)
The worst case size is 48 bits to generate 40 bits so there will be some work to find some indexing that will bring it down to a uniform or average of 39 bits per 40 bit word.
At least, my Challenge Friends, we are not stuck with just the structure of MDF any more.
I am here at the coffee shop because some bastard stole my motorized bicycle and with the help of my brother I am shopping for another or maybe a used moped.
Just when it looks really hard to make it month to month someone kicks you down. F'in bastards
So what a bummer. I am very distracted now.
I was reworking Scott's bwts() and merging it with the new function eXpress() but I'm off my game now till I know I can report to work when the grapes come.
They stole my only way to work.
Again folks if any one can employ me I have talent and technology. For example say you give me some data. Some length say 4096 bytes I can give you billions of versions of that with only a cost of 32 bits overhead. Naturally I am saying it's reversible.
If we cannot find a string that can compress out of that then it's useless but when it comes to mathematics the clever rule so hey! I need a new life. I need to get out of poverty.
I'll drop in and post any time I have access to the internet.
I doubt I will do much work on the Number Generation data till winter. I possibly will get some work done on the bwts()+eXpress() in a month or so.
f'in thieves steal a poor man's motorized bicycle and stop him form working a low wage seasonal job.
Ernst that sounds like rough news, hopefully the law will find your motorised bike/ you'll get a recognition of what you can do by the commecial world.
It has been said that "one cannot compress all of the files all of the time". However, it is clear that many versions of the million diogit file can be easily compressed: e.g. a milion 1's. a million 2's, a million 3's, etc.; a hundred 1's followed by a hundred 2's.... to a hundred 9's, then repeating; and numerous other patterns.
So the list of "all possible million digit files" can easily be compressed because a number of those files are easy to compress.
If the actual million digit file is on the list of easily compressed patterns, then it is easy to compress.
Can you, Mark Nelson, refute this argument:
I presume you agree with the logic so far, now:
All that is needed to easily compress ANY million digit file is a method for converting any particular Million digit file into an easily compressed pattern (and presumably converting all other possible million digit files into different arrangements so that possibly files that were easily compressed no longer are).
The actual million digit file can itself in theory provide at least part of the instruction- the file tells the computer that it is one of a large number of possible files, and to change all the possible files so that the actual million digit file is easy to compress. So you don't have to compress all of the files all of the time, you just need to allow files on the list of all possible files to swap places till the file you want to be easy to compress is an easy pattern.
Of course you then may have to have a way of analysing all possible files to decode what the actual now code-written file actually is ( I apparently know an answer which the commercial use of I am looking for a licensee or sponsor)
Hi,
I wonder something. Basically, the claim is that the binary file has maximum randomness (entropy), so it is hard to find a pattern. That is maximum state of entropy or close to. How about allowing the original file to mutate in guided way. For example if you apply a 8-bit mask to every byte and XOR the mask to every byte, then try to compress with a known algorithm. If you manage to decrease the complexity with 2 bytes or more, you have done the task. Out of 256 possible masks (mutated files) one state should be lower in entropy than the original. If that is not enough, you could apply bigger mask (2, 3, 4 bytes) or apply two masks in consecutive order. The number of mutations of the original file could be astronomical, so one state should win. You can always reconstruct the original file if you keep track of the mask and the operation (XOR or something else). What you try to compress would be a mutant which has a clean track to the original. I have limited programing skills in bitflip operations, but I am sure it is not so hard to try that idea.
Kind regards,
bloodIce
Hello BloodIce..
@Alan.
This encoding I am testing out is the Version 3 of the Number generator and as I left to come to the Coffee Shop and use the Internet there was only one more to find!
The basic concept with surjective codes such as this is in theory there are many codes that map to object. In this case since we cannot find an exploit with random data then logically we must come up with something that can be compressed that will generate the MDF.
In this case I hope to have a few thousand choices for each 40-bit word of the set of 83,049 words of MDF.
I hope there can be some compression but I can't say what will happen. I must trim 9 bits off of each selected symbol so I am looking at RLE-type encoding now.
I will not be done generating the SCODE(s) till winter as it has only generated less than 64,000 matches out of the sample range of 1,048,576.
I do think this will get done. As for a general compressor I think this particular application could function in real time if a dataset was available so as to skip all the time it takes to run through 0 - 2^20 and check for matches. I would assume something more elegant can be crafted. Nothing is written stone with these maths.
About my transportation. I have received help from my Brother Tom. What a guy! I will be ordering a moped today. 150cc should get me to school so things are turning out okay but what an emotional thing. A man and his motorized bicycle, which he put hundreds of hours in on over the years, is not just theft it is an assault. The worst thing is I figure no one will love and care for that machine now. Motorized bikes always require TLC and parts. So they get my bike but they just don't have the skills I am sure. That is really the sad thing. That bike will be ruined if it isn't already. What's the point of taking it in the first place? Drugs possibly.
On a bright note if it was about Meth then there was a 290 pound local bust yesterday here. 290 pounds of Meth will more than likely only scratch the surface of the problem but I felt good hearing the news.
So Alan, Mark and all I think we have turned a corner on data compression and I will do my best to get the proof written and published.
I think you all will be pleased.
I plan to open a website where I can get my email and post news.
Also I will accept donations (if it's legal to do so) when I get going to school (If I need help that is).
I have to check with the Veterans and see if I have any education benefits. I may have some monies I can access there too. Also there are grants and maybe even some aid for older people?? It can't hurt to get help when a maths book is $120.
All in all we get knocked down and we get back up and keep swinging. To hell with bike thieves.
So, I cannot say at this moment that Version 3 has found a complete set of 40-bit words of MDF but with only one more to go it's exciting to say something here anyway.
I can afford the price of a cup of coffee so I can come down with a cut and past which will prove nothing but what good will it do me to lie?
I mean I can only make an ass of myself if I am not being honest after what a decade of work?
So until then. Keep banging those data compression rocks together and good luck Challenge People!
Oh and do feed the Moose.
@BloodIce
I have worked several hundred angles and yes yes yes please try what you think might fly.
Xor is useful and it does reverse so you can work with such things.
In my experiences I found that most schemes seem to expand in the 1.5 times source range and that 1.5 is something I see often in codecs I work. Take the Collatz conjecture dynamic function for example it cannot have a "11" state in the state-space of 2 bits so the only legal patterns are {00,10,01} and it expands to 1.5 times the source if we keep both parities separate for the even and the odd functions.
So, I got used to writing the programs and seeing what it did. Perhaps you can experiment too. I am sure I have not thought of everything.
My advice is be true to yourself and if you win you win but even if not, you win by doing anyway!
Hi Ernst,
I do not think that the source for my idea is more than 20-30 lines (in python), so the source expansion is not a big concern. It is interesting to know if a external call from python to gzip will be fine or I have to implement the compression in python too. Basically, the idea is a loop in python with XORing every byte of the file (already in memory) and test the size after compression with gzip. There is no requirement for time of compression... :-), so I can apply millions of masks (every result will be totally different state). If that doesn't help, we can use another bittransform operation.
By the way, I am NOT competing or anything (100$ are two hours at my workplace), it is just my curiosity.
So folks, are external calls (gzip, and all needed python modules) fine or everything should be in the decompressor?
/bloodIce
one
state
is
"entropy"
@Alan,
Please correct me if I am wrong, but if you shift the state, than you change the entropy. If I transform the file, I assign a different state (it is a new system), which has a different entropy. I am not saying that XORing is the key, but it is some initial idea.
Btw, is there better compressor than gzip? I see that it is some sort of optimal for linux, but it also adds extra garbage to the file (timestamp, filename, checksums, etc...).
/bloodice
Consider:
suppose that a University student has a bedroom. Suppose that it tends towards "disorder".
Someone could notice the state that the room is in.
How could ___________________________________________________________(censored) If the state was ___________________________________________(censored)"?
A ______________________________________(censored) 'time' ???
If there were only so many "states" that the room could be in, and only so much time available for the room to configure into one or other of these alternate "states"- (_________________ (censored))then the actual state the room is in (or the time at which that state occurs- is not so significant- what IS more significant is "the calculus of alternate orders disorders" (known as "the (approaching) category of imperator" -
the state of the room is "obvious" when all other differing states are incorporated in the "state vector (analysis) of the rooms sum of all possible (alternate) states "
(also called (The) THEORY of every THING (the cat in the Hat!)
What happens if I inject order in the room and compress the way I have done it? Say I order all socks but keep their original coordinates and compress the list :-) ? I would put the system in a new state, wouldn't I? The question is if the initial information of the state could be preserved and compressed.
There is no need to "preserve" and to "compress" the initial information of the state-
the initial information is the state
@Alan:
Please, no more posts of this type.
If you have useful, on-topic information to share, this is the place.
If you have information, but you do not wish to share it, this is not the place. Let us take it as axiomatic that you have things you believe to be true but want to keep to yourself.
Now that we all know it, there is no need to repeat it.
- Mark
Mark, you are surely well aware of the "silliness" of this challenge- in that if anyone really was solving it, the potential value would be more than US$100, and they would be solving "P vs NP" among other things.
One can only hope to say a limited amount, to protect potential commercial possibilities, and hope that someone like you would realise they actually may have found something of possible interest to information technology companies, without knowing detail, but it seems your attitude continues to be very negative...
However I am having to be very careful to avoid saying too much.
Do you expect someone to just give away the answer, if they actually had the answer?
Your use of the expression "... that you have things you believe to be true " looks like you are skeptical- I have things- how viable those things are (those unwritten words/ phrases ) is for a non-disclosure-agreement-signed serious honest investor/ party to decide
It is interesting to test my ideas as far as possible against computer people and skeptics, though it is risky as I have to refrain from saying much.
would it be true to say: that a number of "experts" in a field hate the idea of those who they regard as of low social status challenging their high position in their field?
Physicist Erwin Schrodinger reportedly made his "wave equation" look more complicated than it actually was, as he was afraid it would look too simple to his colleagues.
? ? ?
@ BloodIce
"So folks, are external calls (gzip, and all needed python modules) fine or everything should be in the decompressor? "
I believe this question has been raised several times over the years and I believe accessing common libraries, using existing compression software or other common and equally accessible to all is very much okay.
I also believe that Mark is allowing unlimited file size if there is some scheme that "compresses all files."
Honestly, I have not accesses those functions yet but I plan to use them some time. I even think Bzip2 has function calls and also I think Mark has a tutorial on this site to one compression scheme to be called from within a program.
While I am posting the Police found my bike last night and returned it to me. They arrested a man on felony charges for the theft. He or they took off the accessories, damaged the clutch and throttle assemblies and basically did $600 in damages. Meanwhile my brother Tom, sent me enough money to buy a scooter so I can now make it to the Junior College. The scooter will be here in about 15 days.
In a strange way this is a function of Karma in that while this was a bad thing good has come of it for me. I was wondering how I would manage to travel the 35 miles to school on the bike @25mph. I figured I would be riding the bus.
On the progress of the number generator. I made a mistake in counting to myself. I always think of zero being a valid count but I am counting from 1 to N not 0 to n-1 with the unique counts so I am waiting for two more matches not one. Still two to go as of this morning.
Back @BloodIce..
Good Luck man.. I find it fun no mater what and I hope you discover something useful there with your idea.
@everyone.
"Please correct me if I am wrong, but if you shift the state, than you change the entropy. If I transform the file, I assign a different state (it is a new system), which has a different entropy. I am not saying that XORing is the key, but it is some initial idea."
BloodIce asked this..
I am still learning what is meant by entropy in "information theory" but I agree that we are trying to change data in some useful way.
Some other traditional understandings is pigeon hole principle.
In general if you are transforming data there will be some cost. This is an area of study too where we see more information coming out of a data because we do a thing such as trying to compress "random data" and this is easily demonstrative by taking the MDF and make a "zip" file because it will be larger than the source.
The only transform I know of that has zero cost is David Scott's BWTS()
In general, in 10 years, I never saw a way to magically toss even a single bit away and magically get it back every time.
In general the term "demonstrative" applies in that some reference is necessary to derive context in data encoding and decoding.
In, general, the laws we are working under demand that existence of information is a constant and lucky for us existence is not conditional and with that we cannot toss out and get arbitrary bits back without some reference that allows us to know what is what.
As I said before I am sure I have not thought of everything so I'll keep an open mind on any news that can accomplish tossing bits and getting them back somehow "magically."
On the humorous side of things I am really liking the idea that the Universe is a Hologram and that we are puppets in some sort of puppet show. I figure we are all going to find the projector someday after we discover the "Codec" that makes it all possible and then we will realize that there is a projector of the projector in some other puppet show.
However, what is not funny is that all information exists already and is a constant. We discover we do not invent.
Now to chat at our resident "crazy-blankster." LOL @ Alan
@ Alan
"Solving it" may well take a lot of baby steps. As I wrote in protest before it may well take a fringe type such as Cantor or Newton to see some "digital mysticism" that will lead us to that "vision of our Holy Grail."
So while it is painfully obvious that there are limits and that Shannon is our main hero we still are dreaming our way past Shannon's limits not by proving Shannon a fool but by placing Shannon in a context where we can recode and compress (in theory).
Not one person here is claiming magical powers of data compression and I would believe that everyone who has every picked the MDF file apart in earnest knows just how hard this challenge is.
So I'm all good with a new guy showing up and having ideals and concepts to try. Perhaps it is a path that has been walked before by many or maybe not but (they) can grow intellectually by trying.
As for crazy.. We often see success presented for people like http://en.wikipedia.org/wiki/Leonhard_Euler but even he had failures.
So, with the Xor idea yes it's a valid approach so I say he should look into it for sure. If he can modify the bit patterns and keep the expense of such a thing down it's possible he could squeeze some out of MDF.
I have used Xor before and I like Xor. Bit shfts, static rotations and now I am studying the BWT with Scott's work.
Who knows maybe some sort of BWT sort crossed with some sort of Xoring may well offer the mechanics for a proper Codec.
Oh and we all realize the basic thing is it has to be a Codec. That means what can be done has to be able to be undone to be useful in data encoding.
Alan.. You asked about string Theory before and I thought to say I watch You Tube stuff so just wander through the lectures from MIT and such to gain an over-view.
I was saying I think I found something useful and I believe that it will apply to String Theory as well as Information Theory because I think the two are the same in a common way.
I also say to everyone I am not qualified nor educated enough to prove this belief but I intend to try and share what I can.
As Plato would say "to each as they can contribute to society."
So Alan your "Blanking" isn't funny all the time but it is cute once in a while.
The bottom line is throughout history there are more failures than success in all of Man's efforts and as a Challenge it is nonetheless true here as well. Keep a good spirit and be a good sport.
Folks are spending their lives trying to accomplish this Challenge.
@ Alan
"There is no need to "preserve" and to "compress" the initial information of the state-
the initial information is the state"
Perhaps there are "Quantum" States of any string Alan..
What then is P/NP ?
What if there is a function that given some binary string Y there exists a set of "N-states" we might call quantum-states. Imagine that we can select one of those strings in the stead of the referenced string. This is a truth Alan.
So do I think we can solve this challenge? Yes I do.
Here is one from MIT that I have been seeing parts of on You Tube.
I just downloaded the course materials myself.
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-050j-information-and-entropy-spring-2008/
simplicity:
what is "space time"?
what is "time"? "It is dinner time"- implies some limit- something in between "before" and "after".
A pendulum swings "between limits". A clock face has a beginning and an ending- it goes around from 12 (back) to 12.
So the information concept or pattern "time" appears to involve what one might call "a minimum defining characteristic" of the concept or pattern "anywhere within limits".
What then is "space time"? That is, "space anywhere within limits"? Just that: spacetime = apparently "space anywhere within limits".
The "birthday paradox" showes there are limits on any (set of) data. There are only so-many ways of placing birthdays ("beans") in the "boxes" of the days of the year.
A form of time?
Space in that time would allow precise identification of any partcular arrangement of those birthdays. The "Shannon limit" is shattered? Now the "information" Is "the entropy".........
(incidentally an example of "space time" is a ruler.)
bloodIce:
You can xor ever byte with some data, but you still need to keep track of that data when you go to decrompress the file.
Worst case scenario, if you use unique data for each byte of the MDF, the data you need to track to decompress will be as large as the MDF.
If you use the same data to xor every byte with, you won't get any better compressibility, simply because there's no pattern in the bytes of MDF.
You can try something in between the two scenarios, where you xor with data less than the total MDF size, but still large enough to improve compressability (Say, data 1/4 the size of MDF), but that becomes a battle in trying to find a pattern in MDF, and as far as everyone has been able to tell, no such pattern exists.
Remember the rules:
Size of all data necessary to produce the MDF + size of your decompressor program (source code size is acceptable) MUST be less than the size of the MDF.
That means you can't xor the MDF with some data to produce a tiny file but not count the size of the data you xored with.
And yes, you're allowed to use standardly available compression libraries.
@Zen:
Thank you for your informative comment. I agree with your conclusions (not to mention that I tried some primitive implementation of my idea, which did not work). As you point, the battle is indeed to hope for a pattern, after the transformation. XORing the file with itself will produce a file of 2-3 bytes :-), but that will be useless. The idea is to XOR with very small mask (1-2 bytes) and to hope that the new file will have a pattern. Well, it did not. I still believe that from the unlimited number of possible transformations (n-byte mask, subsequent masks, masking only n-th byte, many more combinations and new ideas) one will have a small pattern which will reduce the size with some bytes (squeezing several bytes). However the source code of the decompressor will need at least 50 bytes, so the idea might not work. But, if the compressed (source+mask+basic_logic) is one file and the decompressor is a standalone program (not included in the count), than I think it will work through violation of the rules :-).
Ah, BloodIce!!!
Welcome to our nightmare here with the MDF I think your gonna like it. http://www.youtube.com/watch?v=iQE0pfBAYQ8
It might be seen as somewhere between Alice Cooper and "Penn and Teller : Fool Us"
The idea of finding some mask is attractive. I know ways to generate data that can be used for a mask if you need some suggestions.
Just don't feel bad for trying. There are some who will want you to.
@Alan
Space-Time is the relationship of space, time and the speed of light. The actual maths of it I have not studied.
I do think you have a point to ponder the "assumptions" we hold about the nature of our reality in not only what we see but also in what we don't.
Hey, Version three here is still running with 2 to go. If it can find them then this is a mapping of many to the 83,049 40-bit words of MDF.
By this surjective mapping I hope there are enough choices of SCODES to select data that has qualities I can work with.
For sure I don't think there is an easy pattern in the native MDF. I believe that transformation is generally the way to go.
Also in keeping with my twisted review of strange videos this morning http://www.youtube.com/watch?v=Rtkdo7bOmJc
Enjoy!
This "Update" reports finding at least one of each of the 40-bit words of the MDF.
It means complete sets are possible to construct. I'll be running this into the tens of millions of matches.
The "Cnt" is the count of matches of that element found so far.
83,049 is the total 40-bit words that are the Million Digit File and is identified by "TU"
A total of 63,600 "inputs" were used.
It is assumed here that a complete "run" is a set of inputs the count of 1,048,576. 984,976 to go.
-------------------------------------------
T: 1299105 TU: 83048 Thread 1 matched 46965151098 of FPos 9247 Cnt 11
T: 1299106 TU: 83048 Thread 1 matched 2255103294 of FPos 50422 Cnt 23
T: 1299107 TU: 83048 Thread 6 matched 1016163663384 of FPos 37322 Cnt 9
T: 1299108 TU: 83048 Thread 4 matched 960454279373 of FPos 62237 Cnt 43
T: 1299109 TU: 83048 Thread 2 matched 821312821310 of FPos 47884 Cnt 14
T: 1299110 TU: 83049 Thread 1 matched 817508182085 of FPos 9953 Cnt 1 <-- Complete set Tuesday, May 14 2013 @17:00 hrs
T: 1299111 TU: 83049 Thread 5 matched 822974257824 of FPos 40050 Cnt 12
"
T: 1299112 TU: 83049 Thread 2 matched 666014465170 of FPos 27215 Cnt 20
T: 1299113 TU: 83049 Thread 0 matched 1013269890975 of FPos 2764 Cnt 11
T: 1299114 TU: 83049 Thread 5 matched 1019051599804 of FPos 69676 Cnt 16
T: 1299115 TU: 83049 Thread 3 matched 1075627242711 of FPos 13937 Cnt 16
T: 1299116 TU: 83049 Thread 6 matched 774046476528 of FPos 38423 Cnt 33
T: 1299117 TU: 83049 Thread 1 matched 281347678459 of FPos 57139 Cnt 11
T: 1299118 TU: 83049 Thread 4 matched 1048746262368 of FPos 30292 Cnt 16
T: 1299119 TU: 83049 Thread 3 matched 4039934805 of FPos 73933 Cnt 37
T: 1299120 TU: 83049 Thread 5 matched 442186383182 of FPos 72414 Cnt 10
T: 1299121 TU: 83049 Thread 2 matched 465097151251 of FPos 6156 Cnt 18
T: 1299122 TU: 83049 Thread 1 matched 245002864460 of FPos 52415 Cnt 27
msb:1011111001010111010010001110100001000101:lsb Factor 817508182085: 5 7 109 214287859
Now I will have a choice of data to call the MDF.
Also ahead is verifying the data and crafting a database.
It can be said I have reasonable confidence the data is correct and that multiple different sets are possible to represent MDF.
But, now I have nothing to report for a while and much study to accomplish.
So, my Challenge friends Good Luck. :)
Ernst
Leave A Reply
You can insert source code in your comment without fear of too much mangling by tagging it to use the iG:Syntax Hiliter plugin. A typical usage of this plugin will look like this:[c]
Note that tags are enclosed in square brackets, not angle brackets. Tags currently supported by this plugin are: as (ActionScript), asp, c, cpp, csharp, css, delphi, html, java, js, mysql, perl, python, ruby, smarty, sql, vb, vbnet, xml, code (Generic). If you post your comment and you aren't happy with the way it looks, I will do everything I can to edit it to your satisfaction.int main()
{
printf( "Hello, world!\n" );
return 1;
}
[/c]