Hey Mark, should we make all the files in the MDF DU cycle available for the challenge? It’s a zero-cost transform.

I worked a bit with the (what, 5 million?) files in the MDF cycle, but not a lot. There could be a winner in that cycle, and dynamic unary is a zero-cost transform.

“The overhead of using fixed-length arithmetic occurs because remainders are truncated on division. It can be assessed by comparing the algorithm’s performance with the figure obtained from a theoretical entropy calculation that derives its frequencies from counts scaled exactly as for coding. It is completely negligible - on the order of 10^-4 bits/symbol.”

https://web.stanford.edu/class/ee398a/handouts/papers/WittenACM87ArithmCoding.pdf

I don’t think it is completely negligible, as stated, when attempting to compress high-entropy data such as the MDF. That pretty much describes why I have not seen any compression taking place. I am speculating that the overhead from the AC encoder I am using is actually more like 2 × 10^-5 bits/symbol. By the theoretical entropy calculation, my model should save 68 bits, but it compresses one bit at a time using 1-bit symbols. It follows that the overhead for the AC when compressing the MDF is 415241 × 8 × 2 × 10^-5 = 66.43856 bits, which eats nearly all of that saving. And I’m not using any bits for a termination message at this point, so the result is a 1:1 compression ratio.

Hmmm… Will using BigInteger types provide more precision, yielding less overhead, and allow compression to actually occur? Or should I look for another model in a larger space with a theoretical entropy that leaves more room for overhead from the AC? Going to a larger space means handling counts larger than 2^64, in which case I would need to implement a BigInteger AC encoder anyway. I think I have my answer on what to try next.
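
As a quick check of the arithmetic above, here is a minimal Python sketch. It only reuses the figures given in this comment (415241 bytes, a speculated 2 × 10^-5 bits/symbol of overhead, a 68-bit theoretical saving); the variable names are made up for illustration, and the per-symbol overhead is a guess, not a measurement.

```python
# Figures taken from the comment above; the overhead is speculative.
FILE_BYTES = 415241            # file size used in the calculation
OVERHEAD_PER_SYMBOL = 2e-5     # assumed AC overhead, in bits per 1-bit symbol
MODEL_SAVINGS_BITS = 68        # theoretical saving claimed for the model

symbols = FILE_BYTES * 8                       # one symbol per bit
overhead_bits = symbols * OVERHEAD_PER_SYMBOL  # total estimated coder loss
net_bits = MODEL_SAVINGS_BITS - overhead_bits  # what is left after the loss

# Per-symbol overhead at which the saving would be wiped out entirely.
break_even = MODEL_SAVINGS_BITS / symbols

print(f"symbols:       {symbols}")
print(f"overhead bits: {overhead_bits:.5f}")
print(f"net bits:      {net_bits:.5f}")
print(f"break-even:    {break_even:.3e} bits/symbol")
```

Under these assumptions the net gain comes to well under one byte, which is consistent with seeing no change in the output size, and the break-even overhead sits only slightly above the assumed 2 × 10^-5 bits/symbol.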

-Brian

I’ve been using Prof. Eric Bodden’s C# AC implementation (http://www.bodden.de/2010/08/13/ac-in-csharp/), which uses 32-bit integers, and I tried modifying it to use unsigned 64-bit integer arithmetic. I did see some improvement after making the modifications. I’m a .NET C# developer by day now and, over the years since college, have lost most of my Unix C++, gcc, and emacs/vi skills. So I’ll try porting your implementation to C# first. If that doesn’t work, I have a Xubuntu machine currently running a MythTV backend server that I can also use for development. And if that doesn’t work, then I’ll try converting the AC to use the BigInteger type in the new .NET System.Numerics namespace. I’m assuming you won’t count the use of the .NET 4 Framework DLLs as part of my program size for challenge #1? :)

Thanks,

-Brian

If you use my arithmetic coder and 64-bit ints, the loss from the encoder is pretty darned small.

http://marknelson.us/2014/10/19/data-compression-with-arithmetic-coding/

- Mark

Haven’t been working on this, but an idea to work with abstract cycles keeps coming to mind.
