I mentioned earlier that I had started to write
some MPEG-1 decoding routines in C#. By now, I have an almost complete
decoder written in C# from scratch (i.e. I just followed the spec, the
standard document).
Now that I can play video segments with this
code, I have started measuring its performance. I knew it wasn't good, but I
hadn't done any profiling yet. I added some timing instructions, and my
initial results were something like this: more than 11 s for my decoder
vs 0.09 s for FFmpeg, or around 120 times slower. After further
investigation, I found that more than 95% of the decoder's execution
time was spent in my "naive" implementation of the IDCT. So I
replaced it with a simple but faster one (the Chen-Wang algorithm), and now
the running time is within a factor of five of FFmpeg's.
I'm sure that there are plenty more opportunities for optimization
in the original code, so my Christmas holiday assignment will be to
continue improving its speed. These observations certainly explain the
considerable effort that has gone into improving the performance of this
critical piece of video processing: the DCT/IDCT.