Where is the CMStream.Mp4 code?

Jan 7, 2010 at 5:22 AM

There is a bug in the Mp4TrunBox parsing. It reads UInt values, but the Sample Composition Time Offset needs to be a signed integer, since it can be negative for B-frames. I can see the problem in .NET Reflector, and would be happy to fix it if I had access to the CMStream.Mp4 source.

Developer
Jan 7, 2010 at 5:37 AM

Hi,

The CMStream.Mp4 library is not ready for an open source release. Regarding the trun parsing, the Sample Composition Time Offset is an unsigned int (32 bits) according to the specification, ISO/IEC 14496 Part 12. The notion of B-frames is codec specific, so in that case the number must be interpreted as a signed int. In the .NET case, simply do an Int32 type cast. I used this to gain a deep understanding of Smooth Streaming.

Regards.
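
The Int32 cast suggested above amounts to reinterpreting the same 32 bits as a two's-complement signed value. A minimal sketch of that idea in Python follows; the function name and the sample values are illustrative assumptions, not taken from CMStream.Mp4 or MP4Explorer.

```python
import struct


def as_signed32(raw: int) -> int:
    """Reinterpret an unsigned 32-bit value as a two's-complement signed int.

    Hypothetical helper: `raw` stands for a Sample Composition Time Offset
    that a parser has already read as an unsigned 32-bit integer.
    """
    # Pack as unsigned 32-bit, then unpack the same 4 bytes as signed 32-bit.
    return struct.unpack(">i", struct.pack(">I", raw))[0]


if __name__ == "__main__":
    # Illustrative values only: small numbers are unchanged, while values
    # with the top bit set come back negative.
    for raw in (5, 0xFFFFFFFF, 4293967296):
        print(raw, "->", as_signed32(raw))
```

Values below 2^31 are unchanged by the reinterpretation, so applying it is harmless for files that only use positive offsets.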

Jan 7, 2010 at 2:38 PM

Just checked the spec and you are correct.  I guess that the fix should be in MP4Explorer based on the stream type.

Jan 11, 2010 at 9:41 PM

Upon further investigation, I think that a negative value is not correct. The negative Sample Composition Time Offset is produced by the Expression Encoder 3 when the profile is ismv. Other encoders that produce fragmented MP4 do not use negative values. With a negative Sample Composition Time Offset you get dts > pts, which makes no sense. The problem seems to be the Expression Encoder. However, I cannot find enough information in the spec to say for sure. I know that you have been investigating MSFT adaptive streaming. Have you thought about this at all?

Developer
Jan 11, 2010 at 10:11 PM

Actually, my thought is that the issue is related to the lack of specification. The problem is that the Silverlight decoder needs to receive the samples in decoding order. For simplicity, I guess, Microsoft opted to put the samples in decoding order inside the MP4 container file instead of doing any reordering in the MediaStreamSource. Under these conditions, you need negative values for the sample composition time offset of B-frame samples.

The Silverlight decoder requests the samples ahead of time and, based on the stream parameters, decides whether to display, buffer, or drop each frame.

Here I'm talking about Smooth Streaming. Hope this helps.
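
To make the timing argument concrete, here is a small sketch of how decode and presentation times fall out of samples stored in decode order. It assumes the usual relation pts = dts + composition offset, with dts accumulated from sample durations; the four-frame I, P, B, B sequence and its unit durations are made-up illustrative numbers, not values taken from an actual Expression Encoder file.

```python
from typing import List, Tuple


def sample_times(durations: List[int], offsets: List[int]) -> List[Tuple[int, int]]:
    """Return (dts, pts) pairs for samples listed in decode order.

    dts is the running sum of the preceding sample durations;
    pts is dts plus the (signed) composition time offset.
    """
    times = []
    dts = 0
    for duration, offset in zip(durations, offsets):
        times.append((dts, dts + offset))
        dts += duration
    return times


if __name__ == "__main__":
    # Hypothetical decode order I, P, B, B (display order I, B, B, P).
    frames = ["I", "P", "B", "B"]
    durations = [1, 1, 1, 1]
    offsets = [0, 2, -1, -1]  # B-frames carry negative offsets
    for name, (dts, pts) in zip(frames, sample_times(durations, offsets)):
        print(f"{name}: dts={dts} pts={pts}")
```

In this sketch the two B-frames end up with a pts one unit earlier than their dts, which is the dts > pts situation being debated in this thread.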

Jan 12, 2010 at 5:27 AM
Edited Jan 12, 2010 at 3:02 PM

The spec is definitely weak on details. I am working on understanding fragmented MP4 and Smooth Streaming, and am writing a fragmented MP4 parser and writer. I understand that MSFT needs to compute the pts. However, using negative values is wrong because it results in a dts that is later than the pts. In addition, storing a signed value in a field that is specified as unsigned is something I have never seen in any video spec, and is very unorthodox. The Flash encoder uses positive offsets, the numbers make sense, and B-frames have pts == dts, which is normal for video systems. MSFT seems to have gotten this one wrong.

Jan 17, 2010 at 9:53 PM

I wanted to follow up on your post. All containers that I use store video in decode order. The use of the negative offset is wrong for two reasons: 1) the spec defines the field as unsigned, and 2) the result is frames with a decode time occurring after the presentation time, which is not possible. A negative presentation offset is not necessary to handle out-of-order decoding. If you normalize the values to 0 (add a constant to every offset so that the most negative one becomes zero), everything works fine. This is what I am doing in my parser, since I cannot deliver samples downstream with a decode time that occurs two frames after presentation. Also, I have noted that MSFT's adaptive streaming files are not even compliant with those parts of the MP4 spec that are well defined.
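
The normalization described above could look roughly like the following sketch. It is one reading of "normalize the values to 0", not the poster's actual parser code: every offset is shifted by the same constant so that the most negative one becomes zero. The function name and the sample offsets are assumptions for illustration.

```python
from typing import List


def normalize_offsets(offsets: List[int]) -> List[int]:
    """Shift signed composition offsets so that none is negative.

    The same constant is added to every offset, so the relative
    presentation order of the samples is preserved.
    """
    if not offsets:
        return []
    # Only shift when at least one offset is negative.
    shift = max(0, -min(offsets))
    return [offset + shift for offset in offsets]


if __name__ == "__main__":
    # Hypothetical offsets for frames in decode order I, P, B, B.
    print(normalize_offsets([0, 2, -1, -1]))
```

With these illustrative inputs the B-frame offsets become zero, so their pts equals their dts, and no sample has a decode time later than its presentation time. The cost is that every presentation time moves later by the same shift.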