Commercial Advance Beta Release - Page 34

Jere_Jones · #**661** 01-03-2005, 02:06 PM

Quote:

Originally Posted by foolio

does the frame size matter for comskip? i notice i get better results recording off the pvr-250 with the expected size of 720x480. It isn't very good with my DVB-S recordings of 544x480. is this because of problems with logo detection?

It shouldn't make a difference. Are your DVB-S recordings properly calibrated? Comskip is pretty sensitive to contrast. Logo detection shouldn't be the cause of your problems, but you're welcome to send me a verbose=10 log and I'll take a look.

Jere

Cayars · #**662** 01-03-2005, 02:19 PM

Quote:

Originally Posted by Jere_Jones

<snip>
Progress report on CommDetect

CommDetect is working on all types of files (mpg, divx, avi, etc) that I have thrown at it.

CommDetect can be told which decoder to use. (makes a huge difference in quality of detection and speed)

CommDetect is utilizing video and audio.

CommDetect is not utilizing captions yet.

CommDetect is REALLY slow! (ComSkip runs at about 180fps on my development machine vs. CommDetect's 85fps)

CommDetect can show what it is detecting.

Logo detection takes about 15 seconds.

The detected logo can be overlaid onto the preview so you can SEE what was isolated. Different colors show if individual pixels match the expected logo or not.

Jere

How does it compare to ComSkip in its accuracy so far?

fcollman · #**663** 01-05-2005, 08:54 AM

I just got SageTV recently.... and I've been playing around with ComSkip on my machine.. trying to figure out what it's getting wrong, so eventually I downloaded your code to get an idea about what was going on behind the scenes.....

I won't say I have a perfect understanding of how everything works, but reading your algorithms got me thinking about the problem and I had a couple ideas I thought I might share.. and might even implement if I get the time, but they might be very relevant to getting the program to run faster and more accurately... so anyway.. here it goes.

First, with regard to logo detection, I might propose doing something a little more formal and less nonlinear with your edge detection. At the moment you are basically taking a derivative in 3 different directions with a mask, thresholding that derivative, counting where it should and should not be high, and then thresholding that count.... instead.. might I propose that you convolve the logo region with a Laplacian of a Gaussian..
(x^2 + y^2 - 2*sigma^2) / (2*pi*sigma^6) * exp(-(x^2 + y^2) / (2*sigma^2)). This should give you a more general edge enhancement. You might have to enlarge the logo region a little bit so as to avoid edge effects. You can take an average over some frames you detect as having the logo with some simple strict method to get a logo template, then apply the same filter to that. Then you can take the integral of the cross correlation of the filtered sample frame and the filtered logo template and get out a general score of how likely it is that there is a logo. You might have to worry about normalization in there.... I need to think about that more... but maybe the minimum of that function with respect to normalizations is a decent measure anyway... or maybe it isn't a problem in that the derivatives are generally all of the same magnitude across the movie. Anyway.. the main point is to get out a less nonlinear score of the likelihood of the logo being there..... which brings me to my second idea and the more general one.
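To make the filtering idea concrete, here is a minimal sketch in plain Python. It is not ComSkip or CommDetect code. It samples the standard Laplacian-of-Gaussian kernel, filters both the frame's logo region and the template, and returns a normalized correlation at zero offset (a simplification of the "integral of the cross correlation" described above). The function names, the 3-sigma kernel radius, and the normalization are all assumptions made for illustration.

```python
import math

def log_kernel(sigma, radius=None):
    """Sampled Laplacian-of-Gaussian kernel as a list of rows."""
    if radius is None:
        radius = int(math.ceil(3 * sigma))  # assumed cutoff at 3 sigma
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            r2 = x * x + y * y
            value = (r2 - 2 * sigma ** 2) / (2 * math.pi * sigma ** 6)
            row.append(value * math.exp(-r2 / (2 * sigma ** 2)))
        kernel.append(row)
    # Subtract the mean so the sampled kernel sums to zero; flat regions
    # then filter to (numerically) zero regardless of brightness.
    mean = sum(map(sum, kernel)) / (len(kernel) ** 2)
    return [[v - mean for v in row] for row in kernel]

def filter_valid(image, kernel):
    """Filter only where the kernel fits entirely inside the image.

    The output is smaller than the input, which is why the logo region
    has to be enlarged a little to avoid edge effects.
    """
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    acc += kernel[a][b] * image[i + a][j + b]
            row.append(acc)
        out.append(row)
    return out

def logo_score(frame_region, template_region, sigma=1.0, eps=1e-9):
    """Normalized correlation of the two filtered regions, in [-1, 1]."""
    kernel = log_kernel(sigma)
    f = [v for row in filter_valid(frame_region, kernel) for v in row]
    t = [v for row in filter_valid(template_region, kernel) for v in row]
    norm_f = math.sqrt(sum(a * a for a in f))
    norm_t = math.sqrt(sum(b * b for b in t))
    if norm_f < eps or norm_t < eps:
        return 0.0  # no edges at all in one of the regions
    return sum(a * b for a, b in zip(f, t)) / (norm_f * norm_t)
```

Dividing by both norms is one possible answer to the normalization worry: the score no longer depends on overall brightness or contrast, only on whether the edge pattern matches.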

Rather than doing this on every frame, I would propose skipping several hundred frames, then doing 5 frames, then skipping hundreds more.... sort of a first pass. Then rather than doing this thing where you build, merge, and relabel blocks, I think maybe this problem could be naturally approached with a hidden Markov model..... the first pass having only two states... show and commercial. If you are in the show state you have a probability distribution that is highly weighted towards outputting high logo detection scores and a low probability of transitioning to the commercial state. The commercial state has a probability distribution that is weighted towards low logo scores and a higher probability of transitioning back to the show state. Now you can figure out which sequence of commercial/show states results in the highest overall probability. There are algorithms for calculating this sequence in an iterative, computationally efficient manner... I need to look up the details of them again.. but I know they exist. This framework would also naturally incorporate other data such as your scene change rate and closed caption information. My hope would be that you could identify the important transitions in this way while reducing your computational time significantly. With this in hand you could then focus down on analyzing the exact timing of the transitions.
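The usual algorithm for finding the most probable state sequence in a hidden Markov model is the Viterbi algorithm. Below is a minimal sketch of it for the two-state case described above. The Gaussian emission model and every number in the example parameters are assumptions chosen for illustration, not values taken from ComSkip.

```python
import math

STATES = ("show", "commercial")

def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))

def viterbi(scores, log_start, log_trans, emit_params):
    """Most likely state sequence for a list of sampled logo scores."""
    # best[s] = log-probability of the best path that ends in state s
    best = {s: log_start[s] + gaussian_logpdf(scores[0], *emit_params[s])
            for s in STATES}
    back = []  # one dict of back-pointers per later observation
    for x in scores[1:]:
        prev, best, pointers = best, {}, {}
        for s in STATES:
            source = max(STATES, key=lambda p: prev[p] + log_trans[p][s])
            pointers[s] = source
            best[s] = (prev[source] + log_trans[source][s]
                       + gaussian_logpdf(x, *emit_params[s]))
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical parameters: shows rarely end, commercials end more often.
LOG_START = {"show": math.log(0.9), "commercial": math.log(0.1)}
LOG_TRANS = {
    "show": {"show": math.log(0.95), "commercial": math.log(0.05)},
    "commercial": {"show": math.log(0.2), "commercial": math.log(0.8)},
}
EMIT = {"show": (0.8, 0.15), "commercial": (0.1, 0.15)}  # (mean, std) of logo score
```

Called as `viterbi([0.9, 0.85, 0.8, 0.1, 0.05, 0.15, 0.9, 0.8], LOG_START, LOG_TRANS, EMIT)`, this labels the three low scores as commercial and everything else as show. A single ambiguous sample inside a run of high scores stays labeled show, because switching states twice costs more than one mediocre emission.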

I am tempted to just extend the markov model to 4 states to do this...

show (with logo), show (no logo), blackness, commercial

As you might imagine, the transitions would look like show>show(no logo)>blackness<>commercial, with the probabilities set so the relative general time scales would be right. The outputs would become 2-dimensional in this model by including the average brightness (never mind the maximum, minimum, etc. calculation.. we can keep the analog nature of that signal and use it to our advantage, I think). Now show outputs high logo scores and high brightness; show (no logo) outputs low logo scores but high brightness; blackness outputs low logo scores and low brightness; commercial outputs low logo scores and high brightness. Again we use the Markov modelling to help us find the transitions.
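One way to set the probabilities "so the time scales would be right" is to pick an expected dwell time per state: a state with self-transition probability p lasts 1/(1-p) samples on average. The sketch below builds the four-state transition table that way and adds a two-dimensional (logo score, brightness) emission model. All dwell times, means, and standard deviations are made-up placeholders, and the blackness>show edge is an assumption added so the chain can get back to the show.

```python
import math

# Expected time spent in each state, in samples (assume one sample per second).
EXPECTED_DWELL = {"show": 600.0, "show_no_logo": 5.0, "black": 1.5, "commercial": 30.0}

# Where each state may go when it ends, and with what share.
EXITS = {
    "show": {"show_no_logo": 1.0},
    "show_no_logo": {"black": 1.0},
    "black": {"commercial": 0.8, "show": 0.2},  # black -> show is an assumption
    "commercial": {"black": 1.0},
}

def build_transitions(dwell, exits):
    """Transition table whose self-loops give the requested mean dwell times."""
    trans = {}
    for state, mean_len in dwell.items():
        leave = 1.0 / mean_len  # geometric dwell time with this mean
        row = {state: 1.0 - leave}
        for target, share in exits[state].items():
            row[target] = leave * share
        trans[state] = row
    return trans

# Per state: ((mean, std) of logo score, (mean, std) of brightness), both in 0..1.
EMISSIONS = {
    "show": ((0.8, 0.15), (0.6, 0.2)),
    "show_no_logo": ((0.1, 0.15), (0.6, 0.2)),
    "black": ((0.05, 0.1), (0.03, 0.03)),
    "commercial": ((0.1, 0.15), (0.6, 0.2)),
}

def gaussian_logpdf(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))

def log_emission(state, logo, brightness):
    """Log-likelihood of one observation, treating the two outputs as independent."""
    (logo_mean, logo_std), (bright_mean, bright_std) = EMISSIONS[state]
    return (gaussian_logpdf(logo, logo_mean, logo_std)
            + gaussian_logpdf(brightness, bright_mean, bright_std))
```

Note that show (no logo) and commercial emit the same kind of output here, so only the transition structure and the dwell times can tell them apart.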

An advantage to this whole framework is that you could write something that would allow people to mark commercials by hand, and after doing it on a few you would have some data with which to build a custom model based on that channel, or that show... all the probability distributions and transition probabilities could be learned rather than having to tweak them via the "scientific method" (that was not meant as a jab at Jere's earlier post with that comment but as an appreciation of what a pain it is to fiddle in parameter space with any program that takes more than 15 seconds to run, as I have done the last few days). I think in the end the only non-learned parameter might be sigma in the Laplacian. Of course in the end you might need to do some of the tossing out that you did in the past....
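With hand-marked recordings, learning the model reduces to counting. The sketch below is one simple way to do it, assuming one label and one logo score per sampled frame: transition probabilities come from counts of consecutive label pairs (with add-one smoothing so unseen transitions do not get probability zero), and each state's emission distribution is a Gaussian fitted to its scores. The function name and the smoothing choice are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def learn_model(labels, scores, smoothing=1.0):
    """Estimate transition probabilities and Gaussian emissions from labeled samples.

    labels: hand-marked state per sample, e.g. "show" or "commercial"
    scores: logo score per sample, same length as labels
    """
    states = sorted(set(labels))

    # Count how often each state is followed by each other state.
    counts = {a: Counter() for a in states}
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    trans = {}
    for a in states:
        total = sum(counts[a].values()) + smoothing * len(states)
        trans[a] = {b: (counts[a][b] + smoothing) / total for b in states}

    # Fit a mean and standard deviation to the scores seen in each state.
    by_state = defaultdict(list)
    for label, score in zip(labels, scores):
        by_state[label].append(score)
    emit = {}
    for state, values in by_state.items():
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        emit[state] = (mean, max(math.sqrt(var), 1e-3))  # floor avoids a zero std
    return trans, emit
```

Run per channel or per show, this would give each one its own table without any hand tuning.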

For my part... I'm going to try to implement the Laplacian filter thing, then set it up to output a big file with the logo score and a brightness score, then work on applying a Markov model to that... I might try to do that in Matlab first, as it's sort of frustrating not to be able to easily visualize what your model is doing..... In that vein, what do you use to debug things in terms of seeing what frame is what? I got ffdshow to display a frame number when I'm playing back a show.. but I'm not sure how accurate it is, and since I think mpeg2 is a variable bit rate encoding, I don't know how to skip forward in the buffer to just use the out of the box mpeg2dec to skip to a certain frame, and doing division to convert from time to frame is tedious and inexact... what are you guys using?

Anyway... I know that's a lot, and maybe some of it sounds like gibberish, and maybe none of it works, but having spent a day thinking about it when I should have been working on real work, I needed to write it down. If you were wondering why I would possibly think about all this... this last semester I have been TAing a class on computational neurobiology and computing networks, and we covered hidden Markov models as they are the algorithms used in the best speech recognition software these days.... and speech is somewhat similar to this problem in that you have segments (like phonemes) that have characteristics (here it is logo/no logo and blackness, in speech it is spectral characteristics), sharp transitions, but undefined and variable time scales.

If either of the developers would like to chat with me about these ideas, catch me on AIM at revez vert.

forrest

Jere_Jones · #**664** 01-05-2005, 10:01 AM

Cayars,

It does a MUCH better job at accurately isolating blocks. So a 30 second commercial is actually a 30 second block. The current comskip already does an excellent job detecting commercials (for me), but I expect that CommDetect will do a better job.

fcollman,

Your post was obviously well thought out and insightful and I'm appreciative. But, um, huh? I didn't quite catch that.

Seriously though, there are some interesting ideas (that I'll leave for later). But for now, some quick comments:
1) ComSkip doesn't check every frame for a logo. It checks once per second of show (every 25th or 30th frame, depending on the fps of the show). Since I use the logo to determine that a block is a show and not to determine boundaries, this seemed sufficient.
2) One of the frustrations that I had with comskip was that I couldn't "see" what it was detecting. That was the source of the SaveLogo code. I wanted to see the detected logo. It is also one of the reasons that in CommDetect, I coded in a preview, so I could see "why" it chose the logo or some other artifact.
3) Another frustration I had with comskip is that it is not easy (or possible, I think) to seek to another position. That would have made logo isolation much quicker in comskip, but it wasn't possible (at my programming skill).
4) I use virtualdubmod to see each frame and then match it with the frame number in the log. It's not graceful, but it works.
5) And keeping track of which frame is which is easy enough. Each time the detection code gets another frame it adds one to the frame number.

That's not really useful for skipping, but in linear processing it works well.
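As a small illustration of the once-per-second check and the running frame counter, here is a sketch that lists which frame numbers would get a logo check and converts a frame number from the log into a timestamp for lining it up in a frame viewer. It assumes a constant frame rate and a counter that starts at 0. Neither function is taken from the ComSkip source.

```python
from fractions import Fraction

def logo_check_frames(total_frames, fps):
    """Frame numbers checked for a logo: one per second of show."""
    step = max(1, round(fps))  # every 25th frame at 25 fps, every 30th at 29.97 fps
    return list(range(0, total_frames, step))

def frame_to_timestamp(frame, fps):
    """Convert a zero-based frame number to HH:MM:SS.mmm at a constant frame rate.

    Pass fps as a Fraction (for example Fraction(30000, 1001) for NTSC)
    to avoid the drift you get from rounding 29.97.
    """
    total_ms = round(Fraction(frame) / Fraction(fps) * 1000)
    seconds, ms = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ms:03d}"
```

For example, `frame_to_timestamp(1800, 30)` gives `00:01:00.000`, and `logo_check_frames(100, 25)` gives `[0, 25, 50, 75]`.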

Jere

fcollman · #**665** 01-05-2005, 01:26 PM

In looking at this problem I did stumble on libmpeg3, which purports to have a frame seek function, though it has to build a table of contents first. I downloaded the code and tried to compile it, discovered it wanted GCC and NASM version 0.98 in order to compile, and didn't feel like trying to work that out at the time. But maybe it's worth it.

In developing this you would love to have an interface that resembled a defrag bar, with different color bars for different blocks and the ability to watch the blocks change as you processed, and even the ability to click on a block and have it play the file at that block. I could imagine writing such a thing in Matlab if I just had a routine in C to play frames i-j of file x. I could do it in C if I knew anything about plotting/graphing using Visual Studio. But everything I've looked at looks so tedious I get discouraged. Then again, I haven't had Studio that long, so maybe I'm missing something. There should just be a module where I give it some common data structure and it plots it however I tell it to. Things I've seen want me to construct this database or something. How about an array?
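Short of a real GUI, a text version of the defrag bar can be produced from nothing more than a list of blocks. The sketch below assumes blocks arrive as hypothetical (start_frame, end_frame, kind) tuples and draws one character per slice of the recording. It is only meant to show how little data the display needs.

```python
def block_bar(blocks, total_frames, width=60):
    """Render blocks as a one-line bar: '#' for show, '.' for commercial.

    blocks: list of (start_frame, end_frame, kind) with end_frame exclusive
    """
    symbols = {"show": "#", "commercial": "."}
    bar = ["?"] * width  # '?' marks frames no block covers
    for start, end, kind in blocks:
        first = start * width // total_frames
        last = max(first + 1, end * width // total_frames)  # at least one cell
        for i in range(first, min(last, width)):
            bar[i] = symbols.get(kind, "?")
    return "".join(bar)
```

Printing the bar after each processing pass gives a crude view of the blocks changing over time.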

fcollman · #**666** 01-05-2005, 01:46 PM

to continue... i was going to write a short reply on my tablet... but you can only do so much with handwriting recognition before you get frustrated by the lack of speed and voice does a terrible job with technical discussions.....

anyway... if we could figure out how to skip forward in frames in the stream I think we would save a lot of time...... demuxing and decoding every frame.... which I'm pretty sure the code I was looking at does even if you only analyze every second or so, is going to be terribly expensive... maybe one could figure it out if you actually understood mpeg compression on a really technical level... I for one don't.... I just found the point in the code where it displays a frame, and figured at that point the buffer contains the frame... then you all have seemed to figure out how to iterate through the interlacing, though I don't understand that either at this point.

I guess the other thing I should have put in my post was that the whole reason I didn't just download this thing and forget about it is that it isn't working for me that well..... granted I've got just about the worst cable signal ever so I'm not exactly being kind to your algorithm, and the fact that it comes close to working at all made me very impressed. So i tried to fiddle and read the logs and sort all this out and started pulling out my hair. It doesn't do a very good job of getting the precise timing down (it leaves off sometimes a few seconds of show, and has shown me false positives and false negatives depending on which settings i've used) and from what investigation I've done it doesn't seem to be a sagetv thing in that the seek bar is off as I read that was a consideration.

oh and finally.... I noticed you have commericial_skip.cpp posted up there along with the comskip code.... I assume this is the new version you are working on; however, it refers to header files that aren't included, and also to Qt header files. I can't figure out how to get Qt for Visual Studio without paying thousands of dollars... though it looks like you've got it working for you alright. That code looks very well written from my browsing, and I'd love to be fiddling with it rather than the C... any thoughts on getting it working?

Jere_Jones · #**667** 01-05-2005, 02:16 PM

Quote: