SageTV Customizations
This forum is for discussing and sharing user-created modifications for the SageTV application created by using the SageTV Studio or through the use of external plugins. Use this forum to discuss customizations for SageTV version 6 and earlier, or for the SageTV3 UI.
#661
Quote:
Jere
__________________
Death to commercials!!! Latest ShowAnalyzer Beta version: 0.9.7
#662
Quote:
#663
I just got SageTV recently, and I've been playing around with comskip on my machine, trying to figure out what it's getting wrong. Eventually I downloaded your code to get an idea of what was going on behind the scenes.
I won't say I have a perfect understanding of how everything works, but reading your algorithms got me thinking about the problem, and I had a couple of ideas I thought I might share. I might even implement them if I get the time, since they could be very relevant to getting the program to run faster and more accurately. So anyway, here goes.

First, with regard to logo detection, I'd propose doing something a little more formal and less nonlinear with your edge detection. At the moment you are basically taking a derivative in 3 different directions with a mask, thresholding that derivative, counting when it should be high and when it should not be high, and then thresholding that. Instead, might I propose that you convolve the logo region with a Laplacian of a Gaussian: (x^2 + y^2 - 2*sigma^2) / (2*pi*sigma^6) * exp(-(x^2 + y^2) / (2*sigma^2)). This should give you a more general edge enhancement. You might have to enlarge the logo region a little bit to avoid edge effects.

You can take an average over some frames that a simple, strict method detects as having the logo to get a logo template, then apply the same filter to that. Then you can take the integral of the cross-correlation of the filtered sample frame and the filtered logo template to get a general score of how likely it is that there is a logo. You might have to worry about normalization in there. I need to think about that more, but maybe the minimum of that function with respect to normalizations is a decent measure anyway, or maybe it isn't a problem because the derivatives are generally all of the same magnitude across the movie. Anyway, the main point is to get a less nonlinear score of the likelihood of the logo being there.

Which brings me to my second idea, the more general one. Rather than doing this on every frame, I would propose skipping several hundred frames, then doing 5 frames, then skipping hundreds more, as a sort of first pass.
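A minimal sketch of the Laplacian-of-Gaussian logo score described above, in plain Python so it runs without the comskip source. This is not comskip's code: the function names (`log_kernel`, `filter_valid`, `logo_score`), the 3-sigma kernel radius, and the cosine-style normalization are assumptions chosen for illustration. The kernel is shifted to zero mean so flat regions score zero, and dividing by the energy of both filtered images is one possible answer to the normalization question.

```python
import math

def log_kernel(sigma, radius=None):
    """Sampled Laplacian-of-Gaussian kernel, shifted so its entries sum to zero."""
    if radius is None:
        radius = int(math.ceil(3 * sigma))  # assumed cutoff: 3 sigma
    size = 2 * radius + 1
    k = [[0.0] * size for _ in range(size)]
    for j in range(size):
        for i in range(size):
            r2 = (i - radius) ** 2 + (j - radius) ** 2
            k[j][i] = ((r2 - 2 * sigma ** 2) / (2 * math.pi * sigma ** 6)
                       * math.exp(-r2 / (2 * sigma ** 2)))
    mean = sum(map(sum, k)) / (size * size)
    return [[v - mean for v in row] for row in k]

def filter_valid(img, kernel):
    """Filter img with kernel, keeping only positions where the kernel fits.

    The kernel is symmetric, so correlation and convolution are the same here.
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    out = []
    for y in range(h - kh + 1):
        row = []
        for x in range(w - kw + 1):
            row.append(sum(kernel[j][i] * img[y + j][x + i]
                           for j in range(kh) for i in range(kw)))
        out.append(row)
    return out

def logo_score(frame_region, template_filtered, kernel):
    """Normalized correlation in [-1, 1] between a filtered frame and template."""
    filtered = filter_valid(frame_region, kernel)
    a = [v for row in filtered for v in row]
    b = [v for row in template_filtered for v in row]
    energy_a = sum(v * v for v in a)
    energy_b = sum(v * v for v in b)
    if energy_a < 1e-12 or energy_b < 1e-12:
        return 0.0  # no edges at all, so nothing to match
    return sum(p * q for p, q in zip(a, b)) / math.sqrt(energy_a * energy_b)
```

The template would be built once by averaging logo regions from frames already known to contain the logo and passing the average through `filter_valid`. Only the "valid" part of the filtered region is kept, which is why the logo region would need to be enlarged by the kernel radius on each side.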
Then, rather than building, merging, and relabeling blocks, I think this problem could be approached naturally with a hidden Markov model. The first pass would have only two states: show and commercial. In the show state you have a probability distribution that is heavily weighted towards outputting high logo detection scores, and a low probability of transitioning to the commercial state. The commercial state has a probability distribution that is weighted towards low logo scores, and a higher probability of transitioning back to the show state. Now you can figure out which sequence of commercial/show states results in the highest overall probability. There are algorithms for calculating this sequence in an iterative, computationally efficient manner. I need to look up the details again, but I know they exist. This framework would also naturally incorporate other data, such as your scene change rate and closed caption information. My hope is that you could identify the important transitions this way while reducing your computation time significantly.

With this in hand you could then focus on analyzing the exact timing of the transitions. I am tempted to just extend the Markov model to 4 states to do this: show (with logo), show (no logo), blackness, and commercial. As you might imagine, the transitions would look like show>show(no logo)>blackness<>commercial, with the probabilities set so the relative general time scales come out right. The outputs would become 2-dimensional in this model by including the average brightness (never mind the maximum, minimum, etc. calculation; we can keep the analog nature of that signal and use it to our advantage, I think). Now show outputs high logo scores and high brightness; show (no logo) outputs low logo scores but high brightness; blackness outputs low logo scores and low brightness; and commercial outputs low logo scores and high brightness.
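The most-probable-sequence search for a hidden Markov model is commonly done with the Viterbi algorithm. Below is a minimal sketch of the two-state first pass, assuming Gaussian emission distributions over a logo score in roughly the 0 to 1 range. Every number here (transition probabilities, means, standard deviations) is a made-up placeholder, not a value measured from any recording, and the names `viterbi`, `STATES`, `LOG_TRANS`, and `EMIT` are hypothetical.

```python
import math

STATES = ("show", "commercial")
LOG_START = {s: math.log(0.5) for s in STATES}
# Assumed transition probabilities: leaving "show" is rare, and leaving
# "commercial" is somewhat more likely, as described in the post.
LOG_TRANS = {
    "show": {"show": math.log(0.98), "commercial": math.log(0.02)},
    "commercial": {"show": math.log(0.05), "commercial": math.log(0.95)},
}
# Assumed (mean, std) of the logo score emitted by each state.
EMIT = {"show": (0.8, 0.15), "commercial": (0.1, 0.15)}

def gaussian_logpdf(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))

def viterbi(scores, states=STATES, log_start=LOG_START,
            log_trans=LOG_TRANS, emit=EMIT):
    """Return the most probable state sequence for a list of logo scores."""
    if not scores:
        return []
    # best[s] is the log-probability of the best path so far that ends in s.
    best = {s: log_start[s] + gaussian_logpdf(scores[0], *emit[s]) for s in states}
    backpointers = []
    for x in scores[1:]:
        prev, best, ptr = best, {}, {}
        for s in states:
            p = max(states, key=lambda q: prev[q] + log_trans[q][s])
            ptr[s] = p
            best[s] = prev[p] + log_trans[p][s] + gaussian_logpdf(x, *emit[s])
        backpointers.append(ptr)
    # Walk the back-pointers from the best final state to recover the path.
    state = max(states, key=lambda s: best[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

With these placeholder parameters a single mediocre score inside a run of high scores stays labeled "show", because two state changes cost more than one poorly explained sample. The 4-state version would use the same function with a longer state tuple and a two-dimensional emission model (logo score and brightness).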
Again we use the Markov modelling to help us find the transitions. An advantage of this whole framework is that you could write something that lets people mark commercials by hand, and after doing it on a few recordings you would have some data from which to build a custom model for that channel or that show. All the probability distributions and transition probabilities could be learned rather than tweaked via the "scientific method" (that is not meant as a jab at Jere's earlier post with that comment, but as an appreciation of what a pain it is to fiddle in parameter space with any program that takes more than 15 seconds to run, as I have done the last few days). I think in the end the only non-learned parameter might be sigma in the Laplacian. Of course, you might still need to do some of the tossing out that you did in the past.

For my part, I'm going to try to implement the Laplacian filter, set it up to output a big file with the logo score and a brightness score, and then work on applying a Markov model to that. I might try to do that in MATLAB first, as it's frustrating not to be able to easily visualize what your model is doing.

In that vein, what do you use to debug things in terms of seeing which frame is which? I got ffdshow to display a frame number when I'm playing back a show, but I'm not sure how accurate it is. Since I think MPEG-2 is a variable bit rate encoding, I don't know how to skip forward in the buffer so that the out-of-the-box mpeg2dec can jump to a certain frame, and doing division to convert from time to frame is tedious and inexact. What are you guys using?

Anyway, I know that's a lot, and maybe some of it sounds like gibberish, and maybe none of it works, but having spent a day thinking about it when I should have been doing real work, I needed to write it down. If you were wondering why I would possibly think about all this: this last semester I have been TAing a class on computational neurobiology and computing networks, and we covered hidden Markov models, as they are the algorithms used in the best speech recognition software these days. Speech is somewhat similar to this problem in that you have segments (like phonemes) that have characteristics (here logo/no logo and blackness; in speech, spectral characteristics), sharp transitions, but undefined and variable time scales.

If either of the developers would like to chat with me about these ideas, catch me on AIM at revez vert.

forrest
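On the time-to-frame division mentioned in this post: a variable bit rate changes how many bytes each frame occupies, not how many frames are shown per second, so the conversion can be done exactly with rational arithmetic even though byte offsets cannot be computed this way. A small sketch, assuming a constant frame rate with no dropped or repeated frames; the helper names are hypothetical.

```python
from fractions import Fraction

NTSC = Fraction(30000, 1001)  # about 29.97 frames per second
PAL = Fraction(25, 1)

def frame_to_seconds(frame, fps=NTSC):
    """Exact presentation time of a frame number, as a Fraction of seconds."""
    return Fraction(frame) / fps

def seconds_to_frame(seconds, fps=NTSC):
    """Index of the frame being displayed at the given (non-negative) time."""
    return int(Fraction(seconds) * fps)

def format_timestamp(frame, fps=NTSC):
    """Render a frame number as HH:MM:SS.mmm, rounded to the millisecond."""
    total_ms = round(frame_to_seconds(frame, fps) * 1000)
    seconds, ms = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return "%02d:%02d:%02d.%03d" % (hours, minutes, seconds, ms)
```

For example, `format_timestamp(30000)` gives `00:16:41.000` at the NTSC rate, which is where to look in a player for frame 30000 from a log.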
#664
Cayers,
It does a MUCH better job at accurately isolating blocks, so a 30 second commercial is actually a 30 second block. The current comskip already does an excellent job detecting commercials (for me), but I expect that CommDetect will do a better job.

fcollman,
Your post was obviously well thought out and insightful, and I'm appreciative. But, um, huh? I didn't quite catch that. Seriously though, there are some interesting ideas (that I'll leave for later). But for now, some quick comments:
1) ComSkip doesn't check every frame for a logo. It checks once per second of show (every 25th or 30th frame, depending on the fps of the show). Since I use the logo to determine that a block is a show and not to determine boundaries, this seemed sufficient.
2) One of the frustrations that I had with comskip was that I couldn't "see" what it was detecting. That was the source of the SaveLogo code. I wanted to see the detected logo. It is also one of the reasons that in CommDetect I coded in a preview, so I could see "why" it chose the logo or some other artifact.
3) Another frustration I had with comskip is that it is not easy (or possible, I think) to seek to another position. That would have made logo isolation much quicker in comskip, but it wasn't possible (at my programming skill).
4) I use virtualdubmod to see each frame and then match it with the frame number in the log. It's not graceful, but it works.
5) And keeping track of which frame is which is easy enough. Each time the detection code gets another frame, it adds one to the frame number. That's not really useful for skipping, but in linear processing it works well.
Jere
#665
In looking at this problem I did stumble on libmpeg3, which purports to have a frame seek function, though it has to build a table of contents first. I downloaded the code and tried to compile it, discovered it wanted GCC and NASM version 0.98 in order to compile, and didn't feel like trying to work that out at the time. But maybe it's worth it.
In developing this you would love to have an interface that resembles a defrag bar, with different colored bars for different blocks, the ability to watch the blocks change as you process, and even the ability to click on a block and have it play the file at that block. I could imagine writing such a thing in MATLAB if I just had a routine in C to play frames i-j of file x. I could do it in C if I knew anything about plotting/graphing using Visual Studio, but everything I've looked at looks so tedious that I get discouraged. Then again, I haven't had Studio that long, so maybe I'm missing something. There should just be a module that I give some common data structure and it plots it however I tell it to. The things I've seen want me to construct a database or something. How about an array?
#666
To continue: I was going to write a short reply on my tablet, but you can only do so much with handwriting recognition before you get frustrated by the lack of speed, and voice does a terrible job with technical discussions.
Anyway, if we could figure out how to skip forward in frames in the stream, I think we would save a lot of time. Demuxing and decoding every frame, which I'm pretty sure the code I was looking at does even if you only analyze every second or so, is going to be terribly expensive. Maybe one could figure it out with a really technical understanding of MPEG compression; I for one don't have that. I just found the point in the code where it displays a frame and figured that at that point the buffer contains the frame. You all seem to have figured out how to iterate through the interlacing, though I don't understand that either at this point.

I guess the other thing I should have put in my post is that the whole reason I didn't just download this thing and forget about it is that it isn't working that well for me. Granted, I've got just about the worst cable signal ever, so I'm not exactly being kind to your algorithm, and the fact that it comes close to working at all impressed me. So I tried to fiddle and read the logs and sort all this out, and started pulling out my hair. It doesn't do a very good job of getting the precise timing down (it sometimes leaves off a few seconds of show, and has shown me false positives and false negatives depending on which settings I've used). From what investigation I've done, it doesn't seem to be a SageTV thing where the seek bar is off, which I read was a consideration.

Oh, and finally: I noticed you have commericial_skip.cpp posted up there along with the comskip code. I assume this is the new version you are working on. However, it refers to header files that aren't included, and also to Qt header files. I can't figure out how to get Qt for Visual Studio without paying thousands of dollars, though it looks like you've got it working alright. That code looks very well written from my browsing, and I'd love to be fiddling with it rather than the C. Any thoughts on getting it working?
Last edited by fcollman; 01-05-2005 at 01:53 PM.
#667
Quote:
Quote:
Jere
#668
Quote:
Quote:
Quote:
Jere
#669
For what it's worth, I output just the average brightness of every frame of last Sunday's Criminal Intent, read it into MATLAB, and plotted it. The commercials jump out to the eye. The second picture is a blow-up of the second commercial break, one that, incidentally, your algorithm misidentified as having a segment of show in the middle. It stopped the commercial just short of 25000, I presume corresponding to that sharp dip in brightness you see, and indeed this does mark the end of the commercial. The real show fade-ins and fade-outs are much longer than the commercial cuts. I don't know whether this is a feature that extends beyond NBC dramas, though. It cut back in to commercial at the drop at 2.8. I don't know why it even detected those black frames, as I tried setting the min black frames for break to 5, and that's only a dip of one.
Anyway, I thought maybe you'd find the visualization informative. I'm going to mess around with looking at the logo score and plotting it with this.
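The observation that show fades last much longer than commercial cuts can be checked on a per-frame brightness list by measuring how long each dark run is. A minimal sketch, assuming brightness is one number per frame and that "black" means below a fixed threshold; `dark_runs` and its parameters are hypothetical names, not comskip settings.

```python
def dark_runs(brightness, threshold, min_frames=1):
    """Return (start, end) pairs, end exclusive, for each run of frames whose
    brightness is below threshold and which lasts at least min_frames."""
    runs = []
    start = None
    for i, value in enumerate(brightness):
        if value < threshold:
            if start is None:
                start = i  # a dark run begins here
        elif start is not None:
            if i - start >= min_frames:
                runs.append((start, i))
            start = None
    # Close a run that is still open at the end of the recording.
    if start is not None and len(brightness) - start >= min_frames:
        runs.append((start, len(brightness)))
    return runs
```

With `min_frames=5`, a one-frame dip is dropped while a six-frame fade is kept, which is the behavior one would expect from a "minimum black frames" style setting.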
#670
Is Criminal Intent a rather dark show? Try the same experiment with a variety of other shows. Your graph is certainly interesting, but is it a universal trend?
It seems to show that commercials have a higher overall brightness or max brightness. This hasn't been my experience. But, I've been known to be wrong. Jere
Last edited by Jere_Jones; 01-05-2005 at 04:12 PM.
#671
Have you looked at an overlay of audio and video? The commercial transitions would certainly have an audio break where the show transitions may not. Just another thought. I don't know if audio would be faster than video, but you might use it for your first pass, then follow up by doing logo detection and other video algorithms only on the "segments of interest" generated by the audio.
#672
OK, so after getting my hands into some of the data, I discovered that my complicated idea was really not necessary. I went into comskip and got it to spit out two sets of data: a) the brightness of every frame, and b) the number of "good" edges it found when looking for a logo (i.e. there was an edge where there should be an edge). Looking at that data, I was amazed at some of the mistakes the comskip algorithm was making, because the signal seemed so strong.

Looking at the graphs reminded me of another problem I had worked on involving bistable systems, where you try to figure out what state the system is in. What worked there, and what works really well here, is a double-thresholding kind of system. Take a look at one of these graphs as I explain this. The dark blue curve is the good-edges signal. Say we start in the show state. The system stays in the show state until the signal falls below the lower threshold, and when it does, the system switches to the commercial state. However, to get out of the commercial state, the signal has to cross the upper threshold, and once it does, the system stays in the show state again. You can calculate these thresholds dynamically by looking at the probability distribution of the signal over the whole show. I set the lower one to 25% and the upper to 60%. You can see how it nicely adjusts for the different signals while keeping the same percentages.
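The double-threshold (hysteresis) rule described above fits in a few lines. This is a sketch rather than the MATLAB script from the post: it reads "25%" and "60%" as percentiles of the good-edges signal over the whole recording, which is an assumption (a fraction of the range between minimum and maximum would be another reading), and the function names are hypothetical.

```python
def percentile(values, pct):
    """Value found pct percent of the way through the sorted data."""
    ordered = sorted(values)
    index = int(round((pct / 100.0) * (len(ordered) - 1)))
    return ordered[index]

def hysteresis_states(signal, low, high, start_in_show=True):
    """Label each sample True (show) or False (commercial).

    The state flips to commercial only when the signal drops below `low`,
    and flips back to show only when it rises above `high`.
    """
    states = []
    in_show = start_in_show
    for value in signal:
        if in_show and value < low:
            in_show = False
        elif not in_show and value > high:
            in_show = True
        states.append(in_show)
    return states
```

Under the percentile reading, the thresholds for a recording would be `percentile(edges, 25)` and `percentile(edges, 60)`, where `edges` is the per-frame good-edges count. Because they are derived from each recording's own distribution, they adapt to how strong the logo signal is on that channel.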
What you are left with is almost perfect, except for the few short little flips that happen from time to time. Well, we know short flips aren't real, so what I do is set a minimum show interval and a minimum commercial interval. Then, starting with the smallest interval, I remove and combine intervals until all the intervals are bigger than my thresholds. That is the green curve that I have highlighted on the right-hand side of withoutatrace.jpg.

What's left is to get the exact timing of the fade-ins and fade-outs, since the logo doesn't appear until some period into most shows (it varies from network to network and show to show). The logo also sometimes disappears slightly before the end of a show period. I basically just took the transitions from the green curve and looked in a window around them for the most conservative estimate of where the black frame that represents the fade-in to the show was. I evaluated black frames with the same sort of adaptive threshold I used for the good-edges curve, this time at 2%.

As you can see, this agrees very well with the comskip algorithm sometimes, but doesn't make some of the egregious mistakes it seems prone to for me. Of course, if it can't find a logo, it is screwed. I wrote this whole thing by importing data into MATLAB. I'd be willing to convert the whole thing to C, but having the ability to instantly graph and interact with the data makes developing an algorithm so much easier in MATLAB, plus there are all the nifty built-in functions like diff, cumsum, find, hist, etc. Anyway, I have to run, but I thought I'd present my results. I'd be happy to share the MATLAB code with whoever would like to see it, though it could be cleaned up and made more modular.
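The interval clean-up step (remove the smallest too-short interval, combine it with its neighbours, repeat) can be sketched on a run-length list. The representation, the names `to_runs` and `merge_short_runs`, and the tie-breaking rule (earliest run first) are assumptions made for illustration; the black-frame boundary refinement is not covered here.

```python
def to_runs(states):
    """Collapse a list of booleans into [state, length] runs."""
    runs = []
    for state in states:
        if runs and runs[-1][0] == state:
            runs[-1][1] += 1
        else:
            runs.append([state, 1])
    return runs

def merge_short_runs(runs, min_show, min_commercial):
    """Absorb too-short runs into their neighbours, shortest first.

    `runs` alternates between True (show) and False (commercial). A run is
    too short when its length is below the minimum for its own state.
    """
    runs = [list(r) for r in runs]
    while len(runs) > 1:
        short = [(length, i) for i, (state, length) in enumerate(runs)
                 if length < (min_show if state else min_commercial)]
        if not short:
            break
        _, i = min(short)  # shortest offending run; earliest on a tie
        lo = max(i - 1, 0)
        hi = min(i + 1, len(runs) - 1)
        # Both neighbours have the opposite state, so the short run and its
        # neighbours collapse into a single run of that opposite state.
        merged_state = not runs[i][0]
        merged_length = sum(length for _, length in runs[lo:hi + 1])
        runs[lo:hi + 1] = [[merged_state, merged_length]]
    return runs
```

Working from the shortest run upward matters: a one-frame flip inside a commercial is absorbed before the commercial itself is judged, so the commercial is measured at its full length.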
#673
A couple more pictures. The Pretender was a particularly dark show, and you can see the algorithm had a tough time finding exactly the right transitions because it was so dark.
A couple of other thoughts. I certainly need to do this on more shows, but my intuition says that this would work with less data, and certainly less brightness data, as I don't even access most of it. My MATLAB script is virtually instantaneous, so it's just a matter of extracting all the data from the show, and that shouldn't take more time than mpeg2dec takes to just process the whole video without even displaying it. If we can figure out how to skip frames, that would be even faster, and would address the number one complaint people seem to have with it: that it chews up their processor.
#674
what decoder are you guys using? I usually use Sonic but I don't think it has CC support - will this essentially render the CS useless?
thx
#675
CC support is in the encoder/card not in the decoder. So as long as you have a card that supports it (not all of them do, for instance the PVR USB2 does NOT), then you are fine regardless which decoder you use.
Although you may have to change a registry setting to turn CC on. That is covered elsewhere in these forums and in this thread in particular. And besides, COMSKIP works fine/great without CC data. It just works BETTER with CC info to use.
Jason
#676
Auto Delete of .log .txt
Sorry if this has been covered already, but going thru 34 pages can be tiresome
Do the log and text files comskip generates get automatically deleted? I have noticed that when I delete the mpg file the text and log files stay behind. I have been manually deleting them, and am wondering if there's a clean up procedure or auto-delete. Thanks!
#677
No they don't.
Some of the STVs (like Cayar's) have an option on a menu to 'delete orphaned files' that will do it. Others run a script once a day to delete them (the script can be found somewhere in this thread).
Jason
#678
I run this batch file once a day on my media server under Win XP Pro:
for /f "tokens=1 delims=." %%i in ('dir /s/b *-0.txt') do if NOT EXIST "%%i.mpg" (del "%%i.txt")
I schedule it once a day to clean up the junk. It's the same as what's posted somewhere in this thread, but the author showed me how to put more "%"'s in it to make it run in a DOS window in Win XP Pro.
I run the VBS script that processes all my new recordings with COMSKIP 4 times a day: noon and midnight, 6AM and 6PM. So usually when I sit down to watch TV, most everything is already processed. And it's processed by my media server, NOT by my SageTV machine. I'm happy to report it's all working perfectly. I've been scared to upgrade Sage or the Malone STV I'm running because I'm in a good state right now. I'm sure eventually I'll have to for other reasons, but until that day... Hope this perspective from a user and not a programmer is helpful.
Andy
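For anyone who would rather not depend on batch-file quoting rules, the same clean-up can be sketched in Python. The assumptions are loud ones: recordings end in `-0.mpg`, and the comskip `.txt` and `.log` files share the recording's base name. Neither is confirmed beyond the `*-0.txt` pattern in the batch file above, and `find_orphans` and `delete_orphans` are hypothetical names. The default is a dry run that deletes nothing.

```python
from pathlib import Path

def find_orphans(root, stem_glob="*-0", video_ext=".mpg",
                 sidecar_exts=(".txt", ".log")):
    """List sidecar files under root that have no matching video file."""
    orphans = []
    for ext in sidecar_exts:
        for path in Path(root).rglob(stem_glob + ext):
            # with_suffix swaps only the final extension, so dots elsewhere
            # in the file or folder name do not confuse the match.
            if path.is_file() and not path.with_suffix(video_ext).exists():
                orphans.append(path)
    return sorted(orphans)

def delete_orphans(root, dry_run=True):
    """Delete orphaned sidecar files; with dry_run=True only report them."""
    orphans = find_orphans(root)
    for path in orphans:
        if not dry_run:
            path.unlink()
    return orphans
```

Calling `delete_orphans(folder)` first and reading the returned list is a cheap way to confirm the naming assumptions hold before running it again with `dry_run=False`.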
#679
VideoRedo integration
I have been very happy with using comskip, especially for its ability to capture most commercial breaks so that I do not have to mark each one by hand when I am sweeping a file with VideoRedo.
My one challenge is in the code of comskip.c for generation of the leading line of the vprj file. In comskip.c of the jeredev branch, there is logic to fully qualify the file name if the input file name is simply in the local directory. This absolute filename makes it awkward to move the files around after running comskip.exe. VideoRedo handles local directory names quite well (as long as the vprj file is in the same directory as the mpeg), and it would be very nice if local file reference was an option for vprj file generation. I do not know how to make this request directly to the developer.
-festus
#680
Quote:
Quote:
Jere