Accessing raw video data in DirectShow
October 18, 2013
In DirectShow one makes multimedia apps by building graphs where the nodes (called filters)
process the data (capture or read, convert, compress, write, etc.) and the graph's edges are streams
of multimedia samples (video frames, chunks of audio, etc.). All the work with raw data
takes place inside the filters, and the host application usually doesn't deal with video/audio
data directly; it only arranges the filters in a graph, telling them what to do. This is fine
for typical tasks when you've got all the necessary filters, but sometimes there is no ready-made
filter for your task, and then you need to create your own filter, which may be rather
tedious and complicated. However, there are many cases when you only want to peek at the data,
analyse it or save some part of it, and you don't want a transform that changes the media type, so
making your own filter is overkill. There is a nice standard filter in DirectShow which can
give you access to raw video data (let's stick with just video for now) without creating
your own filters. It's called Sample Grabber.
MSDN says it's deprecated, but this filter is still available in all versions of Windows
including Windows 8. Even if it goes away some time in the future, recreating it will be very
easy, so your program will not need to change.
In this little tutorial I'll show how to make a small DirectShow app in C++ which takes a video
stream from a USB camera, applies a simple video effect (accessing raw video data) and shows it
on screen. The idea is simple: make a graph where the video stream flows from the camera through
the Sample Grabber to a video renderer. Each time a frame passes the Sample Grabber, it calls my
callback, where I manipulate the raw video data before it's sent downstream for display.
With GraphEditPlus, making such an application is a matter of minutes.
First I need to create my graph in GraphEditPlus and generate code for it. I start by selecting
the "Video Capture Sources" category in the filters window. There is only one capture source on my
laptop, "USB2.0 Camera", so I add it to the graph with a double click.
The camera can provide video in different formats and resolutions. The DirectShow filter
representing the camera exposes the IAMStreamConfig interface, which allows listing the available
output formats and selecting the one in which the data should be provided.
I right-click its output pin and select "IAMStreamConfig::SetFormat" to see the list of
media types and select one of them:
This selection will be reflected in the generated source code. My camera can produce uncompressed
YUY2 video and compressed MJPG, both in different resolutions. Also, the format type can be
either FORMAT_VideoInfo or FORMAT_VideoInfo2, and it's important to use the first one,
since Sample Grabber will not accept FORMAT_VideoInfo2. So I select YUY2 640x480 FORMAT_VideoInfo.
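The selection rule above (right subtype, right resolution, and FORMAT_VideoInfo rather than FORMAT_VideoInfo2) can be sketched in portable C++. In a real program the candidates would come from IAMStreamConfig::GetNumberOfCapabilities and IAMStreamConfig::GetStreamCaps; here the hypothetical FormatCaps struct and findGrabbableFormat function stand in for that, so the sketch compiles without DirectShow headers.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one entry reported by
// IAMStreamConfig::GetStreamCaps (the real call fills an AM_MEDIA_TYPE).
struct FormatCaps {
    std::string subtype;  // e.g. "YUY2", "MJPG"
    bool videoInfo2;      // true for FORMAT_VideoInfo2, false for FORMAT_VideoInfo
    int width;
    int height;
};

// Returns the index of the first entry with the requested subtype and
// resolution that Sample Grabber can accept (FORMAT_VideoInfo only),
// or -1 if there is none.
int findGrabbableFormat(const std::vector<FormatCaps>& caps,
                        const std::string& subtype, int width, int height) {
    for (std::size_t i = 0; i < caps.size(); ++i) {
        const FormatCaps& c = caps[i];
        if (!c.videoInfo2 && c.subtype == subtype &&
            c.width == width && c.height == height)
            return static_cast<int>(i);
    }
    return -1;  // nothing usable: Sample Grabber would reject every candidate
}
```

For example, given the list {MJPG/VideoInfo, YUY2/VideoInfo2, YUY2/VideoInfo}, all at 640x480, only the third entry (index 2) qualifies for YUY2 640x480.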
Then I need to add the Sample Grabber, so I just start typing its name in the filters
search box and after entering "sa" here it is. Double click, added to the graph.
Then I just need to connect it to the camera and render the video stream from its output pin
(right-click on the pin, "Render"). The graph is ready, I run it and see the video from my camera.
Here's a view from my office window:
I tell GraphEditPlus to generate source code and open Visual Studio 2010. Two minutes from the start
and I already have a ready DirectShow app. Well, almost. In the good old times we had VC6,
the DirectX 9 SDK and life was simple. And in the C# world life is still simple.
But in C++ you now need a Windows SDK to use DirectShow, and in recent versions of this SDK
Microsoft got more serious about deprecating some parts of it: there is no Qedit.h file, which
describes Sample Grabber (although qedit.h should really be about DirectShow Editing Services,
the engine of Movie Maker). The last version of the Windows SDK containing qedit.h is version 6.0
(SDK for Vista and .NET 3), but even there it is useless because it refers to "dxtrans.h", which is
missing. There is even a pragma message in qedit.h saying "To compile qedit.h you must install the
DirectX 9 SDK, to obtain the dxtrans.h header." And although it's not mentioned, not every version
of the DX9 SDK will help. Luckily, you don't really need them. If you do have an SDK with qedit.h,
you can comment out the reference to dxtrans.h there and include it this way in your source file:
#define __IDxtCompositor_INTERFACE_DEFINED__
#define __IDxtAlphaSetter_INTERFACE_DEFINED__
#define __IDxtJpeg_INTERFACE_DEFINED__
#define __IDxtKey_INTERFACE_DEFINED__
#include <Qedit.h>
And if you don't have qedit.h in your SDK, you can just use this file: SampleGrabber.h.
It's an excerpt from the original headers describing just the Sample Grabber and nothing else.
So, I create a Win32 console application, paste the source code generated by GraphEditPlus,
add the Windows SDK Include and Lib directories to the project settings and link two lib files
(strmiids.lib and quartz.lib, parts of DirectShow). In the BuildGraph function I see that
a media format description structure is rigorously populated to be passed to
IAMStreamConfig::SetFormat, but some fields, like dwBitRate and AvgTimePerFrame, are not really
necessary, so they can be skipped. What's really important is the media major type, the subtype
(MEDIASUBTYPE_YUY2 in my case) and the resolution.
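If you do want to fill in dwBitRate and AvgTimePerFrame, both follow from the frame rate. AvgTimePerFrame is a REFERENCE_TIME, measured in 100-nanosecond units, and for uncompressed video dwBitRate is width * height * bits per pixel * frames per second. A minimal sketch, assuming 30 fps (the post doesn't say which rate the camera delivers); RefTime, avgTimePerFrame and bitRate are hypothetical names chosen so the code compiles without Windows headers:

```cpp
#include <cstdint>

// DirectShow's REFERENCE_TIME counts 100-nanosecond units (10,000,000 per second).
// Local alias so this sketch compiles without Windows headers.
typedef std::int64_t RefTime;
const RefTime kUnitsPerSecond = 10000000;

// Duration of one frame at the given frame rate, as stored in AvgTimePerFrame.
RefTime avgTimePerFrame(int fps) {
    return kUnitsPerSecond / fps;
}

// Bit rate of an uncompressed stream, as stored in dwBitRate.
std::uint32_t bitRate(std::uint32_t width, std::uint32_t height,
                      std::uint32_t bitsPerPixel, std::uint32_t fps) {
    return width * height * bitsPerPixel * fps;
}
```

For 640x480 YUY2 (16 bits per pixel) at the assumed 30 fps, this gives AvgTimePerFrame = 333333 and dwBitRate = 147456000.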
The video rendering part is presented in the generated code in full detail, but it's not
necessary to create and connect those filters by hand: a simple RenderStream call with NULL
as the last argument is enough to render the video stream from Sample Grabber on screen. So the
graph building code, after all adjustments, looks like this:
#include "SampleGrabber.h"
// {C1F400A0-3F08-11D3-9F0B-006008039E37}
DEFINE_GUID(CLSID_SampleGrabber,
0xC1F400A0, 0x3F08, 0x11D3, 0x9F, 0x0B, 0x00, 0x60, 0x08, 0x03, 0x9E, 0x37); //qedit.dll
HRESULT BuildGraph(IGraphBuilder *pGraph)
{
    HRESULT hr = S_OK;

    //graph builder
    CComPtr<ICaptureGraphBuilder2> pBuilder;
    hr = pBuilder.CoCreateInstance(CLSID_CaptureGraphBuilder2);
    CHECK_HR(hr, _T("Can't create Capture Graph Builder"));
    hr = pBuilder->SetFiltergraph(pGraph);
    CHECK_HR(hr, _T("Can't SetFiltergraph"));

    //add USB2.0 Camera (CreateFilterByName is a helper from the GraphEditPlus-generated code)
    CComPtr<IBaseFilter> pUSB20Camera = CreateFilterByName(L"USB2.0 Camera", CLSID_VideoCaptureSources);
    hr = pGraph->AddFilter(pUSB20Camera, L"USB2.0 Camera");
    CHECK_HR(hr, _T("Can't add USB2.0 Camera to graph"));

    //describe the wanted output format: YUY2, 640x480, FORMAT_VideoInfo
    AM_MEDIA_TYPE pmt;
    ZeroMemory(&pmt, sizeof(AM_MEDIA_TYPE));
    pmt.majortype = MEDIATYPE_Video;
    pmt.subtype = MEDIASUBTYPE_YUY2;
    pmt.formattype = FORMAT_VideoInfo;          //Sample Grabber rejects FORMAT_VideoInfo2
    pmt.bFixedSizeSamples = TRUE;
    pmt.cbFormat = sizeof(VIDEOINFOHEADER);     //88 bytes
    pmt.lSampleSize = 614400;                   //640 * 480 * 2 bytes per pixel
    pmt.bTemporalCompression = FALSE;

    VIDEOINFOHEADER format;
    ZeroMemory(&format, sizeof(VIDEOINFOHEADER));
    format.bmiHeader.biSize = sizeof(BITMAPINFOHEADER); //40 bytes
    format.bmiHeader.biWidth = 640;
    format.bmiHeader.biHeight = 480;
    format.bmiHeader.biPlanes = 1;
    format.bmiHeader.biBitCount = 16;
    format.bmiHeader.biCompression = 844715353; //FOURCC 'YUY2' (0x32595559)
    format.bmiHeader.biSizeImage = 614400;
    pmt.pbFormat = (BYTE*)&format;              //SetFormat copies the structure

    CComQIPtr<IAMStreamConfig, &IID_IAMStreamConfig> isc(GetPin(pUSB20Camera, L"Capture"));
    if (!isc) return E_NOINTERFACE;
    hr = isc->SetFormat(&pmt);
    CHECK_HR(hr, _T("Can't set format"));

    //add SampleGrabber
    CComPtr<IBaseFilter> pSampleGrabber;
    hr = pSampleGrabber.CoCreateInstance(CLSID_SampleGrabber);
    CHECK_HR(hr, _T("Can't create SampleGrabber"));
    hr = pGraph->AddFilter(pSampleGrabber, L"SampleGrabber");
    CHECK_HR(hr, _T("Can't add SampleGrabber to graph"));
    CComQIPtr<ISampleGrabber, &IID_ISampleGrabber> pSampleGrabber_isg(pSampleGrabber);
    if (!pSampleGrabber_isg) return E_NOINTERFACE;

    //here we provide our callback (0 = call SampleCB):
    hr = pSampleGrabber_isg->SetCallback(new CallbackObject(), 0);
    CHECK_HR(hr, _T("Can't set callback"));

    //connect USB2.0 Camera and SampleGrabber
    hr = pBuilder->RenderStream(NULL, NULL, pUSB20Camera, NULL, pSampleGrabber);
    CHECK_HR(hr, _T("Can't render stream to SampleGrabber"));

    //render the video in a window (NULL sink = default renderer)
    hr = pBuilder->RenderStream(NULL, NULL, pSampleGrabber, NULL, NULL);
    CHECK_HR(hr, _T("Can't render stream from SampleGrabber"));
    return S_OK;
}
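The two magic numbers in the format description above are easy to derive. 844715353 is the FOURCC code 'YUY2' packed into a 32-bit integer (the same value the Windows MAKEFOURCC macro produces, and the Data1 field of the MEDIASUBTYPE_YUY2 GUID), and 614400 is the size of one 640x480 frame at 16 bits per pixel. A portable sketch; makeFourCC and frameSize are hypothetical helper names:

```cpp
#include <cstdint>

// Portable equivalent of the Windows MAKEFOURCC macro: packs four characters
// little-endian, so the first character ends up in the lowest byte.
std::uint32_t makeFourCC(char a, char b, char c, char d) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(a)) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// Size in bytes of one packed frame: width * height * bits per pixel / 8.
std::uint32_t frameSize(std::uint32_t width, std::uint32_t height,
                        std::uint32_t bitsPerPixel) {
    return width * height * bitsPerPixel / 8;
}
```

So makeFourCC('Y', 'U', 'Y', '2') yields 0x32595559, i.e. 844715353, and frameSize(640, 480, 16) yields 614400, the value used for both lSampleSize and biSizeImage.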
The crucial part is where I call ISampleGrabber::SetCallback. The first argument is
the callback object implementing the ISampleGrabberCB interface, and the second one, 0, says which
callback method to call. There are two methods in ISampleGrabberCB, SampleCB and BufferCB:
passing 0 selects SampleCB and passing 1 selects BufferCB. We want SampleCB, because it receives
the IMediaSample itself, so we can modify its data in place. I'm going to leak a few bytes of
memory by not deleting the callback object, but that's OK in my case, as it's only created once.
So we need an implementation of ISampleGrabberCB to be given to Sample Grabber. Here is one,
very simple:
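class CallbackObject : public ISampleGrabberCB {
public:
    CallbackObject() {}

    STDMETHODIMP QueryInterface(REFIID riid, void **ppv)
    {
        if (NULL == ppv) return E_POINTER;
        if (riid == __uuidof(IUnknown)) {
            *ppv = static_cast<IUnknown*>(this);
            AddRef(); //COM rule: QueryInterface AddRefs the pointer it returns
            return S_OK;
        }
        if (riid == __uuidof(ISampleGrabberCB)) {
            *ppv = static_cast<ISampleGrabberCB*>(this);
            AddRef();
            return S_OK;
        }
        *ppv = NULL; //COM rule: clear the out pointer on failure
        return E_NOINTERFACE;
    }
    //dummy reference counting: the object is never deleted
    STDMETHODIMP_(ULONG) AddRef() { return 2; }
    STDMETHODIMP_(ULONG) Release() { return 1; }

    //ISampleGrabberCB
    STDMETHODIMP SampleCB(double SampleTime, IMediaSample *pSample);
    //not called, since SetCallback was given 0
    STDMETHODIMP BufferCB(double SampleTime, BYTE *pBuffer, long BufferLen) { return S_OK; }
};

STDMETHODIMP CallbackObject::SampleCB(double SampleTime, IMediaSample *pSample)
{
    if (!pSample)
        return E_POINTER;
    long sz = pSample->GetActualDataLength();
    BYTE *pBuf = NULL;
    HRESULT hr = pSample->GetPointer(&pBuf);
    if (FAILED(hr) || sz <= 0 || pBuf == NULL) return E_UNEXPECTED;
    //YUY2 packs pixels as Y0 U Y1 V: even bytes are luma, odd bytes are chroma
    for (long i = 0; i < sz; i += 2)
        pBuf[i] = (BYTE)(255 - pBuf[i]); //invert luma, leave chroma untouched
    //no pSample->Release() here: pSample is an [in] parameter the callback
    //did not AddRef, so releasing it would drop a reference it doesn't own
    return S_OK;
}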
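The AddRef and Release above don't count references at all, which only works because the object lives for the whole run of the program. For comparison, here is a minimal, portable sketch of what real COM-style reference counting looks like. RefCounted is a hypothetical class that uses no Windows headers; in an actual ISampleGrabberCB implementation the same logic would go into AddRef and Release, returning ULONG.

```cpp
#include <atomic>

// Portable sketch of COM-style reference counting (no Windows headers).
class RefCounted {
public:
    RefCounted() : refs_(1) {}  // the creator holds the first reference

    unsigned long AddRef() { return ++refs_; }  // returns the new count

    unsigned long Release() {
        unsigned long remaining = --refs_;
        if (remaining == 0)
            delete this;  // last reference gone: the object destroys itself
        return remaining;
    }

    static int destroyed;  // counts destructions, for demonstration only

private:
    ~RefCounted() { ++destroyed; }  // private: only Release may delete
    std::atomic<unsigned long> refs_;
};

int RefCounted::destroyed = 0;
```

With counting like this, the object deletes itself when the last reference is released, so a callback object built this way would not leak.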
It's a COM object, hence the obligatory QueryInterface, AddRef and Release, the latter two
implemented sloppily, completely ignoring COM reference counting. Then come the two
ISampleGrabberCB methods: one doing nothing, as it will never be called, and the other,
the most important one, where the magic happens. Each time a video frame is produced
by the camera, it is passed to Sample Grabber, which calls this method with an IMediaSample
pointer through which we can request the amount of data in this sample and a pointer to the data
itself. The data is mutable, so as long as we don't change the video resolution and format, we can
just modify the contents of this buffer, and this is what the next filter down the chain will
receive. In this case I want to invert each pixel's intensity (luma) while keeping its color
(chroma) intact. Since the video comes in YUY2 format, this is pretty easy: YUY2 packs two pixels
into four bytes as Y0 U Y1 V, so every even-indexed byte in the buffer is some pixel's luma, and I
just subtract it from 255. And that's it, no more code is required (the rest, i.e. the main loop
as well as the filter creation and pin search routines, is all generated by GraphEditPlus).
Here's what I see after I compile and run the program:
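The pixel manipulation in SampleCB doesn't depend on DirectShow at all, so it can be pulled out into a plain function and checked without a camera. A minimal sketch, assuming a packed YUY2 buffer whose length is a multiple of 4; invertLumaYUY2 is a hypothetical name:

```cpp
#include <cstddef>
#include <cstdint>

// Inverts luma in a packed YUY2 buffer in place. YUY2 stores two pixels in
// four bytes as Y0 U Y1 V, so every even-indexed byte is a luma sample and
// every odd-indexed byte is chroma, which is left untouched.
void invertLumaYUY2(std::uint8_t* buf, std::size_t size) {
    for (std::size_t i = 0; i < size; i += 2)
        buf[i] = static_cast<std::uint8_t>(255 - buf[i]);
}
```

Inside SampleCB it would be called as invertLumaYUY2(pBuf, sz) after the checks on pBuf and sz.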
Instead of showing video on screen, you can direct the stream to a muxer and a file writer
to record it to disk, or just use the Null Renderer if you don't need the data after it leaves
Sample Grabber. For example, your callback can just save each frame to a bitmap file or send it
over the network. Either way, something needs to be connected to Sample Grabber's output pin,
so use the Null Renderer if you don't have anything meaningful to connect.
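As an example of a callback that saves frames, here is a minimal sketch that writes only the luma plane of a YUY2 frame as an 8-bit grayscale image. It swaps the bitmap file mentioned above for the simpler binary PGM ("P5") format, which avoids a YUV-to-RGB conversion. extractLumaYUY2 and writePGM are hypothetical helpers, and calling them from SampleCB with the sample's buffer and the 640x480 resolution is an assumption, not part of the original program.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Copies the luma (Y) samples of a packed YUY2 frame into an 8-bit grayscale
// image: every even-indexed byte of the YUY2 buffer becomes one gray pixel.
std::vector<std::uint8_t> extractLumaYUY2(const std::uint8_t* buf, std::size_t size) {
    std::vector<std::uint8_t> gray;
    gray.reserve(size / 2);
    for (std::size_t i = 0; i < size; i += 2)
        gray.push_back(buf[i]);
    return gray;
}

// Writes a binary PGM file: a short text header followed by raw 8-bit pixels.
// Returns false if the pixel count doesn't match or the file can't be written.
bool writePGM(const std::string& path, int width, int height,
              const std::vector<std::uint8_t>& gray) {
    if (gray.size() != static_cast<std::size_t>(width) * height) return false;
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out << "P5\n" << width << " " << height << "\n255\n";
    out.write(reinterpret_cast<const char*>(gray.data()),
              static_cast<std::streamsize>(gray.size()));
    return static_cast<bool>(out);
}
```

Because SampleCB runs while the frame is on its way downstream, writing a file for every frame there can hold up the video; for heavier work, copying the buffer and doing the I/O on another thread is likely a better fit.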
That's all, next time I'll show how to do the same in C#.