Capturing the Screen

Introduction

Sometimes we want to capture the contents of the entire screen programmatically. This article explains how it can be done. The immediate options are GDI and DirectX; another option worth considering is the Windows Media API. Here we consider each of them and see how it can be used for our purpose. In each of these approaches, once we get the screenshot into our application-defined memory or bitmap, we can use it to generate a movie. Refer to the article Create Movie From HBitmap for more details about creating movies from bitmap sequences, or to the article Simulation Recording for details on a simulation recording library for Windows.

Capture it the GDI way

When performance is not an issue and all we want is a snapshot of the desktop, we can consider the GDI option. This mechanism is based on the simple principle that the desktop is also a window, i.e. it has a window handle (HWND) and a device context (DC). If we can get the device context of the desktop, we can blit its contents to our application-defined device context in the normal way. Getting the device context of the desktop is straightforward if we know its window handle, which can be obtained using the function GetDesktopWindow(). Thus the steps involved are:

Acquire the Desktop window handle using the function GetDesktopWindow();
Get the DC of the desktop window using the function GetDC();
Create a compatible DC for the desktop DC, along with a compatible bitmap to select into that compatible DC. These can be created using CreateCompatibleDC() and CreateCompatibleBitmap(); the bitmap is selected into our DC with SelectObject();
Whenever you are ready to capture the screen, blit the contents of the desktop DC into the compatible DC. That's all. The compatible bitmap now contains the contents of the screen at the moment of the capture.
Do not forget to release the objects when you are done; memory is precious (for the other applications).

Example:

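void CaptureScreen()
{
    // Size of the primary display, in pixels
    int  nScreenWidth  = GetSystemMetrics(SM_CXSCREEN);
    int  nScreenHeight = GetSystemMetrics(SM_CYSCREEN);

    HWND hDesktopWnd = GetDesktopWindow();
    HDC  hDesktopDC  = GetDC(hDesktopWnd);
    if (hDesktopDC == NULL)
        return;

    // Memory DC and bitmap compatible with the desktop
    HDC     hCaptureDC     = CreateCompatibleDC(hDesktopDC);
    HBITMAP hCaptureBitmap = CreateCompatibleBitmap(hDesktopDC, nScreenWidth, nScreenHeight);

    if (hCaptureDC != NULL && hCaptureBitmap != NULL)
    {
        HGDIOBJ hOldBitmap = SelectObject(hCaptureDC, hCaptureBitmap);

        // Copy the screen contents into the bitmap
        BitBlt(hCaptureDC, 0, 0, nScreenWidth, nScreenHeight, hDesktopDC, 0, 0, SRCCOPY);

        // Deselect the bitmap before passing it on; GetDIBits(), for one, requires this
        SelectObject(hCaptureDC, hOldBitmap);

        SaveCapturedBitmap(hCaptureBitmap); // Placeholder - put your code here to save the captured image to disk
    }

    // Release everything we acquired
    if (hCaptureBitmap != NULL) DeleteObject(hCaptureBitmap);
    if (hCaptureDC != NULL)     DeleteDC(hCaptureDC);
    ReleaseDC(hDesktopWnd, hDesktopDC);
}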

In the above code snippet, the function GetSystemMetrics() returns the screen width when called with SM_CXSCREEN and the screen height when called with SM_CYSCREEN. Refer to the accompanying source code for the details of how to save the captured bitmap to disk and how to send it to the clipboard; both are straightforward. The source code implements the above technique for capturing the screen contents at regular intervals and creates a movie out of the captured image sequence.
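As a rough illustration of what saving to disk involves, the sketch below encodes raw pixels as the bytes of a .bmp file in portable C++. It is not the code from the accompanying source. It assumes the pixel bits have already been extracted from the bitmap (for example with GetDIBits()) as tightly packed, top-down, 32-bit BGRA rows; EncodeBmp32, PutU16 and PutU32 are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends a value in little-endian byte order, as the BMP format requires.
static void PutU16(std::vector<std::uint8_t>& out, std::uint16_t v)
{
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
}

static void PutU32(std::vector<std::uint8_t>& out, std::uint32_t v)
{
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
}

// Hypothetical helper: returns the bytes of a 32-bit uncompressed BMP file.
// `pixels` is assumed to hold width * height * 4 bytes, top row first.
std::vector<std::uint8_t> EncodeBmp32(const std::uint8_t* pixels,
                                      std::uint32_t width,
                                      std::uint32_t height)
{
    assert(pixels != nullptr && width > 0 && height > 0);

    const std::size_t   rowBytes    = static_cast<std::size_t>(width) * 4;
    const std::uint32_t imageBytes  = width * 4 * height;
    const std::uint32_t headerBytes = 14 + 40; // file header + info header

    std::vector<std::uint8_t> out;
    out.reserve(headerBytes + imageBytes);

    // BITMAPFILEHEADER (14 bytes)
    out.push_back('B');
    out.push_back('M');
    PutU32(out, headerBytes + imageBytes); // total file size
    PutU16(out, 0);                        // reserved
    PutU16(out, 0);                        // reserved
    PutU32(out, headerBytes);              // offset of the pixel data

    // BITMAPINFOHEADER (40 bytes)
    PutU32(out, 40);         // size of this header
    PutU32(out, width);
    PutU32(out, height);     // positive height means bottom-up rows
    PutU16(out, 1);          // planes
    PutU16(out, 32);         // bits per pixel
    PutU32(out, 0);          // BI_RGB, no compression
    PutU32(out, imageBytes);
    PutU32(out, 0);          // horizontal resolution (unspecified)
    PutU32(out, 0);          // vertical resolution (unspecified)
    PutU32(out, 0);          // colors used
    PutU32(out, 0);          // important colors

    // Pixel rows, written bottom row first
    for (std::uint32_t y = height; y-- > 0; )
    {
        const std::uint8_t* row = pixels + y * rowBytes;
        out.insert(out.end(), row, row + rowBytes);
    }
    return out;
}
```

Writing the returned vector to a file opened in binary mode should give a file that image viewers can open. With 32 bits per pixel every row is already a multiple of four bytes, so no row padding is needed; other bit depths would need it.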

Download the Source for the Screen Capture Application using GDI : ScreenCap.zip

And the DirectX way of doing it

Capturing the screen shot with DirectX is a pretty easy task. DirectX offers a neat way of doing this.

Every DirectX application has a buffer, or surface, that holds the contents of the video memory related to that application. This is called the back buffer of the application. Some applications might have more than one back buffer. There is another buffer that every application can access by default: the front buffer. The front buffer holds the video memory related to the desktop contents, and so is essentially the screen image.

By accessing the front buffer from our DirectX application we can capture the contents of the screen at that moment. Because DirectX works at a lower level than GDI, we can in principle expect better performance than with the GDI approach, subject to the caution about GetFrontBuffer() noted later in this section.

Accessing the front buffer from a DirectX application is straightforward. The interface IDirect3DDevice8 provides the GetFrontBuffer() method, which takes an IDirect3DSurface8 object pointer and copies the contents of the front buffer onto that surface. The IDirect3DSurface8 object can be created with the method IDirect3DDevice8::CreateImageSurface(). Once the screen is captured onto the surface, we can use the function D3DXSaveSurfaceToFile() to save the surface directly to disk in bitmap format. Thus the code to capture the screen looks as follows:

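extern IDirect3DDevice8* g_pd3dDevice;

// ScreenWidth and ScreenHeight are assumed to hold the current display mode dimensions
void CaptureScreen()
{
    IDirect3DSurface8* pSurface = NULL;

    // Create a screen-sized system memory surface to receive the front buffer
    if (FAILED(g_pd3dDevice->CreateImageSurface(ScreenWidth, ScreenHeight, D3DFMT_A8R8G8B8, &pSurface)))
        return;

    if (SUCCEEDED(g_pd3dDevice->GetFrontBuffer(pSurface))) // Capture the screen
    {
        D3DXSaveSurfaceToFile("Desktop.bmp", D3DXIFF_BMP, pSurface, NULL, NULL); // Save the captured content to file
    }

    pSurface->Release();
}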

In the above, g_pd3dDevice is an IDirect3DDevice8 object that is assumed to have been properly initialized. This code snippet saves the captured image directly to disk. If, instead of saving to disk, we want to operate on the image bits directly, we can do so by using the method IDirect3DSurface8::LockRect(). This gives a pointer to the surface memory, which is essentially a pointer to the bits of the captured image. We can copy the bits to our application-defined memory and operate on them. The following code snippet shows how the surface contents can be copied into our application-defined memory.

extern void* pBits; // destination buffer of at least ScreenWidth * ScreenHeight * BITSPERPIXEL / 8 bytes

extern IDirect3DDevice8* g_pd3dDevice;

// BITSPERPIXEL is the bit depth of the surface format (32 for D3DFMT_A8R8G8B8)

IDirect3DSurface8* pSurface = NULL;

g_pd3dDevice->CreateImageSurface(ScreenWidth, ScreenHeight, D3DFMT_A8R8G8B8, &pSurface);

g_pd3dDevice->GetFrontBuffer(pSurface);

D3DLOCKED_RECT lockedRect;

if (SUCCEEDED(pSurface->LockRect(&lockedRect, NULL, D3DLOCK_NO_DIRTY_UPDATE | D3DLOCK_NOSYSLOCK | D3DLOCK_READONLY)))
{
    // Copy row by row: the source advances by Pitch, the destination by the packed row size
    for (int i = 0; i < ScreenHeight; i++)
    {
        memcpy((BYTE*)pBits + i * ScreenWidth * BITSPERPIXEL / 8, (BYTE*)lockedRect.pBits + i * lockedRect.Pitch, ScreenWidth * BITSPERPIXEL / 8);
    }

    pSurface->UnlockRect();
}

pSurface->Release();

In the above, pBits is a void* pointer. Make sure that enough memory has been allocated before copying into pBits. For the D3DFMT_A8R8G8B8 surface used here, BITSPERPIXEL is 32; a surface created with a different format would need the matching value. The important point to note is that the width of the surface in memory is not necessarily the same as the width of the captured screen image. Because of memory alignment (memory aligned to word boundaries is generally faster to access than unaligned memory), the surface might have extra padding at the end of each row to align the rows to word boundaries. lockedRect.Pitch gives the number of bytes between the starting points of two successive rows. That is, to advance to the correct point on the next row we should advance by Pitch, not by the row width. You can copy the surface bits in reverse row order using the following:

for( int i=0 ; i < ScreenHeight ; i++)
{
   memcpy((BYTE*) pBits +(ScreenHeight - i - 1) * ScreenWidth * BITSPERPIXEL/8 , (BYTE*)  lockedRect.pBits + i* lockedRect.Pitch , ScreenWidth* BITSPERPIXEL/8);
}

This may come in handy when you are converting between top-down and bottom-up bitmaps.
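The two copy loops above can be folded into one helper. The following is a minimal, portable sketch that uses a plain byte buffer in place of the locked surface, so it can be tried without DirectX. CopySurfaceRows and its parameters are hypothetical names for illustration: srcPitch plays the role of lockedRect.Pitch, and rowBytes the role of ScreenWidth * BITSPERPIXEL / 8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies `height` rows out of a padded source buffer
// into a tightly packed vector. Source rows start `srcPitch` bytes apart,
// but only the first `rowBytes` bytes of each row are image data.
// With flipVertically set, source row i lands in destination row height-1-i.
std::vector<std::uint8_t> CopySurfaceRows(const std::uint8_t* src,
                                          std::size_t srcPitch,
                                          std::size_t rowBytes,
                                          std::size_t height,
                                          bool flipVertically)
{
    assert(srcPitch >= rowBytes); // the pitch includes any row padding

    std::vector<std::uint8_t> packed(rowBytes * height);
    if (packed.empty())
        return packed; // nothing to copy

    assert(src != nullptr);
    for (std::size_t i = 0; i < height; ++i)
    {
        const std::size_t dstRow = flipVertically ? (height - 1 - i) : i;
        std::memcpy(packed.data() + dstRow * rowBytes, // packed copy advances by row width
                    src + i * srcPitch,                // source advances by pitch
                    rowBytes);
    }
    return packed;
}
```

Returning a vector sized from rowBytes * height also takes care of allocating enough memory for the copy. In a real capture the source pointer and pitch would come from the locked rectangle, and the call would sit between LockRect() and UnlockRect().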

While LockRect() is the only way of accessing the captured image content on IDirect3DSurface8, the newer IDirect3DSurface9 defines a more convenient method: GetDC(). We can use IDirect3DSurface9::GetDC() to get a GDI-compatible device context for the DirectX image surface, which makes it possible to blit the surface contents directly to our application-defined DC. So try it instead if you are using DirectX 9.0.

Finally, a point worth noting when using this technique for screen capture is the caution mentioned in the documentation: GetFrontBuffer() is a slow operation by design and should not be used in performance-critical applications. You have been warned!

Download the Source for the Screen Capture using DirectX 8.0: ScreenCapDx.zip. The source code implements the above technique for capturing the screen contents at regular intervals and creates a movie out of the captured image sequences.

Windows Media API for Capturing the Screen

Windows Media 9.0 supports screen captures using the Windows Media Encoder 9 API. It includes a codec named Windows Media Video 9 Screen codec that has been specially optimized to operate on the content produced through screen captures. The Windows Media Encoder API provides the interface IWMEncoder2 which can be used to capture the screen content efficiently.

Working with the Windows Media Encoder API for screen captures is straightforward. First we create an IWMEncoder2 object by using the CoCreateInstance() function. This can be done as

IWMEncoder2* g_pEncoder=NULL; 

CoCreateInstance(CLSID_WMEncoder,NULL,CLSCTX_INPROC_SERVER,IID_IWMEncoder2,(void**)&g_pEncoder);

The encoder object thus created contains all the operations for working with the captured screen data. However, in order to perform its operations properly, the encoder object depends on the settings defined in what is called a profile. A profile is simply a file containing all the settings that control the encoding operations. We can also create custom profiles at runtime with various customized options, such as codec options, depending on the nature of the captured data. For our screen capture application we create a custom profile based on the Windows Media Video 9 Screen codec. Custom profile objects are supported through the interface IWMEncProfile2. We can create a custom profile object by using the CoCreateInstance() function as

IWMEncProfile2* g_pProfile=NULL;

CoCreateInstance(CLSID_WMEncProfile2,NULL,CLSCTX_INPROC_SERVER,IID_IWMEncProfile2,(void**)&g_pProfile);

We need to specify the target audience for the encoder in the profile. Each profile can hold multiple audience configurations, which are objects of the interface IWMEncAudienceObj. Here we use one audience object for our profile. We create it with the method IWMEncProfile2::AddAudience(), which returns a pointer to an IWMEncAudienceObj that can then be used for configuration, such as the video codec setting (IWMEncAudienceObj::put_VideoCodec()) and the video frame size settings (IWMEncAudienceObj::put_VideoHeight() and IWMEncAudienceObj::put_VideoWidth()). For example, we set the video codec to be the Windows Media Video 9 Screen codec as

extern IWMEncAudienceObj* pAudience;

#define VIDEOCODEC MAKEFOURCC('M','S','S','2') //MSS2 is the fourcc for the screen codec

long lCodecIndex=-1;

g_pProfile->GetCodecIndexFromFourCC(WMENC_VIDEO, VIDEOCODEC, &lCodecIndex); //Get the Index of the Codec

pAudience->put_VideoCodec(0, lCodecIndex);

The fourcc is a four-character code that identifies a codec. The fourcc for the Windows Media Video 9 Screen codec is MSS2. IWMEncAudienceObj::put_VideoCodec() takes a codec index to identify a particular codec; that index can be obtained with the method IWMEncProfile2::GetCodecIndexFromFourCC().
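To make the fourcc value concrete, here is a small portable sketch of the packing that the MAKEFOURCC macro performs: the four characters are placed in a 32-bit value with the first character in the lowest byte. MakeFourCC and FourCCToString are hypothetical helper names for illustration and are not part of the Windows Media Encoder SDK.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Packs four characters into a 32-bit code, first character in the low byte.
constexpr std::uint32_t MakeFourCC(char c0, char c1, char c2, char c3)
{
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(c0))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c1)) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c2)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c3)) << 24);
}

// Unpacks a 32-bit code back into its four characters, e.g. for logging.
std::string FourCCToString(std::uint32_t fourcc)
{
    std::string text(4, ' ');
    for (int i = 0; i < 4; ++i)
        text[i] = static_cast<char>((fourcc >> (8 * i)) & 0xFF);
    return text;
}

// 'M' = 0x4D, 'S' = 0x53, 'S' = 0x53, '2' = 0x32
static_assert(MakeFourCC('M', 'S', 'S', '2') == 0x3253534Du,
              "MSS2 packs to 0x3253534D");
```

Seen in a hex dump on a little-endian machine, the stored bytes read M, S, S, 2 in order, which is why the code is usually written as the string MSS2.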

Once we have finished configuring the profile object, we can select that profile into our encoder by using the method IWMEncSourceGroup::put_Profile(), which is defined on the source group objects of the encoder. A source group is a collection of sources, where each source might be a video stream, an audio stream, an HTML stream and so on. Each encoder object can work with many source groups, from which it gets its input data. Since our screen capture application uses only a video stream, our encoder object needs one source group with a single source, the video source, in it. This single video source needs to be configured to use the screen device as the input source, which can be done with the method IWMEncVideoSource2::SetInput(BSTR) as

extern IWMEncVideoSource2* pSrcVid;

pSrcVid->SetInput(CComBSTR("ScreenCap://ScreenCapture1"));

The destination output can be configured to save into a video file (WMV movie) by using the method IWMEncFile::put_LocalFileName(), which requires an IWMEncFile object. This IWMEncFile object can be obtained using the method IWMEncoder::get_File() as

IWMEncFile* pOutFile=NULL;

g_pEncoder->get_File(&pOutFile);

pOutFile->put_LocalFileName(CComBSTR(szOutputFileName));

Once all the necessary configuration has been done on the encoder object, we can use the method IWMEncoder::Start() to start capturing the screen. The methods IWMEncoder::Stop() and IWMEncoder::Pause() can be used for stopping and pausing the capture.

While this deals with full screen capture, we can alternatively select a region to capture by adjusting the properties of the input video source stream. For this we use the IPropertyBag interface of the IWMEncVideoSource2 object as

#define WMSCRNCAP_WINDOWLEFT        CComBSTR("Left")
#define WMSCRNCAP_WINDOWTOP         CComBSTR("Top")
#define WMSCRNCAP_WINDOWRIGHT       CComBSTR("Right")
#define WMSCRNCAP_WINDOWBOTTOM      CComBSTR("Bottom")
#define WMSCRNCAP_FLASHRECT         CComBSTR("FlashRect")
#define WMSCRNCAP_ENTIRESCREEN      CComBSTR("Screen")
#define WMSCRNCAP_WINDOWTITLE       CComBSTR("WindowTitle")

extern IWMEncVideoSource2* pSrcVid;

int nLeft, nRight, nTop, nBottom; // capture rectangle in screen coordinates; set these before use

IPropertyBag* pPropertyBag = NULL;
pSrcVid->QueryInterface(IID_IPropertyBag,(void**)&pPropertyBag);

CComVariant varValue = false; // turn off full screen capture
pPropertyBag->Write(WMSCRNCAP_ENTIRESCREEN,&varValue);

varValue = nLeft;
pPropertyBag->Write( WMSCRNCAP_WINDOWLEFT, &varValue );

varValue = nRight;
pPropertyBag->Write( WMSCRNCAP_WINDOWRIGHT, &varValue );

varValue = nTop;
pPropertyBag->Write( WMSCRNCAP_WINDOWTOP, &varValue );

varValue = nBottom;
pPropertyBag->Write( WMSCRNCAP_WINDOWBOTTOM, &varValue );

pPropertyBag->Release();

The accompanying source code implements this technique for capturing the screen. One point that might be interesting, apart from the good quality of the output movie, is that the mouse cursor is also captured. (The GDI and DirectX approaches shown above typically do not capture the mouse cursor.)

Download the Source Code for Capturing the Screen Using the Windows Media API: WMEncScrnCap.zip

Note that the Windows Media 9.0 SDK components must be installed on your system to create applications using the Windows Media 9.0 API. You can download the Windows Media Encoder SDK from the URL: http://www.microsoft.com/downloads/details.aspx?FamilyID=000a16f5-d62b-4303-bb22-f0c0861be25b

To run your applications, end users must install Windows Media Encoder 9 Series. When you distribute applications based on the Windows Media Encoder SDK, you must also include the Windows Media Encoder software, either by redistributing Windows Media Encoder in your setup or by requiring your users to install Windows Media Encoder themselves.

Windows Media Encoder 9.0 can be downloaded from: http://www.microsoft.com/windows/windowsmedia/forpros/encoder/default.mspx

Preventing the Screen Capture

After reading so much about screen captures, the natural next topic is how to prevent them. Unfortunately, preventing a capture is a little more complicated than performing one, because we are trying to restrict the usual behavior of an application. Content that is supposed to be visible to the user's eyes cannot easily be prevented from being captured or copied, because there is no clear boundary in the Windows environment for what counts as user-perceptible content (what the user wants vs. what the user is allowed).

Having said that, it is possible to limit the usual screen capture possibilities. If we review the typical screen capturing methods that are available, there are two prominent categories: 1. user mode capturing and 2. kernel mode capturing. This article has explained the different ways of user mode screen capture for programmers. The other method (not covered here) is kernel mode capture, which is done by writing video miniport mirror drivers.

A typical way of preventing screen capture by user mode capturing applications is to hook the APIs discussed above and restrict their operation. For example, if you do not want your application content to be captured through, say, the GDI method discussed above, you would typically hook BitBlt system wide and deny the operation for a particular process. Similarly, if you would like to prevent capture through any other API, you would have to hook that API as well and deny those requests. As you can see, this is not an elegant solution. It is prone to failure and difficult to extend or maintain, since you have to think of every API that can be used to capture the screen and hook each of them for denial.

The other, more reliable approach to screen capture prevention is writing video filter drivers. Typically you would have a kernel mode filter driver (that permits or denies the video blit operations) along with a user mode service to interact with it. The user mode service takes care of identifying the access rights of the capturing processes and supplies those details to the kernel mode driver, which then either denies the blit request (by painting black) or processes it. You may not be required to implement a complete video display driver, since you are not doing any raster or display operations as such; you are only allowing or denying existing display operations. You would simply write a filter driver that sits on top of the existing display driver and intercepts the calls to it. With thousands of calls to the display driver from all applications, which ones to deny is determined by your user mode service.

A video filter driver is the more reliable solution for screen capture prevention. However, its implementation is complex and costly. API hooking, by contrast, is relatively easy to implement but prone to failure, since we may have left some APIs out of consideration. There are also cases to keep in mind that lie outside the capture APIs altogether, such as a monitor output that is physically connected to a video recorder or another PC monitor, and remote desktop or VNC connections, where the user is supposed to be allowed to see the content on a different screen.

Beyond these two, there is one other solution. It is surprisingly simple, easy to implement and yet reliable. It is based on a Windows core system services feature and is not really related to screen capture or prevention as such, but it is nonetheless a good technique for preventing screen captures. If you trade the conventional workflow for a slightly different style of working, you get kernel mode video driver level security with a simple user mode API from an ordinary service. That technique is rather unconventional and is beyond the scope of this article. If you would like to know more about it, please feel free to contact the author.

Conclusions

All the techniques discussed in this article are aimed at a single goal: capturing the contents of the screen. However, as can easily be guessed, the results vary depending on the particular technique employed. If all we want is an occasional snapshot, the GDI approach is a good choice given its simplicity. Using Windows Media would be a better option if we want more professional results. One point worth noting is that the quality of the content captured through these mechanisms may also depend on the settings of the system. For example, disabling hardware acceleration (Desktop properties | Settings | Advanced | Troubleshoot) might drastically improve the overall quality and performance of the capture application.

P.Gopalakrishna
