If your author does not supply external image files along with their Word file, your options for extracting images from the Word file depend on both the Word file format (DOC or DOCX) and the type of images it contains. There are two fundamentally different types of image:


Bitmap images, such as photographs, are simply a grid of pixels, and they become increasingly blurred or pixelated as they are enlarged. Common bitmap formats include GIF, JPG, PNG and TIF.


Vector graphics are images that look the same however much they are magnified. Rather than being made up of pixels, they are described as objects such as lines, curves, shapes and text, which are recalculated and redrawn by the software as the image is resized. Examples include EPS, AI (Adobe Illustrator), EMF (Enhanced Metafile, used to transfer images such as Excel charts or PowerPoint slides between Office applications) and WMF (Windows Metafile, the internal Office drawing format).


Extracting bitmap images from DOC files

If the author has supplied a DOC file, you can either use the eXtyles automated image export process at Activation or Cleanup, or you can copy and paste the bitmap images out of the file. Both of these alternatives are limited by the way in which bitmap images are stored in DOC files.


eXtyles can export bitmap images only in a limited range of formats (JPG, GIF), and the exported images are limited to a resolution of 72 d.p.i. Copying bitmap images out of the Word file by hand has the same limitation: they are copied to the clipboard at 72 d.p.i., so they are likely to suffer a significant loss of resolution.
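To see why a 72 d.p.i. limit matters, a rough worked calculation helps. The sketch below assumes, purely for illustration, a figure reproduced at 5 × 3 inches and a print target of 300 d.p.i.; neither number comes from eXtyles, and the function names are hypothetical.

```python
def pixels_at(width_in, height_in, dpi):
    """Pixel dimensions of an image reproduced at width_in x height_in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def retained_fraction(low_dpi, high_dpi):
    """Share of the pixels kept when an image prepared at high_dpi is reduced
    to low_dpi at the same physical size (pixel count scales with dpi squared)."""
    return (low_dpi / high_dpi) ** 2


# Hypothetical 5 x 3 inch figure (illustrative sizes only)
print(pixels_at(5, 3, 72))                   # (360, 216)
print(pixels_at(5, 3, 300))                  # (1500, 900)
print(f"{retained_fraction(72, 300):.1%}")   # 5.8%
```

In other words, under these assumptions a 72 d.p.i. copy keeps fewer than 6% of the pixels that a 300 d.p.i. original would have, which is why fine detail is lost.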


For these reasons, extracting bitmap images directly from DOC files is rarely the best option, and it may be workable only for the simplest images. Other workflows can be used as a last resort, such as printing the Word file to PDF and then opening the PDF in a graphics application that supports PDF (such as Adobe Photoshop) to capture better-quality versions of the images, but each of them has its drawbacks.


Extracting bitmap images from DOCX files

Fortunately, bitmap images can be extracted much more satisfactorily from DOCX files, because the DOCX format stores a native (or near-native) copy of each embedded bitmap image.


As a result, both the automatic image export by eXtyles at Activation or Cleanup and manual extraction can produce bitmap images that are identical, or very close, to the original graphics files prepared by the author.


To extract images from a DOCX file by hand, you can take advantage of the fact that a DOCX file is actually a ZIP archive. Make a copy of the DOCX file and change its file extension by hand from .DOCX to .ZIP (ignoring Microsoft’s very sensible warning about the possible consequences of changing a file extension); you can then use WinZip or your preferred unzipping tool to extract the contents of the archive.


Inside the unzipped archive, you will find a subdirectory called “word” that contains a subdirectory called “media”. Inside the word\media subdirectory, you will find all of the image files that are contained in the DOCX file.


Note: if you want to recover the DOCX file from the renamed ZIP file, change the file extension of the ZIP file back to .DOCX. Do not try to rezip the extracted files into a new ZIP file and rename that as a DOCX file: any changes you may have made to the unzipped files may prevent the result from being a usable DOCX file.
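As an alternative to renaming a copy of the file, the same extraction can be scripted. The following is a minimal sketch, assuming Python 3 and only its standard `zipfile` and `pathlib` modules; the function name `extract_docx_media` is hypothetical. Because `zipfile` recognizes a ZIP archive by its contents rather than its extension, the DOCX file can be read directly, without renaming or modifying it.

```python
import zipfile
from pathlib import Path, PurePosixPath

# Paths inside a ZIP archive always use forward slashes, so the Windows
# folder word\media appears in the archive as word/media/.
MEDIA_PREFIX = "word/media/"


def extract_docx_media(docx_path, out_dir):
    """Copy every file stored under word/media/ in a DOCX package to out_dir.

    The DOCX file is only read, never changed. Returns the written paths.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(docx_path) as package:
        for name in package.namelist():
            # Skip entries outside word/media/ and any directory entries
            if not name.startswith(MEDIA_PREFIX) or name.endswith("/"):
                continue
            target = out / PurePosixPath(name).name
            # Write the stored bytes unchanged, so the image is byte-identical
            target.write_bytes(package.read(name))
            written.append(target)
    return written
```

For example, `extract_docx_media("manuscript.docx", "images")` (both names hypothetical) would copy every file from word/media into a folder called images and return the list of copied files.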


Extracting vector graphics from Word files

Extracting vector graphics from Word depends on having the source software available; if you don’t have the source software installed, the best you can hope for is to obtain a WMF file by unzipping the DOCX file. And even if you can access the original image file, most vector graphics formats cannot be displayed directly in a web browser, so you would need to convert the file to a bitmap format at some point for online presentation.


Word drawing objects and images created in other Office applications can be extracted from DOCX files in the same way as bitmap images, but the resulting WMF or EMF files cannot be satisfactorily manipulated outside the Office application that created them. From a DOC file, eXtyles would extract these images as WMF files.


For these reasons, we generally recommend that you do not attempt to use vector graphics embedded in Word in an XML workflow; instead, ask the author to supply the images as external files in a bitmap format.
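Before writing to the author, it can help to know which embedded images are bitmaps and which are metafiles. The sketch below assumes Python 3 with only the standard `zipfile` and `pathlib` modules, and a hypothetical `classify_docx_media` function; it groups the contents of word/media by file extension, using the formats named in this article. The extension lists are illustrative, not exhaustive.

```python
import zipfile
from pathlib import PurePosixPath

# Formats named in this article; extend these sets as needed.
BITMAP_EXTS = {".gif", ".jpg", ".jpeg", ".png", ".tif", ".tiff"}
VECTOR_EXTS = {".emf", ".wmf", ".eps", ".ai"}


def classify_docx_media(docx_path):
    """Group the files under word/media/ in a DOCX package by image type."""
    groups = {"bitmap": [], "vector": [], "other": []}
    with zipfile.ZipFile(docx_path) as package:
        for name in package.namelist():
            # Only look at real files inside word/media/
            if not name.startswith("word/media/") or name.endswith("/"):
                continue
            ext = PurePosixPath(name).suffix.lower()  # case-insensitive match
            if ext in BITMAP_EXTS:
                groups["bitmap"].append(name)
            elif ext in VECTOR_EXTS:
                groups["vector"].append(name)
            else:
                groups["other"].append(name)
    return groups
```

Any files reported under "vector" (typically EMF or WMF) are candidates for the request to the author described above.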