AspPDF.com -- Chapter 17: PDF to Image Conversion

17.1 Page.ToImage Method

17.2 CMYK-to-RGB Conversion

17.3 Error Log

17.4 Other ToImage Parameters

17.5 Image Extraction

17.6 Printing

17.7 Structured Text Extraction

17.8 Image Replacement

Starting with Version 2.0, AspPDF is capable of converting a PDF page to an image via the method PdfPage.ToImage.

The ToImage method returns an instance of the PdfPreview object which performs the PDF-to-image conversion and generates the page image. The image can then be saved to disk, memory or an HTTP stream. The ToImage method accepts an optional PdfParam object or parameter string argument.

PdfPreview generates images in PNG format. This versatile image format is ideal for the task since it supports true colors and uses artifact-free lossless compression. PNG format is fully supported by all major browsers.

The following code sample generates a PDF document from a URL, saves/reopens it, and converts page 1 of the document to an image. The saving/reopening step is necessary because the PdfPage.ToImage method can only be used on existing, not new, documents.

Set Pdf = Server.CreateObject("Persits.Pdf")
Set Doc = Pdf.CreateDocument
Doc.ImportFromUrl "http://www.aspupload.com", "landscape=true"

' Save and reopen as Page.Preview only works on new documents.
Set NewDoc = Pdf.OpenDocumentBinary( Doc.SaveToMemory )

' Create preview for Page 1 at 50% scale.
Set Page = NewDoc.Pages(1)
Set Preview = Page.ToImage("ResolutionX=36; ResolutionY=36" )

Preview.SaveHttp( "filename=preview.png" )

IPdfManager objPdf = new PdfManager();
IPdfDocument objDoc = objPdf.CreateDocument( Missing.Value );
objDoc.ImportFromUrl( "http://www.aspupload.com", "landscape=true",
Missing.Value, Missing.Value );

// Save and reopen as Page.Preview only works on new documents.
IPdfDocument objNewDoc =
objPdf.OpenDocumentBinary( objDoc.SaveToMemory(), Missing.Value );

// Create preview for Page 1 at 50% scale.
IPdfPage objPage = objNewDoc.Pages[1];
IPdfPreview objPreview = objPage.ToImage("ResolutionX=36; ResolutionY=36" );

objPreview.SaveHttp( "filename=preview.png" );

Click on the links below to run this code sample:

http://localhost/asppdf/manual_17/17_pdftoimage.asp http://localhost/asppdf/manual_17/17_pdftoimage.aspx

By default, the resultant image's width and height in pixels match the page's width and height in user units. For example, a standard US Letter page (8.5 x 11 inches or 612 x 792 user units) becomes a 612 x 792 pixel image (in case of a landscape-oriented page, the width and height of the image matches the height and width of the page, respectively.)

To set the page image to a desired size, the ResolutionX and ResolutionY parameters to the ToImage method should be used. These parameters are 72 (dpi) by default. Note that in the code sample above, these parameters are both set to 36 which makes the image dimensions half of what they would be by default. Setting these parameters to a number larger (smaller) than 72 makes the resultant image proportionally larger (smaller). The ResolutionX and ResolutionY values usually equal each other to avoid a distortion in the image's aspect ratio.

To obtain the pixel dimensions of the resultant image, use the PdfPreview object's properties Width and Height.

Page images generated by the ToImage method are always in the RGB color space, but the original PDF being converted to an image may contain images and graphics in the CMYK color space, in which case they have to be converted to RGB.

To achieve reasonably good color reproduction, the ToImage method performs a series of complex non-linear color transformations based on profiles, the standard color space definitions established by the International Color Consortium (ICC). Profile-based CMYK-to-RGB conversion is a fairly slow process. If your PDF document contains large high-resolution CMYK images and performance is of essence, use the parameter FastCMYK=True which invokes a simple linear formula for CMYK-to-RGB conversion and offers some performance improvement at the expense of color-reproduction quality.

The following images demonstrate the effect of the FastCMYK parameter:

Original Document:

FastCMYK=False (default):

FastCMYK=True:

The majority of PDF documents are self-sufficient since they embed all the fonts they use. Documents like that make it easy and straightforward for PDF viewers such as Acrobat Reader and PDF-to-image converters such as AspPDF to do their jobs.

Some PDF documents, however, only reference font names and do not embed the fonts themselves. While large applications such as Acrobat have the luxury of being deployed with a whole library of fonts they can use, AspPDF only contains the 14 required standard PDF fonts in it and must rely on the fonts already installed on the system in case the PDF document does not embed a certain font. Therefore, not every document can be rendered properly.

To help diagnose such issues, the PdfPreview object provides the property Log that returns a string of errors encountered during the PDF-to-image conversion process. Most of these errors are usually font-related. Error entries are separated by a pair of CRLFs. To enable error logging, the parameter Debug=True must be used:

Set Preview = Page.ToImage( "ResolutionX=20; ResolutionY=20; Debug=True" )
Response.Write Preview.Log

A typical log string may look as follows:

Font 'Palatino-Roman' was replaced by 'Helvetica'.

Could not find external TrueType font 'ArialUnicodeMS'.

A PDF page may contain a /Rotate attribute set to 90, 180 or 270 (degrees), which makes this page appear in the landscape mode or upside-down. By default, the ToImage method takes the Rotate value into account to orient the image appropriately. If, for whatever reason, the Rotate value needs to be ignored, the parameter IgnoreRotate should be set to True.

Set Preview = Page.ToImage( "...; IgnoreRotate=True" )

Also, some PDF pages contain /MediaBox and /CropBox attributes which are different from each other. By default, the ToImage method uses the CropBox attribute to calculate the dimensions of the resultant image, which is consistent with the behavior of all major PDF viewers. If, for whatever reason, the MediaBox attribute needs to be used instead (which usually covers a larger area than the CropBox), the parameter IgnoreCropBox should be set to True. This may cause the image to include areas of the page that are normally invisible when viewed in a PDF viewer.

Set Preview = Page.ToImage( "...; IgnoreCropBox=True" )

As an additional bonus, the PdfPreview object is capable of extracting images from a PDF page. The method PdfPreview.ExtractImage returns a new instance of the PdfPreview object representing an image specified by a 1-based index. This image can then be saved the regular way, via the methods Save, SaveToMemory or SaveHttp. If the specified index exceeds the number of images on the page, ExtractImage returns Nothing (or null in C#.)

The images are always saved as PNGs as this is PdfPreview's format of choice.

The following code snippet opens a PDF documents and saves all of the images from page 1 to disk:

Set PDF = Server.CreateObject("Persits.PDF")
Set Doc = PDF.OpenDocument("c:\path\document.pdf")
Set Page = Doc.Pages(1)

Set Preview = Page.ToImage

i = 1
Do While True
   Set Image = Preview.ExtractImage(i)
   If Image Is Nothing Then
      Exit Do
   Else
      Image.Save "c:\path\image.png", False
   End If
   i = i + 1
Loop

IPdfManager objPDF = new PdfManager();
IPdfDocument objDoc = objPDF.OpenDocument( @"c:\path\document.pdf", Missing.Value );
IPdfPreview objPreview = objDoc.Pages[1].ToImage(Missing.Value);

int i = 1;
IPdfPreview objImage;

while( ( objImage = objPreview.ExtractImage(i++) ) != null )
{
objImage.Save( @"c:\myprojects\aspdf\asp\files\image.png", false );
}

As of Version 3.6, AspPDF is capable of replacing images in an existing PDF document with other images or graphics. The image extraction and image replacement features can be used jointly to shrink the overall size of a PDF document by substituting its high-resolution images with their lower-resolution versions. Image replacement is covered in detail in Section 17.8 below.

17.6.1 Individual Page Printing

As of Version 2.1, the PdfPreview object offers automatic printing functionality via the method SendToPrinter. This method sends the image of a page contained in this PdfPreview object to a printer.

The SendToPrinter method accepts two arguments: the local or network name of the printer and, optionally, a parameter list adjusting the appearance of the printout. As of Version 2.6.0.3, if the printer name is set to an empty string, the default printer name for the current machine is used.

The print quality is determined by the resolution of the image being printed. It is therefore recommended that the ResolutionX and ResolutionY parameters to the ToImage method be set to at least 300 or, better yet, 600.

By default, the SendToPrinter method prints the image as it is, without stretching it. If the parameter Stretch is set to True, the image is stretched to cover the entire print area. Additionaly, you can use the parameters ScaleX and ScaleY to scale the image up or down. For example, the values "ScaleX = 0.5; ScaleY= 0.5" scales the image down by 50%.

The following code sample sends page 2 of c:\path\document.pdf to the printer. The image is stretched to cover the entire print area.

Set PDF = Server.CreateObject("Persits.PDF")
Set Doc = PDF.OpenDocument("c:\path\document.pdf")
Set Page = Doc.Pages(2)

Set Preview = Page.ToImage("ResolutionX=600; ResolutionY=600")

Preview.SendToPrinter "\\192.168.1.2\HP LaserJet 6P", "Stretch=true"

IPdfManager objPDF = new PdfManager();
IPdfDocument objDoc = objPDF.OpenDocument( @"c:\path\document.pdf", Missing.Value );
IPdfPage objPage = objDoc.Pages[2];

IPdfPreview objPreview = objPage.ToImage("ResolutionX=600; ResolutionY=600");

objPreview.SendToPrinter( @"\\192.168.1.2\HP LaserJet 6P", "Stretch=true" );

You may encounter the error "Access is denied" when attempting to send the image to a network printer. To avoid the error, you need to impersonate an interactive user account with the LogonUser method of the PdfManager object, as follows:

...
PDF.LogonUser "domain", "username", "password"
Preview.SendToPrinter "\\192.168.1.2\HP LaserJet 6P", "Stretch=true"

...
objPDF.LogonUser( "domain", "username", "password", Missing.Value );
objPreview.SendToPrinter( @"\\192.168.1.2\HP LaserJet 6P", "Stretch=true" );

The LogonUser method was added in Version 2.1 as well. The first argument is a Windows domain and can be an empty string. The 2nd and 3rd arguments are the username and password of the account to be impersonated. The 4th argument is the login type and usually omitted.

17.6.2 Document Printing

As of Version 2.7, the page printing functionality described in the previous sub-section has been expanded to support the printing of an entire document or any portion thereof, in both simplex (one-sided) and duplex (double-sided) modes.

In AspPDF 2.7+, the PdfDocument object has been given its own SendToPrinter method which sends the entire document (or an arbitrary set of pages) to the printer as opposed to just an individual page. Internally, the PdfDocument.SendToPrinter method iterates through the pages of the document, creates PdfPreview objects for each of them and sends the page images to the printer one by one. If your printer supports duplex printing, this method can optionally print double-sided documents in both long-edge and short-edge binding modes.

The PdfDocument.SendToPrinter method expects the same arguments as its PdfPreview.SendToPrinter counterpart: the printer name (local or network) and a list of parameters. In addition to the parameters described above, this method also accepts parameters controlling the duplex mode, as well as the page ranges to be printed.

The Duplex parameter controls duplex printing. Duplex=1 enables duplex printing in the regular long-edge-binding mode, and Duplex=2 in the short-edge binding mode. If this parameter is set to 0 or omitted, the regular simplex (one-sided) printing is used.

The From1/To1, From2/To2, ..., FromN/ToN pairs of parameters specify the ranges of pages to be printed. Page indices are 1-based. If the specified ranges overlap, the overlapping pages will be printed multiple times. By default, the entire document is printed.

The following code prints pages 2, 5, and 7-10 of a document in a duplex long-edge-binding mode:

Set Doc = Pdf.OpenDocument( "c:\path\doc.pdf" )

Doc.SendToPrinter "Brother HL-2270DW series", _
"Stretch=true; Duplex=1; From1=2;To1=2; From2=5;To2=5; From3=7;To3=10"

PdfDocument objDoc = objPdf.OpenDocument( @"c:\path\doc.pdf" );

objDoc.SendToPrinter( "Brother HL-2270DW series",
"Stretch=true; Duplex=1; From1=2;To1=2; From2=5;To2=5; From3=7;To3=10" );

As of Version 2.9, the PdfDocument.SendToPrinter method also supports printer tray (or bin) selection via the Tray parameter. The Microsoft documentation defines the following values for this parameter:

First (1), upper (1), only one (1), lower (2), middle (3), manual (4), envelope (5), envelope manual (6), auto (7), tractor (8), small format (9), large format (10), large capacity (11), cassette (14), form source (15), last (15).

However, many printers use driver-specific tray values that start at 256 and up. The correct Tray values for such printers should be determined by trial and error.

As of Version 3.2.0.2, the PdfDocument.SendToPrinter method also supports printing multiple copies via the Copies parameter (1 by default.)

As of Version 3.4.0.3, the Collate parameter (False by default) is supported, which enable collation if set to True. For example, if you print two copies of a three-page document and you choose not to collate them, the pages print in this order: 1, 1, 2, 2, 3, 3. If you choose to collate, the pages print in this order: 1, 2, 3, 1, 2, 3.

Also, Version 3.4.0.3 adds two more parameters to support label printers: PaperWidth and PaperHeight. These parameters specify the paper width and height in tenths of a millimeter. For example, if your label printer uses 2 ⁵/₁₆" x 4" labels, the PaperWidth and PaperHeight parameters should be set to 590 and 1020, respectively. If these parameters are not used, a document may come out shrunk when printed on a label printer.

As of version 2.8, the PdfPreview object has been expanded to perform yet another useful task: extracting text strings from the document along with their respective coordinates. This feature enables you to know exactly where on the page a particular text item is. Regular (coordinate-less) text extraction is described in Section 9.4 - Content Extraction.

PdfPreview's TextItems property returns a collection of objects, each encapsulating the text fragment and its respective coordinates and dimensions. To avoid adding a new object to AspPDF's already populous object diagram, we have retrofitted the PdfRect for the task by adding a new Text property to this object which returns the actual fragment of text in Unicode format. The existing PdfRect properties, Left, Bottom, Width and Height, return the coordinates of the lower-left corner and horizontal and vertical dimensions of the fragment, respectively.

To populate the TextItems collection, the PdfPage.ToImage method must be called with the parameter ExtractText set to a non-zero value. ExtractText can be a combination (sum) of the following flags:

Bit 1 (1): Enables text extraction. If this flag is not set, text extraction is not performed and the TextItems collection is empty.
Bit 2 (2): Sorts text fragments in the order from top to bottom, and from left to right. If this flag is not set, the text fragments in the TextItems collection appear in an arbitrary order.
Bit 3 (4): Glues adjacent text fragments together. If this flag is not set, a single text fragment may contain a single word, a part of the word or even a single character. Setting this flag usually combines all or most text fragments of a paragraph line into a single long string. For this flag to work, bit 2 must also be set.
Bit 4 (8): Does not glue adjacent text fragments if there is a space character separating them. For this flag to work, flags 2 and 3 must also be set.

The following code sample draws red outlines around all text fragments it finds on a page, as well as the order in which each fragment is encountered in the collection (as shown on the image below.)

Set Pdf = Server.CreateObject("Persits.Pdf")
Set Doc = PDF.OpenDocument( Server.MapPath("population.pdf"))
Set Page = doc.Pages(1)
Set Preview = Page.ToImage("extracttext=7") ' sort/glue

Page.Canvas.LineWidth = 0.5
Page.Canvas.SetFillColor 1, 0, 0

Dim i: i = 1

For Each rect In Preview.TextItems
   Page.Canvas.SetColor 1, 0, 0
   Page.Canvas.SetFillColor 1, 0, 0

   ' Red outline
   Page.Canvas.DrawRect rect.Left, rect.Bottom, rect.Width, rect.Height

   'Small box on top to display count
   Page.Canvas.FillRect rect.Left, rect.Top, 10, 5
   Page.Canvas.DrawRect rect.Left, rect.Top, 10, 5
   Page.Canvas.DrawText i, "x=" & rect.Left + 1 & "; y=" & rect.Top + 6 & ";color=white; size=5", Doc.Fonts("Helvetica")

   i = i + 1
Next

Filename = Doc.Save( Server.MapPath("extracttext.pdf"), False )

IPdfManager objPdf = new PdfManager();
IPdfDocument objDoc = objPdf.OpenDocument(Server.MapPath("population.pdf"), Missing.Value);
IPdfPage objPage = objDoc.Pages[1];
IPdfPreview objPreview = objPage.ToImage("extracttext=7"); // sort/glue

objPage.Canvas.LineWidth = 0.5f;
objPage.Canvas.SetFillColor( 1, 0, 0 );

int i = 1;

foreach( IPdfRect rect in objPreview.TextItems )
{
   objPage.Canvas.SetColor( 1, 0, 0 );
   objPage.Canvas.SetFillColor( 1, 0, 0 );

   // Red outline
   objPage.Canvas.DrawRect( rect.Left, rect.Bottom, rect.Width, rect.Height );

   // Small box on top to display count
   objPage.Canvas.FillRect( rect.Left, rect.Top, 10, 5 );
   objPage.Canvas.DrawRect( rect.Left, rect.Top, 10, 5 );
   objPage.Canvas.DrawText( i.ToString(), "x=" + (rect.Left + 1).ToString() + "; y=" + (rect.Top + 6).ToString() + ";color=white; size=5",
      objDoc.Fonts["Helvetica", Missing.Value] );

   i++;
}

String strFilename = objDoc.Save( Server.MapPath("extracttext.pdf"), false );

Click on the links below to run this code sample:

http://localhost/asppdf/manual_17/17_extracttext.asp http://localhost/asppdf/manual_17/17_extracttext.aspx

17.8.1 Feature Overview

As of Version 3.6, AspPDF is capable of replacing images in an existing PDF documents with other images or graphics. This feature is useful, among other things, for reducing the overall size of a PDF document by replacing its high-resolution images with their lower-resolution versions.

The image replacement feature is built on top of the document stitching functionality described in Section 14.1. Consider the following code:

Set Doc2 = PDF.OpenDocument(path)
Set Doc1 = PDF.CreateDocument
Doc1.AppendDocument Doc2
Doc1.Save path2

This code creates an empty new document, Doc1, and appends Doc2 to it, thus creating a document almost completely identical to Doc2. All items comprising Doc2 are copied to Doc1 during the appending operation, including images.

If a certain image in Doc2 needs to be replaced with another image, the new image needs to be opened via the Doc1 object's OpenImage method, and a mapping between the new and old images needs to be added with the method AddImageReplacement (introduced in Version 3.6):

Set Doc2 = PDF.OpenDocument( path1 )
Set Doc1 = PDF.CreateDocument

Set Image = Doc1.OpenImage( imagePath );

Doc1.AddImageReplacement imageID, Image

Doc1.AppendDocument Doc2
Doc1.Save path2

This code copies all items from Doc2 to Doc1 except the image item specified by imageID. That item is not copied and the Image object is used in its place.

The 1st argument to the AddImageReplacement method specifies the internal ID of the image item to be replaced within Doc2, and the 2nd argument is an instance of the PdfImage object to replace it with. The 2nd argument can also be an instance of the PdfGraphics object if the image needs to be replaced with a graphics as opposed to another image. The method AddImageReplacement can be called multiple times if multiple images need to be replaced.

An image ID is a string containing two numbers separated by an underscore, such as "123_0". The 2nd number is usually 0. Image IDs can be obtained via the Log property of the object representing an extracted image, or via the PdfPreview.ImageItems collection, as described below.

17.8.2 Combining Image Extraction and Image Replacement

The following code sample replaces all images in a PDF document with their resized versions. Image resizing is performed with the help of AspJpeg, another Persits component which can be downloaded from www.aspjpeg.com.

First, the code sample extracts all images from the document using the procedure described in Section 17.5 earlier in this chapter. The extracted images are then resized by 50%. Their names and image IDs are recorded in arrays. The image ID of an extracted image is returned by the Log property of the PdfPreview representing this extracted image.

Lastly, the AddImageReplacement method is called for every resized image in the array, followed by a call to AppendDocument and Save to complete the image replacement operation.

' Arrays to store image IDs and filenames Dim ImageIDs(100)
Dim Filenames(100)

' Use AspJpeg (www.aspjpeg.com) to resize images
Set Jpeg = Server.CreateObject("Persits.Jpeg")

Set Pdf = Server.CreateObject("Persits.Pdf")

Set OriginalDoc = PDF.OpenDocument( Server.MapPath("twoimages.pdf") )

' Extract all images, resize, build a list of names and image IDs
Set Page = OriginalDoc.Pages(1)
Set Preview = Page.ToImage

i = 1
Do While True
   Set Image = Preview.ExtractImage(i)
   If Image Is Nothing Then
      Exit Do
   Else
      Jpeg.OpenBinary( Image.SaveToMemory )
      Jpeg.PreserveAspectRatio = True
      Jpeg.Width = Jpeg.OriginalWidth / 2 ' resize by 50%

      ImageIDs(i-1) = Image.Log ' Log property returns Image ID
      Filenames(i-1) = Jpeg.SaveUnique( Server.MapPath("extracted_image.jpg") )
   End If
   i = i + 1
Loop

' Now perform image replacement
Set NewDoc = PDF.CreateDocument

For n = 0 To i - 2
   Set ExtractedImage = NewDoc.OpenImage( Server.MapPath( Filenames(n) ) )
   NewDoc.AddImageReplacement ImageIDs(n), ExtractedImage
Next

NewDoc.AppendDocument OriginalDoc

Filename = NewDoc.Save( Server.MapPath("imagereplacement.pdf"), False )

// Arrays to store image IDs and filenames
string [] ImageIDs = new string[100];
string [] Filenames = new string[100];

// Use AspJpeg (www.aspjpeg.com) to resize images
ASPJpeg objJpeg = new ASPJpeg();

IPdfManager objPDF = new PdfManager();

IPdfDocument objOriginalDoc = objPDF.OpenDocument( Server.MapPath("twoimages.pdf") );

// Extract all images, resize, build a list of names and image IDs
IPdfPage objPage = objOriginalDoc.Pages[1];
IPdfPreview objPreview = objPage.ToImage();
IPdfPreview objImage;

int i = 1;
while( ( objImage = objPreview.ExtractImage(i++) ) != null )
{
 objJpeg.OpenBinary( objImage.SaveToMemory() );
 objJpeg.PreserveAspectRatio = 1;
 objJpeg.Width = objJpeg.OriginalWidth / 2; // resize by 50%

 ImageIDs[i-2] = objImage.Log; // Log property returns Image ID
 Filenames[i-2] = objJpeg.SaveUnique( Server.MapPath("extracted_image.jpg") );
}

// Now perform image replacement
IPdfDocument objNewDoc = objPDF.CreateDocument( Missing.Value );

for( int n = 0; n < i - 2; n++ )
{
 IPdfImage objExtractedImage = objNewDoc.OpenImage(Server.MapPath(Filenames[n]));
 objNewDoc.AddImageReplacement( ImageIDs[n], objExtractedImage );
}

objNewDoc.AppendDocument( objOriginalDoc );

string strFilename = objNewDoc.Save(Server.MapPath("imagereplacement.pdf"), false);

Click on the links below to run this code sample:

http://localhost/asppdf/manual_17/17_imagereplacement.asp http://localhost/asppdf/manual_17/17_imagereplacement.aspx

This code sample reduces the size of the PDF document twoimages.pdf (included in the installation) from 117 KB to 30 KB.

17.8.3 Obtaining Image Information

In addition to the AddImageReplacement method, AspPDF 3.6 also offers a way to obtain a list of images in a PDF document, including information about each image's pixel size, displacement, scaling, rotation, image ID and coordinate transformation matrix, without actually performing image extraction. This information is obtained via the PdfPreview object's ImageItems property which returns a collection of PdfRect objects, each representing an image within the PDF document. This collection is similar to that returned by the TextItems property described above.

To populate the ImageItems collection, the PdfPage.ToImage method must be called with the parameter ImageInfo set to True.

The width and height of the image are returned by the PdfRect properties Right and Top, respectively. The image ID is returned by the PdfRect property Text.

The displacement, scaling, rotation and coordinate transformation matrix values are encapsulated in a PdfParam object returned by the PdfRect property ImageInfo (introduced in Version 3.6). For example, the following code snippet displays the object ID, width, height and scaling factors for all images of a document:

Set PDF = Server.CreateObject("Persits.PDF")
Set Doc = PDF.OpenDocument("c:\path\doc.pdf")

For Each Page In Doc.Pages
 Set Preview = Page.ToImage("imageinfo=true")
 For Each Rect In Preview.ImageItems
 Response.Write "Object ID=" & Rect.Text & " "
 Response.Write "Width=" & Rect.Right & " "
 Response.Write "Height=" & Rect.Top & " "
 Response.Write "ScaleX=" & Rect.ImageInfo("ScaleX") & " "
 Response.Write "ScaleY=" & Rect.ImageInfo("ScaleY") & " "

 Response.Write ""
 Next
Next

The full list of values encapsulated in the PdfParam object returned by the Rect.ImageInfo property is as follows:

Property

Meaning

shiftX

Horizontal displacement

shiftY

Vertical displacement

rotation

Rotation in degrees

scaleX

Horizontal scaling

scaleY

Vertical scaling

hasMask

1 if the image has transparency, 0 otherwise

a, b, c, d, e, f

The 6 components of the coordinate transformation matrix

Access to this information can be quite useful when an image needs to be replaced with a graphics, as discussed below.

17.8.4 Replacing Images with Graphics

As shown in the Section 17.8.2 code sample, replacing images with other images with the same aspect ratio is very straightforward. However, if the aspect ratio of the replacement image is different from that of the original image, a vertical or horizontal distortion will occur as the new image will be stretched to fill the area occupied by the old image. To avoid distortions, an image can be replaced with a graphics as opposed to another image. The graphics may have arbitrary content, including other images, drawings, text, etc.

The following code sample replaces all images in a PDF document with a red image with the caption "Image Removed" on a yellow background filling the entire area of the original image. The red image is not stretched in any way and centered inside the yellow area.

Set PDF = Server.CreateObject("Persits.PDF")
Set OriginalDoc = PDF.OpenDocument(Server.MapPath("twoimages.pdf"))

' Replace images with graphics
Set NewDoc = PDF.CreateDocument
Set Image = NewDoc.OpenImage(Server.MapPath("17_imageremoved.png"))
For Each page In OriginalDoc.Pages
   Set preview = page.ToImage("imageinfo=true")

   For Each rect In preview.ImageItems
      ' Create a graphics which will host the image on the new document.
      Width = rect.Right * rect.ImageInfo("ScaleX")
      Height = rect.Top * rect.ImageInfo("ScaleY")
      Set param = PDF.CreateParam
      param("left").Value = 0
      param("bottom").Value = 0
      param("right").Value = Width
      param("top").Value = Height

      ' Neutralize image coordinate transformation matrix
      param("a").Value = 1 /Width
      param("b").Value = 0
      param("c").Value = 0
      param("d").Value = 1 / Height
      param("e").Value = 0
      param("f").Value = 0

      Set gr = newDoc.CreateGraphics(param)
      gr.Canvas.SetFillColor 1, 1, 0 ' yellow
      gr.Canvas.FillRect 0, 0, Width, Height
      gr.Canvas.SetColor 0, 0, 0 ' black
      gr.Canvas.LineWidth = 10 ' border
      gr.Canvas.DrawRect 0, 0, Width, Height

      Set param2 = PDF.CreateParam
      param2("x") = (Width - Image.Width) / 2
      param2("y") = (Height - Image.Height) / 2
      gr.Canvas.DrawImage Image, param2

      ' Replace image with graphics
      NewDoc.AddImageReplacement rect.Text, gr
   Next
Next

NewDoc.AppendDocument OriginalDoc
NewDoc.Save Server.MapPath("replacedimage_gr.pdf"), false

IPdfManager objPDF = new PdfManager();
IPdfDocument objOriginalDoc = objPDF.OpenDocument(Server.MapPath("twoimages.pdf"));

// Replace images with graphics
IPdfDocument objNewDoc = objPDF.CreateDocument();
IPdfImage objImage = objNewDoc.OpenImage(Server.MapPath("17_imageremoved.png"));

foreach( IPdfPage objPage in objOriginalDoc.Pages )
{
 IPdfPreview objPreview = objPage.ToImage("imageinfo=true");

 foreach( IPdfRect rect in objPreview.ImageItems )
 {
 // Create a graphics which will host the image on the new document.
 float fWidth = rect.Right * rect.ImageInfo["ScaleX"].Value;
 float fHeight = rect.Top * rect.ImageInfo["ScaleY"].Value;
 IPdfParam param = objPDF.CreateParam();
 param["left"].Value = 0;
 param["bottom"].Value = 0;
 param["right"].Value = fWidth;
 param["top"].Value = fHeight;

 // Neutralize image coordinate transformation matrix
 param["a"].Value = 1 / fWidth;
 param["b"].Value = 0;
 param["c"].Value = 0;
 param["d"].Value = 1 / fHeight;
 param["e"].Value = 0;
 param["f"].Value = 0;

 IPdfGraphics gr = objNewDoc.CreateGraphics(param);
 gr.Canvas.SetFillColor( 1, 1, 0 ); // yellow
 gr.Canvas.FillRect( 0, 0, fWidth, fHeight );
 gr.Canvas.SetColor( 0, 0, 0 ); // black
 gr.Canvas.LineWidth = 10; // border
 gr.Canvas.DrawRect( 0, 0, fWidth, fHeight );

 IPdfParam param2 = objPDF.CreateParam();
 param2["x"].Value = (fWidth - objImage.Width) / 2;
 param2["y"].Value = (fHeight - objImage.Height) / 2;
 gr.Canvas.DrawImage( objImage, param2 );

 // Replace image with graphics
 objNewDoc.AddImageReplacement( rect.Text, gr );
 }
}

objNewDoc.AppendDocument( objOriginalDoc );
string strFilename = objNewDoc.Save(Server.MapPath("replacedimage_gr.pdf"), false);

lblResult.Text = "Success! Download your PDF file <A HREF=" + strFilename + ">here</A>";

Click on the links below to run this code sample:

http://localhost/asppdf/manual_17/17_image2graphics.asp http://localhost/asppdf/manual_17/17_image2graphics.aspx

Chapter 17: PDF to Image Conversion

Contents

17.1 Page.ToImage Method

17.2 CMYK-to-RGB Conversion

17.3 Error Log

17.4 Other ToImage Parameters

17.5 Image Extraction

17.6 Printing

17.7 Structured Text Extraction

17.8 Image Replacement