org.apache.pdfbox.pdmodel
Class PDDocument

java.lang.Object
  extended by org.apache.pdfbox.pdmodel.PDDocument
All Implemented Interfaces:
java.awt.print.Pageable

public class PDDocument
extends java.lang.Object
implements java.awt.print.Pageable

This is the in-memory representation of the PDF document. You must call close() on this object when you are done using it.
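The close() requirement is usually met with a try/finally block, since this version of the class does not list java.io.Closeable among its interfaces. The following is only a sketch of that pattern: TrackedDocument and CloseInFinallySketch are hypothetical stand-ins written for illustration and are not part of PDFBox. With the real class, the body of the try block would be the work done on a PDDocument.

```java
// Hypothetical stand-in for a document-like resource; NOT a PDFBox class.
class TrackedDocument implements java.io.Closeable {
    private boolean closed = false;

    /** Simulates work on an open document that may fail with an IOException. */
    int countPages(boolean fail) throws java.io.IOException {
        if (fail) {
            throw new java.io.IOException("simulated read failure");
        }
        return 3;
    }

    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

class CloseInFinallySketch {
    /** Uses the stand-in and always closes it, even when the work throws. */
    static boolean closedAfterUse(boolean fail) {
        TrackedDocument doc = new TrackedDocument();
        try {
            doc.countPages(fail);
        } catch (java.io.IOException e) {
            // A real caller would log or rethrow; the sketch only cares about cleanup.
        } finally {
            doc.close();
        }
        return doc.isClosed();
    }
}
```

The finally block runs on both the normal and the exceptional path, so the resource is released either way.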

This class implements the Pageable interface, but since PDFBox version 1.3.0 you should be using the PDPageable adapter instead (see PDFBOX-788).

Version:
$Revision: 1.47 $
Author:
Ben Litchfield

Field Summary
 
Fields inherited from interface java.awt.print.Pageable
UNKNOWN_NUMBER_OF_PAGES
 
Constructor Summary
PDDocument()
          Constructor, creates a new PDF Document with no pages.
PDDocument(COSDocument doc)
          Constructor that uses an existing document.
 
Method Summary
 void addPage(PDPage page)
          This will add a page to the document.
 void addSignature(PDSignature sigObject, SignatureInterface signatureInterface)
           
 void addSignature(PDSignature sigObject, SignatureInterface signatureInterface, SignatureOptions options)
          This will add a signature to the document.
 void clearWillEncryptWhenSaving()
          Deprecated. Do not rely on this method anymore. It is the responsibility of COSWriter to hold this state.
 void close()
          This will close the underlying COSDocument object.
 void decrypt(java.lang.String password)
          This will decrypt a document.
 void encrypt(java.lang.String ownerPassword, java.lang.String userPassword)
          This will mark a document to be encrypted.
 AccessPermission getCurrentAccessPermission()
          Returns the access permissions granted when the document was decrypted.
 COSDocument getDocument()
          This will get the low level document.
 PDDocumentCatalog getDocumentCatalog()
          This will get the document CATALOG.
 PDDocumentInformation getDocumentInformation()
          This will get the document info dictionary.
 PDEncryptionDictionary getEncryptionDictionary()
          This will get the encryption dictionary for this document.
 int getNumberOfPages()
          
 java.lang.String getOwnerPasswordForEncryption()
          Deprecated. Do not rely on this method anymore.
 int getPageCount()
          Deprecated. Use the getNumberOfPages method instead.
 java.awt.print.PageFormat getPageFormat(int pageIndex)
          Deprecated. Use the PDPageable adapter class
 java.util.Map<java.lang.String,java.lang.Integer> getPageMap()
          This will return the Map containing the mapping from object-ids to page numbers.
 java.awt.print.Printable getPrintable(int pageIndex)
          
 SecurityHandler getSecurityHandler()
          Get the security handler that is used for document encryption.
 PDSignature getSignatureDictionary()
           
 java.lang.String getUserPasswordForEncryption()
          Deprecated. Do not rely on this method anymore.
 PDPage importPage(PDPage page)
          This will import and copy the contents from another location.
 boolean isAllSecurityToBeRemoved()
           
 boolean isEncrypted()
          This will tell if this document is encrypted or not.
 boolean isOwnerPassword(java.lang.String password)
          Deprecated.  
 boolean isUserPassword(java.lang.String password)
          Deprecated.  
static PDDocument load(java.io.File file)
          This will load a document from a file.
static PDDocument load(java.io.File file, RandomAccess scratchFile)
          This will load a document from a file.
static PDDocument load(java.io.InputStream input)
          This will load a document from an input stream.
static PDDocument load(java.io.InputStream input, boolean force)
          This will load a document from an input stream.
static PDDocument load(java.io.InputStream input, RandomAccess scratchFile)
          This will load a document from an input stream.
static PDDocument load(java.io.InputStream input, RandomAccess scratchFile, boolean force)
          This will load a document from an input stream.
static PDDocument load(java.lang.String filename)
          This will load a document from a file.
static PDDocument load(java.lang.String filename, boolean force)
          This will load a document from a file.
static PDDocument load(java.lang.String filename, RandomAccess scratchFile)
          This will load a document from a file.
static PDDocument load(java.net.URL url)
          This will load a document from a url.
static PDDocument load(java.net.URL url, boolean force)
          This will load a document from a url.
static PDDocument load(java.net.URL url, RandomAccess scratchFile)
          This will load a document from a url.
 void openProtection(DecryptionMaterial pm)
          Tries to decrypt the document in memory using the provided decryption material.
 void print()
          This will send the PDF document to a printer.
 void print(java.awt.print.PrinterJob printJob)
           
 void protect(ProtectionPolicy pp)
          Protects the document with the protection policy pp.
 boolean removePage(int pageNumber)
          Remove the page from the document.
 boolean removePage(PDPage page)
          Remove the page from the document.
 void save(java.io.OutputStream output)
          This will save the document to an output stream.
 void save(java.lang.String fileName)
          This will save this document to the filesystem.
 void saveIncremental(java.io.FileInputStream input, java.io.OutputStream output)
           
 void saveIncremental(java.lang.String fileName)
           
 void setAllSecurityToBeRemoved(boolean allSecurityToBeRemoved)
           
 void setDocumentInformation(PDDocumentInformation info)
          This will set the document information for this document.
 void setEncryptionDictionary(PDEncryptionDictionary encDictionary)
          This will set the encryption dictionary for this document.
 void silentPrint()
          This will send the PDF to the default printer without prompting the user for any printer settings.
 void silentPrint(java.awt.print.PrinterJob printJob)
          This will send the PDF to the default printer without prompting the user for any printer settings.
 boolean wasDecryptedWithOwnerPassword()
          Deprecated. use getCurrentAccessPermission instead
 boolean willEncryptWhenSaving()
          Deprecated. Do not rely on this method anymore. It is the responsibility of COSWriter to hold this state.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PDDocument

public PDDocument()
           throws java.io.IOException
Constructor, creates a new PDF Document with no pages. You need to add at least one page for the document to be valid.

Throws:
java.io.IOException - If there is an error creating this document.

PDDocument

public PDDocument(COSDocument doc)
Constructor that uses an existing document. The COSDocument that is passed in must be valid.

Parameters:
doc - The COSDocument that this document wraps.
Method Detail

getPageMap

public final java.util.Map<java.lang.String,java.lang.Integer> getPageMap()
This will return the Map containing the mapping from object-ids to page numbers.

Returns:
the pageMap

addPage

public void addPage(PDPage page)
This will add a page to the document. This is a convenience method that adds the page to the root of the page hierarchy and sets the parent of the page to the root.

Parameters:
page - The page to add to the document.

addSignature

public void addSignature(PDSignature sigObject,
                         SignatureInterface signatureInterface)
                  throws java.io.IOException,
                         SignatureException
Throws:
java.io.IOException
SignatureException

addSignature

public void addSignature(PDSignature sigObject,
                         SignatureInterface signatureInterface,
                         SignatureOptions options)
                  throws java.io.IOException,
                         SignatureException
This will add a signature to the document.

Parameters:
sigObject - is the PDSignature model
signatureInterface - is an interface which provides signing capabilities
options -
Throws:
java.io.IOException - if there is an error creating required fields
SignatureException

removePage

public boolean removePage(PDPage page)
Remove the page from the document.

Parameters:
page - The page to remove from the document.
Returns:
true if the page was found, false otherwise.

removePage

public boolean removePage(int pageNumber)
Remove the page from the document.

Parameters:
pageNumber - The zero-based index of the page to remove.
Returns:
true if the page was found, false otherwise.
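To make the zero-based index and the boolean result of the two removePage overloads concrete, here is a small sketch that models the page list with a java.util.List of names. PageListSketch is a hypothetical stand-in, not a PDFBox class, and returning false for an out-of-range index is an assumption of the sketch rather than documented behaviour.

```java
// Hypothetical model of a document's page list; NOT a PDFBox class.
class PageListSketch {
    private final java.util.List<String> pages = new java.util.ArrayList<String>();

    PageListSketch(String... names) {
        pages.addAll(java.util.Arrays.asList(names));
    }

    /** Removes the page at a zero-based index; false when no such page (assumed). */
    boolean removePage(int pageNumber) {
        if (pageNumber < 0 || pageNumber >= pages.size()) {
            return false;
        }
        pages.remove(pageNumber); // List.remove(int) removes by position
        return true;
    }

    /** Removes the given page; true only if it was present. */
    boolean removePage(String page) {
        return pages.remove(page); // List.remove(Object) reports whether it was found
    }

    java.util.List<String> pages() {
        return pages;
    }
}
```

In this model, index 0 refers to the first page, so removing index 0 from a three-page list leaves the former second and third pages.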

importPage

public PDPage importPage(PDPage page)
                  throws java.io.IOException
This will import and copy the contents from another location. Currently the content stream is stored in a scratch file that is associated with the document. If you are adding a page to this document from another document and want to copy the contents to this document's scratch file, use this method; otherwise, just use the addPage method.

Parameters:
page - The page to import.
Returns:
The page that was imported.
Throws:
java.io.IOException - If there is an error copying the page.

getDocument

public COSDocument getDocument()
This will get the low level document.

Returns:
The document that this layer sits on top of.

getDocumentInformation

public PDDocumentInformation getDocumentInformation()
This will get the document info dictionary. This is guaranteed to not return null.

Returns:
The document's /Info dictionary.

setDocumentInformation

public void setDocumentInformation(PDDocumentInformation info)
This will set the document information for this document.

Parameters:
info - The updated document information.

getDocumentCatalog

public PDDocumentCatalog getDocumentCatalog()
This will get the document CATALOG. This is guaranteed to not return null.

Returns:
The document's /Root dictionary.

isEncrypted

public boolean isEncrypted()
This will tell if this document is encrypted or not.

Returns:
true if this document is encrypted.

getEncryptionDictionary

public PDEncryptionDictionary getEncryptionDictionary()
                                               throws java.io.IOException
This will get the encryption dictionary for this document. This will still return the parameters if the document was decrypted. If the document was never encrypted then this will return null. As the encryption architecture in PDF documents is pluggable, this returns an abstract class, but the only supported subclass at this time is a PDStandardEncryption object.

Returns:
The encryption dictionary (most likely a PDStandardEncryption object).
Throws:
java.io.IOException - If there is an error determining which security handler to use.

setEncryptionDictionary

public void setEncryptionDictionary(PDEncryptionDictionary encDictionary)
                             throws java.io.IOException
This will set the encryption dictionary for this document.

Parameters:
encDictionary - The encryption dictionary (most likely a PDStandardEncryption object).
Throws:
java.io.IOException - If there is an error determining which security handler to use.

getSignatureDictionary

public PDSignature getSignatureDictionary()
                                   throws java.io.IOException
Throws:
java.io.IOException

isUserPassword

@Deprecated
public boolean isUserPassword(java.lang.String password)
                       throws java.io.IOException,
                              CryptographyException
Deprecated. 

This will determine if this is the user password. This only applies when the document is encrypted and uses standard encryption.

Parameters:
password - The plain text user password.
Returns:
true if the password passed in matches the user password used to encrypt the document.
Throws:
java.io.IOException - If there is an error determining if it is the user password.
CryptographyException - If there is an error in the encryption algorithms.

isOwnerPassword

@Deprecated
public boolean isOwnerPassword(java.lang.String password)
                        throws java.io.IOException,
                               CryptographyException
Deprecated. 

This will determine if this is the owner password. This only applies when the document is encrypted and uses standard encryption.

Parameters:
password - The plain text owner password.
Returns:
true if the password passed in matches the owner password used to encrypt the document.
Throws:
java.io.IOException - If there is an error determining if it is the owner password.
CryptographyException - If there is an error in the encryption algorithms.

decrypt

public void decrypt(java.lang.String password)
             throws CryptographyException,
                    java.io.IOException,
                    InvalidPasswordException
This will decrypt a document. This method is provided for compatibility reasons only. Users should use the new security layer instead, in particular the openProtection method.

Parameters:
password - Either the user or owner password.
Throws:
CryptographyException - If there is an error decrypting the document.
java.io.IOException - If there is an error getting the stream data.
InvalidPasswordException - If the password is not a user or owner password.

wasDecryptedWithOwnerPassword

@Deprecated
public boolean wasDecryptedWithOwnerPassword()
Deprecated. use getCurrentAccessPermission instead

This will tell if the document was decrypted with the owner (master) password. This entry is invalid if the PDF was not decrypted.

Returns:
true if the PDF was decrypted with the owner (master) password.

encrypt

public void encrypt(java.lang.String ownerPassword,
                    java.lang.String userPassword)
             throws CryptographyException,
                    java.io.IOException
This will mark a document to be encrypted. The actual encryption will occur when the document is saved. This method is provided for compatibility reasons only. Users should use the new security layer instead, in particular the protect method.

Parameters:
ownerPassword - The owner password to encrypt the document.
userPassword - The user password to encrypt the document.
Throws:
CryptographyException - If an error occurs during encryption.
java.io.IOException - If there is an error accessing the data.

getOwnerPasswordForEncryption

@Deprecated
public java.lang.String getOwnerPasswordForEncryption()
Deprecated. Do not rely on this method anymore.

The owner password that was passed into the encrypt method. You should never use this method. This will no longer be valid once encryption has occurred.

Returns:
The owner password passed to the encrypt method.

getUserPasswordForEncryption

@Deprecated
public java.lang.String getUserPasswordForEncryption()
Deprecated. Do not rely on this method anymore.

The user password that was passed into the encrypt method. You should never use this method. This will no longer be valid once encryption has occurred.

Returns:
The user password passed to the encrypt method.

willEncryptWhenSaving

@Deprecated
public boolean willEncryptWhenSaving()
Deprecated. Do not rely on this method anymore. It is the responsibility of COSWriter to hold this state.

Internal method to determine if the document will be encrypted when it is saved.

Returns:
True if encrypt has been called and the document has not been saved yet.

clearWillEncryptWhenSaving

@Deprecated
public void clearWillEncryptWhenSaving()
Deprecated. Do not rely on this method anymore. It is the responsibility of COSWriter to hold this state.

This should only be called by the COSWriter after encryption has completed.


load

public static PDDocument load(java.net.URL url)
                       throws java.io.IOException
This will load a document from a url.

Parameters:
url - The url to load the PDF from.
Returns:
The document that was loaded.
Throws:
java.io.IOException - If there is an error reading from the stream.

load

public static PDDocument load(java.net.URL url,
                              boolean force)
                       throws java.io.IOException
This will load a document from a url. Allows for skipping corrupt PDF objects.

Parameters:
url - The url to load the PDF from.
force - When true, the parser will skip corrupt PDF objects and continue parsing at the next object in the file.
Returns:
The document that was loaded.
Throws:
java.io.IOException - If there is an error reading from the stream.

load

public static PDDocument load(java.net.URL url,
                              RandomAccess scratchFile)
                       throws java.io.IOException
This will load a document from a url.

Parameters:
url - The url to load the PDF from.
scratchFile - A location to store temp PDFBox data for this document.
Returns:
The document that was loaded.
Throws:
java.io.IOException - If there is an error reading from the stream.

load

public static PDDocument load(java.lang.String filename)
                       throws java.io.IOException
This will load a document from a file.

Parameters:
filename - The name of the file to load.
Returns:
The document that was loaded.
Throws:
java.io.IOException - If there is an error reading from the stream.

load

public static PDDocument load(java.lang.String filename,
                              boolean force)
                       throws java.io.IOException
This will load a document from a file. Allows for skipping corrupt PDF objects.

Parameters:
filename - The name of the file to load.
force - When true, the parser will skip corrupt PDF objects and continue parsing at the next object in the file.
Returns:
The document that was loaded.
Throws:
java.io.IOException - If there is an error reading from the stream.

load

public static PDDocument load(java.lang.String filename,
                              RandomAccess scratchFile)
                       throws java.io.IOException
This will load a document from a file.

Parameters:
filename - The name of the file to load.
scratchFile - A location to store temp PDFBox data for this document.
Returns:
The document that was loaded.
Throws:
java.io.IOException - If there is an error reading from the stream.

load

public static PDDocument load(java.io.File file)
                       throws java.io.IOException
This will load a document from a file.

Parameters:
file - The file to load.
Returns:
The document that was loaded.
Throws:
java.io.IOException - If there is an error reading from the stream.

load

public static PDDocument load(java.io.File file,
                              RandomAccess scratchFile)
                       throws java.io.IOException
This will load a document from a file.

Parameters:
file - The file to load.
scratchFile - A location to store temp PDFBox data for this document.
Returns:
The document that was loaded.
Throws:
java.io.IOException - If there is an error reading from the stream.

load

public static PDDocument load(java.io.InputStream input)
                       throws java.io.IOException
This will load a document from an input stream.

Parameters:
input - The stream that contains the document.
Returns:
The document that was loaded.
Throws:
java.io.IOException - If there is an error reading from the stream.

load

public static PDDocument load(java.io.InputStream input,
                              boolean force)
                       throws java.io.IOException
This will load a document from an input stream. Allows for skipping corrupt PDF objects.

Parameters:
input - The stream that contains the document.
force - When true, the parser will skip corrupt PDF objects and continue parsing at the next object in the file.
Returns:
The document that was loaded.
Throws:
java.io.IOException - If there is an error reading from the stream.

load

public static PDDocument load(java.io.InputStream input,
                              RandomAccess scratchFile)
                       throws java.io.IOException
This will load a document from an input stream.

Parameters:
input - The stream that contains the document.
scratchFile - A location to store temp PDFBox data for this document.
Returns:
The document that was loaded.
Throws:
java.io.IOException - If there is an error reading from the stream.

load

public static PDDocument load(java.io.InputStream input,
                              RandomAccess scratchFile,
                              boolean force)
                       throws java.io.IOException
This will load a document from an input stream. Allows for skipping corrupt PDF objects.

Parameters:
input - The stream that contains the document.
scratchFile - A location to store temp PDFBox data for this document.
force - When true, the parser will skip corrupt PDF objects and continue parsing at the next object in the file.
Returns:
The document that was loaded.
Throws:
java.io.IOException - If there is an error reading from the stream.

save

public void save(java.lang.String fileName)
          throws java.io.IOException,
                 COSVisitorException
This will save this document to the filesystem.

Parameters:
fileName - The file to save as.
Throws:
java.io.IOException - If there is an error saving the document.
COSVisitorException - If an error occurs while generating the data.

save

public void save(java.io.OutputStream output)
          throws java.io.IOException,
                 COSVisitorException
This will save the document to an output stream.

Parameters:
output - The stream to write to.
Throws:
java.io.IOException - If there is an error writing the document.
COSVisitorException - If an error occurs while generating the data.

saveIncremental

public void saveIncremental(java.lang.String fileName)
                     throws java.io.IOException,
                            COSVisitorException
Throws:
java.io.IOException
COSVisitorException

saveIncremental

public void saveIncremental(java.io.FileInputStream input,
                            java.io.OutputStream output)
                     throws java.io.IOException,
                            COSVisitorException
Throws:
java.io.IOException
COSVisitorException

getPageCount

@Deprecated
public int getPageCount()
Deprecated. Use the getNumberOfPages method instead.

This will return the total page count of the PDF document. Note: This method is deprecated in favor of getNumberOfPages, which is a required method of the Pageable interface. This method will be removed in a future version of PDFBox.

Returns:
The total number of pages in the PDF document.

getNumberOfPages

public int getNumberOfPages()

Specified by:
getNumberOfPages in interface java.awt.print.Pageable

getPageFormat

@Deprecated
public java.awt.print.PageFormat getPageFormat(int pageIndex)
Deprecated. Use the PDPageable adapter class

Returns the format of the page at the given index when using a default printer job returned by PrinterJob.getPrinterJob().

Specified by:
getPageFormat in interface java.awt.print.Pageable
Parameters:
pageIndex - page index, zero-based
Returns:
page format
Throws:
java.lang.IndexOutOfBoundsException - if the page index is invalid
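The java.awt.print.Pageable contract that getNumberOfPages, getPageFormat and getPrintable fulfil can be illustrated with a minimal implementation built only on the JDK. FixedPageable is a hypothetical class written for this sketch; it does not render anything and says nothing about how PDDocument or the PDPageable adapter implement the interface internally.

```java
// Hypothetical minimal Pageable; NOT how PDFBox implements the interface.
class FixedPageable implements java.awt.print.Pageable {
    private final int pageCount;

    FixedPageable(int pageCount) {
        this.pageCount = pageCount;
    }

    @Override
    public int getNumberOfPages() {
        return pageCount;
    }

    @Override
    public java.awt.print.PageFormat getPageFormat(int pageIndex) {
        checkIndex(pageIndex);
        // A default PageFormat; a real implementation would derive it from the page.
        return new java.awt.print.PageFormat();
    }

    @Override
    public java.awt.print.Printable getPrintable(int pageIndex) {
        checkIndex(pageIndex);
        // Draws nothing and simply reports that the page exists.
        return (graphics, format, index) -> java.awt.print.Printable.PAGE_EXISTS;
    }

    /** Rejects indexes outside the zero-based range [0, pageCount). */
    private void checkIndex(int pageIndex) {
        if (pageIndex < 0 || pageIndex >= pageCount) {
            throw new IndexOutOfBoundsException("No page at index " + pageIndex);
        }
    }
}
```

An object of this kind can be handed to java.awt.print.PrinterJob.setPageable, which is the same route a Pageable adapter for a PDF document would take.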

getPrintable

public java.awt.print.Printable getPrintable(int pageIndex)

Specified by:
getPrintable in interface java.awt.print.Pageable

print

public void print(java.awt.print.PrinterJob printJob)
           throws java.awt.print.PrinterException
Parameters:
printJob - The printer job.
Throws:
java.awt.print.PrinterException - If there is an error while sending the PDF to the printer, or you do not have permissions to print this document.
See Also:
print()

print

public void print()
           throws java.awt.print.PrinterException
This will send the PDF document to a printer. Printing depends on org.apache.pdfbox.pdfviewer.PageDrawer, which is a work in progress, so some PDFs will print correctly and some will not. This is a convenience method that creates the java.awt.print.PrinterJob. PDDocument implements the java.awt.print.Pageable interface and PDPage implements the java.awt.print.Printable interface, so advanced printing can be done by using those interfaces instead of this method.

Throws:
java.awt.print.PrinterException - If there is an error while sending the PDF to the printer, or you do not have permissions to print this document.

silentPrint

public void silentPrint()
                 throws java.awt.print.PrinterException
This will send the PDF to the default printer without prompting the user for any printer settings.

Throws:
java.awt.print.PrinterException - If there is an error while printing.
See Also:
print()

silentPrint

public void silentPrint(java.awt.print.PrinterJob printJob)
                 throws java.awt.print.PrinterException
This will send the PDF to the default printer without prompting the user for any printer settings.

Parameters:
printJob - A printer job definition.
Throws:
java.awt.print.PrinterException - If there is an error while printing.
See Also:
print()

close

public void close()
           throws java.io.IOException
This will close the underlying COSDocument object.

Throws:
java.io.IOException - If there is an error releasing resources.

protect

public void protect(ProtectionPolicy pp)
             throws BadSecurityHandlerException
Protects the document with the protection policy pp. This method only marks the document for encryption; the document content is actually encrypted when the document is saved.

Parameters:
pp - The protection policy.
Throws:
BadSecurityHandlerException - If there is an error during protection.
See Also:
StandardProtectionPolicy, PublicKeyProtectionPolicy

openProtection

public void openProtection(DecryptionMaterial pm)
                    throws BadSecurityHandlerException,
                           java.io.IOException,
                           CryptographyException
Tries to decrypt the document in memory using the provided decryption material.

Parameters:
pm - The decryption material (password or certificate).
Throws:
BadSecurityHandlerException - If there is an error during decryption.
java.io.IOException - If there is an error reading cryptographic information.
CryptographyException - If there is an error during decryption.
See Also:
StandardDecryptionMaterial, PublicKeyDecryptionMaterial

getCurrentAccessPermission

public AccessPermission getCurrentAccessPermission()
Returns the access permissions granted when the document was decrypted. If the document was not decrypted, this method returns the access permission for a document owner (i.e. can do everything). The returned object is in read-only mode so that permissions cannot be changed. Methods providing access to content should rely on this object to verify whether the current user is allowed to proceed.

Returns:
the access permissions for the current user on the document.

getSecurityHandler

public SecurityHandler getSecurityHandler()
Get the security handler that is used for document encryption.

Returns:
The handler used to encrypt/decrypt the document.

isAllSecurityToBeRemoved

public boolean isAllSecurityToBeRemoved()

setAllSecurityToBeRemoved

public void setAllSecurityToBeRemoved(boolean allSecurityToBeRemoved)