java.lang.Object
  org.apache.pdfbox.util.PDFStreamEngine
    org.apache.pdfbox.util.PDFTextStripper
      org.apache.pdfbox.util.PDFTextStripperByArea
public class PDFTextStripperByArea
This will extract text from a specified region in the PDF.
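To make "text from a specified region" concrete, the following is a minimal sketch of the underlying idea: keep only the pieces of text whose position falls inside a rectangle. It is not the PDFBox implementation. The class `RegionFilterSketch` and its `Glyph` type are hypothetical stand-ins (PDFBox represents positioned text with `TextPosition`), and the coordinates are made up for illustration.

```java
import java.awt.geom.Rectangle2D;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this class is not part of PDFBox.
class RegionFilterSketch {

    // Assumed stand-in for a positioned piece of text.
    static final class Glyph {
        final double x;
        final double y;
        final String text;

        Glyph(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Concatenates, in input order, the text of every glyph whose
    // anchor point lies inside the given rectangle.
    public static String textInRegion(List<Glyph> glyphs, Rectangle2D region) {
        StringBuilder sb = new StringBuilder();
        for (Glyph g : glyphs) {
            if (region.contains(g.x, g.y)) {
                sb.append(g.text);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Glyph> page = Arrays.asList(
                new Glyph(20, 300, "In"),
                new Glyph(400, 300, "Out"),   // x = 400 is right of the region
                new Glyph(40, 310, "side"));
        // Region covering x in [10, 285) and y in [280, 340).
        Rectangle2D region = new Rectangle2D.Double(10, 280, 275, 60);
        System.out.println(textInRegion(page, region)); // prints "Inside"
    }
}
```

The rectangle type is the same `java.awt.geom.Rectangle2D` that `addRegion` accepts, so a region built this way can be passed to the real class unchanged.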
Field Summary |
---|
Fields inherited from class org.apache.pdfbox.util.PDFTextStripper |
---|
charactersByArticle, document, output, outputEncoding, systemLineSeparator |
Constructor Summary | |
---|---|
PDFTextStripperByArea() | Default constructor. |
PDFTextStripperByArea(java.util.Properties props) | Instantiate a new PDFTextStripperByArea object. |
PDFTextStripperByArea(java.lang.String encoding) | Instantiate a new PDFTextStripperByArea object. |
Method Summary | | |
---|---|---|
void | addRegion(java.lang.String regionName, java.awt.geom.Rectangle2D rect) | Add a new region to group text by. |
void | extractRegions(PDPage page) | Process the page to extract the region text. |
java.util.List<java.lang.String> | getRegions() | Get the list of regions that have been set up. |
java.lang.String | getTextForRegion(java.lang.String regionName) | Get the text for the region. This should be called after extractRegions(). |
protected void | processTextPosition(TextPosition text) | This will process a TextPosition object and add the text to the list of characters on a page. |
protected void | writePage() | This will print the processed page text to the output stream. |
Methods inherited from class org.apache.pdfbox.util.PDFStreamEngine |
---|
getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getTotalCharCnt, getValidCharCnt, getXObjects, isForceParsing, processEncodedText, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, setColorSpaces, setFonts, setForceParsing, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public PDFTextStripperByArea() throws java.io.IOException
Throws:
java.io.IOException - If there is an error loading properties.

public PDFTextStripperByArea(java.util.Properties props) throws java.io.IOException
Parameters:
props - The properties containing the mapping of operators to PDFOperator classes.
Throws:
java.io.IOException - If there is an error reading the properties.

public PDFTextStripperByArea(java.lang.String encoding) throws java.io.IOException
Parameters:
encoding - The encoding that the output will be written in.
Throws:
java.io.IOException - If there is an error reading the properties.

Method Detail |
---|
public void addRegion(java.lang.String regionName, java.awt.geom.Rectangle2D rect)
Parameters:
regionName - The name of the region.
rect - The rectangle area to retrieve the text from.

public java.util.List<java.lang.String> getRegions()

public java.lang.String getTextForRegion(java.lang.String regionName)
Parameters:
regionName - The name of the region to get the text from.

public void extractRegions(PDPage page) throws java.io.IOException
Parameters:
page - The page to extract the regions from.
Throws:
java.io.IOException - If there is an error while extracting text.

protected void processTextPosition(TextPosition text)
Overrides:
processTextPosition in class PDFTextStripper
Parameters:
text - The text to process.

protected void writePage() throws java.io.IOException
Overrides:
writePage in class PDFTextStripper
Throws:
java.io.IOException - If there is an error writing the text.
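
The methods above imply a call order: register regions, process a page, then read the text per region. The sketch below mirrors that order with a small stand-in class so it can run without PDFBox. `RegionLifecycleSketch` and `Item` are hypothetical, the "page" is just a list of positioned strings rather than a `PDPage`, and returning `null` before extraction is a choice made for this sketch, not documented PDFBox behavior.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in that borrows the documented method names;
// it is not the PDFBox class and does not parse PDF content.
class RegionLifecycleSketch {

    // Assumed stand-in for positioned text on a page.
    static final class Item {
        final double x;
        final double y;
        final String text;

        Item(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    private final Map<String, Rectangle2D> regionArea = new LinkedHashMap<String, Rectangle2D>();
    private final Map<String, StringBuilder> regionText = new LinkedHashMap<String, StringBuilder>();

    // Step 1: register a named rectangle.
    public void addRegion(String regionName, Rectangle2D rect) {
        regionArea.put(regionName, rect);
    }

    // Region names in the order they were added.
    public List<String> getRegions() {
        return new ArrayList<String>(regionArea.keySet());
    }

    // Step 2: assign each item to every region that contains its position.
    public void extractRegions(List<Item> page) {
        regionText.clear();
        for (String name : regionArea.keySet()) {
            regionText.put(name, new StringBuilder());
        }
        for (Item item : page) {
            for (Map.Entry<String, Rectangle2D> e : regionArea.entrySet()) {
                if (e.getValue().contains(item.x, item.y)) {
                    regionText.get(e.getKey()).append(item.text);
                }
            }
        }
    }

    // Step 3: read the collected text; null if nothing has been extracted yet.
    public String getTextForRegion(String regionName) {
        StringBuilder sb = regionText.get(regionName);
        return sb == null ? null : sb.toString();
    }

    public static void main(String[] args) {
        RegionLifecycleSketch stripper = new RegionLifecycleSketch();
        stripper.addRegion("header", new Rectangle2D.Double(0, 0, 600, 50));
        stripper.addRegion("body", new Rectangle2D.Double(0, 50, 600, 700));
        stripper.extractRegions(Arrays.asList(
                new Item(10, 20, "Title"),
                new Item(10, 100, "Hello "),
                new Item(60, 100, "world")));
        System.out.println(stripper.getTextForRegion("header")); // prints "Title"
        System.out.println(stripper.getTextForRegion("body"));   // prints "Hello world"
    }
}
```

With the real class the same sequence applies: call `addRegion` for each rectangle, then `extractRegions(page)`, and only then `getTextForRegion(regionName)`.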