com.itextpdf.text.pdf.parser
Class SimpleTextExtractionStrategy

java.lang.Object
  extended by com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy
All Implemented Interfaces:
RenderListener, TextExtractionStrategy

public class SimpleTextExtractionStrategy
extends Object
implements TextExtractionStrategy

A simple text extraction renderer. This renderer keeps track of the current Y position of each string. If it detects that the Y position has changed, it inserts a line break into the output. If the PDF does not render its text from top to bottom, the extracted text will not match the order in which the text appears on the page. This renderer also uses a simple strategy based on font metrics to decide whether a blank space should be inserted into the output.
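The heuristic described above can be illustrated with a small, iText-free model. This is only a sketch under stated assumptions, not the library's implementation: TextChunk, extract, Y_TOLERANCE and the "gap wider than half a space" rule are hypothetical choices made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of the heuristic; none of these names belong to iText.
class YChangeExtractionSketch {

    /** One rendered string with its position (hypothetical type). */
    static final class TextChunk {
        final String text;
        final float startX;           // x coordinate where the string starts
        final float endX;             // x coordinate where the string ends
        final float y;                // baseline y coordinate
        final float singleSpaceWidth; // width of a space in the current font

        TextChunk(String text, float startX, float endX, float y, float singleSpaceWidth) {
            this.text = text;
            this.startX = startX;
            this.endX = endX;
            this.y = y;
            this.singleSpaceWidth = singleSpaceWidth;
        }
    }

    // Assumed tolerance: smaller y differences count as "same line".
    static final float Y_TOLERANCE = 1.0f;

    /** Concatenates chunks in render order, adding line breaks and spaces. */
    static String extract(List<TextChunk> chunksInRenderOrder) {
        StringBuilder result = new StringBuilder();
        TextChunk last = null;
        for (TextChunk chunk : chunksInRenderOrder) {
            if (last != null) {
                if (Math.abs(chunk.y - last.y) > Y_TOLERANCE) {
                    // The y position changed: treat it as a new line.
                    result.append('\n');
                } else if (chunk.startX - last.endX > chunk.singleSpaceWidth / 2f
                        && !endsWithSpace(result)
                        && !chunk.text.startsWith(" ")) {
                    // Same line, but the gap is wide enough to be a word break.
                    result.append(' ');
                }
            }
            result.append(chunk.text);
            last = chunk;
        }
        return result.toString();
    }

    private static boolean endsWithSpace(StringBuilder sb) {
        return sb.length() > 0 && sb.charAt(sb.length() - 1) == ' ';
    }

    public static void main(String[] args) {
        List<TextChunk> page = Arrays.asList(
                new TextChunk("Hello", 0f, 25f, 700f, 3f),
                new TextChunk("world", 28f, 53f, 700f, 3f),
                new TextChunk("Next", 0f, 20f, 688f, 3f));
        System.out.println(extract(page));
    }
}
```

Because the model only compares each chunk with the one rendered just before it, chunks that arrive bottom-first are emitted bottom-first, which is the limitation the class description warns about.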

Since:
2.1.5

Constructor Summary
SimpleTextExtractionStrategy()
          Creates a new text extraction renderer.
 
Method Summary
 void beginTextBlock()
          Called when a new text block is beginning (i.e. BT).
 void endTextBlock()
          Called when a text block has ended (i.e. ET).
 String getResultantText()
          Returns the result so far.
 void renderImage(ImageRenderInfo renderInfo)
          No-op method; this renderer is not interested in image events.
 void renderText(TextRenderInfo renderInfo)
          Captures text using a simplified algorithm for inserting hard returns and spaces.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SimpleTextExtractionStrategy

public SimpleTextExtractionStrategy()
Creates a new text extraction renderer.

Method Detail

beginTextBlock

public void beginTextBlock()
Description copied from interface: RenderListener
Called when a new text block is beginning (i.e. BT)

Specified by:
beginTextBlock in interface RenderListener
Since:
5.0.1

endTextBlock

public void endTextBlock()
Description copied from interface: RenderListener
Called when a text block has ended (i.e. ET)

Specified by:
endTextBlock in interface RenderListener
Since:
5.0.1

getResultantText

public String getResultantText()
Returns the result so far.

Specified by:
getResultantText in interface TextExtractionStrategy
Returns:
a String with the resulting text.
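"The result so far" means the text accumulated from every render event received up to the time of the call. The sketch below models that behaviour with a hypothetical stand-in class; ResultSoFarSketch is not the iText class, its renderText takes a plain String instead of a TextRenderInfo, and it does not insert any spaces or line breaks.

```java
// Hypothetical stand-in for a text-accumulating listener; not part of iText.
class ResultSoFarSketch {
    private final StringBuilder result = new StringBuilder();

    void beginTextBlock() { }                 // BT: nothing to track in this model
    void endTextBlock() { }                   // ET: nothing to track in this model
    void renderImage(Object imageInfo) { }    // images are ignored

    /** Appends the text of one render event (simplified to a plain String). */
    void renderText(String text) {
        result.append(text);
    }

    /** Returns everything captured so far; the buffer is not cleared. */
    String getResultantText() {
        return result.toString();
    }

    public static void main(String[] args) {
        ResultSoFarSketch listener = new ResultSoFarSketch();
        listener.beginTextBlock();
        listener.renderText("First");
        System.out.println(listener.getResultantText());
        listener.renderText(" and second");
        listener.endTextBlock();
        System.out.println(listener.getResultantText());
    }
}
```

In this model the buffer is never reset, so an instance that is reused keeps growing; creating a fresh instance for each independent extraction avoids mixing results.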

renderText

public void renderText(TextRenderInfo renderInfo)
Captures text using a simplified algorithm for inserting hard returns and spaces.

Specified by:
renderText in interface RenderListener
Parameters:
renderInfo - information specifying what to render

renderImage

public void renderImage(ImageRenderInfo renderInfo)
No-op method; this renderer is not interested in image events.

Specified by:
renderImage in interface RenderListener
Parameters:
renderInfo - information specifying what to render
Since:
5.0.1
See Also:
RenderListener.renderImage(com.itextpdf.text.pdf.parser.ImageRenderInfo)