ZippedDocumentTextExtractor (Pincette API)

be.re.repo.mod
Class ZippedDocumentTextExtractor

java.lang.Object
  be.re.repo.mod.ZippedDocumentTextExtractor

public class ZippedDocumentTextExtractor
extends Object

Implements a mechanism to extract text from zipped documents containing XML entities. Possible formats include ODF, ePub and Office Open XML. The documents are processed in a streaming fashion.

Author: Werner Donné
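
The streaming approach described above can be illustrated with JDK classes only. The following is a minimal sketch, not the implementation of this class: it assumes that entries are read one by one from a `ZipInputStream` and that the character data of each selected entry is pulled out with a StAX parser. The class `StreamingZipTextSketch` and its methods `extractText` and `sampleZip` are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingZipTextSketch {

  // Hypothetical helper, not part of the Pincette API: collects the character
  // data of every ZIP entry whose whole name matches entryPattern.
  static String extractText(InputStream in, String entryPattern)
      throws IOException, XMLStreamException {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    // Disable DTDs and external entities, since the input may be untrusted.
    factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
    factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

    StringBuilder text = new StringBuilder();
    try (ZipInputStream zip = new ZipInputStream(in)) {
      ZipEntry entry;
      while ((entry = zip.getNextEntry()) != null) {
        if (entry.isDirectory() || !entry.getName().matches(entryPattern)) {
          continue; // getNextEntry() skips the rest of this entry
        }
        // Shield the ZIP stream so that the parser cannot close it.
        InputStream shielded = new FilterInputStream(zip) {
          @Override
          public void close() {
            // Intentionally empty: the ZIP stream must stay open.
          }
        };
        XMLStreamReader xml = factory.createXMLStreamReader(shielded);
        try {
          while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.CHARACTERS) {
              text.append(xml.getText());
            } else if (event == XMLStreamConstants.END_ELEMENT) {
              text.append(' '); // separate the text of adjacent elements
            }
          }
        } finally {
          xml.close(); // releases parser resources only
        }
      }
    }
    return text.toString().trim();
  }

  // Builds a small in-memory ZIP with two XML entries for the demonstration.
  static byte[] sampleZip() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
      put(zip, "styles.xml", "<s>ignored</s>");
      put(zip, "content.xml", "<doc><p>Hello</p><p>world</p></doc>");
    }
    return bytes.toByteArray();
  }

  static void put(ZipOutputStream zip, String name, String body) throws IOException {
    zip.putNextEntry(new ZipEntry(name));
    zip.write(body.getBytes(StandardCharsets.UTF_8));
    zip.closeEntry();
  }

  public static void main(String[] args) throws Exception {
    InputStream in = new ByteArrayInputStream(sampleZip());
    System.out.println(extractText(in, "content\\.xml"));
  }
}
```

The sketch builds the whole text in memory for brevity, whereas this class returns a `Reader`, so a faithful variant would hand out the text incrementally rather than as one `String`.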

Nested Class Summary
`static interface`	`ZippedDocumentTextExtractor.FilterFactory`

Constructor Summary
`ZippedDocumentTextExtractor()`

Method Summary
`static Reader`	`create(InputStream in, ZippedDocumentTextExtractor.FilterFactory filterFactory, String[] entryPatterns)` Retrieves text from a document.

Methods inherited from class java.lang.Object
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

ZippedDocumentTextExtractor

public ZippedDocumentTextExtractor()

Method Detail

create

public static Reader create(InputStream in,
                            ZippedDocumentTextExtractor.FilterFactory filterFactory,
                            String[] entryPatterns)
                     throws IOException

Retrieves text from a document.

Parameters:
  in - the original document stream.
  filterFactory - a factory to create a filter that is selective about which elements contribute to the text or that can transform the text. It may be null.
  entryPatterns - the regular expressions that select the ZIP-entries based on their name. If the array is empty no entries will be selected at all.
Returns: The extracted text stream.
Throws: IOException
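
The role of `entryPatterns` can be sketched as follows. This is an illustration under stated assumptions, not the code of this class: it assumes that a pattern must match the whole entry name, which the documentation above does not specify. The class `EntryPatternSketch` and its method `isSelected` are hypothetical names.

```java
import java.util.regex.Pattern;

public class EntryPatternSketch {

  // Hypothetical helper: an entry is selected when at least one pattern
  // matches its whole name (assumption). An empty array selects nothing.
  static boolean isSelected(String entryName, String[] entryPatterns) {
    for (String pattern : entryPatterns) {
      if (Pattern.matches(pattern, entryName)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    String[] odfBody = {"content\\.xml"};
    String[] nothing = {};

    System.out.println(isSelected("content.xml", odfBody));
    System.out.println(isSelected("styles.xml", odfBody));
    System.out.println(isSelected("content.xml", nothing));
  }
}
```

With the signature documented above, a call for an ODF document might look like `ZippedDocumentTextExtractor.create(in, null, new String[] {"content\\.xml"})`, where `in` is the document stream and `null` stands for the absent filter factory. The entry name `content.xml` is where ODF packages store the document body. For a word-processing document in Office Open XML the corresponding entry is `word/document.xml`.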