pdf.GetText

Name

pdf.GetText -- get text on page (V1.1)

Synopsis

t$ = pdf.GetText(id, page, idx, len)

Function

This function extracts len characters of text from a page, starting at the character index specified by idx. Note that character indices start at 0. If you pass -1 in len, pdf.GetText() will extract all remaining characters from the specified index onwards.

The page to use must be specified in the page argument. It must be a number in the range of 1 to the total number of pages in the document. The page must have been previously loaded using pdf.LoadPage() with the text argument set to True, and the PDF document specified by id must have been previously opened using pdf.OpenDocument().

Inputs

id: identifier of the PDF document to use
page: page number to use (starting from 1)
idx: character index to use (starting from 0)
len: number of characters to use or -1 for all remaining characters

Results

t$: text that has been extracted
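
Example

The following is a minimal sketch of the calling sequence described above, not a definitive implementation. The file name "test.pdf" is a placeholder, and the argument order shown for pdf.OpenDocument() and pdf.LoadPage() is an assumption that should be checked against their own reference pages. It also assumes that page 1 contains more than 20 characters.

```hollywood
@REQUIRE "pdf"

; Assumption: "test.pdf" is a placeholder file name and the argument
; order of pdf.OpenDocument() and pdf.LoadPage() is assumed here
pdf.OpenDocument(1, "test.pdf")

; Load page 1 with the text argument set to True so that its text
; can be extracted afterwards
pdf.LoadPage(1, 1, True)

; Extract the first 20 characters of page 1 (indices 0 to 19)
first$ = pdf.GetText(1, 1, 0, 20)

; Extract everything from character index 20 to the end of the page
rest$ = pdf.GetText(1, 1, 20, -1)

DebugPrint(first$)
DebugPrint(rest$)
```

The two calls split the page text at index 20: the first passes an explicit length, the second passes -1 in len to get all remaining characters.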
