Commit 27da8253 authored by csillag

Simplify PDF text extraction

Remove some conditional code that was only
necessary for obsolete versions of PDF.js.

Fortunately, all supported versions of PDF.js
now use the same data format, so this
workaround can go.
parent f42c3656
@@ -152,12 +152,8 @@ class window.PDFTextMapper extends PageTextMapperCore
   # Wait for the data to be extracted
   page.getTextContent().then (data) =>
-    # There is some variation about what I might find here,
-    # depending on PDF.js version, so we need to do some guesswork.
-    textData = data.bidiTexts ? data.items ? data
     # First, join all the pieces from the bidiTexts
-    rawContent = (text.str for text in textData).join " "
+    rawContent = (text.str for text in data.items).join " "
     # Do some post-processing
     content = @_parseExtractedText rawContent
...
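
For readers unfamiliar with the data shape involved, here is a minimal sketch of the simplified extraction. It is not the project's code. It assumes, as the diff implies, that `getTextContent()` resolves to an object with an `items` array whose entries carry a `str` field. The names `sampleData`, `fakePage` and `joinItems` are hypothetical and exist only for this illustration.

```coffeescript
# Hypothetical sample of the data shape the diff relies on:
# an object with an `items` array, each entry carrying a `str` field.
sampleData =
  items: [
    { str: "Hello" }
    { str: "PDF" }
    { str: "world" }
  ]

# Hypothetical stand-in for a PDF.js page object; only the shape matters.
fakePage =
  getTextContent: -> Promise.resolve sampleData

# Join the text pieces the same way the simplified line in the diff does.
joinItems = (data) ->
  (text.str for text in data.items).join " "

# Wait for the data to be extracted, then print the joined text.
fakePage.getTextContent().then (data) ->
  console.log joinItems data   # prints "Hello PDF world"
```

With the old fallback chain gone, data that lacks an `items` array (for example, the older `bidiTexts` shape mentioned in the removed line) would no longer be handled, which matches the commit message's statement that only the current format is supported.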