All Downloads are FREE. Search and download functionalities are using the official Maven repository.

org.apache.pdfbox.resources.PDFMarkedContentExtractor.properties Maven / Gradle / Ivy

Go to download

The Apache PDFBox library is an open source Java tool for working with PDF documents.

There is a newer version: 3.0.2
Show newest version
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This table is maps PDF stream operators to concrete OperatorProcessor
# subclasses that are used by the PDFStreamEngine class to interpret the
# PDF document. The classes configured here allow the PDFTextStripper
# subclass of PDFStreamEngine to extract text content of the document.

BT = org.apache.pdfbox.util.operator.BeginText
cm = org.apache.pdfbox.util.operator.Concatenate
Do = org.apache.pdfbox.util.operator.Invoke
ET = org.apache.pdfbox.util.operator.EndText
gs = org.apache.pdfbox.util.operator.SetGraphicsStateParameters
q  = org.apache.pdfbox.util.operator.GSave
Q  = org.apache.pdfbox.util.operator.GRestore
T* = org.apache.pdfbox.util.operator.NextLine
Tc = org.apache.pdfbox.util.operator.SetCharSpacing
Td = org.apache.pdfbox.util.operator.MoveText
TD = org.apache.pdfbox.util.operator.MoveTextSetLeading
Tf = org.apache.pdfbox.util.operator.SetTextFont
Tj = org.apache.pdfbox.util.operator.ShowText
TJ = org.apache.pdfbox.util.operator.ShowTextGlyph
TL = org.apache.pdfbox.util.operator.SetTextLeading
Tm = org.apache.pdfbox.util.operator.SetMatrix
Tr = org.apache.pdfbox.util.operator.SetTextRenderingMode
Ts = org.apache.pdfbox.util.operator.SetTextRise
Tw = org.apache.pdfbox.util.operator.SetWordSpacing
Tz = org.apache.pdfbox.util.operator.SetHorizontalTextScaling
w  = org.apache.pdfbox.util.operator.SetLineWidth
\' = org.apache.pdfbox.util.operator.MoveAndShow
\" = org.apache.pdfbox.util.operator.SetMoveAndShow

BDC = org.apache.pdfbox.util.operator.BeginMarkedContentSequenceWithProperties
BMC = org.apache.pdfbox.util.operator.BeginMarkedContentSequence
EMC = org.apache.pdfbox.util.operator.EndMarkedContentSequence

# The following operators are not relevant to text extraction,
# so we can silently ignore them.

b
B
b*
B*
BI
BX
c
CS
cs
d
d0
d1
DP
El
EX
f
F
f*
G
g
h
i
ID
j
J
K
k
l
m
M
MP
n
re
RG
rg
ri
s
S
SC
sc
SCN
scn
sh
v
W
W*
y




© 2015 - 2024 Weber Informatics LLC | Privacy Policy