Class OfficeReader
- java.lang.Object
-
- writer2latex.office.OfficeReader
-
public class OfficeReader extends java.lang.Object
This class reads and collects global information about an OOo document. This includes styles, forms, information about indexes and references etc.
-
-
Constructor Summary
Constructor, Description
OfficeReader(OfficeDocument oooDoc, boolean bAllParagraphsAreSoft) - Constructor; read a document
-
Method Summary
Modifier and Type, Method, Description
void addFigureSequenceName(java.lang.String sName) - Add a sequence name for figure captions.
void addTableSequenceName(java.lang.String sName) - Add a sequence name for table captions.
boolean bookmarkInHeading(java.lang.String sName) - Is this bookmark contained in a heading?
boolean bookmarkInList(java.lang.String sName) - Is this bookmark contained in a list?
java.lang.String fixRelativeLink(java.lang.String sLink) - In OpenDocument package format ../ means "leave the package".
int getBookmarkHeadingLevel(java.lang.String sName) - Get the level of the heading associated with this bookmark
int getBookmarkListLevel(java.lang.String sName) - Get the list level associated with a bookmark in a list
java.lang.String getBookmarkListStyle(java.lang.String sName) - Get the list style name associated with a bookmark in a list
StyleWithProperties getCellStyle(java.lang.String sName)
OfficeStyleFamily getCellStyles()
static int getCharacterCount(org.w3c.dom.Node node) - Counts the number of characters (text nodes) in this element excluding footnotes etc.
StyleWithProperties getColumnStyle(java.lang.String sName)
OfficeStyleFamily getColumnStyles()
org.w3c.dom.Element getContent() - Get the content element
StyleWithProperties getDefaultCellStyle()
StyleWithProperties getDefaultDrawingPageStyle()
StyleWithProperties getDefaultFrameStyle()
StyleWithProperties getDefaultParStyle()
StyleWithProperties getDefaultPresentationStyle()
StyleWithProperties getDrawingPageStyle(java.lang.String sName)
OfficeStyleFamily getDrawingPageStyles()
EmbeddedObject getEmbeddedObject(java.lang.String sName) - Get an embedded object in this office document
PropertySet getEndnotesConfiguration()
org.w3c.dom.Element getFirstImage() - Get the very first image in this document, if any
MasterPage getFirstMasterPage() - Returns the first master page used in the document.
FontDeclaration getFontDeclaration(java.lang.String sName) - Get a specific font declaration
OfficeStyleFamily getFontDeclarations() - Get the collection of all font declarations.
PropertySet getFootnotesConfiguration()
FormsReader getForms() - Get the forms belonging to this document.
StyleWithProperties getFrameStyle(java.lang.String sName)
OfficeStyleFamily getFrameStyles()
StyleWithProperties getHeadingStyle(int nLevel) - Returns the paragraph style associated with headings of a specific level.
ListStyle getListStyle(java.lang.String sName)
OfficeStyleFamily getListStyles()
java.lang.String getMajorityLanguage() - Return the ISO language used in most paragraph styles (in a well-structured document this will be the default language). TODO: Base on content rather than style
MasterPage getMasterPage(java.lang.String sName)
OfficeStyleFamily getMasterPages()
static char getNextChar(org.w3c.dom.Node node) - Return the next character in logical order
ListStyle getOutlineStyle()
PageLayout getPageLayout(java.lang.String sName)
OfficeStyleFamily getPageLayouts()
static org.w3c.dom.Element getParagraph(org.w3c.dom.Element node) - Get the paragraph or heading containing a node
StyleWithProperties getParStyle(java.lang.String sName)
OfficeStyleFamily getParStyles()
StyleWithProperties getPresentationStyle(java.lang.String sName)
OfficeStyleFamily getPresentationStyles()
StyleWithProperties getRowStyle(java.lang.String sName)
OfficeStyleFamily getRowStyles()
StyleWithProperties getSectionStyle(java.lang.String sName)
OfficeStyleFamily getSectionStyles()
java.lang.String getSequenceFromRef(java.lang.String sRefName) - Get the sequence name associated with a reference name
java.lang.String getSequenceName(org.w3c.dom.Element par) - Get the sequence name associated with a paragraph
TableReader getTableReader(org.w3c.dom.Element node) - Read a table from a table:table node
StyleWithProperties getTableStyle(java.lang.String sName)
OfficeStyleFamily getTableStyles()
static java.lang.String getTextContent(org.w3c.dom.Node node)
StyleWithProperties getTextStyle(java.lang.String sName)
OfficeStyleFamily getTextStyles()
TocReader getTocReader(org.w3c.dom.Element onode) - Returns a reader for a specific toc
boolean hasBookmarkRefTo(java.lang.String sName) - Is there a reference to this bookmark?
boolean hasEndnoteRefTo(java.lang.String sId) - Is there a reference to this endnote?
boolean hasFootnoteRefTo(java.lang.String sId) - Is there a reference to this footnote id?
boolean hasLinkTo(java.lang.String sName) - Is there a link to this sequence anchor name?
boolean hasNoteRefTo(java.lang.String sId) - Is there a reference to this note id?
boolean hasReferenceRefTo(java.lang.String sName) - Is there a reference to this reference mark?
boolean hasSequenceRefTo(java.lang.String sId) - Is there a reference to this sequence field?
static boolean isDrawElement(org.w3c.dom.Node node) - Checks if a node is an element in the draw namespace
boolean isFigureSequenceName(java.lang.String sName) - Does this sequence name belong to a list of figures (lof)?
boolean isIndexSourceStyle(java.lang.String sStyleName) - Is this style used in some toc as an index source style?
boolean isInPackage(java.lang.String sUrl) - Checks whether this url is internal to the package
static boolean isNoteElement(org.w3c.dom.Node node) - Checks if a node is an element representing a note (footnote/endnote)
static boolean isNoTextPar(org.w3c.dom.Node node) - Checks if the only text content of this node is whitespace.
boolean isOpenDocument() - Is this an OASIS OpenDocument or an OOo 1.0 document?
boolean isPackageFormat() - Checks whether or not this document is in package format
boolean isPresentation() - Is this a presentation document?
static boolean isSingleParagraph(org.w3c.dom.Node node) - Checks if this node contains at most one element, and that this is a paragraph.
boolean isSpreadsheet() - Is this a spreadsheet document?
static boolean isTableElement(org.w3c.dom.Node node) - Checks if a node is an element in the table namespace
boolean isTableSequenceName(java.lang.String sName) - Does this sequence name belong to a list of tables (lot)?
boolean isText() - Is this a text document?
static boolean isTextElement(org.w3c.dom.Node node) - Checks if a node is an element in the text namespace
static boolean isWhitespace(java.lang.String s) - Checks if this text is whitespace
static boolean isWhitespaceContent(org.w3c.dom.Node node) - Checks if the only text content of this node is whitespace
boolean referenceMarkInHeading(java.lang.String sName) - Is this reference mark contained in a heading?
-
-
-
Constructor Detail
-
OfficeReader
public OfficeReader(OfficeDocument oooDoc, boolean bAllParagraphsAreSoft)
Constructor; read a document
-
-
Method Detail
-
isTextElement
public static boolean isTextElement(org.w3c.dom.Node node)
Checks if a node is an element in the text namespace.
- Parameters:
node - the node to check
- Returns:
- true if this is a text element
-
isTableElement
public static boolean isTableElement(org.w3c.dom.Node node)
Checks if a node is an element in the table namespace.
- Parameters:
node - the node to check
- Returns:
- true if this is a table element
-
isDrawElement
public static boolean isDrawElement(org.w3c.dom.Node node)
Checks if a node is an element in the draw namespace.
- Parameters:
node - the node to check
- Returns:
- true if this is a draw element
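A check like isTextElement, isTableElement or isDrawElement could be sketched as a prefix test on the qualified node name. This is only an illustration: the class NamespaceCheckSketch, the helper isElementWithPrefix and the assumption that the document uses the conventional text:, table: and draw: prefixes are not taken from this page, and the actual implementation may differ.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical sketch; not the writer2latex implementation.
final class NamespaceCheckSketch {
    // True if the node is an element whose qualified name starts with the prefix, e.g. "text:".
    static boolean isElementWithPrefix(Node node, String prefix) {
        return node != null
                && node.getNodeType() == Node.ELEMENT_NODE
                && node.getNodeName().startsWith(prefix);
    }

    static boolean isTextElement(Node node) { return isElementWithPrefix(node, "text:"); }
    static boolean isTableElement(Node node) { return isElementWithPrefix(node, "table:"); }
    static boolean isDrawElement(Node node) { return isElementWithPrefix(node, "draw:"); }

    // Creates an empty DOM document for experiments.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        System.out.println(isTextElement(doc.createElement("text:p")));      // prints "true"
        System.out.println(isTableElement(doc.createElement("text:p")));     // prints "false"
        System.out.println(isDrawElement(doc.createTextNode("draw:frame"))); // prints "false" (not an element)
    }
}
```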
-
isNoteElement
public static boolean isNoteElement(org.w3c.dom.Node node)
Checks if a node is an element representing a note (footnote/endnote).
- Parameters:
node - the node to check
- Returns:
- true if this is a note element
-
getParagraph
public static org.w3c.dom.Element getParagraph(org.w3c.dom.Element node)
Get the paragraph or heading containing a node
- Parameters:
node - the node in question
- Returns:
- the paragraph or heading
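One way to find the containing paragraph or heading is to walk up the parent chain until a text:p or text:h element is reached. The sketch below assumes those two element names, treats the node itself as a candidate, and returns null when nothing is found; these choices and the class name ParagraphLookupSketch are assumptions, not documented behavior.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch; not the writer2latex implementation.
final class ParagraphLookupSketch {
    // Walks from the node towards the root and returns the first
    // text:p or text:h element found (the node itself included), or null.
    static Element getParagraph(Element node) {
        for (Node current = node; current != null; current = current.getParentNode()) {
            if (current.getNodeType() == Node.ELEMENT_NODE) {
                String name = current.getNodeName();
                if (name.equals("text:p") || name.equals("text:h")) {
                    return (Element) current;
                }
            }
        }
        return null;
    }

    // Creates an empty DOM document for experiments.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element heading = doc.createElement("text:h");
        Element span = doc.createElement("text:span");
        heading.appendChild(span);
        System.out.println(getParagraph(span).getNodeName()); // prints "text:h"
    }
}
```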
-
isSingleParagraph
public static boolean isSingleParagraph(org.w3c.dom.Node node)
Checks if this node contains at most one element, and that this is a paragraph.
- Parameters:
node - the node to check
- Returns:
- true if the node contains a single paragraph or nothing
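The "at most one element, and it is a paragraph" rule can be illustrated by counting element children. The sketch assumes that only direct children are inspected and that a paragraph means a text:p element; both are assumptions made for illustration, as is the class name SingleParagraphSketch.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch; not the writer2latex implementation.
final class SingleParagraphSketch {
    // True if the node has no element children, or exactly one that is a text:p.
    static boolean isSingleParagraph(Node node) {
        int elements = 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                elements++;
                if (elements > 1 || !"text:p".equals(child.getNodeName())) {
                    return false;
                }
            }
        }
        return true;
    }

    // Creates an empty DOM document for experiments.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element cell = doc.createElement("table:table-cell");
        cell.appendChild(doc.createElement("text:p"));
        System.out.println(isSingleParagraph(cell)); // prints "true"
        cell.appendChild(doc.createElement("text:p"));
        System.out.println(isSingleParagraph(cell)); // prints "false"
    }
}
```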
-
isNoTextPar
public static boolean isNoTextPar(org.w3c.dom.Node node)
Checks if the only text content of this node is whitespace. Other (draw) content is allowed.
- Parameters:
node - the node to check (should be a paragraph node or a child of a paragraph node)
- Returns:
- true if the node contains whitespace only
-
isWhitespaceContent
public static boolean isWhitespaceContent(org.w3c.dom.Node node)
Checks if the only text content of this node is whitespace.
- Parameters:
node - the node to check (should be a paragraph node or a child of a paragraph node)
- Returns:
- true if the node contains whitespace only
-
isWhitespace
public static boolean isWhitespace(java.lang.String s)
Checks if this text is whitespace.
- Parameters:
s - the String to check
- Returns:
- true if the String contains whitespace only
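The two whitespace checks can be sketched together: a string test, and a recursive walk that applies it to every text node below a given node. Which characters count as whitespace is not stated on this page; the sketch assumes Character.isWhitespace and treats the empty string as whitespace. The class name WhitespaceSketch is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch; not the writer2latex implementation.
final class WhitespaceSketch {
    // True if every character of s is whitespace (vacuously true for "").
    static boolean isWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // True if all text nodes at or below the node contain whitespace only.
    static boolean isWhitespaceContent(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            return isWhitespace(node.getNodeValue());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (!isWhitespaceContent(child)) {
                return false;
            }
        }
        return true;
    }

    // Creates an empty DOM document for experiments.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element par = doc.createElement("text:p");
        par.appendChild(doc.createTextNode("  \t"));
        System.out.println(isWhitespaceContent(par)); // prints "true"
        par.appendChild(doc.createTextNode("x"));
        System.out.println(isWhitespaceContent(par)); // prints "false"
    }
}
```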
-
getCharacterCount
public static int getCharacterCount(org.w3c.dom.Node node)
Counts the number of characters (in text nodes) in this element, excluding footnotes etc.
- Parameters:
node - the node to count in
- Returns:
- the number of characters
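Counting characters while skipping notes amounts to a recursive sum over text nodes that does not descend into note elements. The sketch below assumes that notes are text:note, text:footnote or text:endnote elements and ignores whatever else "etc." covers in the description above; the class name CharacterCountSketch is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch; not the writer2latex implementation.
final class CharacterCountSketch {
    // Assumed set of note element names (OpenDocument and the older format).
    static boolean isNoteElement(Node node) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return false;
        }
        String name = node.getNodeName();
        return name.equals("text:note") || name.equals("text:footnote") || name.equals("text:endnote");
    }

    // Sums the lengths of all text nodes at or below the node, skipping note subtrees.
    static int getCharacterCount(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            return node.getNodeValue().length();
        }
        if (isNoteElement(node)) {
            return 0;
        }
        int count = 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += getCharacterCount(child);
        }
        return count;
    }

    // Creates an empty DOM document for experiments.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element par = doc.createElement("text:p");
        par.appendChild(doc.createTextNode("Hello"));
        Element note = doc.createElement("text:note");
        note.appendChild(doc.createTextNode("xyz"));
        par.appendChild(note);
        par.appendChild(doc.createTextNode(" world"));
        System.out.println(getCharacterCount(par)); // prints "11" (the note text is skipped)
    }
}
```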
-
getTextContent
public static java.lang.String getTextContent(org.w3c.dom.Node node)
-
getNextChar
public static char getNextChar(org.w3c.dom.Node node)
Return the next character in logical order
-
isPackageFormat
public boolean isPackageFormat()
Checks whether or not this document is in package format
- Returns:
- true if it's in package format
-
isInPackage
public boolean isInPackage(java.lang.String sUrl)
Checks whether this url is internal to the package
- Parameters:
sUrl - the url to check
- Returns:
- true if the url is internal to the package
-
fixRelativeLink
public java.lang.String fixRelativeLink(java.lang.String sLink)
In OpenDocument package format, ../ means "leave the package". Consequently, this prefix must be removed to obtain a valid link.
- Parameters:
sLink - the link to fix
- Returns:
- the corrected link
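The prefix removal described for fixRelativeLink can be sketched as a plain string operation. The sketch assumes that only a single leading ../ is stripped and that links are left untouched when the document is not in package format; the boolean parameter packageFormat stands in for isPackageFormat() and, like the class name RelativeLinkSketch, is hypothetical.

```java
// Hypothetical sketch; not the writer2latex implementation.
final class RelativeLinkSketch {
    // Strips one leading "../" from the link when the document is a package.
    static String fixRelativeLink(String link, boolean packageFormat) {
        if (packageFormat && link.startsWith("../")) {
            return link.substring(3);
        }
        return link;
    }

    public static void main(String[] args) {
        System.out.println(fixRelativeLink("../images/fig1.png", true));  // prints "images/fig1.png"
        System.out.println(fixRelativeLink("../images/fig1.png", false)); // prints "../images/fig1.png"
    }
}
```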
-
getEmbeddedObject
public EmbeddedObject getEmbeddedObject(java.lang.String sName)
Get an embedded object in this office document
-
getFontDeclarations
public OfficeStyleFamily getFontDeclarations()
Get the collection of all font declarations.
- Returns:
- the OfficeStyleFamily of font declarations
-
getFontDeclaration
public FontDeclaration getFontDeclaration(java.lang.String sName)
Get a specific font declaration
- Parameters:
sName - the name of the font declaration
- Returns:
- a FontDeclaration representing the font
-
getTextStyles
public OfficeStyleFamily getTextStyles()
-
getTextStyle
public StyleWithProperties getTextStyle(java.lang.String sName)
-
getParStyles
public OfficeStyleFamily getParStyles()
-
getParStyle
public StyleWithProperties getParStyle(java.lang.String sName)
-
getDefaultParStyle
public StyleWithProperties getDefaultParStyle()
-
getSectionStyles
public OfficeStyleFamily getSectionStyles()
-
getSectionStyle
public StyleWithProperties getSectionStyle(java.lang.String sName)
-
getTableStyles
public OfficeStyleFamily getTableStyles()
-
getTableStyle
public StyleWithProperties getTableStyle(java.lang.String sName)
-
getColumnStyles
public OfficeStyleFamily getColumnStyles()
-
getColumnStyle
public StyleWithProperties getColumnStyle(java.lang.String sName)
-
getRowStyles
public OfficeStyleFamily getRowStyles()
-
getRowStyle
public StyleWithProperties getRowStyle(java.lang.String sName)
-
getCellStyles
public OfficeStyleFamily getCellStyles()
-
getCellStyle
public StyleWithProperties getCellStyle(java.lang.String sName)
-
getDefaultCellStyle
public StyleWithProperties getDefaultCellStyle()
-
getFrameStyles
public OfficeStyleFamily getFrameStyles()
-
getFrameStyle
public StyleWithProperties getFrameStyle(java.lang.String sName)
-
getDefaultFrameStyle
public StyleWithProperties getDefaultFrameStyle()
-
getPresentationStyles
public OfficeStyleFamily getPresentationStyles()
-
getPresentationStyle
public StyleWithProperties getPresentationStyle(java.lang.String sName)
-
getDefaultPresentationStyle
public StyleWithProperties getDefaultPresentationStyle()
-
getDrawingPageStyles
public OfficeStyleFamily getDrawingPageStyles()
-
getDrawingPageStyle
public StyleWithProperties getDrawingPageStyle(java.lang.String sName)
-
getDefaultDrawingPageStyle
public StyleWithProperties getDefaultDrawingPageStyle()
-
getListStyles
public OfficeStyleFamily getListStyles()
-
getListStyle
public ListStyle getListStyle(java.lang.String sName)
-
getPageLayouts
public OfficeStyleFamily getPageLayouts()
-
getPageLayout
public PageLayout getPageLayout(java.lang.String sName)
-
getMasterPages
public OfficeStyleFamily getMasterPages()
-
getMasterPage
public MasterPage getMasterPage(java.lang.String sName)
-
getOutlineStyle
public ListStyle getOutlineStyle()
-
getFootnotesConfiguration
public PropertySet getFootnotesConfiguration()
-
getEndnotesConfiguration
public PropertySet getEndnotesConfiguration()
-
getHeadingStyle
public StyleWithProperties getHeadingStyle(int nLevel)
Returns the paragraph style associated with headings of a specific level. Returns null if no such style is known.
In principle, different styles can be used for each heading; in practice, the same (soft) style is used for all headings of a specific level.
- Parameters:
nLevel - the level of the heading
- Returns:
- a StyleWithProperties object representing the style
-
getFirstMasterPage
public MasterPage getFirstMasterPage()
Returns the first master page used in the document. If no master page is used explicitly, the first master page found in the styles is returned. Returns null if no master page exists.
- Returns:
- a MasterPage object representing the master page
-
getMajorityLanguage
public java.lang.String getMajorityLanguage()
Return the ISO language used in most paragraph styles (in a well-structured document this will be the default language).
TODO: Base on content rather than style
- Returns:
- the ISO language
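Picking the language used by most paragraph styles is a majority count. How the styles expose their language is not shown on this page, so the sketch below takes a plain list of language codes (one per style, null when a style sets none); the class MajorityLanguageSketch, the method majority and the tie-breaking rule (first code to reach the highest count wins) are all assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the writer2latex implementation.
final class MajorityLanguageSketch {
    // Returns the most frequent non-null language code, or null if there is none.
    static String majority(List<String> languages) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String language : languages) {
            if (language == null) {
                continue; // style without a language setting
            }
            int count = counts.merge(language, 1, Integer::sum);
            if (count > bestCount) {
                best = language;
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(majority(Arrays.asList("en", "da", null, "en"))); // prints "en"
    }
}
```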
-
getTocReader
public TocReader getTocReader(org.w3c.dom.Element onode)
Returns a reader for a specific toc
- Parameters:
onode - the text:table-of-content node
- Returns:
- the reader, or null
-
isIndexSourceStyle
public boolean isIndexSourceStyle(java.lang.String sStyleName)
Is this style used in some toc as an index source style?
- Parameters:
sStyleName - the name of the style
- Returns:
- true if this is an index source style
-
isFigureSequenceName
public boolean isFigureSequenceName(java.lang.String sName)
Does this sequence name belong to a list of figures (lof)?
- Parameters:
sName - the name of the sequence
- Returns:
- true if it belongs to an index
-
isTableSequenceName
public boolean isTableSequenceName(java.lang.String sName)
Does this sequence name belong to a list of tables (lot)?
- Parameters:
sName - the name of the sequence
- Returns:
- true if it belongs to an index
-
addTableSequenceName
public void addTableSequenceName(java.lang.String sName)
Add a sequence name for table captions.
OpenDocument has a very weak notion of table captions: A caption is a paragraph containing a text:sequence element. Moreover, the only source to identify which sequence number to use is the list(s) of tables. If there's no list of tables, captions cannot be identified. Thus this method lets the user add a sequence name to identify the table captions.
- Parameters:
sName - the name to add
-
addFigureSequenceName
public void addFigureSequenceName(java.lang.String sName)
Add a sequence name for figure captions.
OpenDocument has a very weak notion of figure captions: A caption is a paragraph containing a text:sequence element. Moreover, the only source to identify which sequence number to use is the list(s) of figures. If there's no list of figures, captions cannot be identified. Thus this method lets the user add a sequence name to identify the figure captions.
- Parameters:
sName - the name to add
-
getSequenceName
public java.lang.String getSequenceName(org.w3c.dom.Element par)
Get the sequence name associated with a paragraph
- Parameters:
par - the paragraph to look up
- Returns:
- the sequence name or null
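Since a caption is described above as a paragraph containing a text:sequence element, looking up the sequence name of a paragraph can be sketched as a search for that element. The sketch assumes the name is stored in the text:name attribute and that the first text:sequence descendant decides; the class name SequenceNameSketch is hypothetical and the actual lookup may work differently.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch; not the writer2latex implementation.
final class SequenceNameSketch {
    // Returns the text:name of the first text:sequence inside the paragraph, or null.
    static String getSequenceName(Element par) {
        NodeList sequences = par.getElementsByTagName("text:sequence");
        if (sequences.getLength() == 0) {
            return null;
        }
        String name = ((Element) sequences.item(0)).getAttribute("text:name");
        return name.isEmpty() ? null : name;
    }

    // Creates an empty DOM document for experiments.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element caption = doc.createElement("text:p");
        Element sequence = doc.createElement("text:sequence");
        sequence.setAttribute("text:name", "Illustration");
        caption.appendChild(sequence);
        System.out.println(getSequenceName(caption)); // prints "Illustration"
    }
}
```

A caller could compare the returned name with the names registered through addFigureSequenceName or addTableSequenceName to decide whether the paragraph is a figure or a table caption.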
-
getSequenceFromRef
public java.lang.String getSequenceFromRef(java.lang.String sRefName)
Get the sequence name associated with a reference name
- Parameters:
sRefName - the reference name to use
- Returns:
- the sequence name or null
-
hasNoteRefTo
public boolean hasNoteRefTo(java.lang.String sId)
Is there a reference to this note id?
- Parameters:
sId - the id of the note
- Returns:
- true if there is a reference
-
hasFootnoteRefTo
public boolean hasFootnoteRefTo(java.lang.String sId)
Is there a reference to this footnote id?
- Parameters:
sId - the id of the footnote
- Returns:
- true if there is a reference
-
hasEndnoteRefTo
public boolean hasEndnoteRefTo(java.lang.String sId)
Is there a reference to this endnote?
- Parameters:
sId - the id of the endnote
- Returns:
- true if there is a reference
-
referenceMarkInHeading
public boolean referenceMarkInHeading(java.lang.String sName)
Is this reference mark contained in a heading?
- Parameters:
sName - the name of the reference mark
- Returns:
- true if so
-
hasReferenceRefTo
public boolean hasReferenceRefTo(java.lang.String sName)
Is there a reference to this reference mark?
- Parameters:
sName - the name of the reference mark
- Returns:
- true if there is a reference
-
bookmarkInHeading
public boolean bookmarkInHeading(java.lang.String sName)
Is this bookmark contained in a heading?
- Parameters:
sName - the name of the bookmark
- Returns:
- true if so
-
getBookmarkHeadingLevel
public int getBookmarkHeadingLevel(java.lang.String sName)
Get the level of the heading associated with this bookmark
- Parameters:
sName - the name of the bookmark
- Returns:
- the level or 0 if the bookmark does not exist
-
bookmarkInList
public boolean bookmarkInList(java.lang.String sName)
Is this bookmark contained in a list?
- Parameters:
sName - the name of the bookmark
- Returns:
- true if so
-
getBookmarkListStyle
public java.lang.String getBookmarkListStyle(java.lang.String sName)
Get the list style name associated with a bookmark in a list
- Parameters:
sName - the name of the bookmark
- Returns:
- the list style name or null if the bookmark does not exist or the list does not have a style name
-
getBookmarkListLevel
public int getBookmarkListLevel(java.lang.String sName)
Get the list level associated with a bookmark in a list
- Parameters:
sName - the name of the bookmark
- Returns:
- the level or 0 if the bookmark does not exist
-
hasBookmarkRefTo
public boolean hasBookmarkRefTo(java.lang.String sName)
Is there a reference to this bookmark?
- Parameters:
sName - the name of the bookmark
- Returns:
- true if there is a reference
-
hasSequenceRefTo
public boolean hasSequenceRefTo(java.lang.String sId)
Is there a reference to this sequence field?
- Parameters:
sId - the id of the sequence field
- Returns:
- true if there is a reference
-
hasLinkTo
public boolean hasLinkTo(java.lang.String sName)
Is there a link to this sequence anchor name?
- Parameters:
sName - the name of the anchor
- Returns:
- true if there is a link
-
isOpenDocument
public boolean isOpenDocument()
Is this an OASIS OpenDocument or an OOo 1.0 document?
- Returns:
- true if it's an OASIS OpenDocument
-
isText
public boolean isText()
Is this a text document?
- Returns:
- true if it's a text document
-
isSpreadsheet
public boolean isSpreadsheet()
Is this a spreadsheet document?
- Returns:
- true if it's a spreadsheet document
-
isPresentation
public boolean isPresentation()
Is this a presentation document?
- Returns:
- true if it's a presentation document
-
getContent
public org.w3c.dom.Element getContent()
Get the content element.
In the old file format this means the office:body element.
In the OpenDocument format this means an office:text, office:spreadsheet or office:presentation element.
- Returns:
- the content Element
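The distinction between the two formats can be sketched as a lookup relative to the office:body element: the old format uses the body itself, while OpenDocument uses its office:text, office:spreadsheet or office:presentation child. The sketch assumes the caller already holds the office:body element and knows the format; the class ContentElementSketch and the two-argument getContent are hypothetical and differ from the no-argument method documented here.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch; not the writer2latex implementation.
final class ContentElementSketch {
    // Old format: the body is the content. OpenDocument: the first matching child, or null.
    static Element getContent(Element body, boolean openDocument) {
        if (!openDocument) {
            return body;
        }
        for (Node child = body.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                String name = child.getNodeName();
                if (name.equals("office:text")
                        || name.equals("office:spreadsheet")
                        || name.equals("office:presentation")) {
                    return (Element) child;
                }
            }
        }
        return null;
    }

    // Creates an empty DOM document for experiments.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element body = doc.createElement("office:body");
        body.appendChild(doc.createElement("office:text"));
        System.out.println(getContent(body, true).getNodeName());  // prints "office:text"
        System.out.println(getContent(body, false).getNodeName()); // prints "office:body"
    }
}
```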
-
getForms
public FormsReader getForms()
Get the forms belonging to this document.
- Returns:
- a FormsReader representing the forms
-
getTableReader
public TableReader getTableReader(org.w3c.dom.Element node)
Read a table from a table:table node
- Parameters:
node - the table:table Element node
- Returns:
- a TableReader object representing the table
-
getFirstImage
public org.w3c.dom.Element getFirstImage()
Get the very first image in this document, if any
- Returns:
- the first image, or null if no image exists
-
-