<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<!--
#**************************************************************
#
#  Licensed to the Apache Software Foundation (ASF) under one
#  or more contributor license agreements.  See the NOTICE file
#  distributed with this work for additional information
#  regarding copyright ownership.  The ASF licenses this file
#  to you under the Apache License, Version 2.0 (the
#  "License"); you may not use this file except in compliance
#  with the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing,
#  software distributed under the License is distributed on an
#  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
#  KIND, either express or implied.  See the License for the
#  specific language governing permissions and limitations
#  under the License.
#
#**************************************************************
 -->
<html>
<head>
<title>org.openoffice.xmerge.converter.xml.sxw.aportisdoc package</title>
</head>

<body bgcolor="white">

<p>Provides the tools for converting StarWriter XML to and from the
AportisDoc format.</p>

<p>It follows the {@link org.openoffice.xmerge} framework for the conversion process.</p>

<p>Because they convert to and from a Palm application format, these converters
follow the <a href="../../../../converter/palm/package-summary.html#streamformat">
<code>PalmDB</code> stream format</a> when writing to or reading from the
Palm sync client.</p>

<p>Note that <code>PluginFactoryImpl</code> also provides a
<code>DocumentMerger</code> object, namely {@link org.openoffice.xmerge.converter.xml.sxw.aportisdoc.DocumentMergerImpl DocumentMergerImpl}.
This functionality was derived from its superclass
{@link org.openoffice.xmerge.converter.xml.sxw.SxwPluginFactory
SxwPluginFactory}.</p>

<h2>AportisDoc pdb format - Doc</h2>

<p>The AportisDoc pdb format is used by many different Palm applications,
e.g. QuickWord, AportisDoc Reader and MiniWrite.  Some of these
applications add their own tweaks to the format.  The converters
support only the default AportisDoc format, plus a few minor tweaks to
accommodate other applications.</p>

<p>The text content of the format is plain text: there are no styles
or structures, and no notion of lists, list items, paragraphs,
headings, etc.  The format does, however, support bookmarks.</p>

<p>For most Doc applications, the default supported character encoding is
the extended ASCII character set ISO-8859-1.  StarWriter XML is encoded in
UTF-8.  Since UTF-8 can represent many more characters, converting UTF-8
strings into extended ASCII can lose characters that have no mapping.</p>

<p>When XML files are parsed with JAXP, their text is read into Java
<code>String</code>s, which are Unicode, so there is no loss of character
mapping from UTF-8 to Java <code>String</code>s.  Characters can, however, be
lost when converting Java <code>String</code>s to extended ASCII bytes:
Java characters that cannot be represented in extended ASCII are converted
into the ASCII character '?' (0x3F) by the <code>String.getBytes(encoding)</code>
API.</p>
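<p>As a minimal sketch of the behaviour described above (the class and method names are hypothetical, not part of this package), the following Java code encodes a <code>String</code> to ISO-8859-1 bytes.  It uses the <code>String.getBytes(Charset)</code> overload, whose Javadoc guarantees that unmappable characters are replaced with the charset's default replacement bytes; the <code>getBytes(String)</code> overload leaves that behaviour unspecified.  <code>CharsetEncoder.canEncode</code> can detect beforehand whether any characters would be lost.</p>

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of org.openoffice.xmerge.
class Latin1Sketch {

    /** Encodes a Java (UTF-16) String as ISO-8859-1, one byte per character. */
    static byte[] toDocBytes(String text) {
        // Unmappable characters become the charset's replacement byte, '?' (0x3F).
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Returns true if every character of text has an ISO-8859-1 mapping. */
    static boolean isLossless(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 5 \u20ac";          // e-acute is in ISO-8859-1, the euro sign is not
        byte[] bytes = toDocBytes(text);
        System.out.println(isLossless(text));        // false
        System.out.println(bytes[bytes.length - 1]); // 63, i.e. '?'
    }
}
```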

<h2>SXW to DOC Conversion</h2>

<p>The <code>DocumentSerializerImpl</code> class implements the
<code>org.openoffice.xmerge.DocumentSerializer</code> interface.
It converts a given <code>SxwDocument</code> object into DOC-formatted
records, which are then passed back to the client in a
<code>ConvertData</code> object.</p>

<p>The following XML tags are handled (note that some are not implemented yet):</p>
<ul>
<li>
    <p>Paragraphs <tt>&lt;text:p&gt;</tt> and Headings <tt>&lt;text:h&gt;</tt></p>

    <p>Heading elements are handled the same way as paragraph
    elements, since both may contain the same child elements.
    They differ mainly in the type of style information they refer to,
    which lives outside their element tags.  Since the DOC format has
    no styles, headings are converted the same way as paragraphs.</p>

    <p>For paragraph elements, the essential text nodes, i.e. the text
    nodes directly contained within the paragraph node, are converted and
    transferred.  A paragraph element may also contain a number of other
    elements; these are explained in their own entries below.</p>

    <p>At the end of each paragraph, the converter adds an EOL character
    (ASCII LF) to separate the paragraphs, since the Doc format has no
    notion of a paragraph.</p>
</li>
<li>
    <p>White spaces <tt>&lt;text:s&gt;</tt> and Tabs <tt>&lt;text:tab-stop&gt;</tt></p>

    <p>In SXW, runs of two or more white-space characters are normally
    collapsed into a single space character.  To make sure the document
    content really keeps those white-space characters, SXW assigns
    special elements to them.</p>

    <p>The space element specifies how many spaces it represents.
    Converting it therefore just means emitting that number of
    spaces.</p>

    <p>The tab-stop element is trickier.  In a StarWriter document,
    tab-stops are specified by column position: a tab is not a fixed
    number of spaces, but a move to a specific column.  For example, if
    regular tab-stops are set at every 5th column, hitting tab at column 4
    moves the cursor to column 5, and hitting tab at column 1 also moves it
    to column 5.  SmartDoc and AportisDoc applications also interpret the
    ASCII tab character by column.  The only problem is that StarWriter
    lets one specify custom tab-stops, while most of these Doc applications
    do not (at least, I have not seen one that does).  The solution is simply
    to convert to the ASCII tab character and ignore custom tab-stop
    positioning.</p>
</li>
<li>
    <p>Line breaks <tt>&lt;text:line-break&gt;</tt></p>

    <p>The simplest way to represent a line break is an ASCII LF
    character.  A side effect is that the end of a paragraph is also
    marked by an ASCII LF character, so in the DOC to SXW conversion
    line breaks cannot be distinguished from paragraph ends.</p>
</li>
<li>
    <p>Text spans <tt>&lt;text:span&gt;</tt></p>

    <p>Text spans contain text whose style attributes differ from
    those of the enclosing paragraph.  Text spans can be nested inside
    other text spans.  Since spans are purely for style tagging, only the
    text within them needs to be converted and transferred.</p>
</li>
<li>
    <p>Hyperlinks <tt>&lt;text:a&gt;</tt></p>

    <p>Only the text portion is converted and transferred.</p>
</li>
<li>
    <p>Bookmarks <tt>&lt;text:bookmark&gt;</tt> <tt>&lt;text:bookmark-start&gt;</tt>
    <tt>&lt;text:bookmark-end&gt;</tt> [Not implemented yet]</p>

    <p>In SXW, bookmark elements are embedded inside paragraph elements.
    A bookmark can mark either a text position or a text range:
    <tt>&lt;text:bookmark&gt;</tt> marks a position, while the pair
    <tt>&lt;text:bookmark-start&gt;</tt> and <tt>&lt;text:bookmark-end&gt;</tt>
    marks a text range.  The DOC format only supports bookmarking a text
    position.  Thus, for the conversion, <tt>&lt;text:bookmark&gt;</tt> and
    <tt>&lt;text:bookmark-start&gt;</tt> will both mark a text position.</p>
</li>
<li>
    <p>Change Tracking <tt>&lt;text:tracked-changes&gt;</tt>
    <tt>&lt;text:change*&gt;</tt> [Not implemented yet]</p>

    <p>Change-tracking elements are not yet supported by the current
    OpenOffice.org XML filters, so this needs to be watched.  The text
    within these elements has to be interpreted properly during the
    conversion process.</p>
</li>
<li>
    <p>Lists <tt>&lt;text:unordered-list&gt;</tt> and
    <tt>&lt;text:ordered-list&gt;</tt></p>

    <p>A list can contain only one optional <tt>&lt;text:list-header&gt;</tt>
    and one or more <tt>&lt;text:list-item&gt;</tt> elements.</p>

    <p>A <tt>&lt;text:list-header&gt;</tt> contains one or more paragraph
    elements.  Since there are no styles, the conversion process does
    nothing special for list headers; the paragraphs within list headers
    are converted as explained above.</p>

    <p>A <tt>&lt;text:list-item&gt;</tt> may contain one or more paragraphs,
    headings, lists, etc.  Since the Doc format does not support any list
    structure, this element gets no special handling: each element within
    it is converted according to its element type.  Thus, a list containing
    paragraphs results in just plain paragraphs.  Sublists cannot be
    identified, although the paragraphs in sublists still appear.</p>
</li>
<li>
    <p><tt>&lt;text:section&gt;</tt></p>

    <p>I am not sure what this is yet; it needs more investigation.</p>
</li>
</ul>
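<p>The element handling described in the list above can be sketched with the JAXP DOM API roughly as follows.  This is a hypothetical illustration, not the code of <code>DocumentSerializerImpl</code>: the class name is made up, the parser is deliberately not namespace-aware so that qualified names such as <code>text:p</code> can be matched directly, and the space count is assumed to come from the <code>text:c</code> attribute of <code>&lt;text:s&gt;</code> (defaulting to one space).  Bookmarks and change tracking are not handled, matching their "not implemented yet" status.</p>

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch of the SXW-to-Doc text flattening; not DocumentSerializerImpl.
class SxwTextFlattener {

    /** Parses an SXW XML fragment and returns its Doc plain text. */
    static String flattenXml(String xml) {
        try {
            // Not namespace-aware (the default), so getNodeName() returns "text:p" etc.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            org.w3c.dom.Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            flattenChildren(doc.getDocumentElement(), out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot convert SXW XML", e);
        }
    }

    /** Appends the text of all children of node, handling elements by type. */
    static void flattenChildren(Node node, StringBuilder out) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE) {
                out.append(c.getNodeValue());
            } else if (c.getNodeType() == Node.ELEMENT_NODE) {
                flattenElement((Element) c, out);
            }
        }
    }

    private static void flattenElement(Element e, StringBuilder out) {
        switch (e.getNodeName()) {
            case "text:p":
            case "text:h":                      // headings are treated like paragraphs
                flattenChildren(e, out);
                out.append('\n');               // EOL (ASCII LF) ends every paragraph
                break;
            case "text:s": {                    // a run of explicit spaces
                String count = e.getAttribute("text:c");
                out.append(" ".repeat(count.isEmpty() ? 1 : Integer.parseInt(count)));
                break;
            }
            case "text:tab-stop":
                out.append('\t');               // custom tab-stop positions are ignored
                break;
            case "text:line-break":
                out.append('\n');               // same LF as a paragraph end
                break;
            default:
                // text:span, text:a, lists, list headers, list items, ...:
                // no special handling, just convert whatever they contain.
                flattenChildren(e, out);
        }
    }
}
```

<p>Because line breaks and paragraph ends both produce an LF, the output cannot be split back into the original paragraph structure, which is the limitation noted for line breaks.</p>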
<p>There may be other tags that still need to be addressed for this conversion.</p>

<p>Refer to {@link org.openoffice.xmerge.converter.xml.sxw.aportisdoc.DocumentSerializerImpl DocumentSerializerImpl}
for implementation details.  It uses the <code>DocEncoder</code> class for the
encoding.</p>
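<p>For orientation, the sketch below packs plain text into uncompressed Doc records.  It is not the <code>DocEncoder</code> implementation: it assumes the commonly documented PalmDOC layout (a 16-byte big-endian record 0 holding the version, where 1 means uncompressed, a spare field, the text length, the text record count, the maximum record size of 4096 bytes and a reading position, followed by text records of at most 4096 bytes each), and it omits both Doc compression and the surrounding <code>PalmDB</code> header.  These details should be checked against <code>DocEncoder</code> before relying on them.</p>

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not DocEncoder: builds uncompressed Doc records.
class UncompressedDocSketch {

    static final int RECORD_SIZE = 4096;   // assumed maximum text record size

    /** Returns record 0 (the Doc header) followed by the text records. */
    static byte[][] encode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        int count = (bytes.length + RECORD_SIZE - 1) / RECORD_SIZE;  // ceiling division
        byte[][] records = new byte[count + 1][];

        ByteBuffer header = ByteBuffer.allocate(16);  // big-endian by default
        header.putShort((short) 1);            // version: 1 = uncompressed (assumed)
        header.putShort((short) 0);            // spare
        header.putInt(bytes.length);           // total text length in bytes
        header.putShort((short) count);        // number of text records
        header.putShort((short) RECORD_SIZE);  // maximum record size
        header.putInt(0);                      // current reading position
        records[0] = header.array();

        for (int i = 0; i != count; i++) {
            int from = i * RECORD_SIZE;
            int to = Math.min(from + RECORD_SIZE, bytes.length);
            records[i + 1] = Arrays.copyOfRange(bytes, from, to);
        }
        return records;
    }
}
```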

<h2>DOC to SXW Conversion</h2>

<p>The <code>DocumentDeserializerImpl</code> class implements the
<code>org.openoffice.xmerge.DocumentDeserializer</code> interface.  It is
passed the device document in the form of a <code>ConvertData</code> object,
and creates an <code>SxwDocument</code> object by converting the
DOC-formatted records.</p>

<p>The text content of the Doc format is transferred as text.  Paragraph
elements are formed by splitting the text at ASCII LF characters, and there
is always at least one paragraph element.</p>
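<p>A minimal sketch of the paragraph splitting, assuming the record text has already been decoded into a Java <code>String</code>.  The class name is hypothetical, and treating a trailing LF as the end of the last paragraph rather than the start of an empty one is an assumption, chosen to mirror the EOL the serializer appends after every paragraph.</p>

```java
import java.util.Arrays;

// Hypothetical sketch of DOC-to-SXW paragraph splitting; not DocumentDeserializerImpl.
class DocParagraphSplitter {

    /** Splits Doc text into paragraph texts at ASCII LF characters. */
    static String[] splitParagraphs(String docText) {
        // Assumption: a trailing LF closes the last paragraph instead of opening an empty one.
        String body = docText.endsWith("\n")
                ? docText.substring(0, docText.length() - 1)
                : docText;
        // A limit of -1 keeps empty paragraphs between consecutive LFs.  split()
        // never returns an empty array here, so there is always at least one paragraph.
        return body.split("\n", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitParagraphs("Title\nfirst\n\nsecond\n")));
        // [Title, first, , second]
    }
}
```

<p>Each resulting string would become the content of one <tt>&lt;text:p&gt;</tt> element; since line breaks were also written as LF, they come back as paragraph ends.</p>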

<p>Bookmarks in the Doc format will be converted to the bookmark element
<tt>&lt;text:bookmark&gt;</tt> [Not implemented yet].</p>

<h2>Merging changes</h2>

<p>As mentioned above, the <code>DocumentMerger</code> object produced by
<code>PluginFactoryImpl</code> is <code>DocumentMergerImpl</code>.
Refer to the Javadoc for that class for its merging specifications.
</p>

<h2>TODO list</h2>

<ol>
<li>Investigate Palm devices with different character encodings.</li>
<li>Investigate other StarWriter XML tags.</li>
</ol>

</body>
</html>