[jsword-devel] Fwd: [sword-devel] Tables across verse boundaries
    DM Smith 
    dmsmith at crosswire.org
       
    Tue Mar 18 17:27:09 MST 2014
    
    
  
The other fixer-upper I looked at was TagSoup.
On Mar 18, 2014, at 8:22 PM, DM Smith <dmsmith at crosswire.org> wrote:
> 
> On Mar 18, 2014, at 5:02 PM, Chris Burrell <christopher at burrell.me.uk> wrote:
> 
>> Yup - so I was looking at the code tonight.
>> 
>> I don't think the problem is quite as bad/hard to fix as you make it sound.
>> 
>> I think there are two types of issues
>> - a verse on its own not producing correct XML
>> - a bunch of XML together not producing well nested XML
>> 
>> Not sure how to solve the first, but the easy (?) solution to the second one is to amalgamate all the raw text before parsing it. Now that we pass the whole passage down one more level, it shouldn't be too difficult to do that?
> 
> Amalgamation may minimize the problem. Especially if we are displaying a chapter at a time. But if we display a search results list, a parallel display or an arbitrary passage chosen by the user then it may exhibit problems. 
> 
> The problem is still the same.
> 
> We've also talked about expanding the context of what fails by grabbing an adjacent verse and adding it to the amalgamation and re-parsing.
> 
> It is not hard to write a parser. That's essentially what we have with the ThML parser. Such a parser could know when it sees an unmatched start or end tag. Presuming that the module as a whole is valid and well-formed, we can either prefix or append the missing tag to the result. This would "solve" the problem. (The ThML parser does not do that.)
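
The prefix/append idea can be sketched roughly as follows. This is a minimal illustration, not JSword code: the class name FragmentBalancer and its regex-based tag scan are assumptions made for the example, and it relies on the module as a whole being well-formed. Start tags that get prefixed lose their attributes, since those live in an earlier verse.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of JSword): balances the tags of one verse fragment. */
final class FragmentBalancer {
    // group 1: "/" for an end tag; group 2: element name; group 3: "/" for a self-closing tag.
    // Simplification: attribute values containing '<' or '>' are not handled.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][\\w:.-]*)[^<>]*?(/?)>");

    static String balance(String fragment) {
        Deque<String> open = new ArrayDeque<>();  // start tags still waiting for their end tag
        List<String> orphans = new ArrayList<>(); // end tags whose start tag is in an earlier verse
        Matcher m = TAG.matcher(fragment);
        while (m.find()) {
            String name = m.group(2);
            if (!m.group(3).isEmpty()) {
                continue; // self-closing, e.g. a verse milestone: nothing to balance
            }
            if (m.group(1).isEmpty()) {
                open.push(name);              // start tag
            } else if (name.equals(open.peek())) {
                open.pop();                   // end tag matching the innermost open element
            } else {
                orphans.add(name);            // end tag with no start tag in this fragment
            }
        }
        StringBuilder out = new StringBuilder();
        // Prefix start tags: the last orphaned end tag closes the outermost element.
        for (int i = orphans.size() - 1; i >= 0; i--) {
            out.append('<').append(orphans.get(i)).append('>');
        }
        out.append(fragment);
        // Append end tags, innermost element first (ArrayDeque iterates from the top of the stack).
        for (String name : open) {
            out.append("</").append(name).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(balance("<table><row><cell>A</cell>"));
        System.out.println(balance("B</cell></row></table>"));
    }
}
```

With this sketch, a verse that ends mid-table gets the missing end tags appended, and the following verse gets bare start tags prefixed, so each fragment parses on its own.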
> 
>> 
>> On the second, there may be some nice XML parsers that fix stuff up more gracefully as well...
> 
> By definition an XML parser must fail on bad input. I've not seen any that fix up broken XML. Every year I do a survey of available parsers, not just XML ones, to see if there is something that might help. One that caught my eye: JTidy.
> 
> JTidy understands the xhtml spec and can take badly formed HTML and clean it up. I was trying to figure out if I could re-write it for another schema, or to take a schema and generate a cleanup technique. It was more complicated than I was willing to get into.
> 
> DM
> 
>> 
>> Chris
>> 
>> 
>> ---------- Forwarded message ----------
>> From: DM Smith <dmsmith at crosswire.org>
>> Date: 18 March 2014 20:45
>> Subject: Re: [sword-devel] Tables across verse boundaries
>> To: christopher at burrell.me.uk, SWORD Developers' Collaboration Forum <sword-devel at crosswire.org>
>> 
>> 
>> On Mar 18, 2014, at 3:29 PM, Chris Burrell <christopher at burrell.me.uk> wrote:
>> 
>>> Hi DM
>>> 
>>> 1- You're right, I was mistaken about "across verses". Ezra 1 would be an example where you have 3 rows per verse, and a table over two verses.
>> No problem. It's hard to debug a problem where the text is made up.
>> 
>>> 
>>> 2- My issue with the markup and having the verse number inside the cell was that I got a 'nesting' warning from osis2mod. Is that something I just ignore? (i.e. "verse sID" in the first cell with "verse eID" in the second cell)
>> 
>> The nesting warnings are relatively benign. They indicate that the verse in isolation is not well-formed XML and that when displayed in certain contexts it will have problems.
>> 
>> That the verse sID is in one cell and the verse eID is in another is not by itself a problem. It is more a question of whether the raw data from the module is a well-formed fragment.
>> 
>> 
>>> 
>>> 3- I had another look at the output, and the module does in fact have the table in it. It looks like it wrapped it into verse 8, as expected. So it seems, that maybe this is an issue specific to JSword?
>> 
>> It is a particularly bad problem with JSword. JSword passes the verse's raw data to an XML parser to create an XML fragment, and the parse fails when the data is not well-formed. When the exception is caught, we then strip all markup out of the raw data and re-parse it.
>> This is particular to JSword.
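
The parse-then-strip fallback described above looks roughly like this. It is a sketch using only the JDK's built-in parser, not JSword's actual code: the class name VerseFallback, the wrapper element name, and the tag-stripping regex are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical illustration of the fallback; not taken from the JSword sources. */
final class VerseFallback {

    /** Returns true if the raw verse data parses as a fragment inside a wrapper element. */
    static boolean isWellFormed(String raw) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader("<fragment>" + raw + "</fragment>")));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed in isolation
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Keeps the markup when it parses; otherwise strips every tag and keeps only the text. */
    static String toDisplayable(String raw) {
        return isWellFormed(raw) ? raw : raw.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(toDisplayable("<w lemma=\"x\">In</w> the beginning"));
        System.out.println(toDisplayable("<table><row><cell>A</cell>"));
    }
}
```

The second call shows the cost of the fallback: a verse that merely starts a table loses all of its markup, not just the unbalanced tags.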
>> 
>> However, when the verse is shown in isolation by any SWORD frontend, or in a table cell, it most likely will not display as intended. JSword just does it one worse. If we wish to discuss JSword's shortcoming further, we should do that on jsword-devel or create an issue for it (if there isn't one already, as we talked about this problem long ago).
>> 
>>> 
>>> Chris
>>> 
>>> 
>>> 
>>> On 18 March 2014 13:50, Jonathan Morgan <jonmmorgan at gmail.com> wrote:
>>> Hi DM,
>>> 
>>> On Tue, Mar 18, 2014 at 12:01 PM, DM Smith <dmsmith at crosswire.org> wrote:
>>> On Mar 17, 2014, at 1:07 PM, Chris Burrell <chris at burrell.me.uk> wrote:
>>> 
>>> > Hello
>>> >
>>> > I'm looking at converting a module that has tables across verse boundaries... Is this supported?
>>> 
>>> It should be. At least by osis2mod. I don't know if SWORD renderers have code for tables. I'll leave that for someone else to answer. JSword probably will choke on tables. I'll go into that in a bit.
>>> 
>>> Last time we discussed OSIS tables they weren't supported by the SWORD renderers.
>>> I don't think anything has changed.
>>> 
>>> Jon 
>>> 
>>> > I'm using the sword utilities to convert the module, however, I'm seeing that the 'table' element is getting dropped?
>>> 
>>> I'm presuming that you are using osis2mod. osis2mod should not drop anything. To verify what osis2mod creates I recommend creating a raw module (that is, use no compression flags) and use the -d 2 flag. This will put milestones for the start and end of the verses into the module. Then you can use a text editor (stay away from NotePad as the line endings may not be windows friendly) to look at the file and search for the constructs.
>>> 
>>> >  (both using mod2imp to check,
>>> 
>>> Using mod2imp is also useful because it marks each index entry with the verse slot name. But it may not be necessary, if the raw file gives what you wish.
>>> 
>>> > as well as using JSword).
>>> 
>>> JSword has some problems going to OSIS. It assumes that each verse is well-formed xml. If it is not, it strips all xml, leaving text (with notes inline).
>>> 
>>> This is a fairly safe assumption, but tables will probably make it fail.
>>> 
>>> This assumption is something that all SWORD/JSword frontends make at some point. Two examples:
>>> - Search results lists that show verse content as well as references.
>>> - Stacked or side-by-side parallel display.
>>> 
>>> >
>>> > If this is supported, does someone have some example mark-up that I could use as a starting point?
>>> 
>>> I'm trying to understand where in a Bible a table would be useful. I can see it in an introduction. But spanning verses? No way. There is no tabular data in the Bible. (Please correct me if I'm wrong!)
>>> 
>>> I have seen people use tables to control rendering. If this is what is being done, someone needs guidance.
>>> 
>>> In a commentary, which is indexed by verse numbers, anything could happen.
>>> 
>>> Regarding sample markup, it is analogous to simple HTML tables, but other than <table> the element names are different.
>>> The <table> element can be wholly contained within:
>>> <div>
>>> <chapter>
>>> <speech>
>>> <note>
>>> <cell>
>>> <p>
>>> Nothing else can be a parent to <table>.
>>> 
>>> A table has a few attributes, cols and rows to give dimensions; canonical to indicate whether it contains canonical material; and the standard OSIS attributes.
>>> It can contain a <head> and also <row> elements. Both are optional, but it doesn't make sense to have a table without rows.
>>> 
>>> I'm not clear what the purpose of <head> is. It can contain much of the same content as a verse.
>>> 
>>> The <row> element can only contain <cell> elements and it has a role attribute that can have a value of label or data. It also has a canonical attribute and the standard OSIS attributes.
>>> 
>>> The <cell> element can contain pretty much anything that a <div> or a <chapter> can contain except <div> and <chapter>. It also has the same role attribute, but defaults to data. It also has an align attribute with a value from left, right, center, justify, start and end. And of course it has canonical and standard OSIS attributes.
>>> 
>>> Since a table cannot be milestoned, the element it is contained within also cannot be milestoned. The manual states that for any given element you can choose to use the milestoned version or the container version, but not both in the same document.
>>> 
>>> I guess a verse can be split across multiple cells and even rows by using the milestoned version of a verse.
>>> 
>>> If a <table> only has a single column, a <list> may be a better container.
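
As a starting point, the description above might translate into markup like the sample embedded below. This is a sketch only: the cell text is placeholder content, the osisID values and the class name OsisTableSample are assumptions, and the code checks well-formedness with the JDK parser rather than validating against the OSIS schema.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical sample: a table in a container <div>, with milestoned verses split across cells. */
final class OsisTableSample {
    // Placeholder text; only the element and attribute names follow the description above.
    static final String SAMPLE =
          "<div>\n"
        + " <table>\n"
        + "  <row role=\"label\"><cell>Item</cell><cell>Count</cell></row>\n"
        + "  <row>\n"
        + "   <cell><verse osisID=\"Ezra.1.9\" sID=\"Ezra.1.9\"/>first item</cell>\n"
        + "   <cell>first count<verse eID=\"Ezra.1.9\"/></cell>\n"
        + "  </row>\n"
        + "  <row>\n"
        + "   <cell><verse osisID=\"Ezra.1.10\" sID=\"Ezra.1.10\"/>second item</cell>\n"
        + "   <cell>second count<verse eID=\"Ezra.1.10\"/></cell>\n"
        + "  </row>\n"
        + " </table>\n"
        + "</div>";

    /** Parses the markup (throwing if it is not well-formed) and counts elements by name. */
    static int count(String xml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(elementName).getLength();
        } catch (Exception e) {
            throw new IllegalStateException("sample is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("rows:  " + count(SAMPLE, "row"));
        System.out.println("cells: " + count(SAMPLE, "cell"));
    }
}
```

The whole table is well-formed, but each verse taken alone (from its sID milestone to its eID milestone) crosses a cell boundary, which is exactly the case that triggers the nesting warning.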
>>> 
>>> Hope this helps.
>>> 
>>> Together in His Service,
>>>         DM
>>> 
>>> 
>>> _______________________________________________
>>> sword-devel mailing list: sword-devel at crosswire.org
>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page
>>> 
>> 
>> 
>> _______________________________________________
>> jsword-devel mailing list
>> jsword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/jsword-devel