[jsword-devel] StrongNumber in indexing

Chris Burrell chris at burrell.me.uk
Fri Jan 4 15:15:52 MST 2013


Sounds good. A few notes,

   - the STEP interlinear code tries (tried?) to use this information to
   provide better interlinears. We currently don't use x-split or src, but
   could use either.


   - With H00, it was accepted that H00 almost always referred to the next
   tag. When a word was triple tagged, it was the Strong number next to H00
   that was used; e.g. "H00 H1 H2" means that the next occurrence of H1 is
   split. Is there such a convention with x-split? (There aren't that many
   occurrences of this, but there are a few all the same.)


   - We were intending to include H00 in the ESV that Tyndale are
   tagging. If that's not to be the case, can we decide the most appropriate
   way to do this? src sounds good, although sounds like it might be difficult
   to do properly for the KJV without a lot of manual work. We're hoping to
   have something ready-ish very soon.
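
To make the H00 convention above concrete, here is a rough sketch of how
the old tagging could be read. The class and method names are made up for
illustration and are not part of JSword; it assumes the lemma attribute is
a space-separated list of "strong:"-prefixed values, as in the old KJV.

```java
import java.util.Optional;

class H00Convention {
    /**
     * Returns the Strong's number flagged as split by an H00 marker,
     * i.e. the value immediately following "strong:H00" in the lemma.
     * Hypothetical helper for illustration only, not part of JSword.
     */
    static Optional<String> splitStrong(String lemma) {
        String[] parts = lemma.trim().split("\\s+");
        for (int i = 0; i + 1 < parts.length; i++) {
            if (parts[i].equalsIgnoreCase("strong:H00")) {
                String next = parts[i + 1];
                // Drop the "strong:" prefix if there is one.
                return Optional.of(next.substring(next.indexOf(':') + 1));
            }
        }
        // No H00 marker (or nothing after it): the word is not flagged.
        return Optional.empty();
    }
}
```

For lemma="strong:H00 strong:H06779" this gives H06779, which is the
number whose next occurrence in the verse would be treated as the other
half of the split.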

How easy would it be to restore the data via x-split?
Chris



On 4 January 2013 21:57, DM Smith <dmsmith at crosswire.org> wrote:

> On Jan 4, 2013, at 4:34 PM, Chris Burrell <chris at burrell.me.uk> wrote:
>
> There are two separate issues here.
>
> 1- The fact that we retrieve the closest match to a strong number is IMHO
> rather obscure and confusing in itself. I've hit this several times and
> found through rather laborious investigation that a module was using a bad
> strong number, or some piece of code hadn't quite formatted the number
> right, etc.
>
>
> This is a feature of a dictionary lookup. This will typically find the
> longest common prefix.
>
> It'd probably be good to mark some dictionaries as exact match only.
> Strong's, Robinson's, and maybe daily devotions seem like candidates.
>
>
> 2- H00: The KJV is the most obvious example of a module that has/had it.
> It looks like someone has removed them all in the KJV2006 project (
> http://www.crosswire.org/~dmsmith/kjv2006/index.html). Version 2.3 of the
> module still has it. Did we replace this with something else? H00 was used
> to indicate that the first occurrence of the strong number was the same
> original word as the second one. We were going to put them into the ESV.
>
> So for example Gen 2.9, used to read something like this:
>
> <div><title type="x-gen">Genesis 2:9</title>
> <verse osisID="Gen.2.9">
> <w lemma="strong:H04480">And out</w>
> <w lemma="strong:H0127">of the ground</w>
> <w lemma="strong:H00 strong:H06779">made</w>
> <w lemma="strong:H03068">the <seg><divineName>Lord</divineName></seg></w>
> <w lemma="strong:H0430">God</w>
> <w lemma="strong:H06779" morph="strongMorph:TH8686">to grow</w>
>          [ ... ... ... some more stuff goes here ... ... ...]
> </verse></div>
>
> In the above, this indicates that the translators split the word H06779
> into "made" and into "to grow".
>
> It seems someone has removed all of these marks. However we don't have the
> "src" tag either so can anyone suggest how I can tell which bits go
> together and which bits go apart? What was the reasoning behind this change?
>
>
> I maintain the KJV. I couldn't find a purpose of H00. So I took it out as
> being wrong. If it is the splitting of words, we have a mechanism for that
> in the NT, which could be used. It uses src="XX" (which for the NT ties
> back to the XX word in the verse in a particular Greek module), the
> type="x-split" and subType="x-NN" where NN is a unique number w/in the
> verse having a value greater than the greatest value of src="XX". I'm not
> at all sure that subType is still needed. Both src and type are each
> sufficient to solve the problem.
>
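
If src really does identify the original word, then putting the pieces
back together might be as simple as grouping the <w> elements of a verse
by their src value. A minimal sketch under that assumption; the Word type,
the method name and the src values in the usage note are all hypothetical,
and this is not how JSword currently does anything.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SplitWordGrouper {
    /** One <w> element reduced to its src attribute and its text. Hypothetical type. */
    static final class Word {
        final String src;
        final String text;

        Word(String src, String text) {
            this.src = src;
            this.text = text;
        }
    }

    /**
     * Groups the translated fragments by src, keeping verse order.
     * Any src with more than one fragment is a split word.
     */
    static Map<String, List<String>> groupBySrc(List<Word> words) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Word w : words) {
            groups.computeIfAbsent(w.src, k -> new ArrayList<>()).add(w.text);
        }
        return groups;
    }
}
```

With made-up src values, "made" (src 3), "the Lord" (src 4) and "to grow"
(src 3) would come back as 3 -> [made, to grow] and 4 -> [the Lord].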
> A bit more exploring to do on the KJV...
>
>
>
> Chris
>
>
>
> On 4 January 2013 21:07, DM Smith <dmsmith at crosswire.org> wrote:
>
>> H00 is not a valid Strong's number. The modules that have it should be
>> re-done. Do you know which are the problem modules?
>>
>> The problem with allowing H00 is that it will not find an entry in a
>> Strong's dictionary and will get the nearest one. Which is better? An error
>> filling the console or confusing the user?
>>
>> I don't mind changing the regex to be simpler, but it should not create
>> further problems.
>>
>> The part at the end is an optional extension. We have a module in the
>> wings that has it.
>>
>> In Him,
>>         DM
>>
>> On Jan 4, 2013, at 3:34 PM, Chris Burrell <chris at burrell.me.uk> wrote:
>>
>> > Hi
>> >
>> > Can I suggest a fix to the StrongNumberFilter, which currently relies on
>> > org.crosswire.jsword.book.study.StrongsNumber
>> >
>> > The regular expression used to match the Strong number is:
>> > private static final Pattern STRONGS_PATTERN =
>> >     Pattern.compile("([GgHh])0*([1-9][0-9]*)!?([A-Za-z]+)?");
>> >
>> > Unfortunately, some texts use H00 as a strong number to indicate that
>> > the tagged word is in 2 places (i.e. this is only the first part of the
>> > tag).
>> >
>> > The above expression causes huge amounts of Logging to be output to the
>> > console.
>> >
>> > I suggest we change it to something like
>> >
>> > [GgHh][0-9]+
>> >
>> > Also, what's the stuff at the end of the regex? Haven't come across any
>> > like that...
>> >
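
For reference, a small standalone check of the two patterns quoted above.
The class name is made up, and I'm assuming the whole string is tested
with matches(); the real StrongsNumber code may apply the pattern
differently.

```java
import java.util.regex.Pattern;

class StrongsPatternCheck {
    // The pattern quoted above from StrongsNumber.
    static final Pattern CURRENT =
        Pattern.compile("([GgHh])0*([1-9][0-9]*)!?([A-Za-z]+)?");

    // The simpler pattern suggested above.
    static final Pattern PROPOSED = Pattern.compile("[GgHh][0-9]+");

    static boolean matchesCurrent(String s) {
        return CURRENT.matcher(s).matches();
    }

    static boolean matchesProposed(String s) {
        return PROPOSED.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"H00", "H06779", "G5624"}) {
            System.out.println(s + " current=" + matchesCurrent(s)
                + " proposed=" + matchesProposed(s));
        }
    }
}
```

The current pattern needs a non-zero digit after the leading zeros, so H00
can never match it, while the simpler one accepts any run of digits. The
flip side is that the simpler pattern, used with matches(), would no
longer accept the optional letter suffix at the end.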
>> > Chris
>> >
>> > _______________________________________________
>> > jsword-devel mailing list
>> > jsword-devel at crosswire.org
>> > http://www.crosswire.org/mailman/listinfo/jsword-devel
>>
>>
>
>