Quality control

One reason to use the Textract program is its built-in validation. The program ensures all of the following:

  1. All characters are printable, accents have been stripped, and tabs have been converted to blanks.
  2. Every < symbol is matched by a corresponding >. There are no orphan < or > symbols, no nested tags (tags within tags), and no two < or > in sequence.
  3. The following tags are always recognized and accepted: the six levels of headings <H1> through <H6>, paragraph <P>, lists <UL> and <OL>, table <TABLE>, table row <TR>, table cell <TD>, blockquote (for indenting) <BLOCKQUOTE>, bold <B>, italics <I>, underline <U>, and line break <BR>. A matching ending tag is expected for every tag except the line break.
  4. Additional tags are recognized if they are listed in the file HTMLTags.txt. HTMLTags.txt is a self-documented file that you may edit to specify which tags you wish retained from HTML input.
  5. All tags that require ending tags are in LIFO (last in, first out) order.
  6. The program warns if a blank line in input is followed by content that starts in lower case. This may signal a need to remove extraneous blank lines from input. This is not an error as such.
  7. Headings may contain no tags except links.
  8. When ascending through heading levels, intermediate levels are not skipped. For example, an <H4> may not be the next heading after an <H2>.
  9. If a <TABLE> occurs, text may be located only within table cells, and table cells may occur only within table rows. Each table cell starts with <TD> and ends with </TD>. Each table row starts with <TR> and ends with </TR>. There should be the same number of table cells within each table row. The only other tags allowed within a table are bold, italics, and underline, in LIFO order and each finishing within the same cell in which it starts.
  10. Lists start with either <UL> (unordered: a bullet before each list item) or <OL> (ordered: sequential numbers before successive list items). All text in a list must be within a list item, which starts with <LI> and ends with </LI>. Indented lists may occur within lists, up to five levels altogether. The only other tags allowed within a list are bold, italics, and underline, in LIFO order, each finishing within the same list item in which it starts.
  11. Use of ampersand codes is restricted to the following:
    • Variations of the ampersand codes for < and > (ampersand followed by lt; or gt;) are permitted anywhere that text content is allowed.
    • The symbol for a non-breaking space (ampersand followed by nbsp;) is permitted only between <TD> and </TD> to preserve an otherwise empty cell.
  12. All other ampersand codes are reduced to normal print form. In other words, any other '&' in the output is a literal ampersand and not part of an ampersand code.
  13. Lengths: Headings must be under 300 characters in length. Paragraphs may be any length whatsoever.
  14. Every heading is followed either by a more junior heading or by text. This ensures that every heading has child text.
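
To make a few of these rules concrete, here is a minimal Python sketch of how checks 2, 5, and 8 (bracket matching, LIFO tag order, and no skipped heading levels) might be expressed. It is not Textract's own code. The function name check_markup, the error wording, and the decision to leave the first heading unconstrained are assumptions made for illustration, and <LI> is added to the accepted tags only because rule 10 refers to it.

```python
import re

# Tags named in rule 3 that need an ending tag. <LI> is included as an
# assumption, because rule 10 mentions it.
PAIRED_TAGS = {"H1", "H2", "H3", "H4", "H5", "H6", "P", "UL", "OL", "LI",
               "TABLE", "TR", "TD", "BLOCKQUOTE", "B", "I", "U"}
VOID_TAGS = {"BR"}  # the line break has no ending tag

TAG_PATTERN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*>")


def check_markup(text):
    """Return a list of error messages (hypothetical wording); empty if clean."""
    errors = []

    # Rule 2: '<' and '>' must strictly alternate, starting with '<'.
    inside_tag = False
    for pos, ch in enumerate(text):
        if ch == "<":
            if inside_tag:
                errors.append(f"unexpected '<' at offset {pos}")
            inside_tag = True
        elif ch == ">":
            if not inside_tag:
                errors.append(f"unexpected '>' at offset {pos}")
            inside_tag = False
    if inside_tag:
        errors.append("unclosed '<' at end of input")
    if errors:
        return errors  # tag-level checks are unreliable on broken brackets

    # Rules 5 and 8: LIFO order and no skipped heading levels.
    stack = []
    last_heading = None  # assumption: the first heading may be any level
    for match in TAG_PATTERN.finditer(text):
        closing = match.group(1) == "/"
        name = match.group(2).upper()
        if name in VOID_TAGS:
            continue
        if name not in PAIRED_TAGS:
            errors.append(f"unrecognized tag <{name}>")
            continue
        if closing:
            if stack and stack[-1] == name:
                stack.pop()
            else:
                errors.append(f"</{name}> is out of LIFO order")
            continue
        stack.append(name)
        if name.startswith("H") and name[1:].isdigit():
            level = int(name[1:])
            if last_heading is not None and level > last_heading + 1:
                errors.append(f"<{name}> skips a level after <H{last_heading}>")
            last_heading = level
    errors.extend(f"<{name}> is never closed" for name in stack)
    return errors
```

Calling check_markup on a string such as "<H2>A</H2><P>t</P><H4>B</H4><P>t</P>" would report the skipped heading level from the example in rule 8, while well-nested input returns an empty list. Tags listed in HTMLTags.txt are not modeled here; a fuller version would presumably merge them into the accepted set.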


 
Proximity Search .com Search technologies by Marpex, Inc.