Improving the presentation
Text, whether electronic or on paper, is easier to read if ...
- there are colored pictures to illustrate the text;
- the text is nicely formatted, with key terms in bold type or in italics, and all the text in a clear font with large enough type;
- items in lists are arranged clearly with a "bullet" to show the start of each item;
- tables are used to display content in labeled rows and columns;
- the table of contents is presented in a clear simple fashion;
- you can start reading anywhere you want whenever you want;
- you can switch to search whenever you wish.
If you observe The Essentials, your collection of text is searchable. This page shows how to make the results more presentable and readily understood by a person searching or reading the collection.
- Leave a blank line between paragraphs.
- Correctly nested HTML tags may be included.
- You may use bold, italic, and underline tags.
- Use tables when appropriate.
- Use lists when appropriate, up to five levels.
- Don't mix tables and lists.
- Use break and blockquote tags to format poetry or quotations.
- Graphics, pictures, and video clips may be included.
- Links to the Internet may be included.
- If indexing a web page, link to it.
- Links within the current collection may be included.
- Links to other collections may be included.
- Keep the objective in mind -- people finding content.
Leave a blank line between paragraphs.
Here is a very simple step: Separate paragraphs with a blank line. When content is displayed by the search engine, it inserts two break tags (<BR><BR>) in the HTML at each point at which the original text had one or more blank lines. These tags are invisible to the user, but have the effect of clearly distinguishing between paragraphs.
Behind the scenes, the text is divided into blocks of one hundred words. The end user neither knows nor cares about those divisions. For display purposes, the search engine looks for points that are logical from the user's view at which to start and finish a segment of text. Paragraph breaks are excellent for this purpose.
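The blank-line rule can be pictured with a short Python sketch. It is only an illustration of the behavior described above, not the search engine's actual code; the function name mark_paragraph_breaks is hypothetical.

```python
import re

def mark_paragraph_breaks(text: str) -> str:
    """Replace each run of one or more blank lines with <BR><BR>.

    Hypothetical helper: it mimics the described display behavior only.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A newline followed by one or more empty (or whitespace-only) lines
    # marks a paragraph boundary, however many blank lines there are.
    return re.sub(r"\n(?:[ \t]*\n)+", "<BR><BR>", text.strip())

print(mark_paragraph_breaks("First paragraph.\n\nSecond paragraph.\n\n\n\nThird."))
```

A single line break inside a paragraph is left alone; only blank lines count as paragraph separators.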
Correctly nested HTML tags may be included.
The desktop version of the Words Close Together search engine is based on a browser; it actually uses pieces of Internet Explorer. The server version also dynamically produces HTML web pages. Therefore HTML tags work. In order that search may be more efficient, the system requires that some basic HTML rules be followed:
- If an HTML tag has an equivalent end tag, use it. For example, in an ordered (<OL>) or an unordered (<UL>) list, each list item begins with an <LI> tag. Be sure to finish the list item with the equivalent </LI> tag.
- Nest tags in LIFO (last in, first out) order. Example: Within a table row, each table cell begins with a <TD> tag. Suppose you want that cell in bold. Your stack of active tags at this point may be something like <P>, <TABLE>, <TR>, <TD>, <B> (paragraph, table, table row, table cell, bold). First in is the paragraph tag. Last in is the bold tag. The first tag to be terminated must be the bold tag ... last in, first out.
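The two rules above can be checked before indexing with a small script. The following is a minimal sketch using Python's standard html.parser module; the class name NestingChecker, the function check_nesting, and the VOID_TAGS set are assumptions made for illustration and are not part of the Words Close Together software.

```python
from html.parser import HTMLParser

# Tags that have no end tag in HTML, so they are never put on the stack.
# This set is an illustrative assumption, not an exhaustive list.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

class NestingChecker(HTMLParser):
    """Tracks open tags on a stack and records LIFO violations."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, most recent last
        self.errors = []  # human-readable problem descriptions

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase.
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if not self.stack:
            self.errors.append(f"</{tag}> has no matching start tag")
        elif self.stack[-1] != tag:
            # Last in must be first out: the newest open tag closes first.
            self.errors.append(
                f"</{tag}> closes while <{self.stack[-1]}> is still open"
            )
        else:
            self.stack.pop()

def check_nesting(html: str) -> list[str]:
    """Return a list of nesting problems; an empty list means none found."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    unclosed = [f"<{tag}> is never closed" for tag in checker.stack]
    return checker.errors + unclosed

print(check_nesting("<P><TABLE><TR><TD><B>bold cell</B></TD></TR></TABLE></P>"))
print(check_nesting("<UL><LI>one<LI>two</LI></UL>"))
```

The first call reports no problems. The second reports the list item that was never finished with its </LI> tag.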
Content within tags is not searchable. Imagine the confusion for the end user if she searched on the word "align" and found instances within tags. Content within tags is never displayed by the browser / desktop WCT Reader program. Under these circumstances, comment tags contribute nothing of value. Therefore, delete any comment tags.
The search engine has a simple standard layout for its result lists, expanded single results, and blocks of text for reading, including lists and tables. Any formatting within tags, for example, <P align="center"> is ignored. The extra formatting contributes nothing, and unnecessarily uses up word count. Leave it out.
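Removing comments and formatting attributes can be automated. The sketch below is one possible cleanup pass, written in Python with regular expressions; simplify_tags and KEEP_ATTRS are hypothetical names, and the choice to keep attributes on A and IMG tags is an assumption (links and pictures would stop working without them). It does not handle attribute values that contain a ">" character.

```python
import re

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# A start tag that carries at least one attribute; group 1 is the tag name.
TAG_WITH_ATTRS = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\s+[^<>]*>")
# Assumption: these tags need their attributes (href, name, src) to work.
KEEP_ATTRS = {"a", "img"}

def simplify_tags(html: str) -> str:
    """Delete comments and strip formatting attributes from start tags."""
    html = COMMENT.sub("", html)

    def strip_attrs(match):
        name = match.group(1)
        if name.lower() in KEEP_ATTRS:
            return match.group(0)  # leave links and images untouched
        return f"<{name}>"

    return TAG_WITH_ATTRS.sub(strip_attrs, html)

print(simplify_tags('<P align="center">Hello <!-- note -->world</P>'))
```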
You may use bold, italic, and underline tags.
The HTML purist may object to holdovers from the past such as bold, italic, and underline tags. Yes, cascading style sheets are better. However, a great deal of content includes these tags, and the bold and italics help to show emphasis by the author. Be a little careful about underlining; don't color it blue! (Yes, even span tags and color combinations work. But don't get carried away.)
Use tables when appropriate.
Tables go a long way to clarifying content for the end user. Make free use of table, table row, and table cell tags -- in LIFO (Last In First Out) order, of course. Don't bother with formatting instructions within tables.
Frames are still used on some web pages. For search purposes, tables within tables could be an immense source of confusion. Therefore the system only accepts stand-alone tables. The Textract program reduces frames, maintains the innermost of nested tables, and treats all else as text outside of tables.
Tables are not broken up for display. Therefore it is best to limit their size. 5000 bytes of text is a convenient maximum. Otherwise it takes too many screenfuls to display a single unit of text.
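A quick size check can flag tables that exceed the suggested maximum. This Python sketch makes two assumptions that the text above does not spell out: that only the visible text counts toward the 5000 bytes (tags are excluded), and that the text is measured in UTF-8. The function names are hypothetical.

```python
import re

MAX_UNIT_BYTES = 5000  # the convenient maximum suggested above
TAG = re.compile(r"<[^<>]+>")

def visible_text_bytes(html: str) -> int:
    """Size of the text outside tags, in UTF-8 bytes (assumed encoding)."""
    text = TAG.sub(" ", html)      # drop tags, keep word boundaries
    text = " ".join(text.split())  # collapse runs of whitespace
    return len(text.encode("utf-8"))

def is_comfortable_size(html: str, limit: int = MAX_UNIT_BYTES) -> bool:
    """True if the table or list fits within the suggested maximum."""
    return visible_text_bytes(html) <= limit

sample = "<TABLE><TR><TD>its</TD><TD>it's</TD></TR></TABLE>"
print(visible_text_bytes(sample), is_comfortable_size(sample))
```

The same check applies to lists, which are likewise displayed as a single unit.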
Here is a sample table. If you are using the server version, click on View -- Source to see the tags used below. The sample table is deliberately left with minimum formatting, since the Words Close Together software adds a bit of cell padding when it is displayed as a search result. The subject of the sample table is common spelling errors.
| Error | Correct Form | Comment |
|---|---|---|
| it's as a possessive | its | Pronouns don't take apostrophes ... unless you like "hi's" and "m'y". |
| its as a contraction | it's | Apostrophe signals missing letter. |
| their as a contraction of "they are" | they're | "Their" is a possessive pronoun. |
| i as a personal pronoun | I | Sign of a dissipated life; not enough energy to press Shift key. |

Use lists when appropriate, up to five levels.
Unordered lists with a bullet before each item, or ordered lists with a sequence number before each item are both standard fare in HTML. They help to make content more meaningful, and are therefore enabled within Words Close Together technology.
Lists within lists are okay. The system allows for up to five levels. Here is an example with three levels:
- Ohio
  - Allen County
    - Allentown
    - Conant
    - Kemp
    - Lima
    - Fort Shawnee
  - Hardin County
    - Kenton
    - Pfeiffer Station
    - Foraker
- Pennsylvania
  - Allegheny County
    - Moon Run
    - Pittsburgh
    - Imperial
  - More counties
    - etc.
- More states
  - etc.
As with tables, lists are not broken up for display. Therefore it is best to limit their size. 5000 bytes of text is a convenient maximum. Otherwise it takes too many screenfuls to display a single unit of text.
Don't mix tables and lists.
Occasionally situations come up in which lists are included within tables. You may find examples within Microsoft Help files. For purposes of search and formatting results on the screen, this can get rather absurd. So if there is a list within a table, it's best to reduce it to a series of items, each preceded by a break tag so that it shows up on a separate line within the table cell.
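The reduction described above can be done mechanically. Here is a rough Python sketch, assuming the cell contains a simple, non-nested list; flatten_list_in_cell is a hypothetical helper name, not part of any Words Close Together tool.

```python
import re

def flatten_list_in_cell(cell_html: str) -> str:
    """Turn a list inside a table cell into items separated by <BR> tags."""
    # Remove the list wrappers (<UL>, </UL>, <OL>, </OL>).
    flat = re.sub(r"</?(?:UL|OL)\b[^<>]*>", "", cell_html, flags=re.IGNORECASE)
    # Each list item now starts on its own line within the cell.
    flat = re.sub(r"<LI\b[^<>]*>", "<BR>", flat, flags=re.IGNORECASE)
    # The end tags of the list items are no longer needed.
    flat = re.sub(r"</LI\s*>", "", flat, flags=re.IGNORECASE)
    return flat

print(flatten_list_in_cell("<TD>Fruit:<UL><LI>apple</LI><LI>pear</LI></UL></TD>"))
```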
Use break and blockquote tags to format poetry or quotations.
Poetry is searchable even if its lines are run together. We recommend judicious use of blockquote tags for indentation and line break tags for separating lines.
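As an illustration, the following Python sketch wraps each stanza of a plain-text poem in blockquote tags and separates its lines with break tags. The function name format_poem and the one-blockquote-per-stanza layout are assumptions; other arrangements of the same two tags would work as well.

```python
from html import escape

def format_poem(poem: str) -> str:
    """Wrap each stanza in <BLOCKQUOTE> and separate lines with <BR>."""
    stanzas, current = [], []
    for line in poem.strip().splitlines():
        line = line.strip()
        if line:
            # Escape &, < and > so the poem's text is not read as tags.
            current.append(escape(line, quote=False))
        elif current:
            stanzas.append(current)  # a blank line ends the stanza
            current = []
    if current:
        stanzas.append(current)
    return "\n".join(
        "<BLOCKQUOTE>" + "<BR>".join(stanza) + "</BLOCKQUOTE>"
        for stanza in stanzas
    )

print(format_poem("Roses are red\nViolets are blue\n\nSugar is sweet\nAnd so are you"))
```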
We are also accustomed to prolonged quotations being indented. Words Close Together presentation of text blocks and search results supports this kind of formatting.
Graphics, pictures, and video clips may be included.
It's an HTML world. Graphics, pictures, and video clips can be inserted within searchable text. These will be displayed along with search results. The search encompasses only the words of the text. A picture is not searchable, but words about a picture may be searched along with the other text.
Whether graphic components should be included is a separate question. In search of a web site, graphics are not necessary since the actual page with all its pretty add-ons is only one click away. The downside of graphics is that they must be correctly linked and stored typically in a subdirectory that is set up in association with the Words Close Together index file. The logistics of distributing and installing can get messy, unless you provide a top notch installation aid.
One situation that would warrant full graphics is an electronic textbook distributed in the form of a Words Close Together index file. Production costs can be cut radically by handling all graphics in the electronic version, and printing only a black type on newsprint version. This approach also fits in with frequent updates, and gets around the headaches for publishers created by the resale market for high cost paper editions.
Links to the Internet may be included.
If linking a portion of text to the Internet, be sure to include the full link, including href="http://www.etc.org/whatever.asp". Provided the computer is connected to the Internet, the user is taken to the designated page, and it is shown within the current display area. Clicking the Back button returns the user to the search software.
If indexing a web page, link to it.
It is particularly helpful to include a link in the heading preceding a web page or the text extracted from it. This provides one click access to all the graphics and specialized layout of the original, while still providing full search capability to the user. It's the best of both worlds.
Links within the current collection may be included.
Inserting links from one part of a collection to another is possible. For this purpose, use anchor tags to identify destinations, and link tags to set up cross references to those destinations. Destination anchors may be either in headings or in ongoing text. For example:
In a heading: <H2><A name="49">Introduction</A></H2> or
In text: ... dedicated to the proposition that <A name="27">all men are created equal</A>. We are gathered ...
Link tags are typically within ongoing text:
The <A href="#49">Introduction</A> alluded to Stanton's view that ... or
Jefferson as a slave owner might have blanched at Lincoln's <A href="#27">proposition of equality</A>.
Be sure that each anchor tag is unique within this particular text collection. Using sequential integers for the anchor tags (whether they are actually in order or not) is one sure way. You may of course cross link to the same anchor multiple times within the collection of text.
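Uniqueness is easy to verify before indexing. The Python sketch below reports anchor names that occur more than once and links that point at no anchor; check_anchors is a hypothetical helper, and the regular expressions assume the attribute layout used in the examples above (name or href as the first attribute, in double quotes).

```python
import re
from collections import Counter

ANCHOR = re.compile(r'<A\s+name="([^"]+)"', re.IGNORECASE)
LOCAL_LINK = re.compile(r'<A\s+href="#([^"]+)"', re.IGNORECASE)

def check_anchors(text: str) -> tuple[list[str], list[str]]:
    """Return (duplicate anchor names, link targets with no anchor)."""
    names = Counter(ANCHOR.findall(text))
    duplicates = sorted(name for name, count in names.items() if count > 1)
    dangling = sorted(
        {target for target in LOCAL_LINK.findall(text) if target not in names}
    )
    return duplicates, dangling

sample = (
    '<H2><A name="49">Introduction</A></H2> '
    '... that <A name="27">all men are created equal</A>. '
    'The <A href="#49">Introduction</A> alluded to ... '
    '<A href="#27">proposition of equality</A> '
    '<A href="#99">missing target</A> '
    '<A name="27">repeated name</A>'
)
print(check_anchors(sample))
```

Linking to the same anchor several times is fine and is not reported; only repeated anchor names and links without a destination are.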
Prior to indexing, an extra program is run to replace both the anchors and the links. You won't recognize the replacements. But they work very nicely for your purposes.
Links to other collections may be included.
Inserting links from one part of a collection to another separate collection is similar. Anchor tags are exactly as above. The only difference is that the link tag must have the name of the other collection inserted between the href=" and the sharp sign.
Let's assign the hypothetical collection in the preceding section the name "Lincoln and Stanton". Suppose we wanted to refer to parts of it from within a different collection, "Free At Last". Then links within "Free At Last" might look like this:
The <A href="Lincoln and Stanton#82">tension with Stanton</A> could be traced to ... or
The dignity of no man is upheld unless all men are <A href="Lincoln and Stanton#645">free</A>.
The name of the target data set must be precisely letter by letter correct in the link.
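Because the collection name must match exactly, it can help to validate links against a list of known collection names. This is a small Python sketch of that idea, not the way the software itself resolves links; resolve_link and its parameters are hypothetical.

```python
def resolve_link(href: str, known_collections: set[str], current: str) -> tuple[str, str]:
    """Split an href into (collection, anchor) and validate the collection.

    An href that starts with the sharp sign refers to the current collection.
    """
    collection, sharp, anchor = href.rpartition("#")
    if not sharp:
        raise ValueError(f"no sharp sign in link: {href!r}")
    if collection == "":
        return current, anchor
    # The comparison is exact: case and spacing must match letter by letter.
    if collection not in known_collections:
        raise ValueError(f"unknown collection: {collection!r}")
    return collection, anchor

known = {"Lincoln and Stanton", "Free At Last"}
print(resolve_link("Lincoln and Stanton#82", known, current="Free At Last"))
print(resolve_link("#27", known, current="Lincoln and Stanton"))
```

A link such as "Lincoln and stanton#82" (lowercase "s") raises an error here, which mirrors the requirement that the name be correct letter by letter.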
The user is scarcely aware of the switch from one collection to another. The back button is always available to return to the former collection.
Keep the objective in mind -- people finding content.
At the top of the page is a statement that simply reducing the inputs to printable text is enough to make them searchable. Everything beyond that improves the presentation, and in some cases the clarity of the content. There is an obvious tradeoff between the cost (amount of additional work invested) and the benefit (improved presentation characteristics). In some situations such as discovering emails that touch on particular topics, the benefit is entirely in the finding, and not at all in the enhancement of presentation. At the opposite extreme, a public relations group might go to great lengths to "arrange the truth in bouquets" that are fully persuasive.
You know the needs of your users. You decide how much effort to invest.
Proximity Search .com | Search technologies by Marpex, Inc.