Help:Match and split

Match and split is a tool for use with User:Phe-bot. It takes edited text from the main namespace and applies it to the corresponding consecutive scanned pages in the Page: namespace. The pages in the Page: namespace must already have a DjVu text layer, and that layer must be of good enough quality for a match to be performed.

Criteria for using this tool

This tool can cause a lot of damage if it is not used carefully. Read through this section before using it, and make sure that all of the criteria are met.

  • Has the Index file been uploaded and put in place?
  • Does the Index file have a text layer?
  • Is the Index file of type DjVu?
  • Is the Index file the same work (and the same volume of that work) as the mainspace text?
  • Is it the same edition of the work?
    • Does the year of publication match?
    • Is it the same publisher?
    • Is the city of publication the same? This is particularly important for texts that were published in England and the US simultaneously. Differences in punctuation and orthography can cause proofreading headaches in the Page: namespace.
  • Has the mainspace text been wikified?
    • Are bold, italics, smallcaps, &c. in place?
    • Have text page numbers and running headers been removed?
    • Have ref tags been used for any footnotes?
  • Has the text been proofread by us to at least 75%? If not, there is no advantage in using Match and split, and side-by-side proofreading will be quicker in the long run.
  • Have the header and footer fields in the Index been appropriately set for the running header and footer? When Page: namespace pages are created, the contents of these fields are used to populate the header and footer fields.
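
The checklist above can be treated as a pre-flight check: every item must pass before the tool is run. The following is a minimal sketch in Python, not part of Phe-bot or any Wikisource tooling. The class name, field names and the `unmet_criteria` helper are all hypothetical and exist only to illustrate the idea that a single failed criterion is enough to stop.

```python
from dataclasses import dataclass, fields

# Threshold taken from the checklist: proofread to at least 75%.
MIN_PROOFREAD = 0.75


@dataclass
class MatchSplitChecklist:
    """Hypothetical record of the criteria; field names are illustrative only."""
    index_in_place: bool
    index_has_text_layer: bool
    index_is_djvu: bool
    same_work_and_volume: bool
    same_edition: bool            # year, publisher and city of publication match
    mainspace_wikified: bool      # formatting in place, headers removed, ref tags used
    proofread_fraction: float     # 0.0 to 1.0, as judged by the editor
    header_footer_set: bool


def unmet_criteria(checklist):
    """Return the names of all criteria that are not met (empty list means go ahead)."""
    unmet = []
    for field in fields(checklist):
        value = getattr(checklist, field.name)
        # Only the yes/no criteria are checked here; the fraction is handled below.
        if isinstance(value, bool) and not value:
            unmet.append(field.name)
    if checklist.proofread_fraction < MIN_PROOFREAD:
        unmet.append("proofread_fraction")
    return unmet
```

An editor would fill in the record by hand after inspecting the Index file and the mainspace text. If `unmet_criteria` returns anything other than an empty list, side-by-side proofreading is the safer route.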

When not to use this tool

  • If any of the above criteria are not met.
  • If the text has been pasted from Project Gutenberg or Distributed Proofreaders, we cannot know whether the edition used is the same as the scanned one. Side-by-side proofreading should be done on the OCR text layer in the Page: namespace, and the result should either be transcluded to replace the pasted text or be transcluded as a separate edition. Which of the two applies depends on how similar the editions are.
  • If the text has been pasted from Internet Archive, then side-by-side proofreading is the appropriate action. Transclusion should replace the pasted text. This is because very little proofreading takes place at IA and the text is the OCR layer that we already have.
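
The cases in this section amount to a small decision rule based on where the mainspace text came from. Below is a rough sketch of that rule in Python. The source labels and the `recommended_action` function are hypothetical names chosen for illustration, and the returned strings simply paraphrase the guidance given in this section.

```python
def recommended_action(source, all_criteria_met):
    """Suggest how to proceed, given the origin of the mainspace text.

    `source` is a hypothetical label such as "internet_archive",
    "project_gutenberg", "distributed_proofreaders" or "wikisource".
    `all_criteria_met` says whether every criterion for using the tool passed.
    """
    if source == "internet_archive":
        # The pasted text is just the OCR layer we already have.
        return "proofread side by side; transclusion replaces the pasted text"
    if source in ("project_gutenberg", "distributed_proofreaders"):
        # The edition is unknown, so the outcome depends on edition similarity.
        return ("proofread side by side; replace the pasted text or "
                "transclude as a separate edition")
    if not all_criteria_met:
        return "do not use Match and split"
    return "Match and split may be used"
```

For example, `recommended_action("internet_archive", True)` still recommends side-by-side proofreading, because the origin of the text rules the tool out even when the other criteria appear to be met.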
