usage: tacl lifetime [-h] [-t {cbeta,latin,pagel}] [-v]
CATALOGUE RESULTS LABEL OUTPUT
Generate a report on the lifetime of n-grams in a results file.
positional arguments:
CATALOGUE Path to catalogue file.
RESULTS Path to a results file to report on.
LABEL Label to mark as the focus of the report.
OUTPUT Directory to output report to.
options:
-h, --help show this help message and exit
-t {cbeta,latin,pagel}, --tokenizer {cbeta,latin,pagel}
Type of tokenizer to use. The "cbeta" tokenizer is
suitable for the Chinese CBETA corpus (tokens are
single characters or workaround clusters within square
brackets). The "pagel" tokenizer is for use with the
transliterated Tibetan corpus (tokens are sets of word
characters plus some punctuation used to transliterate
characters). (default: cbeta)
-v, --verbose Display debug information; multiple -v options
increase the verbosity. (default: None)
A lifetime report consists of:
* an HTML table showing the disposition of each n-gram across the
ordered corpora (with texts and count ranges);
* an HTML table showing, for each corpus, the n-grams that first
occurred, only occurred, and last occurred in that corpus; and
* results files for each category (first occurred in, only
occurred in, last occurred in) for each corpus.
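
The three categories above can be illustrated with a minimal Python sketch. It is not TACL's implementation: the function name, the input shape (a mapping from n-gram to the set of corpus labels it occurs in), and the choice to treat "only occurred" as exclusive of "first" and "last" are all assumptions made for illustration.

```python
def classify_lifetimes(ordered_labels, occurrences):
    """Group n-grams by where they first, only, or last occur.

    ordered_labels: corpus labels in their intended order (hypothetical input).
    occurrences: dict mapping each n-gram to the set of labels it occurs in.
    Assumption: an n-gram found in a single corpus is reported as "only"
    and not additionally as "first" or "last".
    """
    position = {label: i for i, label in enumerate(ordered_labels)}
    report = {
        label: {"first": [], "only": [], "last": []}
        for label in ordered_labels
    }
    for ngram in sorted(occurrences):
        # Keep only known labels, sorted by corpus order.
        present = sorted(
            (label for label in occurrences[ngram] if label in position),
            key=position.__getitem__,
        )
        if not present:
            continue
        if len(present) == 1:
            report[present[0]]["only"].append(ngram)
        else:
            report[present[0]]["first"].append(ngram)
            report[present[-1]]["last"].append(ngram)
    return report


if __name__ == "__main__":
    demo = classify_lifetimes(
        ["A", "B", "C"],
        {"ab": {"A", "B"}, "bc": {"B"}, "cd": {"A", "C"}, "de": {"B", "C"}},
    )
    for label, categories in demo.items():
        print(label, categories)
```

Here "ab" is first seen in A and last seen in B, while "bc" is confined to B. The actual report may draw these boundaries differently.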
This report may be generated from any results file, but is most usefully
applied to the output of the lifetime script (in the tacl-extra package).
The focus label is informative only: often multiple lifetime reports are
generated from the same master results file, one per corpus, each with
filtering specific to the corpus in focus.
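
As a rough sketch of the one-report-per-corpus workflow, the following Python helper builds one "tacl lifetime" argument list per focus label, using only the positional arguments and the -t option shown in the usage above. The helper name, the file names, and the per-label output directory layout are hypothetical, and the sketch assumes a separately filtered results file already exists for each label.

```python
from pathlib import PurePosixPath

# Tokenizer choices as listed in the usage text above.
TOKENIZERS = ("cbeta", "latin", "pagel")


def build_lifetime_commands(catalogue, results_by_label, output_root,
                            tokenizer="cbeta"):
    """Return one argument list per focus label (hypothetical helper).

    results_by_label: dict mapping a focus label to the path of the
    results file filtered for that label (assumed to exist already).
    """
    if tokenizer not in TOKENIZERS:
        raise ValueError(f"unknown tokenizer: {tokenizer!r}")
    commands = []
    for label, results_path in results_by_label.items():
        # Assumed layout: one output directory per focus label.
        output_dir = str(PurePosixPath(output_root) / label)
        commands.append([
            "tacl", "lifetime", "-t", tokenizer,
            catalogue, results_path, label, output_dir,
        ])
    return commands


if __name__ == "__main__":
    for command in build_lifetime_commands(
        "catalogue.txt",
        {"A": "results-A.csv", "B": "results-B.csv"},
        "reports",
    ):
        print(" ".join(command))
```

Each list could then be passed to subprocess.run(); the sketch only prints the commands so that they can be checked before anything is executed.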