API Reference
This page documents the public Python API by hand. The API surface is small enough that hand-written reference text is clearer than generated output for the initial release.
moine.distance
moine.distance(left, right, *, lang=None, dictionary=None, score_cutoff=None, max_readings_per_segment=None, max_span_chars=None, max_paths=None, longest_only=None)
Returns the Levenshtein-style Lattice Path Edit Distance for one pair of
strings. When both lang and dictionary are omitted, the function falls back
to plain string edit distance.
left - The first string.
right - The second string.
lang - Optional language code. Use "ja" for Japanese or "zh" for Chinese.
dictionary - Optional loaded dictionary object. When omitted, mòine loads or reuses the default dictionary for lang.
score_cutoff - Optional integer threshold. Distances greater than the cutoff return score_cutoff + 1.
max_readings_per_segment, max_span_chars, max_paths, longest_only - Optional dictionary expansion controls. These options require lang or dictionary; plain string distance rejects them.
>>> import moine
>>> moine.distance("weishiji", "威士忌", lang="zh")
0
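The plain-string fallback and the score_cutoff rule can be illustrated without any dictionary. The following is a minimal pure-Python sketch, not mòine's implementation: plain_distance is a hypothetical helper name, and it only models ordinary Levenshtein distance plus the documented rule that distances above the cutoff are reported as score_cutoff + 1.

```python
from typing import Optional


def plain_distance(left: str, right: str, score_cutoff: Optional[int] = None) -> int:
    """Levenshtein distance between two strings (illustrative sketch only)."""
    # previous[j] is the distance between the already-processed prefix of
    # `left` and right[:j].
    previous = list(range(len(right) + 1))
    for i, lch in enumerate(left, start=1):
        current = [i]
        for j, rch in enumerate(right, start=1):
            cost = 0 if lch == rch else 1
            current.append(min(
                previous[j] + 1,         # delete lch
                current[j - 1] + 1,      # insert rch
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    distance = previous[-1]
    # Model the documented cutoff rule: any distance above the cutoff
    # collapses to score_cutoff + 1.
    if score_cutoff is not None and distance > score_cutoff:
        return score_cutoff + 1
    return distance


print(plain_distance("kitten", "sitting"))                  # 3
print(plain_distance("kitten", "sitting", score_cutoff=1))  # 2
```

Because every over-cutoff pair reports the same value, a caller can treat any result above score_cutoff as "no match".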
moine.damerau_distance
moine.damerau_distance(left, right, *, lang=None, dictionary=None, score_cutoff=None, max_readings_per_segment=None, max_span_chars=None, max_paths=None, longest_only=None)
Returns the lattice-aware Damerau-Levenshtein distance for one pair of strings. It can count adjacent transpositions as one edit on lattice paths.
>>> import moine
>>> moine.damerau_distance("moine", "mione")
1
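To show what counting an adjacent transposition as one edit means on plain strings, here is a small sketch. It uses the restricted variant (optimal string alignment), which is an assumption: this page does not say which Damerau-Levenshtein variant mòine implements, and osa_distance is a hypothetical name. Lattice paths are not modeled.

```python
def osa_distance(left: str, right: str) -> int:
    """Optimal string alignment distance: Levenshtein plus adjacent swaps."""
    rows, cols = len(left) + 1, len(right) + 1
    # table[i][j] is the distance between left[:i] and right[:j].
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i
    for j in range(cols):
        table[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if left[i - 1] == right[j - 1] else 1
            best = min(
                table[i - 1][j] + 1,         # deletion
                table[i][j - 1] + 1,         # insertion
                table[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition: the last two characters are swapped.
            if (
                i > 1 and j > 1
                and left[i - 1] == right[j - 2]
                and left[i - 2] == right[j - 1]
            ):
                best = min(best, table[i - 2][j - 2] + 1)
            table[i][j] = best
    return table[-1][-1]


print(osa_distance("moine", "mione"))  # 1
```

Plain Levenshtein distance would score the same pair as 2, because the swapped "oi" and "io" need two substitutions.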
moine.normalized_distance
moine.normalized_distance(left, right, *, lang=None, dictionary=None, score_cutoff=None, max_readings_per_segment=None, max_span_chars=None, max_paths=None, longest_only=None)
Returns a normalized distance in the inclusive range 0.0 to 1.0, where smaller is better.
>>> import moine
>>> moine.normalized_distance("もいにゃ", "モイニャ", lang="ja")
0.0
moine.normalized_similarity
moine.normalized_similarity(left, right, *, lang=None, dictionary=None, score_cutoff=None, max_readings_per_segment=None, max_span_chars=None, max_paths=None, longest_only=None)
Returns a normalized similarity in the inclusive range 0.0 to 1.0, where larger is better.
>>> import moine
>>> moine.normalized_similarity("もいにゃ", "モイニャ", lang="ja")
1.0
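The relationship between the two normalized scores can be sketched for plain strings. This sketch makes two assumptions that the page does not confirm: that the raw distance is divided by the length of the longer input, and that similarity is the complement of normalized distance. All function names are hypothetical.

```python
def levenshtein(left: str, right: str) -> int:
    """Plain Levenshtein distance, used here only as the raw score."""
    previous = list(range(len(right) + 1))
    for i, lch in enumerate(left, start=1):
        current = [i]
        for j, rch in enumerate(right, start=1):
            cost = 0 if lch == rch else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def normalized_distance_sketch(left: str, right: str) -> float:
    # Assumption: normalize by the longer input so the result stays in [0, 1].
    longest = max(len(left), len(right))
    if longest == 0:
        return 0.0  # two empty strings are treated as identical
    return levenshtein(left, right) / longest


def normalized_similarity_sketch(left: str, right: str) -> float:
    # Assumption: similarity is the complement of normalized distance.
    return 1.0 - normalized_distance_sketch(left, right)


print(normalized_distance_sketch("abcd", "abcf"))    # 0.25
print(normalized_similarity_sketch("abcd", "abcf"))  # 0.75
```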
moine.ratio
moine.ratio(left, right, *, lang=None, dictionary=None, score_cutoff=None, max_readings_per_segment=None, max_span_chars=None, max_paths=None, longest_only=None)
Alias for normalized_similarity.
>>> import moine
>>> moine.ratio("ピィート", "ピート", lang="ja")
0.7142857142857143
moine.partial_ratio
moine.partial_ratio(query, text, *, lang=None, dictionary=None, score_cutoff=None, max_span_chars=None, max_reading_span_chars=None, max_readings_per_segment=None, max_paths=None, longest_only=None)
Returns the best normalized similarity between query and any span of text.
The score lies in the inclusive range 0.0 to 1.0, where larger is better. In
the partial APIs, max_span_chars limits the length of the spans scanned in
text, while max_reading_span_chars limits dictionary reading expansion. When
max_span_chars is omitted, dictionary-backed matching also accounts for the
longest reading path of query, so short written forms such as kanji or hanzi
can still match longer romanized spans.
>>> import moine
>>> moine.partial_ratio("ウイスキー", "ういすきーをのんでいます", lang="ja")
1.0
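The span-scanning idea behind the partial APIs can be sketched for plain strings as a brute-force search over every span of text. This is only an illustration under stated assumptions: the exhaustive scan, the normalization by the longer string, and the names similarity and partial_ratio_sketch are all hypothetical, and dictionary readings are not modeled.

```python
from typing import Optional


def levenshtein(left: str, right: str) -> int:
    """Plain Levenshtein distance."""
    previous = list(range(len(right) + 1))
    for i, lch in enumerate(left, start=1):
        current = [i]
        for j, rch in enumerate(right, start=1):
            cost = 0 if lch == rch else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def similarity(left: str, right: str) -> float:
    # Assumption: normalize by the longer of the two strings.
    longest = max(len(left), len(right))
    if longest == 0:
        return 1.0
    return (longest - levenshtein(left, right)) / longest


def partial_ratio_sketch(query: str, text: str,
                         max_span_chars: Optional[int] = None) -> float:
    """Best similarity between `query` and any span of `text`."""
    limit = len(text) if max_span_chars is None else min(max_span_chars, len(text))
    best = 0.0
    for start in range(len(text)):
        # Only consider spans of at most `limit` characters.
        for end in range(start + 1, min(start + limit, len(text)) + 1):
            best = max(best, similarity(query, text[start:end]))
    return best


print(partial_ratio_sketch("whisky", "a glass of whisky, please"))                    # 1.0
print(partial_ratio_sketch("whisky", "a glass of whisky, please", max_span_chars=3))  # 0.5
```

The second call shows why a span limit matters: with spans capped at three characters, no span can cover the six-character query, so the best achievable score drops to 0.5.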
moine.partial_distance
moine.partial_distance(query, text, *, lang=None, dictionary=None, score_cutoff=None, max_span_chars=None, max_reading_span_chars=None, max_readings_per_segment=None, max_paths=None, longest_only=None)
Returns the best (lowest) distance between query and any span of text.
If dictionary-backed matching cannot score any span in text, the function
returns len(query) when no score_cutoff is given, or score_cutoff + 1 when
one is.
>>> import moine
>>> moine.partial_distance("ウイスキー", "ういすきーをのんでいます", lang="ja")
0
moine.partial_alignment
moine.partial_alignment(query, text, *, lang=None, dictionary=None, metric="ratio", score_cutoff=None, max_span_chars=None, max_reading_span_chars=None, max_readings_per_segment=None, max_paths=None, longest_only=None)
Returns a PartialAlignment(score, src_start, src_end, dest_start, dest_end)
for the best span, or None when no span can be scored or score_cutoff
filters every span. Offsets are Python character offsets. metric is
"ratio" by default; use "distance" to rank by distance instead.
>>> import moine
>>> text = "ういすきーをのんでいます"
>>> alignment = moine.partial_alignment("ウイスキー", text, lang="ja")
>>> alignment
PartialAlignment(score=1.0, src_start=0, src_end=5, dest_start=0, dest_end=5)
>>> text[alignment.dest_start:alignment.dest_end]
'ういすきー'
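How the result fields and the metric switch fit together can be sketched for plain strings. AlignmentSketch and best_span are hypothetical names; the field layout mirrors the documented PartialAlignment, but the exhaustive span scan, the tie-breaking (earliest best span wins), and the direction of score_cutoff for "ratio" (keep scores at or above the cutoff) are assumptions rather than documented behavior.

```python
from typing import NamedTuple, Optional


class AlignmentSketch(NamedTuple):
    score: float
    src_start: int
    src_end: int
    dest_start: int
    dest_end: int


def levenshtein(left: str, right: str) -> int:
    """Plain Levenshtein distance."""
    previous = list(range(len(right) + 1))
    for i, lch in enumerate(left, start=1):
        current = [i]
        for j, rch in enumerate(right, start=1):
            cost = 0 if lch == rch else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def best_span(query: str, text: str, metric: str = "ratio",
              score_cutoff: Optional[float] = None) -> Optional[AlignmentSketch]:
    if metric not in ("ratio", "distance"):
        raise ValueError(f"unsupported metric: {metric!r}")
    best: Optional[AlignmentSketch] = None
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            distance = levenshtein(query, text[start:end])
            if metric == "distance":
                score = distance  # lower is better
                passes = score_cutoff is None or score <= score_cutoff
                better = best is None or score < best.score
            else:
                longest = max(len(query), end - start)
                score = (longest - distance) / longest  # higher is better
                passes = score_cutoff is None or score >= score_cutoff
                better = best is None or score > best.score
            if passes and better:
                # The whole query is the source span; offsets index `text`.
                best = AlignmentSketch(score, 0, len(query), start, end)
    return best


text = "a glass of whisky"
match = best_span("whisky", text)
print(text[match.dest_start:match.dest_end])       # whisky
print(best_span("xyz", "abc", score_cutoff=0.5))   # None
```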
moine.cdist
moine.cdist(queries, choices, *, lang=None, dictionary=None, metric="distance", score_cutoff=None, max_readings_per_segment=None, max_span_chars=None, max_paths=None, longest_only=None)
Returns a query-by-choice matrix of scores.
queries - Iterable of query strings.
choices - Iterable of candidate strings.
lang - Optional language code. Use "ja" or "zh" for dictionary-backed scoring. Omit it for plain string scoring.
dictionary - Optional loaded dictionary object. When supplied, cdist can run without lang.
metric - One of "distance", "damerau_distance", "normalized_distance", "normalized_similarity", or "ratio".
score_cutoff - Optional threshold. Use an integer for distance metrics and a float for normalized metrics.
max_readings_per_segment, max_span_chars, max_paths, longest_only - Optional dictionary expansion controls. These require lang or dictionary.
>>> import moine
>>> moine.cdist(["abc", "axc"], ["abc", "acb"])
[[0, 2], [1, 2]]
>>> moine.cdist(["abc"], ["abc", "adc"], metric="ratio")
[[1.0, 0.6666666666666666]]
>>> moine.cdist(
...     ["weishiji", "布納哈奔"],
...     ["威士忌", "布納哈本"],
...     lang="zh",
... )
[[0, 8], [8, 0]]
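The shape of the result, one row per query and one column per choice, can be sketched for plain strings with a nested comprehension. cdist_sketch is a hypothetical name, and the sketch covers only the plain Levenshtein case; it says nothing about how mòine computes the matrix internally.

```python
from typing import Callable, Iterable, List


def levenshtein(left: str, right: str) -> int:
    """Plain Levenshtein distance."""
    previous = list(range(len(right) + 1))
    for i, lch in enumerate(left, start=1):
        current = [i]
        for j, rch in enumerate(right, start=1):
            cost = 0 if lch == rch else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def cdist_sketch(queries: Iterable[str], choices: Iterable[str],
                 scorer: Callable[[str, str], int] = levenshtein) -> List[List[int]]:
    # Materialize choices once so a one-shot iterator can be reused per query.
    choice_list = list(choices)
    # Row i holds the scores of query i against every choice, in order.
    return [[scorer(query, choice) for choice in choice_list] for query in queries]


print(cdist_sketch(["abc", "axc"], ["abc", "acb"]))  # [[0, 2], [1, 2]]
```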
Note
cdist intentionally keeps the first public API small. It does not expose
RapidFuzz-only knobs such as processor, score_hint, NumPy dtype options,
or worker parallelism.
Dictionary Loading
moine.load_dict
moine.load_dict(*, lang, path=None)
Loads a dictionary artifact for one language. If path is omitted, mòine
searches the configured cache, language-specific environment variables, and
MOINE_DICTIONARIES_PATH.
>>> import moine
>>> dictionary = moine.load_dict(lang="ja")
moine.set_default_dictionary
moine.set_default_dictionary(dictionary)
Registers a loaded dictionary as the default dictionary for its language.
>>> import moine
>>> dictionary = moine.load_dict(lang="ja")
>>> moine.set_default_dictionary(dictionary)
>>> moine.distance("もいにゃ", "モイニャ", lang="ja")
0
moine.clear_default_dictionary
moine.clear_default_dictionary(*, lang)
Clears the configured default dictionary for a language.
moine.get_default_dictionary
moine.get_default_dictionary(*, lang)
Returns the configured default dictionary for a language, or None.
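The set, get, and clear functions behave like a small per-language registry. The sketch below models that idea only; DictionarySketch, its fields, and the three function names are hypothetical stand-ins, and it assumes that clearing a language with no registered default is a silent no-op, which this page does not state.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DictionarySketch:
    """Stand-in for a loaded dictionary; the real object is not modeled."""
    lang: str
    source: str


# One default dictionary per language code.
_defaults: Dict[str, DictionarySketch] = {}


def set_default(dictionary: DictionarySketch) -> None:
    # The dictionary carries its own language, so no lang argument is needed.
    _defaults[dictionary.lang] = dictionary


def get_default(*, lang: str) -> Optional[DictionarySketch]:
    return _defaults.get(lang)


def clear_default(*, lang: str) -> None:
    # Assumption: clearing an unset language does nothing.
    _defaults.pop(lang, None)


ja = DictionarySketch(lang="ja", source="example")
set_default(ja)
print(get_default(lang="ja") is ja)  # True
print(get_default(lang="zh"))        # None
clear_default(lang="ja")
print(get_default(lang="ja"))        # None
```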
Language-Specific Modules
moine.ja - Japanese helpers and the UniDic-backed Dictionary alias.
moine.zh - Chinese helpers and the CC-CEDICT-backed Dictionary alias.
Rust users should use the crate documentation on docs.rs.