phon
The goal of phon is to make the CMU Pronouncing Dictionary (cmudict) available in an
R-friendly format, and to collect some tools which use the pronunciation information.
The CMU Pronouncing Dictionary includes pronunciations for 130,000 words. By matching
the phonemes between words, phon provides:
- phonemes('threw'): Returns the phonemes for the pronunciation of “threw”.
- homophones('steak'): Returns words which are homophones of “steak”.
- rhymes('carry'): Returns words which rhyme with “carry”.
- sounds_like('statistics', 3): Returns words with a similar sound to “statistics” by limiting the number of phoneme mismatches the other word can have.
- contains_pronunciation('through'): Returns words which sound like they contain the given word.
- syllables("useless"): Returns the count of syllables in “useless”.
This is a companion package to the syn package.
syn finds related words based upon meanings, while phon finds related words based upon
pronunciation.
Installation
You can install phon from GitHub with:
devtools::install_github("coolbutuseless/phon")
Phonemes
Phonemes are the sounds which make up a word.
The phonetic encoding in phon comes from the CMU Pronouncing Dictionary (cmudict),
which encodes words using ARPABET.
phon::phonemes("cellar")
[1] "S EH L ER"
Since some words have multiple pronunciations, the result of phon::phonemes()
is always returned as a character vector with one element per pronunciation,
e.g. “carry” has two slightly different pronunciations.
phon::phonemes("carry")
[1] "K AE R IY" "K EH R IY"
ARPABET phonetic encoding includes stress markers as numeric suffixes on vowel phonemes. The markers are:
- 0 - No stress
- 1 - Primary stress
- 2 - Secondary stress
You can ask for phonemes with or without the stress markers (they are dropped by default), e.g.
phon::phonemes("fantastic")
[1] "F AE N T AE S T IH K"
phon::phonemes("fantastic", keep_stresses = TRUE)
[1] "F AE0 N T AE1 S T IH0 K"
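The relationship between the two outputs above can be illustrated in base R. This is only a minimal sketch, not phon's own code: strip_stresses() is a hypothetical helper that removes the trailing stress digit from each phoneme.

```r
# Hypothetical helper (not part of phon): drop ARPABET stress digits.
strip_stresses <- function(pron) {
  # A stress marker is a single digit 0, 1 or 2 directly after the phoneme letters
  gsub("([A-Z]+)[0-2]", "\\1", pron)
}

strip_stresses("F AE0 N T AE1 S T IH0 K")
# [1] "F AE N T AE S T IH K"
```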
Syllables
The number of syllables in a word is the number of phonemes in its pronunciation which carry a stress marker, i.e. its vowel phonemes.
This is pre-calculated and available through the phon::syllables() function.
phon::syllables("average")
[1] 3
phon::syllables("antidisestablishmentarianism")
[1] 12
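As a rough illustration of the counting rule described above (not the package's pre-calculated lookup), the sketch below counts the phonemes that end in a stress digit. count_syllables() is a hypothetical name, and it assumes the input is a single pronunciation string with stress markers kept.

```r
# Hypothetical helper (not part of phon): count stress-marked phonemes.
count_syllables <- function(pron) {
  # Split the pronunciation into individual phoneme tokens
  tokens <- strsplit(pron, " ", fixed = TRUE)[[1]]
  # Each vowel phoneme ends in a stress digit (0, 1 or 2)
  sum(grepl("[0-2]$", tokens))
}

count_syllables("F AE0 N T AE1 S T IH0 K")  # "fantastic"
# [1] 3
count_syllables("D R AH1 JH ER0 IY0")       # "drudgery"
# [1] 3
```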
Matching Pronunciation
phon allows you to search for the sound of one word within another.
In the following example, phon::contains_pronunciation() finds all the words that
include the pronunciation of “through” within their pronunciation.
phon::contains_pronunciation("through")
[1] "bathroom" "bathrooms" "breakthrough" "breakthroughs"
[5] "drive-thru" "drive-thrus" "overthrew" "threw"
[9] "throop" "throughout" "throughput" "throughs"
[13] "throughway" "thru" "thruway"
Use the keep_stresses argument to match with or without the stresses included (the
default is to ignore the stresses).
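One way to picture this kind of match is as a search for one phoneme sequence inside another, aligned on phoneme boundaries. The sketch below is an assumption about how such a check could be written in base R, not phon's implementation. contains_phonemes() is a hypothetical helper, and the pronunciations for “through” and “bathroom” are assumed stress-free ARPABET strings.

```r
# Hypothetical helper (not part of phon): does `pron` contain `target`?
contains_phonemes <- function(pron, target) {
  # Pad both strings with spaces so a match can only start and end on
  # a phoneme boundary (e.g. "R" must not match inside "ER")
  grepl(paste0(" ", target, " "), paste0(" ", pron, " "), fixed = TRUE)
}

contains_phonemes("B AE TH R UW M", "TH R UW")  # assumed "bathroom" vs "through"
# [1] TRUE
contains_phonemes("S EH L ER", "R")             # "cellar": "R" is not a whole phoneme here
# [1] FALSE
```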
Homophones
Homophones are words with the same pronunciation but different spelling.
phon::homophones("steak")
[1] "stake"
phon::homophones("carry")
[1] "carey" "carie" "carrey" "carrie" "cary" "kairey" "kari" "karry"
[9] "kary" "kerrey" "kerri" "kerry"
Rhymes
To find rhymes, phon compares trailing phonemes. If the phonemes at the end of a
word in the dictionary match those at the end of the given word, then they rhyme.
The rhymes are returned as a named list of vectors:
- Words with the most matching trailing phonemes are returned first.
- Subsequent vectors have fewer matching trailing phonemes.
- The name of each list element is the number of trailing phonemes which match.
phon::rhymes("drudgery")
$`3`
[1] "challengery" "forgery" "gingery" "injury" "margery"
[6] "marjorie" "marjory" "menagerie" "neurosurgery" "perjury"
[11] "surgery"
$`2`
[1] "acary" "accessory" "adoree" "adultery"
[5] "advisory" "alimentary" "alphandery" "ambery"
[9] .... (results trimmed)
In the above example:
- The phonemes for “drudgery” are “D R AH1 JH ER0 IY0”
- In the first vector, the words match the -gery sound, i.e. the last 3 phonemes.
- In the second vector the words only match the -ery sound, i.e. the last 2 phonemes.
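The grouping by number of matching trailing phonemes can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's algorithm: count_trailing_matches() is a hypothetical helper, stress markers are assumed to have been removed, and the pronunciations given for “surgery” and “adultery” are assumed values.

```r
# Hypothetical helper (not part of phon): count matching trailing phonemes.
count_trailing_matches <- function(pron_a, pron_b) {
  # Reverse both phoneme sequences so the comparison starts at the word end
  a <- rev(strsplit(pron_a, " ", fixed = TRUE)[[1]])
  b <- rev(strsplit(pron_b, " ", fixed = TRUE)[[1]])
  n <- min(length(a), length(b))
  same <- a[seq_len(n)] == b[seq_len(n)]
  # Length of the run of matches before the first mismatch
  if (all(same)) n else which(!same)[1] - 1
}

drudgery <- "D R AH JH ER IY"
count_trailing_matches(drudgery, "S ER JH ER IY")      # assumed "surgery"
# [1] 3
count_trailing_matches(drudgery, "AH D AH L T ER IY")  # assumed "adultery"
# [1] 2
```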
Similar sounding words
Similar-sounding words are found by comparing words that have the same number of phonemes, allowing up to a given number of mismatched phonemes (phoneme_mismatches).
phon::sounds_like("statistics", phoneme_mismatches = 5)
[1] "anaesthetics" "anesthetics" "centronics" "gymnastics" "heuristics"
[6] "onomastics" "scientific's" "scientifics" "statistics'" "stochastics"
[11] "subsistence" "synbiotics"
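The comparison described above can be sketched as a position-by-position mismatch count between two phoneme sequences of equal length. This is a minimal base R illustration, not phon's implementation: count_mismatches() is a hypothetical helper, and the stress-free pronunciations of “statistics” and “gymnastics” are assumed values.

```r
# Hypothetical helper (not part of phon): count mismatched phoneme positions.
count_mismatches <- function(pron_a, pron_b) {
  a <- strsplit(pron_a, " ", fixed = TRUE)[[1]]
  b <- strsplit(pron_b, " ", fixed = TRUE)[[1]]
  # Only sequences with the same number of phonemes are comparable
  if (length(a) != length(b)) return(NA_integer_)
  sum(a != b)
}

statistics <- "S T AH T IH S T IH K S"  # assumed pronunciation
gymnastics <- "JH IH M N AE S T IH K S" # assumed pronunciation
count_mismatches(statistics, gymnastics)
# [1] 5
```

Under these assumed pronunciations, “gymnastics” differs from “statistics” in 5 positions, so it would fall within a limit of 5 mismatches.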