phon - a package for rhymes etc

phon

The goal of phon is to make available the CMU Pronouncing Dictionary (cmudict) in an R friendly format, and to collect some tools which use the pronunciation information.

The CMU Pronouncing Dictionary includes pronunciations for 130,000 words. By matching the phonemes between words, phon provides the following functions:

  • phonemes('threw') - Returns the phonemes for the pronunciation of “threw”.
  • homophones('steak') - Returns words which are homophones of “steak”.
  • rhymes('carry') - Returns words which rhyme with “carry”.
  • sounds_like('statistics', 3) - Returns words which sound similar to “statistics” by limiting the number of phoneme mismatches allowed.
  • contains_pronunciation('through') - Returns words which sound like they contain the given word.
  • syllables('useless') - Returns the count of syllables in “useless”.

This is a companion package to the syn package. syn finds related words based upon meanings, while phon finds related words based upon pronunciation.

Installation

You can install phon from GitHub with:

devtools::install_github("coolbutuseless/phon")

Phonemes

Phonemes are the sounds which make up a word.

The phonetic encoding in phon comes from the CMU Pronouncing Dictionary (cmudict), which encodes words using ARPABET.

phon::phonemes("cellar")
[1] "S EH L ER"

Since some words have multiple pronunciations, the result of phon::phonemes() is always returned as a character vector with one element per pronunciation, e.g. “carry” has two slightly different pronunciations.

phon::phonemes("carry")
[1] "K AE R IY" "K EH R IY"

ARPABET phonetic encoding includes stress markers as numeric suffixes on vowel phonemes. The markers are:

  • 0 - No stress
  • 1 - Primary stress
  • 2 - Secondary stress

You can ask for phonemes with/without the stress markers, e.g.

phon::phonemes("fantastic")
[1] "F AE N T AE S T IH K"

phon::phonemes("fantastic", keep_stresses = TRUE)
[1] "F AE0 N T AE1 S T IH0 K"
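
As a rough illustration of how the two forms relate, the stress markers can be removed from a pronunciation string with a regular expression. This is only a sketch in base R: strip_stresses() is a hypothetical helper, not part of phon, and it assumes phonemes are separated by single spaces as in the output above.

```r
# Hypothetical helper (not a phon function): remove the ARPABET stress
# digits 0, 1 and 2 from a space-separated pronunciation string.
strip_stresses <- function(pronunciation) {
  gsub("[012]", "", pronunciation)
}

with_stresses <- "F AE0 N T AE1 S T IH0 K"
strip_stresses(with_stresses)
# [1] "F AE N T AE S T IH K"
```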

Syllables

The number of syllables in a word is the number of phonemes in its pronunciation which carry a stress marker, i.e. the vowel phonemes.

This is pre-calculated and available through the phon::syllables() function.

phon::syllables("average")
[1] 3

phon::syllables("antidisestablishmentarianism")
[1] 12
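
The counting rule described above can be sketched in a few lines of base R. count_syllables() is a hypothetical helper written for this illustration; phon itself uses pre-calculated values, so this is not the package's implementation.

```r
# Hypothetical helper (not a phon function): count the phonemes which end
# in a stress digit (0, 1 or 2) in a space-separated pronunciation.
count_syllables <- function(pronunciation) {
  phonemes <- strsplit(pronunciation, " ", fixed = TRUE)[[1]]
  sum(grepl("[012]$", phonemes))
}

# Pronunciation of "fantastic" with stress markers kept
count_syllables("F AE0 N T AE1 S T IH0 K")
# [1] 3
```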

Matching Pronunciation

phon allows you to search for the sound of one word within another.

In the following example, phon::contains_pronunciation() finds all the words that include the pronunciation of “through” within their pronunciation.

phon::contains_pronunciation("through")
 [1] "bathroom"      "bathrooms"     "breakthrough"  "breakthroughs"
 [5] "drive-thru"    "drive-thrus"   "overthrew"     "threw"        
 [9] "throop"        "throughout"    "throughput"    "throughs"     
[13] "throughway"    "thru"          "thruway"      

Use the keep_stresses argument to match with/without the stresses included (default is to ignore the stresses).
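
One way to picture this kind of matching is as a search for one phoneme sequence inside another. The sketch below is an assumption about the general idea, not phon's actual code: toy_dict is a tiny illustrative dictionary and contains_sound() is a hypothetical helper.

```r
# Toy dictionary for illustration only (stress markers already removed)
toy_dict <- c(
  through  = "TH R UW",
  threw    = "TH R UW",
  bathroom = "B AE TH R UW M",
  tooth    = "T UW TH"
)

# Hypothetical helper (not a phon function): find words whose pronunciation
# contains the pronunciation of `word` as a contiguous run of phonemes.
contains_sound <- function(word, dict) {
  # Pad with spaces so that only whole phonemes match,
  # e.g. "T" must not match inside "TH".
  target <- paste0(" ", dict[[word]], " ")
  padded <- paste0(" ", dict, " ")
  hits   <- grepl(target, padded, fixed = TRUE)
  setdiff(names(dict)[hits], word)
}

contains_sound("through", toy_dict)
# [1] "threw"    "bathroom"
```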

Homophones

Homophones are words with the same pronunciation but different spelling.

phon::homophones("steak")
[1] "stake"

phon::homophones("carry")
 [1] "carey"  "carie"  "carrey" "carrie" "cary"   "kairey" "kari"   "karry" 
 [9] "kary"   "kerrey" "kerri"  "kerry" 
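
Conceptually, a homophone lookup is an exact match on pronunciation. The following is a minimal sketch under that assumption, using a toy dictionary; find_homophones() is a hypothetical helper and, unlike phon, it ignores words with more than one pronunciation.

```r
# Toy dictionary for illustration only (stress markers already removed)
toy_dict <- c(
  steak = "S T EY K",
  stake = "S T EY K",
  stack = "S T AE K"
)

# Hypothetical helper (not a phon function): return the other words
# whose pronunciation is identical to that of `word`.
find_homophones <- function(word, dict) {
  same <- dict == dict[[word]]
  setdiff(names(dict)[same], word)
}

find_homophones("steak", toy_dict)
# [1] "stake"
```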

Rhymes

To find rhymes, phon compares trailing phonemes. If the phonemes at the end of a word in the dictionary match those at the end of the given word, then they rhyme.

The rhymes are returned as a list of vectors:

  • Words with the most matching trailing phonemes are returned first.
  • Subsequent vectors have fewer matching trailing phonemes.
  • The names of the list are the number of trailing phonemes which match.

phon::rhymes("drudgery")
$`3`
 [1] "challengery"  "forgery"      "gingery"      "injury"       "margery"     
 [6] "marjorie"     "marjory"      "menagerie"    "neurosurgery" "perjury"     
[11] "surgery"     

$`2`
  [1] "acary"           "accessory"       "adoree"          "adultery"       
  [5] "advisory"        "alimentary"      "alphandery"      "ambery"         
  [9] .... (results trimmed)

In the above example:

  • The phonemes for “drudgery” are “D R AH1 JH ER0 IY0”
  • In the first vector, the words match the -gery sound, i.e. the last 3 phonemes.
  • In the second vector, the words only match the -ery sound, i.e. the last 2 phonemes.
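
The number of matching trailing phonemes between two words can be computed as sketched below. count_trailing_matches() is a hypothetical helper written for this illustration, and the pronunciations are given without stress markers; phon's own implementation may differ.

```r
# Hypothetical helper (not a phon function): count how many phonemes match
# at the end of two space-separated pronunciations.
count_trailing_matches <- function(a, b) {
  pa <- rev(strsplit(a, " ", fixed = TRUE)[[1]])
  pb <- rev(strsplit(b, " ", fixed = TRUE)[[1]])
  n  <- min(length(pa), length(pb))
  same <- pa[seq_len(n)] == pb[seq_len(n)]
  # Length of the run of matches before the first mismatch
  if (all(same)) n else which(!same)[1] - 1L
}

drudgery <- "D R AH JH ER IY"
count_trailing_matches(drudgery, "S ER JH ER IY")      # "surgery"
# [1] 3
count_trailing_matches(drudgery, "AE D V AY Z ER IY")  # "advisory"
# [1] 2
```

Grouping dictionary words by this count, largest first, would give a list shaped like the output of phon::rhymes().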

Similar sounding words

Similar-sounding words are found by comparing the given word against words with the same number of phonemes, allowing up to a given number of phonemes to differ.

phon::sounds_like("statistics", phoneme_mismatches = 5)
 [1] "anaesthetics" "anesthetics"  "centronics"   "gymnastics"   "heuristics"  
 [6] "onomastics"   "scientific's" "scientifics"  "statistics'"  "stochastics" 
[11] "subsistence"  "synbiotics"
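
A minimal sketch of this comparison follows. It assumes the phonemes are compared position by position, which is a guess at the approach rather than a description of phon's code; count_mismatches() is a hypothetical helper.

```r
# Hypothetical helper (not a phon function): count the positions at which
# two pronunciations with the same number of phonemes differ.
count_mismatches <- function(a, b) {
  pa <- strsplit(a, " ", fixed = TRUE)[[1]]
  pb <- strsplit(b, " ", fixed = TRUE)[[1]]
  # Only words with the same number of phonemes are comparable
  if (length(pa) != length(pb)) return(NA_integer_)
  sum(pa != pb)
}

statistics <- "S T AH T IH S T IH K S"
gymnastics <- "JH IH M N AE S T IH K S"
count_mismatches(statistics, gymnastics)
# [1] 5
```

Under this assumption, “gymnastics” is accepted with phoneme_mismatches = 5 but would be rejected with a lower limit.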