IPAValidator
Validation and normalization of IPA
Composer package for validating and normalizing IPA

Basic usage

// Load composer's autoloader
require_once __DIR__ . '/vendor/autoload.php';
// Load the validator
use Wikimedia\IPAValidator\Validator;
/*
* Create a new validator with the options:
* - Remove delimiters (defaults to true)
* - Normalize IPA (defaults to false)
* - Normalize to Google TTS standard (defaults to false)
*/
$validator = new Validator( '/pʰə̥ˈkj̊uːliɚ/', true, true, true );
// Check if the IPA is valid
var_dump( $validator->valid ); # bool(true)
// Get the normalized IPA
echo $validator->normalizedIPA; # phəˈkjuːliɚ
// Get the original IPA
echo $validator->originalIPA; # /pʰə̥ˈkj̊uːliɚ/

Options

When constructing a new Validator, you can set the following options:

public function __construct( $ipa, $strip = true, $normalize = false, $google = false )

Remove delimiters

When $strip is true, the delimiters surrounding the IPA are removed. Currently /.../ and [...] are handled.
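As a rough illustration of what stripping delimiters means, here is a minimal sketch. The function name stripDelimiters is hypothetical and is not part of the package; the package's own implementation may differ.

```php
<?php
// Hypothetical helper for illustration only; not the package's API.
function stripDelimiters( string $ipa ): string {
    // Remove one leading "/" or "[" and one trailing "/" or "]".
    // The "u" modifier makes PCRE treat the subject as UTF-8.
    return preg_replace( '~^[/\[]|[/\]]$~u', '', $ipa ) ?? $ipa;
}

echo stripDelimiters( '/tenoːt͡ʃˈtit͡ɬan/' ), "\n";
echo stripDelimiters( '[tenoːt͡ʃˈtit͡ɬan]' ), "\n";
```

Both calls print the transcription without its surrounding slashes or brackets; a string with no delimiters is returned unchanged.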

Normalize IPA

When $normalize is true and $google is false, the IPA is normalized by correcting commonly mistaken Unicode characters (for example, an ASCII colon : used in place of the length mark ː in a word such as tenoːt͡ʃˈtit͡ɬan).
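The colon example can be sketched as a simple lookup table. This is an assumption about how such a correction could be written, not the package's actual code; normalizeLookalikes and its table are hypothetical names, and only the colon entry comes from the description above.

```php
<?php
// Hypothetical helper for illustration only; not the package's API.
function normalizeLookalikes( string $ipa ): string {
    $lookalikes = [
        // ASCII colon (U+003A) -> IPA length mark (U+02D0)
        ':' => 'ː',
    ];
    // strtr() with an array replaces each key with its value in one pass.
    return strtr( $ipa, $lookalikes );
}

echo normalizeLookalikes( 'teno:t͡ʃˈtit͡ɬan' ), "\n";
```

The call prints the word with the ASCII colon replaced by ː, so it matches the correctly typed form.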

Normalize IPA for Google TTS

As part of a work project, we're feeding IPA to Google's TTS engine, and Google is a little opinionated about things like diacritics. For example, the IPA ˈɔːfɫ̩ would not render correctly in Google TTS. When $google is true, a custom charmap is used to normalize certain characters:

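$charmap = [
    // Parentheses are dropped
    [ '(', '' ],
    [ ')', '' ],
    // U+207F superscript n
    [ 'ⁿ', 'n' ],
    // U+02B0 modifier letter small h
    [ 'ʰ', 'h' ],
    // U+026B l with middle tilde
    [ 'ɫ', 'l' ],
    // U+02E1 modifier letter small l
    [ 'ˡ', 'l' ],
    // U+02B2 modifier letter small j
    [ 'ʲ', 'j' ],
];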

Setting $google to true also removes all diacritics from the IPA string.
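To show how a charmap of this shape and diacritic removal could fit together, here is a minimal sketch. The function normalizeForTts is hypothetical, and treating "diacritics" as Unicode combining marks (\p{M}) is an assumption; the package may define them differently.

```php
<?php
// Same pairs as the charmap above: [ search, replacement ].
$charmap = [
    [ '(', '' ],
    [ ')', '' ],
    [ 'ⁿ', 'n' ],
    [ 'ʰ', 'h' ],
    [ 'ɫ', 'l' ],
    [ 'ˡ', 'l' ],
    [ 'ʲ', 'j' ],
];

// Hypothetical helper for illustration only; not the package's API.
function normalizeForTts( string $ipa, array $charmap ): string {
    // Split the pairs into parallel search and replacement lists.
    $search = array_column( $charmap, 0 );
    $replace = array_column( $charmap, 1 );
    $ipa = str_replace( $search, $replace, $ipa );
    // Assumption: diacritics are combining marks, matched by \p{M}.
    return preg_replace( '/\p{M}/u', '', $ipa ) ?? $ipa;
}

echo normalizeForTts( 'pʰə̥ˈkj̊uːliɚ', $charmap ), "\n";
```

In this sketch ʰ is mapped to h and the combining rings are dropped, while the stress mark ˈ and length mark ː survive because they are modifier letters rather than combining marks.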

The Regex

^[().a-z|æçðøħŋœǀ-ǃɐ-ɻɽɾʀ-ʄʈ-ʒʔʕʘʙʛ-ʝʟʡʢʰʲʷʼˀˈˌːˑ˞ˠˡˤ-˩̴̘̙̜̝̞̟̠̤̥̩̪̬̯̰̹̺̻̼̀́̂̃̄̆̈̊̋̌̏̽̚͜͡βθχ᷄᷅᷈‖‿ⁿⱱ]+$

I've also placed it at https://regex101.com/r/f2Qhuk if you think you can improve it... (please do!)
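If you want to experiment locally, a pattern like this can be applied with preg_match() and the u modifier. The sketch below uses a deliberately shortened character class (ASCII letters, the IPA Extensions block, and three modifier letters), not the full pattern above, and looksLikeIpa is a hypothetical name.

```php
<?php
// Abbreviated, illustrative pattern: NOT the full character class.
// \x{0250}-\x{02AF} is the IPA Extensions block; \x{02C8}, \x{02CC}
// and \x{02D0} are the primary stress, secondary stress and length marks.
$pattern = '/^[().a-z\x{0250}-\x{02AF}\x{02C8}\x{02CC}\x{02D0}]+$/u';

// Hypothetical helper for illustration only; not the package's API.
function looksLikeIpa( string $ipa, string $pattern ): bool {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match( $pattern, $ipa ) === 1;
}

var_dump( looksLikeIpa( 'tenoːtʃˈtitɬan', $pattern ) );
var_dump( looksLikeIpa( 'teno:tʃ', $pattern ) );
```

The first call matches; the second does not, because the ASCII colon is outside the allowed set.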