MediaWiki 1.23.0
generateNormalizerDataAr.php
1 <?php
24 require_once __DIR__ . '/../Maintenance.php';
25 
32 class GenerateNormalizerDataAr extends Maintenance {
33  public function __construct() {
34  parent::__construct();
35  $this->mDescription = 'Generate the normalizer data file for Arabic';
36  $this->addOption( 'unicode-data-file', 'The local location of the data file ' .
37  'from http://unicode.org/Public/UNIDATA/UnicodeData.txt', false, true );
38  }
39 
40  public function getDbType() {
41  return Maintenance::DB_NONE;
42  }
43 
44  public function execute() {
45  if ( !$this->hasOption( 'unicode-data-file' ) ) {
46  $dataFile = 'UnicodeData.txt';
47  if ( !file_exists( $dataFile ) ) {
48  $this->error( "Unable to find UnicodeData.txt. Please specify " .
49  "its location with --unicode-data-file=<FILE>" );
50  exit( 1 );
51  }
52  } else {
53  $dataFile = $this->getOption( 'unicode-data-file' );
54  if ( !file_exists( $dataFile ) ) {
55  $this->error( 'Unable to find the specified data file.' );
56  exit( 1 );
57  }
58  }
59 
60  $file = fopen( $dataFile, 'r' );
61  if ( !$file ) {
62  $this->error( 'Unable to open the data file.' );
63  exit( 1 );
64  }
65 
66  // For the file format, see http://www.unicode.org/reports/tr44/
67  $fieldNames = array(
68  'Code',
69  'Name',
70  'General_Category',
71  'Canonical_Combining_Class',
72  'Bidi_Class',
73  'Decomposition_Type_Mapping',
74  'Numeric_Type_Value_6',
75  'Numeric_Type_Value_7',
76  'Numeric_Type_Value_8',
77  'Bidi_Mirrored',
78  'Unicode_1_Name',
79  'ISO_Comment',
80  'Simple_Uppercase_Mapping',
81  'Simple_Lowercase_Mapping',
82  'Simple_Titlecase_Mapping'
83  );
84 
85  $pairs = array();
86 
87  $lineNum = 0;
88  while ( false !== ( $line = fgets( $file ) ) ) {
89  ++$lineNum;
90 
91  # Strip comments
92  $line = trim( substr( $line, 0, strcspn( $line, '#' ) ) );
93  if ( $line === '' ) {
94  continue;
95  }
96 
97  # Split fields
98  $numberedData = explode( ';', $line );
99  $data = array();
100  foreach ( $fieldNames as $number => $name ) {
101  $data[$name] = $numberedData[$number];
102  }
103 
104  $code = base_convert( $data['Code'], 16, 10 );
105  if ( ( $code >= 0xFB50 && $code <= 0xFDFF ) # Arabic presentation forms A
106  || ( $code >= 0xFE70 && $code <= 0xFEFF ) # Arabic presentation forms B
107  ) {
108  if ( $data['Decomposition_Type_Mapping'] === '' ) {
109  // No decomposition
110  continue;
111  }
112  if ( !preg_match( '/^ *(<\w*>) +([0-9A-F ]*)$/',
113  $data['Decomposition_Type_Mapping'], $m )
114  ) {
115  $this->error( "Can't parse Decomposition_Type/Mapping on line $lineNum" );
116  $this->error( $line );
117  continue;
118  }
119 
120  $source = hexSequenceToUtf8( $data['Code'] );
121  $dest = hexSequenceToUtf8( $m[2] );
122  $pairs[$source] = $dest;
123  }
124  }
125 
126  global $IP;
127  file_put_contents( "$IP/serialized/normalize-ar.ser", serialize( $pairs ) );
128  echo "ar: " . count( $pairs ) . " pairs written.\n";
129  }
130 }
131 
132 $maintClass = 'GenerateNormalizerDataAr';
133 require_once RUN_MAINTENANCE_IF_MAIN;
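The core of execute() is the per-record step: split a UnicodeData.txt line on semicolons, keep only code points in the two Arabic presentation form ranges, and pull the decomposition tag and mapping out of the sixth field. The following is a minimal standalone sketch of that step, not part of the script itself. The function name sketchParseDecomposition() and its return shape are hypothetical, introduced only for illustration. The range limits and the regular expression are copied from the listing above.

```php
<?php
/**
 * Hypothetical helper (not in MediaWiki): parse one UnicodeData.txt record
 * and return its decomposition, or null if the record is outside the Arabic
 * presentation form ranges or has no parseable decomposition.
 */
function sketchParseDecomposition( string $line ): ?array {
	$fields = explode( ';', $line );
	if ( count( $fields ) < 6 ) {
		// Too few fields to contain Decomposition_Type_Mapping (index 5)
		return null;
	}

	$code = hexdec( $fields[0] );
	$inArabicForms = ( $code >= 0xFB50 && $code <= 0xFDFF ) // presentation forms A
		|| ( $code >= 0xFE70 && $code <= 0xFEFF ); // presentation forms B
	if ( !$inArabicForms ) {
		return null;
	}

	// Same pattern as the listing: "<tag> HEX HEX ..."
	if ( !preg_match( '/^ *(<\w*>) +([0-9A-F ]*)$/', $fields[5], $m ) ) {
		return null;
	}

	return array( 'code' => $code, 'tag' => $m[1], 'mapping' => $m[2] );
}

// Record for U+FEFB in UnicodeData.txt format
$sample = 'FEFB;ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM;Lo;0;AL;<isolated> 0644 0627;;;;N;;;;;';
$result = sketchParseDecomposition( $sample );
echo $result['tag'] . ' -> ' . $result['mapping'] . "\n";
```

Unlike the script, this sketch returns null for a record with fewer than six fields instead of indexing past the end of the array.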
Maintenance\addOption
addOption( $name, $description, $required=false, $withArg=false, $shortName=false)
Add a parameter to the script.
Definition: Maintenance.php:169
RUN_MAINTENANCE_IF_MAIN
require_once RUN_MAINTENANCE_IF_MAIN
Definition: maintenance.txt:50
GenerateNormalizerDataAr\__construct
__construct()
Default constructor.
Definition: generateNormalizerDataAr.php:33
Maintenance
Abstract maintenance class for quickly writing and churning out maintenance scripts with minimal effo...
Definition: maintenance.txt:39
$maintClass
$maintClass
Definition: generateNormalizerDataAr.php:132
Maintenance\DB_NONE
const DB_NONE
Constants for DB access type.
Definition: Maintenance.php:57
hexSequenceToUtf8
hexSequenceToUtf8( $sequence)
Take a series of space-separated hexadecimal numbers representing Unicode code points and return a UT...
Definition: UtfNormalUtil.php:61
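For readers without UtfNormalUtil.php at hand, the conversion described here can be sketched as follows. This is an illustrative reimplementation under the assumption that the input holds only valid code points as space-separated hexadecimal numbers. It is not the MediaWiki function, and the names sketchCodepointToUtf8() and sketchHexSequenceToUtf8() are hypothetical.

```php
<?php
/** Hypothetical helper: encode a single Unicode code point as UTF-8 bytes. */
function sketchCodepointToUtf8( int $cp ): string {
	if ( $cp < 0x80 ) {
		return chr( $cp ); // 1 byte
	}
	if ( $cp < 0x800 ) {
		return chr( 0xC0 | ( $cp >> 6 ) ) // 2 bytes
			. chr( 0x80 | ( $cp & 0x3F ) );
	}
	if ( $cp < 0x10000 ) {
		return chr( 0xE0 | ( $cp >> 12 ) ) // 3 bytes
			. chr( 0x80 | ( ( $cp >> 6 ) & 0x3F ) )
			. chr( 0x80 | ( $cp & 0x3F ) );
	}
	return chr( 0xF0 | ( $cp >> 18 ) ) // 4 bytes
		. chr( 0x80 | ( ( $cp >> 12 ) & 0x3F ) )
		. chr( 0x80 | ( ( $cp >> 6 ) & 0x3F ) )
		. chr( 0x80 | ( $cp & 0x3F ) );
}

/** Hypothetical stand-in for hexSequenceToUtf8(): "0644 0627" to a UTF-8 string. */
function sketchHexSequenceToUtf8( string $sequence ): string {
	$utf8 = '';
	$parts = preg_split( '/\s+/', trim( $sequence ), -1, PREG_SPLIT_NO_EMPTY );
	foreach ( $parts as $hex ) {
		$utf8 .= sketchCodepointToUtf8( (int)hexdec( $hex ) );
	}
	return $utf8;
}

// U+0644 U+0627 (lam, alef) encoded as UTF-8, shown as hex bytes
echo bin2hex( sketchHexSequenceToUtf8( '0644 0627' ) ) . "\n"; // prints "d984d8a7"
```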
Maintenance\getOption
getOption( $name, $default=null)
Get an option, or return the default.
Definition: Maintenance.php:191
Maintenance\error
error( $err, $die=0)
Throw an error to the user.
Definition: Maintenance.php:333
GenerateNormalizerDataAr\execute
execute()
Do the actual work.
Definition: generateNormalizerDataAr.php:44
GenerateNormalizerDataAr
Generates the normalizer data file for Arabic.
Definition: generateNormalizerDataAr.php:32
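The generated file is a serialized PHP array that maps each presentation form (as a UTF-8 string) to its decomposed sequence. The sketch below writes such an array to a temporary file and reads it back. The function name sketchRoundTrip() and the use of a temporary file are assumptions for illustration, as is applying the pairs with strtr(); how the file is consumed is not shown in this listing.

```php
<?php
/** Hypothetical helper: serialize the pairs to a temp file and load them again. */
function sketchRoundTrip( array $pairs ): array {
	$path = tempnam( sys_get_temp_dir(), 'nar' );
	file_put_contents( $path, serialize( $pairs ) );
	$loaded = unserialize( file_get_contents( $path ) );
	unlink( $path );
	return $loaded;
}

// One pair: U+FEFB => U+0644 U+0627, both as UTF-8 byte strings
$pairs = array( "\xEF\xBB\xBB" => "\xD9\x84\xD8\xA7" );
$loaded = sketchRoundTrip( $pairs );

// Assumed usage: replace every presentation form found in a string
$normalized = strtr( "x\xEF\xBB\xBBy", $loaded );
echo bin2hex( $normalized ) . "\n"; // prints "78d984d8a779"
```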
Maintenance\hasOption
hasOption( $name)
Checks to see if a particular param exists.
Definition: Maintenance.php:181
GenerateNormalizerDataAr\getDbType
getDbType()
Does the script need different DB access? By default, we give Maintenance scripts normal rights to th...
Definition: generateNormalizerDataAr.php:40
$IP
$IP
Definition: WebStart.php:88