MediaWiki REL1_34
IEContentAnalyzer Class Reference

This class simulates Microsoft Internet Explorer's terribly broken and insecure MIME type detection algorithm.

Public Member Functions

 __construct ()
 
 getMimesFromData ( $fileName, $chunk, $proposed)
 Get the untranslated MIME types for all known versions.
 
 getRealMimesFromData ( $fileName, $chunk, $proposed)
 Get the MIME types from getMimesFromData(), but convert the result from IE's idiosyncratic private types into something other apps will understand.
 
 translateMimeType ( $type)
 Translate a MIME type from IE's idiosyncratic private types into more commonly understood type strings.
 

Protected Member Functions

 getDataFormat ( $version, $type)
 
 getMimeTypeForVersion ( $version, $fileName, $chunk, $proposed)
 Get the MIME type for a given named version.
 
 sampleData ( $version, $chunk)
 Do heuristic checks on the bulk of the data sample.
 

Protected Attributes

 $addedTypes
 Changes to the type table in later versions of IE.
 
 $baseTypeTable
 Relevant data taken from the type table in IE 5.
 
 $registry
 An approximation of the "Content Type" values in HKEY_CLASSES_ROOT in a typical Windows installation.
 
 $typeTable = []
 Type table with versions expanded.
 
 $versions = [ 'ie05', 'ie06', 'ie07', 'ie07.strict', 'ie07.nohtml' ]
 IE versions which have been analysed to bring you this class, and for which some substantive difference exists.
 

Private Member Functions

 checkBinaryHeaders ( $version, $chunk)
 Check for binary headers at the start of the chunk. Confirmed to be the same in IE 5 and IE 7.
 
 checkTextHeaders ( $version, $chunk)
 Check for text headers at the start of the chunk. Confirmed to be the same in IE 5 and IE 7.
 

Detailed Description

This class simulates Microsoft Internet Explorer's terribly broken and insecure MIME type detection algorithm.

It can be used to check web uploads with an apparently safe type, to see if IE will reinterpret them to produce something dangerous.

It is full of bugs and strange design choices, and should not under any circumstances be used to determine a MIME type to present to a user or client. (Apple Safari developers, this means you too.)

This class is based on a disassembly of IE 5.0, 6.0 and 7.0. Although I have attempted to ensure that this code works in exactly the same way as Internet Explorer, it does not share any source code or creative choices such as variable names; I (Tim Starling) therefore claim copyright on it.

It may be redistributed without restriction. To aid reuse, this class does not depend on any MediaWiki module.
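A minimal sketch of the upload check described above, assuming a result map shaped like the one getRealMimesFromData() is documented to return (IE version to MIME type). The helper findDangerousIeTypes(), the blocklist and the detection result are hypothetical; in real use the map would come from the analyzer rather than a literal.

```php
<?php
/**
 * Hypothetical helper: returns the entries of a version => MIME type map
 * whose type appears on a blocklist.
 */
function findDangerousIeTypes(array $mimesByVersion, array $blockedTypes): array
{
    $hits = [];
    foreach ($mimesByVersion as $version => $type) {
        // Guard against non-string results, since the per-version detector
        // is documented as returning bool|string.
        if (is_string($type) && in_array($type, $blockedTypes, true)) {
            $hits[$version] = $type;
        }
    }
    return $hits;
}

// Example blocklist: an assumption, not MediaWiki's configuration.
$blocked = ['text/html', 'text/scriptlet', 'application/x-msdownload'];

// Invented detection result, for illustration only.
$detected = [
    'ie05' => 'text/plain',
    'ie06' => 'text/plain',
    'ie07' => 'text/html',
    'ie07.strict' => 'text/plain',
    'ie07.nohtml' => 'text/plain',
];

$hits = findDangerousIeTypes($detected, $blocked);
if ($hits !== []) {
    // Reject the upload if any simulated IE version would sniff a blocked type.
    echo 'Rejecting upload; IE would treat it as: ', implode(', ', array_unique($hits)), "\n";
}
```

Checking every simulated version, rather than only the newest, reflects the point of the class: an upload is only safe if no supported IE variant reinterprets it.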

Definition at line 27 of file IEContentAnalyzer.php.

Constructor & Destructor Documentation

◆ __construct()

IEContentAnalyzer::__construct ( )

Definition at line 314 of file IEContentAnalyzer.php.

References $addedTypes, and $baseTypeTable.
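The constructor is documented as reading $addedTypes and $baseTypeTable, and $typeTable holds the "type table with versions expanded". The standalone sketch below shows one way such an expansion could work. The function name expandTypeTable() is hypothetical, and carrying each version's additions forward to every later version is an assumption, not something this reference confirms.

```php
<?php
/**
 * Hypothetical sketch: build one type table per version from a base table
 * plus per-version additions.
 */
function expandTypeTable(array $baseTable, array $addedTypes, array $versions): array
{
    $types = $baseTable;
    $expanded = [];
    foreach ($versions as $version) {
        // Merge this version's additions into the running table. Because
        // $types is never reset, additions also apply to all later versions
        // (an assumption made for this sketch).
        foreach ($addedTypes[$version] ?? [] as $format => $extra) {
            $types[$format] = array_merge($types[$format] ?? [], $extra);
        }
        $expanded[$version] = $types;
    }
    return $expanded;
}

// Trimmed-down inputs modelled on $baseTypeTable and $addedTypes.
$base = [
    'text' => ['text/richtext', 'application/postscript'],
    'html' => ['text/html'],
];
$added = ['ie07' => ['text' => ['text/xml', 'application/xml']]];
$versions = ['ie05', 'ie06', 'ie07', 'ie07.strict', 'ie07.nohtml'];

$table = expandTypeTable($base, $added, $versions);
```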

Member Function Documentation

◆ checkBinaryHeaders()

IEContentAnalyzer::checkBinaryHeaders ( $version,
$chunk )
private

Check for binary headers at the start of the chunk. Confirmed to be the same in IE 5 and IE 7.

Parameters
string $version
string $chunk
Returns
bool|string

Definition at line 585 of file IEContentAnalyzer.php.

Referenced by getMimeTypeForVersion().
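A hedged illustration of a binary header check: compare the start of the chunk against known magic numbers and return the first matching type, or false, in line with the documented bool|string return. sniffBinaryHeader() and its signature list are assumptions; the list is a handful of widely known file signatures, not IE's table, although the type names use IE's private spellings from $baseTypeTable.

```php
<?php
/**
 * Hypothetical magic-number check in the spirit of checkBinaryHeaders().
 *
 * @return string|false The sniffed type, or false if nothing matched.
 */
function sniffBinaryHeader(string $chunk)
{
    // Illustrative subset of well-known signatures (not IE's actual list).
    $signatures = [
        "GIF87a" => 'image/gif',
        "GIF89a" => 'image/gif',
        "\x89PNG\r\n\x1a\n" => 'image/x-png',
        "\xFF\xD8" => 'image/pjpeg',
        "%PDF" => 'application/pdf',
        "PK\x03\x04" => 'application/x-zip-compressed',
        "\x1F\x8B" => 'application/x-gzip-compressed',
    ];
    foreach ($signatures as $magic => $type) {
        // Compare only the first strlen($magic) bytes of the chunk.
        if (strncmp($chunk, $magic, strlen($magic)) === 0) {
            return $type;
        }
    }
    return false;
}
```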

◆ checkTextHeaders()

IEContentAnalyzer::checkTextHeaders ( $version,
$chunk )
private

Check for text headers at the start of the chunk. Confirmed to be the same in IE 5 and IE 7.

Parameters
string $version
string $chunk
Returns
bool|string

Definition at line 559 of file IEContentAnalyzer.php.

Referenced by getMimeTypeForVersion().
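The text-header check can be pictured the same way, with textual prefixes instead of binary magic numbers. In this minimal sketch, sniffTextHeader() is hypothetical and its two prefixes (RTF and PostScript) are well-known file signatures chosen for illustration; IE's real list may differ. The result types are taken from the 'text' row of $baseTypeTable.

```php
<?php
/**
 * Hypothetical prefix check in the spirit of checkTextHeaders().
 *
 * @return string|false The sniffed type, or false if nothing matched.
 */
function sniffTextHeader(string $chunk)
{
    $prefixes = [
        '{\\rtf' => 'text/richtext',      // RTF documents begin with "{\rtf"
        '%!' => 'application/postscript', // PostScript begins with "%!"
    ];
    foreach ($prefixes as $prefix => $type) {
        if (strncmp($chunk, $prefix, strlen($prefix)) === 0) {
            return $type;
        }
    }
    return false;
}
```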

◆ getDataFormat()

IEContentAnalyzer::getDataFormat ( $version,
$type )
protected
Parameters
string $version
string|null $type
Returns
int|string

Definition at line 838 of file IEContentAnalyzer.php.

References $type.

Referenced by getMimeTypeForVersion().
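getDataFormat() has no description in this reference. Judging from the category keys of $baseTypeTable, it plausibly maps a MIME type to one of 'ambiguous', 'text', 'binary' or 'html' for the given version; that reading is an assumption. Below is a minimal sketch under that assumption. The hypothetical classifyType() receives one version's table directly, treats a missing type as 'ambiguous', and falls back to 'unknown' for unlisted types; both fallbacks are also assumptions.

```php
<?php
/**
 * Hypothetical classifier: find which category of a version's type table
 * lists the given MIME type.
 */
function classifyType(array $versionTable, ?string $type): string
{
    // Assumption: no proposed type is handled like an ambiguous one.
    if ($type === null || $type === '') {
        return 'ambiguous';
    }
    foreach ($versionTable as $format => $typesInFormat) {
        if (in_array($type, $typesInFormat, true)) {
            return (string)$format;
        }
    }
    return 'unknown';
}

$ie05Table = [
    'ambiguous' => ['text/plain', 'application/octet-stream'],
    'text' => ['text/richtext', 'application/postscript'],
    'binary' => ['image/gif', 'image/x-png'],
    'html' => ['text/html'],
];
```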

◆ getMimesFromData()

IEContentAnalyzer::getMimesFromData ( $fileName,
$chunk,
$proposed )

Get the untranslated MIME types for all known versions.

Parameters
string $fileName: the file name (unused at present)
string $chunk: the first 256 bytes of the file
string $proposed: the MIME type proposed by the server
Returns
array: map of IE version to detected MIME type

Definition at line 375 of file IEContentAnalyzer.php.

References getMimeTypeForVersion().

Referenced by getRealMimesFromData().
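Since getMimesFromData() returns a map of IE version to detected type and references getMimeTypeForVersion(), its shape can be sketched as one detector call per version. In this sketch mimesForAllVersions() is hypothetical, and the callable stands in for the per-version detector. The toy detector is an invented stand-in that treats any chunk containing "<html" as HTML; it makes no claim about how real IE versions differ.

```php
<?php
/**
 * Hypothetical sketch: run a detector once per simulated IE version and
 * collect a version => MIME type map.
 */
function mimesForAllVersions(
    array $versions,
    callable $detect,
    string $fileName,
    string $chunk,
    string $proposed
): array {
    $result = [];
    foreach ($versions as $version) {
        $result[$version] = $detect($version, $fileName, $chunk, $proposed);
    }
    return $result;
}

$versions = ['ie05', 'ie06', 'ie07', 'ie07.strict', 'ie07.nohtml'];

// Toy detector, for illustration only.
$toyDetector = function (string $version, string $fileName, string $chunk, string $proposed) {
    return stripos($chunk, '<html') !== false ? 'text/html' : $proposed;
};

$mimes = mimesForAllVersions($versions, $toyDetector, 'notes.txt', '<HTML><body>hi', 'text/plain');
```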

◆ getMimeTypeForVersion()

IEContentAnalyzer::getMimeTypeForVersion ( $version,
$fileName,
$chunk,
$proposed )
protected

Get the MIME type for a given named version.

Parameters
string $version
string $fileName
string $chunk
string $proposed
Returns
bool|string

Definition at line 391 of file IEContentAnalyzer.php.

References $ext, $type, checkBinaryHeaders(), checkTextHeaders(), getDataFormat(), and sampleData().

Referenced by getMimesFromData().

◆ getRealMimesFromData()

IEContentAnalyzer::getRealMimesFromData ( $fileName,
$chunk,
$proposed )

Get the MIME types from getMimesFromData(), but convert the result from IE's idiosyncratic private types into something other apps will understand.

Parameters
string $fileName: the file name (unused at present)
string $chunk: the first 256 bytes of the file
string $proposed: the MIME type proposed by the server
Returns
array: map of IE version to detected MIME type

Definition at line 337 of file IEContentAnalyzer.php.

References getMimesFromData().

◆ sampleData()

IEContentAnalyzer::sampleData ( $version,
$chunk )
protected

Do heuristic checks on the bulk of the data sample.

Search for HTML tags.

Parameters
string $version
string $chunk
Returns
array

Definition at line 686 of file IEContentAnalyzer.php.

Referenced by getMimeTypeForVersion().
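To make "heuristic checks on the bulk of the data sample" concrete, here is a simplified sketch. It counts byte classes and looks for a few HTML tags case-insensitively. The function sampleChunk(), its return keys and its tag list are assumptions for illustration only; IE's real heuristics, and the actual keys of the array that sampleData() returns, may differ. The sample input shows the classic risky case: a chunk with a valid GIF header that also contains script.

```php
<?php
/**
 * Hypothetical bulk-sample heuristic: byte-class counters plus a list of
 * HTML tags found anywhere in the chunk.
 */
function sampleChunk(string $chunk): array
{
    $stats = ['ctrl' => 0, 'high' => 0, 'low' => 0, 'htmlTags' => []];
    $len = strlen($chunk);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($chunk[$i]);
        if ($byte >= 0x80) {
            $stats['high']++;
        } elseif ($byte < 0x20 && !in_array($byte, [0x09, 0x0A, 0x0D], true)) {
            // Control bytes other than tab, LF and CR hint at binary data.
            $stats['ctrl']++;
        } else {
            $stats['low']++;
        }
    }
    // Case-insensitive search for a few illustrative tag openings.
    $lower = strtolower($chunk);
    foreach (['<html', '<head', '<body', '<script', '<title', '<a href'] as $tag) {
        if (strpos($lower, $tag) !== false) {
            $stats['htmlTags'][] = $tag;
        }
    }
    return $stats;
}

$stats = sampleChunk("GIF89a\x01\x00<SCRIPT>alert(1)</SCRIPT>");
```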

◆ translateMimeType()

IEContentAnalyzer::translateMimeType ( $type)

Translate a MIME type from IE's idiosyncratic private types into more commonly understood type strings.

Parameters
string $type
Returns
string

Definition at line 349 of file IEContentAnalyzer.php.

References $type.
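A minimal sketch of the translation step and of applying it to a version map, which is the documented relationship between getMimesFromData() and getRealMimesFromData(). translateIeType() is hypothetical, and its two-entry table (IE's image/pjpeg and image/x-png, both listed in $baseTypeTable) is an assumption, not the class's full mapping. Unknown types pass through unchanged.

```php
<?php
/**
 * Hypothetical translation of IE's private MIME spellings.
 */
function translateIeType(string $type): string
{
    static $table = [
        'image/pjpeg' => 'image/jpeg',
        'image/x-png' => 'image/png',
    ];
    return $table[$type] ?? $type;
}

$raw = [
    'ie05' => 'image/pjpeg',
    'ie07' => 'image/x-png',
    'ie07.strict' => 'text/plain',
];

// array_map() with a single array preserves the version keys.
$translated = array_map('translateIeType', $raw);
```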

Member Data Documentation

◆ $addedTypes

IEContentAnalyzer::$addedTypes
protected
Initial value:
= [
'ie07' => [
'text' => [ 'text/xml', 'application/xml' ]
],
]

Changes to the type table in later versions of IE.

Definition at line 55 of file IEContentAnalyzer.php.

Referenced by __construct().

◆ $baseTypeTable

IEContentAnalyzer::$baseTypeTable
protected
Initial value:
= [
'ambiguous' => [
'text/plain',
'application/octet-stream',
'application/x-netcdf',
],
'text' => [
'text/richtext', 'image/x-bitmap', 'application/postscript', 'application/base64',
'application/macbinhex40', 'application/x-cdf', 'text/scriptlet'
],
'binary' => [
'application/pdf', 'audio/x-aiff', 'audio/basic', 'audio/wav', 'image/gif',
'image/pjpeg', 'image/jpeg', 'image/tiff', 'image/x-png', 'image/png', 'image/bmp',
'image/x-jg', 'image/x-art', 'image/x-emf', 'image/x-wmf', 'video/avi',
'video/x-msvideo', 'video/mpeg', 'application/x-compressed',
'application/x-zip-compressed', 'application/x-gzip-compressed', 'application/java',
'application/x-msdownload'
],
'html' => [ 'text/html' ],
]

Relevant data taken from the type table in IE 5.

Definition at line 31 of file IEContentAnalyzer.php.

Referenced by __construct().

◆ $registry

IEContentAnalyzer::$registry
protected

An approximation of the "Content Type" values in HKEY_CLASSES_ROOT in a typical Windows installation.

Used for extension to MIME type mapping if detection fails.

Definition at line 67 of file IEContentAnalyzer.php.
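The fallback described for $registry amounts to an extension lookup. This sketch uses a hypothetical mimeFromExtension() and a tiny illustrative registry; the real table's contents and key format (for example, whether keys carry a leading dot) are not shown in this reference, so both are assumptions.

```php
<?php
/**
 * Hypothetical extension-based fallback.
 *
 * @return string|false The registered type, or false if none is known.
 */
function mimeFromExtension(array $registry, string $fileName)
{
    // Extensions are compared case-insensitively, as Windows does for file names.
    $ext = strtolower(pathinfo($fileName, PATHINFO_EXTENSION));
    return $registry[$ext] ?? false;
}

// Illustrative entries only.
$registry = [
    'txt' => 'text/plain',
    'htm' => 'text/html',
    'html' => 'text/html',
    'gif' => 'image/gif',
];
```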

◆ $typeTable

IEContentAnalyzer::$typeTable = []
protected

Type table with versions expanded.

Definition at line 312 of file IEContentAnalyzer.php.

◆ $versions

IEContentAnalyzer::$versions = [ 'ie05', 'ie06', 'ie07', 'ie07.strict', 'ie07.nohtml' ]
protected

IE versions which have been analysed to bring you this class, and for which some substantive difference exists.

These will appear as keys in the return value of getRealMimesFromData(). The names are chosen to sort correctly.

Definition at line 307 of file IEContentAnalyzer.php.
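The remark that the version names "are chosen to sort correctly" can be checked with a plain string sort. This short demonstration shows that the zero-padded names order from oldest to newest, with each base version before its variants. The unpadded 'ie5' and 'ie10' are hypothetical names, included to show what the padding presumably avoids.

```php
<?php
$versions = ['ie07.strict', 'ie05', 'ie07', 'ie07.nohtml', 'ie06'];
sort($versions, SORT_STRING);
// $versions is now ['ie05', 'ie06', 'ie07', 'ie07.nohtml', 'ie07.strict'].

// Without padding, byte-wise comparison puts '1' before '5', so 'ie10'
// would sort ahead of 'ie5'.
$unpadded = ['ie5', 'ie10'];
sort($unpadded, SORT_STRING);
```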


The documentation for this class was generated from the following file: