IDLeDOM
DOM interfaces automatically generated from WebIDL
|
This document describes the PHP WebIDL binding used by IDLeDOM.
Since there does not seem to be an official WebIDL binding for PHP, this documents the implementation choices that IDLeDOM has made to map WebIDL names and types to PHP. Where possible correspondence has been maintained with the PHP DOMDocument
/etc classes, although the PHP dom
extension classes appear to be an adhoc binding of the libxml
C library not a rigorous WebIDL mapping.
All WebIDL interfaces are defined in the namespace Wikimedia\IDLeDOM\
. For example, the WebIDL interface 'Document' corresponds to the PHP interface Wikimedia\IDLeDOM\Document
. A particular implementation of the DOM will implement these interfaces in a different namespace; for example the implementation of Document
might be in a class Wikimedia\Dodo\Document
which implements Wikimedia\IDLeDOM\Document
.
It is expected that users will obtain a DOMImplementation
or Document
object by some mechanism, and use that to create all concrete DOM objects. Code which attempts to maintain implementation-independence will avoid direct references to the particular implementation class, and instead refer to the IDLeDOM interface.
It is expected that users can use type aliasing (ie, PHP class_alias
) to use a compatible DOM implementation—that is, implementations which adhere to this binding specification but don't inherit from the IDLeDOM interfaces.
TODO: It is hoped that class_alias
can also be used to be used to provide a migration path for code currently using DOMDocument
, or to allow code to support either the built-in DOMDocument
classes or an IDLeDOM
-compatible implementation.
To form a PHP identifier from a name in the WebIDL spec, first replace any non-alphanumeric characters with _
. (This isn't generally needed except when turning enumeration values into identifiers.)
Since PHP has a number of reserved words in the language, identifiers of PHP constructs corresponding to WebIDL definitions then need to be escaped to avoid conflicts. There are also reserved method names used for PHP interfaces like ArrayAccess
and Countable
which could conflict with WebIDL definitions. Finally, WebIDL properties are turned into getter and setter methods which share a namespace with WebIDL operations; this could also produce a conflict.
Conflicts are resolved as follows:
If the WebIDL name conflicts with a PHP reserved word or other previously-assigned name, then the PHP escaped name is the prefix
idl_
, followed by the smallest number of underscore (_
) characters needed to avoid a conflict (perhaps zero), followed by the WebIDL name. Otherwise there is no conflict, and the PHP escaped name is simply the name.
The PHP reserved words include any identifier starting with two underscores (__
), which are reserved in PHP as magical. The keyword class
, all predefined constants, and other reserved words (including soft reserved words) are also reserved. (Other than class
we don't need to reserve the PHP keywords and compile-time constants because starting in PHP 7.0.0 they are allowed as property, constant, and method names.)
In WebIDL dictionary
types, the following names are also reserved: offsetExists
, offsetGet
, offsetSet
, and offsetUnset
. (These are used to implement ArrayAccess
.)
In WebIDL callback
types, the name invoke
is reserved. (This name is not reserved in callback interface
types.)
In WebIDL callback
, callback interface
, and dictionary
types, the name cast
is reserved.
In WebIDL enumeration
types, the name cast
is reserved.
In WebIDL interface
types, the names getIterator
and count
are reserved. (These are used to implement IteratorAggregate
and Countable
.)
If the WebIDL defines an "unnamed" indexed getter, named getter, indexed setter, named setter, indexed deleter, named deleter, or stringifier operation, then the names item
, namedItem
, setItem
, setNamedItem
, removeItem
, removeNamedItem
, or toString
(respectively) are reserved.
When determining the PHP escaped name corresponding to an WebIDL name, conflicts are resolved in the following order:
Concretely, given the following WebIDL fragment:
The following names would be assigned and reserved:
setBat
: the operation on Foo
idl_setBat
: the constant on Bar
getBat
: the getter of attribute bat
of Bar
idl__setBat
: the setter of attribute bat
of Bar
idl___setBat
: the operation on Bar
Note in particular the assymmetry in the names of the getter and setter on attribute bat
. This is unfortunate, but fortunately conflicts such as these are extremely rare.
This sub-section describes how types in the WebIDL map to types in PHP.
The any
WebIDL type corresponds to a PHP ?mixed
value. No boxing of types is required in PHP.
Methods on PHP objects that implement an operation whose WebIDL specifies a undefined
return type must be declared to have a return type of void
. Other uses of undefined
will be mapped to null
in PHP.
The WebIDL boolean
type maps exactly to the PHP bool
type.
The WebIDL byte
, octet
, short
, unsigned short
, long
and unsigned long
types correspond to the PHP int
type.
Note that while the WebIDL unsigned long
type is unsigned, with a range of [0, 4294967295], the PHP int
type is signed, and on 32-bit platforms may have a range of [−2147483648, 2147483647]. To encode an WebIDL unsigned long
type in a PHP int, the following steps are followed on all platforms regardless of the value of PHP_INT_MAX
on that platform:
x
be the WebIDL unsigned long
value to encode.x
< 2147483648, return a PHP int
whose value is x
.x
≥ 2147483648. Return a PHP int whose value is x − 4294967296
.Note that this is the same as casting to a 32-bit int in most languages.
To decode an WebIDL unsigned long
value from a PHP int
, the following steps must be followed:
x
be the PHP int value to decode.x
≥ 0, return an WebIDL unsigned long
whose value is x
.x
< 0. Return an WebIDL unsigned long
whose value is x + 4294967296
.Note that in PHP this is the same as performing a bit-wise AND of the int
value with the long constant 0xffffffff
.
long long
, unsigned long long
, and bigint
The WebIDL long long
, unsigned long long
, and bigint
types map to a PHP arbitrary-length integer using either the GMP or BCMath. Precise details will be determined when there is need.
The WebIDL float
, unrestricted float
, double
, and unrestricted double
types map to the PHP float
type.
sequence<T>
The WebIDL sequence<T>
type corresponds to a PHP array, with elements having the appropriate mapped type for T
. In PHP type hints, the type will be array
, and in phan type annotations, the type will be list<T>
, which denotes a list of objects with consecutive integer keys starting from 0.
A PHP object implementing an interface with an operation declared to return a sequence<T>
value must not return null
from the corresponding method. Similarly, a getter method for an WebIDL attribute must not return null
if the attribute is declared to be of type sequence<T>
.
sequence<octet>
and sequence<unsigned short>
As a special case, WebIDL sequence<octet>
and sequence<unsigned short>
are represented by PHP string in ASCII and UTF-16 encodings, respectively. As with other sequence
types, null
is not a valid value of these WebIDL types.
TODO: No DOM interface yet uses these types.
PHP uses string
for all WebIDL string types. Strings are expected to be encoded in UTF-8.
As described in a note in the CSS Object Model spec the main difference between DOMString and USVString has to do with how surrogate code units are handled. DOMString allows surrogates, while USVString replaces them with U+FFFD. As rationale the spec states that "well-formed UTF-8 specifically disallows surrogate code points".
Since we represent strings natively as UTF-8, it is true that surrogates don't appear in our native representation. However, for those interfaces (CharacterData
and Text
, for example) for which the spec requires offsets to be in UTF-16 units, it is possible that following the spec will generate unpaired surrogates. For example, if you extract a substring (using UTF-16 offsets) that contains an unmatched surrogate, that lonely surrogate would need to be converted back to UTF-8 before being returned – but UTF-8 cannot properly represent these characters.
The behavior of a PHP WebIDL implementation following this specification should be considered undefined in these situations.
Indeed, for the special case of the CharacterData
APIs, this spec makes it implementation-defined whether the offsets will be measured in UTF-16 code units or unicode code points (see the PHP compatibility section for more details). If measuring in unicode code points, the creation of unpaired surrogates will not occur.
However, if the implementation elects to use offsets measured in UTF-16 code units, one may use (for example) PHP's built-in mb_convert_encoding
to convert between UTF-16 and UTF-8. The mb_convert_encoding
function will drop an unpaired initial surrogate and return an ASCII question mark character for an unpaired trailing surrogate. This may be surprising:
Alternatively, an implementation using UTF-16 offsets might choose to fall back to a WTF-8 encoding of the unpaired surrogate. String concatenation of unpaired surrogates is likely to lead to further unexpected behavior, however, and PHP does not contain robust support for WTF-8. (Another alternative would be CESU-8, but that alters the encoding of characters outside the Basic Multilingual Plane even where surrogate-splitting is not involved and so would be explicitly in violation of this spec, which mandates UTF-8 for all results not involving unpaired surrogates.)
Perhaps the safest option for implementations using UTF-16 offsets is to throw an exception if an unpaired surrogate is encountered.
The WebIDL object
type maps to the PHP object
type.
The WebIDL symbol
type is not supported.
WebIDL enumeration types correspond to the PHP string
type. (There is a PHP interface created to hold the enumeration values as class constants, but enumeration values are string
s, not objects implementing this interface.)
See the section on Enumeration interfaces below.
Exceptions are represented by the objects implementing the interfaces defined in the Exceptions section below. They will also implement the PHP Throwable
type.
A PHP object that implements an WebIDL interface must be of a PHP class that implements the PHP interface that corresponds to the WebIDL interface.
Every WebIDL interface corresponds to a PHP interface, with the following members:
For each constant defined on the WebIDL interface, there is a corresponding PHP class constant:
public
visibility.For each operation defined on the WebIDL interface, there must be a corresponding method declared on the PHP interface with the following properties:
public
visibility.Interfaces with an iterable
member defined shall extend IteratorAggregate
and define a getIterator
method which implements the iterable
member (if unnamed) or dispatches to the iterable
member (if named).
XXX: WebIDL interfaces with a pair iterator are not yet supported.
Interfaces with a stringifier
member defined will implement the PHP magic method __toString()
, which implements the stringification behavior of the interface (if unnamed) or dispatches to the stringifier
member (if named).
Interfaces with either a named property getter, indexed property getter, named property setter, indexed property getter, or named property deleter shall extend ArrayAccess
. The ArrayAccess
methods will dispatch to the named property getter/*named property setter*/*named property deleter* if the given offset is_string
, and will dispatch to the indexed property getter/*indexed property setter* if the offset is_numeric
. Otherwise behavior is undefined, but implementations should throw an exception.
WebIDL interfaces often contain a readonly attribute unsigned long length
member. In combination with an indexed property getter, the presence of a length
property is sufficient to make the interface an "array-like" in JavaScript, usable in a number of array contexts including for...of loops. These members have been tagged in our WebIDL with the (nonstandard) [PHPCountable]
extended attribute. Interfaces containing a [PHPCountable]
member shall extend the PHP Countable
interface, and Countable::count()
shall invoke the getter for the [PHPCountable]
member and return the result.
Interfaces which have a indexed property getter and a [PHPCountable]
member should be iterable, as an "array-like" would be in JavaScript. Therefore these interfaces shall have a unnamed iterable
member; these additional members added in the PHP binding have been tagged in our WebIDL with the (nonstandard) [PHPExtension]
extended attribute.
For each attribute defined on the WebIDL interface, there shall be a corresponding getter method declared on the PHP interface with the following properties:
public
visibility.get
, followed by the first character of the identifier of the attribute uppercased (as if the identifier were passed to the ucfirst
function). This name is then PHP escaped to resolve conflicts.For each attribute defined on the WebIDL interface that is not [read only], there shall be a corresponding setter method declared on the PHP interface with the following properties:
public
visibility.void
.set
, followed by the first character of the identifier of the attribute uppercased (as if the identifier were passed to the ucfirst
function). This name is then PHP escaped to resolve conflicts.For each attribute defined on the WebIDL interface that is [read only] and is declared with the [PutForwards
] extended attribute, there shall be a corresponding setter method declared on the PHP interface with the following properties:
public
visibility.void
.set
, followed by the first character of the identifier of the attribute uppercased (as if the identifier were passed to the ucfirst
function). This name is then PHP escaped to resolve conflicts.PutForwards
extended attribute on the interface type that this attribute is declared to be of, according to the rules in the [type section above].Certain IDL attributes are defined in the relevant specifications to reflect the value of Element attributes. These do not affect the interface declaration of the IDL attribute. Since implementation details are out-of-scope for this document, we will briefly note that IDLeDOM uses extended attributes named [Reflect]
, [ReflectURL]
, [ReflectEnum]
, [ReflectDefault]
, [ReflectMissing]
, and [ReflectInvalid]
in the IDL specification to guide the automatic creation of getter and setter methods for reflected attributes.
In every interface with attributes, the PHP magic methods __get()
, __set()
, __isset()
, and __unset()
should be defined with the following behavior:
__get($name)
method should invoke the PHP getter method for $name
and return the result.__set($name, $value)
method should invoke the PHP setter method for $name
, unless the attribute is declared read only in which case an appropriate exception should be thrown.__isset($name)
method should invoke __get($name)
and return false
if the value returned is null
, or true
otherwise.__unset($name)
method should invoke __set($name, null)
.The behavior of __get()
and __set()
if $name
is not a valid attribute of the interface is undefined. Implementations may elect to throw an exception, or else to permit creation of dynamic properties by storing and fetching values indexed by these names in an auxiliary array.
Note: the helper traits provided by WebIDL
choose to invoke a helper method (_getMissingProp
or _setMissingProp
) when $name
is not a valid attribute of the interface. The default implementation of these methods in the helper traits throws an exception, but by providing a specific implementation of these methods other behaviors can be chosen by a DOM implementation.
XXX: perhaps a 'standard' extension point should be provided as well?
The use of these compatibility methods is not recommended in performance-critical code. They are provided to provide compatibliity with existing code written to use the built-in DOMDocument
interface and for the convenience of using property syntax, but the invocation of magic methods has a steep performance cost in PHP.
XXX: ensure that there is no cost associated with simply defining the magic methods, even if they are not used.
WebIDL dictionary
objects are PHP abstract classes which extend one other PHP class or interface: dictionaries which directly inherit from another dictionary shall extend the PHP abstract class corresponding to that inherited dictionary, and dictionaries with no inherited dictionaries shall extend the PHP ArrayAccess
interface.
Dictionary classes shall have getter methods for every field of the dictionary:
public
visibility.get
, followed by the first character of the identifier of the field name uppercased (as if the identifier were passed to the ucfirst
function). This name is then PHP escaped to resolve conflicts.In every class corresponding to a dictionary, the PHP magic methods __get()
and [__isset()
] (and optionally __set()
and __unset()
) should be defined with the following behavior:
__get($name)
method should invoke the PHP getter method for the field $name
and return the result.__set($name, $value)
method should throw an appropriate exception (if __set()
is defined).__isset($name)
method should invoke __get($name)
and return false
if the value returned is null
, or true
otherwise.__unset($name)
method should throw an appropriate exception (if __unset()
is defined).The ArrayAccess
methods of a dictionary should have the following behavior:
offsetExists
method should return true
iff the dictionary has a field with the given WebIDL name (no PHP escaping is done on the name).offsetGet
method should invoke the PHP getter method for the named field and return the result.offsetSet
method should throw an appropriate exception.offsetUnset
method should throw an appropriate exception.In addition, classes corresponding to a dictionary should define a static method named cast
with a single argument:
cast
method should return an object extending the class corresponding to the dictionary.cast
method should be either an associative array or an object extending the class corresponding to the dictionary.WebIDL callback
objects shall be treated as if they were callback interface
s with a single regular operation member named invoke
.
A callback interface
shall correspond to a PHP interface with constant and operation members as defined above for interface
types.
In addition, the PHP interface corresponding to a [callback interface
] shall define the PHP magic method __invoke()
. This magic method shall invoke the single regular operation of the callback interface
with the same arguments it is given, and return the result. (This ensure that the callback interface
is a PHP callable.)
In addition, classes implementing the interface shall define a static method named cast
with a single argument:
cast
method shall return an object implementing the interface corresponding to the callback interface
.cast
method should be either a PHP callable or an object implementing the PHP interface corresponding to the callback interface
.callback interface
, the argument shall be directly returned.callback interface
shall be returned, where invoking the single regular operation of the callback interface
will invoke the callable with the same arguments, and return the result.WebIDL enumerations shall correspond to a PHP final class with an string constant for every enumeration value.
For each enumeration value:
public
visibility.string
type.In addition, the enumeration class will have a static method named cast
which will throw a TypeError
exception if its argument is not a string matching a valid enumeration value; otherwise it will return its argument.
For example, the following WebIDL:
corresponds to the PHP interface:
A conforming PHP implementation must have a PHP interface corresponding to every WebIDL exception that is supported. For simple exceptions, the interface name must be the [PHP escaped] identifier of the WebIDL simple exception name: Error
, EvalError
, RangeError
, ReferenceError
, TypeError
, or URIError
. These PHP interfaces must have only the public
modifier. All simple exception interface classes will also extend an empty marker interface named SimpleException
.
In addition, the PHP implementation must have a PHP interface named DOMException
corresponding to the WebIDL exception DOMException type. This PHP interface must correspond in the usual way to the WebIDL interface for DOMException given in the WebIDL spec. DOMException
will not extend SimpleException
.
Note that because these exceptions are represented as interface types, an implementation may chose to (for example) subclass PHP built-in types such as RangeException
and DOMException
as long as these subclasses also implement the appropriate interface types.
In this section we describe some specific differences between the binding as described above, the names resulting from the DOMDocument
classes, and the standard JavaScript bindings. In general IDL interfaces, mixins, attributes, or operations which occur in IDLeDOM for PHP interoperability or compatibility with PHP's built-in DOMDocument
classes have been marked with the extended attribute [PHPExtension]
.
PHP contains a writeable encoding
attribute in the definition of the built-in DOMDocument
. We have copied that non-standard attribute to our IDL for Document
.
PHP contains methods on DOMDocument
named loadHTML
, loadXML
, saveHTML
, and saveXML
which are roughly equivalent to the standard DOMParser
and XMLSerializer
classes (which are otherwise missing from PHP). Similarly, DOMDocumentFragment
has a non-standard method named appendXML
. It is recommended that implementers implement these for compatibility with legacy code. (We don't support calling DOMDocument::loadHTML()
or DOMDocument::loadXML()
statically, which generates an E_STRICT
error in modern PHP.)
PHP contains methods on DOMElement
named setIdAttribute
, setIdAttributeNode
, and setIdAttributeNS
. In legacy code using DOMDocument
calling one of these methods is necessary in order to make DOMDocument::getElementById
work properly, so these methods have been added to our IDL for Element
. It is recommended that implementers implement them as no-ops for compatibility with legacy code.
Various methods on DOMCharacterData
are defined in the spec to take or return offsets in UTF-16 code units. Originally the PHP DOM extension also used UTF-16 code units (and the PHP documentation still claims it does) however in PHP5 the implementation was changed to measure in unicode code points instead. This primarily affects the handling of UTF-16 surrogate pairs.
As discussed in the definition of the string types above, the implementation is allowed to choose whether it implements the DOM standard/pre-PHP5 behavior or else the modern PHP5-and-later behavior for these methods. If the DOM standard/pre-PHP5 behavior is implemented, there are additional issues with the handling of unpaired surrogates which can't be represented in UTF-8.