DokuWiki 2006-11-06: Detailed view of utf8.php

Functions that are not part of a class:

URL-encodes a filename so that it can contain Unicode characters

Slashes are not encoded.

When the second parameter is true, the string is encoded only if
non-ASCII characters are detected. This makes it safe to run the
function multiple times on the same string (the default is true).

author: Andreas Gohr <andi@splitbrain.org>

utf8_decodeFN($file)

URL-decodes a filename

This is just a wrapper around urldecode().

author: Andreas Gohr <andi@splitbrain.org>
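A minimal Python sketch of the encode/decode pair described above, assuming that Python's `urllib.parse.quote_plus`/`unquote_plus` behave like PHP's `urlencode()`/`urldecode()` for this purpose (both write a space as `+`). `encode_fn` and `decode_fn` are hypothetical names; the safe-mode test uses `str.isascii()` to follow the description ("only if non-ASCII characters are detected") and may differ from the actual check in utf8.php.

```python
from urllib.parse import quote_plus, unquote_plus


def encode_fn(name: str, safe: bool = True) -> str:
    """URL-encode a filename, leaving slashes intact."""
    # Safe mode: a pure-ASCII name is treated as already encoded and returned
    # unchanged, which makes repeated calls idempotent.
    if safe and name.isascii():
        return name
    # quote_plus writes UTF-8 percent escapes and turns spaces into '+';
    # safe="/" keeps the path separators unencoded.
    return quote_plus(name, safe="/")


def decode_fn(name: str) -> str:
    """Reverse of encode_fn: a thin wrapper, as utf8_decodeFN is around urldecode."""
    return unquote_plus(name)


encoded = encode_fn("wiki/ä b.txt")
print(encoded)             # wiki/%C3%A4+b.txt
print(encode_fn(encoded))  # unchanged, because the encoded name is pure ASCII
print(decode_fn(encoded))  # wiki/ä b.txt
```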

utf8_isASCII($str)

Checks whether a string contains only 7-bit ASCII characters

author: Andreas Gohr <andi@splitbrain.org>

utf8_strip($str)

Strips all high-byte characters (bytes above 0x7F)

Returns a pure 7-bit ASCII string

author: Andreas Gohr <andi@splitbrain.org>
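Both checks reduce to a byte test: in UTF-8, every byte of a multi-byte character is 0x80 or higher, so a string is 7-bit ASCII exactly when all of its bytes are below 0x80. The Python sketch below (hypothetical names `is_ascii` and `strip_high`, operating on `bytes`) illustrates that rule; it is not the utf8.php code.

```python
def is_ascii(data: bytes) -> bool:
    """True if every byte is 7-bit ASCII (0x00-0x7F)."""
    return all(b < 0x80 for b in data)


def strip_high(data: bytes) -> bytes:
    """Drop every byte >= 0x80, which removes all non-ASCII characters."""
    return bytes(b for b in data if b < 0x80)


text = "naïve café".encode("utf-8")
print(is_ascii(text))    # False
print(strip_high(text))  # b'nave caf'
```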

utf8_check($Str)

Tries to detect whether a string is UTF-8 encoded

author: <bmorel@ssi.fr>
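How utf8_check reaches its decision is not described here. One common approach, sketched below in Python under that assumption, walks the bytes and checks that each lead byte is followed by the number of continuation bytes (bit pattern 10xxxxxx) that it announces. This purely structural test does not reject overlong encodings or surrogate code points; `looks_like_utf8` is a hypothetical name.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Structural UTF-8 check: lead bytes must be followed by enough continuation bytes."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            need = 0  # ASCII
        elif (b & 0xE0) == 0xC0:
            need = 1  # 110xxxxx: 2-byte sequence
        elif (b & 0xF0) == 0xE0:
            need = 2  # 1110xxxx: 3-byte sequence
        elif (b & 0xF8) == 0xF0:
            need = 3  # 11110xxx: 4-byte sequence
        else:
            return False  # stray continuation byte or invalid lead byte
        if i + need >= n:
            return False  # sequence truncated at end of input
        for k in range(1, need + 1):
            if (data[i + k] & 0xC0) != 0x80:
                return False  # expected a continuation byte
        i += need + 1
    return True


print(looks_like_utf8("été".encode("utf-8")))    # True
print(looks_like_utf8("été".encode("latin-1")))  # False
```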

utf8_strlen($string)

Unicode-aware replacement for strlen()

utf8_decode() converts characters that are not in ISO-8859-1
to '?', which is fine for the purpose of counting: every
character still becomes exactly one byte. It is even faster than mb_strlen.

author: <chernyshevsky at hotmail dot com>
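The count that the utf8_decode() trick produces can also be obtained without any conversion: each character has exactly one byte that is not a continuation byte (10xxxxxx), so counting those bytes counts characters. The Python sketch below (hypothetical name `utf8_length`) uses that alternative technique; for well-formed UTF-8 it gives the same result as the approach described above.

```python
def utf8_length(data: bytes) -> int:
    """Count characters by counting the bytes that start a character."""
    # Continuation bytes match 10xxxxxx, i.e. (b & 0xC0) == 0x80; every other byte
    # (ASCII or a multi-byte lead byte) begins exactly one character.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


sample = "aé€😀".encode("utf-8")
print(len(sample), utf8_length(sample))  # 10 4
```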

utf8_substr($str, $offset, $length = null)

UTF-8-aware alternative to substr()

Returns part of a string given a character offset (and, optionally, a length)

author: Harry Fuecks <hfuecks@gmail.com>
author: Chris Smith <chris@jalakai.co.uk>
param: string
param: integer number of UTF-8 characters offset (from left)
param: integer (optional) length in UTF-8 characters from offset
return: mixed string, or FALSE on failure
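The core of a character-based substr() is mapping character positions to byte positions. The Python sketch below does this by recording where each character starts (every non-continuation byte) and slicing the bytes between two such positions. It assumes PHP-style negative offsets and lengths and treats an offset past the end as the failure case (returning `None`); `utf8_substr_bytes` is a hypothetical name, and the exact failure conditions of utf8_substr may differ.

```python
from typing import List, Optional


def _char_starts(data: bytes) -> List[int]:
    """Byte offsets at which a character begins (all non-continuation bytes)."""
    return [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]


def utf8_substr_bytes(data: bytes, offset: int, length: Optional[int] = None) -> Optional[bytes]:
    starts = _char_starts(data)
    n = len(starts)  # number of characters
    # Negative offsets count from the end, as in PHP's substr().
    start = offset if offset >= 0 else max(n + offset, 0)
    if start > n:
        return None  # offset beyond the end of the string
    if length is None:
        end = n
    elif length >= 0:
        end = min(start + length, n)
    else:
        end = max(n + length, start)  # negative length drops characters from the end
    bounds = starts + [len(data)]  # sentinel: bounds[n] is the end of the data
    return data[bounds[start]:bounds[end]]


word = "héllo".encode("utf-8")
print(utf8_substr_bytes(word, 1, 3).decode("utf-8"))  # éll
print(utf8_substr_bytes(word, -2))                    # b'lo'
```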

utf8_substr_replace($string, $replacement, $start, $length=0)

Unicode-aware replacement for substr_replace()

author: Andreas Gohr <andi@splitbrain.org>

utf8_explode($sep, $str)

Unicode-aware replacement for explode()

author: Harry Fuecks <hfuecks@gmail.com>

utf8_str_replace($s,$r,$str)

Unicode-aware replacement for str_replace()

author: Harry Fuecks <hfuecks@gmail.com>

utf8_ltrim($str,$charlist='')

Unicode-aware replacement for ltrim()

author: Andreas Gohr <andi@splitbrain.org>
return: string

utf8_rtrim($str,$charlist='')

Unicode-aware replacement for rtrim()

author: Andreas Gohr <andi@splitbrain.org>
return: string

utf8_trim($str,$charlist='')

Unicode-aware replacement for trim()

author: Andreas Gohr <andi@splitbrain.org>
return: string

utf8_strtolower($string)

This is a Unicode-aware replacement for strtolower()

Uses the mbstring extension if available

author: Andreas Gohr <andi@splitbrain.org>

utf8_strtoupper($string)

This is a Unicode-aware replacement for strtoupper()

Uses the mbstring extension if available

author: Andreas Gohr <andi@splitbrain.org>

utf8_deaccent($string,$case=0)

Replaces accented UTF-8 characters with unaccented 7-bit ASCII equivalents

Use the optional parameter to deaccent only lowercase ($case = -1) or only uppercase ($case = 1)
letters. The default is to deaccent both cases ($case = 0).

author: Andreas Gohr <andi@splitbrain.org>
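The $case switch amounts to choosing which lookup tables to apply. The Python sketch below shows that selection with two tiny hypothetical tables (`LOWER_ACCENTS`, `UPPER_ACCENTS`); the real function uses much larger mappings that are not reproduced here, and `deaccent` is a hypothetical name.

```python
# Hypothetical, deliberately tiny stand-ins for the real accent tables.
LOWER_ACCENTS = {"à": "a", "é": "e", "ç": "c"}
UPPER_ACCENTS = {"À": "A", "É": "E", "Ç": "C"}


def deaccent(text: str, case: int = 0) -> str:
    """case = -1: lowercase only, case = 1: uppercase only, case = 0: both."""
    table = {}
    if case <= 0:
        table.update(LOWER_ACCENTS)
    if case >= 0:
        table.update(UPPER_ACCENTS)
    return text.translate(str.maketrans(table))


print(deaccent("Ça été"))      # Ca ete
print(deaccent("Ça été", -1))  # Ça ete
print(deaccent("Ça été", 1))   # Ca été
```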

utf8_romanize($string)

Romanizes a non-Latin string

author: Andreas Gohr <andi@splitbrain.org>

utf8_stripspecials($string,$repl='',$additional='')

Removes special (non-alphanumeric) characters from a UTF-8 string

This function adds the control characters 0x00 to 0x19 to the array of
stripped characters (they are not included in $UTF8_SPECIAL_CHARS)

author: Andreas Gohr <andi@splitbrain.org>
param: string $string The UTF-8 string to strip of special characters
param: string $repl Replace special characters with this string
param: string $additional Additional characters to strip (used in a regexp character class)
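Because $additional is used inside a regexp character class, the operation can be pictured as one substitution over a single class built from the additional characters, the control range 0x00 to 0x19 and the special characters. The Python sketch below assumes that structure; `SPECIALS` is a small hypothetical stand-in for $UTF8_SPECIAL_CHARS, `strip_specials` is a hypothetical name, and `additional` is inserted verbatim, so a caller must escape characters such as `-` or `]` itself.

```python
import re

# Hypothetical stand-in for $UTF8_SPECIAL_CHARS: space and some ASCII punctuation.
SPECIALS = " !\"#$%&'()*+,/:;<=>?@[\\]^`{|}~"


def strip_specials(text: str, repl: str = "", additional: str = "") -> str:
    # One character class: caller-supplied characters (verbatim), the control
    # range 0x00-0x19, and the escaped special characters.
    pattern = "[" + additional + r"\x00-\x19" + re.escape(SPECIALS) + "]"
    # A function as replacement avoids backslash processing in repl.
    return re.sub(pattern, lambda m: repl, text)


print(strip_specials("a b!c", "_"))  # a_b_c
print(strip_specials("été!"))        # été
```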

utf8_strpos($haystack, $needle,$offset=0)

This is a Unicode-aware replacement for strpos()

Uses the mbstring extension if available

author: Harry Fuecks <hfuecks@gmail.com>

utf8_tohtml($str)

Encodes UTF-8 characters to HTML entities

author: <vpribish at shopping dot com>
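A minimal Python sketch of such an encoder, assuming that ASCII characters pass through unchanged and every other character becomes a decimal numeric character reference (`&#N;`). `to_html` is a hypothetical name, and whether utf8_tohtml also escapes characters such as `<` or `&` is not stated here.

```python
def to_html(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return "".join(ch if ord(ch) < 0x80 else f"&#{ord(ch)};" for ch in text)


print(to_html("Grüße €"))  # Gr&#252;&#223;e &#8364;
```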

utf8_to_unicode($str,$strict=false)

Takes a UTF-8 string and returns an array of ints representing the
Unicode characters. Astral planes are supported, i.e. the ints in the
output can be > 0xFFFF. Occurrences of the BOM are ignored. Surrogates
are not allowed.

If $strict is set to true, the function returns false if the input
string isn't a valid UTF-8 octet sequence, and raises a PHP error at
level E_USER_WARNING.

Note: this function has been modified slightly in this library to
trigger errors on encountering bad bytes

author: <hsivonen@iki.fi>
author: Harry Fuecks <hfuecks@gmail.com>
param: string UTF-8 encoded string
param: boolean Check for invalid sequences?
return: mixed array of Unicode code points, or FALSE if the UTF-8 is invalid
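The decoding described above can be sketched in Python as follows. The sketch takes the payload bits from the lead byte, appends six bits per continuation byte, and rejects overlong forms, surrogates and values above 0x10FFFF; BOM code points are dropped. Behaviour on invalid input is an assumption: with `strict=True` it issues a `UserWarning` (standing in for E_USER_WARNING) and returns `None`, otherwise it skips the offending byte and continues. `to_unicode` is a hypothetical name.

```python
import warnings
from typing import List, Optional

# Smallest code point allowed for 0..3 continuation bytes (rejects overlong forms).
_MIN = (0, 0x80, 0x800, 0x10000)


def to_unicode(data: bytes, strict: bool = False) -> Optional[List[int]]:
    out: List[int] = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            cp, need = b, 0
        elif (b & 0xE0) == 0xC0:
            cp, need = b & 0x1F, 1
        elif (b & 0xF0) == 0xE0:
            cp, need = b & 0x0F, 2
        elif (b & 0xF8) == 0xF0:
            cp, need = b & 0x07, 3
        else:
            cp, need = -1, 0  # stray continuation byte or invalid lead byte
        valid = cp >= 0 and i + need < n
        if valid:
            for k in range(1, need + 1):
                c = data[i + k]
                if (c & 0xC0) != 0x80:
                    valid = False
                    break
                cp = (cp << 6) | (c & 0x3F)
        if valid and (cp < _MIN[need] or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF):
            valid = False  # overlong, out of range, or a surrogate
        if not valid:
            if strict:
                warnings.warn(f"invalid UTF-8 sequence at byte {i}", UserWarning)
                return None
            i += 1  # assumption: lenient mode skips the offending byte
            continue
        if cp != 0xFEFF:  # occurrences of the BOM are ignored
            out.append(cp)
        i += need + 1
    return out


print([hex(cp) for cp in to_unicode("aé€😀".encode("utf-8"))])
# ['0x61', '0xe9', '0x20ac', '0x1f600']
```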

unicode_to_utf8($arr,$strict=false)

Takes an array of ints representing the Unicode characters and returns
a UTF-8 string. Astral planes are supported, i.e. the ints in the
input can be > 0xFFFF. Occurrences of the BOM are ignored. Surrogates
are not allowed.

If $strict is set to true, the function returns false if the input
array contains ints that represent surrogates or are outside the
Unicode range, and raises a PHP error at level E_USER_WARNING.

Note: this function has been modified slightly in this library to use
output buffering to concatenate the UTF-8 string (faster) as well as to
reference the array by its keys

author: <hsivonen@iki.fi>
author: Harry Fuecks <hfuecks@gmail.com>
param: array of Unicode code points representing a string
param: boolean Check for invalid sequences?
return: mixed UTF-8 string or FALSE if array contains invalid code points
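The reverse direction is a matter of splitting each code point into 7, 11, 16 or 21 payload bits and prefixing the lead and continuation markers. In the Python sketch below, bytes are collected in a `bytearray` and converted once at the end (loosely analogous to the output-buffering note above). The non-strict behaviour, silently skipping invalid code points, is an assumption, as is using a `UserWarning` for the strict case; `from_unicode` is a hypothetical name.

```python
import warnings
from typing import Iterable, Optional


def from_unicode(code_points: Iterable[int], strict: bool = False) -> Optional[bytes]:
    out = bytearray()  # accumulate, then convert once
    for cp in code_points:
        if cp == 0xFEFF:
            continue  # occurrences of the BOM are ignored
        if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            if strict:
                warnings.warn(f"invalid code point {cp:#x}", UserWarning)
                return None
            continue  # assumption: lenient mode skips invalid code points
        if cp < 0x80:
            out.append(cp)  # 0xxxxxxx
        elif cp < 0x800:
            out += bytes((0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)))
        elif cp < 0x10000:
            out += bytes((0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)))
        else:
            out += bytes((0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)))
    return bytes(out)


print(from_unicode([0x48, 0xE9, 0x1F600]))  # b'H\xc3\xa9\xf0\x9f\x98\x80'
```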

utf8_to_utf16be(&$str, $bom = false)

UTF-8 to UTF-16BE conversion.

Without mbstring, the result may effectively be UCS-2, due to utf8_to_unicode limits.

utf16be_to_utf8(&$str)

UTF-16BE to UTF-8 conversion.

Without mbstring, this may effectively be UCS-2, due to utf8_to_unicode limits.
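What separates UTF-16 from the UCS-2 mentioned above is the handling of code points above 0xFFFF, which UTF-16 writes as a surrogate pair. The Python sketch below performs both conversions by hand to show that arithmetic. `to_utf16be` and `from_utf16be` are hypothetical names; they work on Python `str`/`bytes` rather than on a PHP string passed by reference, and raising `ValueError` on an unpaired surrogate is an assumption. In practice, Python's own `str.encode("utf-16-be")` does the same job.

```python
import struct


def to_utf16be(text: str, bom: bool = False) -> bytes:
    """Encode text as UTF-16BE, emitting surrogate pairs above U+FFFF."""
    units = [0xFEFF] if bom else []  # optional byte order mark
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))    # high surrogate: top 10 bits
            units.append(0xDC00 | (cp & 0x3FF))  # low surrogate: bottom 10 bits
        else:
            units.append(cp)
    return struct.pack(f">{len(units)}H", *units)


def from_utf16be(data: bytes) -> str:
    """Decode UTF-16BE, recombining surrogate pairs into one code point."""
    if len(data) % 2:
        raise ValueError("UTF-16BE input must have an even number of bytes")
    units = struct.unpack(f">{len(data) // 2}H", data)
    chars = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            chars.append(chr(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            raise ValueError(f"unpaired surrogate at code unit {i}")
        else:
            chars.append(chr(u))
            i += 1
    return "".join(chars)


encoded = to_utf16be("a😀")
print(encoded.hex())          # 0061d83dde00
print(from_utf16be(encoded))  # a😀
```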

utf8_bad_replace($str, $replace = '')

Replaces bad bytes with an alternative character

An ASCII character is recommended as the replacement.

The PCRE pattern used to locate bad bytes in a UTF-8 string
comes from the W3C FAQ "Multilingual Forms".
Note: it was modified to include the full ASCII range, including control characters.

author: Harry Fuecks <hfuecks@gmail.com>
param: string to search
param: string to replace bad bytes with; use ASCII (the docblock gives '?' as the default, but the signature above defaults to '')
return: string
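The pattern-based approach can be sketched in Python: the alternation below lists the well-formed UTF-8 byte sequences from the W3C FAQ, with the ASCII branch widened to 0x00-0x7F as the note describes, and any byte that does not start such a sequence is replaced. The exact PCRE expression in utf8.php is not reproduced here, and `bad_replace` is a hypothetical name.

```python
import re

# Well-formed UTF-8 sequences (W3C FAQ pattern, ASCII branch widened to 0x00-0x7F).
VALID = (
    rb"[\x00-\x7F]"
    rb"|[\xC2-\xDF][\x80-\xBF]"           # 2-byte, no overlongs
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"       # 3-byte, excluding overlongs
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"
    rb"|\xED[\x80-\x9F][\x80-\xBF]"       # 3-byte, excluding surrogates
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"    # planes 1-3
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"        # planes 4-15
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2}"    # plane 16
)
# Either a valid sequence (group 1) or any single byte that is not one.
PATTERN = re.compile(rb"(" + VALID + rb")|.", re.DOTALL)


def bad_replace(data: bytes, replace: bytes = b"") -> bytes:
    # Keep valid sequences; replace each bad byte individually.
    return PATTERN.sub(lambda m: m.group(1) if m.group(1) is not None else replace, data)


print(bad_replace(b"caf\xe9 ok", b"?"))  # b'caf? ok'
```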

utf8_correctIdx(&$str,$i,$next=false)

Adjusts a byte index into a UTF-8 string so that it falls on a UTF-8 character boundary

author: Chris Smith <chris@jalakai.co.uk>
param: $str   string   UTF-8 character string
param: $i     int      byte index into $str
param: $next  bool     direction to search for the boundary (false: backward, true: forward)
return: int            byte index into $str, now pointing to a UTF-8 character boundary
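A minimal Python sketch of the boundary adjustment, assuming that `next_=False` moves the index back to the start of the current character and `next_=True` moves it forward to the start of the next one; it simply steps over continuation bytes (10xxxxxx) and clamps the index to the string. `correct_idx` is a hypothetical name.

```python
def correct_idx(data: bytes, i: int, next_: bool = False) -> int:
    """Move byte index i onto a UTF-8 character boundary."""
    if i <= 0:
        return 0
    limit = len(data)
    if i >= limit:
        return limit
    if next_:
        # Step forward over continuation bytes to the start of the next character.
        while i < limit and (data[i] & 0xC0) == 0x80:
            i += 1
    else:
        # Step back over continuation bytes to the start of the current character.
        while i > 0 and (data[i] & 0xC0) == 0x80:
            i -= 1
    return i


word = "aé".encode("utf-8")        # b'a\xc3\xa9': 'é' occupies bytes 1-2
print(correct_idx(word, 2))        # 1
print(correct_idx(word, 2, True))  # 3
```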

Author:	Andreas Gohr <andi@splitbrain.org>
License:	LGPL (http://www.gnu.org/copyleft/lesser.html)
Size:	1224 lines (52 kb)
Included or required:	6 times
Referenced:	0 times
Requires:	0 files

/inc/ -> utf8.php (summary)

Defines 26 functions