HTML Encoding (Character Sets)

HTML Character Sets and Encoding

The HTML charset Attribute

To display an HTML page correctly, a web browser must know which character set to use.

The character set is specified in the <meta> tag:

<meta charset="UTF-8">

The HTML specification encourages web developers to use the UTF-8 character set.

UTF-8 covers almost all of the characters and symbols in the world!
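
Why the declared character set matters can be seen outside the browser as well. The following is a minimal Python sketch (Python is used here only for illustration, and the variable names are hypothetical). It encodes a string as UTF-8 bytes and then decodes the same bytes under two different assumptions, which is roughly what happens when a page is saved as UTF-8 but read as Windows-1252.

```python
# Bytes as they would be stored in a file saved as UTF-8.
text = "caf\u00e9"  # "café"
raw_bytes = text.encode("utf-8")

# Decoding with the matching character set recovers the original text.
decoded_correctly = raw_bytes.decode("utf-8")

# Decoding with a different character set produces garbled text.
# "cp1252" is Python's codec name for Windows-1252.
decoded_wrongly = raw_bytes.decode("cp1252")

print(decoded_correctly)  # café
print(decoded_wrongly)    # cafÃ©
```

The bytes never change; only the interpretation does, which is why the page has to tell the browser which character set to apply.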

The ASCII Character Set

ASCII was the first character encoding standard for the web.

It defined 128 characters (codes 0 to 127), including these printable characters that could be used on the internet:

English letters (a-z and A-Z), for example: A B C D E F G H I J

Numbers (0-9)

Some special characters: ! $ + - ( ) @ < > . # ?
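
As a rough check of the 128-character limit, the sketch below (Python, with a hypothetical sample string) looks up the numeric code of each character and encodes the string as ASCII. It assumes nothing beyond Python's built-in `ord` and `str.encode`.

```python
sample = "Az09!$+-()@<>.#?"

# Every character in the sample has a code below 128.
code_points = [ord(ch) for ch in sample]
all_below_128 = all(cp < 128 for cp in code_points)

# Encoding as ASCII therefore succeeds and uses one byte per character.
ascii_bytes = sample.encode("ascii")

# A character outside ASCII cannot be encoded with it.
try:
    "\u00e9".encode("ascii")  # é
    non_ascii_rejected = False
except UnicodeEncodeError:
    non_ascii_rejected = True

print(ord("A"), ord("a"), ord("0"))  # 65 97 48
```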

The ANSI Character Set

ANSI (Windows-1252) was the first Windows character set:

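Identical to ASCII for the first 128 characters (values 0 to 127)

Special characters (such as € and ™) from 128 to 159

Same characters as the Unicode code points from 160 to 255 (UTF-8 stores each of these as two bytes)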

<meta charset="Windows-1252">
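
The three ranges can be checked with a short Python sketch. It assumes Python's `cp1252` codec as a stand-in for Windows-1252; the variable names are illustrative only.

```python
# Bytes 0 to 127 decode to the same characters as ASCII.
same_as_ascii = all(
    bytes([b]).decode("cp1252") == bytes([b]).decode("ascii")
    for b in range(128)
)

# Two of the special characters in the 128 to 159 range.
euro = bytes([0x80]).decode("cp1252")       # byte value 128
trademark = bytes([0x99]).decode("cp1252")  # byte value 153

# Bytes 160 to 255 decode to the Unicode code point with the same number.
same_as_unicode = all(
    bytes([b]).decode("cp1252") == chr(b) for b in range(160, 256)
)

print(euro, trademark)  # € ™
```

Note that a few byte values in the 128 to 159 range are unassigned in Python's `cp1252` codec, so the sketch decodes only two known ones rather than the whole range.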

The ISO-8859-1 Character Set

The default character set for HTML 4 was ISO-8859-1.

It defined 256 character codes:

Identical to ASCII for the first 128 characters (values 0 to 127)

Does not assign printable characters to the values from 128 to 159

Identical to ANSI, and to the Unicode code points, from 160 to 255

HTML 4 Example

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">

HTML 5 Example

<meta charset="ISO-8859-1">
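
The difference between ISO-8859-1 and ANSI in the 128 to 159 range can be illustrated with one byte. This is a sketch using Python's `iso-8859-1` and `cp1252` codecs, which may not match what a given browser does with the same label.

```python
byte_value = bytes([0x80])  # value 128

as_latin1 = byte_value.decode("iso-8859-1")  # a non-printing control character
as_cp1252 = byte_value.decode("cp1252")      # the euro sign

print(repr(as_latin1))  # '\x80'
print(as_cp1252)        # €

# From 160 to 255 the two character sets agree.
agree_160_255 = all(
    bytes([b]).decode("iso-8859-1") == bytes([b]).decode("cp1252")
    for b in range(160, 256)
)
```

In practice, browsers that follow the WHATWG Encoding Standard treat the `ISO-8859-1` label as Windows-1252, so a page declared this way will usually still show € for byte 128.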

The UTF-8 Character Set

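UTF-8 is an encoding of the Unicode character set. Its code points are:

Identical to ASCII for the values from 0 to 127

Reserved for non-printing control codes from 128 to 159

Same characters as ANSI and ISO-8859-1 from 160 to 255

Continued from the value 256 up to 1,114,111 (U+10FFFF), covering far more than 10 000 characters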

<meta charset="UTF-8">
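
UTF-8 reaches the higher values by using more than one byte per character. The Python sketch below (sample characters chosen for illustration) counts the bytes needed for characters from different ranges.

```python
# One sample character from each UTF-8 length class.
samples = {
    "A": "A",               # code point 65
    "e-acute": "\u00e9",    # é, code point 233
    "euro": "\u20ac",       # €, code point 8364
    "emoji": "\U0001F600",  # 😀, code point 128512
}

# Number of bytes each character needs in UTF-8.
byte_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
print(byte_lengths)  # {'A': 1, 'e-acute': 2, 'euro': 3, 'emoji': 4}

# Values 0 to 127 are stored exactly as in ASCII.
ascii_compatible = "A".encode("utf-8") == "A".encode("ascii")

# é has the same number (233) in ISO-8859-1 and Unicode, but the stored
# bytes differ: one byte in ISO-8859-1, two bytes in UTF-8.
latin1_bytes = "\u00e9".encode("iso-8859-1")  # b'\xe9'
utf8_bytes = "\u00e9".encode("utf-8")         # b'\xc3\xa9'
```

This is why a file must be decoded with the same character set it was saved in, even where the character numbers agree.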

HTML UTF-8 Characters

Basic Latin

ABCD abcd 0123 ?#$%

Latin Extended A

ĀĂĄ ĆĈĊ ĒĔĖĘ

Latin Extended B

ƀƁƂƃƄƅ ƆƇƈ ƉƊƋƌ

Latin Extended C

ⱠⱡⱢ ⱣⱤ ⱥⱦ ⱧⱨⱩ

Latin Extended D

Ꜧꜧ ꜨꜩꜪꜫ ꜬꜭꜮꜯ

Latin Extended E

ꬰꬱ ꬲꬳꬴ ꬵꬶ ꬷꬸꬹ

IPA Extensions

ɖɜɣ ɘɫɛ ɱɷɞ

Spacing Modifiers

pʰ pʱ pʲ pʳ

Diacritical Marks

àáâã èéêẽ òóôõ

General Punctuation

‰ ‱ ⁒ ‼ ⁇ ⁈ ⁉ ⁎ ⁑ ⁂

Super and Subscript

C⁰ Cⁱ C⁴ C⁵ C₆ C₇ C₈

Braille

⠓⠑⠇⠇⠕ ⠺⠕⠗⠇⠙
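
Each character in the groups above has a Unicode code point, and HTML can reference it numerically as `&#number;`. The sketch below is illustrative only: `BLOCKS` is a hand-written subset of block ranges and `block_of` is a hypothetical helper, not part of any library. It prints the code point, the numeric reference, and the block for a few of the sample characters.

```python
import html

# Hand-written subset of Unicode block ranges (assumption: only the
# blocks needed for this example are listed).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0250, 0x02AF, "IPA Extensions"),
    (0x2800, 0x28FF, "Braille Patterns"),
]


def block_of(ch: str) -> str:
    """Return the block name for a single character, if it is in BLOCKS."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "not in this sample table"


for ch in ["A", "\u0100", "\u0180", "\u0256", "\u2813"]:  # A Ā ƀ ɖ ⠓
    reference = f"&#{ord(ch)};"
    # html.unescape turns the numeric reference back into the character.
    assert html.unescape(reference) == ch
    print(ch, f"U+{ord(ch):04X}", reference, block_of(ch))
```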