Parsing Pipeline - from Bytes to DOM and CSSOM
The parsing pipeline is the process of transforming HTML and CSS bytes into the data structures (DOM and CSSOM) that the browser works with. Understanding it is critical for optimizing page load performance.
HTML Parsing: from Bytes to DOM
Stage 1: Byte Stream → Character Stream
Bytes: 3C 68 31 3E ...
↓ (Character Encoding)
Characters: <h1>Hello</h1>
Character Encoding Detection:
- BOM (Byte Order Mark)
- HTTP Content-Type header: charset=utf-8
- Meta tag: <meta charset="utf-8">
- Fallback: auto-detection
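The detection order above can be sketched in a few lines of Python. This is a simplified illustration, not the browser's actual sniffing algorithm: the function name `sniff_encoding`, the 1024-byte scan window for the meta tag, and the fixed `windows-1252` fallback are assumptions made for the example (real browsers use locale-dependent heuristics for the last step).

```python
import codecs
import re
from typing import Optional

# Byte Order Marks checked first, in the order listed above.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_encoding(data: bytes, content_type: Optional[str] = None) -> str:
    """Return a best-guess encoding name for an HTML byte stream."""
    # 1. BOM at the very start of the stream
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # 2. HTTP Content-Type header, e.g. "text/html; charset=utf-8"
    if content_type:
        match = re.search(r"charset=([\w-]+)", content_type, re.I)
        if match:
            return match.group(1).lower()
    # 3. <meta charset="..."> near the start of the document
    match = re.search(rb'<meta\s+charset=["\']?([\w-]+)', data[:1024], re.I)
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Fallback: a fixed default stands in for auto-detection (assumption)
    return "windows-1252"


# The bytes from Stage 1 decode to the start of the <h1> tag.
raw = bytes.fromhex("3C68313E")
text = raw.decode(sniff_encoding(raw, "text/html; charset=utf-8"))
```

Here `text` holds the four characters of the opening `<h1>` tag, which is the byte-to-character step shown at the top of this stage.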
Stage 2: Tokenization (Lexical Analysis)
The HTML parser converts characters into tokens:
<div class="container">
<h1>Title</h1>
<p>Text</p>
</div>
Tokens:
StartTag: div (attributes: class="container")
StartTag: h1
Character: Title
EndTag: h1
StartTag: p
Character: Text
EndTag: p
EndTag: div
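The same token stream can be reproduced with Python's standard `html.parser` module, which reports start tags, end tags, and character data as it reads the markup. This is only a rough stand-in for a browser tokenizer; the class name `TokenLogger` and the helper `tokenize` are made up for this sketch, and whitespace-only text between tags is dropped to match the listing above.

```python
from html.parser import HTMLParser


class TokenLogger(HTMLParser):
    """Records the tokens the parser reports while reading HTML."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("StartTag", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.tokens.append(("EndTag", tag))

    def handle_data(self, data):
        # Skip the newlines and indentation between tags.
        if data.strip():
            self.tokens.append(("Character", data.strip()))


def tokenize(html):
    parser = TokenLogger()
    parser.feed(html)
    parser.close()
    return parser.tokens


html = '<div class="container">\n<h1>Title</h1>\n<p>Text</p>\n</div>'
for token in tokenize(html):
    print(token)
```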
Stage 3: Tree Construction
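Tokens are transformed into DOM nodes, and the DOM tree is built from them. Each start tag opens a new element under the element that is currently open, and each end tag closes it, so the nesting of the tags becomes the nesting of the tree.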
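A minimal sketch of tree construction, assuming a stack of open elements: a start tag appends a node to the element on top of the stack and pushes it, and an end tag pops back to the matching element. The names `TreeBuilder`, `build_tree`, and `outline`, the dictionary node shape, and the short `VOID` list are all illustrative; a real parser also handles error recovery and many special cases.

```python
from html.parser import HTMLParser

# Elements that never have children or an end tag (partial list, for the sketch).
VOID = {"img", "br", "meta", "link", "input", "hr"}


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = {"name": "#document", "children": []}
        self.stack = [self.root]  # stack of open elements

    def handle_starttag(self, tag, attrs):
        node = {"name": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element (and anything left open inside it).
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["name"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1]["children"].append({"name": "#text", "text": data.strip()})


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as indented lines, one per node."""
    label = '"' + node["text"] + '"' if node["name"] == "#text" else node["name"]
    lines = ["  " * depth + label]
    for child in node.get("children", []):
        lines.extend(outline(child, depth + 1))
    return lines


tree = build_tree('<div class="container">\n<h1>Title</h1>\n<p>Text</p>\n</div>')
print("\n".join(outline(tree)))
```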
Preload Scanner — Critical Optimization
The Preload Scanner works in parallel with the HTML parser and starts fetching resources before the parser reaches them:
<html>
<head>
<!-- Parser here -->
<link rel="stylesheet" href="style.css">
<script src="app.js"></script>
</head>
<body>
<img src="hero.jpg"> <!-- Preload Scanner already found this! -->
What Preload Scanner finds:
- <link rel="stylesheet">
- <script src>
- <img src>
- <link rel="preload">
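To make the idea concrete, here is a small sketch of a scanner that collects resource URLs without building a tree. It only looks at the four patterns listed above; the class name `PreloadScanner` and the helper `scan` are hypothetical, and an actual browser scanner also considers attributes such as `srcset` and fetch priorities.

```python
from html.parser import HTMLParser


class PreloadScanner(HTMLParser):
    """Collects URLs worth fetching early; it never builds a DOM."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and attributes.get("href"):
            rel = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel or "preload" in rel:
                self.urls.append(attributes["href"])
        elif tag in ("script", "img") and attributes.get("src"):
            self.urls.append(attributes["src"])


def scan(html):
    scanner = PreloadScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.urls


page = """<html>
<head>
<link rel="stylesheet" href="style.css">
<script src="app.js"></script>
</head>
<body>
<img src="hero.jpg">
"""
print(scan(page))
```

For the example page, the scan yields `style.css`, `app.js`, and `hero.jpg` in document order.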
CSS Parsing: CSSOM Construction
Stage 1: CSS Tokenization
body {
color: blue;
font-size: 16px;
}
Tokens:
Selector: body
Property: color
Value: blue
Property: font-size
Value: 16px
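The token listing above can be reproduced with a very small regex-based splitter. It mirrors the simplified Selector/Property/Value view used here rather than the lower-level tokens a real CSS tokenizer emits, and it ignores comments, at-rules, and nested blocks. The function name `tokenize_css` is an assumption of this sketch.

```python
import re

# One flat rule: a selector, then a declaration block in braces.
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def tokenize_css(css):
    tokens = []
    for selector, body in RULE.findall(css):
        tokens.append(("Selector", selector.strip()))
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                tokens.append(("Property", prop.strip()))
                tokens.append(("Value", value.strip()))
    return tokens


css = "body {\n  color: blue;\n  font-size: 16px;\n}"
for token in tokenize_css(css):
    print(token)
```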
Stage 2: CSSOM Construction
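As a rough illustration of what the CSSOM holds for the `body` rule from Stage 1, the sketch below groups declarations under their selector. The key names `cssRules`, `selectorText`, and `style` are borrowed from the browser's CSSOM interfaces, but the plain dictionaries and the function `build_cssom` are assumptions made for this example.

```python
import re


def build_cssom(css):
    """Build a nested structure: stylesheet -> rules -> declarations."""
    rules = []
    for selector, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css):
        style = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                # A later declaration of the same property replaces an earlier one.
                style[prop.strip()] = value.strip()
        rules.append({"selectorText": selector.strip(), "style": style})
    return {"cssRules": rules}


sheet = build_cssom("body {\n  color: blue;\n  font-size: 16px;\n}")
print(sheet)
```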
CSS Blocking
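CSS blocks rendering: the browser does not paint content until the stylesheets that apply to the page have been loaded and the CSSOM has been built.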
<head>
<link rel="stylesheet" href="style.css"> <!-- Blocks! -->
</head>
<body>
<!-- Content won't render until CSS loads -->
Solution — Media Queries:
<link rel="stylesheet" href="print.css" media="print"> <!-- Doesn't block screen -->
<link rel="stylesheet" href="mobile.css" media="(max-width: 600px)"> <!-- Blocks only when the query matches -->
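The rule behind this can be sketched as a small function: a stylesheet blocks rendering only if its `media` attribute matches the current environment. The sketch below understands just the forms used above (`print`, `screen`, `all`, and a single `max-width`/`min-width` in pixels); the function name and parameters are hypothetical, and anything it does not recognize is treated as blocking.

```python
import re


def is_render_blocking(media, viewport_width, medium="screen"):
    """Decide whether a <link rel="stylesheet"> with this media attribute blocks rendering."""
    media = (media or "all").strip().lower()
    if media in ("all", medium):
        return True
    if media in ("print", "screen"):
        # A media type other than the one being rendered.
        return False
    match = re.fullmatch(r"\((max|min)-width:\s*(\d+)px\)", media)
    if match:
        limit = int(match.group(2))
        if match.group(1) == "max":
            return viewport_width <= limit
        return viewport_width >= limit
    # Unknown query: assume it applies.
    return True


for media in (None, "print", "(max-width: 600px)"):
    print(media, is_render_blocking(media, viewport_width=1024))
```

On a 1024px-wide screen only the stylesheet without a `media` attribute blocks; on a 375px-wide screen `mobile.css` blocks as well. Non-matching stylesheets are still downloaded, just without holding up the first render.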
Render Tree Construction
DOM + CSSOM = Render Tree
<div style="display:none">Hidden</div>
<div class="visible">Visible</div>
The Render Tree contains only elements that generate a box on the page:
- display: none is not in the Render Tree
- visibility: hidden is in the Render Tree (it still takes space)
- <head>, <script>, <meta> are not in the Render Tree
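The filtering rule can be sketched as a recursive walk over a DOM-like structure in which each node already carries its computed style. The dictionary node shape and the function `build_render_tree` are assumptions of this example; a real engine resolves styles from the CSSOM first and creates layout objects rather than copies of DOM nodes.

```python
def build_render_tree(node):
    """Return a copy of node without display:none subtrees, or None if node itself is hidden."""
    style = node.get("style", {})
    if style.get("display") == "none":
        return None
    children = [build_render_tree(child) for child in node.get("children", [])]
    return {
        "name": node["name"],
        "style": style,
        "children": [child for child in children if child is not None],
    }


dom = {
    "name": "body",
    "children": [
        {"name": "div", "style": {"display": "none"}},
        {"name": "div", "style": {"visibility": "hidden"}},
        {"name": "div", "style": {}},
    ],
}
render_tree = build_render_tree(dom)
print([child["style"] for child in render_tree["children"]])
```

The `display: none` element disappears from the result, while the `visibility: hidden` element stays because it still occupies space in the layout.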
Speculative Parsing
Modern browsers use speculative parsing:
<script src="slow.js"></script> <!-- Blocks parsing -->
<img src="image1.jpg">
<img src="image2.jpg">
Without Speculative Parsing:
- Parsing stops at <script>
- Wait for loading and execution
- Continue parsing
With Speculative Parsing:
- Parsing stops at <script>
- But speculative thread continues parsing
- Finds image1.jpg, image2.jpg and starts loading!
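A toy timing model shows why this matters. It assumes the script takes a fixed time to load and execute, the images download fully in parallel, and parsing itself is instantaneous; the function name and all numbers are invented for illustration.

```python
def page_load_ms(script_ms, image_ms, speculative):
    """Estimate when the last resource finishes, in milliseconds.

    script_ms:   time to load and execute the blocking script
    image_ms:    list of download times for the images after the script
    speculative: whether images are discovered while the script blocks
    """
    slowest_image = max(image_ms)
    if speculative:
        # Images start at time 0, in parallel with the script.
        return max(script_ms, slowest_image)
    # Images are discovered only after the script has finished.
    return script_ms + slowest_image


without = page_load_ms(500, [300, 400], speculative=False)
with_speculation = page_load_ms(500, [300, 400], speculative=True)
print(without, with_speculation)
```

Under these assumptions the page finishes at 900 ms without speculation and at 500 ms with it, because the image downloads overlap with the script instead of queuing behind it.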
Performance Best Practices
Minimize CSS
CSS blocks rendering, so keep render-blocking stylesheets small. Inline the critical CSS needed for above-the-fold content and load the rest later.
Use async/defer for scripts
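A plain <script src> blocks parsing while it downloads and executes. <script defer> downloads in parallel and runs after parsing finishes; <script async> downloads in parallel and runs as soon as it arrives.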
Help Preload Scanner
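Use <link rel="preload"> for critical resources that the Preload Scanner cannot discover from the markup, such as fonts or images referenced only from CSS.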
Avoid document.write()
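document.write() can inject markup into the stream the parser is reading, so work already done by Speculative Parsing may have to be thrown away.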
Summary:
The parsing pipeline is a multi-stage process with several built-in optimizations (Preload Scanner, Speculative Parsing). Understanding it helps you write HTML and CSS that load faster.