Parsing Pipeline - from Bytes to DOM and CSSOM

The parsing pipeline is the process of transforming HTML and CSS bytes into the data structures (DOM and CSSOM) that the browser works with. Understanding it is critical for optimizing page load performance, because both HTML parsing and stylesheet loading sit on the path to the first render.

HTML Parsing: from Bytes to DOM

Stage 1: Byte Stream → Character Stream

Bytes:     3C 68 31 3E ...
           ↓ (Character Encoding)
Characters: <h1>Hello</h1>
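
As a minimal sketch of this stage, the four bytes from the diagram can be decoded in Python (used here only for illustration; browsers do this in native code). The encoding is assumed to be UTF-8.

```python
# The four bytes from the diagram, written as hex.
raw = bytes.fromhex("3C 68 31 3E")

# Bytes carry no meaning until an encoding is chosen; UTF-8 is assumed here.
text = raw.decode("utf-8")

print(text)  # prints <h1>
```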

Character Encoding Detection (in order of precedence):

  1. BOM (Byte Order Mark)
  2. HTTP Content-Type header: charset=utf-8
  3. Meta tag: <meta charset="utf-8"> (must appear within the first 1024 bytes of the document)
  4. Fallback: auto-detection
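
The precedence above could be sketched roughly as follows. This is an illustration, not the algorithm from the HTML standard: the function name sniff_encoding, the regular expressions, and the windows-1252 default in step 4 are assumptions made for the example, and real browsers use heuristics where this sketch uses a fixed default.

```python
import codecs
import re
from typing import Optional

# Step 1: byte order marks, checked before anything else.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

_CHARSET_IN_HEADER = re.compile(r'charset\s*=\s*"?([\w-]+)', re.IGNORECASE)
_CHARSET_IN_META = re.compile(rb'''<meta[^>]*charset\s*=\s*["']?([\w-]+)''', re.IGNORECASE)


def sniff_encoding(raw: bytes, content_type: Optional[str] = None) -> str:
    """Pick an encoding name using the precedence listed above (simplified)."""
    # 1. BOM
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    # 2. HTTP Content-Type header
    if content_type:
        match = _CHARSET_IN_HEADER.search(content_type)
        if match:
            return match.group(1).lower()
    # 3. <meta charset>, searched only in the first 1024 bytes
    match = _CHARSET_IN_META.search(raw[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Fallback: a fixed default stands in for real auto-detection (assumption).
    return "windows-1252"


print(sniff_encoding(b'<meta charset="utf-8"><h1>Hello</h1>'))  # prints utf-8
```

Note that the BOM wins even when the header disagrees, which is why step 1 returns before the header is inspected.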

Stage 2: Tokenization (Lexical Analysis)

The HTML parser converts the character stream into tokens:

<div class="container">
  <h1>Title</h1>
  <p>Text</p>
</div>

Tokens:

StartTag: div (attributes: class="container")
StartTag: h1
Character: Title
EndTag: h1
StartTag: p
Character: Text
EndTag: p
EndTag: div
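
The token listing above can be reproduced with Python's standard html.parser module. This is only an illustration: html.parser is not the tokenizer defined by the HTML standard, and the TokenLogger class name is made up for the example.

```python
from html.parser import HTMLParser


class TokenLogger(HTMLParser):
    """Records the tokens the parser reports, in document order."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if attrs:
            rendered = " ".join(f'{name}="{value}"' for name, value in attrs)
            self.tokens.append(f"StartTag: {tag} (attributes: {rendered})")
        else:
            self.tokens.append(f"StartTag: {tag}")

    def handle_endtag(self, tag):
        self.tokens.append(f"EndTag: {tag}")

    def handle_data(self, data):
        # Whitespace-only text between tags is skipped to keep the listing short;
        # a real tokenizer emits character tokens for it as well.
        if data.strip():
            self.tokens.append(f"Character: {data}")


markup = """<div class="container">
  <h1>Title</h1>
  <p>Text</p>
</div>"""

logger = TokenLogger()
logger.feed(markup)
logger.close()

for token in logger.tokens:
    print(token)
```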

Stage 3: Tree Construction

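Tokens are transformed into DOM nodes and the DOM tree is built. The tree builder keeps a stack of open elements: a start tag creates a node, appends it to the element on top of the stack and pushes it; character data becomes a text node under the current element; an end tag pops the stack.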
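
A minimal sketch of tree construction with a stack of open elements follows. It assumes well-formed input and skips the error recovery that real HTML parsers perform; the Node, build_tree and outline names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    name: str                      # tag name, or "#text" for character data
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def build_tree(tokens: List[Tuple[str, str]]) -> Node:
    """Build a tree from (kind, value) tokens using a stack of open elements."""
    root = Node("#document")
    open_elements = [root]
    for kind, value in tokens:
        parent = open_elements[-1]
        if kind == "StartTag":
            node = Node(value)
            parent.children.append(node)
            open_elements.append(node)   # stays open until its end tag arrives
        elif kind == "Character":
            parent.children.append(Node("#text", text=value))
        elif kind == "EndTag":
            # Only a matching end tag closes the current element in this sketch.
            if len(open_elements) > 1 and open_elements[-1].name == value:
                open_elements.pop()
    return root


def outline(node: Node, depth: int = 0) -> List[str]:
    """Return one indented line per node, for printing."""
    label = f'"{node.text}"' if node.name == "#text" else node.name
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


tokens = [
    ("StartTag", "div"), ("StartTag", "h1"), ("Character", "Title"), ("EndTag", "h1"),
    ("StartTag", "p"), ("Character", "Text"), ("EndTag", "p"), ("EndTag", "div"),
]
tree = build_tree(tokens)
print("\n".join(outline(tree)))
```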

Preload Scanner — Critical Optimization

The Preload Scanner runs alongside the main HTML parser. It scans ahead in the raw markup and starts fetching resources before the parser reaches them:

<html>
<head>
  <!-- Parser here -->
  <link rel="stylesheet" href="style.css">
  <script src="app.js"></script>
</head>
<body>
  <img src="hero.jpg"> <!-- Preload Scanner already found this! -->

What Preload Scanner finds:

  • <link rel="stylesheet">
  • <script src>
  • <img src>
  • <link rel="preload">
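
The idea can be sketched as a pass that only collects URLs and never builds a tree. This is a simplified illustration using Python's html.parser; the PreloadScanner class here is hypothetical and covers only the four cases listed above.

```python
from html.parser import HTMLParser


class PreloadScanner(HTMLParser):
    """Collects URLs worth fetching early, without building any tree."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link":
            rel = (attributes.get("rel") or "").lower().split()
            if ("stylesheet" in rel or "preload" in rel) and attributes.get("href"):
                self.urls.append(attributes["href"])
        elif tag in ("script", "img") and attributes.get("src"):
            self.urls.append(attributes["src"])


markup = """<html>
<head>
  <link rel="stylesheet" href="style.css">
  <script src="app.js"></script>
</head>
<body>
  <img src="hero.jpg">
"""

scanner = PreloadScanner()
scanner.feed(markup)
print(scanner.urls)  # prints ['style.css', 'app.js', 'hero.jpg']
```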

CSS Parsing: CSSOM Construction

Stage 1: CSS Tokenization

body {
  color: blue;
  font-size: 16px;
}

Tokens (simplified; a real CSS tokenizer also emits tokens for braces, colons and semicolons):

Selector: body
Property: color
Value: blue
Property: font-size
Value: 16px
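
The simplified tokens listed above can be produced by a few lines of Python. This sketch handles a single flat rule only; the tokenize_rule function is hypothetical, and splitting on semicolons would break on values that contain them (for example inside strings), which a real tokenizer handles.

```python
import re


def tokenize_rule(css: str):
    """Split one flat 'selector { declarations }' rule into simplified tokens."""
    match = re.fullmatch(r"\s*([^{}]+?)\s*\{([^{}]*)\}\s*", css)
    if match is None:
        raise ValueError("expected a single 'selector { declarations }' rule")
    selector, body = match.groups()
    tokens = [("Selector", selector)]
    for declaration in body.split(";"):
        if not declaration.strip():
            continue  # skip the empty piece after the final semicolon
        prop, _, value = declaration.partition(":")
        tokens.append(("Property", prop.strip()))
        tokens.append(("Value", value.strip()))
    return tokens


css = """body {
  color: blue;
  font-size: 16px;
}"""

for kind, value in tokenize_rule(css):
    print(f"{kind}: {value}")
```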

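Stage 2: CSSOM Construction

The parsed rules are assembled into the CSSOM: a tree of style sheets, rules and declarations that the browser uses to compute the style of each element.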
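
A rough sketch of this stage groups the simplified tokens from Stage 1 into a sheet of rules, each with its declarations. The StyleSheet and StyleRule classes are hypothetical stand-ins for the real CSSOM interfaces, which are far richer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StyleRule:
    selector: str
    declarations: Dict[str, str] = field(default_factory=dict)


@dataclass
class StyleSheet:
    rules: List[StyleRule] = field(default_factory=list)


def build_cssom(tokens) -> StyleSheet:
    """Group (kind, value) tokens into rules and their declarations."""
    sheet = StyleSheet()
    current = None
    pending_property = None
    for kind, value in tokens:
        if kind == "Selector":
            current = StyleRule(value)
            sheet.rules.append(current)
        elif kind == "Property":
            pending_property = value
        elif kind == "Value" and current is not None and pending_property is not None:
            # A repeated property inside one rule overwrites the earlier value.
            current.declarations[pending_property] = value
            pending_property = None
    return sheet


tokens = [
    ("Selector", "body"),
    ("Property", "color"), ("Value", "blue"),
    ("Property", "font-size"), ("Value", "16px"),
]
sheet = build_cssom(tokens)
print(sheet.rules[0].selector, sheet.rules[0].declarations)
```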

CSS Blocking

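CSS blocks rendering! By default the browser does not paint anything until every stylesheet that applies to the page has been downloaded and parsed into the CSSOM: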

<head>
  <link rel="stylesheet" href="style.css"> <!-- Blocks! -->
</head>
<body>
  <!-- Content won't render until CSS loads -->

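Solution: Media Queries

A stylesheet blocks rendering only when its media attribute matches the current environment. Non-matching stylesheets are still downloaded, but they do not hold up the first paint.

<link rel="stylesheet" href="print.css" media="print"> <!-- Doesn't block rendering on screen -->
<link rel="stylesheet" href="mobile.css" media="(max-width: 600px)"> <!-- Blocks only on viewports 600px wide or narrower -->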
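
The decision can be sketched as a small predicate. This is a deliberately rough model: the is_render_blocking function is hypothetical, it understands only the media values used in this section, and anything it does not recognize is treated as blocking.

```python
import re
from typing import Optional

_MAX_WIDTH = re.compile(r"\(\s*max-width\s*:\s*(\d+)px\s*\)")


def is_render_blocking(media: Optional[str], viewport_width: int) -> bool:
    """Rough check: does a stylesheet with this media attribute block painting on a screen?"""
    query = (media or "").strip().lower()
    if query in ("", "all", "screen"):
        return True           # no media attribute, or one that always matches a screen
    if query == "print":
        return False          # never matches while rendering to a screen
    match = _MAX_WIDTH.fullmatch(query)
    if match:
        return viewport_width <= int(match.group(1))
    return True               # unknown query: assume blocking, to be safe


print(is_render_blocking("print", 1024))               # prints False
print(is_render_blocking("(max-width: 600px)", 480))   # prints True
print(is_render_blocking("(max-width: 600px)", 1024))  # prints False
```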

Render Tree Construction

DOM + CSSOM = Render Tree

<div style="display:none">Hidden</div>
<div class="visible">Visible</div>

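The Render Tree contains only elements that generate boxes on the page:

  • display: none is left out of the Render Tree, together with its descendants
  • visibility: hidden stays in the Render Tree (invisible, but still takes up space)
  • <head>, <script>, <meta> are not in the Render Tree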
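
The filtering rules above can be sketched as a recursive copy of the DOM that drops subtrees which never render. The Element class and to_render_tree function are hypothetical, and styles are given directly on each element rather than resolved from a CSSOM.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Tags from the list above that never produce boxes.
NEVER_RENDERED = {"head", "script", "meta"}


@dataclass
class Element:
    tag: str
    style: Dict[str, str] = field(default_factory=dict)
    children: List["Element"] = field(default_factory=list)


def to_render_tree(element: Element) -> Optional[Element]:
    """Copy the tree, dropping elements (and their subtrees) that are not rendered."""
    if element.tag in NEVER_RENDERED or element.style.get("display") == "none":
        return None
    kept = [r for r in map(to_render_tree, element.children) if r is not None]
    return Element(element.tag, dict(element.style), kept)


dom = Element("html", children=[
    Element("head", children=[Element("meta")]),
    Element("body", children=[
        Element("div", {"display": "none"}),
        Element("div", {"visibility": "hidden"}),
        Element("div"),
    ]),
])

render_tree = to_render_tree(dom)
body = render_tree.children[0]
print([child.style for child in body.children])  # prints [{'visibility': 'hidden'}, {}]
```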

Speculative Parsing

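Modern browsers use speculative parsing to keep the network busy while a script blocks the main parser. It is essentially the same mechanism as the Preload Scanner above, under a different name: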

<script src="slow.js"></script> <!-- Blocks parsing -->
<img src="image1.jpg">
<img src="image2.jpg">

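Without Speculative Parsing:

  1. Parsing stops at <script>
  2. The browser waits for the script to load and execute
  3. Parsing continues, and only now are the images discovered

With Speculative Parsing:

  1. Parsing stops at <script>
  2. A speculative pass keeps scanning the markup after the script, without touching the DOM
  3. It finds image1.jpg and image2.jpg and starts loading them right away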
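
The difference can be put into numbers with a toy timing model. Everything here is an assumption made for illustration: the images_ready_at function is hypothetical, the 300 ms and 200 ms figures are invented, both images are assumed to download in parallel, and script_ms covers both fetching and executing the script.

```python
def images_ready_at(script_ms: int, image_ms: int, speculative: bool) -> int:
    """Milliseconds after the <script> tag at which the images can be shown."""
    if speculative:
        # Image downloads overlap with the script; both must finish.
        return max(script_ms, image_ms)
    # Images are requested only after the script has loaded and run.
    return script_ms + image_ms


print(images_ready_at(300, 200, speculative=False))  # prints 500
print(images_ready_at(300, 200, speculative=True))   # prints 300
```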

Performance Best Practices

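  1. Minimize CSS. CSS blocks rendering, so keep stylesheets small and inline the critical CSS needed for above-the-fold content.
  2. Use async/defer for scripts. A plain <script src> blocks parsing. <script defer> downloads in parallel and runs after parsing finishes; <script async> also downloads in parallel but runs as soon as it arrives, which can still interrupt the parser.
  3. Help the Preload Scanner. Use <link rel="preload"> for critical resources that it cannot discover from the markup, such as fonts or images referenced only from CSS.
  4. Avoid document.write(). Markup injected this way can invalidate what the speculative parser has already scanned, so that work has to be thrown away.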
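
The script advice can be checked mechanically. The sketch below lists external scripts that would pause the parser, using Python's html.parser; the BlockingScriptFinder class and the file names ui.js and stats.js are hypothetical.

```python
from html.parser import HTMLParser


class BlockingScriptFinder(HTMLParser):
    """Lists external scripts that lack async, defer and type="module"."""

    def __init__(self):
        super().__init__()
        self.blocking = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attributes = dict(attrs)  # boolean attributes appear with a value of None
        is_module = (attributes.get("type") or "").lower() == "module"
        non_blocking = "defer" in attributes or "async" in attributes or is_module
        if attributes.get("src") and not non_blocking:
            self.blocking.append(attributes["src"])


finder = BlockingScriptFinder()
finder.feed(
    '<script src="app.js"></script>'
    '<script defer src="ui.js"></script>'
    '<script async src="stats.js"></script>'
)
print(finder.blocking)  # prints ['app.js']
```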

Summary:

The parsing pipeline is a multi-stage process with several built-in optimizations (Preload Scanner, Speculative Parsing). Understanding it helps you write HTML and CSS that load faster.