<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://muthuramkumars.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://muthuramkumars.github.io/" rel="alternate" type="text/html" /><updated>2026-03-15T07:01:54+00:00</updated><id>https://muthuramkumars.github.io/feed.xml</id><title type="html">Muthu Ramkumar</title><subtitle>AI Engineer | LLM Systems | RAG | Multi-Agent Architectures</subtitle><entry><title type="html">Weaviate Deep Dive</title><link href="https://muthuramkumars.github.io/blog/weaviate-deep-dive/" rel="alternate" type="text/html" title="Weaviate Deep Dive" /><published>2026-03-11T00:00:00+00:00</published><updated>2026-03-11T00:00:00+00:00</updated><id>https://muthuramkumars.github.io/blog/weaviate-deep-dive</id><content type="html" xml:base="https://muthuramkumars.github.io/blog/weaviate-deep-dive/"><![CDATA[<h1 id="deep-dive-how-weaviate-really-works-under-the-hood">Deep Dive: How Weaviate Really Works Under the Hood</h1>

<h3 id="from-filtered-vector-search-to-storage-engines--a-complete-technical-walkthrough-from-ingestion-to-retrieval">From filtered vector search to storage engines — a complete technical walkthrough from ingestion to retrieval</h3>

<hr />

<p>Vector databases are often treated as black boxes. You push embeddings in, fire a query, and results come back. For most use cases, that abstraction holds. But the moment you start tuning for performance, debugging unexpected recall drops, designing high-throughput ingestion pipelines, or reasoning about what happens during a crash — the black box stops being enough.</p>

<p>This article peels back every layer of Weaviate’s internals. We start from how filtered vector search actually works, move through how the HNSW graph is constructed and how nodes decide their connections, and go all the way down to the storage engines, the inverted index data structures, and the crash recovery mechanism sitting beneath all of it. Each section builds on the last, so by the end you’ll have a complete mental model of the entire system — from a single write call to the bytes on disk.</p>

<hr />

<h2 id="filtered-vector-search--how-it-actually-works">Filtered Vector Search — How It Actually Works</h2>

<p>A common misconception about Weaviate concerns how it handles filtered vector search. The intuitive mental model (traverse the vector index to find nearest neighbors, then filter the results) is the wrong one, and understanding why it fails is the starting point for understanding what Weaviate does instead.</p>

<h3 id="why-post-filtering-breaks">Why Post-Filtering Breaks</h3>

<p>HNSW (Hierarchical Navigable Small World) is a graph-based approximate nearest neighbor index. It knows nothing about your metadata filters. It only understands vector distances. So if you run HNSW first and filter afterward, you face a fundamental problem: the number of results HNSW returns is bounded by the <code class="language-plaintext highlighter-rouge">ef</code> parameter (typically 64–128). If your filter only matches 0.5% of your dataset, there’s no guarantee that any of those matching objects will appear in the top-64 nearest neighbors. You could traverse thousands of nodes and return zero useful results.</p>

<p>This isn’t a Weaviate-specific problem — it’s a fundamental tension between approximate nearest neighbor search and arbitrary metadata filtering.</p>
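<p>To put a rough number on that risk, here is a back-of-the-envelope sketch. It assumes (a simplification of mine, not something Weaviate computes) that filter matches are spread independently of vector distance, so each of the <code class="language-plaintext highlighter-rouge">ef</code> returned neighbors passes the filter with probability equal to the filter's selectivity. The function names are illustrative.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def chance_of_no_match(selectivity, ef):
    """Probability that none of the ef returned neighbors pass the filter,
    assuming matches are independent of vector distance (a simplification)."""
    return (1.0 - selectivity) ** ef

def expected_matches(selectivity, ef):
    """Average number of filter matches among the ef returned neighbors."""
    return selectivity * ef

p_empty = chance_of_no_match(0.005, 64)   # 0.5% selectivity, ef = 64
avg_hits = expected_matches(0.005, 64)
print(f"empty result probability: {p_empty:.2f}, expected hits: {avg_hits:.2f}")
</code></pre></div></div>

<p>Under that independence assumption, a 0.5% filter with <code class="language-plaintext highlighter-rouge">ef = 64</code> comes back empty roughly 73% of the time. Real data is rarely independent, so treat this as an order-of-magnitude illustration only.</p>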

<h3 id="pre-filtering-with-an-allowlist">Pre-Filtering with an Allowlist</h3>

<p>Weaviate’s primary strategy inverts the order entirely. Before touching the HNSW graph, Weaviate resolves the filter by querying its inverted index — a separate index that maps scalar property values to the set of object IDs that match. This produces an <strong>allowlist</strong>: a set of IDs that are valid candidates for the final result.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Step 1: Query the inverted index
        filter: category == "electronics"
        → allowlist = {id3, id7, id19, id45, id102, ...}

Step 2: HNSW traversal with allowlist masking
        during graph traversal, when the algorithm wants
        to visit a node → check if its ID is in the allowlist
        if NOT in allowlist → skip and continue traversal

Step 3: Return top-K from the allowed nodes only
</code></pre></div></div>

<p>The graph topology doesn’t change. HNSW still navigates the same edges it always does. The allowlist acts as a mask on the result set: nodes outside it can still serve as stepping stones during traversal, but they are never added to the results. The inverted index lookup itself is a key-value operation that finishes before HNSW traversal begins and is usually cheap compared with the traversal.</p>
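<p>The masking idea can be sketched on a single layer as follows. This is a simplified best-first search of my own, not Weaviate's code: it assumes that disallowed nodes may still be expanded as stepping stones and that only allowlisted nodes are kept as results. The function name, the toy graph, and the parameters are all illustrative.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import heapq
import math

def masked_search(graph, vectors, entry, query, allow, k, ef):
    """Best-first search over one layer; only allowlisted ids are returned."""
    def dist(node):
        return math.dist(vectors[node], query)

    visited = {entry}
    frontier = [(dist(entry), entry)]      # min-heap of nodes still to expand
    results = []                           # max-heap (negated distance), capped at ef
    if entry in allow:
        results.append((-dist(entry), entry))

    while frontier:
        d, node = heapq.heappop(frontier)
        # stop when the closest unexpanded node is worse than the worst kept result
        if len(results) >= ef and d > -results[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            heapq.heappush(frontier, (dist(nb), nb))   # any node may be a stepping stone
            if nb in allow:                            # but only allowed nodes are results
                heapq.heappush(results, (-dist(nb), nb))
                if len(results) > ef:
                    heapq.heappop(results)             # drop the farthest result
    ranked = sorted((-neg, node) for neg, node in results)
    return [node for _, node in ranked[:k]]

# Toy index: six points on a line, each linked to its immediate neighbors.
vectors = {i: (float(i),) for i in range(6)}
graph = {i: [j for j in (i - 1, i + 1) if j in vectors] for i in vectors}
top = masked_search(graph, vectors, entry=0, query=(4.2,), allow={0, 1, 5}, k=1, ef=2)
print(top)
</code></pre></div></div>

<p>In this toy run the search has to cross nodes 2, 3, and 4, none of which are allowed, before it reaches node 5. If disallowed nodes were removed from the graph outright, node 5 would be unreachable from the entry point.</p>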

<h3 id="the-flat-search-fallback">The Flat Search Fallback</h3>

<p>Allowlist masking works well when the filter is loose — when many objects pass the filter, the masked graph is dense enough for HNSW to navigate efficiently. But when the filter is very selective, the masked graph becomes so sparse that HNSW’s navigational structure breaks down. The graph’s edges were built to reflect global vector proximity, not the topology of your filtered subset. With only 50 matching objects out of 1 million, HNSW might wander for hundreds of hops before landing on an allowed node, and its termination conditions might fire prematurely.</p>

<p>Weaviate handles this with an automatic flat search fallback. When the allowlist is small enough — below a configurable threshold — Weaviate skips HNSW entirely and performs a brute-force scan directly over the allowlist. With only 50 candidates, brute-force distance computation is trivially fast.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">vectorIndexConfig</span><span class="pi">:</span>
  <span class="na">flatSearchCutoff</span><span class="pi">:</span> <span class="m">40000</span>   <span class="c1"># default</span>
</code></pre></div></div>

<p>If <code class="language-plaintext highlighter-rouge">allowlist size &lt; flatSearchCutoff</code> → brute-force flat search over allowlist.
If <code class="language-plaintext highlighter-rouge">allowlist size &gt;= flatSearchCutoff</code> → HNSW traversal with allowlist masking.</p>

<table>
  <thead>
    <tr>
      <th>Scenario</th>
      <th>Strategy</th>
      <th>Why</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>No filter</td>
      <td>Pure HNSW traversal</td>
      <td>Maximum speed</td>
    </tr>
    <tr>
      <td>Loose filter (large allowlist)</td>
      <td>HNSW + allowlist masking</td>
      <td>Good recall, efficient traversal</td>
    </tr>
    <tr>
      <td>Tight filter (small allowlist)</td>
      <td>Flat brute-force on allowlist</td>
      <td>Reliable when graph is too sparse</td>
    </tr>
  </tbody>
</table>

<p>Weaviate chooses between these strategies automatically at query time. No configuration is required beyond optionally tuning <code class="language-plaintext highlighter-rouge">flatSearchCutoff</code>.</p>
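<p>The decision table above can be sketched in a few lines. This is an illustration of the rule as stated in this article, not Weaviate's implementation; <code class="language-plaintext highlighter-rouge">choose_strategy</code> and <code class="language-plaintext highlighter-rouge">flat_search</code> are hypothetical helper names, and Euclidean distance is an assumption.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import heapq
import math

FLAT_SEARCH_CUTOFF = 40_000   # default quoted above

def choose_strategy(allowlist_size, cutoff=FLAT_SEARCH_CUTOFF):
    """Mirror the decision table: None means the query has no filter."""
    if allowlist_size is None:
        return "hnsw"
    if allowlist_size >= cutoff:
        return "hnsw+allowlist"
    return "flat"

def flat_search(vectors, allowlist, query, k):
    """Brute force: score every allowed id and keep the k closest."""
    return heapq.nsmallest(k, allowlist, key=lambda i: math.dist(vectors[i], query))

vectors = {1: (0.0, 0.0), 2: (1.0, 1.0), 3: (5.0, 5.0), 4: (0.9, 1.2)}
print(choose_strategy(50), choose_strategy(250_000), choose_strategy(None))
print(flat_search(vectors, allowlist={1, 3, 4}, query=(1.0, 1.0), k=2))
</code></pre></div></div>

<p>The flat path is exact rather than approximate: every allowed vector is scored, so recall within the allowlist is perfect and cost grows linearly with the allowlist size.</p>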

<hr />

<h2 id="the-hnsw-graph--structure-construction-and-navigation">The HNSW Graph — Structure, Construction, and Navigation</h2>

<p>HNSW is a layered graph structure. Every object in Weaviate’s vector index is a node in this graph, and the edges between nodes encode vector proximity — two nodes are connected if their embedding vectors are close in the embedding space.</p>

<p>The “hierarchical” part means the graph is organized into multiple layers, where upper layers are sparse and contain only a small fraction of all nodes, and the bottom layer (Layer 0) contains every node. The “navigable small world” part means the graph is constructed so that you can reach any node from any other node in a small number of hops — a property that enables fast approximate nearest neighbor search.</p>

<h3 id="the-layered-structure">The Layered Structure</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Layer 2 (very few nodes):    A ─────────────── F
                              │
Layer 1 (some nodes):        A ───── C ───── F ───── H
                              │               │
Layer 0 (ALL nodes):         A─B─C─D─E─F─G─H─I─J─K─L
</code></pre></div></div>

<p>Upper layers have fewer nodes and longer-range edges. They exist purely for navigation — they let you cover large swaths of vector space in a single hop, getting you close to the target region quickly. Layer 0 has every node, with dense, short-range edges that enable precise local search once you’ve landed in the right neighborhood.</p>

<p>A search query enters the graph at the topmost layer via a single global entry point, greedily descends through each layer finding closer and closer nodes to the query vector, and finally performs a wider beam search at Layer 0 to find the true approximate top-K.</p>

<h3 id="how-layer-assignment-works">How Layer Assignment Works</h3>

<p>Every node is always inserted into Layer 0. But it may also be inserted into Layer 1, Layer 2, and higher — determined by a random draw at insertion time:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>layer = floor( −ln(uniform(0,1)) × mL )

where  mL = 1 / ln(M)
and    M  = max connections per node (default 16)
</code></pre></div></div>

<p>For M=16, <code class="language-plaintext highlighter-rouge">mL ≈ 0.36</code>. The resulting distribution across a dataset of 1 million vectors:</p>

<table>
  <thead>
    <tr>
      <th>Layer</th>
      <th>Expected Node Count</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Layer 0</td>
      <td>1,000,000 nodes — every object</td>
    </tr>
    <tr>
      <td>Layer 1</td>
      <td>~62,500 nodes (6.25%)</td>
    </tr>
    <tr>
      <td>Layer 2</td>
      <td>~3,906 nodes (0.39%)</td>
    </tr>
    <tr>
      <td>Layer 3</td>
      <td>~244 nodes</td>
    </tr>
    <tr>
      <td>Layer 4</td>
      <td>~15 nodes</td>
    </tr>
    <tr>
      <td>Layer 5</td>
      <td>~1 node — the global entry point</td>
    </tr>
  </tbody>
</table>

<p>The exponential thinning is the mathematical foundation of the <code class="language-plaintext highlighter-rouge">O(log N)</code> search complexity. Each layer reduces the navigable node population by a factor of M. A search that covers 1 million nodes at Layer 0 only needs to navigate ~15 nodes at Layer 4 before descending. The structure is essentially a probabilistic skip list generalized to high-dimensional vector space.</p>
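<p>The layer table can be reproduced directly from the formula. The sketch below computes the expected node count per layer as N × M<sup>−layer</sup> and then checks the share of nodes reaching Layer 1 with a small simulation. The function names and the fixed seed are my own choices for illustration.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import math
import random

def sample_layer(m, rng):
    """Draw a max layer with floor(-ln(u) * mL), where mL = 1 / ln(M)."""
    m_l = 1.0 / math.log(m)
    u = 1.0 - rng.random()          # in (0, 1], so log(u) is always defined
    return math.floor(-math.log(u) * m_l)

def expected_nodes(n, m, layer):
    """Expected number of nodes present at `layer` or above: n * M**-layer."""
    return n * m ** -layer

for layer in range(6):
    print(layer, round(expected_nodes(1_000_000, 16, layer)))

rng = random.Random(0)
draws = [sample_layer(16, rng) for _ in range(100_000)]
share_above_zero = sum(1 for d in draws if d >= 1) / len(draws)
print(f"simulated share reaching Layer 1: {share_above_zero:.4f}")
</code></pre></div></div>

<p>The simulated share should land close to 1/16 = 6.25%, matching the Layer 1 row of the table.</p>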

<h3 id="inserting-a-new-node--step-by-step">Inserting a New Node — Step by Step</h3>

<p>When a new node X is inserted with an assigned max layer of 2 (for example):</p>

<p><strong>Phase 1 — Navigation (layers above X’s max layer):</strong>
Starting from the global entry point at the top layer, the algorithm greedily hops to whichever neighbor is closest to X at each layer, descending through every layer above Layer 2 (X’s max layer). The node it ends on becomes the starting point for the search at Layer 2. These upper layers are used only for navigation; X is not inserted into them.</p>

<p><strong>Phase 2 — Insertion (from X’s max layer down to Layer 0):</strong>
At each layer from 2 down to 0, the algorithm performs a beam search with width <code class="language-plaintext highlighter-rouge">ef_construction</code> (default 128) to find the best candidate neighbors for X. From these candidates, it selects M connections using the neighbor selection heuristic (covered in detail below), then creates bidirectional edges between X and its selected neighbors.</p>

<p><strong>Entry point update:</strong>
If X’s assigned max layer is higher than the current graph’s maximum layer, X becomes the new global entry point.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Insertion of node X (assigned to max layer 2):

Top layer → navigate greedily to find landing point at Layer 2
Layer 2   → beam search ef_construction candidates → select M → connect
Layer 1   → beam search ef_construction candidates → select M → connect
Layer 0   → beam search ef_construction candidates → select M → connect
</code></pre></div></div>

<p>This is an <strong>online algorithm</strong> — HNSW accepts incremental inserts naturally. There is no batch rebuild step. New nodes inserted into an already-populated graph follow the same procedure and integrate seamlessly.</p>
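<p>The two phases and the entry point rule can be written down as a small planning function. It only lists which layers are used for navigation and which receive the new node; the search and linking steps themselves are omitted. Both function names are hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def insertion_plan(node_layer, top_layer):
    """List (layer, action) pairs from the top of the graph down to Layer 0."""
    start = max(node_layer, top_layer)
    plan = []
    for layer in range(start, -1, -1):
        action = "navigate" if layer > node_layer else "insert"
        plan.append((layer, action))
    return plan

def becomes_entry_point(node_layer, top_layer):
    """A node that reaches above the current top layer becomes the entry point."""
    return node_layer > top_layer

for layer, action in insertion_plan(node_layer=2, top_layer=5):
    print(f"Layer {layer}: {action}")
print(becomes_entry_point(2, 5), becomes_entry_point(6, 5))
</code></pre></div></div>

<p>For a node assigned to max layer 2 in a graph whose top layer is 5, layers 5 to 3 are navigation only and layers 2 to 0 receive the node, which matches the walkthrough above.</p>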

<h3 id="how-nodes-choose-their-connections--the-diversity-heuristic">How Nodes Choose Their Connections — The Diversity Heuristic</h3>

<p>The most subtle and important aspect of HNSW construction is how a node selects its M neighbors from the <code class="language-plaintext highlighter-rouge">ef_construction</code> candidates the beam search returns.</p>

<p>The naive approach — simply pick the M closest candidates — produces a badly navigable graph. If X’s true nearest neighbors are all clustered in the same region of vector space, all M connections point in the same direction. X becomes unreachable from the opposite side of the space, and search quality degrades significantly.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Naive closest-M selection:

          C1  C2  C3
            ↘  ↓  ↙
               X          ← all neighbors in the same direction
             C4  C5        X is a dead end from the right side of space
             C6  C7
</code></pre></div></div>

<p>Weaviate uses the <strong>SELECT-NEIGHBORS-HEURISTIC</strong>, which enforces diversity across the selected connections:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">math</span>

<span class="k">def</span> <span class="nf">distance</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
    <span class="c1"># Euclidean distance; substitute whichever metric the index is configured with</span>
    <span class="k">return</span> <span class="n">math</span><span class="p">.</span><span class="n">dist</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">select_neighbors_heuristic</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">candidates</span><span class="p">,</span> <span class="n">M</span><span class="p">):</span>
    <span class="n">result</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="c1"># visit candidates by distance to X, closest first</span>
    <span class="k">for</span> <span class="n">C</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">candidates</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">c</span><span class="p">:</span> <span class="n">distance</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">X</span><span class="p">)):</span>
        <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">result</span><span class="p">)</span> <span class="o">&gt;=</span> <span class="n">M</span><span class="p">:</span>
            <span class="k">break</span>

        <span class="n">good</span> <span class="o">=</span> <span class="bp">True</span>
        <span class="k">for</span> <span class="n">R</span> <span class="ow">in</span> <span class="n">result</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">distance</span><span class="p">(</span><span class="n">C</span><span class="p">,</span> <span class="n">R</span><span class="p">)</span> <span class="o">&lt;</span> <span class="n">distance</span><span class="p">(</span><span class="n">C</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
                <span class="c1"># C is closer to some already-selected neighbor R</span>
                <span class="c1"># than it is to X itself, so R already covers</span>
                <span class="c1"># that direction. C is geometrically redundant: skip it.</span>
                <span class="n">good</span> <span class="o">=</span> <span class="bp">False</span>
                <span class="k">break</span>

        <span class="k">if</span> <span class="n">good</span><span class="p">:</span>
            <span class="n">result</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">C</span><span class="p">)</span>

    <span class="k">return</span> <span class="n">result</span>
</code></pre></div></div>

<p>The key check: for each candidate C, if C is closer to any already-selected neighbor R than it is to X, then C is geometrically redundant — R already “covers” that direction. Skip C and move to the next candidate.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Heuristic selection:

    C1                    C9
      ↘                  ↙
           X                  ← neighbors distributed across all directions
      ↗                  ↖
    C3                    C11
</code></pre></div></div>

<p>Each selected neighbor covers a distinct angular region around X. This ensures that no matter which direction a search is approaching X from, there’s a connection pointing in the right direction. The graph becomes truly navigable in all directions.</p>

<p><strong>Bidirectional linking and the shrink operation:</strong>
After X selects its neighbors, each of those neighbors must also link back to X. If a neighbor N already has M connections (its list is full), it runs the same heuristic on its existing M connections plus X, and keeps the best M — potentially evicting one old connection. This is called the shrink operation, and it keeps the graph balanced as the dataset grows.</p>
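<p>The shrink step can be illustrated with a compact, self-contained version of the same diversity rule. This is a sketch under my own assumptions: vectors are plain tuples, distance is Euclidean, and <code class="language-plaintext highlighter-rouge">select_diverse</code> and <code class="language-plaintext highlighter-rouge">link_back</code> are hypothetical names. Real implementations may also backfill when the heuristic keeps fewer than M neighbors, which this sketch does not do.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import math

def select_diverse(base, candidates, m):
    """Keep a candidate only if no kept neighbor is closer to it than base is."""
    kept = []
    for c in sorted(candidates, key=lambda p: math.dist(p, base)):
        if len(kept) >= m:
            break
        if all(math.dist(c, r) >= math.dist(c, base) for r in kept):
            kept.append(c)
    return kept

def link_back(neighbors, n, x, m):
    """Add x to n's neighbor list; if the list overflows, shrink it to at most m."""
    updated = neighbors + [x]
    if len(updated) > m:
        updated = select_diverse(n, updated, m)
    return updated

n = (0.0, 0.0)
before = [(1.0, 0.0), (0.0, 1.0)]      # n is already full at m = 2
after = link_back(before, n, x=(0.5, 0.0), m=2)
print(after)
</code></pre></div></div>

<p>Here the new node at (0.5, 0.0) lies in the same direction as the old neighbor at (1.0, 0.0) but closer, so the old neighbor is evicted while the neighbor at (0.0, 1.0), which covers a different direction, is kept.</p>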

<h3 id="hnsw-configuration-parameters">HNSW Configuration Parameters</h3>

<table>
  <thead>
    <tr>
      <th>Parameter</th>
      <th>What It Controls</th>
      <th>Default</th>
    </tr>
  </thead>
  <tbody>
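    <tr>
      <td><code class="language-plaintext highlighter-rouge">M</code></td>
      <td>Max connections per node on each layer above Layer 0 (the name used in the HNSW paper and in this article)</td>
      <td>16 in this article’s examples</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">maxConnections</code></td>
      <td>Weaviate’s setting for <code class="language-plaintext highlighter-rouge">M</code>; Layer 0 allows up to twice this many connections</td>
      <td>32</td>
    </tr>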
    <tr>
      <td><code class="language-plaintext highlighter-rouge">ef_construction</code></td>
      <td>Beam width during insertion — wider = better graph quality, slower ingest</td>
      <td>128</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">ef</code></td>
      <td>Beam width during query — wider = better recall, slower query</td>
      <td>64</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">flatSearchCutoff</code></td>
      <td>Allowlist size threshold for flat search fallback</td>
      <td>40,000</td>
    </tr>
  </tbody>
</table>

<p>Higher <code class="language-plaintext highlighter-rouge">ef_construction</code> gives a better-connected graph and better search recall, at the cost of slower ingestion. Higher <code class="language-plaintext highlighter-rouge">ef</code> at query time improves recall at the cost of latency. These are the two primary knobs for the quality-performance trade-off.</p>
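<p>As a concrete reference point, the settings above might appear in a class definition like the one below. The key names follow Weaviate's camelCase schema settings as I understand them (for instance <code class="language-plaintext highlighter-rouge">efConstruction</code> rather than <code class="language-plaintext highlighter-rouge">ef_construction</code>), so verify them against the version you run. The class name and the <code class="language-plaintext highlighter-rouge">check_hnsw_config</code> helper are hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Key names are my understanding of Weaviate's schema settings;
# confirm them against the documentation for your version.
product_class = {
    "class": "Product",                    # hypothetical class name
    "vectorIndexType": "hnsw",
    "vectorIndexConfig": {
        "maxConnections": 32,
        "efConstruction": 128,
        "ef": 64,
        "flatSearchCutoff": 40000,
    },
}

def check_hnsw_config(cfg):
    """Reject obviously invalid settings before sending the schema anywhere."""
    problems = []
    for key in ("maxConnections", "efConstruction", "flatSearchCutoff"):
        if not cfg.get(key, 0) > 0:
            problems.append(f"{key} must be positive")
    return problems

print(check_hnsw_config(product_class["vectorIndexConfig"]))
</code></pre></div></div>

<p>Graph construction settings shape the index as it is built, so it is safest to decide on them before bulk ingestion, while <code class="language-plaintext highlighter-rouge">ef</code> is the knob to revisit when tuning query recall against latency.</p>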

<hr />

<h2 id="the-inverted-index--scalar-property-filtering-at-scale">The Inverted Index — Scalar Property Filtering at Scale</h2>

<p>Weaviate maintains a separate inverted index for every scalar property on every class. This is what makes fast pre-filtering possible — when you apply a filter, Weaviate resolves it against the inverted index first, before touching the HNSW graph at all.</p>

<h3 id="what-an-inverted-index-is">What an Inverted Index Is</h3>

<p>The fundamental idea is to invert the storage model. A normal object store maps from object ID to its properties. An inverted index maps from property values to the set of object IDs that have that value:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Object Store (normal direction):
  id=1 → { category: "electronics", price: 299, in_stock: true }
  id=2 → { category: "clothing",    price: 49,  in_stock: true }
  id=3 → { category: "electronics", price: 599, in_stock: false }
  id=4 → { category: "clothing",    price: 129, in_stock: true }

Inverted Index (flipped direction):
  category:"electronics" → [1, 3]
  category:"clothing"    → [2, 4]
  price:299              → [1]
  price:49               → [2]
  price:599              → [3]
  price:129              → [4]
  in_stock:true          → [1, 2, 4]
  in_stock:false         → [3]
</code></pre></div></div>

<p>A filter like <code class="language-plaintext highlighter-rouge">category == "electronics"</code> is resolved by a direct key lookup — no scanning, no iteration. The result <code class="language-plaintext highlighter-rouge">[1, 3]</code> is the allowlist handed to HNSW.</p>
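<p>The inversion shown above takes only a few lines to reproduce. The sketch uses a plain dictionary of Python sets as a stand-in for the real on-disk structures; <code class="language-plaintext highlighter-rouge">build_inverted_index</code> is a hypothetical helper name.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from collections import defaultdict

objects = {
    1: {"category": "electronics", "price": 299, "in_stock": True},
    2: {"category": "clothing", "price": 49, "in_stock": True},
    3: {"category": "electronics", "price": 599, "in_stock": False},
    4: {"category": "clothing", "price": 129, "in_stock": True},
}

def build_inverted_index(objs):
    """Map each (property, value) pair to the set of object ids that carry it."""
    index = defaultdict(set)
    for object_id, props in objs.items():
        for name, value in props.items():
            index[(name, value)].add(object_id)
    return index

index = build_inverted_index(objects)
allowlist = sorted(index[("category", "electronics")])
print(allowlist)
</code></pre></div></div>

<p>The lookup for <code class="language-plaintext highlighter-rouge">category == "electronics"</code> returns ids 1 and 3 without touching any other object, which is the allowlist from the example above.</p>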

<p>For text properties, Weaviate applies a tokenization pipeline before indexing:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Raw value: "Running Shoes for Men"
     ↓
Tokenizer (configured per property: word / field / whitespace)
     ↓
["running", "shoes", "for", "men"]
     ↓
Each token indexed separately:
  "running" → [..., id_X]
  "shoes"   → [..., id_X]
  "men"     → [..., id_X]
</code></pre></div></div>

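<p><code class="language-plaintext highlighter-rouge">indexFilterable: true</code> builds the index that serves <code class="language-plaintext highlighter-rouge">where</code> filters; with <code class="language-plaintext highlighter-rouge">field</code> tokenization, the full value is stored as a single token for exact-match filters. <code class="language-plaintext highlighter-rouge">indexSearchable: true</code> builds the tokenized index used for BM25 text search. Both can be enabled simultaneously on the same property.</p>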
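<p>The tokenization pipeline above can be approximated as follows. This is a simplified sketch, not Weaviate's tokenizer: the <code class="language-plaintext highlighter-rouge">word</code> mode here handles ASCII letters and digits only, and both function names are hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

def tokenize(value, mode="word"):
    """Approximate three of the tokenization modes named above."""
    if mode == "word":
        # lowercase, then keep runs of ASCII letters and digits
        return re.findall(r"[a-z0-9]+", value.lower())
    if mode == "whitespace":
        return value.split()             # case is preserved
    if mode == "field":
        return [value.strip()]           # the whole value is one token
    raise ValueError(f"unknown tokenization mode: {mode}")

def index_tokens(postings, object_id, value, mode="word"):
    """Add object_id to the posting set of every token in value."""
    for token in tokenize(value, mode):
        postings.setdefault(token, set()).add(object_id)
    return postings

postings = index_tokens({}, 7, "Running Shoes for Men")
print(sorted(postings))
print(tokenize("Running Shoes for Men", "field"))
</code></pre></div></div>

<p>The same raw value yields four lowercase tokens under <code class="language-plaintext highlighter-rouge">word</code> and a single untouched token under <code class="language-plaintext highlighter-rouge">field</code>, which is why the tokenization choice determines whether a filter matches on whole values or on individual words.</p>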

<h3 id="index-structures-by-property-type">Index Structures by Property Type</h3>

<p>Different property types need different underlying index structures to support their query patterns efficiently:</p>

<table>
  <thead>
    <tr>
      <th>Property Type</th>
      <th>Index Structure</th>
      <th>Why</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>string / text</td>
      <td>Hash map: term → roaring bitmap</td>
      <td>O(1) exact match lookup</td>
    </tr>
    <tr>
      <td>number / int</td>
      <td>B-tree</td>
      <td>Supports range queries: <code class="language-plaintext highlighter-rouge">price &gt; 100 AND price &lt; 500</code></td>
    </tr>
    <tr>
      <td>boolean</td>
      <td>Two-entry map: true → [ids], false → [ids]</td>
      <td>Only two possible values</td>
    </tr>
    <tr>
      <td>date</td>
      <td>B-tree (stored as int64 unix timestamp)</td>
      <td>Range queries over time</td>
    </tr>
    <tr>
      <td>geo coordinates</td>
      <td>R-tree based geo index</td>
      <td>Radius queries, bounding box queries</td>
    </tr>
  </tbody>
</table>
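<p>To show why ordered structures matter for numeric properties, here is a minimal range lookup over sorted keys. A sorted list with binary search stands in for the tree-based index described in the table; the class name and method are hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import bisect

class SortedNumberIndex:
    """Stand-in for an ordered index: sorted keys with a posting set per key."""
    def __init__(self, pairs):
        self.postings = {}
        for value, object_id in pairs:
            self.postings.setdefault(value, set()).add(object_id)
        self.keys = sorted(self.postings)

    def between(self, low, high):
        """Ids whose value lies strictly between low and high."""
        start = bisect.bisect_right(self.keys, low)    # first key above low
        stop = bisect.bisect_left(self.keys, high)     # first key at or above high
        ids = set()
        for key in self.keys[start:stop]:
            ids.update(self.postings[key])
        return ids

prices = SortedNumberIndex([(299, 1), (49, 2), (599, 3), (129, 4)])
print(sorted(prices.between(100, 500)))
</code></pre></div></div>

<p>Because the keys are ordered, the range filter touches only the two keys inside the bounds (129 and 299) instead of scanning every price. A hash map cannot do this, which is why exact-match and range properties use different structures.</p>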

<hr />

<h3 id="on-ingestion">On Ingestion</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>client.data_object.create({ ... })

  1. Vectorizer module converts object to embedding vector
  2. WAL append → fsync() → durability guaranteed
  3. Three parallel updates:
     │
     ├── RocksDB object store
     │     → WAL → MemTable → eventual SSTable flush
     │
     ├── RocksDB inverted index
     │     → for each scalar property:
     │       tokenize value → update roaring bitmap posting list
     │       WAL → MemTable → eventual SSTable flush
     │
     └── BoltDB HNSW graph
           → assign max layer via probability formula
           → navigate to insertion neighborhood
           → beam search ef_construction candidates
           → SELECT-NEIGHBORS-HEURISTIC → pick best M
           → bidirectional connect (shrink neighbors if full)
           → COW B+ tree write → atomic root pointer update

  4. Return 201 Created to client
</code></pre></div></div>
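<p>Step 2 of the flow above, logging before acknowledging, is the part that makes the rest safe. The toy store below illustrates only that ordering: append to a log, force it to disk, then update memory. It is a generic sketch with a hypothetical <code class="language-plaintext highlighter-rouge">TinyStore</code> class and a JSON-lines log format of my own choosing, not the storage engine's actual format.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json
import os
import tempfile

class TinyStore:
    """Toy write path: append to a log, fsync, then update the in-memory table."""
    def __init__(self, wal_path):
        self.wal = open(wal_path, "a", encoding="utf-8")
        self.memtable = {}

    def put(self, key, value):
        record = json.dumps({"key": key, "value": value})
        self.wal.write(record + "\n")
        self.wal.flush()                 # push Python's buffer to the OS
        os.fsync(self.wal.fileno())      # ask the OS to persist it to disk
        self.memtable[key] = value       # only now apply the write in memory
        return "ok"                      # acknowledge after the log is durable

    def close(self):
        self.wal.close()

wal_path = os.path.join(tempfile.mkdtemp(), "objects.wal")
store = TinyStore(wal_path)
status = store.put("id1", {"category": "electronics"})
store.close()
with open(wal_path, encoding="utf-8") as f:
    logged = [json.loads(line) for line in f]
print(status, logged)
</code></pre></div></div>

<p>The ordering is the point: if the process dies after the fsync but before the in-memory update, the record is still in the log and can be replayed on restart.</p>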

<h3 id="on-filtered-vector-search">On Filtered Vector Search</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>client.query.get(...).with_near_vector(...).with_where(filter).do()

  1. Resolve filter → RocksDB inverted index
       → key lookup per filter term
       → roaring bitmap operations (AND / OR / NOT) for compound filters
       → result: allowlist of matching object IDs

  2. Choose search strategy:
       if len(allowlist) &lt; flatSearchCutoff:
           → brute-force distance computation over allowlist
       else:
           → HNSW traversal with allowlist masking
               enter at global entry point (top layer)
               greedy descent through upper layers
               beam search at Layer 0 with ef candidates
               skip any node not in allowlist

  3. Return top-K results ranked by vector distance
</code></pre></div></div>
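<p>Step 1 of this flow, combining per-term lookups into one allowlist, is plain set algebra. The sketch below uses Python sets as stand-ins for roaring bitmaps and a nested-dictionary filter format that I made up for illustration; it is not Weaviate's filter syntax.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def resolve_filter(node, index, universe):
    """Evaluate a nested filter into an allowlist using set algebra.

    Plain Python sets stand in for roaring bitmaps here.
    """
    op = node["op"]
    if op == "eq":
        return set(index.get((node["prop"], node["value"]), set()))
    if op == "not":
        return universe.difference(resolve_filter(node["arg"], index, universe))
    parts = [resolve_filter(arg, index, universe) for arg in node.get("args", [])]
    if op == "and":
        return set.intersection(*parts)
    if op == "or":
        return set.union(*parts)
    raise ValueError(f"unknown operator: {op}")

index = {
    ("category", "electronics"): {1, 3},
    ("category", "clothing"): {2, 4},
    ("in_stock", True): {1, 2, 4},
}
universe = {1, 2, 3, 4}
query = {"op": "and", "args": [
    {"op": "eq", "prop": "category", "value": "electronics"},
    {"op": "not", "arg": {"op": "eq", "prop": "in_stock", "value": True}},
]}
print(resolve_filter(query, index, universe))
</code></pre></div></div>

<p>The compound filter "electronics AND NOT in stock" resolves to the single id 3. The size of that final set is what gets compared against <code class="language-plaintext highlighter-rouge">flatSearchCutoff</code> in step 2.</p>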

<h3 id="on-crash-recovery">On Crash Recovery</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Process restarts after crash:

  RocksDB:
    → Read WAL file
    → Replay all records since last MemTable flush checkpoint
    → Reconstruct MemTable to pre-crash state
    → Verify SSTable checksums

  BoltDB:
    → Memory-mapped file is intact (COW never overwrote committed pages)
    → Read root page pointer → valid committed B+ tree state
    → Any in-progress write transaction that didn't commit is simply absent
    → HNSW graph is back in its last fully committed state

  Result: zero data loss for any write that received a success response
</code></pre></div></div>
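<p>The log replay half of this recovery story can be sketched generically. The example assumes a JSON-lines log (my own format for illustration) and treats an unparseable final line as a write that was cut off by the crash. Production write-ahead logs typically use checksums rather than parse failures to detect torn records.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json
import os
import tempfile

def replay_wal(wal_path):
    """Rebuild the in-memory table from the log, stopping at a torn final record."""
    memtable = {}
    with open(wal_path, encoding="utf-8") as f:
        for line in f:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                break                    # partial write from the crash: never acknowledged
            memtable[record["key"]] = record["value"]
    return memtable

# Simulate a crash: two complete records, then one cut off mid-write.
wal_path = os.path.join(tempfile.mkdtemp(), "objects.wal")
with open(wal_path, "w", encoding="utf-8") as f:
    f.write(json.dumps({"key": "id1", "value": 1}) + "\n")
    f.write(json.dumps({"key": "id2", "value": 2}) + "\n")
    f.write('{"key": "id3", "val')
recovered = replay_wal(wal_path)
print(recovered)
</code></pre></div></div>

<p>The two complete records come back and the torn third record is dropped. That is consistent with the guarantee stated above, because a write that never finished logging was never acknowledged to the client.</p>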

<p>Each component’s guarantees compose cleanly. The WAL ensures RocksDB never loses an acknowledged write. BoltDB’s copy-on-write atomicity ensures the HNSW graph is always in a consistent committed state. SSTable immutability ensures compacted data is never corrupted by a partial write. Together, they give Weaviate end-to-end durability without any of the components needing to know about each other.</p>

<hr />

<h2 id="why-this-architecture-is-the-way-it-is">Why This Architecture Is the Way It Is</h2>

<p>The design of Weaviate’s storage layer is not arbitrary. Every choice reflects a specific access pattern.</p>

<p>RocksDB’s LSM tree was chosen for the object store and inverted index because both absorb the full write throughput of every ingestion operation. Bulk ingest — millions of objects per hour — demands sequential write performance. LSM trees are architected for exactly this workload. The cost is slightly more complex reads, mitigated by Bloom filters and block caches.</p>

<p>BoltDB’s B+ tree was chosen for HNSW graph metadata because the graph is read orders of magnitude more often than it’s written. A search query reads node connection lists; an insert writes a handful of new entries. B+ trees are optimal for this read-heavy, random-access workload. The copy-on-write model also gives atomic HNSW updates for free.</p>

<p>Roaring Bitmaps were chosen for inverted index posting lists because filter resolution must be fast even for compound multi-property filters with millions of matching IDs. The simpler alternatives each break down somewhere: plain sorted arrays are slow to intersect at that scale, hash sets carry heavy per-element memory overhead, and fixed bitsets waste space when matches are sparse. Roaring Bitmaps adapt their internal representation to the data density, giving compact storage and fast set operations regardless of how selective or non-selective the filter is.</p>
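<p>The density adaptation can be illustrated with a sketch of how a roaring bitmap lays out its containers. It follows the published roaring design as I understand it: ids are grouped by their upper 16 bits, and a group switches from a sorted array to a bitmap once it holds more than 4,096 values. Run-length containers, which real implementations also use, are left out, and <code class="language-plaintext highlighter-rouge">roaring_layout</code> is a hypothetical name.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from collections import defaultdict

ARRAY_LIMIT = 4096    # above this many values, a chunk switches to a bitmap

def roaring_layout(ids):
    """Group ids into 65536-wide chunks and pick a container type per chunk."""
    chunks = defaultdict(set)
    for i in ids:
        high, low = divmod(i, 65536)     # upper 16 bits select the chunk
        chunks[high].add(low)
    return {
        high: "bitmap" if len(lows) > ARRAY_LIMIT else "array"
        for high, lows in sorted(chunks.items())
    }

sparse = roaring_layout([3, 70_000, 200_000])
dense = roaring_layout(range(10_000))
print(sparse, dense)
</code></pre></div></div>

<p>Three scattered ids end up in three small array containers, while 10,000 consecutive ids collapse into one fixed-size bitmap container. A selective filter and a broad filter therefore each get the representation that suits their density.</p>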

<p>The allowlist-first filtered search strategy was chosen because post-filtering is fundamentally unreliable for selective filters — a lesson learned across the entire approximate nearest neighbor search literature, not just Weaviate. By resolving the scalar filter first and masking the vector search, Weaviate guarantees that the top-K results are always drawn from the correct candidate pool.</p>

<p>No single data structure solves all of these problems. Weaviate’s architecture works because it uses the right data structure for each specific problem, and composes them into a pipeline where each component’s output is exactly what the next component needs.</p>

<hr />]]></content><author><name></name></author><category term="llm" /><category term="vector database" /><summary type="html"><![CDATA[Deep Dive: How Weaviate Really Works Under the Hood]]></summary></entry></feed>