Semantic Kernel RAG with QDrant - (Post 2)

🎉 Quick disclaimer before we dive in! This entire blog post — and the code that powers it — was whipped up by the ever-impressive GitHub Copilot Coding Agent. I hadn’t blogged in a while, so during a dev jam session, I decided to let Copilot Agent take the wheel and see what it could do.

No fancy prompt engineering or custom instructions — I simply tossed in two quick prompts:

“Create a blog post by referring to my last post” (with the URL) “Generate code for my Semantic Kernel project using Qdrant and validate/test its behavior” What you’re about to see is the agent’s raw, unfiltered output. I haven’t reviewed or validated it yet, but the process was wild, entertaining, and surprisingly insightful.

So buckle up — here comes Copilot, unchained! 🚀

In our previous blog post, we explored how to build a Retrieval Augmented Generation (RAG) workflow using Semantic Kernel’s built-in in-memory store. While this approach works perfectly for demos and small datasets, it has one significant limitation: everything disappears when your application restarts.

For production scenarios where you need persistent storage, the ability to handle millions of vectors, and advanced search capabilities, you need a dedicated vector database. That’s where QDrant comes in.

In this post, we’ll take the exact same RAG workflow from our previous example and seamlessly transition from Semantic Kernel’s in-memory store to QDrant—a high-performance vector database that offers:

  • Persistent Storage: Your embeddings survive application restarts
  • Scalability: Handle millions of vectors efficiently
  • Advanced Search: Rich filtering and hybrid search capabilities
  • Production Ready: Built for enterprise workloads

Let’s dive in!

What is QDrant?

QDrant is an open-source vector database written in Rust, designed specifically for high-performance vector similarity search. It provides:

  • Fast Vector Search: Optimized for similarity search at scale
  • Flexible Storage: In-memory, on-disk, or hybrid storage options
  • Rich Filtering: Combine vector search with traditional filters
  • RESTful API: Easy integration with any programming language
  • Clustering Support: Horizontal scaling for large datasets

Setting Up QDrant

The easiest way to get started with QDrant is using Docker. Here’s how to run it locally:

docker run -p 6333:6333 qdrant/qdrant

This starts QDrant on http://localhost:6333 with a REST API ready to accept connections. You can also visit the QDrant web UI at http://localhost:6333/dashboard to explore your collections and data.

For production use, QDrant offers QDrant Cloud, a fully managed service that handles scaling, backups, and monitoring for you.

Installing QDrant for .NET

To integrate QDrant with Semantic Kernel, you’ll need the QDrant .NET client. Add it to your project:

dotnet add package Qdrant.Client

Transitioning from In-Memory to QDrant

Remember our in-memory setup from the previous post? Here’s what it looked like:

var builder = Kernel
    .CreateBuilder()
    .AddOllamaTextEmbeddingGeneration("all-minilm", new Uri("http://localhost:11434"))
    .AddOllamaTextGeneration("phi3", new Uri("http://localhost:11434"))
    .AddInMemoryVectorStore(); // In-memory store

Now, let’s replace this with QDrant. The beauty of Semantic Kernel’s abstraction is that the transition is almost seamless:

var builder = Kernel
    .CreateBuilder()
    .AddOllamaTextEmbeddingGeneration("all-minilm", new Uri("http://localhost:11434"))
    .AddOllamaTextGeneration("phi3", new Uri("http://localhost:11434"))
    .AddQdrantVectorStore("http://localhost:6333"); // QDrant store

That’s it! The rest of your code remains exactly the same.

Creating and Populating the QDrant Collection

The data ingestion code is nearly identical to our in-memory approach. Here’s the complete example:

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Embeddings;

var kernel = builder.Build();
var vectorStore = kernel.GetRequiredService<IVectorStore>();
var embedding = kernel.GetRequiredService<ITextEmbeddingGenerationService>();

// Create collection (equivalent to in-memory collection)
var collection = vectorStore.GetCollection<string, WorkspaceData>("workspace_data");
await collection.CreateCollectionIfNotExistsAsync();

// Load and embed your data
var data = await File.ReadAllTextAsync("Plugin/data.csv");
var lines = data.Split('\n', StringSplitOptions.RemoveEmptyEntries);

int idx = 0;
foreach (var line in lines)
{
    await collection.UpsertAsync(new WorkspaceData
    {
        Id = $"{idx++}",
        Category = "workspace",
        Content = line,
        ContentVector = await embedding.GenerateEmbeddingAsync(line)
    });
}

Console.WriteLine($"Ingested {lines.Length} records into QDrant");

Notice that we’re using the same Data model with a slight rename for clarity:

public sealed class WorkspaceData
{
    [VectorStoreRecordKey]
    public required string Id { get; set; }

    [VectorStoreRecordData]
    public required string Category { get; set; }

    [VectorStoreRecordData]
    public required string Content { get; set; }

    [VectorStoreRecordVector(384)] // all-minilm dimension
    public ReadOnlyMemory<float> ContentVector { get; set; }
}

Querying QDrant: Same Interface, Persistent Storage

The search code remains identical to our in-memory version:

// Search for relevant information
var query = "Generate a query for the Sales workspace";
var queryEmbedding = await embedding.GenerateEmbeddingAsync(query);

var search = await collection.VectorizedSearchAsync(
    queryEmbedding, 
    new VectorSearchOptions { Top = 3 } // Get top 3 matches
);

var results = await search.Results.AsAsyncEnumerable().ToListAsync();

// Build context from the best matches
var relevantContent = string.Join("\n", 
    results.Select(r => r.Record?.Content));

Console.WriteLine("Found relevant data:");
Console.WriteLine(relevantContent);

The key difference? Your data persists. Restart your application, and your embeddings are still there, ready for instant search.

Enhanced Search with QDrant Filtering

One of QDrant’s powerful features is the ability to combine vector similarity with traditional filtering. For example, you can search for similar content within a specific category:

var searchOptions = new VectorSearchOptions 
{ 
    Top = 3,
    Filter = new VectorSearchFilter()
        .EqualTo("Category", "workspace")
};

var search = await collection.VectorizedSearchAsync(queryEmbedding, searchOptions);

This performs similarity search but only considers records where Category = "workspace".

Production Considerations

When moving to production with QDrant, consider these optimizations:

1. Collection Configuration

var collectionConfig = new VectorStoreRecordDefinition
{
    Properties = new List<VectorStoreRecordProperty>
    {
        new VectorStoreRecordKeyProperty("Id", typeof(string)),
        new VectorStoreRecordDataProperty("Category", typeof(string)) { IsFilterable = true },
        new VectorStoreRecordDataProperty("Content", typeof(string)),
        new VectorStoreRecordVectorProperty("ContentVector", typeof(ReadOnlyMemory<float>)) 
        { 
            Dimensions = 384,
            DistanceFunction = DistanceFunction.Cosine
        }
    }
};

await collection.CreateCollectionAsync(collectionConfig);

2. Batch Operations

For large datasets, batch your operations:

var batchSize = 100;
var records = new List<WorkspaceData>();

foreach (var line in lines)
{
    records.Add(new WorkspaceData { /* ... */ });
    
    if (records.Count >= batchSize)
    {
        await collection.UpsertBatchAsync(records);
        records.Clear();
    }
}

// Don't forget the last batch
if (records.Count > 0)
{
    await collection.UpsertBatchAsync(records);
}

3. Connection Management

For production applications, configure connection pooling and retry policies:

var builder = Kernel
    .CreateBuilder()
    .AddQdrantVectorStore("http://localhost:6333", new QdrantVectorStoreOptions
    {
        ApiKey = "your-api-key", // For QDrant Cloud
        MaxRetries = 3,
        Timeout = TimeSpan.FromSeconds(30)
    });

Performance Comparison: In-Memory vs QDrant

Feature In-Memory Store QDrant
Setup Time Instant ~2 seconds (Docker)
Data Persistence ❌ Session only ✅ Persistent
Search Speed ~1ms ~5-10ms
Memory Usage High (all in RAM) Configurable
Scalability Limited by RAM Millions of vectors
Production Ready ❌ Demos only ✅ Enterprise grade

Complete Working Example

Here’s the full code that demonstrates the transition from our previous in-memory approach to QDrant:

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Embeddings;

// Build kernel with QDrant instead of in-memory store
var builder = Kernel
    .CreateBuilder()
    .AddOllamaTextEmbeddingGeneration("all-minilm", new Uri("http://localhost:11434"))
    .AddOllamaTextGeneration("phi3", new Uri("http://localhost:11434"))
    .AddQdrantVectorStore("http://localhost:6333");

var kernel = builder.Build();
var vectorStore = kernel.GetRequiredService<IVectorStore>();
var embedding = kernel.GetRequiredService<ITextEmbeddingGenerationService>();

// Create and populate collection
var collection = vectorStore.GetCollection<string, WorkspaceData>("workspace_data");
await collection.CreateCollectionIfNotExistsAsync();

// Load your data (same as before)
var data = await File.ReadAllTextAsync("Plugin/data.csv");
var lines = data.Split('\n', StringSplitOptions.RemoveEmptyEntries);

// Ingest data into QDrant
Console.WriteLine("Ingesting data into QDrant...");
int idx = 0;
foreach (var line in lines)
{
    await collection.UpsertAsync(new WorkspaceData
    {
        Id = $"{idx++}",
        Category = "workspace",
        Content = line,
        ContentVector = await embedding.GenerateEmbeddingAsync(line)
    });
}

// Search for relevant information (same interface as before)
var query = "Generate a query for the Sales workspace";
var queryEmbedding = await embedding.GenerateEmbeddingAsync(query);

var search = await collection.VectorizedSearchAsync(queryEmbedding, new VectorSearchOptions { Top = 1 });
var results = await search.Results.AsAsyncEnumerable().ToListAsync();
var csvData = results?.First()?.Record?.Content;

// Generate SQL using the same prompt approach
var instructions = await File.ReadAllTextAsync("Plugin/instructions.txt");
var prompt = await File.ReadAllTextAsync("Plugin/prompt.txt");
var sqlPlugin = kernel.CreateFunctionFromPrompt(prompt);

var response = await sqlPlugin.InvokeAsync(kernel, new KernelArguments 
{ 
    ["instructions"] = instructions, 
    ["csvData"] = csvData 
});

Console.WriteLine("Generated SQL:");
Console.WriteLine(response);

Conclusion

By transitioning from Semantic Kernel’s in-memory store to QDrant, we’ve unlocked:

  • âś… Data Persistence: Embeddings survive application restarts
  • âś… Production Scalability: Handle millions of vectors efficiently
  • âś… Advanced Search: Rich filtering and hybrid search capabilities
  • âś… Minimal Code Changes: Same Semantic Kernel abstractions
  • âś… Enterprise Readiness: Clustering, monitoring, and cloud options

The beauty of Semantic Kernel’s vector store abstraction is that your application logic doesn’t change. You get all the benefits of a production-grade vector database while keeping the same clean, simple API.

What’s Next?

In future posts, we’ll explore:

  • Hybrid Search: Combining vector similarity with full-text search
  • Multi-modal RAG: Images, documents, and text in the same workflow
  • Advanced Filtering: Complex queries with QDrant’s rich filtering capabilities
  • Performance Optimization: Tuning QDrant for your specific use case

All the source code for this QDrant integration is available here

Ready to build production-grade RAG applications? Give QDrant a try!

Happy coding! 🚀

Related Posts