# Building Python Packages with C++ Extensions: A Complete Guide

- **URL:** https://isaacfei.com/posts/cmsketch-py-cpp
- **Date:** 2025-09-13
- **Tags:** Python, C++, pybind11, CMake, Data Structures
- **Description:** Learn how to develop Python packages with C++ extensions using pybind11, CMake, and modern build tools. Complete project structure and setup guide.

---

## Introduction

The complete project repository is available [here](https://github.com/isaac-fate/count-min-sketch).

Building Python packages with C++ extensions is a powerful way to combine Python's ease of use with C++'s performance. This guide walks through creating a complete Python package with a C++ backend, covering everything from project structure to PyPI publishing.

We'll use a Count-Min Sketch implementation as our example: a probabilistic data structure well suited to streaming data analysis. The techniques, however, apply to any C++ library you want to expose to Python.

### Why Build Python Packages with C++?

- **Multithreading Performance**: C++ code can release Python's GIL and use atomic operations, enabling true parallel processing
- **Lower-Level Control**: Direct memory management and hardware-level optimizations
- **Existing Libraries**: Leverage existing C++ libraries in Python projects
- **System Integration**: Access low-level system APIs and hardware features
- **Memory Efficiency**: Better control over memory usage and data structures

### What You'll Learn

- Complete project structure for Python packages with C++ extensions
- How to configure `pyproject.toml` for modern Python packaging
- CMake setup for cross-platform C++ builds
- pybind11 integration for seamless Python bindings
- Development workflow and testing strategies
- CI/CD pipeline for automated building and publishing

## Project Structure

Understanding the project structure is crucial for Python packages with C++ extensions. Here's the complete layout:

```
count-min-sketch/
├── include/cmsketch/                    # C++ header files
│   ├── cmsketch.h                      # Main header (include this)
│   ├── count_min_sketch.h              # Core template class
│   └── hash_util.h                     # Hash utility functions
├── src/cmsketchcpp/                    # C++ source files
│   └── count_min_sketch.cc             # Core implementation
├── src/cmsketch/                       # Python package source
│   ├── __init__.py                     # Package initialization
│   ├── base.py                         # Base classes and interfaces
│   ├── _core.pyi                       # Type stubs for C++ bindings
│   ├── _version.py                     # Version information
│   ├── py.typed                        # Type checking marker
│   └── py/                             # Pure Python implementations
│       ├── count_min_sketch.py         # Python Count-Min Sketch implementation
│       └── hash_util.py                # Python hash utilities
├── src/                                # Additional source files
│   ├── main.cc                         # Example C++ application
│   └── python_bindings.cc              # Python bindings (pybind11)
├── tests/                              # C++ unit tests
│   ├── CMakeLists.txt                  # Test configuration
│   ├── test_count_min_sketch.cc        # Core functionality tests
│   ├── test_hash_functions.cc          # Hash function tests
│   └── test_sketch_config.cc           # Configuration tests
├── pytests/                            # Python tests
│   ├── __init__.py                     # Test package init
│   ├── conftest.py                     # Pytest configuration
│   ├── test_count_min_sketch.py        # Core Python tests
│   ├── test_hash_util.py               # Hash utility tests
│   ├── test_mixins.py                  # Mixin class tests
│   └── test_py_count_min_sketch.py     # Pure Python implementation tests
├── benchmarks/                         # Performance benchmarks
│   ├── __init__.py                     # Benchmark package init
│   ├── generate_data.py                # Data generation utilities
│   └── test_benchmarks.py              # Benchmark validation tests
├── examples/                           # Example scripts
│   └── example.py                      # Python usage example
├── scripts/                            # Build and deployment scripts
│   ├── build.sh                        # Production build script
│   └── build-dev.sh                    # Development build script
├── data/                               # Sample data files
│   ├── ips.txt                         # IP address sample data
│   └── unique-ips.txt                  # Unique IP sample data
├── build/                              # Build artifacts (generated)
│   ├── _core.cpython-*.so              # Compiled Python extensions
│   ├── cmsketch_example                # Compiled C++ example
│   ├── libcmsketch.a                   # Static library
│   └── tests/                          # Compiled test binaries
├── dist/                               # Distribution packages (generated)
│   └── cmsketch-*.whl                  # Python wheel packages
├── CMakeLists.txt                      # Main CMake configuration
├── pyproject.toml                      # Python package configuration
├── uv.lock                             # uv lock file
├── Makefile                            # Convenience make targets
├── LICENSE                             # MIT License
└── README.md                           # Project README
```

### Key Directory Purposes

- **`include/`**: C++ header files that define the public API
- **`src/cmsketchcpp/`**: C++ implementation files
- **`src/cmsketch/`**: Python package source code
- **`src/`**: Additional C++ files like bindings and examples
- **`tests/`**: C++ unit tests using Google Test
- **`pytests/`**: Python tests using pytest
- **`benchmarks/`**: Performance testing and comparison
- **`build/`**: Generated build artifacts (not in version control)
- **`dist/`**: Generated distribution packages (not in version control)

## Version Management with bump-my-version

Managing versions across multiple files (Python package, C++ library, documentation) can be challenging. This project uses [bump-my-version](https://github.com/callowayproject/bump-my-version) to automate version updates across all relevant files.

### Configuration

The version management is configured in `.bumpversion.toml`:

```toml
# .bumpversion.toml
[tool.bumpversion]
current_version = "0.1.10"
commit = true
tag = true
tag_name = "v{new_version}"
message = "Bump version: {current_version} → {new_version}"

[[tool.bumpversion.files]]
filename = "pyproject.toml"
search = 'version = "{current_version}"'
replace = 'version = "{new_version}"'

[[tool.bumpversion.files]]
filename = "CMakeLists.txt"
search = 'VERSION {current_version} # Project version'
replace = 'VERSION {new_version} # Project version'

[[tool.bumpversion.files]]
filename = "VERSION"
search = '{current_version}'
replace = '{new_version}'
```

### The CMakeLists.txt Trick

To make bump-my-version match only the project version in `CMakeLists.txt`, I add a trailing comment to that line:

```cmake
# CMakeLists.txt
project(
  cmsketch
  VERSION 0.1.10 # Project version
  LANGUAGES CXX)
```

The comment `# Project version` helps `bump-my-version` identify the correct version line in CMakeLists.txt. This ensures that other occurrences of strings like `VERSION x.x.x` elsewhere in the file are not mistaken for the actual project version.
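
To see why the trailing comment matters, here is a minimal sketch that models the search/replace step as plain string substitution. This is a simplification rather than `bump-my-version`'s actual code, and the `FMT_VERSION` line is a hypothetical addition that is not part of the project's `CMakeLists.txt`.

```python
# Simplified model of a search/replace version bump (not bump-my-version's real code).
CMAKE_TEXT = """\
project(
  cmsketch
  VERSION 0.1.10 # Project version
  LANGUAGES CXX)

# Hypothetical line that happens to contain the same version string
set(FMT_VERSION 0.1.10)
"""


def bump(text: str, search: str, replace: str, current: str, new: str) -> str:
    """Fill in the version placeholders, then substitute every match."""
    old = search.format(current_version=current)
    updated = replace.format(new_version=new)
    return text.replace(old, updated)


# Ambiguous pattern: also rewrites the unrelated FMT_VERSION line.
loose = bump(CMAKE_TEXT, "VERSION {current_version}", "VERSION {new_version}",
             "0.1.10", "0.1.11")

# Anchored pattern: the comment makes the project line unique.
strict = bump(CMAKE_TEXT,
              "VERSION {current_version} # Project version",
              "VERSION {new_version} # Project version",
              "0.1.10", "0.1.11")

print("FMT_VERSION 0.1.11" in loose)   # True: unintended change
print("FMT_VERSION 0.1.10" in strict)  # True: left untouched
```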

### Usage

```bash
# Install bump-my-version
uv add --dev bump-my-version

# Bump patch version (0.1.10 → 0.1.11)
uv run bump-my-version bump patch

# Bump minor version (0.1.10 → 0.2.0)
uv run bump-my-version bump minor

# Bump major version (0.1.10 → 1.0.0)
uv run bump-my-version bump major

# Preview changes without committing
uv run bump-my-version bump --dry-run patch
```

### What Gets Updated

When you run `bump-my-version`, it automatically updates:

- **`pyproject.toml`**: Python package version
- **`CMakeLists.txt`**: C++ project version
- **`VERSION`**: Standalone version file
- **Git commit**: Creates a commit with the version bump
- **Git tag**: Creates a tag like `v0.1.11`

This ensures all version references stay synchronized across your entire project.

## pyproject.toml Configuration

The `pyproject.toml` file is the heart of modern Python packaging. Here's how to configure it for C++ extensions:

```toml
# pyproject.toml
[build-system]
requires = ["scikit-build-core>=0.10", "pybind11", "cmake>=3.15"]
build-backend = "scikit_build_core.build"

[project]
name = "cmsketch"
version = "0.1.10"
description = "High-performance Count-Min Sketch implementation with C++ and Python versions"
readme = "README.md"
license = { file = "LICENSE" }
authors = [{ name = "isaac-fei", email = "isaac.omega.fei@gmail.com" }]
maintainers = [{ name = "isaac-fei", email = "isaac.omega.fei@gmail.com" }]
requires-python = ">=3.11"
classifiers = [
    "Development Status :: 4 - Beta",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: C++",
    "Topic :: Scientific/Engineering",
    "Topic :: Software Development :: Libraries :: Python Modules",
    "Operating System :: OS Independent",
]
keywords = ["count-min-sketch", "probabilistic", "data-structure", "streaming"]

[project.urls]
Homepage = "https://github.com/isaac-fate/count-min-sketch"
Repository = "https://github.com/isaac-fate/count-min-sketch"
Documentation = "https://github.com/isaac-fate/count-min-sketch#readme"
Issues = "https://github.com/isaac-fate/count-min-sketch/issues"

[project.optional-dependencies]
dev = ["pytest>=8.0.0", "pytest-benchmark>=4.0.0", "build>=1.0.0"]

[tool.scikit-build]
build-dir = "build/{wheel_tag}"
wheel.exclude = ["lib/**", "include/**"]

[tool.scikit-build.cmake]
args = [
    "-DCMAKE_BUILD_TYPE=Release",
    "-DCMAKE_CXX_STANDARD=17",
    "-DCMAKE_CXX_STANDARD_REQUIRED=ON",
    "-DCMAKE_CXX_EXTENSIONS=OFF",
]

[tool.cibuildwheel]
build = "cp311-* cp312-*"
skip = "*-win32 *-manylinux_i686 *-musllinux*"
test-command = "python -m pytest {project}/pytests -v"
test-requires = "pytest"
manylinux-x86_64-image = "manylinux_2_28"

[tool.cibuildwheel.macos]
environment = { MACOSX_DEPLOYMENT_TARGET = "10.15" }

[tool.cibuildwheel.windows]
before-build = "pip install delvewheel"
repair-wheel-command = "delvewheel repair -w {dest_dir} {wheel}"

[tool.pytest.ini_options]
testpaths = ["pytests"]
python_files = ["test_*.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
addopts = ["-v", "--tb=short"]
```

### Key Configuration Sections

**`[build-system]`**: Specifies the build backend and requirements

- `scikit-build-core`: Modern build system for C++ extensions
- `pybind11`: C++ to Python binding library
- `cmake`: C++ build system

**`[project]`**: Package metadata and dependencies

- Standard Python package information
- `requires-python`: Minimum Python version
- `classifiers`: PyPI categorization

**`[tool.scikit-build]`**: Build configuration

- `build-dir`: Where to place build artifacts
- `wheel.exclude`: Files to exclude from wheels

**`[tool.scikit-build.cmake]`**: CMake arguments

- C++ standard and build type settings
- Cross-platform compilation flags

**`[tool.cibuildwheel]`**: CI/CD wheel building

- Python versions and platforms to build for
- Platform-specific configurations

## CMakeLists.txt Configuration

The CMakeLists.txt file orchestrates the C++ build process and Python binding generation:

```cmake
# CMakeLists.txt
cmake_minimum_required(VERSION 3.15)

project(
  cmsketch
  VERSION 0.1.10 # Project version
  LANGUAGES CXX)

# Generate compile_commands.json for IDE support
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

# Build options
option(DEVELOPMENT_MODE "Enable development mode with IDE support" OFF)
option(BUILD_PYTHON_BINDINGS "Build Python bindings for development" OFF)

# C++ standard - use C++17 for better compatibility
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)

# Default build type
if(NOT CMAKE_BUILD_TYPE)
  set(CMAKE_BUILD_TYPE
      Release
      CACHE STRING "Build type" FORCE)
endif()

# Compiler warnings
if(MSVC)
  add_compile_options(/W4)
  # Enable Windows symbol export
  set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)
else()
  add_compile_options(-Wall -Wextra -Wpedantic)
  # Enable position independent code for shared libraries
  set(CMAKE_POSITION_INDEPENDENT_CODE ON)
endif()

# Platform-specific settings
if(APPLE)
  set(CMAKE_OSX_DEPLOYMENT_TARGET
      "10.9"
      CACHE STRING "Minimum OS X deployment version")
  set(CMAKE_OSX_ARCHITECTURES
      "x86_64;arm64"
      CACHE STRING "Build architectures for OS X")
endif()

# Source files
file(GLOB_RECURSE CMSKETCH_SOURCES "src/cmsketchcpp/*.cc")

# Create library
add_library(cmsketch ${CMSKETCH_SOURCES})
target_include_directories(cmsketch PUBLIC include)
target_compile_features(cmsketch PUBLIC cxx_std_17)

# Example executable
file(GLOB EXAMPLE_SOURCES "src/main.cc")
add_executable(cmsketch_example ${EXAMPLE_SOURCES})
target_link_libraries(cmsketch_example PRIVATE cmsketch)

# Install targets
install(TARGETS cmsketch DESTINATION lib)
install(DIRECTORY include/ DESTINATION include)

# Python bindings
if(SKBUILD_PROJECT_NAME
   OR BUILD_PYTHON_BINDINGS
   OR DEVELOPMENT_MODE)
  set(PYBIND11_FINDPYTHON ON)
  find_package(pybind11 REQUIRED)
  pybind11_add_module(_core MODULE src/python_bindings.cc)
  target_link_libraries(_core PRIVATE cmsketch)
  if(SKBUILD_PROJECT_NAME)
    install(TARGETS _core DESTINATION ${SKBUILD_PROJECT_NAME})
  endif()
endif()

# Testing
option(BUILD_TESTS "Build tests" OFF)
if(BUILD_TESTS OR DEVELOPMENT_MODE)
  find_package(GTest REQUIRED)
  enable_testing()
  add_subdirectory(tests)
endif()
```

### Key CMake Sections

- **Project Setup**: Basic project configuration and C++ standard
- **Compiler Settings**: Platform-specific compiler flags and warnings
- **Library Creation**: Building the core C++ library
- **Python Bindings**: pybind11 integration for Python extensions
- **Testing**: Google Test integration for C++ unit tests
- **Installation**: Target installation for packaging

## Python Bindings with pybind11

The Python bindings are created in `src/python_bindings.cc`:

```cpp
// src/python_bindings.cc
#include "cmsketch/cmsketch.h"
#include <cstdint>
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <string>

namespace py = pybind11;

// Macro to define common CountMinSketch methods for a given type
#define DEFINE_COUNT_MIN_SKETCH_METHODS(class_type, class_name)                \
  py::class_<cmsketch::CountMinSketch<class_type>>(m, class_name)              \
      .def(py::init<uint32_t, uint32_t>(), py::arg("width"), py::arg("depth"), \
           "Create a Count-Min Sketch with specified dimensions")              \
      .def("insert", &cmsketch::CountMinSketch<class_type>::Insert,            \
           py::arg("item"), "Insert an item into the sketch")                  \
      .def("count", &cmsketch::CountMinSketch<class_type>::Count,              \
           py::arg("item"), "Get the estimated count of an item")              \
      .def("clear", &cmsketch::CountMinSketch<class_type>::Clear,              \
           "Reset the sketch to initial state")                                \
      .def("merge", &cmsketch::CountMinSketch<class_type>::Merge,              \
           py::arg("other"), "Merge another sketch into this one")             \
      .def("top_k", &cmsketch::CountMinSketch<class_type>::TopK, py::arg("k"), \
           py::arg("candidates"), "Get the top k items from candidates")       \
      .def("get_width", &cmsketch::CountMinSketch<class_type>::GetWidth,       \
           "Get the width of the sketch")                                      \
      .def("get_depth", &cmsketch::CountMinSketch<class_type>::GetDepth,       \
           "Get the depth of the sketch")

PYBIND11_MODULE(_core, m) {
  m.doc() = "Count-Min Sketch implementation with Python bindings";

  // CountMinSketch class for strings
  DEFINE_COUNT_MIN_SKETCH_METHODS(std::string, "CountMinSketchStr");

  // CountMinSketch class for int
  DEFINE_COUNT_MIN_SKETCH_METHODS(int, "CountMinSketchInt");
}
```

### Key pybind11 Features

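- **Automatic Type Conversion**: Including `pybind11/stl.h` converts STL containers such as `std::vector` to and from Python lists
- **Method Binding**: C++ methods become Python methods
- **Documentation**: The string passed to each `.def(...)` becomes the Python docstring, and pybind11 prepends the generated signature
- **Template Instantiation**: Each instantiated key type (`std::string`, `int`) gets its own Python class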
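
From the Python side, the bound classes look roughly like the stub-style sketch below. It is an assumption about what `src/cmsketch/_core.pyi` could contain, inferred only from the method and argument names in the macro above; the return types are guesses, and the real stub file in the repository may differ.

```python
# Hypothetical sketch of a _core.pyi-style stub, inferred from the binding macro.
# Return types are assumptions; bodies are intentionally empty ("...").


class CountMinSketchStr:
    """Python view of cmsketch::CountMinSketch<std::string>."""

    def __init__(self, width: int, depth: int) -> None: ...
    def insert(self, item: str) -> None: ...
    def count(self, item: str) -> int: ...
    def clear(self) -> None: ...
    def merge(self, other: "CountMinSketchStr") -> None: ...
    def top_k(self, k: int, candidates: list[str]) -> list[tuple[str, int]]: ...
    def get_width(self) -> int: ...
    def get_depth(self) -> int: ...


class CountMinSketchInt:
    """Python view of cmsketch::CountMinSketch<int>."""

    def __init__(self, width: int, depth: int) -> None: ...
    def insert(self, item: int) -> None: ...
    def count(self, item: int) -> int: ...
    def clear(self) -> None: ...
    def merge(self, other: "CountMinSketchInt") -> None: ...
    def top_k(self, k: int, candidates: list[int]) -> list[tuple[int, int]]: ...
    def get_width(self) -> int: ...
    def get_depth(self) -> int: ...
```

Because the macro is expanded once per key type, the two classes expose the same method names and differ only in the type of `item`.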

## C++ Atomic Implementation

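The core advantage of this C++ implementation is its use of atomic operations for thread safety: counters can be updated concurrently without locks. Atomics do not by themselves bypass Python's Global Interpreter Lock (GIL); a pybind11-bound method holds the GIL while it runs unless the binding releases it explicitly (for example with `py::call_guard<py::gil_scoped_release>()`). Here's how the atomic implementation works: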

**Header file** (`include/cmsketch/count_min_sketch.h`):

```cpp
// include/cmsketch/count_min_sketch.h
#include <atomic>
#include <cstddef>
#include <functional>
#include <vector>

template<typename KeyType>
class CountMinSketch {
private:
    // 2D array of atomic counters for thread-safe operations
    std::vector<std::vector<std::atomic<size_t>>> counters_;
    std::vector<std::function<size_t(const KeyType&)>> hash_functions_;
    size_t width_;
    size_t depth_;

public:
    void Insert(const KeyType& key);
    size_t Count(const KeyType& key) const;
    // ... other method declarations
};
```

**Implementation file** (`src/cmsketchcpp/count_min_sketch.cc`):

```cpp
// src/cmsketchcpp/count_min_sketch.cc
template<typename KeyType>
void CountMinSketch<KeyType>::Insert(const KeyType& key) {
    for (size_t i = 0; i < depth_; ++i) {
        size_t hash_value = hash_functions_[i](key);
        size_t index = hash_value % width_;
        // Atomic increment - thread-safe without locks
        counters_[i][index].fetch_add(1, std::memory_order_relaxed);
    }
}

template<typename KeyType>
size_t CountMinSketch<KeyType>::Count(const KeyType& key) const {
    size_t min_count = std::numeric_limits<size_t>::max();
    for (size_t i = 0; i < depth_; ++i) {
        size_t hash_value = hash_functions_[i](key);
        size_t index = hash_value % width_;
        // Atomic read - thread-safe without locks
        size_t count = counters_[i][index].load(std::memory_order_relaxed);
        min_count = std::min(min_count, count);
    }
    return min_count;
}
```

### Key Atomic Features

**Memory Ordering**: Using `std::memory_order_relaxed` for optimal performance

- **No ordering overhead**: Relaxed ordering keeps each increment atomic without ordering it against other memory accesses, which is sufficient for independent counters
- **Hardware optimization**: Allows the CPU and compiler to reorder surrounding operations for better performance
- **Fewer barriers**: Avoids the memory fences that stronger orderings can require

**Thread Safety Benefits**:

- **Lock-free design**: No mutexes or locks required
- **Concurrent access**: Multiple threads can insert/query simultaneously
- **GIL independence**: Native C++ threads, or bound calls that release the GIL, are not serialized by Python's interpreter lock
- **Scalability**: Throughput can scale with the number of CPU cores, as long as threads are not all updating the same counters

**Performance Comparison**:

```python
# Python implementation (GIL limited)
# pytests/test_py_count_min_sketch.py
def insert_python(self, key):
    with self.lock:  # Serialized access
        ...  # increment counters
```

```cpp
// C++ implementation (atomic operations)
// src/cmsketchcpp/count_min_sketch.cc
template<typename KeyType>
void CountMinSketch<KeyType>::Insert(const KeyType& key) {
    for (size_t i = 0; i < depth_; ++i) {
        size_t index = hash_functions_[i](key) % width_;
        // Parallel access - no locks needed
        counters_[i][index].fetch_add(1, std::memory_order_relaxed);
    }
}
```

This atomic implementation enables true parallel processing where multiple threads can simultaneously insert and query the sketch without blocking each other, providing significant performance advantages in multithreaded environments.
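
For contrast with the atomic C++ counters, the following is a minimal lock-based sketch in pure Python, driven by a thread pool. It is an illustrative stand-in, not the project's `src/cmsketch/py/` implementation: the class name `LockedSketch`, the salted `blake2b` hashing, and the batch layout are assumptions chosen for the example.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedSketch:
    """Illustrative lock-based Count-Min Sketch (hypothetical, not the project's code)."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]
        self.lock = threading.Lock()

    def _index(self, row: int, key: str) -> int:
        # One hash per row, derived by salting the digest with the row number.
        digest = hashlib.blake2b(key.encode(), salt=row.to_bytes(8, "little")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def insert(self, key: str) -> None:
        indices = [self._index(r, key) for r in range(self.depth)]
        with self.lock:  # every writer is serialized here
            for r, i in enumerate(indices):
                self.rows[r][i] += 1

    def count(self, key: str) -> int:
        with self.lock:
            return min(self.rows[r][self._index(r, key)] for r in range(self.depth))


def insert_batch(sketch: LockedSketch, batch: list[str]) -> None:
    for key in batch:
        sketch.insert(key)


sketch = LockedSketch(width=1000, depth=5)
batches = [["apple"] * 1000 for _ in range(10)]  # 10 workers, 1,000 items each
with ThreadPoolExecutor(max_workers=10) as pool:
    list(pool.map(lambda batch: insert_batch(sketch, batch), batches))

print(sketch.count("apple"))  # 10000
```

The lock guarantees that no increment is lost, but it also means only one thread makes progress inside `insert` at a time, which is the serialization the C++ version avoids with `fetch_add`.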

## Development Workflow

Here's the complete development workflow for building Python packages with C++ extensions:

### 1. Initial Setup

```bash
# Create project directory
mkdir my-python-cpp-package
cd my-python-cpp-package

# Initialize git repository
git init

# Create basic directory structure
mkdir -p include/mypackage src/mypackagecpp src/mypackage/py tests pytests examples
```

### 2. Development Environment

```bash
# Install development dependencies
uv sync --dev

# Build in development mode
uv pip install -e .

# Run tests
uv run pytest pytests/
make build-dev && cd build && make test
```

### 3. File Associations

Understanding how files relate to the project structure:

**C++ Headers** (`include/cmsketch/`):

- `cmsketch.h` → Main header included by users
- `count_min_sketch.h` → Core template class definition
- `hash_util.h` → Utility functions

**C++ Implementation** (`src/cmsketchcpp/`):

- `count_min_sketch.cc` → Template class implementation
- Links to headers via `#include "cmsketch/cmsketch.h"`

**Python Package** (`src/cmsketch/`):

- `__init__.py` → Package initialization and public API
- `_core.pyi` → Type stubs for C++ bindings
- `base.py` → Abstract base classes
- `py/` → Pure Python implementations

**Python Bindings** (`src/python_bindings.cc`):

- Links C++ library to Python via pybind11
- Creates `_core` module with `CountMinSketchStr` and `CountMinSketchInt` classes

**Build Configuration**:

- `pyproject.toml` → Python package metadata and build settings
- `CMakeLists.txt` → C++ build configuration and pybind11 integration

### 4. Build Process

The build process follows this sequence:

1. **CMake Configuration**: Reads `CMakeLists.txt` and configures build
2. **C++ Compilation**: Compiles C++ source files into library
3. **pybind11 Binding**: Generates Python extension module
4. **Python Packaging**: Creates a wheel containing the Python package and the compiled `_core` extension (the static library and headers are excluded via `wheel.exclude`)

### 5. Testing Strategy

**C++ Tests** (`tests/`):

```cpp
// tests/test_count_min_sketch.cc
#include <gtest/gtest.h>
#include "cmsketch/cmsketch.h"

TEST(CountMinSketchTest, BasicFunctionality) {
    cmsketch::CountMinSketch<std::string> sketch(100, 3);
    sketch.Insert("test");
    EXPECT_EQ(sketch.Count("test"), 1u);  // unsigned literal avoids a sign-compare warning
}
```

**Python Tests** (`pytests/`):

```python
# pytests/test_count_min_sketch.py
import pytest
import cmsketch

def test_basic_functionality():
    sketch = cmsketch.CountMinSketchStr(100, 3)
    sketch.insert("test")
    assert sketch.count("test") == 1
```

### 6. CI/CD Pipeline

The project uses GitHub Actions for automated building and testing:

**Test Workflow** (`.github/workflows/test.yml`):

- Runs on push/PR
- Tests C++ and Python code
- Cross-platform testing (Windows, Linux, macOS)

**Wheel Building** (`.github/workflows/wheels.yml`):

- Uses cibuildwheel for cross-platform wheel generation
- Builds for multiple Python versions and architectures
- Tests wheels before publishing

**Release Workflow** (`.github/workflows/release.yml`):

- Triggers on git tags
- Publishes wheels to PyPI
- Creates GitHub releases

## C++ Implementation

The core implementation uses a template-based design that supports any hashable key type:

```cpp
// src/main.cc
#include "cmsketch/cmsketch.h"
#include <iostream>

int main() {
    // Create a sketch with width=1000, depth=5
    cmsketch::CountMinSketch<std::string> sketch(1000, 5);
    
    // Add elements
    sketch.Insert("apple");
    sketch.Insert("apple");
    sketch.Insert("banana");
    
    // Query frequencies (estimates can overcount on hash collisions, never undercount)
    std::cout << "apple: " << sketch.Count("apple") << std::endl;    // 2
    std::cout << "banana: " << sketch.Count("banana") << std::endl;  // 1
    std::cout << "cherry: " << sketch.Count("cherry") << std::endl;  // 0
    
    return 0;
}
```

The implementation uses multiple hash functions to distribute items across the counter array. A query returns the minimum of the counters an item maps to, so an estimate can exceed the true count when items collide but can never fall below it.
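
The accuracy guarantee can be made concrete with the standard Count-Min Sketch analysis (Cormode and Muthukrishnan): with width w = ceil(e / ε) and depth d = ceil(ln(1 / δ)), an estimate exceeds the true count by more than εN with probability at most δ, where N is the total number of insertions. The helper names below are hypothetical, and the bound assumes pairwise-independent hash functions, which this article does not confirm for the library's hash utilities.

```python
import math


def dimensions_for(epsilon: float, delta: float) -> tuple[int, int]:
    """Return (width, depth) for additive error epsilon * N with probability 1 - delta."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    width = math.ceil(math.e / epsilon)
    depth = math.ceil(math.log(1 / delta))
    return width, depth


def guarantees_for(width: int, depth: int) -> tuple[float, float]:
    """Invert the formulas: return (epsilon, delta) implied by a sketch's shape."""
    return math.e / width, math.exp(-depth)


print(dimensions_for(0.01, 0.01))  # (272, 5)

eps, delta = guarantees_for(1000, 5)
print(f"epsilon={eps:.5f}, delta={delta:.5f}")  # epsilon=0.00272, delta=0.00674
```

Under these assumptions, the 1000 x 5 sketch used in the examples would overestimate a count by more than about 0.27% of the total insertions less than 1% of the time.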

### Template Design

The template-based approach allows for type-safe implementations:

```cpp
// include/cmsketch/count_min_sketch.h
template<typename KeyType>
class CountMinSketch {
public:
    CountMinSketch(size_t width, size_t depth);
    void Insert(const KeyType& key);
    size_t Count(const KeyType& key) const;
    std::vector<std::pair<KeyType, size_t>> TopK(size_t k, 
                                                 const std::vector<KeyType>& candidates) const;
    void Merge(const CountMinSketch& other);
    void Clear();
    
private:
    std::vector<std::vector<std::atomic<size_t>>> counters_;
    std::vector<std::function<size_t(const KeyType&)>> hash_functions_;
    size_t width_;
    size_t depth_;
};
```

This design ensures type safety while maintaining high performance: each template instantiation is compiled for its concrete key type.

## Python Usage

The Python interface provides a clean, easy-to-use API:

```python
# examples/example.py
import cmsketch

# Create a sketch for strings
sketch = cmsketch.CountMinSketchStr(1000, 5)

# Add elements
sketch.insert("apple")
sketch.insert("apple")
sketch.insert("banana")

# Query frequencies
print(f"apple: {sketch.count('apple')}")    # 2
print(f"banana: {sketch.count('banana')}")  # 1
print(f"cherry: {sketch.count('cherry')}")  # 0

# Get top-k items
candidates = ["apple", "banana", "cherry"]
top_k = sketch.top_k(2, candidates)
for item, count in top_k:
    print(f"{item}: {count}")
```
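
A Count-Min Sketch stores only counters, not the keys themselves, which is why `top_k` takes a list of candidates to rank. Conceptually the call behaves like the pure-Python sketch below, where a plain dictionary stands in for the sketch's `count` method. The function name `top_k_from_candidates` and the tie-breaking behaviour are assumptions, not the library's documented semantics.

```python
import heapq
from typing import Callable, Iterable


def top_k_from_candidates(
    count: Callable[[str], int], k: int, candidates: Iterable[str]
) -> list[tuple[str, int]]:
    """Rank candidates by estimated count and keep the k largest."""
    scored = ((item, count(item)) for item in candidates)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# A dict stands in for sketch.count in this example.
exact = {"apple": 2, "banana": 1}
result = top_k_from_candidates(
    lambda item: exact.get(item, 0), 2, ["apple", "banana", "cherry"]
)
print(result)  # [('apple', 2), ('banana', 1)]
```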

### Type Support

The library provides specialized classes for different data types:

- `CountMinSketchStr`: String-based sketch
- `CountMinSketchInt`: Integer-based sketch

This approach optimizes performance for common use cases while maintaining the flexibility of the underlying C++ implementation.

## Performance Benchmarks

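The C++ implementation provides significant performance improvements over Python, especially in multithreaded environments. Here are the actual benchmark results from our test suite. For the insert and streaming benchmarks, "ops/sec" is the reciprocal of the mean time for a complete run, not the number of individual insertions per second.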

### Benchmark Setup

The benchmark suite tests real-world scenarios with:

- **100,000 IP address samples** generated using Faker with weighted distribution
- **Realistic frequency patterns** (most frequent IP appears ~10% of the time)
- **Threaded processing** with 10 concurrent workers and 1,000-item batches
- **Comprehensive testing** across insert, count, top-k, and streaming operations
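
A skewed workload like the one described can be approximated with the standard library alone. The sketch below is not the project's `benchmarks/generate_data.py` (which uses Faker); the pool size, the weights, and the function name `generate_ips` are assumptions chosen so that the most frequent address accounts for roughly 10% of the samples.

```python
import ipaddress
import random
from collections import Counter


def generate_ips(n_samples: int, n_unique: int, seed: int = 0) -> list[str]:
    """Return n_samples IPv4 strings where one address dominates (~10%)."""
    rng = random.Random(seed)
    # Draw distinct addresses from the 10.0.0.0/8 private range.
    base = int(ipaddress.IPv4Address("10.0.0.0"))
    offsets = rng.sample(range(1, 2**24 - 1), n_unique)
    pool = [str(ipaddress.IPv4Address(base + off)) for off in offsets]
    # The first address gets 10% of the probability mass; the rest share 90% evenly.
    weights = [0.10] + [0.90 / (n_unique - 1)] * (n_unique - 1)
    return rng.choices(pool, weights=weights, k=n_samples)


samples = generate_ips(100_000, 1_000)
top_ip, top_freq = Counter(samples).most_common(1)[0]
print(len(samples), top_ip, f"{top_freq / len(samples):.1%}")
```

Fixing the seed keeps the generated data identical between runs, so the C++ and Python implementations are measured on the same input.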

### Actual Benchmark Results

**Insert Performance (100k items, threaded)**:

- **C++**: 45.79ms (21.84 ops/sec)
- **Python**: 8,751.15ms (0.11 ops/sec)
- **Speedup**: **191x faster** with C++

**Count Performance (querying unique items)**:

- **C++**: 4.71μs per query (212,130 ops/sec)
- **Python**: 858.58μs per query (1,165 ops/sec)
- **Speedup**: **182x faster** with C++

**Top-K Performance (finding top items)**:

- **C++**: 2.57μs per operation (389,163 ops/sec)
- **Python**: 857.54μs per operation (1,166 ops/sec)
- **Speedup**: **334x faster** with C++

**Streaming Performance (insert + top-k)**:

- **C++**: 46.03ms (21.72 ops/sec)
- **Python**: 8,889.81ms (0.11 ops/sec)
- **Speedup**: **193x faster** with C++

### Performance Analysis

| Operation | C++ Time | Python Time | Speedup | Key Advantage |
|-----------|----------|-------------|---------|---------------|
| **Insert (100k threaded)** | 45.79ms | 8,751.15ms | **191x** | **GIL bypass + atomic operations** |
| **Count (per query)** | 4.71μs | 858.58μs | **182x** | **Direct memory access** |
| **Top-K (per operation)** | 2.57μs | 857.54μs | **334x** | **Optimized algorithms** |
| **Streaming (end-to-end)** | 46.03ms | 8,889.81ms | **193x** | **Combined benefits** |
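
The speedups in the table are the ratio of the mean times, and they can be reproduced with a few lines of arithmetic. The dictionary below only restates numbers from the table; the per-item throughput at the end rests on the assumption that one insert run processes all 100,000 items, as the benchmark setup suggests.

```python
# Mean times from the table, converted to seconds: (C++, Python)
results = {
    "insert": (45.79e-3, 8751.15e-3),
    "count": (4.71e-6, 858.58e-6),
    "top_k": (2.57e-6, 857.54e-6),
    "streaming": (46.03e-3, 8889.81e-3),
}

# Speedup = Python time / C++ time
speedups = {name: py / cpp for name, (cpp, py) in results.items()}
for name, ratio in speedups.items():
    print(f"{name}: {ratio:.0f}x")

# Assumption: one insert run covers all 100,000 items.
ITEMS_PER_RUN = 100_000
cpp_items_per_sec = ITEMS_PER_RUN / results["insert"][0]
py_items_per_sec = ITEMS_PER_RUN / results["insert"][1]
print(f"C++: {cpp_items_per_sec:,.0f} inserts/sec")
print(f"Python: {py_items_per_sec:,.0f} inserts/sec")
```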

### Running Benchmarks

```bash
# Run all benchmarks with pytest
uv run pytest ./benchmarks

# Run specific benchmark categories
uv run pytest ./benchmarks -k "insert"
uv run pytest ./benchmarks -k "count"
uv run pytest ./benchmarks -k "topk"

# Run with verbose output
uv run pytest ./benchmarks -v

# Generate test data (if needed)
uv run python ./benchmarks/generate_data.py
```

The benchmark suite uses pytest-benchmark for reliable measurements and includes both synthetic and real-world data patterns.

### Why C++ is So Much Faster

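**1. Multithreaded Operations**

- **Python**: The GIL lets only one thread execute bytecode at a time, and the lock-based implementation serializes inserts further
- **C++**: Each insert is a handful of hash computations and atomic increments in compiled code, with no lock to contend on
- **Result**: 191x speedup in threaded insertions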

**2. Memory Access Patterns**

- **Python**: Object overhead, dynamic typing, garbage collection
- **C++**: Direct memory access, contiguous arrays, no GC overhead
- **Result**: 182x speedup in count operations

**3. Algorithm Optimization**

- **Python**: Interpreted bytecode, dynamic dispatch
- **C++**: Compiled machine code, template specialization
- **Result**: 334x speedup in top-k operations

**4. Thread Safety Implementation**

```python
# Python: Lock-based (serialized)
def insert_python(self, key):
    with self.lock:  # All threads wait here
        ...  # increment counters
```

```cpp
// C++: Atomic operations (parallel)
template<typename KeyType>
void CountMinSketch<KeyType>::Insert(const KeyType& key) {
    for (size_t i = 0; i < depth_; ++i) {
        size_t index = hash_functions_[i](key) % width_;
        // All threads can execute simultaneously
        counters_[i][index].fetch_add(1, std::memory_order_relaxed);
    }
}
```

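**5. Memory Efficiency**

- **Python**: Each counter is a full `int` object (28 bytes on 64-bit CPython) plus an 8-byte reference in the containing list
- **C++**: One `std::atomic<size_t>` per counter, typically 8 bytes on 64-bit platforms
- **Result**: Several times less memory per counter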
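
The per-counter sizes can be checked from Python itself. This is a rough estimate under stated assumptions: it treats every Python counter as a distinct `int` object (CPython shares small integers, so real usage can be lower), and it assumes `std::atomic<size_t>` occupies exactly `sizeof(size_t)` bytes, which is typical for lock-free atomics but not guaranteed by the C++ standard.

```python
import ctypes
import sys

# Size of one CPython int object (value chosen outside the small-int cache).
py_int_bytes = sys.getsizeof(10**6)
# Each list slot additionally stores a pointer to the int object.
pointer_bytes = ctypes.sizeof(ctypes.c_void_p)
# size_t is the counter type used in the C++ header.
size_t_bytes = ctypes.sizeof(ctypes.c_size_t)

width, depth = 1000, 5
python_estimate = width * depth * (py_int_bytes + pointer_bytes)
cpp_estimate = width * depth * size_t_bytes

print(f"Python nested lists: ~{python_estimate:,} bytes")
print(f"C++ atomic counters: ~{cpp_estimate:,} bytes")
print(f"Ratio: ~{python_estimate / cpp_estimate:.1f}x")
```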

## Project Architecture

The project demonstrates modern software engineering practices:

### Build System

- **CMake**: Cross-platform C++ build configuration
- **scikit-build-core**: Modern Python build system for C++ extensions
- **pybind11**: Seamless C++ to Python binding generation
- **uv**: Fast, modern Python package management

### Project Structure

```
count-min-sketch/
├── include/cmsketch/                    # C++ header files
│   ├── cmsketch.h                      # Main header
│   ├── count_min_sketch.h              # Core template class
│   └── hash_util.h                     # Hash utilities
├── src/cmsketchcpp/                    # C++ source files
│   └── count_min_sketch.cc             # Core implementation
├── src/cmsketch/                       # Python package
│   ├── __init__.py                     # Package initialization
│   ├── _core.pyi                       # Type stubs
│   └── py/                             # Pure Python implementations
├── tests/                              # C++ unit tests
├── pytests/                            # Python tests
├── benchmarks/                         # Performance benchmarks
└── examples/                           # Example scripts
```

### CI/CD Pipeline

The project uses GitHub Actions for automated testing and publishing:

- **Cross-Platform Testing**: Windows, Linux, macOS
- **Wheel Building**: Automated wheel generation for all platforms
- **PyPI Publishing**: Automatic package distribution on release

## Educational Value

This project demonstrates several important software engineering concepts:

### 1. Python Package Development with C++ Extensions

- **pybind11 Integration**: Seamless C++ to Python binding generation
- **Type Stubs**: Complete type information for Python IDEs
- **Modern Build Tools**: scikit-build-core and uv for package management

### 2. Performance Engineering

- **C++ vs Python**: Direct performance comparison between implementations
- **Memory Efficiency**: Optimized data structures and memory usage patterns
- **Thread Safety**: Atomic operations and concurrent access patterns

### 3. Build System Integration

- **CMake**: Cross-platform C++ build configuration
- **Python Packaging**: Complete pip-installable package creation
- **CI/CD**: Automated testing and publishing workflows

### 4. Modern C++ Practices

- **Template Metaprogramming**: Generic, type-safe implementations
- **RAII**: Resource management and exception safety
- **STL Integration**: Standard library containers and algorithms

## Getting Started

### Installation

```bash
# Using pip
pip install cmsketch

# Using uv (recommended)
uv add cmsketch
```

### Basic Usage

```python
import cmsketch

# Create a sketch
sketch = cmsketch.CountMinSketchStr(1000, 5)

# Add elements
sketch.insert("apple")
sketch.insert("apple")
sketch.insert("banana")

# Query frequencies
print(f"apple: {sketch.count('apple')}")    # 2
print(f"banana: {sketch.count('banana')}")  # 1
```

### Building from Source

```bash
# Clone the repository
git clone https://github.com/isaac-fate/count-min-sketch.git
cd count-min-sketch

# Build everything
make build

# Run tests
make test

# Run example
make example
```

## Key Takeaways

Building Python packages with C++ extensions requires understanding several interconnected systems:

### 1. Project Structure

- **Clear separation** between C++ headers, implementation, and Python bindings
- **Logical organization** that scales from simple to complex projects
- **Build artifact management** to keep source control clean

### 2. Build System Integration

- **pyproject.toml** for modern Python packaging standards
- **CMakeLists.txt** for cross-platform C++ compilation
- **pybind11** for seamless C++ to Python binding generation

### 3. Development Workflow

- **Incremental development** with editable installs for fast rebuild-and-test cycles
- **Comprehensive testing** at both C++ and Python levels
- **CI/CD automation** for cross-platform wheel building and publishing

### 4. Performance Benefits

- **191x speedup** in threaded insertions (compiled code with lock-free atomic counters)
- **182x speedup** in count operations (direct memory access)
- **334x speedup** in top-k operations (compiled optimization)
- **Atomic operations** enable true parallel processing without locks
- **Memory efficiency** through direct C++ data structure control

## Next Steps

To apply these techniques to your own projects:

1. **Start Simple**: Begin with a basic C++ function and Python binding
2. **Iterate Gradually**: Add complexity incrementally (templates, STL containers, etc.)
3. **Test Thoroughly**: Implement both C++ and Python test suites
4. **Automate Everything**: Set up CI/CD for automated building and testing
5. **Document Well**: Provide clear examples and API documentation

The complete source code, documentation, and benchmarks are available on [GitHub](https://github.com/isaac-fate/count-min-sketch), and the package is available on [PyPI](https://pypi.org/project/cmsketch/) for immediate use.

This approach to Python package development with C++ extensions provides a solid foundation for building high-performance libraries that combine the best of both worlds: Python's ease of use and C++'s performance.