Cython Note

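This article collects notes on Cython.

Basics

The essence of Cython can be summed up as follows: Cython is Python with C data types.

Cython is Python: almost any Python code is also valid Cython code. (There are a few limitations, but this approximation will serve for now.) The Cython compiler translates the Python code into C code, and that C code can call the Python/C API.

But Cython is more than that: parameters and variables can also be declared with C data types. Code that manipulates Python values and C values can be freely intermixed, with conversions happening automatically wherever possible. Reference count maintenance and error checking of Python operations are also automatic, and Python's exception handling facilities, including try-except and try-finally, are fully available even while manipulating C data.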

There are three file types in Cython:

The implementation files, carrying a .py or .pyx suffix.

The definition files, carrying a .pxd suffix.

The include files, carrying a .pxi suffix.

Declaring Data Types

C variables can be declared by

  • using the Cython specific cdef statement,

  • using PEP-484/526 type annotations with C data types or

  • using the function cython.declare().

cdef int a_global_variable

def func():
    cdef int i, j, k
    cdef float f, g[42], *h

cdef struct Grail:
    int age
    float volume

cdef union Food:
    char *spam
    float *eggs

cdef enum CheeseType:
    cheddar, edam,
    camembert

cdef enum CheeseState:
    hard = 1
    soft = 2
    runny = 3

ctypedef unsigned long ULong

ctypedef int* IntPtr

You can create a C function by declaring it with cdef or by decorating a Python function with @cfunc:

cdef int eggs(unsigned long l, float f):
    ...

Classes can be declared as Extension Types.

from __future__ import print_function


cdef class Shrubbery:
    cdef int width
    cdef int height

    def __init__(self, w, h):
        self.width = w
        self.height = h

    def describe(self):
        print("This shrubbery is", self.width,
              "by", self.height, "cubits.")

If you have a series of declarations that all begin with cdef, you can group them into a cdef block like this:

from __future__ import print_function

cdef:
    struct Spam:
        int tons

    int i
    float a
    Spam *p

    void f(Spam *s) except *:
        print(s.tons, "Tons of spam")

If no type is specified for a parameter or return value, it is assumed to be a Python object. (Note that this is different from the C convention, where it would default to int.) For example, the following defines a C function that takes two Python objects as parameters and returns a Python object

cdef spamobjs(x, y):
    ...

The type name object can also be used to explicitly declare something as a Python object. This can be useful if the name being declared would otherwise be taken as the name of a type, for example,

cdef object ftang(object int):
    ...

Differences between C and Cython expressions

There are some differences in syntax and semantics between C expressions and Cython expressions, particularly in the area of C constructs which have no direct equivalent in Python.

An integer literal is treated as a C constant, and will be truncated to whatever size your C compiler thinks appropriate. To get a Python integer (of arbitrary precision), cast immediately to an object (e.g. <object>100000000000000000000 or cast(object, 100000000000000000000)). The L, LL, and U suffixes have the same meaning in Cython syntax as in C.

There is no -> operator in Cython. Instead of p->x, use p.x

There is no unary * operator in Cython. Instead of *p, use p[0]

There is an & operator in Cython, with the same semantics as in C. In pure python mode, use the cython.address() function instead.

The null C pointer is called NULL, not 0. NULL is a reserved word in Cython and special object in pure python mode.

Type casts are written <type>value or cast(type, value), for example,

cdef char* p
cdef float* q

p = <char*>q
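The integer-literal point above can be illustrated without compiling anything. The following is only a plain-Python sketch: it uses the standard ctypes module (an assumption of this example, not something Cython itself needs) to mimic a fixed-width C integer next to Python's arbitrary-precision int.

```python
import ctypes

# Python integers have arbitrary precision, so this value is exact.
big = 100000000000000000000
py_value = big + 1

# A 32-bit C integer keeps only the low 32 bits; ctypes does no
# overflow checking, so one past the largest int32 wraps around.
wrapped = ctypes.c_int32(2**31).value

print(py_value)
print(wrapped)
```

The wrapped value shows why a large literal should be cast to object when a true Python integer is wanted.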

Define a function type using ctypedef

ctypedef void (*function_type_name)(int, int)

This is equivalent to the C declaration:

typedef void (*function_type_name)(int, int);

def, cdef, cpdef

def - Basically, it’s Python

def is used for code that will be:

  • Called directly from Python code with Python objects as arguments.
  • Returns a Python object. The generated code treats every operation as if it were dealing with Python objects, with Python consequences, so it incurs a high overhead.

cdef - Basically, it’s C

cdef is used for Cython functions that are intended to be pure C functions. All types must be declared. Cython aggressively optimises the code and there are a number of gotchas.

cdef declared functions are not visible to Python code that imports the module. Take some care with cdef declared functions; it looks like you are writing Python but actually you are writing C.

cpdef - It’s Both

cpdef functions combine both def and cdef by creating two functions: a cdef for C types and a def for Python types.

Notice

  1. The unary * dereference operator cannot be used in Cython. For a plain C pointer, write pointer[0]. For C++ iterators, cimport dereference from the cython.operator module and write dereference(it) to access the object the iterator points to.

# distutils: language = c++
from libcpp.map cimport map as mapcpp
from cython.operator cimport dereference, postincrement

def it_through_map(dict mymap_of_int_int):
    # Convert the Python dict to a C++ std::map (this copies the data)
    cdef mapcpp[int, int] mymap_in = mymap_of_int_int
    cdef mapcpp[int, int].iterator it = mymap_in.begin()

    while it != mymap_in.end():
        # Here we simply print each key and value
        print(dereference(it).first)   # print the key
        print(dereference(it).second)  # print the associated value
        postincrement(it)  # advance the iterator to the next element

The map iterator has no members first and second. Instead it has an operator* that returns a reference to a pair. In C++ you can write it->first to do this in one step, but that syntax does not work in Cython, which will not substitute -> for . on its own in this case.

from cython.operator cimport dereference
print(dereference(it).first)

Examples

Hello World

Create helloworld.pyx:

print("Hello World")

Create setup.py, which acts like a Makefile for Python:

from setuptools import setup  # distutils was removed in Python 3.12
from Cython.Build import cythonize

setup(
    ext_modules = cythonize("helloworld.pyx")
)

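Build the Cython file:

python setup.py build_ext --inplace

Running this command produces a new file in the current directory: helloworld.so on Unix, or helloworld.pyd on Windows. Recent Python versions add a platform tag to the name, for example helloworld.cpython-38-x86_64-linux-gnu.so.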
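To see which file name to expect on a given machine, the suffix can be queried from the interpreter. This is a small sketch using only the standard library; it does not need Cython, and the printed values depend on the platform and Python version.

```python
import sysconfig
from importlib.machinery import EXTENSION_SUFFIXES

# Full suffix that the build gives to extension modules for this
# interpreter, e.g. ".cpython-38-x86_64-linux-gnu.so" on Linux.
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")

print("built module would be named: helloworld" + str(ext_suffix))
# Every suffix the import system accepts for extension modules.
print(EXTENSION_SUFFIXES)
```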

Import the module from Python to run it:

import helloworld
# Hello World

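If your module needs no extra C libraries or special build setup, you can use the pyximport module, originally written by Paul Prescod and Stefan Behnel, to load .pyx files directly on import without writing a setup.py. It ships and installs with Cython, and can be used like this: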

import pyximport; pyximport.install()
import helloworld
# Hello World

Using C++ in Cython

Cython has native support for most of the C++ language. Specifically:

  • C++ objects can be dynamically allocated with new and del keywords.
  • C++ objects can be stack-allocated.
  • C++ classes can be declared with the new keyword cppclass.
  • Templated classes and functions are supported.
  • Overloaded functions are supported.
  • Overloading of C++ operators (such as operator+, operator[],…) is supported.

The general procedure for wrapping a C++ file can now be described as follows:

  • Specify C++ language in a setup.py script or locally in a source file.
  • Create one or more .pxd files with cdef extern from blocks and (if existing) the C++ namespace name. In these blocks:
    • declare classes as cdef cppclass blocks
    • declare public names (variables, methods and constructors)
  • cimport them in one or more extension modules (.pyx files).

Example

Rectangle.h:

#ifndef RECTANGLE_H
#define RECTANGLE_H

namespace shapes {
    class Rectangle {
        public:
            int x0, y0, x1, y1;
            Rectangle();
            Rectangle(int x0, int y0, int x1, int y1);
            ~Rectangle();
            int getArea();
            void getSize(int* width, int* height);
            void move(int dx, int dy);
    };
}

#endif

Rectangle.cpp:

#include <iostream>
#include "Rectangle.h"

namespace shapes {

    // Default constructor
    Rectangle::Rectangle () {}

    // Overloaded constructor
    Rectangle::Rectangle (int x0, int y0, int x1, int y1) {
        this->x0 = x0;
        this->y0 = y0;
        this->x1 = x1;
        this->y1 = y1;
    }

    // Destructor
    Rectangle::~Rectangle () {}

    // Return the area of the rectangle
    int Rectangle::getArea () {
        return (this->x1 - this->x0) * (this->y1 - this->y0);
    }

    // Get the size of the rectangle.
    // Put the size in the pointer args
    void Rectangle::getSize (int *width, int *height) {
        (*width) = x1 - x0;
        (*height) = y1 - y0;
    }

    // Move the rectangle by dx dy
    void Rectangle::move (int dx, int dy) {
        this->x0 += dx;
        this->y0 += dy;
        this->x1 += dx;
        this->y1 += dy;
    }
}

Rectangle.pxd:

cdef extern from "Rectangle.cpp":
    pass

# Declare the class with cdef
cdef extern from "Rectangle.h" namespace "shapes":
    cdef cppclass Rectangle:
        Rectangle() except +
        Rectangle(int, int, int, int) except +
        int x0, y0, x1, y1
        int getArea()
        void getSize(int* width, int* height)
        void move(int, int)

Declare a var with the wrapped C++ class

# distutils: language = c++

from Rectangle cimport Rectangle

def main():
    rec_ptr = new Rectangle(1, 2, 3, 4)  # Instantiate a Rectangle object on the heap
    try:
        rec_area = rec_ptr.getArea()
    finally:
        del rec_ptr  # delete heap allocated object

    cdef Rectangle rec_stack  # Instantiate a Rectangle object on the stack

Calling a C++ function from Python

Write the C++ code in mytanh.cpp:

#include <cmath>

const double e = 2.7182818284590452353602874713527;

double mysinh(double x)
{
    return (1 - pow(e, (-2 * x))) / (2 * pow(e, -x));
}

double mycosh(double x)
{
    return (1 + pow(e, (-2 * x))) / (2 * pow(e, -x));
}

double mytanh(double x)
{
    return mysinh(x) / mycosh(x);
}
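As a quick sanity check of the formulas used in mytanh.cpp, the same expressions can be evaluated in plain Python and compared with math.tanh. This is only a sketch; the names my_sinh, my_cosh and my_tanh are hypothetical Python counterparts of the C++ functions, not part of the wrapped module.

```python
import math

E = math.e  # same constant that mytanh.cpp hard-codes

def my_sinh(x):
    # (1 - e^(-2x)) / (2 e^(-x)) == (e^x - e^(-x)) / 2
    return (1 - E ** (-2 * x)) / (2 * E ** (-x))

def my_cosh(x):
    # (1 + e^(-2x)) / (2 e^(-x)) == (e^x + e^(-x)) / 2
    return (1 + E ** (-2 * x)) / (2 * E ** (-x))

def my_tanh(x):
    return my_sinh(x) / my_cosh(x)

# Compare against the standard library at a few sample points.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, my_tanh(x), math.tanh(x))
```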

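With the C++ functions written, the .cpp code needs to be wrapped so that Python can call it. Cython does this wrapping. Cython uses files with the .pyx and .pxd suffixes, which are also source files: .pyx is analogous to .cpp, and .pxd is analogous to .h. For now we do the wrapping without a .pxd file.

Create a file named fast_tanh.pyx with the following content: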

# distutils: language = c++
# cython: language_level = 3

cdef extern from "mytanh.cpp":
    double mytanh(double x)

def fast_tanh(double x):
    return mytanh(x)

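The first two comment lines are special comments that configure the build: they select C++ and Python 3 respectively. cdef extern from declares a function that is implemented in C++. The code above declares mytanh so that it can be used in Cython. Although mytanh can now be called directly from Cython, Python cannot call it directly, so an interface function named fast_tanh is also defined.

After fast_tanh.pyx is written, it must be compiled before Python can call it. Compilation is driven by setup.py: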

from setuptools import setup, Extension  # distutils was removed in Python 3.12
from Cython.Build import cythonize

setup(ext_modules=cythonize(Extension(
    'fast_tanh',                            # name of the generated module
    sources=['fast_tanh.pyx'],              # files to compile
    language='c++',                         # language to use
    include_dirs=[],                        # gcc -I arguments
    library_dirs=[],                        # gcc -L arguments
    libraries=[],                           # gcc -l arguments
    extra_compile_args=[],                  # extra compiler arguments
    extra_link_args=[],                     # extra linker arguments
)))

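Compile the Cython file:

python setup.py build_ext --inplace

The Python version used for compiling must match the version used to run the module. After compilation, the current directory contains the generated .cpp file and a .pyd file (a .so file on Linux).

Once compiled, fast_tanh behaves like an ordinary Python module and can be imported with import: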

from fast_tanh import fast_tanh

Calling a C++ class from Python

A rectangle class is written in C++. The header Rectangle.h is:

#ifndef RECTANGLE_H
#define RECTANGLE_H

namespace shapes
{
    class Rectangle
    {
        public:
            int x0, y0, x1, y1;      // coordinates of two diagonally opposite corners
            Rectangle();
            Rectangle(int x0, int y0, int x1, int y1);
            ~Rectangle();
            int getArea();
            void getSize(int* width, int* height);
            void move(int dx, int dy);
    };
}
#endif

The implementation in Rectangle.cpp is:

#include <iostream>
#include "Rectangle.h"
namespace shapes {
    // Constructors
    Rectangle::Rectangle () {}
    Rectangle::Rectangle (int x0, int y0, int x1, int y1) {
        this->x0 = x0;
        this->y0 = y0;
        this->x1 = x1;
        this->y1 = y1;
    }
    // Destructor
    Rectangle::~Rectangle () {}
    // Return the area of the rectangle
    int Rectangle::getArea () {
        return (this->x1 - this->x0) * (this->y1 - this->y0);
    }
    // Get the side lengths of the rectangle
    void Rectangle::getSize (int *width, int *height) {
        (*width) = x1 - x0;
        (*height) = y1 - y0;
    }
    // Move the rectangle
    void Rectangle::move (int dx, int dy) {
        this->x0 += dx;
        this->y0 += dy;
        this->x1 += dx;
        this->y1 += dy;
    }
}

To declare the class in Cython, use cdef extern from to declare a class that is implemented in C++:

cdef extern from "Rectangle.h" namespace "shapes":

If there is no namespace, use:

cdef extern from "Rectangle.h":

Put the declarations in the Rectangle.pxd file:

cdef extern from "Rectangle.cpp":
    pass
# Declare the class with cdef
cdef extern from "Rectangle.h" namespace "shapes":
    cdef cppclass Rectangle:
        Rectangle() except +
        Rectangle(int, int, int, int) except +
        int x0, y0, x1, y1
        int getArea()
        void getSize(int* width, int* height)
        void move(int, int)

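Because the .h file does not contain the implementation of the rectangle class, the following statement is also needed to include the code implemented in Rectangle.cpp (alternatively, Rectangle.cpp can be listed in the sources of the extension):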

cdef extern from "Rectangle.cpp":
    pass

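cdef cppclass Rectangle declares a class that is defined in C++; the other declarations work like the function declaration shown earlier. Adding except + after a constructor lets Python catch exceptions raised inside that constructor. Without except +, Cython does not handle exceptions raised in the constructor.

To write the interface class in Cython, the declaration of the C++ class goes in the .pxd file and the implementation of the interface class goes in the .pyx file. PyRectangle.pyx is: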

# distutils: language = c++
from Rectangle cimport Rectangle

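# Interface class
# Python can access the interface class directly, and the interface class can access the C++ class directly
cdef class PyRectangle:
    cdef Rectangle c_rect    # holds the C++ object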
    def __cinit__(self, int x0, int y0, int x1, int y1):
        self.c_rect = Rectangle(x0, y0, x1, y1)
    def get_area(self):
        return self.c_rect.getArea()
    def get_size(self):
        cdef int width, height
        self.c_rect.getSize(&width, &height)
        return width, height
    def move(self, dx, dy):
        self.c_rect.move(dx, dy)

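Once compiled, the PyRectangle class can be used from Python like an ordinary Python class.

Cython also supports creating C++ objects with new. In that case the attribute must be declared as a pointer (cdef Rectangle* c_rect):

def __cinit__(self, int x0, int y0, int x1, int y1):
    self.c_rect = new Rectangle(x0, y0, x1, y1)

As in C++, memory obtained with new must be released (with del in Cython, the counterpart of C++ delete), otherwise it leaks:

def __dealloc__(self):    # destructor
    del self.c_rect       # release the memory

Write setup.py as follows: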

from setuptools import setup, Extension  # distutils was removed in Python 3.12
from Cython.Build import cythonize

setup(ext_modules=cythonize(Extension(
    'PyRectangle',                          # name of the generated module
    sources=['PyRectangle.pyx'],            # files to compile
    language='c++',                         # language to use
    include_dirs=[],                        # gcc -I arguments
    library_dirs=[],                        # gcc -L arguments
    libraries=[],                           # gcc -l arguments
    extra_compile_args=[],                  # extra compiler arguments
    extra_link_args=[],                     # extra linker arguments
)))

Compile with python setup.py build_ext --inplace.

The PyRectangle class can now be called from Python:

import PyRectangle

x0, y0, x1, y1 = 1, 2, 3, 4
rect = PyRectangle.PyRectangle(x0, y0, x1, y1)
print(rect.get_area())
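To know what the compiled wrapper should return, it can help to keep a pure-Python model of the same logic. The sketch below is such a model; RectangleModel is a hypothetical helper written for this note, mirroring the arithmetic in Rectangle.cpp, and is not part of the extension module.

```python
class RectangleModel:
    """Pure-Python stand-in that mirrors the C++ Rectangle arithmetic."""

    def __init__(self, x0, y0, x1, y1):
        self.x0, self.y0, self.x1, self.y1 = x0, y0, x1, y1

    def get_area(self):
        return (self.x1 - self.x0) * (self.y1 - self.y0)

    def get_size(self):
        # The C++ version writes through two int pointers;
        # here the pair is simply returned as a tuple.
        return self.x1 - self.x0, self.y1 - self.y0

    def move(self, dx, dy):
        self.x0 += dx
        self.y0 += dy
        self.x1 += dx
        self.y1 += dy


model = RectangleModel(1, 2, 3, 4)
print(model.get_area())   # 4
print(model.get_size())   # (2, 2)
model.move(10, 10)
print(model.get_area())   # moving does not change the area: 4
```

With the same arguments, PyRectangle.PyRectangle(1, 2, 3, 4).get_area() should agree with the model.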

Using C++ string

Add the following statement to the .pyx and .pxd files:

from libcpp.string cimport string

After that, string can be used as usual:

# distutils: language = c++

from Rectangle cimport Rectangle
from libcpp.string cimport string

cdef class PyRectangle:
    cdef Rectangle c_rect    # holds the C++ object
    # def __cinit__(self, int x0, int y0, int x1, int y1):
    #     self.c_rect = Rectangle(x0, y0, x1, y1)
    def __cinit__(self, string msg):
        # Assumes Rectangle also declares a constructor taking a string,
        # which is not part of the Rectangle.pxd shown earlier.
        self.c_rect = Rectangle(msg)
    def get_area(self):
        return self.c_rect.getArea()
    def get_size(self):
        cdef int width, height
        self.c_rect.getSize(&width, &height)
        return width, height
    def move(self, dx, dy):
        self.c_rect.move(dx, dy)

When calling from Python, the string argument must be passed as bytes, such as b"hello".

String conversion

bytes(s, 'ascii')
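The conversion above can be tried in plain Python before involving the extension. This sketch shows the str to bytes round trip; the variable names are illustrative only, and the choice of UTF-8 for non-ASCII text is an assumption about what the C++ side expects.

```python
text = "hello"

# str -> bytes: both spellings produce the same bytes object value.
as_bytes = text.encode("ascii")
same = bytes(text, "ascii")

# bytes -> str, e.g. for a value returned from a C++ std::string.
round_trip = as_bytes.decode("ascii")

# Non-ASCII text cannot be encoded as ASCII; UTF-8 is a common choice.
try:
    "矩形".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
utf8_bytes = "矩形".encode("utf-8")

print(as_bytes, round_trip, ascii_ok, len(utf8_bytes))
```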

Cython build configuration

To make Cython builds use g++, add the following to setup.py:

import os
os.environ["CC"] = "g++"

A Cython build has three steps:

  1. Use cythonize to generate the corresponding .cpp file:
cythonize PyRectangle.pyx
  2. Use gcc to compile the .cpp file into a .o file:
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I. -I/usr/include/python3.8 -c PyRectangle.cpp -o build/temp.linux-x86_64-3.8/PyRectangle.o
  3. Use g++ to link the .o file into a .so file:
g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/PyRectangle.o -o /home/liudy/Workspace/Test/python/PyRectangle.cpython-38-x86_64-linux-gnu.so
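Several parts of the commands above, such as the -I include directory and the default compiler, come from the Python installation itself. The following sketch reads them with the standard sysconfig module; the values differ between machines, and CC may be None on platforms such as Windows.

```python
import sysconfig

# Directory holding Python.h, i.e. what the -I/usr/include/python3.8
# argument in the compile step refers to on the author's machine.
include_dir = sysconfig.get_paths()["include"]

# Default C compiler recorded when this Python was built (may be None).
default_cc = sysconfig.get_config_var("CC")

print("include dir:", include_dir)
print("default CC: ", default_cc)
```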

Releasing the GIL

External C or C++ functions can be declared nogil:

cdef extern from "math.h":
    double sin(double x) nogil
    double cos(double x) nogil

Or declare the whole block nogil:

cdef extern from "math.h" nogil:
    double sin(double x)
    double cos(double x)

Once the GIL is released, C code can run independently. To interact with Python objects again afterwards, the GIL must be reacquired. A context manager is used for this:

cdef double func(int a, double b) nogil:
  return <double> a + b

def add(int a, double b):
  cdef double res
  with nogil:
    res = 0
    res = func(a, b)
  return res

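Notes

Copies when passing arguments

A str created in Python is copied once when it is passed into a .pyx function that takes a C++ string, because the conversion builds a new C++ object. Calling a C++ function from the .pyx code with that string by value copies the data a second time.

Within Python itself, arguments are passed as object references, so passing a str (or a list that contains it) to a function does not copy the data; because str is immutable, it merely behaves as if it were passed by value. Wrapping the string in a list therefore does not help when the value crosses into a .pyx function: the conversion to a C++ type still copies.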
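The reference-passing behaviour on the Python side can be checked directly. This is a minimal sketch in plain Python; identity_of is a hypothetical helper, and the encode call stands in for the kind of conversion that happens when a value is turned into a C++ string.

```python
def identity_of(arg):
    # Return the identity of the object the function actually received.
    return id(arg)

s = "x" * 1_000_000
items = [s]

# The callee sees the very same object: nothing was copied.
same_in_call = identity_of(s) == id(s)
same_in_list = items[0] is s

# Converting to another representation does create a new object,
# which is where the copy happens.
encoded = s.encode("ascii")
is_new_object = encoded is not s

print(same_in_call, same_in_list, is_new_object)
```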

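Dereferencing pointers

Because Python already uses the *args and **kwargs syntax for arbitrary positional and keyword arguments and for argument unpacking, Cython does not support the * dereference syntax of C pointers.

Instead, index the pointer at position 0 to dereference it in Cython.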

cdef double k = 5
cdef double* pd = &k
pd[0] = 17
print("k address:", <size_t>pd)  # id(k) would only identify a temporary Python float
print("k=", k)
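The same index-0 idiom exists in the standard ctypes module, which makes it possible to try the idea without compiling anything. This is only an analogy in plain Python, not Cython code; the names k and pd are reused from the snippet above.

```python
import ctypes

k = ctypes.c_double(5)
pd = ctypes.pointer(k)   # pd points at the storage of k

pd[0] = 17               # write through the pointer, like pd[0] = 17 in Cython

print("k address:", hex(ctypes.addressof(k)))
print("k =", k.value)    # 17.0
print("pd[0] =", pd[0])  # 17.0
```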

Updated: 2022-05-13