Porting – Adding Support for Python 3¶
After you modernize your C extension to use the latest features available in Python 2, it is time to address the differences between Python 2 and 3.
The recommended way to port is to keep single-source compatibility between
Python 2 and 3 until support for Python 2 can be safely dropped.
For Python code, you can use libraries like six and future, and, failing that,
if sys.version_info >= (3, 0): blocks for conditional code.
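As a minimal sketch of such a conditional block, version-dependent definitions can be isolated in one place. The names text_type, binary_type and to_text are illustrative assumptions, not the API of six or future:

```python
import sys

PY3 = sys.version_info >= (3, 0)

if PY3:
    text_type = str
    binary_type = bytes
else:
    # Python 2: ``unicode`` is the text type and ``str`` holds bytes.
    text_type = unicode  # noqa: F821 (this name exists on Python 2 only)
    binary_type = str


def to_text(value, encoding="utf-8"):
    """Return ``value`` as text, decoding it first if it is binary data."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return value
```

Keeping the version check at module level means the rest of the code can use the aliases without repeating the test.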
For C, the py3c library provides common tools, and for special cases you can use
conditional compilation with #if IS_PY3.
To start using py3c, #include <py3c.h>, and instruct your compiler to
find the header.
The Bytes/Unicode split¶
The most painful change for extension authors is the bytes/unicode split:
unlike Python 2’s str or C’s char*, Python 3 introduces a sharp divide
between human-readable strings and binary data.
You will need to decide, for each string value you use, which of these two
types you want.
Make the division as sharp as possible: mixing the types tends to lead to utter chaos. Functions that take both Unicode strings and bytes (in a single Python version) should be rare, and should generally be convenience functions in your interface, not code deep in the internals.
However, you can use the concept of native strings, a type that corresponds
to the str type in Python: PyBytes on Python 2, and PyUnicode on Python 3.
This is the type that you will need to return from functions like __str__
and __repr__.
Using the native string extensively is suitable for conservative projects: it affects the semantics under Python 2 as little as possible, while not requiring the resulting Python 3 API to feel contorted.
With py3c, functions for the native string type are PyStr_* (PyStr_FromString,
PyStr_Type, PyStr_Check, etc.). They correspond to
PyString on Python 2, and PyUnicode on Python 3.
The supported API is the intersection of PyString_* and PyUnicode_*,
except PyStr_Size (see below) and the deprecated PyUnicode_Encode;
additionally PyStr_AsUTF8String is defined.
Keep in mind that py3c expects native strings to always be encoded with UTF-8
under Python 2. If you use a different encoding, you will need to convert
between bytes and text manually.
For binary data, use PyBytes_* (PyBytes_FromString, PyBytes_Type, PyBytes_Check,
etc.). Python 3.x provides them under these names only; in Python 2.6+ they are
aliases of PyString_*. (For even older Pythons, py3c also provides these aliases.)
The supported API is the intersection of PyString_* and PyBytes_*.
Porting mostly consists of replacing PyString_ with either PyStr_ or PyBytes_;
just see the caveat about size below.
To summarize the four different string type names:
| String kind | Python 2 | Python 3 | Use |
|---|---|---|---|
| PyStr_* | PyString_* | PyUnicode_* | Human-readable text |
| PyBytes_* | PyString_* | ✔ | Binary data |
| PyUnicode_* | ✔ | ✔ | Unicode strings |
| PyString_* | ✔ | error | In unported code |
String size¶
When dealing with Unicode strings, the concept of “size” is tricky, since the number of characters doesn’t necessarily correspond to the number of bytes in the string’s UTF-8 representation.
To prevent subtle errors, this library does not provide a PyStr_Size function.
Instead, use PyStr_AsUTF8AndSize(). This functions like Python 3’s
PyUnicode_AsUTF8AndSize, except that under Python 2 the string is not encoded
(as it should already be in UTF-8), the size pointer must not be NULL, and the
size may be stored even if an error occurs.
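The mismatch between character count and UTF-8 byte count is easy to see from Python. The following sketch is an illustration only, and the sample string is an arbitrary choice:

```python
# A five-character string whose third character is U+00EF.
text = u"na\u00efve"

# U+00EF needs two bytes in UTF-8; the other characters need one each.
encoded = text.encode("utf-8")

char_count = len(text)      # number of characters (code points)
byte_count = len(encoded)   # number of bytes in the UTF-8 form
```

The size stored by PyStr_AsUTF8AndSize refers to the UTF-8 data, so it corresponds to the byte count here, not the character count.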
Ints¶
While the string type is split in Python 3, the int is just the opposite:
int and long were unified. PyInt_* is gone and only PyLong_* remains
(and, to confuse things further, PyLong is named “int” in Python code).
The py3c headers alias PyInt to PyLong, so if you’re using them,
there’s no need to change at this point.
Floats¶
In Python 3, the function PyFloat_FromString lost its second, ignored argument.
The py3c headers redefine the function to take one argument even in Python 2. You will need to remove the excess argument from all calls.
Argument Parsing¶
The format codes for argument-parsing functions of the PyArg_Parse family have changed somewhat.
In Python 3, the s, z, es, es# and U (plus the new C) codes accept only
Unicode strings, while c and S only accept bytes.
Formats accepting Unicode strings usually encode to char* using UTF-8.
Specifically, these are s, s*, s#, z, z*, z#, and also es, et, es#, and et#
when the encoding argument is set to NULL.
In Python 2, the default encoding was used instead.
There is no variant of z for bytes, which means there’s no built-in way to
accept “bytes or NULL” as a char*.
If you need this, write an O& converter.
Python 2 lacks a y code, which, in Python 3, works on bytes objects.
The use cases needing bytes in Python 3 and str in Python 2 should be
rare; if needed, use #if IS_PY3 to select a compatible PyArg_Parse call.
Defining Extension Types¶
If your module defines extension types, i.e. variables of type PyTypeObject
(and related structures like PyNumberMethods and PyBufferProcs),
you might need to make changes to these definitions.
Please read the Extension types guide for details.
A common incompatibility comes from type flags, like Py_TPFLAGS_HAVE_WEAKREFS
and Py_TPFLAGS_HAVE_ITER, which are removed in Python 3 (where the
functionality is always present).
If you are only using these flags in type definitions
(and not, for example, in PyType_HasFeature()),
you can include <py3c/tpflags.h> to define them to zero under Python 3.
For more information, read the Type flags section.
Module initialization¶
The module creation process was overhauled in Python 3. py3c provides a compatibility wrapper so most of the Python 3 syntax can be used.
PyModuleDef and PyModule_Create¶
Module object creation with py3c is the same as in Python 3.
First, create a PyModuleDef structure:
static struct PyModuleDef moduledef = {
    PyModuleDef_HEAD_INIT,  /* m_base */
    "spam",                 /* m_name */
    NULL,                   /* m_doc */
    -1,                     /* m_size */
    spam_methods            /* m_methods */
};
Then, where a Python 2 module would have
m = Py_InitModule3("spam", spam_methods, "Python wrapper ...");
use instead
m = PyModule_Create(&moduledef);
For m_size, use -1. (If you are sure the module supports multiple
subinterpreters, you can use 0, but this is tricky to achieve portably.)
Additional members of the PyModuleDef structure are not accepted under Python 2.
See Python documentation for details on PyModuleDef and PyModule_Create.
Module creation entrypoint¶
Instead of the void init<name> function in Python 2, or a Python 3-style
PyObject *PyInit_<name> function, use the MODULE_INIT_FUNC macro
to define an initialization function, and return the created module from it:
MODULE_INIT_FUNC(name)
{
    ...
    m = PyModule_Create(&moduledef);
    ...
    if (error) {
        return NULL;
    }
    ...
    return m;
}
Comparisons¶
Python 2.1 introduced rich comparisons for custom objects, allowing separate
behavior for the ==, !=, <, >, <=, >= operators,
rather than calling one __cmp__ function and interpreting its result
according to the requested operation.
(See PEP 207 for details.)
In Python 3, the original __cmp__-based object comparison is removed,
so all code needs to switch to rich comparisons. Instead of a
static int cmp(PyObject *obj1, PyObject *obj2)
function in the tp_compare slot, there is now a
static PyObject* richcmp(PyObject *obj1, PyObject *obj2, int op)
in the tp_richcompare slot. The op argument specifies the comparison
operation: Py_EQ (==), Py_GT (>), Py_LE (<=), etc.
Additionally, Python 3 brings a semantic change. Previously, objects of disparate types were ordered according to type, where the ordering of types was undefined (but consistent across, at least, a single invocation of Python). In Python 3, objects of different types are unorderable. It is usually possible to write a comparison function that works for both versions by returning NotImplemented to explicitly fall back to default behavior.
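The same fallback can be seen at the Python level. In the sketch below, Version is a hypothetical class used only for illustration; its comparison methods return NotImplemented for operands of other types, which is the behavior a C tp_richcompare function can mirror:

```python
class Version(object):
    """Hypothetical value type, ordered by a single integer."""

    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        if not isinstance(other, Version):
            # Decline, so Python can try the other operand or fall back.
            return NotImplemented
        return self.number == other.number

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so define it explicitly.
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.number < other.number
```

On Python 3, comparing Version(1) < "1" raises TypeError because neither operand handles the operation, while Version(1) == "1" falls back to the default identity comparison and is False.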
To help write rich comparisons, Python 3.7+ provides a convenience macro,
Py_RETURN_RICHCOMPARE, which returns the right PyObject * result
based on two values orderable by C’s comparison operators.
With py3c, the macro is available for older versions as well.
A typical rich comparison function will look something like this:
static PyObject* mytype_richcmp(PyObject *obj1, PyObject *obj2, int op)
{
    if (mytype_Check(obj2)) {
        Py_RETURN_RICHCOMPARE(get_data(obj1), get_data(obj2), op);
    }
    Py_RETURN_NOTIMPLEMENTED;
}
where get_data returns an orderable C value (e.g. a pointer or int), and
mytype_Check checks that its argument is of the correct type for get_data
(usually via PyObject_TypeCheck). Note that the first argument, obj1,
is guaranteed to be of the type the function is defined for.
If a “cmp”-style function is provided by the C library, compare its result to 0, e.g.
Py_RETURN_RICHCOMPARE(mytype_cmp(obj1, obj2), 0, op)
The Py_RETURN_RICHCOMPARE and Py_RETURN_NOTIMPLEMENTED macros are provided
in Python 3.7+ and 3.3+, respectively;
py3c makes them available to older versions as well.
If you need a more complicated comparison, use the Py_UNREACHABLE
macro for unknown operation types (op).
The macro was added in Python 3.7, and py3c backports it.
Note
The tp_richcompare slot is inherited in subclasses together with
tp_hash and (in Python 2) tp_compare: iff
the subclass doesn’t define any of them, all are inherited.
This means that if a class is modernized, its subclasses don’t have to be,
unless the subclass manipulates compare/hash slots after
class creation (e.g. after the PyType_Ready call).
Note
For backwards compatibility with previous versions of itself,
py3c provides the PY3C_RICHCMP macro,
an early draft of what became Py_RETURN_RICHCOMPARE.
The File API¶
The PyFile API was severely reduced in Python 3.
The new version is specifically intended for internal error reporting
in Python.
Native Python file objects are officially no longer backed by FILE*.
Use the Python API from the io module instead of handling files
in C. The Python API supports all kinds of file-like objects, not just
built-in files – though, admittedly, it’s cumbersome to use from plain C.
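As a rough illustration of why the Python-level API is more general, the sketch below only relies on the stream's write method and therefore works with any file-like object. The function name write_report is a hypothetical example:

```python
import io


def write_report(stream, lines):
    """Write each line to ``stream``, which only needs a ``write`` method."""
    count = 0
    for line in lines:
        stream.write(line + u"\n")
        count += 1
    return count


# An in-memory text stream stands in for a real file here.
buffer = io.StringIO()
written = write_report(buffer, [u"spam", u"eggs"])
```

From C, the corresponding approach is to call the object's methods (for example with PyObject_CallMethod) rather than extracting a FILE*.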
If you really need to access an API that deals with FILE* only
(e.g. for debugging), see py3c’s limited file API shim.
Py_FindMethod and Generic Attributes¶
While the actual need for type-specific attribute handlers almost completely
disappeared starting with Generic Attribute support in Python 2.2,
there may still be old code that uses a custom tp_getattr implementation to
return methods for a user-defined type.
The following example snippet uses Py_FindMethod from a tp_getattr
function to return custom methods for a type:
static struct PyMethodDef mytype_methods[] = {
    {"my_method", (PyCFunction)mytype_example, METH_VARARGS, "docstring"},
    {NULL, NULL},
};
static PyObject* mytype_getattr(mytype* self, char* name)
{
    return Py_FindMethod(mytype_methods, (PyObject*)self, name);
}
A tp_getattr function like the one above can be eliminated. A pointer to
PyObject_GenericGetAttr can be set in the tp_getattro field, rather than
implementing a custom tp_getattr function ourselves, as long as we also set
the tp_methods struct field to the mytype_methods array.
- Set the tp_methods struct field to the mytype_methods PyMethodDef array.
- Set the tp_getattr PyTypeObject struct field, which previously was set to the custom mytype_getattr function, to NULL.
- Set the tp_getattro struct field to PyObject_GenericGetAttr.
- Delete the custom mytype_getattr function.
Other changes¶
If you find a case where py3c doesn’t help, use #if IS_PY3 to include
code for only one or the other Python version.
And if you think others might have the same problem,
consider contributing a macro and docs to py3c!
Building¶
When building your extension, note that Python 3.2 introduced ABI version tags
(PEP 3149), which can be added
to shared library filenames to ensure that the library is loaded with the
correct Python version. For example, instead of foo.so, the shared library
for the extension module foo might be named foo.cpython-33m.so.
Your build system might generate these for you already, but if you need to
modify it, you can get the tags from sysconfig:
>>> import sysconfig
>>> sysconfig.get_config_var('EXT_SUFFIX')
'.cpython-34m.so'
>>> sysconfig.get_config_var('SOABI')
'cpython-34m'
This is completely optional; the old filenames without ABI tags are still valid.
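If you do need to compute the filename yourself, a minimal sketch could look like this. The helper name extension_filename and the fallback to a plain .so suffix are assumptions made for illustration; EXT_SUFFIX may be unset on older Python versions:

```python
import sysconfig


def extension_filename(module_name):
    """Return the tagged filename for an extension module, if known."""
    # EXT_SUFFIX already includes the ABI tag and the file extension.
    suffix = sysconfig.get_config_var("EXT_SUFFIX")
    if not suffix:
        # Assumed fallback: untagged filenames are still valid.
        suffix = ".so"
    return module_name + suffix
```

Calling extension_filename("foo") then yields a name of the same form as the tagged filename shown earlier, with the tag depending on the running interpreter.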