CData Reference Manual


Matt Wette
October 2024

This is a user guide for the CData module for Guile.


Introduction

The cdata module ((nyacc foreign cdata) and its partner arch-info ((nyacc foreign arch-info) provide a way to work with data originating from C libraries. We hope the module is reasonably easy to understand and use. Size and alignment is tracked for all types. Types are classified into the following kinds: base, struct, union, array, pointer, enum and function. The procedures cbase, cstruct, cunion, cpointer, carray, cenum and cfunction generate ctype objects, and the procedure make-cdata will generate data objects based on these. The underlying bits of data are stored in Scheme bytevectors. Access to component data is provided by the cdata-ref procedure and mutation is accomplished via the cdata-set! procedure. The modules support non-native machine architectures via the global parameter *arch*.

Beyond size and alignment, base type objects carry a symbolic tag to determine the appropriate low level machine type. The low level machine types map directly to bytevector setters and getters. Support for C base types is handled by the cbase procedure which converts them to underlying types. For example, for a 64 bit little endian machine architecture, (cbase 'uintptr_t) would generate a type with underlying symbol u64le.

Here is a simple example of using cdata for structures:

(use-modules (system foreign))
(use-modules (system foreign-library))
(use-modules (nyacc foreign cdata))

(define timeval_t
  (cstruct `((tv_sec ,(cbase 'long)) (tv_usec ,(cbase 'long)))))

(define gettimeofday
  (foreign-library-function
   #f "gettimeofday"
   #:return-type (ctype->ffi (cbase 'int))
   #:arg-types (map ctype->ffi
                    (list (cpointer timeval_t)
                          (cpointer 'void)))))

(define d1 (make-cdata timeval_t))
(gettimeofday (cdata-ref (cdata& d1)) %null-pointer)
(format #t "time: ~s ~s\n"
        (cdata-ref d1 'tv_sec) (cdata-ref d1 'tv_usec))
time: 1719062561 676365

In the above cdata& generates a cpointer type for d1 and cdata-ref extracts the Guile (pointer) value.

Basic Usage

This section provides an introduction to procedures you are likely to want on your first approach.

Procedure: cbase name => <ctype>

Given symbolic name, generate a base ctype. The name can be something like unsigned, double, or can be a cdata machine type like u64le. For example,

(define double-type (cbase 'double))

There is a pseudo-type void.

Procedure: cpointer type => <ctype>

Generate a C pointer type for type. To reference or de-reference cdata object see cdata& and cdata*. type can be the symbol void or a symbolic name used as argument to cbase.

(define foo_t (cbase 'int))
(cpointer (delay foo_t))
Procedure: cstruct fields [packed] => ctype

Construct a struct ctype with given fields. If packed, #f by default, is #t, create a packed structure. fields is a list with entries of the form (name type) or (name type lenth) where name is a symbol or #f (for anonymous structs and unions), type is a <ctype> object or a symbol for a base type and length is the length of the associated bitfield.

Procedure: cunion fields => <ctype>

Construct a ctype union type with given fields. See cstruct for a description of the fields argument.

Procedure: carray type n => <ctype>

Create an array of type with length. If length is zero, the array length is unbounded: it’s length can be specified as argument to make-cdata.

Procedure: cenum enum-list [packed] => <ctype>

enum-list is a list of name or name-value pairs

(cenum '((a 1) b (c 4))

If packed is #t the size will be smallest that can hold it, as if defined in C with __attribute__((packed)).

Procedure: cfunction proc->ptr ptr->proc [variadic?] => <ctype>

Generate a C function type to be used with cpointer. The arguments proc->ptr and ptr->proc are procedures that convert a procedure to a pointer, and pointer to procedure, respectively. The optional argument #:variadic, if #t, indicates the function uses variadic arguments. For this case (I need to add documention). Here is an example:

(define (f-proc->ptr proc)
  (ffi:procedure->pointer ffi:void proc (list)))
(define (f-ptr->proc fptr)
  (ffi:pointer->procedure ffi:void fptr (list)))
(define ftype (cpointer (cfunction f-proc->ptr f-ptr->proc)))

The thinking here is that a cfunction type is a proxy for a C function in memory, with a getter and setter to read from or write to memory.

Procedure: make-cdata type [value]

Generate a cdata object of type type with optional value. If value is not provided, the object is zeroed. As a special case, a positive integer arg to a zero-sized array type will allocate storage for that many items, associating it with an array type of that size.

Procedure: make-cdata/* type pointer

Make a cdata object from a pointer. That is, instead of creating a bytevector to hold the data use the memory at the pointer using pointer->bytevector.

Procedure: cdata-ref data [tag ...] => value

Return the Scheme (scalar) slot value for selected tag ... with respect to the cdata object data.

(cdata-ref my-struct-value 'a 'b 'c))

This procedure returns Guile values for cdata kinds base, pointer, procedure, array (an array) and struct (an alist). For union an exception is raised. The returned values are freshly allocated copies. If want a cdata object, use cdata-sel.

Procedure: cdata-set! data value [tag ...]

Set slot for selcted tag ... with respect to cdata data to value. Example:

(cdata-set! my-struct-data 42 'a 'b 'c))

If value is a <cdata> object then copy that (if types match).
The value argument can be a Scheme procedure when the associated ctype is a pointer to function.

Values accepted by cdata-set! and make-cdata are as follows, based on the cdata target ctype.

  1. If the type is a base machine type, then the argument must be an associated Scheme numeric type.
  2. If the type is a pointer type, the value can be a Guile pointer, an integer (address), a string, or a procedure (for the case where the type is a function pointer).
  3. If the type is an array type, TO BE CONTINUED.
  4. If the type is a function type, then an error is returned. See above for pointer (to function) types.
  5. If a struct, and the value is

In addition, if the value is of ctype, a copy of the underlying bytevector contents will be performed. An error will be thrown if the types are not equal. Also, if the value argument to make-cdata is an integer and the type argument is an array of length zero, then space is allocated to accomodate that length array. Here are some examples:

> (define tri1-t (carray (carray (cbase 'double) 2) 3))
> (define tval1 (make-cdata tri1-t #2f64((1.0 2.0) (3.0 4.0) (5.0 6.0))))
> (cdata-ref tval1)
$1 = #2f64((1.0 2.0) (3.0 4.0) (5.0 6.0))
> (cdata-set! tval1 1.5 0 1)
> (cdata-ref tval1)
$2 = #2f64((1.0 1.5) (3.0 4.0) (5.0 6.0))
>

and

> (define loc-t (cstruct `((x double) (y double))) 3)
> (define tri2-t (carray loc-t 3))
> (define tval2 (make-cdata tri2-t (vector '((x . 1.0) (y . 2.0))
                                           '((x . 3.0) (y . 4.0))
                                           '((x . 2.0) (y . 5.0)))))
> tval2
$3 = #<cdata array 0x79ae9a077200>
> (cdata-ref tval2)
$4 = #(((x . 1.0) (y . 2.0)) ((x . 3.0) (y . 4.0)) ((x . 2.0) (y . 5.0)))
> (cdata-set! tval2 1.5 0 'x)
> (cdata-ref tval2)
$5 = #(((x . 1.5) (y . 2.0)) ((x . 3.0) (y . 4.0)) ((x . 2.0) (y . 5.0)))
Procedure: cdata& data => cdata

Generate a reference (i.e., cpointer) to the contents in the underlying bytevector.

Procedure: cdata* data => cdata

De-reference a pointer. Returns a cdata object representing the contents at the address in the underlying bytevector.

Notes

Digression on Garbage Collection

Before going further we remind you that the underlying datastructure is bytevectors. Now, since bytevectors in Guile are not searched for pointers during garbage collection there is a risk that the objects being referenced might be collected during usage. A systematic method to prevent this is work to go. One might try to use cdata& in the following way keep intermediate values from being collected.

  (let ((val (make-cdata foo_t))
        (ptr (cdata& val)))
     (bar ptr))

Going Further

Procedure: cdata-sel data tag ... => cdata

Return a new cdata object representing the associated selection. Note this is different from cdata-ref: it always returns a cdata object. For example,

> (define t1 (cstruct '((a int) (b double))))
> (define d1 (make-cdata t1))
> (cdata-set! d1 42 'a)
> (cdata-sel d1 'a)
$1 = #<cdata s32le 0x77bbf8e52260>
> (cdata-ref $1)
$2 = 42
Procedure: cdata&-ref data [tag ...] => value

Shortcut for (cdata-ref (cdata& data tag ...)) This always returns a Guile pointer.

Procedure: cdata*-ref data [tag ...] => value

Shortcut for (cdata-ref (cdata* data tag ...))

Underneath, cdata data is a triple of bytevector, offset, and ctype. In some cases it may be more efficient to work with the deconstructed triple. The following two procedures are provided for working at that level, in addition to the ctype-sel procedure described below.

Procedure: Xcdata-ref bv ix ct -> value

Reference a deconstructed cdata object. See cdata-ref.

Procedure: Xcdata-set! bv ix ct value

Set the value of a deconstructed cdata object. See cdata-set!.

Working with Types

Procedure: name-ctype name type -> <ctype>

Create a new named version of the type. The name is useful when the type is printed. This procedure does not mutate: a new type object is created. If a specific type is used by multiple names the names can share the underlying type guts. The following examples shows how one type can have two names:

(define raw (cstruct '((a 'int) (b 'double))))
(define foo_t (name-ctype 'foo_t raw))
(define struct-foo (name-ctype 'struct-foo raw))

These types are equal:

(ctype-equal? foo_t struct-foo) => #t

It is recommended that one use symbols for names rather than strings, so that pretty-print-ctype will use names effectively.

Procedure: ctype-eqv? a b => #t|#f
Procedure: ctype-equal? a b => #t|#f

The ctype-eqv? and ctype-equal? predicates assesses equality of their arguments. Two types are considered equvalent if they have the same size, alignment, kind, and eqivalent kind-specific properties. For base types, the symbolic mtype must be equal; this includes size, integer versus float, and signed versus unsigned. For struct and union kinds, the names and types of all fields must be equal, unless, for ctype-eqv? they are pointer types with delays. The implementation of ctype-equal? is not complete: it is currently the same as ctype-eqv?.

Procedure: ctype-sel type ix [tag ...] => ((ix . ct) (ix . ct) ...)

This generate a list of (offset, type) pairs for a type. The result is used to create getters and setter for foreign machine architectures. See make-cdata-getter and make-cdata-setter.

The procedure ctype-sel should be very useful for creating fast code, with compiled procedures or macros. It computes offsets for legs of a selection (e.g., 'a 'x '* 3 'b). Note that if a selection includes dereferences (i.e., '*) then at runtime the address of the currently processed data must be fetched from memory. So, for the selection 'a 'x '* 3 'b a list of two offsets will be produced, looking like the following:

> (ctype-sel some-type 0 'b 'y '* 3 'm)
$1 = ((16 . #<ctype pointer 0x7284a17e2660>)
      (28 . #<ctype s32le 0x7284a2c1d600>))

To use the above, the application procedure could access a pointer at offset 16 in the bytevector, create a new bytevector (via pointer->bytevector) then generate a cdata via the low-level procedure %make-cdata as in

(define data (%make-cdata bv 28 (caadr $1)))

The following routines use this feature to create quick accessors.
TODO: Create a similar procedure that uses case-lambda to do both.

Procedure: make-cdata-getter sel [offset] => lambda

Genererate a procedure that given a cdata object will fetch the value at indicated by the sel, generated by ctype-sel. The procedure takes one argument: (proc data [tag ...]). Pointer dereference tags ('*') are not allowed. The optional offset argument (default 0), is used for cross target use: it is the offset of the address in the host context.

Procedure: make-cdata-setter sel [offset] => lambda

Genererate a procedure that given a cdata object will set the value at the offset given the selector, generated by ctype-sel. The procedure takes two arguments: (proc data value [tag ...]). Pointer dereference tags ('*') are not allowed. The optional offset argument (default 0), is used for cross target use: it is the offset of the address in the host context.

Procedure: ?make-cdata-accessor sel [offset]

This procedure (not final) is similar to make-cdata-getter and make-cdata-setter but the resulting procedure accepts no tag sequence. Called with one arg, it’s a getter; called with two args, it’s a setter.

> (define ct (cstruct ...))
> (define sel (ctype-sel ct 0 'b 'y '* 3 'm))
> (define *foo* (may-be-make-cdata-accessor sel))
> (define cd (make-cdata ct))
> (*foo* cd 42) ; set value
> (*foo* cd)    ; get value
$1 = 42

Working with C Function Calls

The procedure ctype->ffi is a helper for using Guile’s pointer->procedure. It generates an appropriate (system foreign) ffi-type for the given cdata type.

Procedure: ccast type data [do-check] => <cdata>

Cast a cdata object of one (pointer) type to another (pointer) type. This routine creates a new cdata object with the target type, but same bytevector and index.

> (define t1 (cstruct '((a int) (b int))))
> (define t2 (cstruct `((base ,t1) (c int))))
> (define d2 (make-cdata t2))
> (cdata-set! d2 42 'base 'a)
> (define p2 (cdata& d2))
> p2
$1 = #<cdata pointer 0x7c5b59dfac20>
> (define p1 (ccast (cpointer t1) p2))
> (cdata-ref p1 '* 'a)
$2 = 42
> (cdata-ref p2 '* 'base 'a)
$3 = 42
Procedure: arg->number arg => number

Convert an argument to numeric form for a ffi procedure call. This will reference a cdata object or pass a number through.

The above procedure was previously called unwrap-number.

Procedure: arg->pointer arg [hint] => pointer

Convert an argument to a Guile pointer for a ffi procedure call. This will reference a cdata object or pass a number through. If the argument is a function, it will attempt to convert that to a pointer via procedure->pointer if given the function pointer type hint.

The above procedure used to be called unwrap-pointer. It is also used to implement the old unwrap-array.

Procedure: ctype->ffi ctype => ffi-type

Generate a argument spec for Guile’s ffi interface. Example:

(ctype->ffi (cpointer (cbase int))) => '*

Operations on CType Kinds

The ctype kind field indicates which kind a type is and the info field provide kind-specific information for a ctype. The name field provides the type name, if provided, or #f if not.

Note that the kind procecures, cstruct, cpointer, ..., create ctype objects of different kinds. To operate on kind-specific attributes of types, requries one to fetch the info field from the ctype. From the info field, one can then operate using the fields specific to the kind info.

> (define float* (cpointer (cbase 'float)))
> double*
$1 = #<ctype pointer 0x75f3212cbed0>
> (ctype-kind float*)
$2 = pointer
> (define float*-info (ctype-info float*))
> (cpointer-type float*-info)
$3 = #<ctype f32le 0x75f323f8ec90>
> (cpointer-mtype float*-info)
$4 = u64le

The cpointer-mtype procedure lets us know that pointers are stored as unsigned 64 bit (little endian) integers.

The info field for base types is special. Since the only kind-specific type information for a base type is the machine type the info field provides that. Consider the following example.

> (define foo-t (name-ctype 'foo-t (cbase 'int)))
> (ctype-name foo-t)
$1 = foo-t
> (ctype-kind foo-t)
$2 = base
> (ctype-info foo-t)
$3 = s32le

Structs are more complex. A struct type object includes a member cstruct-fields, list of it’s fields (cfield), and a member cstruct-select, a procedure to lookup fields based on symbolic or string name. In addition, with no arg, the select procedure will return a symbolic list of member names.

> (define bar-s
    (cstruct `((a int) (b float) (#f ,(cstruct '(x int) (y int))))))
> (define bar-s-info (ctype-info bar-s))
> (cstruct-fields bar-s-info)
$4 = (#<<cfield> name: a type: #<ctype s32le 0x75f323f8ecf0> offset: 0>
      #<<cfield> name: b type: #<ctype f32le 0x75f323f8ec90> offset: 4>
      #<<cfield> name: #f type: #<ctype struct 0x75f32181a570> offset: 8>)
> (define x-fld ((cstruct-select bar-s-info) 'x))
> x-fld
$5 = #<<cfield> name: x type: #<ctype s32le 0x75f323f8ecf0> offset: 8>
> (cfield-offset x-fld)
$6 = 8

Note that the selection of the x component deals with a field which is an anonymous struct. The struct bar-s would look like the following in C:

struct bar_s {
  int a;
  float b;
  struct {
    int x;
    int y;
  };
};

And just for kicks

> (define sa
    (cstruct `((a int) (b double) (#f ,(cstruct '((x short) (y int)))))))
> (define sp
    (cstruct `((a int) (b double) (#f ,(cstruct '((x short) (y int))))) #t))

> (pretty-print-ctype sa)
(cstruct
  ((a s32le #:offset 0)
   (b f64le #:offset 8)
   (#f
    (cstruct
      ((x s16le #:offset 0) (y s32le #:offset 4)))
    #:offset
    16)))
> (pretty-print-ctype sp)
(cstruct
  ((a s32le #:offset 0)
   (b f64le #:offset 4)
   (#f
    (cstruct
      ((x s16le #:offset 0) (y s32le #:offset 4)))
    #:offset
    12)))

Note the difference in offsets: sa is aligned and sp is packed. The offsets reported for anonymous structs can be misleading. To get the right offsets use select:

> (define tia (ctype-info sa))
> (define tip (ctype-info sp))
> ((cstruct-select tia) 'y)
$8 = 20
> ((cstruct-select tip) 'y)
$9 = 16

Enum Conversions

The enum ctype provides procedures to convert between the numeric and symbolic parts of each enum entry. Currently, the cdata module does not provide enum wrapper and unwrapper routines. However, the FFI Helper will create these. The wrapper, converting a number to a symbol, and unwrapper, converting a symbol to a number, can be generated as the following example demonstrates.

> (define color_t (cenum '((RED #xf00) (GREEN #x0f0) (BLUE #x00f))))
> (define color_t-info (ctype-info color_t))
> (define wrap-color_t (cenum-symf color_t-info))
> (define unwrap-color_t (cenum-numf color_t-info))
> (wrap-color_t #xf00)
$1 = RED
> (unwrap-color_t 'GREEN)
$2 = 240

Handling Machine Architectures

One of the author’s main motivations for writing CData was to be able to work with cross-target machine architectures. This is pretty cool. Just to let you know what’s going on, consider the following:

> (use-modules (nyacc foreign arch-info))
> (define tx64 (with-arch "x86_64" (cstruct '((a int) (b long)))))
> (define tr64 (with-arch "riscv64" (cstruct '((a int) (b long)))))
> (define tr32 (with-arch "riscv32" (cstruct '((a int) (b long)))))
> (define sp32 (with-arch "sparc" (cstruct '((a int) (b long)))))
> (ctype-equal? tx64 tr64)
$1 = #t
> (ctype-equal? tr64 tr32)
$1 = #f
> (ctype-equal? tr32 ts32)
$1 = #f
> (pretty-print-ctype tx64)
(cstruct ((a s32le #:offset 0) (b s64le #:offset 8)))
> (pretty-print-ctype tr64)
(cstruct ((a s32le #:offset 0) (b s64le #:offset 8)))
> (pretty-print-ctype tr32)
(cstruct ((a s32le #:offset 0) (b s32le #:offset 4)))
> (pretty-print-ctype ts32)
(cstruct ((a s32be #:offset 0) (b s32be #:offset 4)))

Rocks, right?

arch-info maps base C types to machine types (e.g., i32le) and alignment for the given machine architecture. To get sizes, it’s a simple matter of mapping machine types to sizes.

The arch-info module currently has size and alignment information for the for the following: aarch64, avr, i383, i686, powerpc32, powerpc64, ppc32, ppc64, riscv32, riscv64, sparc32, sparc64, x86_64.

CData Utilities

Procedure: pretty-print-ctype type [port]

Converts type to a literal tree and uses Guile’s pretty-print function to display it. The default port is the current output port.

Procedure: cdata-kind data

Return the kind of data: pointer, base, struct, ...

Miscellaneous

More to come.

Base Types

Base types are the following, along with the pseudo-type void

void* char signed-char unsigned-char short unsigned-short
float double int unsigned long unsigned-long long-long unsigned-long-long
int8_t uint8_t int16_t uint16_t int32_t uint32_t int64_t uint64_t
size_t ssize_t ptrdiff_t intptr_t uintptr_t _Bool bool
wchar_t char16_t char32_t long-double _Float16 _Float128
float-_Complex double-_Complex __int128 unsigned-__int128

When it come to multi-named types like “long long int” cdata wants to see the shorted variant, here “long long”. Likewise, use “unsigned” for “unsigned int”, “short” for “short int”, etc.

Other Procedures

More to come.

Guile FFI Support

More to come.

Procedure: ctype->ffi-type type

Convert a ctype to the (integer) code for the associated FFI type.

Copyright (C) 2024 – Matthew Wette.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included with the distribution as COPYING.DOC.

References

  1. Guile Manual: https://www.gnu.org/software/guile/manual
  2. Scheme Bytestructures: https://github.com/TaylanUB/scheme-bytestructures