OpenBSD File-system how-to

lower layer structures

Let's first explain the structures you will have to use. As you have seen, you must provide a vector of vfs operations. This vector is defined in sys/mount.h. That is a bit surprising at first, because it is clearly not where you would logically look for it, but the definition is also used by the code of the mount utilities. I think it was put there for convenience rather than in, for example, a sys/vfsops.h. The structure is as follows. The // comments are not in the original file; they are there to explain when each function is called.

/*
 * Operations supported on mounted file system.
 */

struct vfsops {
// this is simply the function called when the filesystem is mounted
// by the sys_mount syscall
	int	(*vfs_mount)(struct mount *mp, const char *path,
				    void *data,
				    struct nameidata *ndp, struct proc *p);
// nothing to do in most cases but was designed to make the filesystem
// operational
	int	(*vfs_start)(struct mount *mp, int flags,
				    struct proc *p);
// inverse of mount, called by sys_unmount
	int	(*vfs_unmount)(struct mount *mp, int mntflags,
				    struct proc *p);
// you must return the root vnode of the filesystem
	int	(*vfs_root)(struct mount *mp, struct vnode **vpp);
// here you handle the quota operations:
// turning quotas on and off, setting quotas, getting quotas, ...
	int	(*vfs_quotactl)(struct mount *mp, int cmds, uid_t uid,
				    caddr_t arg, struct proc *p);
// get the filesystem statistics
	int	(*vfs_statfs)(struct mount *mp, struct statfs *sbp,
				    struct proc *p);
// sync the filesystem, called by sys_sync
// copy the memory buffers to the storage medium
// this operation must also be done when halting the computer
	int	(*vfs_sync)(struct mount *mp, int waitfor,
				    struct ucred *cred, struct proc *p);
// look up an inode number and return the corresponding vnode
	int	(*vfs_vget)(struct mount *mp, ino_t ino,
				    struct vnode **vpp);
// file handle to vnode
// must return the vnode corresponding to the file handle
	int	(*vfs_fhtovp)(struct mount *mp, struct fid *fhp,
				     struct vnode **vpp);
// vnode pointer to file handle
// the inverse of vfs_fhtovp
	int	(*vfs_vptofh)(struct vnode *vp, struct fid *fhp);
// initialisation of the filesystem
// called at boot up. you must put here global initialisations of the
// filesystem if needed.
	int	(*vfs_init)(struct vfsconf *);
// if you have sysctls controlling the filesystem operations
// or working modes, then handle them here
	int     (*vfs_sysctl)(int *, u_int, void *, size_t *, void *,
				     size_t, struct proc *);
// in case of an exported filesystem (probably nfs) you check here if
// the remote user has the rights and allow the filesystem export
	int	(*vfs_checkexp)(struct mount *mp, struct mbuf *nam,
				    int *extflagsp,
				    struct ucred **credanonp);
// manage extended attributes on the filesystem (e.g. for ACLs)
// this function enables, starts, stops and disables extended attributes
// features of the filesystem
	int     (*vfs_extattrctl)(struct mount *mp, int cmd,
				    struct vnode *filename_vp,
				    int attrnamespace, const char *attrname,
				    struct proc *p);
};
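
To make the vector a bit more concrete, here is a minimal userland sketch of how a filesystem could fill it in. It is not kernel code: struct vfsops_sketch is a trimmed stand-in with only three of the slots and simplified argument lists, struct vnode and struct mount are reduced to what the example needs, and every myfs_ name is hypothetical. Keeping the root vnode in mnt_data is just a choice made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userland stand-ins; the real types live in sys/mount.h and sys/vnode.h. */
struct vnode { int v_id; };		/* v_id is a made-up field */
struct mount { void *mnt_data; };	/* per-mount private data */

/* Trimmed-down vector: three slots only, simplified arguments. */
struct vfsops_sketch {
	int	(*vfs_mount)(struct mount *mp, const char *path, void *data);
	int	(*vfs_unmount)(struct mount *mp, int mntflags);
	int	(*vfs_root)(struct mount *mp, struct vnode **vpp);
};

/* The one and only vnode of this toy filesystem. */
static struct vnode myfs_rootvp = { 2 };

static int
myfs_mount(struct mount *mp, const char *path, void *data)
{
	(void)path;
	(void)data;
	mp->mnt_data = &myfs_rootvp;	/* remember the per-mount state */
	return (0);
}

static int
myfs_unmount(struct mount *mp, int mntflags)
{
	(void)mntflags;
	mp->mnt_data = NULL;		/* forget the per-mount state */
	return (0);
}

static int
myfs_root(struct mount *mp, struct vnode **vpp)
{
	if (mp->mnt_data == NULL)
		return (EINVAL);	/* not mounted */
	*vpp = mp->mnt_data;
	return (0);
}

/* The vector the upper layer would call through. */
const struct vfsops_sketch myfs_vfsops = {
	.vfs_mount = myfs_mount,
	.vfs_unmount = myfs_unmount,
	.vfs_root = myfs_root,
};
```

The upper layer never calls myfs_mount() by name; it only goes through the pointers, for example myfs_vfsops.vfs_root(mp, &vp).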

Even though the comments are fairly clear, we will now explain what happens and when these functions are useful.

Let's begin at the beginning: vfs_init() is called at boot and is meant for global filesystem initialisation. The work done here is global to all mounted filesystems of this type, so you must not put code related to a particular mount instance of your filesystem here. As in the ufs implementation, you can make sure the work is done only once by keeping a static int and checking its value at the beginning. The ufs code then initialises its global hash structures and calls ufs_quota_init(), which does the same job for the quota subsystem.
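
The run-once guard can be sketched like this. It is a userland illustration only: myfs_init() and myfs_init_work are hypothetical names, and the counter merely stands in for the real global setup (hash tables and so on).

```c
#include <assert.h>
#include <stddef.h>

struct vfsconf;			/* opaque here; defined in sys/mount.h */

int myfs_init_work;		/* counts how often the real work ran */

int
myfs_init(struct vfsconf *vfsp)
{
	static int done;	/* zero until the first call */

	(void)vfsp;
	if (done)
		return (0);	/* already initialised, nothing to do */
	done = 1;

	/* Global, mount-independent setup would go here. */
	myfs_init_work++;
	return (0);
}
```

Calling myfs_init() a second time returns immediately, so the global setup runs exactly once however many times the function is reached.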

After that, the normal use of a filesystem is to mount a device. The mount operation, reached through sys_mount(), basically does three things: it verifies that the requested device is a block device and that it is not already mounted (OpenBSD does not support multiple mounts of the same device); it initialises the per-mount-point structures; and it verifies that the device holds a valid filesystem (reading the superblock, checking the magic numbers, checking that the filesystem is clean, plus any additional checks your filesystem needs, so that later operations can rely on a correct filesystem). Once you have returned successfully, the vfs upper layer code assumes that you have all the information needed to read and write your filesystem.
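
The superblock checks of the third step could look roughly like the following. This is only a sketch under my own assumptions: struct myfs_superblock, its fields, MYFS_MAGIC and the choice of error codes (EINVAL for a bad magic number, EROFS for a read-write mount of an unclean filesystem) are all made up for the example, and a real filesystem would first have to read the superblock from the device.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MYFS_MAGIC	0x4d594653U	/* "MYFS" in ASCII, made up */

/* Hypothetical on-disk superblock, reduced to two fields. */
struct myfs_superblock {
	uint32_t sb_magic;	/* must be MYFS_MAGIC */
	uint32_t sb_clean;	/* non-zero if cleanly unmounted */
};

/*
 * Decide whether a superblock that was already read from disk
 * is acceptable.  Returns 0 on success or an errno value.
 */
int
myfs_check_sb(const struct myfs_superblock *sb, int rdonly)
{
	if (sb->sb_magic != MYFS_MAGIC)
		return (EINVAL);	/* not one of our filesystems */
	if (!sb->sb_clean && !rdonly)
		return (EROFS);		/* dirty: refuse to write to it */
	return (0);
}
```

In this sketch a dirty filesystem can still be mounted read-only, which is the usual way to let a checking tool look at it before any write happens.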

The other functions are, most of the time, very easy to write with the help of the comments above. If you don't understand what you need to do, the most complete filesystem model in OpenBSD is ufs/ffs. It is often very difficult to understand, however, because it is a huge piece of code. I personally prefer the ext2fs implementation.

The other type of operation is the vnops (vnode operations). All these operations act on a vnode. The declaration of these vnops is a little trickier than that of the vfsops because, for speed, it is an association structure composed of two pointers. The operations are described in the file sys/vnode.h. The first structure is vnodeopv_desc (vnode operation vector description):

/*
 * This structure is used to configure the new vnodeops vector.
 */
struct vnodeopv_entry_desc {
	struct vnodeop_desc *opve_op;   /* which operation this is */
	int (*opve_impl)(void *);	/* code implementing this operation */
};
struct vnodeopv_desc {
			/* ptr to the ptr to the vector where op should go */
	int (***opv_desc_vector_p)(void *);
	struct vnodeopv_entry_desc *opv_desc_ops;   /* null terminated list */
};

This structure contains a list of all the operations we can perform on vnodes. Each entry of the list associates a vnodeop_desc (which describes one type of vnode operation) with a pointer to the function that implements it. Each filesystem must fill in this list with every operation it supports. The vnodeop_desc itself looks like this:

/*
 * This structure describes the vnode operation taking place.
 */
struct vnodeop_desc {
	int	vdesc_offset;		/* offset in vector--first for speed */
	char    *vdesc_name;		/* a readable name for debugging */
	int	vdesc_flags;		/* VDESC_* flags */

	/*
	 * These ops are used by bypass routines to map and locate arguments.
	 * Creds and procs are not needed in bypass routines, but sometimes
	 * they are useful to (for example) to transport layers.
	 * Nameidata is useful because it has a cred in it.
	 */
	int	*vdesc_vp_offsets;	/* list ended by VDESC_NO_OFFSET */
	int	vdesc_vpp_offset;	/* return vpp location */
	int	vdesc_cred_offset;	/* cred location, if any */
	int	vdesc_proc_offset;	/* proc location, if any */
	int	vdesc_componentname_offset; /* if any */
	/*
	 * Finally, we've got a list of private data (about each operation)
	 * for each transport layer.  (Support to manage this list is not
	 * yet part of BSD.)
	 */
	caddr_t	*vdesc_transports;
};
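
To see how these structures work together, here is a small userland sketch of a filesystem's entry list being turned into an operation vector indexed by vdesc_offset. The structures are reduced copies of the ones above, and sk_opv_init(), the OP_ constants, the sk_ descriptors and the myfs_ functions are all hypothetical. Filling the empty slots with the default operation is loosely modelled on what I understand the kernel to do when it builds the vectors, so treat that detail as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for the kernel structures shown above. */
struct vnodeop_desc {
	int		 vdesc_offset;	/* index into the vector */
	const char	*vdesc_name;
};
struct vnodeopv_entry_desc {
	struct vnodeop_desc *opve_op;
	int (*opve_impl)(void *);
};

/* Made-up operation numbers for the sketch. */
enum { OP_DEFAULT, OP_LOOKUP, OP_READ, OP_COUNT };

struct vnodeop_desc sk_default_desc = { OP_DEFAULT, "default" };
struct vnodeop_desc sk_lookup_desc = { OP_LOOKUP, "lookup" };
struct vnodeop_desc sk_read_desc = { OP_READ, "read" };

int myfs_default(void *v) { (void)v; return (-1); }
int myfs_lookup(void *v) { (void)v; return (1); }

/* The list each filesystem provides: null terminated. */
struct vnodeopv_entry_desc myfs_entries[] = {
	{ &sk_default_desc, myfs_default },
	{ &sk_lookup_desc, myfs_lookup },
	{ NULL, NULL }
};

/* The vector the system dispatches through. */
int (*myfs_vnodeop_p[OP_COUNT])(void *);

void
sk_opv_init(int (**vec)(void *), const struct vnodeopv_entry_desc *e)
{
	int i;

	for (i = 0; i < OP_COUNT; i++)
		vec[i] = NULL;
	/* Put every listed implementation at its operation's offset. */
	for (; e->opve_op != NULL; e++)
		vec[e->opve_op->vdesc_offset] = e->opve_impl;
	/* Operations the filesystem did not list get the default. */
	for (i = 0; i < OP_COUNT; i++)
		if (vec[i] == NULL)
			vec[i] = vec[OP_DEFAULT];
}
```

After sk_opv_init(myfs_vnodeop_p, myfs_entries), a call through myfs_vnodeop_p[sk_lookup_desc.vdesc_offset] reaches myfs_lookup() with a single array index, which is why the offset is stored first "for speed".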

All the basic operations are already described in vnode_if.h and vnode_if.c, since they are common to all filesystems. Only the associated function pointer differs from one implementation to another. This example shows the description of the lookup operation:

int vop_lookup_vp_offsets[] = {
	VOPARG_OFFSETOF(struct vop_lookup_args,a_dvp),
	VDESC_NO_OFFSET
};
struct vnodeop_desc vop_lookup_desc = {
	0,
	"vop_lookup",
	0,
	vop_lookup_vp_offsets,
	VOPARG_OFFSETOF(struct vop_lookup_args, a_vpp),
	VDESC_NO_OFFSET,
	VDESC_NO_OFFSET,
	VOPARG_OFFSETOF(struct vop_lookup_args, a_cnp),
	NULL,
};
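
The offsets stored in such a description let generic code find the vnode arguments of any operation without knowing its argument structure. The following userland sketch shows the idea; it uses the standard offsetof() in place of VOPARG_OFFSETOF, SK_NO_OFFSET in place of VDESC_NO_OFFSET, and sk_lookup_args and sk_nth_vnode() are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SK_NO_OFFSET	(-1)	/* stand-in for VDESC_NO_OFFSET */

struct vnode { int v_id; };	/* v_id is a made-up field */
struct componentname;		/* opaque in this sketch */

/* Same layout idea as struct vop_lookup_args. */
struct sk_lookup_args {
	void			 *a_desc;
	struct vnode		 *a_dvp;
	struct vnode		**a_vpp;
	struct componentname	 *a_cnp;
};

/* Offsets of the vnode arguments, ended by SK_NO_OFFSET. */
int sk_lookup_vp_offsets[] = {
	(int)offsetof(struct sk_lookup_args, a_dvp),
	SK_NO_OFFSET
};

/*
 * Return the n-th vnode argument of an operation, or NULL if the
 * operation has fewer vnode arguments.  Only the offset list is
 * used, never the concrete argument structure.
 */
struct vnode *
sk_nth_vnode(void *args, const int *offsets, int n)
{
	int i;

	for (i = 0; offsets[i] != SK_NO_OFFSET; i++)
		if (i == n)
			return (*(struct vnode **)((char *)args + offsets[i]));
	return (NULL);
}
```

This is the kind of access a bypass routine needs: it receives a void * and a description, and can still locate a_dvp through the offset list.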

As all these common operations are already defined, you don't really need to understand how they are defined. If, however, you write a really new filesystem with new concepts and new vnode operations, you must create the corresponding syscalls and vop_ descriptions, and integrate them into the upper layer vfs code. The important part of the vnops declarations is therefore the structure the system passes to your operations. Because every operation takes a single void *, each operation needs its own argument structure. The following example is the corresponding structure for lookup:

struct vop_lookup_args {
	struct vnodeop_desc *a_desc;
	struct vnode *a_dvp;
	struct vnode **a_vpp;
	struct componentname *a_cnp;
};
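
An implementation receives this structure through its void * argument and starts by casting it back. Here is a minimal userland sketch of that pattern for a toy filesystem that contains a single file called "hello". The struct vnode and struct componentname below are reduced stand-ins (v_name is a made-up field), and myfs_lookup() and myfs_hello are hypothetical; a real lookup also has to deal with locking, "." and "..", and the name cache, none of which is shown.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Reduced stand-ins for the kernel types. */
struct vnode { const char *v_name; };	/* v_name is made up */
struct componentname {
	const char	*cn_nameptr;	/* name to look up */
	long		 cn_namelen;	/* length of that name */
};
struct vnodeop_desc;			/* opaque in this sketch */

struct vop_lookup_args {
	struct vnodeop_desc *a_desc;
	struct vnode *a_dvp;
	struct vnode **a_vpp;
	struct componentname *a_cnp;
};

/* The only file of the toy filesystem. */
static struct vnode myfs_hello = { "hello" };

int
myfs_lookup(void *v)
{
	struct vop_lookup_args *ap = v;	/* recover the typed arguments */
	struct componentname *cnp = ap->a_cnp;

	*ap->a_vpp = NULL;
	if (cnp->cn_namelen == 5 &&
	    memcmp(cnp->cn_nameptr, "hello", 5) == 0) {
		*ap->a_vpp = &myfs_hello;	/* hand back the result */
		return (0);
	}
	return (ENOENT);
}
```

Every vnode operation follows the same shape: one void * in, a cast to the matching vop_..._args structure, and results returned through the pointers it contains.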

You now have all the concepts and internal structures needed to write your new filesystem code.

Descriptions of all the vnops can easily be found in the ufs and ffs code, so it is not necessary to repeat them here.