Fourier Features Let Agents Learn High Precision Policies with Imitation Learning

Abstract

High-precision robotic manipulation requires fine-grained spatial reasoning that is often difficult to achieve with RGB-only policies due to depth ambiguity and perspective scale issues. Policies that leverage 3D information directly, such as those based on point clouds, offer a stronger geometric prior over purely image-based ones, yet their performance remains highly task-dependent. We hypothesize that this discrepancy may be due to the spectral bias of neural networks towards learning low frequency functions, which especially affects architectures conditioned on slow-moving Cartesian features. We thus propose to map point clouds from Cartesian space into high-dimensional Fourier space, effectively equipping the point cloud encoder with direct access to high-frequency features. We experimentally validate the use of Fourier features on challenging manipulation tasks from the RoboCasa and ManiSkill3 benchmarks and on a real robot setup. Despite their simplicity, we find that Fourier features provide significant benefits across diverse encoder architectures and benchmarks and are robust across hyperparameters. Our results indicate that Fourier features let policies leverage geometric details more effectively than Cartesian features, showing their potential as a general-purpose tool for point cloud-based imitation learning.

From Point Clouds to Fourier Features

Adding a Fourier feature mapping from Cartesian coordinates into a higher-dimensional feature space improves performance for each point cloud encoder we evaluate for diffusion imitation learning. For high-precision policies, the network must learn to condition on fine details in the scene geometry, for example to decide whether to insert the leg into the slot or to reposition it, yet neural networks learn the high-frequency components of the target function only slowly, if at all. While neighboring points in the scene have very similar Cartesian features, their high-dimensional Fourier features differ enough to be easily distinguished.
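To make the last point concrete, here is a minimal NumPy sketch of a sinusoidal Fourier feature mapping. It is an illustration under stated assumptions, not the authors' implementation: the function name `fourier_features`, the per-axis sin/cos encoding, the three example wavelengths, and the two example points are all hypothetical.

```python
import numpy as np

def fourier_features(points, wavelengths):
    """Map (N, 3) Cartesian points to (N, 3 * 2 * B) sin/cos features.

    Assumption: each axis is encoded independently at B wavelengths.
    """
    points = np.asarray(points, dtype=np.float64)
    wavelengths = np.asarray(wavelengths, dtype=np.float64)
    # Phase of every coordinate at every wavelength: shape (N, 3, B).
    phase = 2.0 * np.pi * points[:, :, None] / wavelengths[None, None, :]
    # Concatenate sin and cos along the band axis: shape (N, 3, 2B).
    feats = np.concatenate([np.sin(phase), np.cos(phase)], axis=-1)
    return feats.reshape(points.shape[0], -1)

# Two hypothetical points 5 mm apart along x (coordinates in metres).
a = np.array([[0.300, 0.100, 0.200]])
b = np.array([[0.305, 0.100, 0.200]])
wavelengths = np.array([0.02, 0.08, 0.32])  # illustrative bands, in metres

cartesian_gap = np.linalg.norm(a - b)
fourier_gap = np.linalg.norm(
    fourier_features(a, wavelengths) - fourier_features(b, wavelengths)
)
print(f"Cartesian distance: {cartesian_gap:.4f}")
print(f"Fourier distance:   {fourier_gap:.4f}")
```

In this toy setting the two points are nearly identical in Cartesian space but far apart in feature space, because the 5 mm offset is a quarter of the shortest wavelength.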

Point cloud coordinates mapped to Fourier features.

Framework Overview

Given a point cloud, we first map each point and its neighborhood to Fourier feature space. This amplifies subtle geometric differences within each neighborhood. The tokenizer extracts and aggregates features for each neighborhood to produce a set of tokens, which are then forwarded to a goal-conditioned diffusion policy that denoises the next chunk of actions.
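The data flow up to the token set can be sketched as follows. This is only a rough, non-learned stand-in for the tokenizer: random center selection, k-nearest-neighbor grouping, center-relative coordinates, max-pooling as the aggregation, and every name and size below are assumptions made for illustration, and the diffusion policy itself is omitted.

```python
import numpy as np

def fourier_features(x, wavelengths):
    """Encode the last axis (xyz) of x with sin/cos at each wavelength."""
    phase = 2.0 * np.pi * x[..., None] / wavelengths          # (..., 3, B)
    feats = np.concatenate([np.sin(phase), np.cos(phase)], axis=-1)
    return feats.reshape(*x.shape[:-1], -1)                   # (..., 6B)

def tokenize(points, num_tokens, k, wavelengths, rng):
    """Group points into neighborhoods and pool one token per neighborhood."""
    # Assumption: centers are picked at random; other samplers are possible.
    centers = points[rng.choice(len(points), size=num_tokens, replace=False)]
    # Squared distance from every center to every point: (T, N).
    d2 = ((centers[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    idx = np.argsort(d2, axis=1)[:, :k]                       # (T, k)
    # Assumption: neighbors are expressed relative to their center.
    local = points[idx] - centers[:, None, :]                 # (T, k, 3)
    feats = fourier_features(local, wavelengths)              # (T, k, 6B)
    # Stand-in aggregation: max-pool over each neighborhood.
    return feats.max(axis=1)                                  # (T, 6B)

rng = np.random.default_rng(0)
cloud = rng.uniform(-0.5, 0.5, size=(4096, 3))   # synthetic point cloud
wavelengths = np.geomspace(0.02, 1.0, num=16)    # upper bound is assumed
tokens = tokenize(cloud, num_tokens=64, k=32, wavelengths=wavelengths, rng=rng)
print(tokens.shape)
```

A learned tokenizer would typically pass the per-point features through a small network before pooling; the sketch keeps only the grouping, mapping, and aggregation steps named in the text.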

Architecture overview for the Fourier feature diffusion policy.

Main Results

Across diverse point cloud encoders and benchmarks, Fourier features consistently improve task success by making geometric detail easier for the policy to exploit. The largest gains appear on the harder RoboCasa and real-world tasks, while ManiSkill3 is comparatively closer to saturation.

On RoboCasa, Fourier features raise PointPatch from 13% to 34% average success, with large per-task gains such as CloseDrawer from 34% to 72% and TurnOffSinkFaucet from 28% to 63%. Similar trends hold across DP3 and PCM, while real-world performance for PointPatch + RGB improves from 14.8% to 40.23%, showing that the geometric benefit transfers beyond simulation.

Analysis and Ablations

Fourier features matter more when the observation preserves richer geometry. When the point cloud is heavily downsampled to around 2k points, the gap between the two variants narrows sharply while the Cartesian baseline barely changes, indicating that the baseline was not using the removed fine detail in the first place.

The advantage is not limited to the most delicate alignment tasks. Even in the presence of extreme noise, policies with Fourier features still average 24% success, whereas policies without Fourier features and without noise average only 13%. This suggests that even when fine geometric cues are removed, there is an additional benefit through improved learning dynamics.

Graph Fourier spectral sensitivity analysis

Spectral analysis shows that training increases sensitivity across the frequency spectrum, and that Fourier features raise it by several more orders of magnitude over the baseline. The effect is not confined to the highest frequencies, which supports the claim that Fourier features improve both geometric fidelity and overall optimization behavior.

Performance is robust across a broad range of wavelength settings. The default log-spaced configuration with 16 bands and a 2 cm minimum wavelength works well, but the broader conclusion is that Fourier features in diffusion imitation learning are relatively insensitive to this hyperparameter choice.
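As a hedged illustration of what the default configuration could look like in code, the sketch below builds 16 log-spaced wavelengths starting at 2 cm. The band count and minimum wavelength come from the text above; the maximum wavelength, the variable names, and the per-axis sin/cos layout used to compute the feature dimension are assumptions.

```python
import numpy as np

NUM_BANDS = 16          # from the text
MIN_WAVELENGTH = 0.02   # 2 cm, from the text (in metres)
MAX_WAVELENGTH = 2.0    # assumption: not stated in the text

# Log-spaced bands: consecutive wavelengths differ by a constant ratio.
wavelengths = np.geomspace(MIN_WAVELENGTH, MAX_WAVELENGTH, num=NUM_BANDS)
ratio = wavelengths[1:] / wavelengths[:-1]

# Assumption: sin and cos per band for each of the x, y, z axes.
feature_dim = 3 * 2 * NUM_BANDS

print(np.round(wavelengths, 4))
print("constant ratio between bands:", np.allclose(ratio, ratio[0]))
print("per-point feature dimension:", feature_dim)
```

Log spacing places more bands at short wavelengths, where fine geometric detail lives, while still covering scene-scale structure.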

Real-World Experiments by Task

Comparisons with Fourier features (FF) and without FF for each task: Drawer, Fold, Stack, Arrange.

RoboCasa Experiments by Task

Comparisons with Fourier features (FF) and without FF for each task: Coffee, Microwave, Sink, Oven.

BibTeX

If you find this work useful, please cite:

@inproceedings{
    fourier-il,
    title={Fourier Features Let Agents Learn High Precision Policies with Imitation Learning},
    author={Bal{\'a}zs Gyenes and Emiliyan Gospodinov and Jan Frieling and Enrico Krohmer and Xiaogang Jia and Nicolas Schreiber and Niklas Freymuth and Gerhard Neumann},
    booktitle={Forty-third International Conference on Machine Learning},
    year={2026},
    url={https://openreview.net/forum?id=y03xxeUkgN}
}