neRF

PyTorch implementations of Neural Radiance Fields variants for view synthesis.

Implementations

nerf/ - Original NeRF architecture

Full MLP network with positional encoding for 3D coordinates and view directions. Predicts density and view-dependent color at each point, then uses volume rendering to composite rays into pixels. Produces high-quality novel view synthesis.
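
As a rough illustration of the compositing step, the sketch below accumulates density and color samples along a single ray. It is written in plain Python for readability and is not the code in nerf/, which would do the same thing on batched tensors; the function name `composite_ray` and its argument layout are assumptions made for this example.

```python
import math

def composite_ray(sigmas, colors, deltas):
    """Composite per-sample densities and RGB colors along one ray into a pixel.

    sigmas : densities at each sample
    colors : [r, g, b] at each sample
    deltas : distance between consecutive samples
    """
    transmittance = 1.0            # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, rgb, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha           # contribution of this sample
        weights.append(weight)
        for c in range(3):
            pixel[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return pixel, weights

# An opaque green sample hides the blue sample behind it.
pixel, weights = composite_ray(
    [0.0, 1e9, 5.0],
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    [0.1, 0.1, 0.1],
)
```

The per-sample weights returned here are the same quantities that hierarchical sampling uses to decide where to place fine samples.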

Paper: https://arxiv.org/abs/2003.08934

fastneRF/ - Factorized NeRF for fast inference

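Decomposes the radiance field into separate position (Fpos) and direction (Fdir) networks. The position network outputs density plus D sets of (u, v, w) color basis components; the direction network outputs D mixing weights. Because the two networks share no inputs, their outputs can be cached independently, which the paper reports enables 3000x faster inference than the original NeRF, but this implementation produces lower-quality images.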
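
A minimal sketch of how the two outputs are combined, following the inner-product formulation in the paper. The function name `fastnerf_color` and the list-of-tuples layout are assumptions for illustration; the code in fastneRF/ may organise its tensors differently.

```python
def fastnerf_color(uvw, beta):
    """Combine position-network and direction-network outputs into RGB.

    uvw  : list of D (u, v, w) triples from the position network
    beta : list of D mixing weights from the direction network
    """
    if len(uvw) != len(beta):
        raise ValueError("uvw and beta must have the same number of components")
    r = sum(weight * u for (u, _, _), weight in zip(uvw, beta))
    g = sum(weight * v for (_, v, _), weight in zip(uvw, beta))
    b = sum(weight * w for (_, _, w), weight in zip(uvw, beta))
    return (r, g, b)

# Two components (D=2): the direction weights blend a red and a green basis color.
color = fastnerf_color([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [0.25, 0.75])
```

Since `uvw` depends only on position and `beta` only on direction, each can be looked up from its own cache and combined with this cheap weighted sum at render time.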

Paper: https://arxiv.org/abs/2103.10380

Why FastNeRF has lower quality:
- D=8 bottleneck: Only 8 basis functions represent view-dependent radiance, limiting expressiveness
- Smaller direction network: 128 hidden dim, 3 layers vs NeRF's deeper architecture
- Factorization trade-off: Separating position/direction networks reduces capacity for modeling complex view-dependent effects

FastNeRF prioritizes real-time inference (200 FPS) over image quality; this is the expected trade-off from the paper.

kiloneRF/ - Grid of thousands of tiny MLPs

Partitions the scene into an N×N×N grid where each cell has its own tiny MLP. Points are routed to their cell's network, enabling massive parallelism. Designed for real-time rendering with custom CUDA kernels.
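
The routing step can be sketched as follows: map each point to a flat cell index, then group point indices per cell so that each tiny MLP receives one batch. This is a plain-Python illustration; the names `cell_index` and `route` and the axis-aligned bounding box arguments are assumptions, not identifiers from kiloneRF/.

```python
def cell_index(point, aabb_min, aabb_max, n):
    """Map a 3D point to the flat index of its cell in an n x n x n grid."""
    idx = []
    for p, lo, hi in zip(point, aabb_min, aabb_max):
        t = (p - lo) / (hi - lo)              # normalise to [0, 1] inside the box
        i = min(max(int(t * n), 0), n - 1)    # clamp so boundary points stay in range
        idx.append(i)
    ix, iy, iz = idx
    return (ix * n + iy) * n + iz

def route(points, aabb_min, aabb_max, n):
    """Group point indices by cell so each tiny MLP processes its own batch."""
    buckets = {}
    for k, p in enumerate(points):
        buckets.setdefault(cell_index(p, aabb_min, aabb_max, n), []).append(k)
    return buckets

points = [(-0.5, -0.5, -0.5), (0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]
buckets = route(points, (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0), n=2)
```

Each bucket corresponds to one network evaluation, so uneven bucket sizes are what a naive Python loop or indexed matmul handles poorly.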

Paper: https://arxiv.org/abs/2103.13744

Why KiloNeRF produces poor quality (and is slow):

The current implementation is fundamentally incomplete. The paper states: "using teacher-student distillation for training, we show that this speed-up can be achieved without sacrificing visual quality."

| Issue | Current Implementation | Paper's Approach |
| --- | --- | --- |
| Training | Direct from RGB images | Teacher-student distillation from pre-trained NeRF |
| Architecture | 32-dim tiny MLPs learning from scratch | Tiny MLPs distilled from 256-dim teacher |
| Grid boundaries | Hard boundaries, no interpolation | Occupancy-aware sampling |
| Performance | Python indexed matmul (slow) | Custom CUDA kernels (fast) |

Without distillation, each tiny MLP only sees sparse samples from its grid cell and cannot learn a good representation. The blocky artifacts come from the hard cell boundaries. The implementation is slow because the paper's three-orders-of-magnitude speedup depends on custom CUDA kernels, which are replaced here by indexed matmuls in Python.

inverseRendering/ - Fourier Feature NeRF

Uses Random Fourier Features instead of deterministic positional encoding. Maps inputs through sin/cos(x @ B) where B is a random Gaussian matrix, making the neural tangent kernel stationary with tunable bandwidth. No view-direction dependency.
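
A small sketch of the encoding, in plain Python rather than tensors. The 2π factor follows the paper's formulation and may differ from what inverseRendering/ does; the helper names `make_B` and `fourier_features`, and the `sigma` argument controlling bandwidth, are assumptions for this example.

```python
import math
import random

def make_B(in_dim, num_features, sigma, seed=0):
    """Sample a fixed Gaussian projection matrix; sigma sets the kernel bandwidth."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(num_features)]
            for _ in range(in_dim)]

def fourier_features(x, B):
    """Return [sin(2*pi * x @ B), cos(2*pi * x @ B)] for one input vector x."""
    num_features = len(B[0])
    proj = [2.0 * math.pi * sum(x[i] * B[i][j] for i in range(len(x)))
            for j in range(num_features)]
    return [math.sin(p) for p in proj] + [math.cos(p) for p in proj]

B = make_B(in_dim=3, num_features=16, sigma=10.0)
encoded = fourier_features([0.1, -0.2, 0.3], B)   # 32 values fed to the MLP
```

A larger `sigma` lets the network fit finer detail but tends to produce noise, while a smaller one gives smoother but blurrier reconstructions, which is the bandwidth tuning referred to above.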

Paper: https://arxiv.org/abs/2006.10739

Why quality is poor despite fast training:
- No view-direction input: Cannot model view-dependent effects (specular, reflections)
- Random encoding: The random matrix B may not be optimal; deterministic powers-of-2 encoding is better suited for multi-scale scenes
- Simpler architecture: 4-layer MLP vs NeRF's 8-layer with skip connections
- No hierarchical sampling: Uses uniform sampling instead of coarse-to-fine

The paper's contribution is mainly theoretical (NTK analysis): it explains why positional encodings such as NeRF's help MLPs fit high-frequency detail, and it was not meant as a standalone replacement for NeRF.

nerf-minus-minus/ - NeRF without known camera parameters

Jointly optimizes camera intrinsics (focal length), extrinsics (6-DoF poses), and the NeRF model through photometric loss. Removes the need for COLMAP/SfM preprocessing.
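
To make the pose parameters concrete: the paper represents each camera rotation as an axis-angle vector and converts it to a matrix with Rodrigues' formula, so the optimizer works with three unconstrained numbers per rotation. The sketch below shows that conversion in plain Python. Whether nerf-minus-minus/ uses exactly this parameterization is an assumption, and `axis_angle_to_matrix` is a name chosen for this example.

```python
import math

def axis_angle_to_matrix(phi):
    """Rodrigues' formula: rotation vector phi (axis * angle) -> 3x3 rotation matrix."""
    x, y, z = phi
    theta = math.sqrt(x * x + y * y + z * z)
    # Skew-symmetric cross-product matrix of phi.
    K = [[0.0, -z, y],
         [z, 0.0, -x],
         [-y, x, 0.0]]
    if theta < 1e-8:
        a, b = 1.0, 0.5   # limits of sin(t)/t and (1 - cos t)/t^2 as t -> 0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / (theta * theta)
    K2 = [[sum(K[r][k] * K[k][c] for k in range(3)) for c in range(3)]
          for r in range(3)]
    return [[(1.0 if r == c else 0.0) + a * K[r][c] + b * K2[r][c]
             for c in range(3)]
            for r in range(3)]

# A quarter turn about the z axis.
R = axis_angle_to_matrix((0.0, 0.0, math.pi / 2))
```

Together with a 3-vector translation this gives the 6 degrees of freedom per camera that are optimized alongside the focal length.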

Paper: https://arxiv.org/abs/2102.07064

Limitation: Forward-facing scenes only.

The joint optimization can recover accurate cameras for forward-facing scenes where cameras share a roughly consistent viewing direction. For 360-degree scenes (like tiny_nerf), camera pose estimation from scratch fails due to too many degrees of freedom and local minima. Use ground-truth cameras for 360-degree scenes.

freeneRF/ - Few-shot NeRF with frequency regularization

Two "free lunch" techniques for few-shot neural rendering: (1) progressively unmask positional encoding frequencies during training, and (2) penalize near-camera density to prevent floaters. The paper reports state-of-the-art few-shot results at the time of publication with only a few lines of code changed.
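
The first technique can be sketched as a per-band mask that opens linearly over the regularization period. This is a simplified version of the schedule, assuming the raw coordinates bypass the mask and that each band's sin/cos features share one mask value; `frequency_mask` and its arguments are names chosen for this example rather than taken from freeneRF/.

```python
def frequency_mask(num_freqs, step, reg_steps):
    """Per-band mask in [0, 1]: low frequencies open first, all open after reg_steps."""
    if step >= reg_steps:
        return [1.0] * num_freqs
    visible = num_freqs * step / reg_steps   # fractional number of visible bands
    # Band k is fully visible once visible >= k + 1, partially visible in between.
    return [min(max(visible - k, 0.0), 1.0) for k in range(num_freqs)]

# With 4 bands and 80 regularization steps, step 30 exposes 1.5 bands.
mask = frequency_mask(num_freqs=4, step=30, reg_steps=80)
```

The mask is multiplied element-wise with the positional encoding before it enters the MLP, so no architecture change is needed.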

Paper: https://arxiv.org/abs/2303.07418

Key insight: Limit high-frequency encoding early in training to force learning robust low-frequency structure first, preventing overfitting when training views are scarce.

freeneRF/ - Few-shot NeRF with frequency regularization

Network outputs spherical harmonic (SH) coefficients instead of view-dependent RGB. Removes viewing direction as network input - view dependence is encoded in SH coefficients that are evaluated at render time. Enables pre-tabulation into an octree for 150+ FPS rendering.
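
A minimal sketch of the render-time evaluation, truncated to SH degree 1 (4 basis functions) to keep it short. The basis constants and sign convention are the commonly used real-SH ones and are an assumption about this repository; the sigmoid on the summed result follows the paper. `sh_basis_deg1` and `sh_to_rgb` are illustrative names.

```python
import math

SH_C0 = 0.28209479177387814   # constant degree-0 basis value
SH_C1 = 0.4886025119029199    # scale of the three degree-1 basis functions

def sh_basis_deg1(d):
    """Real SH basis up to degree 1 for a unit direction d = (x, y, z)."""
    x, y, z = d
    return [SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x]

def sh_to_rgb(coeffs, d):
    """coeffs: three lists (R, G, B) of 4 SH coefficients; d: unit view direction."""
    basis = sh_basis_deg1(d)
    rgb = []
    for channel in coeffs:
        raw = sum(k * b for k, b in zip(channel, basis))
        rgb.append(1.0 / (1.0 + math.exp(-raw)))   # sigmoid keeps color in (0, 1)
    return rgb

# Red uses only the constant term (view-independent); blue varies with z.
coeffs = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
front = sh_to_rgb(coeffs, (0.0, 0.0, 1.0))
back = sh_to_rgb(coeffs, (0.0, 0.0, -1.0))
```

Only `coeffs` depends on position, so it can be stored per octree leaf while the basis is recomputed once per ray.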

Paper: https://arxiv.org/abs/2103.14024

Key insight: Factorize view-dependent appearance into position-dependent SH coefficients (cacheable) and direction-dependent SH basis functions (cheap closed-form). This implementation covers the NeRF-SH training phase only.

kplanes/ - Explicit radiance fields with feature planes

Uses 3 axis-aligned 2D feature planes (XY, YZ, XZ) instead of an MLP. Features are sampled via bilinear interpolation and combined via Hadamard product before decoding to density/color. Achieves 1000x compression over a full 4D grid with fast pure-PyTorch optimization.
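
The lookup described above can be sketched in plain Python: sample each plane bilinearly at the point's projection, then multiply the three feature vectors element-wise. Planes are nested lists indexed as rows, columns, channels with at least 2 rows and 2 columns, and coordinates are assumed to lie in [0, 1]; these conventions and the names `bilinear` and `kplanes_feature` are assumptions for this example, not the layout used in kplanes/.

```python
def bilinear(plane, u, v):
    """Sample a feature plane (rows x cols x channels) at continuous (u, v) in [0, 1]."""
    rows, cols = len(plane), len(plane[0])
    x = u * (cols - 1)
    y = v * (rows - 1)
    x0 = min(int(x), cols - 2)   # left/top corner, kept inside the grid
    y0 = min(int(y), rows - 2)
    tx, ty = x - x0, y - y0
    out = []
    for c in range(len(plane[0][0])):
        top = (1 - tx) * plane[y0][x0][c] + tx * plane[y0][x0 + 1][c]
        bottom = (1 - tx) * plane[y0 + 1][x0][c] + tx * plane[y0 + 1][x0 + 1][c]
        out.append((1 - ty) * top + ty * bottom)
    return out

def kplanes_feature(planes, point):
    """planes: dict with keys 'xy', 'yz', 'xz'; point: (x, y, z) in [0, 1]^3."""
    x, y, z = point
    f_xy = bilinear(planes["xy"], x, y)
    f_yz = bilinear(planes["yz"], y, z)
    f_xz = bilinear(planes["xz"], x, z)
    # Hadamard (element-wise) product combines the three projections.
    return [a * b * c for a, b, c in zip(f_xy, f_yz, f_xz)]

plane = [[[0.0], [1.0]], [[2.0], [3.0]]]   # 2 x 2 grid, 1 channel
feature = kplanes_feature({"xy": plane, "yz": plane, "xz": plane}, (0.5, 0.5, 0.5))
```

Because the product is zero whenever any one plane's feature is zero, a point contributes only where all three projections agree, which is what lets 2D planes localise structure in 3D.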

Paper: https://arxiv.org/abs/2301.10241

Key insight: Factorize 3D space into 2D planes. Easy to extend to d=4 (dynamic scenes) by adding time-dependent planes.

infoneRF/ - Few-shot NeRF with ray entropy regularization

Standard NeRF with an information-theoretic regularizer: minimizes entropy of the normalized alpha weights along each ray. This penalizes spread-out density (floaters) and encourages compact surface representations. Uses only 4 training images.
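
The regularizer amounts to the Shannon entropy of the per-sample opacities after normalising them to sum to one along the ray. A plain-Python sketch for a single ray follows; the function name `ray_entropy` and the `eps` guard against an all-zero ray are assumptions made for this example.

```python
import math

def ray_entropy(alphas, eps=1e-10):
    """Shannon entropy of the normalised alpha values along one ray."""
    total = sum(alphas) + eps          # eps avoids division by zero on empty rays
    entropy = 0.0
    for a in alphas:
        p = a / total
        if p > 0.0:                    # 0 * log(0) is taken as 0
            entropy -= p * math.log(p)
    return entropy

sharp = ray_entropy([0.0, 1.0, 0.0, 0.0])     # density concentrated at one sample
spread = ray_entropy([0.5, 0.5, 0.5, 0.5])    # density smeared along the ray
```

Adding this value, averaged over rays and scaled by a small coefficient, to the photometric loss pushes the model toward the sharp case.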

Paper: https://arxiv.org/abs/2112.15399

Key insight: H(p) = -∑ p_k log(p_k) where p_k = α_k / ∑ α_k. Minimizing ray entropy makes density distributions peak sharply at surfaces, preventing floaters in few-shot settings.

plenOxels/ - Plenoxels: Radiance Fields without Neural Networks

Dense 3D voxel grid storing density and spherical harmonic (SH) coefficients. Trilinear interpolation for smooth sampling, SH degree 2 for view-dependent color. Pure gradient optimization with no MLP at all. 58.7M parameters (128³ × 28 channels). Trains faster than NeRF but gives blockier results because of the fixed grid resolution.
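
The parameter count can be checked with a few lines of arithmetic: degree-2 SH has 9 coefficients per color channel, giving 27 color values plus 1 density per voxel. The helper name `plenoxel_param_count` is made up for this example.

```python
def plenoxel_param_count(resolution, sh_degree):
    """Return (total parameters, channels per voxel) for a dense SH voxel grid."""
    sh_per_channel = (sh_degree + 1) ** 2    # 9 coefficients for degree 2
    channels = 1 + 3 * sh_per_channel        # density + RGB SH coefficients
    return resolution ** 3 * channels, channels

total, channels = plenoxel_param_count(resolution=128, sh_degree=2)
print(total, channels)   # 58720256 28
```

Doubling the resolution to 256 multiplies the count by 8, which is why a dense grid at higher resolution quickly becomes impractical.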

Paper: https://arxiv.org/abs/2112.05131

Why quality is lower than NeRF:

- Dense grid wastes capacity on empty space (the paper uses a sparse voxel grid)
- Fixed resolution cannot adapt to scene complexity
- No hierarchical sampling
- No TV regularization or coarse-to-fine
