
Ghost Points and Occlusions: Solving the Noise Problem
29/04/2026
In this article, you'll learn about the challenges of collecting 3D point cloud data, and how human-in-the-loop (HITL) annotation helps mitigate them.
To an untrained eye or an unguided algorithm, a raw point cloud can look like a blizzard of digital static. This is the Noise Problem, and it’s where the difference between a functional ML model and a failed one is decided.
When Sensors Hallucinate: Ghost Points and Reflections
Ghost points are the phantoms of the 3D world. They occur when the laser pulse from a LiDAR sensor is reflected or refracted by something other than a solid object.
- Environmental Noise: Raindrops, snowflakes, or thick dust can reflect enough light to register as a "point" in space.
- Sensor Artifacts: Highly reflective surfaces—like a chrome bumper or a wet road—can cause "multipath reflections," where the laser bounces off several surfaces before returning, placing a "ghost" object several meters underground or hanging in mid-air.
For an automated system working frame by frame, these points are often indistinguishable from actual obstacles. For a human annotator, they are clearly anomalies that need to be filtered out to keep the training data clean.
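The two symptoms described above (returns below the ground and returns hanging alone in space) can at least be surfaced for review automatically. The following is a minimal sketch, not a production filter: it assumes a flat ground plane at a known height, and the function name and all thresholds (`ground_z`, `below_margin`, `radius`, `min_neighbors`) are hypothetical values chosen for illustration. It only flags candidates; deciding whether a flagged point is a ghost or a real obstacle is still left to the annotator.

```python
import math

def flag_ghost_candidates(points, ground_z=0.0, below_margin=0.5,
                          radius=0.5, min_neighbors=2):
    """Return one boolean per point: True means "worth a human look".

    points: list of (x, y, z) tuples in meters.
    All thresholds are illustrative assumptions, not tuned values.
    """
    flags = []
    for i, p in enumerate(points):
        # Cue 1: the return sits well below the assumed flat ground plane.
        below_ground = p[2] < ground_z - below_margin
        # Cue 2: the return is isolated (too few neighbors within `radius`).
        # Brute force is O(N^2); a real pipeline would use a spatial index.
        neighbors = sum(
            1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) < radius
        )
        flags.append(below_ground or neighbors < min_neighbors)
    return flags
```

A tight cluster of returns on a real surface passes both checks, while a lone return in mid-air or a return three meters under the road gets flagged.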
The Two Pillars of Spatial Complexity: Occlusion and Sparsity
Two other physical realities of LiDAR data make annotation a massive challenge: Occlusion and Sparsity.
1. The Shadow of Occlusion
LiDAR is a line-of-sight technology. If a pedestrian walks behind a parked van, the sensor cannot see through the van. It creates a "shadow" in the point cloud where no data exists.
The Challenge: How do you label an object you can only see half of? A human annotator uses spatial reasoning to "complete" the object, placing a bounding box where the pedestrian must be, even if the points are missing.
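To make the "shadow" concrete, here is a rough bird's-eye-view sketch of the geometry. It assumes the sensor sits at the origin, the occluder is described by the corners of its footprint, and the occluder does not straddle the ±180° azimuth wrap. The function name is hypothetical, and the range test is deliberately conservative: a target only counts as shadowed if it lies beyond the occluder's farthest corner.

```python
import math

def in_occlusion_shadow(target_xy, occluder_corners_xy):
    """True if the target lies behind the occluder as seen from the origin.

    target_xy: (x, y) in meters; occluder_corners_xy: list of (x, y) corners.
    Assumes the occluder does not cross the +/-pi azimuth wrap-around.
    """
    azimuths = [math.atan2(y, x) for x, y in occluder_corners_xy]
    ranges = [math.hypot(x, y) for x, y in occluder_corners_xy]
    target_az = math.atan2(target_xy[1], target_xy[0])
    target_range = math.hypot(target_xy[0], target_xy[1])
    # Inside the occluder's angular span AND farther than its far edge.
    inside_span = min(azimuths) <= target_az <= max(azimuths)
    return inside_span and target_range > max(ranges)

# A parked van 10-14 m ahead of the sensor, 2 m wide.
van = [(10.0, -1.0), (10.0, 1.0), (14.0, -1.0), (14.0, 1.0)]
print(in_occlusion_shadow((18.0, 0.5), van))  # pedestrian behind the van
print(in_occlusion_shadow((18.0, 5.0), van))  # pedestrian off to the side
```

A check like this can tell a labeling tool where points are missing because of geometry, but not what is standing in the gap; that inference is the annotator's job.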
2. The Law of Sparsity
As an object moves farther from the sensor, the density of the points hitting it drops sharply. The beams fan out at a fixed angular resolution both horizontally and vertically, so the number of "hits" on a target falls at least as fast as the square of the distance, and weak returns at long range are increasingly lost on top of that.
A car 5 meters away might be represented by 5,000 points. That same car 50 meters away might be represented by only 5 points.
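The geometric part of this drop-off can be estimated with a small-angle approximation. The sketch below is an idealized model, assuming a flat target that faces the sensor squarely and a scanner with uniform angular resolution; the function name and the default resolutions of 0.2° are assumptions for illustration, not the specification of any particular sensor.

```python
import math

def expected_returns(width_m, height_m, distance_m,
                     h_res_deg=0.2, v_res_deg=0.2):
    """Approximate number of beams that land on a flat, sensor-facing target.

    Small-angle approximation: neighboring beams are spaced
    distance * angular_resolution (in radians) apart at the target.
    """
    h_step = distance_m * math.radians(h_res_deg)  # horizontal beam spacing
    v_step = distance_m * math.radians(v_res_deg)  # vertical beam spacing
    return (width_m / h_step) * (height_m / v_step)

near = expected_returns(1.8, 1.5, 5.0)
far = expected_returns(1.8, 1.5, 50.0)
print(round(near / far))  # 10x the distance -> 100x fewer expected returns
```

In this model a tenfold increase in distance cuts the expected count by a factor of 100. Real counts at range tend to fall below the idealized figure, since weak returns from distant or dark surfaces are often not detected at all.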
The "Why Manual?" Angle: Context vs. Calculation
This is where the Auto-Labeling dream hits a wall.
A machine looking at a sparse cluster of 5 points at the edge of a frame sees noise. It sees a random fluctuation of data that it likely discards or misclassifies as a fence post or a sign.
A human looks at those same 5 points and sees a pedestrian. Why? Because humans don't just calculate; we contextualize. We see:
- Height: The vertical distribution of those 5 points matches a human profile.
- Movement: By looking at the previous frame (temporal consistency), we see those points moving at a walking pace.
- Environment: We see the points are located on a sidewalk, not in the middle of a flowerbed.
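The three cues above can be written down as explicit checks, which is useful for pre-sorting sparse clusters before an annotator reviews them. This is a hedged sketch only: it assumes the cluster has already been tracked across two frames, that an `on_sidewalk` flag comes from some map or scene label, and the function names, height range, and speed limit are hypothetical. It does not replace the annotator's judgment; it just makes the reasoning visible.

```python
import math

def cluster_cues(prev_pts, curr_pts, dt_s):
    """Return (vertical_extent_m, ground_speed_m_s) for one tracked cluster.

    prev_pts / curr_pts: lists of (x, y, z) in meters from two frames
    taken dt_s seconds apart.
    """
    def centroid(pts):
        n = len(pts)
        return tuple(sum(p[i] for p in pts) / n for i in range(3))

    zs = [p[2] for p in curr_pts]
    extent = max(zs) - min(zs)                       # Height cue
    c0, c1 = centroid(prev_pts), centroid(curr_pts)
    speed = math.hypot(c1[0] - c0[0], c1[1] - c0[1]) / dt_s  # Movement cue
    return extent, speed

def looks_like_pedestrian(extent, speed, on_sidewalk,
                          height_range=(1.0, 2.0), max_speed=2.5):
    """Combine the cues; thresholds are illustrative assumptions."""
    plausible_height = height_range[0] <= extent <= height_range[1]
    return plausible_height and speed <= max_speed and on_sidewalk

# Five points, 40 m away, shifted 0.14 m along x over 0.1 s.
prev = [(40.0, 3.0, z) for z in (0.2, 0.6, 1.0, 1.4, 1.7)]
curr = [(40.14, 3.0, z) for z in (0.2, 0.6, 1.0, 1.4, 1.7)]
extent, speed = cluster_cues(prev, curr, dt_s=0.1)
print(looks_like_pedestrian(extent, speed, on_sidewalk=True))
```

With only five points, each cue is weak on its own; it is the combination, checked against the surrounding scene, that makes the label defensible.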
Manual Human-in-the-Loop (HITL) annotation is the only way to transform noise into knowledge. We provide the intuition that allows your model to eventually learn these subtle patterns for itself.
Stop Training on Static
If your data is noisy, your model will be hesitant. If your data is incorrectly filtered, your model will be blind. We specialize in the high-intensity manual cleanup required to make sense of the messiest LiDAR datasets.




