Automated Perspective Correction
Fully automated perspective correction pipeline for scanned documents and cards using computer vision techniques
Overview
- Modern scanning applications, such as CamScanner or built-in iPad scanning tools, employ perspective correction to automatically detect and straighten skewed documents. This project implements a fully automated perspective correction pipeline using computer vision techniques, eliminating the need for manual corner point selection.
Key Features:
- Automated edge detection and contour extraction
- Quadrilateral shape detection and validation
- Perspective transformation via homography
- Configurable parameters via YAML configuration
To run:

```shell
make install
make run   # uses default card.jpg as input
```
Pipeline Architecture
The system follows a structured pipeline from image input to corrected output:
- Image Preprocessing – Load and convert to grayscale
- Edge Detection – Canny edge detection with configurable thresholds
- Contour Extraction – Find contours and compute convex hulls
- Intersection Detection – Calculate line intersections forming quadrilateral corners
- Shape Validation – Verify the detected points form a valid quadrilateral using approxPolyDP
- Corner Sorting – Order corners (top-left, top-right, bottom-right, bottom-left) using the centroid
- Perspective Transformation – Apply homography matrix to correct perspective
- Image Warping – Transform image to aligned rectangle
Core Implementation
1. Shape Descriptor Class
The ShapeDescriptor dataclass manages coordinate storage, centroid computation, and corner detection:
```python
from dataclasses import dataclass, field
from typing import List, Tuple

# ImageObjects is the project's shape enum (QUADRILATERAL maps to 4 vertices).

@dataclass
class ShapeDescriptor:
    coord: List[List[float]] = field(default_factory=list)
    size: int = 0
    centroidx: float = 0.0
    centroidy: float = 0.0
    sumx: float = 0.0
    sumy: float = 0.0
    corners: List[Tuple[float, float]] = field(default_factory=list)
    shape_type: ImageObjects = field(default=ImageObjects.QUADRILATERAL)

    def calculate_centroid(self):
        if not self.coord:
            raise ValueError("No coordinates available")
        x = self.sumx / self.size
        y = self.sumy / self.size
        return [x, y]

    def append_coord(self, x: float, y: float):
        self.coord.append([x, y])
        self.sumx += x
        self.sumy += y
        self.size += 1
```
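To illustrate the running-sum idea in isolation, here is a minimal, self-contained sketch (a stripped-down stand-in for `ShapeDescriptor`, not the project class itself): each `append_coord()` updates the sums, so the centroid is a constant-time division rather than a fresh pass over `coord`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunningCentroid:
    """Minimal sketch of ShapeDescriptor's running-sum centroid."""
    coord: List[List[float]] = field(default_factory=list)
    sumx: float = 0.0
    sumy: float = 0.0
    size: int = 0

    def append_coord(self, x: float, y: float) -> None:
        # Maintain running sums so the centroid is O(1) to read out.
        self.coord.append([x, y])
        self.sumx += x
        self.sumy += y
        self.size += 1

    def calculate_centroid(self) -> List[float]:
        if not self.coord:
            raise ValueError("No coordinates available")
        return [self.sumx / self.size, self.sumy / self.size]


rc = RunningCentroid()
for x, y in [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]:
    rc.append_coord(x, y)
print(rc.calculate_centroid())  # → [2.0, 1.0]
```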
Key Methods:
- append_coord(): Accumulates coordinates and maintains running sums for efficient centroid calculation
- calculate_centroid(): Computes the geometric center used for corner sorting
- find_intersection(): Calculates the intersection point of two line segments using the line-line intersection formula
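The `find_intersection()` method is referenced above but not shown; the following is a plausible pure-Python sketch of the line-line intersection formula it describes (the actual project implementation may differ). Two lines through point pairs (p1, p2) and (p3, p4) intersect where the determinant-based parameter t lands on the first line; a zero determinant means the lines are parallel.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]


def find_intersection(p1: Point, p2: Point, p3: Point, p4: Point) -> Optional[Point]:
    """Intersection of the infinite lines through (p1, p2) and (p3, p4).

    Returns None when the lines are parallel (zero determinant).
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-9:
        return None  # parallel or coincident lines
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


# The diagonals of a unit-ish square cross in the middle:
print(find_intersection((0, 0), (2, 2), (0, 2), (2, 0)))  # → (1.0, 1.0)
```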
2. Corner Detection and Sorting
The calculate_corners() method identifies and orders quadrilateral corners:
```python
def calculate_corners(self, centroid: Tuple) -> list:
    """Calculate top-right, top-left, bottom-right, bottom-left points."""
    top_points = []
    bottom_points = []
    cx, cy = centroid[0], centroid[1]

    # Separate points above and below the centroid
    for coord in self.coord:
        x, y = coord[0][0], coord[0][1]
        if y < cy:
            top_points.append(coord)
        else:
            bottom_points.append(coord)

    # Sort corners: top-left, top-right, bottom-right, bottom-left
    top_left = min(top_points)
    top_right = max(top_points)
    bottom_left = min(bottom_points)
    bottom_right = max(bottom_points)
    self.corners = [top_left, top_right, bottom_right, bottom_left]

    # Calculate destination rectangle dimensions
    xmin = min(each_arr[0][0] for each_arr in self.corners)
    ymin = min(each_arr[0][1] for each_arr in self.corners)
    xmax = max(each_arr[0][0] for each_arr in self.corners)
    ymax = max(each_arr[0][1] for each_arr in self.corners)
    width = abs(int(xmax - xmin))
    height = abs(int(ymax - ymin))

    dest = np.array(
        [[0, 0], [width, 0], [width, height], [0, height]],
        dtype=np.float32,
    )
    return self.corners, dest, width, height
This method uses the centroid to distinguish top and bottom edges, then sorts corners to match the destination rectangle format required for perspective transformation.
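The ordering logic can be demonstrated on its own with plain `(x, y)` tuples (hypothetical corner values, without OpenCV's nested `[[x, y]]` arrays): points are split by whether they lie above or below the centroid's y, then `min`/`max` on each half picks the left and right corner by x.

```python
# Four corners of a skewed quadrilateral, in arbitrary order:
points = [(9.0, 1.0), (0.5, 8.0), (1.0, 1.5), (10.0, 9.0)]
cx = sum(p[0] for p in points) / len(points)
cy = sum(p[1] for p in points) / len(points)

top = [p for p in points if p[1] < cy]      # above the centroid
bottom = [p for p in points if p[1] >= cy]  # below the centroid

# Tuple comparison sorts by x first, giving left/right within each half:
top_left, top_right = min(top), max(top)
bottom_left, bottom_right = min(bottom), max(bottom)

print([top_left, top_right, bottom_right, bottom_left])
# → [(1.0, 1.5), (9.0, 1.0), (10.0, 9.0), (0.5, 8.0)]
```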
3. Shape Validation
The system validates detected shapes using OpenCV’s approxPolyDP:
```python
def shape_approx_check(
    self,
    shape_check: ImageObjects = ImageObjects.QUADRILATERAL,
    epsilon_factor: float = 0.1,
) -> bool:
    """Check if points form the expected shape using Douglas-Peucker approximation."""
    coords_array = np.array(self.coord, dtype=np.float32).reshape(self.size, 1, 2)
    perimeter = cv2.arcLength(coords_array, True)
    approx = cv2.approxPolyDP(coords_array, epsilon_factor * perimeter, True)
    self.coord = approx.tolist()  # Update to the simplified coordinates
    return len(approx) == self.shape_type.value
```
The epsilon_factor parameter controls approximation accuracy—smaller values yield more precise shapes but may miss valid quadrilaterals with slight irregularities.
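`approxPolyDP` implements the Douglas-Peucker algorithm. The following pure-Python sketch (an illustration, not OpenCV's implementation) shows exactly what epsilon controls: an interior point survives only if its perpendicular distance to the chord between the endpoints exceeds epsilon.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_line_dist(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    num = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay))
    return num / math.hypot(bx - ax, by - ay)


def douglas_peucker(pts: List[Point], epsilon: float) -> List[Point]:
    """Keep the farthest interior point if it exceeds epsilon, recurse on both halves."""
    if len(pts) < 3:
        return pts
    dists = [point_line_dist(p, pts[0], pts[-1]) for p in pts[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[i - 1] > epsilon:
        left = douglas_peucker(pts[: i + 1], epsilon)
        right = douglas_peucker(pts[i:], epsilon)
        return left[:-1] + right  # drop the duplicated split point
    return [pts[0], pts[-1]]


# A near-straight edge with a 0.1-unit bump in the middle:
edge = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0)]
print(douglas_peucker(edge, epsilon=0.05))  # bump survives: 3 points
print(douglas_peucker(edge, epsilon=0.5))   # bump removed: 2 points
```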
4. Perspective Transformation Class
The Perspective class orchestrates the entire correction pipeline:
```python
@dataclass
class Perspective:
    source: str = field(metadata={"desc": "Full path of image to transform."})
    destination: np.ndarray = field(init=False)

    def __post_init__(self):
        """Initialize the image and set destination points."""
        self.img = cv2.imread(self.source, 0)  # Read as grayscale
        height, width = self.img.shape[:2]
        self.destination = np.asarray(
            [[0, 0], [width, 0], [width, height], [0, height]],
            dtype=np.float32,
        )
```
Edge Detection:
```python
def find_edges(self, lower_thresh: int, upper_thresh: int) -> np.ndarray:
    """Detect edges with the Canny algorithm after binary thresholding."""
    _, binary = cv2.threshold(self.img, 127, 255, cv2.THRESH_BINARY)
    return cv2.Canny(binary, lower_thresh, upper_thresh)
```
Contour Extraction:
```python
def find_contours(self, contour_area: int, edges: np.ndarray) -> ShapeDescriptor:
    """Find contours and extract intersection points from convex hulls."""
    hull_arr = []
    contours, hierarchy = cv2.findContours(
        edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )

    # Filter by area and hierarchy (external contours only)
    for i, cnt in enumerate(contours):
        if cv2.contourArea(cnt) > contour_area and hierarchy[0, i, 3] == -1:
            hull_arr.append(cv2.convexHull(cnt, returnPoints=True))
    if not hull_arr:
        raise ValueError("No valid contours found.")

    # Calculate intersections between consecutive hull edges
    coord_obj = ShapeDescriptor()
    for hull in hull_arr:
        for j in range(len(hull) - 3):
            intersection = coord_obj.find_intersection(
                hull[j][0], hull[j + 1][0],
                hull[j + 2][0], hull[j + 3][0],
            )
            if intersection:
                coord_obj.append_coord(*intersection)
    return coord_obj
```
Perspective Transformation:
```python
def transform(self, corners: List[Tuple[float, float]], dest, w, h) -> np.ndarray:
    """Apply the perspective transformation using a homography matrix."""
    corners_array = np.array(corners, dtype=np.float32)
    transformation_matrix = cv2.getPerspectiveTransform(corners_array, dest)
    return cv2.warpPerspective(self.img, transformation_matrix, (w, h))
```
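Under the hood, `cv2.getPerspectiveTransform` solves an 8-unknown linear system for the homography with its bottom-right entry fixed to 1. As a NumPy-only sketch of that computation (with hypothetical corner values), for readers who want to see the math without OpenCV:

```python
import numpy as np


def perspective_matrix(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Solve for the 3x3 homography H mapping 4 src points to 4 dst points,
    with H[2, 2] fixed to 1 (the normalization cv2 also uses)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1*x + h2*y + h3) / (h7*x + h8*y + 1), and likewise for v,
        # rearranged into two linear equations per point pair:
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=np.float64), np.array(b, dtype=np.float64))
    return np.append(h, 1.0).reshape(3, 3)


# Skewed quadrilateral corners -> axis-aligned rectangle (TL, TR, BR, BL):
src = np.array([[10, 20], [200, 30], [210, 250], [5, 240]], dtype=np.float64)
dst = np.array([[0, 0], [205, 0], [205, 230], [0, 230]], dtype=np.float64)
H = perspective_matrix(src, dst)

# Verify: each source corner maps onto its destination corner
# after the perspective divide by the homogeneous coordinate.
pts = np.hstack([src, np.ones((4, 1))]) @ H.T
mapped = pts[:, :2] / pts[:, 2:]
print(np.allclose(mapped, dst))  # → True
```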
5. Main Pipeline
The find_transformed_image() method orchestrates the complete pipeline:
```python
def find_transformed_image(self, config):
    """Execute the full perspective correction pipeline from config."""
    lower, upper = (
        config["canny_thresholds"]["lower"],
        config["canny_thresholds"]["upper"],
    )
    area = config["min_contour_area"]
    shape, epsil = (
        config["shape_detection"]["target_shape"],
        config["shape_detection"]["epsilon_factor"],
    )

    # Pipeline execution
    edges = self.find_edges(lower, upper)
    c = self.find_contours(area, edges)
    if c.shape_approx_check(shape, epsil):  # Validate quadrilateral
        centroid = c.calculate_centroid()
        corners, dest, w, h = c.calculate_corners(centroid)
        transformed_img = self.transform(corners, dest, w, h)
        cv2.imwrite("result.jpg", transformed_img)
        return transformed_img
    return None  # Validation failed: no quadrilateral detected
```
6. Configuration Management
The system uses YAML configuration for parameter tuning:
```python
import yaml


def load_config(config_path: str) -> dict:
    with open(config_path, "r") as file:
        return yaml.safe_load(file)
```
Example config.yaml:
```yaml
canny_thresholds:
  lower: 50
  upper: 150
min_contour_area: 1000
shape_detection:
  target_shape: 4  # QUADRILATERAL
  epsilon_factor: 0.1
```
Usage Example
```python
if __name__ == "__main__":
    import os

    config_path = "config.yaml"
    config = load_config(config_path)
    file_path = "card.jpg"
    persp = Perspective(source=os.path.join(os.path.dirname(__file__), file_path))
    transformed_img = persp.find_transformed_image(config)
```
Technical Highlights
Key Algorithms:
- Canny Edge Detection: Multi-stage algorithm for robust edge extraction
- Convex Hull: Simplifies contour representation while preserving shape
- Douglas-Peucker Approximation: Reduces polygon complexity for shape validation
- Homography Transformation: 3×3 matrix mapping for perspective correction
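The homography mapping listed above can be written explicitly: a pixel (x, y) is carried to (x', y') through the 3×3 matrix H, followed by the perspective divide.

```latex
w \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}
= H \begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad
x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}},
\quad
y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}
```

Since H is defined only up to scale, fixing one entry (conventionally h33 = 1) leaves 8 unknowns, which is why exactly four point correspondences determine the transform.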
Design Patterns:
- Dataclass-based architecture for clean data structures
- Separation of concerns: ShapeDescriptor handles geometry, Perspective handles transformation
- Configurable parameters via YAML for easy tuning
Applications
This automated perspective correction system enables:
- Document Scanning: Automated alignment for OCR preprocessing
- Card Recognition: ID card and business card digitization
- AR Registration: Marker detection and alignment
- Robotic Vision: Object pose estimation and alignment
Future Enhancements
- Deep learning-based quadrilateral detection for improved robustness
- Multi-shape support (pentagons, hexagons) via the ImageObjects enum
- Real-time processing pipeline for video streams
- Adaptive threshold selection based on image characteristics