Automated Perspective Correction
Fully automated perspective correction pipeline for scanned documents and cards using computer vision techniques
Overview
- Modern scanning applications, such as CamScanner or built-in iPad scanning tools, employ perspective correction to automatically detect and straighten skewed documents. This project implements a fully automated perspective correction pipeline using computer vision techniques, eliminating the need for manual corner point selection.
Key Features:
- Automated edge detection and contour extraction
- Quadrilateral shape detection and validation
- Perspective transformation via homography
- Configurable parameters via YAML configuration
To run:

```shell
make install
make run   # uses default card.jpg as input
```
Pipeline Architecture
The system follows a structured pipeline from image input to corrected output:
- Image Preprocessing – Load and convert to grayscale
- Edge Detection – Canny edge detection with configurable thresholds
- Contour Extraction – Find contours and compute convex hulls
- Intersection Detection – Calculate line intersections forming quadrilateral corners
- Shape Validation – Verify the detected points form a valid quadrilateral using approxPolyDP
- Corner Sorting – Order corners (top-left, top-right, bottom-right, bottom-left) using the centroid
- Perspective Transformation – Apply homography matrix to correct perspective
- Image Warping – Transform image to aligned rectangle
Core Implementation
1. Shape Descriptor Class
The ShapeDescriptor dataclass manages coordinate storage, centroid computation, and corner detection:
```python
from dataclasses import dataclass, field
from typing import List, Tuple

# ImageObjects is the project's shape enum (QUADRILATERAL maps to 4 vertices).

@dataclass
class ShapeDescriptor:
    coord: List[List[float]] = field(default_factory=list)
    size: int = 0
    centroidx: float = 0.0
    centroidy: float = 0.0
    sumx: float = 0.0
    sumy: float = 0.0
    corners: List[Tuple[float, float]] = field(default_factory=list)
    shape_type: ImageObjects = field(default=ImageObjects.QUADRILATERAL)

    def calculate_centroid(self):
        if not self.coord:
            raise ValueError("No coordinates available")
        x = self.sumx / self.size
        y = self.sumy / self.size
        return [x, y]

    def append_coord(self, x: float, y: float):
        self.coord.append([x, y])
        self.sumx += x
        self.sumy += y
        self.size += 1
```
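To illustrate the running-sum idea in isolation, here is a minimal, self-contained sketch (a stripped-down stand-in for `ShapeDescriptor`, not the project class itself): each `append_coord()` updates the sums, so the centroid is a constant-time division rather than a fresh pass over `coord`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunningCentroid:
    """Minimal sketch of ShapeDescriptor's running-sum centroid."""
    coord: List[List[float]] = field(default_factory=list)
    sumx: float = 0.0
    sumy: float = 0.0
    size: int = 0

    def append_coord(self, x: float, y: float) -> None:
        # Maintain running sums so the centroid is O(1) to read out.
        self.coord.append([x, y])
        self.sumx += x
        self.sumy += y
        self.size += 1

    def calculate_centroid(self) -> List[float]:
        if not self.coord:
            raise ValueError("No coordinates available")
        return [self.sumx / self.size, self.sumy / self.size]


rc = RunningCentroid()
for x, y in [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]:
    rc.append_coord(x, y)
print(rc.calculate_centroid())  # → [2.0, 1.0]
```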
Key Methods:
- append_coord(): Accumulates coordinates and maintains running sums for efficient centroid calculation
- calculate_centroid(): Computes the geometric center used for corner sorting
- find_intersection(): Calculates the intersection point of two line segments using the line-line intersection formula
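The `find_intersection()` method is referenced above but not shown; the following is a plausible pure-Python sketch of the line-line intersection formula it describes (the actual project implementation may differ). Two lines through point pairs (p1, p2) and (p3, p4) intersect where the determinant-based parameter t lands on the first line; a zero determinant means the lines are parallel.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]


def find_intersection(p1: Point, p2: Point, p3: Point, p4: Point) -> Optional[Point]:
    """Intersection of the infinite lines through (p1, p2) and (p3, p4).

    Returns None when the lines are parallel (zero determinant).
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-9:
        return None  # parallel or coincident lines
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


# The diagonals of a unit-ish square cross in the middle:
print(find_intersection((0, 0), (2, 2), (0, 2), (2, 0)))  # → (1.0, 1.0)
```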
2. Corner Detection and Sorting
The calculate_corners() method identifies and orders quadrilateral corners:
```python
def calculate_corners(self, centroid: Tuple) -> list:
    """Calculate top-right, top-left, bottom-right, bottom-left points."""
    top_points = []
    bottom_points = []
    cx, cy = centroid[0], centroid[1]

    # Separate points above and below the centroid
    for coord in self.coord:
        x, y = coord[0][0], coord[0][1]
        if y < cy:
            top_points.append(coord)
        else:
            bottom_points.append(coord)

    # Sort corners: top-left, top-right, bottom-right, bottom-left
    top_left = min(top_points)
    top_right = max(top_points)
    bottom_left = min(bottom_points)
    bottom_right = max(bottom_points)
    self.corners = [top_left, top_right, bottom_right, bottom_left]

    # Calculate destination rectangle dimensions
    xmin = min(each_arr[0][0] for each_arr in self.corners)
    ymin = min(each_arr[0][1] for each_arr in self.corners)
    xmax = max(each_arr[0][0] for each_arr in self.corners)
    ymax = max(each_arr[0][1] for each_arr in self.corners)
    width = abs(int(xmax - xmin))
    height = abs(int(ymax - ymin))

    dest = np.array(
        [[0, 0], [width, 0], [width, height], [0, height]],
        dtype=np.float32,
    )
    return self.corners, dest, width, height
This method uses the centroid to distinguish top and bottom edges, then sorts corners to match the destination rectangle format required for perspective transformation.
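The ordering logic can be demonstrated on its own with plain `(x, y)` tuples (hypothetical corner values, without OpenCV's nested `[[x, y]]` arrays): points are split by whether they lie above or below the centroid's y, then `min`/`max` on each half picks the left and right corner by x.

```python
# Four corners of a skewed quadrilateral, in arbitrary order:
points = [(9.0, 1.0), (0.5, 8.0), (1.0, 1.5), (10.0, 9.0)]
cx = sum(p[0] for p in points) / len(points)
cy = sum(p[1] for p in points) / len(points)

top = [p for p in points if p[1] < cy]      # above the centroid
bottom = [p for p in points if p[1] >= cy]  # below the centroid

# Tuple comparison sorts by x first, giving left/right within each half:
top_left, top_right = min(top), max(top)
bottom_left, bottom_right = min(bottom), max(bottom)

print([top_left, top_right, bottom_right, bottom_left])
# → [(1.0, 1.5), (9.0, 1.0), (10.0, 9.0), (0.5, 8.0)]
```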
3. Shape Validation
The system validates detected shapes using OpenCV’s approxPolyDP:
```python
def shape_approx_check(
    self,
    shape_check: ImageObjects = ImageObjects.QUADRILATERAL,
    epsilon_factor: float = 0.1,
) -> bool:
    """Check if points form the expected shape using Douglas-Peucker approximation."""
    coords_array = np.array(self.coord, dtype=np.float32).reshape(self.size, 1, 2)
    perimeter = cv2.arcLength(coords_array, True)
    approx = cv2.approxPolyDP(coords_array, epsilon_factor * perimeter, True)
    self.coord = approx.tolist()  # Update to the simplified coordinates
    return len(approx) == self.shape_type.value
```
The epsilon_factor parameter controls approximation accuracy—smaller values yield more precise shapes but may miss valid quadrilaterals with slight irregularities.
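`approxPolyDP` implements the Douglas-Peucker algorithm. The following pure-Python sketch (an illustration, not OpenCV's implementation) shows exactly what epsilon controls: an interior point survives only if its perpendicular distance to the chord between the endpoints exceeds epsilon.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_line_dist(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    num = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay))
    return num / math.hypot(bx - ax, by - ay)


def douglas_peucker(pts: List[Point], epsilon: float) -> List[Point]:
    """Keep the farthest interior point if it exceeds epsilon, recurse on both halves."""
    if len(pts) < 3:
        return pts
    dists = [point_line_dist(p, pts[0], pts[-1]) for p in pts[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[i - 1] > epsilon:
        left = douglas_peucker(pts[: i + 1], epsilon)
        right = douglas_peucker(pts[i:], epsilon)
        return left[:-1] + right  # drop the duplicated split point
    return [pts[0], pts[-1]]


# A near-straight edge with a 0.1-unit bump in the middle:
edge = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0)]
print(douglas_peucker(edge, epsilon=0.05))  # bump survives: 3 points
print(douglas_peucker(edge, epsilon=0.5))   # bump removed: 2 points
```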
4. Perspective Transformation Class
The Perspective class orchestrates the entire correction pipeline:
```python
@dataclass
class Perspective:
    source: str = field(metadata={"desc": "Full path of image to transform."})
    destination: np.ndarray = field(init=False)

    def __post_init__(self):
        """Initialize the image and set destination points."""
        self.img = cv2.imread(self.source, 0)  # Read as grayscale
        height, width = self.img.shape[:2]
        self.destination = np.asarray(
            [[0, 0], [width, 0], [width, height], [0, height]],
            dtype=np.float32,
        )
```
Edge Detection:
```python
def find_edges(self, lower_thresh: int, upper_thresh: int) -> np.ndarray:
    """Detect edges with the Canny algorithm after binary thresholding."""
    _, binary = cv2.threshold(self.img, 127, 255, cv2.THRESH_BINARY)
    return cv2.Canny(binary, lower_thresh, upper_thresh)
```
Contour Extraction:
```python
def find_contours(self, contour_area: int, edges: np.ndarray) -> ShapeDescriptor:
    """Find contours and extract intersection points from convex hulls."""
    hull_arr = []
    contours, hierarchy = cv2.findContours(
        edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )

    # Filter by area and hierarchy (external contours only)
    for i, cnt in enumerate(contours):
        if cv2.contourArea(cnt) > contour_area and hierarchy[0, i, 3] == -1:
            hull_arr.append(cv2.convexHull(cnt, returnPoints=True))
    if not hull_arr:
        raise ValueError("No valid contours found.")

    # Calculate intersections between consecutive hull edges
    coord_obj = ShapeDescriptor()
    for hull in hull_arr:
        for j in range(len(hull) - 3):
            intersection = coord_obj.find_intersection(
                hull[j][0], hull[j + 1][0],
                hull[j + 2][0], hull[j + 3][0],
            )
            if intersection:
                coord_obj.append_coord(*intersection)
    return coord_obj
```
Perspective Transformation:
```python
def transform(self, corners: List[Tuple[float, float]], dest, w, h) -> np.ndarray:
    """Apply the perspective transformation using a homography matrix."""
    corners_array = np.array(corners, dtype=np.float32)
    transformation_matrix = cv2.getPerspectiveTransform(corners_array, dest)
    return cv2.warpPerspective(self.img, transformation_matrix, (w, h))
```
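Under the hood, `cv2.getPerspectiveTransform` solves an 8-unknown linear system for the homography with its bottom-right entry fixed to 1. As a NumPy-only sketch of that computation (with hypothetical corner values), for readers who want to see the math without OpenCV:

```python
import numpy as np


def perspective_matrix(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Solve for the 3x3 homography H mapping 4 src points to 4 dst points,
    with H[2, 2] fixed to 1 (the normalization cv2 also uses)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1*x + h2*y + h3) / (h7*x + h8*y + 1), and likewise for v,
        # rearranged into two linear equations per point pair:
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=np.float64), np.array(b, dtype=np.float64))
    return np.append(h, 1.0).reshape(3, 3)


# Skewed quadrilateral corners -> axis-aligned rectangle (TL, TR, BR, BL):
src = np.array([[10, 20], [200, 30], [210, 250], [5, 240]], dtype=np.float64)
dst = np.array([[0, 0], [205, 0], [205, 230], [0, 230]], dtype=np.float64)
H = perspective_matrix(src, dst)

# Verify: each source corner maps onto its destination corner
# after the perspective divide by the homogeneous coordinate.
pts = np.hstack([src, np.ones((4, 1))]) @ H.T
mapped = pts[:, :2] / pts[:, 2:]
print(np.allclose(mapped, dst))  # → True
```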
5. Main Pipeline
The find_transformed_image() method orchestrates the complete pipeline:
```python
def find_transformed_image(self, config):
    """Execute the full perspective correction pipeline from config."""
    lower, upper = (
        config["canny_thresholds"]["lower"],
        config["canny_thresholds"]["upper"],
    )
    area = config["min_contour_area"]
    shape, epsil = (
        config["shape_detection"]["target_shape"],
        config["shape_detection"]["epsilon_factor"],
    )

    # Pipeline execution
    edges = self.find_edges(lower, upper)
    c = self.find_contours(area, edges)
    if c.shape_approx_check(shape, epsil):  # Validate quadrilateral
        centroid = c.calculate_centroid()
        corners, dest, w, h = c.calculate_corners(centroid)
        transformed_img = self.transform(corners, dest, w, h)
        cv2.imwrite("result.jpg", transformed_img)
        return transformed_img
    return None  # Validation failed: no quadrilateral detected
```
6. Configuration Management
The system uses YAML configuration for parameter tuning:
```python
import yaml


def load_config(config_path: str) -> dict:
    with open(config_path, "r") as file:
        return yaml.safe_load(file)
```
Example config.yaml:
```yaml
canny_thresholds:
  lower: 50
  upper: 150
min_contour_area: 1000
shape_detection:
  target_shape: 4  # QUADRILATERAL
  epsilon_factor: 0.1
```
Usage Example
```python
if __name__ == "__main__":
    import os

    config_path = "config.yaml"
    config = load_config(config_path)
    file_path = "card.jpg"
    persp = Perspective(source=os.path.join(os.path.dirname(__file__), file_path))
    transformed_img = persp.find_transformed_image(config)
```
Technical Highlights
Key Algorithms:
- Canny Edge Detection: Multi-stage algorithm for robust edge extraction
- Convex Hull: Simplifies contour representation while preserving shape
- Douglas-Peucker Approximation: Reduces polygon complexity for shape validation
- Homography Transformation: 3×3 matrix mapping for perspective correction
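The homography mapping listed above can be written explicitly: a pixel (x, y) is carried to (x', y') through the 3×3 matrix H, followed by the perspective divide.

```latex
w \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}
= H \begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad
x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}},
\quad
y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}
```

Since H is defined only up to scale, fixing one entry (conventionally h33 = 1) leaves 8 unknowns, which is why exactly four point correspondences determine the transform.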
Design Patterns:
- Dataclass-based architecture for clean data structures
- Separation of concerns: ShapeDescriptor handles geometry, Perspective handles transformation
- Configurable parameters via YAML for easy tuning
Applications
This automated perspective correction system enables:
- Document Scanning: Automated alignment for OCR preprocessing
- Card Recognition: ID card and business card digitization
- AR Registration: Marker detection and alignment
- Robotic Vision: Object pose estimation and alignment
Future Enhancements
- Deep learning-based quadrilateral detection for improved robustness
- Multi-shape support (pentagons, hexagons) via the ImageObjects enum
- Real-time processing pipeline for video streams
- Adaptive threshold selection based on image characteristics