
Quick Start

Ready to write some code? This guide will take you from “zero” to running your first Computer Vision pipeline in Python.

We won’t just dump code on you; we’ll explain why things work the way they do. By the end, you’ll be able to read images, manipulate them as matrices, and even detect faces in real-time.

Let’s start with the “Hello World” of Computer Vision: loading an image, showing it, and saving a copy.

Python
import cv2
# 1. Read an image from disk
# Returns a NumPy array, or None if the file doesn't exist
img = cv2.imread("image.jpg")
if img is None:
    raise FileNotFoundError("Could not load image. Double-check your path!")
# 2. Show the image in a window
# "Original image" is the window title
cv2.imshow("Original image", img)
# 3. Wait indefinitely for a key press
# 0 means "wait forever". If you pass 1000, it waits 1 second.
cv2.waitKey(0)
# 4. Clean up
cv2.destroyAllWindows()
# 5. Save the result
cv2.imwrite("image_copy.jpg", img)

This simple script introduces the “Holy Trinity” of OpenCV I/O: imread, imshow, and imwrite. You’ll use these in almost every project.

Before we go further, there are three quirks you need to internalize to avoid hours of debugging later.

  1. OpenCV is BGR, not RGB

    This is a historical artifact. When OpenCV was created, BGR (Blue-Green-Red) was a popular memory layout for cameras. Today, almost everyone (browsers, matplotlib, your phone) uses RGB.

    If your images look “smurfish” (blue people, orange skies), you probably forgot to convert:

    Python
    img_bgr = cv2.imread("image.jpg")
    img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB) # Fix colors for Matplotlib/Web
  2. Images are just NumPy Arrays

    There is no special “Image Class” in OpenCV Python. There is only numpy.ndarray. This is fantastic news because it means you can use the entire NumPy ecosystem of tools on your images.

    Python
    import cv2
    import numpy as np
    img = cv2.imread("image.jpg")
    print(type(img)) # <class 'numpy.ndarray'>
    print(img.shape) # (height, width, channels) -> e.g. (1080, 1920, 3)
    print(img.dtype) # uint8 (Unsigned Integer 8-bit)
  3. (Y, X) not (X, Y)

    Because they are matrices, we access them by [Row, Column]. In image terms, Row is the Y-axis (height) and Column is the X-axis (width).

    Python
    # Accessing the pixel at y=10, x=20
    pixel = img[10, 20]

    If you index with img[x, y], you will read the wrong pixel or get an IndexError. Always think “Row, Column”. (The sketch after this list puts quirks 2 and 3 together.)
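
Here is a short sketch that combines quirks 2 and 3 (it assumes the same "image.jpg" as above): index by row and column, and treat the pixels as an ordinary NumPy array.

Python
import cv2
import numpy as np
img = cv2.imread("image.jpg")
h, w = img.shape[:2]                    # shape is (rows, cols) = (height, width)
b, g, r = img[10, 20]                   # pixel at y=10, x=20 (a BGR triple)
inverted = 255 - img                    # plain NumPy: photographic negative
darker = (img * 0.5).astype(np.uint8)   # halve brightness, convert back to uint8
img[0:50, :] = (0, 0, 255)              # paint the top 50 rows red (BGR order!)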

Since images are just arrays, “editing” an image is really just “slicing” an array.

Converting to grayscale is about information reduction. Color often adds noise without adding value (e.g., for detecting edges).

Python
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

Smaller images process faster. Often, a 4K image is overkill for detection; resizing to 640x480 is standard.

Python
# Fixed size: note that resize takes (Width, Height), not (Height, Width)!
resized = cv2.resize(img, (640, 480))
# Relative scaling
half_size = cv2.resize(img, None, fx=0.5, fy=0.5)
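
If you want to shrink an image without distorting it, one common pattern (a sketch; the 640-pixel target width is just an assumption, reusing img from above) is to derive the height from the image's own shape so the aspect ratio stays intact:

Python
target_w = 640
scale = target_w / img.shape[1]                          # shape[1] is the width
resized = cv2.resize(img, (target_w, int(img.shape[0] * scale)))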

Cropping needs no special function. Just slice the NumPy array!

Python
# Crop [StartRow:EndRow, StartCol:EndCol]
# AKA [y1:y2, x1:x2]
crop = img[100:300, 200:500]

Drawing lets you annotate your results. It is helpful for debugging or showing the user what the AI saw.

Python
# Draw a Green Rectangle (BGR: 0, 255, 0)
cv2.rectangle(img, (50, 100), (250, 250), (0, 255, 0), 2)
# Write Text
cv2.putText(img, "Target Detected", (50, 420), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2)

Now let’s do some actual computer vision.

Why blur? Cameras produce “noise” (grain). Blurring smooths out that noise so algorithms don’t mistake grain for edges.

Python
# Kernel size (15, 15) determines how strong the blur is
blur = cv2.GaussianBlur(img, (15, 15), 0)

The Canny algorithm finds where pixel intensity changes rapidly (gradients). It’s basically a line drawing generator.

Python
# Lower/Upper thresholds determine how picky the algorithm is
edges = cv2.Canny(gray, 100, 200)
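
Picking thresholds is mostly trial and error. One quick way to compare candidates (a sketch, reusing the gray image from above; the exact numbers are just examples) is to run Canny twice and view the results side by side:

Python
import numpy as np
loose = cv2.Canny(gray, 50, 150)     # lower thresholds -> more edges (and more noise)
strict = cv2.Canny(gray, 150, 300)   # higher thresholds -> fewer, stronger edges
cv2.imshow("Loose vs. strict", np.hstack([loose, strict]))
cv2.waitKey(0)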

Video is just a sequence of images (frames) shown fast. If you can process an image, you can process video.

Python
cap = cv2.VideoCapture(0) # 0 is usually your default webcam
if not cap.isOpened():
    raise RuntimeError("Could not open webcam.")
while True:
    # 1. Grab a frame
    ret, frame = cap.read()
    if not ret:
        break  # Stop if stream ends
    # 2. Process the frame (e.g., convert to gray)
    processed = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # 3. Show it
    cv2.imshow("Webcam Feed", processed)
    # 4. Quit if 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
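
Because every frame is an ordinary image, any operation from earlier drops straight into that loop. As a sketch (the thresholds are just example values), here is the same loop turned into a live edge view:

Python
import cv2
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # Any per-image operation works per-frame; here, Canny edges on a live feed
    edges = cv2.Canny(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), 100, 200)
    cv2.imshow("Live Edges", edges)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()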

Let’s combine everything into a useful script: detecting shapes.

  1. Input: Image
  2. Pre-process: Grayscale -> Blur (remove noise)
  3. Process: Canny Edges -> Find Contours
  4. Post-process: Draw boxes
  5. Output: Image with detections
Python
import cv2
# 1. Load
img = cv2.imread("shapes.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# 2. Blur to remove noise
blur = cv2.GaussianBlur(gray, (5, 5), 0)
# 3. Detect Edges
edges = cv2.Canny(blur, 50, 150)
# 4. Find Contours (continuous curves of points)
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# 5. Draw results
output = img.copy()
for cnt in contours:
    # Get the bounding box of the contour
    x, y, w, h = cv2.boundingRect(cnt)
    # Draw it
    cv2.rectangle(output, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imshow("Detections", output)
cv2.waitKey(0)
cv2.destroyAllWindows()
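
On real photos, Canny tends to produce lots of tiny contours from texture and grain, which turn into spurious boxes. A common refinement (a sketch; the 500-pixel area threshold is an arbitrary value you would tune) is to skip small contours in the drawing loop above:

Python
for cnt in contours:
    if cv2.contourArea(cnt) < 500:   # ignore tiny noise blobs; tune this threshold
        continue
    x, y, w, h = cv2.boundingRect(cnt)
    cv2.rectangle(output, (x, y), (x + w, y + h), (0, 255, 0), 2)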

For a quick win, OpenCV comes with pre-trained models. Haar Cascades are an older technique (from 2001), but they are fast and run well on CPUs.

Python
# Load the pre-trained model XML
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
img = cv2.imread("people.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# The Magic Line
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)
# Draw faces
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
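
The snippet above only draws rectangles on img in memory. To inspect the result, reuse the same display pattern from the start of the guide (the window title is arbitrary):

Python
cv2.imshow("Faces", img)
cv2.waitKey(0)
cv2.destroyAllWindows()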

Even pros make these mistakes. Run through this checklist if your code acts weird:

  1. Path Issues: cv2.imread() doesn’t error if the file is missing; it just returns None. Always check if img is None:.
  2. Color Confusion: Remember BGR. If your colors look swapped, they probably are.
  3. Data Types: Images are uint8 (0-255). If you do math and end up with floats (0.0-1.0), imshow might display a black square unless you convert back or normalize (a short sketch follows this checklist).
  4. Infinite Loops: If your video window freezes, you probably forgot cv2.waitKey(1). This function is what actually tells the OS to draw the window pixels.
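
For the data-type pitfall in particular, here is a minimal sketch (assuming your float values ended up in the 0.0-1.0 range) of converting back to a displayable, saveable uint8 image:

Python
import cv2
import numpy as np
img = cv2.imread("image.jpg")
float_img = img.astype(np.float32) / 255.0             # e.g. normalized for a model
# Scale back to 0-255 and convert to uint8 before saving or further OpenCV calls
restored = (float_img * 255).clip(0, 255).astype(np.uint8)
cv2.imwrite("restored.jpg", restored)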