
Quick Start

Ready to write some code? This guide will take you from “zero” to running your first Computer Vision pipeline in Python.

We won’t just dump code on you; we’ll explain why things work the way they do. By the end, you’ll be able to read images, manipulate them as matrices, and even detect faces in real-time.

Let’s start with the “Hello World” of Computer Vision: loading an image, showing it, and saving a copy.

Python
import cv2
# 1. Read an image from disk
# Returns a NumPy array, or None if the file doesn't exist
img = cv2.imread("image.jpg")
if img is None:
    raise FileNotFoundError("Could not load image. Double-check your path!")
# 2. Show the image in a window
# "Original image" is the window title
cv2.imshow("Original image", img)
# 3. Wait indefinitely for a key press
# 0 means "wait forever". If you pass 1000, it waits 1 second.
cv2.waitKey(0)
# 4. Clean up
cv2.destroyAllWindows()
# 5. Save the result
cv2.imwrite("image_copy.jpg", img)

This simple script introduces the “Holy Trinity” of OpenCV I/O: imread, imshow, and imwrite. You’ll use these in almost every project.

Before we go further, there are three quirks you need to internalize to avoid hours of debugging later.

  1. OpenCV is BGR, not RGB

    This is a historical artifact. When OpenCV was created, BGR (Blue-Green-Red) was a popular memory layout for cameras. Today, almost everyone (browsers, matplotlib, your phone) uses RGB.

    If your images look “smurfish” (blue people, orange skies), you probably forgot to convert:

    Python
    img_bgr = cv2.imread("image.jpg")
    img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB) # Fix colors for Matplotlib/Web
  2. Images are just NumPy Arrays

    There is no special “Image Class” in OpenCV Python. There is only numpy.ndarray. This is fantastic news because it means you can use the entire NumPy ecosystem of tools on your images.

    Python
    import cv2
    import numpy as np
    img = cv2.imread("image.jpg")
    print(type(img)) # <class 'numpy.ndarray'>
    print(img.shape) # (height, width, channels) -> e.g. (1080, 1920, 3)
    print(img.dtype) # uint8 (Unsigned Integer 8-bit)
  3. (Y, X) not (X, Y)

    Because they are matrices, we access them by [Row, Column]. In image terms, Row is the Y-axis (height) and Column is the X-axis (width).

    Python
    # Accessing the pixel at y=10, x=20
    pixel = img[10, 20]

    If you index with img[x, y], you will read the wrong pixel or get an IndexError. Always think “Row, Column”. (The sketch after this list puts quirks 2 and 3 together.)
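
Here is a short sketch that combines quirks 2 and 3 (it assumes the same "image.jpg" as above): index by row and column, and treat the pixels as an ordinary NumPy array.

Python
import cv2
import numpy as np
img = cv2.imread("image.jpg")
h, w = img.shape[:2]                    # shape is (rows, cols) = (height, width)
b, g, r = img[10, 20]                   # pixel at y=10, x=20 (a BGR triple)
inverted = 255 - img                    # plain NumPy: photographic negative
darker = (img * 0.5).astype(np.uint8)   # halve brightness, convert back to uint8
img[0:50, :] = (0, 0, 255)              # paint the top 50 rows red (BGR order!)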

Since images are just arrays, “editing” an image is really just “slicing” an array.

Converting to grayscale is about information reduction. Color often adds noise without adding value (e.g., for detecting edges).

Python
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

Smaller images process faster. Often, a 4K image is overkill for detection; resizing to 640x480 is standard.

Python
# Fixed size: note that resize takes (Width, Height), not (Height, Width)!
resized = cv2.resize(img, (640, 480))
# Relative scaling
half_size = cv2.resize(img, None, fx=0.5, fy=0.5)
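
If you want to shrink an image without distorting it, one common pattern (a sketch; the 640-pixel target width is just an assumption, reusing img from above) is to derive the height from the image's own shape so the aspect ratio stays intact:

Python
target_w = 640
scale = target_w / img.shape[1]                          # shape[1] is the width
resized = cv2.resize(img, (target_w, int(img.shape[0] * scale)))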

Cropping needs no special function. Just slice the NumPy array!

Python
# Crop [StartRow:EndRow, StartCol:EndCol]
# AKA [y1:y2, x1:x2]
crop = img[100:300, 200:500]

Drawing lets you annotate your results. It is helpful for debugging or showing the user what the AI saw.

Python
# Draw a Green Rectangle (BGR: 0, 255, 0)
cv2.rectangle(img, (50, 100), (250, 250), (0, 255, 0), 2)
# Write Text
cv2.putText(img, "Target Detected", (50, 420), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2)

Now let’s do some actual computer vision.

Why blur? Cameras produce “noise” (grain). Blurring smooths out that noise so algorithms don’t mistake grain for edges.

Python
# Kernel size (15, 15) determines how strong the blur is
blur = cv2.GaussianBlur(img, (15, 15), 0)

The Canny algorithm finds where pixel intensity changes rapidly (gradients). It’s basically a line drawing generator.

Python
# Lower/Upper thresholds determine how picky the algorithm is
edges = cv2.Canny(gray, 100, 200)
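
Picking thresholds is mostly trial and error. One quick way to compare candidates (a sketch, reusing the gray image from above; the exact numbers are just examples) is to run Canny twice and view the results side by side:

Python
import numpy as np
loose = cv2.Canny(gray, 50, 150)     # lower thresholds -> more edges (and more noise)
strict = cv2.Canny(gray, 150, 300)   # higher thresholds -> fewer, stronger edges
cv2.imshow("Loose vs. strict", np.hstack([loose, strict]))
cv2.waitKey(0)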

Video is just a sequence of images (frames) shown fast. If you can process an image, you can process video.

Python
cap = cv2.VideoCapture(0) # 0 is usually your default webcam
if not cap.isOpened():
    raise RuntimeError("Could not open webcam.")
while True:
    # 1. Grab a frame
    ret, frame = cap.read()
    if not ret:
        break  # Stop if stream ends
    # 2. Process the frame (e.g., convert to gray)
    processed = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # 3. Show it
    cv2.imshow("Webcam Feed", processed)
    # 4. Quit if 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
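
Because every frame is an ordinary image, any operation from earlier drops straight into that loop. As a sketch (the thresholds are just example values), here is the same loop turned into a live edge view:

Python
import cv2
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # Any per-image operation works per-frame; here, Canny edges on a live feed
    edges = cv2.Canny(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), 100, 200)
    cv2.imshow("Live Edges", edges)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()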

Let’s combine everything into a useful script: detecting shapes.

  1. Input: Image
  2. Pre-process: Grayscale -> Blur (remove noise)
  3. Process: Canny Edges -> Find Contours
  4. Post-process: Draw boxes
  5. Output: Image with detections
Python
import cv2
# 1. Load
img = cv2.imread("shapes.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# 2. Blur to remove noise
blur = cv2.GaussianBlur(gray, (5, 5), 0)
# 3. Detect Edges
edges = cv2.Canny(blur, 50, 150)
# 4. Find Contours (continuous curves of points)
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# 5. Draw results
output = img.copy()
for cnt in contours:
    # Get the bounding box of the contour
    x, y, w, h = cv2.boundingRect(cnt)
    # Draw it
    cv2.rectangle(output, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imshow("Detections", output)
cv2.waitKey(0)
cv2.destroyAllWindows()
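
On real photos, Canny tends to produce lots of tiny contours from texture and grain, which turn into spurious boxes. A common refinement (a sketch; the 500-pixel area threshold is an arbitrary value you would tune) is to skip small contours in the drawing loop above:

Python
for cnt in contours:
    if cv2.contourArea(cnt) < 500:   # ignore tiny noise blobs; tune this threshold
        continue
    x, y, w, h = cv2.boundingRect(cnt)
    cv2.rectangle(output, (x, y), (x + w, y + h), (0, 255, 0), 2)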

For a quick win, OpenCV comes with pre-trained models. Haar Cascades are an older technique (from 2001), but they are fast and run well on CPUs.

Python
# Load the pre-trained model XML
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
img = cv2.imread("people.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# The Magic Line
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)
# Draw faces
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
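
The snippet above only draws rectangles on img in memory. To inspect the result, reuse the same display pattern from the start of the guide (the window title is arbitrary):

Python
cv2.imshow("Faces", img)
cv2.waitKey(0)
cv2.destroyAllWindows()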

Even pros make these mistakes. Run through this checklist if your code acts weird:

  1. Path Issues: cv2.imread() doesn’t error if the file is missing; it just returns None. Always check if img is None:.
  2. Color Confusion: Remember BGR. If your colors look swapped, they probably are.
  3. Data Types: Images are uint8 (0-255). If you do math and end up with floats (0.0-1.0), imshow might display a black square unless you convert back or normalize (a short sketch follows this checklist).
  4. Infinite Loops: If your video window freezes, you probably forgot cv2.waitKey(1). This function is what actually tells the OS to draw the window pixels.
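
For the data-type pitfall in particular, here is a minimal sketch (assuming your float values ended up in the 0.0-1.0 range) of converting back to a displayable, saveable uint8 image:

Python
import cv2
import numpy as np
img = cv2.imread("image.jpg")
float_img = img.astype(np.float32) / 255.0             # e.g. normalized for a model
# Scale back to 0-255 and convert to uint8 before saving or further OpenCV calls
restored = (float_img * 255).clip(0, 255).astype(np.uint8)
cv2.imwrite("restored.jpg", restored)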