Quick Start
Ready to write some code? This guide will take you from “zero” to running your first Computer Vision pipeline in Python.
We won’t just dump code on you; we’ll explain why things work the way they do. By the end, you’ll be able to read images, manipulate them as matrices, and even detect faces in real-time.
Your First OpenCV Script
Let’s start with the “Hello World” of Computer Vision: loading an image, showing it, and saving a copy.
```python
import cv2

# 1. Read an image from disk
# Returns a NumPy array, or None if the file doesn't exist
img = cv2.imread("image.jpg")

if img is None:
    raise FileNotFoundError("Could not load image. Double-check your path!")

# 2. Show the image in a window
# "Original image" is the window title
cv2.imshow("Original image", img)

# 3. Wait for a key press
# 0 means "wait forever"; 1000 waits up to 1 second (or until a key is pressed)
cv2.waitKey(0)

# 4. Clean up
cv2.destroyAllWindows()

# 5. Save the result
cv2.imwrite("image_copy.jpg", img)
```

This simple script introduces the “Holy Trinity” of OpenCV I/O: `imread`, `imshow`, and `imwrite`. You’ll use these in almost every project.
Core Concepts (The “Gotchas”)
Before we go further, there are three quirks you need to internalize to avoid hours of debugging later.
1. OpenCV is BGR, not RGB
This is a historical artifact. When OpenCV was created, BGR (Blue-Green-Red) was a popular memory layout for cameras. Today, almost everyone (browsers, matplotlib, your phone) uses RGB.
If your images look “smurfish” (blue people, orange skies), you probably forgot to convert:
```python
img_bgr = cv2.imread("image.jpg")
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)  # Fix colors for Matplotlib/Web
```
2. Images are just NumPy Arrays
There is no special “Image class” in OpenCV's Python bindings: `cv2.imread` returns a plain `numpy.ndarray`. This is fantastic news because it means you can use the entire NumPy ecosystem on your images.

```python
import cv2
import numpy as np

img = cv2.imread("image.jpg")
print(type(img))   # <class 'numpy.ndarray'>
print(img.shape)   # (height, width, channels) -> e.g. (1080, 1920, 3)
print(img.dtype)   # uint8 (unsigned 8-bit integer)
```
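As a small illustration of plain NumPy on an image, this sketch (on a synthetic image, so no file is needed) computes the average brightness of each channel and counts "bright" pixels with a boolean mask. The threshold of 200 is an arbitrary assumption.

```python
import numpy as np

# Synthetic 100x200 BGR image: black, with a white 50x50 square
img = np.zeros((100, 200, 3), dtype=np.uint8)
img[25:75, 75:125] = 255

# Per-channel mean over all rows and columns (axis 2 is the channel axis)
channel_means = img.mean(axis=(0, 1))
print(channel_means)  # B, G, R means

# Boolean mask: pixels whose every channel exceeds an (arbitrary) threshold
bright = (img > 200).all(axis=2)
print(bright.sum())  # 2500 bright pixels (the 50x50 square)
```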
3. (Y, X) not (X, Y)
Because images are matrices, we access them by `[Row, Column]`. In image terms, the row is the Y-axis (height) and the column is the X-axis (width).

```python
# Access the pixel at y=10, x=20
pixel = img[10, 20]
```

If you try `img[x, y]`, you will either get an `IndexError` (when `x` is larger than the image height) or silently read the wrong pixel. Always think “Row, Column”.
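A non-square image makes the row/column order easy to see. This sketch uses a synthetic 100x300 image (an assumption, so no file is needed): it writes a pixel with `[y, x]` and shows that swapping the indices asks for a row that doesn't exist.

```python
import numpy as np

# A non-square synthetic image: 100 rows (height) x 300 columns (width)
img = np.zeros((100, 300, 3), dtype=np.uint8)
height, width = img.shape[:2]

# Mark the pixel at x=250, y=40 in red (BGR order)
x, y = 250, 40
img[y, x] = (0, 0, 255)  # [row, column] == [y, x]

print(height, width)  # 100 300
print(img[y, x])      # the red pixel

# Swapping the order asks for row 250, which doesn't exist in a 100-row image
swapped_failed = False
try:
    img[x, y]
except IndexError as exc:
    swapped_failed = True
    print("IndexError:", exc)
```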
Basic Image Operations
Since images are just arrays, “editing” an image is really just manipulating an array.
1. Grayscale Conversion
Reducing information is key. Color often adds data (three channels instead of one) without adding value, e.g., for detecting edges.
```python
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
```

2. Resizing
Smaller images process faster. A 4K image is often overkill for detection; resizing to something like 640x480 is a common choice.
```python
# Fixed size (width, height) <- Note: resize uses (width, height) order!
# This ignores the aspect ratio, so images that aren't 4:3 get stretched
resized = cv2.resize(img, (640, 480))

# Relative scaling
half_size = cv2.resize(img, None, fx=0.5, fy=0.5)
```

3. Cropping (Slicing)
No special function needed. Just slice the NumPy array!
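One consequence of cropping by slicing: NumPy slicing returns a view that shares memory with the original image, so writing into the crop also changes `img`. This sketch, on a synthetic image, contrasts a view with an explicit `.copy()`.

```python
import numpy as np

img = np.zeros((400, 600, 3), dtype=np.uint8)  # synthetic black image

crop_view = img[100:300, 200:500]         # a view into img
crop_copy = img[100:300, 200:500].copy()  # an independent copy

crop_view[:] = 255  # paint the view white...
print(img[150, 250])    # ...and the original changes too: [255 255 255]
print(crop_copy.max())  # 0: the copy is untouched
print(crop_view.shape)  # (200, 300, 3)
```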
```python
# Crop [StartRow:EndRow, StartCol:EndCol]
# AKA [y1:y2, x1:x2]
crop = img[100:300, 200:500]
```

4. Drawing
Annotate your results. This helps with debugging and with showing the user what the AI saw.
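```python
# Draw a green rectangle (BGR: 0, 255, 0) from top-left (50, 100) to bottom-right (250, 250), 2 px thick
# Note: drawing functions modify img in place
cv2.rectangle(img, (50, 100), (250, 250), (0, 255, 0), 2)

# Write text; (50, 420) is the bottom-left corner of the text
cv2.putText(img, "Target Detected", (50, 420), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2)
```

Filters and Features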
Now let’s do some actual computer vision.
Blurring (Gaussian)
Why blur? Cameras produce “noise” (grain). Blurring smooths out that noise so algorithms don’t mistake grain for edges.
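```python
# Kernel size (15, 15) determines how strong the blur is; both values must be positive and odd
# A sigma of 0 tells OpenCV to compute it from the kernel size
blur = cv2.GaussianBlur(img, (15, 15), 0)
```

Edge Detection (Canny)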
The Canny algorithm finds where pixel intensity changes rapidly (gradients). It’s basically a line drawing generator.
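```python
# Hysteresis thresholds: gradients above 200 are strong edges, below 100 are discarded,
# and those in between are kept only if connected to a strong edge
edges = cv2.Canny(gray, 100, 200)
```

Working with Video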
Video is just a sequence of images (frames) shown fast. If you can process an image, you can process video.
Webcam Capture loop
```python
import cv2

cap = cv2.VideoCapture(0)  # 0 is usually your default webcam

if not cap.isOpened():
    raise RuntimeError("Could not open webcam.")

while True:
    # 1. Grab a frame
    ret, frame = cap.read()
    if not ret:
        break  # Stop if the stream ends

    # 2. Process the frame (e.g., convert to gray)
    processed = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # 3. Show it
    cv2.imshow("Webcam Feed", processed)

    # 4. Quit if 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

Putting it Together: A Mini-Pipeline
Let’s combine everything into a useful script: detecting shapes.
- Input: Image
- Pre-process: Grayscale -> Blur (remove noise)
- Process: Canny Edges -> Find Contours
- Post-process: Draw boxes
- Output: Image with detections
```python
import cv2

# 1. Load
img = cv2.imread("shapes.png")
if img is None:
    raise FileNotFoundError("Could not load shapes.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 2. Blur to remove noise
blur = cv2.GaussianBlur(gray, (5, 5), 0)

# 3. Detect edges
edges = cv2.Canny(blur, 50, 150)

# 4. Find contours (continuous curves of points)
# OpenCV 4.x returns (contours, hierarchy); OpenCV 3.x returned three values
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# 5. Draw results
output = img.copy()
for cnt in contours:
    # Get the bounding box of the contour
    x, y, w, h = cv2.boundingRect(cnt)
    # Draw it
    cv2.rectangle(output, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow("Detections", output)
cv2.waitKey(0)
cv2.destroyAllWindows()
```

Face Detection (Haar Cascades)
For a quick win, OpenCV comes with pre-trained models. Haar Cascades are an older technique (from 2001), but they are fast and run well on CPUs.
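```python
import cv2

# Load the pre-trained model XML
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

img = cv2.imread("people.jpg")
if img is None:
    raise FileNotFoundError("Could not load people.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# The magic line: scaleFactor=1.1 shrinks the image by about 10% per search scale;
# minNeighbors=4 keeps only candidates backed by at least 4 overlapping hits
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)

# Draw faces
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)

cv2.imshow("Faces", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
```

Common Pitfalls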
Even pros make these mistakes. Run through this checklist if your code acts weird:
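- Path Issues: `cv2.imread()` doesn’t raise an error if the file is missing; it just returns `None`. Always check `if img is None:`.
- Color Confusion: Remember BGR. If your colors look swapped, they probably are.
- Data Types: Images are usually `uint8` (0-255). `imshow` assumes float images are in the 0.0-1.0 range (it multiplies them by 255), so a float image still holding 0-255 values shows up almost entirely white. Either divide by 255 or clip and convert back with `astype(np.uint8)`. Also note that NumPy arithmetic on `uint8` wraps around (250 + 10 becomes 4), while `cv2.add` saturates at 255.
- Frozen Windows: If your window freezes or never appears, you probably forgot `cv2.waitKey()`. Besides waiting for keys, it runs HighGUI's event loop, which is what actually redraws the window.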