OpenCV: Guide to Computer Vision and Image Processing

Computer vision is a field of study that deals with how computers can interpret and understand images and video. It involves the use of various techniques and algorithms to extract meaningful information from visual data.

OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine learning software library. It was originally developed by Intel, was later supported by Willow Garage and then Itseez, and is now maintained by the non-profit OpenCV.org foundation. It has a large community of developers and is used in robotics, automotive, medical imaging, and many other fields.

OpenCV is written in C++ and has bindings for Python, Java, and MATLAB. It can run on Windows, Linux, macOS, Android, and iOS.

OpenCV provides tools for image and video processing, object detection, tracking, and recognition. It also provides tools for machine learning, such as support vector machines (SVM), k-nearest neighbors (k-NN), and neural networks.

In this article, we will explore some of the basic functionality of OpenCV using Python.

Installation of OpenCV

To install OpenCV, you can use the following command in the terminal:

pip install opencv-python

To install the contrib module, which contains additional algorithms and functions, you can use the following command:

pip install opencv-contrib-python

Basic Image Processing

In this section, we’ll cover some basic image operations using OpenCV: loading, displaying, drawing on, and saving images.

Loading an Image

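To load an image, we can use the cv2.imread() function. This function takes the path to the image file as an argument and returns the image as a NumPy array, with color channels in BGR order. If the file cannot be read, it returns None instead of raising an error.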

import cv2
import numpy as np

img = cv2.imread('image.jpg')

Displaying an Image

To display an image, we can use the cv2.imshow() function. This function takes the window name and the NumPy array of the image as arguments.

cv2.imshow('image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

The cv2.waitKey() function waits up to the specified number of milliseconds for a key event. If 0 is passed as an argument, it waits indefinitely. The cv2.destroyAllWindows() function closes all open windows.

Drawing on an Image

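To draw on an image, we can use the cv2.line(), cv2.rectangle(), cv2.circle(), and cv2.putText() functions. Each takes the NumPy array of the image, the coordinates of the shape as (x, y) points, the color as a (blue, green, red) tuple, the thickness, and other optional arguments. The functions draw directly on the array passed in. For closed shapes such as rectangles and circles, a thickness of -1 fills the shape.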

cv2.line(img, (0,0), (img.shape[1],img.shape[0]), (255,0,0), 5)
cv2.rectangle(img, (384,0), (510,128), (0,255,0), 3)
cv2.circle(img, (447,63), 63, (0,0,255), -1)
cv2.putText(img, 'OpenCV', (10,500), cv2.FONT_HERSHEY_SIMPLEX, 4, (255,255,255), 2, cv2.LINE_AA)

Saving an Image

To save an image, we can use the cv2.imwrite() function. This function takes the path to the output file and the NumPy array of the image as arguments, and chooses the file format from the file extension.

cv2.imwrite('output.jpg', img)

Basic Video Processing

In this section, we’ll cover some basic video processing operations using OpenCV, including reading and displaying videos, capturing frames, and writing videos.

Reading and Displaying Videos

The first step in most video processing tasks is to read a video from a file. OpenCV provides the cv2.VideoCapture class for this purpose; its constructor takes the path to the video file as input and returns a VideoCapture object.

Here’s an example code snippet that reads a video and displays it using OpenCV:

import cv2

# Create a VideoCapture object and read from input file
cap = cv2.VideoCapture('video.mp4')

# Check if the video opened successfully
if not cap.isOpened():
    print("Error opening video file")

# Read until video is completed
while cap.isOpened():
    # Capture frame-by-frame
    ret, frame = cap.read()
    if ret:
        # Display the resulting frame
        cv2.imshow('Frame', frame)

        # Press 'q' to exit
        if cv2.waitKey(25) & 0xFF == ord('q'):
            break
    else:
        break

# Release the video capture object and close all windows
cap.release()
cv2.destroyAllWindows()

The cap.read() method reads the next frame from the video file and returns a success flag along with the frame, which is displayed using cv2.imshow(). The cv2.waitKey(25) call pauses for up to 25 milliseconds between frames, which sets the approximate playback speed, and lets the loop exit when the user presses the ‘q’ key.
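
Two details of the key-handling line can be sketched in plain Python. The helper names is_quit_key and delay_from_fps below are hypothetical, and the claim that some platforms set extra high bits in the key code is an assumption based on commonly reported behavior. The `& 0xFF` mask keeps only the low 8 bits so the comparison with ord('q') still works in that case, and the delay can be derived from the frame rate instead of being fixed at 25 ms.

```python
def is_quit_key(key_code, quit_char="q"):
    """Hypothetical helper: compare only the low 8 bits of a key code."""
    return (key_code & 0xFF) == ord(quit_char)


def delay_from_fps(fps, fallback_ms=25):
    """Hypothetical helper: per-frame wait in milliseconds for a given frame rate."""
    if fps <= 0:          # some sources report 0 when the rate is unknown
        return fallback_ms
    return max(1, round(1000 / fps))


print(is_quit_key(ord("q")))             # plain key code
print(is_quit_key(0x100000 | ord("q")))  # same key with extra high bits set
print(is_quit_key(-1))                   # -1 is what waitKey returns on timeout
print(delay_from_fps(30))                # wait time for a 30 fps source
```

In the loop above, the delay could be computed once as delay_from_fps(cap.get(cv2.CAP_PROP_FPS)) and passed to cv2.waitKey(). Playback speed stays approximate, because the time spent decoding and displaying each frame is not accounted for.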

Capturing Frames

Capturing frames from a live video stream is a common operation in video processing, and the same cv2.VideoCapture class handles it. Instead of a file path, pass the index of the capture device (0 is usually the default camera).

import cv2

# Create a VideoCapture object and read from the default camera
cap = cv2.VideoCapture(0)

# Check if the camera opened successfully
if not cap.isOpened():
    print("Error opening camera")

# Loop until the user quits or a frame cannot be read
while True:
    # Capture frame-by-frame
    ret, frame = cap.read()
    if ret:
        # Display the resulting frame
        cv2.imshow('Frame', frame)

        # Press 'q' to exit
        if cv2.waitKey(25) & 0xFF == ord('q'):
            break
    else:
        break

# Release the video capture object and close all windows
cap.release()
cv2.destroyAllWindows()

In this example, we’re capturing frames from the default camera (device index 0) using the cv2.VideoCapture() function. The cap.read() function is used to capture the next frame from the camera, and the resulting frame is displayed using the cv2.imshow() function.

Writing Videos

Writing videos is another common operation, and OpenCV provides the cv2.VideoWriter class for this purpose. Its constructor takes the output file path, the FourCC codec code, the frames per second, and the frame size as (width, height), and returns a VideoWriter object.

import cv2

# Open the video capture object
cap = cv2.VideoCapture("input_video.mp4")

# Check if the video opened successfully before querying its properties
if not cap.isOpened():
    raise SystemExit("Error opening video file")

# Define the codec and create VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*"XVID")
fps = cap.get(cv2.CAP_PROP_FPS)
frame_size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
out = cv2.VideoWriter("output_video.avi", fourcc, fps, frame_size)

# Loop through the frames of the video and write to output file
while cap.isOpened():
    # Read a frame from the video
    ret, frame = cap.read()
    if ret:
        # Write the frame to the output video file
        out.write(frame)

        # Display the resulting frame
        cv2.imshow("Output Frame", frame)

        # Exit on 'q' key press
        if cv2.waitKey(25) & 0xFF == ord("q"):
            break
    else:
        break

# Release the video capture and writer objects, and close all windows
cap.release()
out.release()
cv2.destroyAllWindows()

In this code, we first open the video capture object using cv2.VideoCapture(). We then define the codec and create a VideoWriter object using cv2.VideoWriter(). The codec used here is XVID, which is a common codec for writing AVI files.

We then loop through the frames of the video using a while loop. For each frame, we read the frame using cap.read(), write the frame to the output video file using out.write(), and display the frame using cv2.imshow(). We also exit the loop if the user presses the ‘q’ key.

Finally, we release the video capture and writer objects using cap.release() and out.release(), respectively, and close all windows using cv2.destroyAllWindows().

Basic Object Detection

Object detection is the process of locating and classifying objects within images or video frames. OpenCV provides several pre-trained object detection models that can be used for a variety of tasks. In this section, we’ll walk through the basics of object detection with OpenCV, using a pre-trained classifier to detect faces in an image.

Loading the Classifier

First, we need to load a pre-trained classifier for detecting faces. OpenCV ships several pre-trained Haar feature-based cascade classifiers for face detection as XML files. In the opencv-python package, cv2.data.haarcascades holds the path to the directory containing these files, and a classifier can be loaded by passing a file path to cv2.CascadeClassifier(). Here’s an example:

import cv2

# Load the classifier
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

Detecting Objects

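Once we’ve loaded the classifier, we can use it to detect objects in an image. The detectMultiScale() method of cv2.CascadeClassifier is used for this purpose; it takes an input image and returns a bounding box (x, y, width, height) for each detected object. The scaleFactor parameter sets how much the image is shrunk at each step of the multi-scale search, and minNeighbors sets how many overlapping candidate detections are needed to keep a result. Here’s an example: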

import cv2

# Load the classifier
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# Load the image
img = cv2.imread("input.jpg")

# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw bounding boxes around detected faces
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)

# Display the image
cv2.imshow("Output", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

In this example, we load an image using the cv2.imread() function and convert it to grayscale using the cv2.cvtColor() function. We then detect faces in the image using the face_cascade.detectMultiScale() function, which returns a list of bounding boxes for detected faces. Finally, we draw bounding boxes around the detected faces using the cv2.rectangle() function and display the image using the cv2.imshow() function.

YouTube / Video Tutorials

Here are some YouTube video tutorials for OpenCV, each with a brief description:

“LEARN OPENCV in 3 HOURS with Python | Including 3xProjects | Computer Vision” by Murtaza's Workshop: This tutorial teaches the basics of OpenCV with Python in 3 hours. Its step-by-step approach and three hands-on projects make the core concepts of OpenCV easy to follow.
“Python OpenCV Tutorial – Full Course for Beginners” by freeCodeCamp.org: This tutorial series covers the basics of image and video processing with OpenCV using Python. It starts with the basics of image processing and gradually moves towards more advanced topics like object tracking and face recognition.
“OpenCV with Python for Image and Video Analysis” is a comprehensive video tutorial series by sentdex on YouTube that covers image and video analysis using OpenCV with Python. The series covers a wide range of topics, from the basics of image processing to advanced techniques such as object detection and face recognition.
“OpenCV Python Tutorial | Creating Face Detection System And Motion Detector Using OpenCV | Edureka” is a great resource for anyone interested in learning how to use OpenCV with Python to create real-world projects. The tutorial covers two interesting and practical projects that demonstrate the power and versatility of OpenCV for computer vision applications.

Reference Documentation and Forums

OpenCV provides comprehensive reference documentation on its official website, covering the functions, classes, and modules available in the library. The documentation is well-structured and includes explanations, examples, and code snippets.

The OpenCV documentation is available at: https://docs.opencv.org/master/

In addition to the official documentation, there are several forums and communities where developers can seek help and advice on OpenCV-related issues. Some of the popular forums and communities for OpenCV include:

OpenCV Forum: The official forum for OpenCV, where developers can ask questions, share knowledge, and discuss OpenCV-related topics. The forum is moderated by the OpenCV development team and is a great resource for getting help and advice on OpenCV.
Stack Overflow on OpenCV: A popular Q&A website where developers can ask and answer technical questions related to OpenCV and other programming languages and frameworks. Stack Overflow has a large community of developers who are knowledgeable about OpenCV and can provide help and advice on a wide range of topics.
Reddit OpenCV: A subreddit dedicated to OpenCV, where developers can share news, tutorials, and projects related to OpenCV. The subreddit has a large and active community of developers who are passionate about OpenCV and can provide help and advice on OpenCV-related issues.

Overall, the OpenCV documentation and forums provide a wealth of knowledge and resources for developers who are using or learning OpenCV. Whether you are a beginner or an experienced developer, these resources can help you get the most out of OpenCV and develop high-quality computer vision applications.

Reference Technical Books

There are several technical books available that provide a comprehensive guide to using OpenCV for computer vision applications. Some of the popular technical books on OpenCV include:

“Learning OpenCV 4 Computer Vision with Python 3” by Joseph Howse and Joe Minichino: This book provides a hands-on guide to using OpenCV for computer vision applications, with a focus on using Python 3. The book covers a wide range of topics, including image processing, feature detection, and object tracking, and provides code examples and practical tips for using OpenCV in real-world projects.
“Mastering OpenCV 4 with Python” by Alberto Fernández Villán: This book provides an in-depth guide to using OpenCV 4 with Python for computer vision applications. The book covers advanced topics such as deep learning, object detection, and facial recognition, and provides practical examples and code snippets for each topic.
“OpenCV 4 with Python Blueprints” by Menua Gevorgyan, Arsen Mamikonyan, and Michael Beyeler: This book provides a project-based approach to learning OpenCV with Python, with a focus on practical examples and applications. The book covers a wide range of topics, including face recognition, image segmentation, and object detection, and provides step-by-step instructions and code snippets for each project.
“OpenCV Computer Vision with Python” by Joseph Howse: This book provides an introduction to using OpenCV for computer vision applications, with a focus on using Python. The book covers a wide range of topics, including image processing, feature detection, and object recognition, and provides code examples and practical tips for using OpenCV in real-world projects.

Overall, these books provide a comprehensive guide to using OpenCV for computer vision applications and can be a valuable resource for developers who are learning or working with OpenCV.

Conclusion

In conclusion, OpenCV is a powerful open-source library for computer vision and image processing. From basic image manipulation to object detection and tracking, it offers tools for a wide range of applications. The library is backed by a large community of developers and researchers, and plenty of resources are available for learning and using it, including documentation, forums, and technical books.

About The Author

KS Rao

Srinivasa Rao Koyyalamudi (KS) is a seasoned professional with more than 26 years of diverse expertise spanning multiple business functions, industries, and global geographies. With a proven track record of assembling and nurturing high-performing teams from the ground up, KS has consistently driven success throughout his career journey.

Bringing a wealth of knowledge and expertise to the table, KS holds a graduate degree in Chemical Engineering, a postgraduate degree in Chemical Plant Design, an Executive MBA in Global Business Management, and PMP certification. This diverse educational foundation underscores his ability to navigate complex challenges and drive innovative solutions.

Beyond his impressive credentials, KS is an avid advocate of continuous learning, driven by a fervent passion for AI and Data Science. He remains at the forefront of industry advancements, continuously experimenting with innovative techniques and tools.
