In this section, key processes that enable robots to understand their visual surroundings are explained, including object detection, which identifies objects' positions, segmentation that categorizes image areas, and recognition that classifies these objects. Various methods such as CNNs and tools like Mask R-CNN are discussed in detail.