SLAM Module Hardware Components:
The Simultaneous Localization and Mapping (SLAM) algorithm is dependent on data; gathering copious data is paramount to generating the most accurate map possible. For a robot to generate a map of its surroundings, it must be able to gather distance readings at all angles in as little time as possible, so we designed our hardware module around that requirement. To give the module a full 360° view of its surroundings, we 3D-printed a hollow cube with mounting holes that attaches to a servo motor. On the cube, we mounted 4 Time-of-Flight laser sensors that connect directly to our robot's microcontroller: an Arduino Uno. Because the sensors are wired, the servo we purchased was designed to spin only 180° before reversing direction.
SLAM Module Mounted on Bot
The biggest issue we had when designing this system was the hardware limitations of the robot itself. Unlike million-dollar projects that implement SLAM, such as Google's self-driving car prototype, our robot's main processors were a Raspberry Pi and an Arduino communicating over serial. The sensors themselves were also relatively slow, taking roughly 100 ms to fully capture the readings from all 4 Time-of-Flight sensors at a single angle. Ultimately, however, the module proved sufficient for gathering enough data for SLAM.
Designing a SLAM Algorithm for our Robot:
To run SLAM, we need a reliable method of extracting and processing the points surrounding the robot. The first component of our SLAM algorithm creates a localized map of the surroundings from the SLAM module's feedback. For data extraction, the SLAM module returns 6 values: 1 corresponding to the current timestamp, 4 corresponding to the readings from each of the 4 respective lasers, and 1 corresponding to the current angle of the servo motor. The robot is also equipped with an IMU that feeds it information about its global angle in the map with respect to its starting position. With the angle of each sensor with respect to the global map and a distance reading, the robot can convert each (distance, angle) reading, a polar coordinate, into Cartesian coordinates (x, y) through simple trigonometric functions. After the servo spins a full cycle, the SLAM module has generated enough points on its local map to translate them onto the global map. The local map is then added to the robot's global map by adding the distance the robot has traveled from its origin (x, y).
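The polar-to-Cartesian step can be sketched in a few lines. The function name and parameters below are illustrative, not our robot's actual code:

```python
import math

def polar_to_global(distance, sensor_angle_deg, robot_angle_deg, robot_x, robot_y):
    """Convert one (distance, angle) reading into global (x, y).
    sensor_angle_deg comes from the servo, robot_angle_deg from the IMU,
    and (robot_x, robot_y) is the robot's position on the global map."""
    # Combine the servo angle with the IMU's global heading
    theta = math.radians(sensor_angle_deg + robot_angle_deg)
    # Standard polar -> Cartesian conversion on the local map
    local_x = distance * math.cos(theta)
    local_y = distance * math.sin(theta)
    # Translate by the robot's position to land on the global map
    return robot_x + local_x, robot_y + local_y
```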
The global map is essentially the local map translated by the robot's position and angle. This is important to understand when going into error correction.
For simplicity, our SLAM algorithm generates lines using OpenCV's fitLine() function on clusters of a preset minimum number of points with maximum error ε. The lines act as useful approximations of walls within the robot's map that the robot should avoid. For us, detecting more complex shapes and patterns in our SLAM algorithm was unnecessary; however, in more realistic scenarios, detecting jagged edges and different types of obstacles is important.
Line fitting is a simple feature; delving into shapes is a more problem-specific and complicated task.
The method of extracting and processing data may seem to solve the SLAM problem altogether; however, it relies on a perfect set of inputs for the generated map to be accurate. In a realistic scenario, noise and error generated by the SLAM module, the IMU, and the robot itself accumulate quickly as the robot continues to build its map. Without proper error correction, the map the robot generates becomes distorted and offset, rendering it useless.
SLAM Error Correction:
Our SLAM algorithm applies two different error correction methods on the robot’s processed data. The first method we applied to our SLAM algorithm involved error correction through point alignment. Point alignment refers to aligning a previously seen set of points from one juncture to the same set of previously seen points in a future juncture. The picture below is a visual representation of point alignment.
Knowing the past frame I(x1, y1), we calculate the error in I'(x2, y2) by aligning the two frames in the global map
While the idea behind point alignment is intuitive, its implementation is not as trivial. When mapping a set of points from one position/timestamp to another, the points are not exactly identical. Because the sensors on the SLAM module return a single exact point per reading, it is almost impossible for those sensors to hit that exact point again at a later time. To compensate, we assume that points in the same area yield similar readings to each other; this is also the basis behind the line/shape fitting used during data processing. For the point alignment problem, we want to map the set of points from the robot's current juncture onto the same set of points from the robot's previous juncture. Thus, we are looking for the minimum difference between those two junctures. With this intuition in mind, we can represent this “minimum difference” mathematically and solve for it in real time.
Usually, in minimization problems where we try to find a set of values that minimizes a function, we run a gradient descent optimizer on a cost function to find its minimum. Minimizing the error in a SLAM problem is the same. We define a cost function J(Δx, Δy) that denotes the total error, or discrepancy, between two timeframes, I(x1, y1) and I'(x2, y2). Note that x1 and y1 denote vectors of numbers; together they form an array of points corresponding to the points on the map collected during that timeframe. The physical difference between I(x1, y1) and I'(x2, y2) is simply the distance the robot traveled from I to I', (Δx, Δy). However, knowing that error exists, we search for the pair of real numbers Δx and Δy that minimizes J(Δx, Δy), because the measured Δx and Δy are incorrect. We thus define the full cost function as J(Δx, Δy) = Σ (I'(x2 − Δx, y2 − Δy) − I(x1, y1))², where x and y are vectors of values; the equation is the sum of the squared differences between all overlapping points in I and I'. We square the difference simply to make differentiating the function easier; wrapping it in an absolute value would work as well, but the gradient of an absolute value is undefined at zero, which complicates the optimization.
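As a minimal sketch, the cost function can be written as follows, assuming the two arrays already hold the matched overlapping points (the matching step itself is omitted here):

```python
import numpy as np

def cost_J(delta, prev_pts, curr_pts):
    """J(dx, dy): summed squared distance between the current frame,
    shifted back by (dx, dy), and the previous frame. prev_pts and
    curr_pts are (N, 2) arrays of matched (x, y) points."""
    dx, dy = delta
    # Shift the current frame's points back by the candidate translation
    shifted = curr_pts - np.array([dx, dy])
    # Sum of squared differences over all overlapping points
    return np.sum((shifted - prev_pts) ** 2)
```

At the true translation the shifted frames coincide and J drops to zero; any other (Δx, Δy) leaves residual error.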
Now that we’ve defined the function we want to minimize, it becomes clearer how to address the error in our SLAM algorithm. The next issue is how to perform the minimization. Note that we cannot simply write down a closed-form derivative of the cost function J(Δx, Δy), because it is the sum of the differences between all overlapping points from timeframes I and I’. Calculating a symbolic derivative of such a function is impractical; however, evaluating the derivative at one point is easy by taking the limit of the cost function at that point. For machine learning/deep learning practitioners out there, it’s common knowledge that gradient descent algorithms are run during training because of how many iterations they need to converge; especially on a board like the Raspberry Pi, it is unrealistic to believe that a standard gradient descent algorithm could minimize the cost in real time. Luckily, because our problem is so simple and because we are only solving for two scalars, Δx and Δy, there exists an iterative method known as Newton’s Method that can quickly solve for where the gradient of J(Δx, Δy) is zero. To summarize, we approximate the root(s) of the gradient of J(Δx, Δy) by finding where its tangent at the current point (Δx, Δy) crosses 0, and repeat this process until we find a minimum. The method converges faster than standard gradient descent because it does not rely on an alpha learning rate and simply takes large steps toward where the cost function is at a minimum. If you’re interested in the underlying math behind this method, there are plenty of tutorials and papers discussing its derivation. Because we minimize J(Δx, Δy) against the robot’s previous set of recorded data, we can perform this minimization every time the SLAM module completes a full rotation (roughly every 8 seconds), making it efficient and practical.
Although this method of error correction is extremely useful, there are several reasons our SLAM algorithm requires a more accurate form of error correction. Firstly, our cost function is not as simple as it looks. Because of noise in the sensor data, the distribution of points is scattered, so several local minima exist that make it difficult for Newton’s Method or any gradient descent optimizer to find the true global minimum. Secondly, noise undermines the idea of point alignment itself because it offsets the positions of points that theoretically should align. While noise may not seem like a big issue, in the long run, small offsets begin to accumulate as the robot spends more time navigating a region. Fitting another method into the SLAM algorithm would slow our robot down, however, so we created an error correction method that corrects error more accurately but runs less frequently.
The second method of error correction revolves around the idea of feature landmarking. This directly ties into the PiCam and image recognition and does not rely on any assumptions that the previous method of error correction did, making it less reactive to noise. The main drawback of feature landmarking is that it cannot be applied at all times, and thus, combining point alignment and feature landmarking achieves the highest efficacy and efficiency.
Merging Image Recognition with SLAM for Feature Landmarking and SLAM Correction:
Feature Landmarking Example: Letter Recognition
To start, the robot is fed a 3-channel, 320 x 240-pixel image from our PiCam, with each channel representing a color from the standard RGB color model. When applying filters or processing the image, our bot must iterate through each pixel and channel, meaning it must touch 320 * 240 * 3, or 230,400, values to apply one filter or mask to one image. Because our bot uses commercial hardware with a limited processor, the first step we took was to reduce the image to 1 channel by grayscaling it (removing all color). In the context of this problem, grayscaling is useful because color is not a determinant in identifying letters.
To further reduce the image, we applied thresholding to reduce the 1-channel image with 2⁸ (256) possible values to a 1-channel image with just 2 values: black or white. Thresholding simply converts all pixel values above some threshold to one value, and all values below it to the other. The importance of this operation is that we can completely distinguish our region of interest from the background, allowing our bot to more easily identify patterns once filters are applied and the image size is reduced. A problem arises, however, when strong light is visible in an image: certain parts that are supposed to be dark may appear bright due to reflected light. A perfect example of this is shown below:
Image from OpenCV Documentation
To solve this, we can use a method called adaptive thresholding. Instead of comparing pixel values to a single threshold for the whole image, each pixel is compared to its neighboring pixels. We can further improve the result with a Gaussian variant of adaptive thresholding, where neighbors are weighted by how close they are to the pixel being compared, with the weights determined by a Gaussian/normal distribution. Below is the same image, but with adaptive thresholding.
Image from OpenCV Documentation
You’ll notice in the image that despite filtering out a majority of the background, small pixels of “noise” are still visible. Unfortunately, removing this noise may be difficult: given the techniques already applied, the remaining pixels must have had values similar to our target letter in addition to a similar structure. To solve this issue, we took the contours of each blob in the image and chose the largest blob as our “letter”. In most cases, this technique is risky; however, given the simplicity of our task, it was the best and easiest option. Below is the result of thresholding our grayscaled example image:
Once the selection is complete, we resize the image to a set size. This ensures that whether the letter is far away or very close, a similar image appears to the computer. It also reduces the number of pixels the computer has to process in the later identification steps. Once all this is done, it is finally time for pattern recognition. As we only have to identify three letters, H, S, and U, we can carefully compare the letters and identify similarities and differences. A careful examination shows that if we slice each image into 9 equal squares, like a tic-tac-toe board, each letter has a different combination of colors in the top-middle and bottom-middle boxes. Therefore, we can analyze the pixel color in these two spots to identify the letter we are looking at.
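A sketch of this grid check is below. The specific (filled, filled) → letter mapping is our assumption about how H, S, and U render (H open at top and bottom, S closed at both, U closed only at the bottom) and should be verified against the actual font:

```python
import numpy as np

def classify_letter(binary):
    """Classify a resized, binarized letter crop (foreground = 1) as
    'H', 'S', or 'U' by sampling the top-middle and bottom-middle
    cells of a 3x3 grid."""
    h, w = binary.shape
    cell_h, cell_w = h // 3, w // 3
    top_mid = binary[0:cell_h, cell_w:2 * cell_w]
    bot_mid = binary[2 * cell_h:3 * cell_h, cell_w:2 * cell_w]
    # A cell counts as "filled" if over half its pixels are foreground
    top_filled = top_mid.mean() > 0.5
    bot_filled = bot_mid.mean() > 0.5
    if top_filled and bot_filled:
        return 'S'   # strokes across both the top and the bottom
    if bot_filled:
        return 'U'   # open on top, closed along the bottom
    return 'H'       # open at top and bottom; only the crossbar in the middle
```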
Overall, developing a SLAM algorithm for our bot was a fun project to explore two growing topics in modern technology. While we were unfortunately unable to complete the project in time for the RCJ competition, continuing and creating a workable version of this algorithm helped us learn a lot, and we hope to continue playing around with the algorithm to see if we can apply it in the future.