Abstract
This report outlines a practical implementation of the Tsai camera calibration technique, excluding the effects of radial distortion. It explains the calibration procedure, presents the results, evaluates calibration accuracy using reprojection errors, and concludes with suggestions for improvement and future experimentation.
Keywords: Calibration

1. Introduction
Camera calibration is a fundamental task in computer vision, aimed at determining both intrinsic (e.g., focal length, sensor offset) and extrinsic (e.g., camera position and orientation) parameters. This process typically involves capturing images of a calibration object (often a chessboard), identifying known 3D points and their 2D projections, and solving a system of equations.
Calibration enables applications such as 3D measurements, robotic navigation, and stereo imaging. Some camera parameters are fixed (intrinsic) and need calibration once, while others (extrinsic) may vary and require frequent recalibration. Calibration is especially crucial when using two cameras together (stereo vision), as both must be precisely aligned to extract depth.
Real-world cameras often introduce radial distortion, particularly with wide-angle or fish-eye lenses. Distortion can be either barrel-type (curving inward) or pincushion-type (curving outward), each affecting how light is bent and how images appear.
This report first reviews key calibration techniques, then walks through the Tsai calibration method using a real experiment and ends with validation and suggestions for further enhancement.
2. Previous work
Two pivotal methods in camera calibration are Tsai's [1] and Zhang's [2]:
- Tsai’s method (1987): Made camera calibration more accessible by enabling use with standard consumer-grade cameras.
- Zhang’s method (2000): Introduced a more flexible, easy-to-use paper-based calibration target and helped shift calibration from lab setups to real-world environments.
Tsai’s method remains foundational in modern calibration workflows.
3. Camera Calibration Setup
The experiment follows the Tsai calibration steps (see Appendix Figure 6). World coordinates Xw, Yw and Zw are first transformed into camera coordinates Xcam, Ycam and Zcam using rotation and translation matrices. These coordinates are then projected onto the image plane, resulting in undistorted 2D coordinates Xu and Yu (still measured in millimeters). The homogeneous transform requires a division by the last component in the vector. In this case fXcam and fYcam are divided by Zcam. For simplicity, distortion is ignored in this process.
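The projection pipeline just described can be sketched as follows; `R`, `T`, and `f` are placeholders for the calibrated rotation, translation, and focal length, and the function name is illustrative rather than taken from the report's script:

```python
import numpy as np

def project(world_pt, R, T, f):
    """Project a world point (mm) to undistorted image-plane coords (mm).

    Distortion is ignored, as in this report.
    """
    cam = R @ np.asarray(world_pt, dtype=float) + T   # world -> camera frame
    # Homogeneous normalization: divide f*Xcam and f*Ycam by Zcam.
    return f * cam[0] / cam[2], f * cam[1] / cam[2]
```

For example, with R the identity, T = (0, 0, 10) mm, and f = 2 mm, the world point (1, 2, 0) projects to (0.2, 0.4) mm on the image plane.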
The calibration estimates both the intrinsic and the extrinsic parameters, followed by a first approximation of the radial distortion coefficient kappa 1. Finally, the estimated parameters are used to compute two types of positioning error, which together indicate the calibration's accuracy. The 2D error analysis evaluates the parameters that bring object coordinates into camera coordinates, while the 3D error analysis evaluates the parameters that project world-frame coordinates onto the image plane, such as the focal length f and the translation parameters tx, ty, tz.
4. Camera Calibration Experiment
This section summarizes the six steps of Tsai’s calibration (without distortion correction):
- Step 1 – Convert image pixel coordinates into physical camera coordinates (mm).
- Step 2 – Estimate the seven transformation parameters L1 through L7.
- Steps 3, 4 – Determine the scale factor sx, the translation parameter Ty, and first estimates of r11, r12, r13, r21, r22, r23, and Tx.
- Step 5 – Complete the rotation matrix with r31, r32, r33 and refine the estimates of r11, r12, r13, r21, r22, r23, and Tx.
- Step 6 – Calculate the focal length f and the translation Tz along the Z axis.
The derivation of the formulas of all calibration steps can be found in Figures 7 and 8 in the Appendix.
4.1. Preliminary
A GoPro Hero5 Black was used (see Appendix Table 1). The focal length, read from the image metadata, was 3.0 mm. The camera's narrow field-of-view (FOV) setting reports a 28 mm equivalent focal length, which differs from the physical value because of internal mode changes.
4.2. Step 1
Chessboard corners on a cube were used as reference points. Their coordinates in pixels were matched with physical positions (in mm). To map pixels to real-world distances, pixel spacing (dx, dy) was calculated based on sensor size and resolution.
Image center was defined at (2000, 1500) pixels, with the origin of the camera coordinate system placed there. A sample conversion showed correct mapping of pixels to millimeter positions.
Image points were recorded in pixels, using a coordinate system where the x-axis extends to the right and the y-axis extends downward. However, the overdetermined systems used in calibration steps 1 through 6 do not consider the transformations occurring after the projection from world coordinates to camera coordinates. As a result, pixel-based image points need to be manually converted into camera coordinates, using millimeters as the unit of measurement. Figure 1 illustrates the orientation of the coordinate axes and provides the transformation equations for converting between the two point types.

As illustrated in Figure 1, the values dx and dy represent the center-to-center spacing between adjacent sensor elements along the X and Y directions, respectively. These are calculated by dividing the physical dimensions of the camera sensor (in millimeters) by its resolution (in pixels) along each axis. For this experiment, dx equals 0.0015425 mm/pixel and dy equals 0.0015166 mm/pixel.
The coordinates Cx and Cy indicate the pixel location of the image center, which are 2000 and 1500 pixels, respectively, in this setup.
To verify the accuracy of the conversion, consider the pixel location (2466, 2625), which lies to the right and below the image center:
Pixel (2466, 2625) → Millimeters (-0.718805, -1.70625)
Because the resulting point has negative x and y values, consistent with the camera axes being reversed relative to the pixel axes (a point to the right of and below the image center maps to the negative-x, negative-y quadrant), the conversion was carried out correctly.
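The Step 1 conversion can be sketched as follows. The constants come from Tables 1 and 2; the axis convention (both camera axes pointing opposite to the pixel axes) matches the worked example above, and the function name is illustrative:

```python
SENSOR_W_MM, SENSOR_H_MM = 6.17, 4.55   # 1/2.3" sensor dimensions
RES_X, RES_Y = 4000, 3000               # image resolution (pixels)
CX, CY = 2000, 1500                     # image center (pixels)

dx = SENSOR_W_MM / RES_X                # 0.0015425 mm/pixel
dy = SENSOR_H_MM / RES_Y                # ~0.0015167 mm/pixel

def pixel_to_camera_mm(xf, yf):
    """Convert a pixel location to camera-plane coordinates in mm.

    Both axes are reversed relative to the pixel axes, so a point right
    of and below the image center maps to negative x and negative y.
    """
    return -dx * (xf - CX), -dy * (yf - CY)
```

Running `pixel_to_camera_mm(2466, 2625)` reproduces the sample conversion above, approximately (-0.718805, -1.70625) mm.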
4.3. Step 2
This step involves calculating seven parameters, with their values listed in Table 5 of the Appendix. These parameters are essentially combinations of values used to convert world coordinates into undistorted camera coordinates.
A total of 30 calibration reference points were used, resulting in an overdetermined system of equations. To solve this system, the least squares method was applied, which required computing the inverse of the matrix MtM. The determinant of MtM was approximately 8.57 × 10²⁰ — a value significantly different from zero — indicating that a valid solution exists.
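A sketch of this least-squares solve is shown below. The row construction assumes Tsai's radial alignment constraint for a non-coplanar target; the report's exact matrix layout may differ, and variable names are illustrative:

```python
import numpy as np

def solve_step2(world, cam):
    """Least-squares solve for the seven Step 2 unknowns L1..L7.

    world: (N, 3) world coordinates (mm); cam: (N, 2) camera-plane
    coordinates (mm). Each point contributes one row to M and one
    entry to b.
    """
    Xw, Yw, Zw = world[:, 0], world[:, 1], world[:, 2]
    xd, yd = cam[:, 0], cam[:, 1]
    M = np.column_stack([yd * Xw, yd * Yw, yd * Zw, yd,
                         -xd * Xw, -xd * Yw, -xd * Zw])
    b = xd
    # Normal equations, as in the report: a determinant of M^T M far
    # from zero (8.57e20 in this experiment) means a unique solution.
    MtM = M.T @ M
    if abs(np.linalg.det(MtM)) < 1e-12:
        raise ValueError("degenerate calibration geometry")
    return np.linalg.solve(MtM, M.T @ b)
```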
4.4. Steps 3, 4, 5 and 6
These steps resulted in the determination of both the camera’s extrinsic and intrinsic parameters. The computed values can be found in Tables 6 and 7 of the Appendix.
For instance, the calibrated uncertainty scale factor sx was 1.024425. Since this factor is typically close to 1 for most cameras, the result suggests that the calibration identified it accurately.
4.5. Validation of results
Different validation steps were taken to confirm the accuracy of results:
- First, we check that the rotation matrix is an orthonormal matrix:
| R11 | R12 | R13 |
| R21 | R22 | R23 |
| R31 | R32 | R33 |
=
| 0.7237929659931259 | 0.011667201171291419 | -0.6899185595385168 |
| 0.004863973152674891 | 0.9997889347057825 | 0.019960656428251837 |
| 0.6900065164876475 | -0.017803145868804843 | 0.7235841728518912 |
The cross product of the first two rows is [0.69000583, -0.01780313, 0.72358345], which matches the third row of the matrix.
Then, finding the length of each of the columns and rows leads to the following output:
| Mag1 | 0.9999999999999998 |
| Mag2 | 1.0 |
| Mag3 | 0.9999999999999999 |
| Mag4 | 1.000004454315767 |
| Mag5 | 1.0000154946530182 |
| Mag6 | 0.9999800507022648 |
| MagAvg | 0.9999999999451749 |
where Mag1, Mag2, Mag3 are the lengths of the row vectors and Mag4, Mag5, Mag6 are the lengths of the column vectors. The average of all magnitudes is extremely close to one.
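Both orthonormality checks above can be reproduced numerically, using the calibrated values from Table 6:

```python
import numpy as np

# Calibrated rotation matrix (Table 6).
R = np.array([
    [0.7237929659931259,   0.011667201171291419, -0.6899185595385168],
    [0.004863973152674891, 0.9997889347057825,    0.019960656428251837],
    [0.6900065164876475,  -0.017803145868804843,  0.7235841728518912],
])

# Row 3 should equal the cross product of rows 1 and 2 (right-handed frame).
cross = np.cross(R[0], R[1])          # ≈ [0.690006, -0.017803, 0.723583]

# Every row and column of an orthonormal matrix has unit length.
mags = np.concatenate([np.linalg.norm(R, axis=1), np.linalg.norm(R, axis=0)])
```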
- Then we compute estimates of the pitch, yaw and tilt angles in degrees:
| Angle | Degrees |
| Yaw | 43.62366249575263 |
| Pitch | 1.5800896496011994 |
| Tilt | 0.9235009554590418 |
The formulas were as follows:
Yaw (radians) = asin(-r13)
Pitch (radians) = asin(r23 / cos(yaw))
Tilt (radians) = asin(r12 / cos(yaw))
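Applying these formulas to the calibrated entries r12, r13, r23 from Table 6 reproduces the angles in the table above:

```python
import math

# Rotation-matrix entries from Table 6.
r12, r13, r23 = 0.011667201171291419, -0.6899185595385168, 0.019960656428251837

yaw = math.asin(-r13)
pitch = math.asin(r23 / math.cos(yaw))
tilt = math.asin(r12 / math.cos(yaw))
# degrees(yaw) ≈ 43.62, degrees(pitch) ≈ 1.58, degrees(tilt) ≈ 0.92
```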
For the yaw angle we observe that the camera and object orientations were roughly as follows:

Since the z-axis of the camera reference frame forms an approximate 45-degree angle with the z-axis of the object reference frame, the object frame must rotate 45 degrees counter-clockwise about its y-axis. Hence, the estimated yaw angle (43.62 degrees) is consistent with the orientations of the camera and the object as seen in Figures 4 and 5 of the Appendix.
For the pitch angle the configuration was roughly as follows:

In class, we used three measurements to estimate the pitch angle:
- The height from the center of the camera to the ground/table: 92 mm
- The height from the object’s origin to the ground: 86 mm
- The horizontal distance between the camera and the object: 62 mm
Using these values, the pitch angle was calculated as arcsin((92 – 86) / 62) = 5.55 degrees. This result differs from the calibrated pitch angle of 1.58 degrees, but both values are relatively small and close to zero.
Because the camera was positioned higher than the object, the object’s z-axis had to rotate downward (about its x-axis) to align with the camera’s z-axis. According to the right-hand rule, this rotation corresponds to a positive pitch angle, which confirms that the classroom estimate and the calibration result are directionally consistent.
The tilt angle was considered to be 0 degrees, as both the camera’s XZ plane and the cube’s XZ plane were aligned—resting flat on the table. The calibrated tilt angle was 0.9235 degrees, which is also close to zero, though minor discrepancies likely stem from distortion or measurement inaccuracies.
- We now look at the translation coefficients, which are measured in millimeters:
| Tx = -1.56296899 |
| Ty = -1.61224413 |
| Tz = 59.1900819 |
| Distance = 59.23266 |
The translation from the object’s reference frame to the camera’s reference frame occurs after their coordinate axes have been aligned—meaning the rotation is applied first. In our case, the camera is primarily offset from the object along the z-axis, which is why Tz is the dominant component.
The Tx and Ty values are small and negative, reflecting the minor shifts needed to align the object’s z-axis (represented by a blue dot) with the camera’s z-axis (red dot). We estimated Ty using the ratio between pixel distances on the image and the known size of the grid cells (1×1 cm). The calibrated value of Ty was –1.61 mm, which is close to our estimate of –1.92 mm. A similar approach can be used to estimate Tx.

The calibrated distance between the camera and the object was 59.23 mm, which closely matches our manually measured distance of 62 mm.
Lastly, the focal length f was calibrated to be approximately 5.387 mm. Compared to the actual focal length of 3.0 mm at the time the image was taken, this suggests that while the calibration captured a reasonable approximation, it wasn’t entirely precise.
5. Analysis
To estimate the kappa value, the process begins by projecting the world coordinates into undistorted camera coordinates (in millimeters) and then comparing them to the distorted camera coordinates (also in millimeters) derived from distorted pixel positions.
The relationship between distorted and undistorted coordinates, incorporating kappa₁, is shown in the image below.

The method for estimating kappa₁ is outlined in the accompanying Python script. It works by computing kappa₁ for each reference point pair and then averaging those values to produce an overall estimate.
In this experiment, the initial estimate of kappa₁ was:
| Kappa 1 (mm-2) | -0.016682349122165107 |
The negative sign of kappa₁ is expected. The GoPro Hero5 Black exhibits fisheye distortion, a type of lens distortion similar to barrel distortion, in which a negative kappa₁ pulls pixels toward the center of the image. In this context the image center coincides with the origin of the camera coordinate system: unlike standard image coordinates, which are anchored at the top-left corner, the origin is shifted to the center and the axes are reversed.
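The per-point averaging can be sketched as follows. The specific per-point formula, comparing radial magnitudes under the first-order model Xu = Xd(1 + κ₁r²), is an assumption about the script's internals rather than a verbatim copy:

```python
import math

def estimate_kappa1(undist, dist):
    """Average per-point estimate of the first radial distortion term.

    undist: projected undistorted camera coords (mm); dist: measured
    distorted camera coords (mm). Assumes the first-order model
    Xu = Xd * (1 + k1 * r^2) with r^2 = Xd^2 + Yd^2.
    """
    estimates = []
    for (xu, yu), (xd, yd) in zip(undist, dist):
        r2 = xd * xd + yd * yd
        if r2 > 1e-12:                                  # skip points at the optical center
            ratio = math.hypot(xu, yu) / math.sqrt(r2)  # = 1 + k1 * r^2
            estimates.append((ratio - 1.0) / r2)
    return sum(estimates) / len(estimates)
```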
To evaluate the accuracy of the calibration, two types of error were measured:
- Mean 2D Error:
The 2D error is the Euclidean distance between the measured distorted camera coordinate and the estimated one.
To estimate this, world coordinates in homogeneous form are multiplied by the relevant transformation and projection matrices. A list of the resulting 2D errors (in mm) can be found in Table 8 of the Appendix.
Here is the mean and standard deviation of the 2D errors:
| E2dMean (mm) | 0.02847 |
| E2dStd (mm) | 0.07782 |
With 30 reference points, the average 2D error is about 0.02847 mm. Given the pixel-to-mm ratios (dx = 0.0015425 mm/pixel, dy = 0.0015166 mm/pixel), this corresponds to roughly 18 pixels. On a 4000×3000 resolution image, an 18-pixel discrepancy is minimal and barely noticeable (e.g., shifting an image by 18 pixels in GIMP results in almost no visible change).
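A minimal sketch of this error measure, including the mm-to-pixel conversion quoted above (function and parameter names are illustrative):

```python
import numpy as np

def error_2d(measured, estimated, dx=0.0015425):
    """2D reprojection error between measured and estimated points.

    measured, estimated: (N, 2) camera-plane coordinates (mm).
    Returns (mean mm, std mm, mean pixels), using the Step 1 pixel
    spacing dx to convert mm to pixels.
    """
    d = np.linalg.norm(np.asarray(measured) - np.asarray(estimated), axis=1)
    return d.mean(), d.std(), d.mean() / dx
```

With the report's result, 0.02847 mm / 0.0015425 mm per pixel gives the quoted ~18-pixel figure.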
- Mean 3D Error:
The 3D error is the Euclidean distance between the actual world coordinate and the estimated world coordinate.
This is computed by tracing a ray through the camera’s reference frame and finding its intersection with the world’s X or Z plane. The full method is detailed in the Python script. The list of 3D errors is included in Table 9 of the Appendix.
Here is the mean and standard deviation of the 3D errors:
| E3dMean (mm) | 0.912371 |
| E3dStd (mm) | 1.90176646 |
With 30 reference points, the average 3D error is approximately 0.9 mm. Considering the grid cells used in the setup measure 10×10 mm, this error margin is relatively small and indicates a fairly accurate estimation.
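The core of the 3D-error idea can be sketched as follows: cast a ray from the camera center through the back-projected image point, intersect it with a world plane, and measure the distance to the true point. The helper names are illustrative, and intersection with the Z = 0 plane stands in for the script's X-or-Z-plane handling:

```python
import numpy as np

def intersect_plane_z0(origin, direction):
    """Return the point where the ray origin + t*direction meets Z = 0."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    t = -origin[2] / direction[2]
    return origin + t * direction

def error_3d(actual_world, estimated_world):
    """Euclidean distance (mm) between actual and estimated world points."""
    return float(np.linalg.norm(np.asarray(actual_world, dtype=float)
                                - np.asarray(estimated_world, dtype=float)))
```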
6. Comparison with 236 data points
This section presents the results of the same calibration experiment, but this time using all detected chessboard points as input, and compares them to the earlier results based on just 30 reference points.
| Metric | 30 data points | 236 data points |
| E2dMean (mm) | 0.02847 | 0.22613 |
| E2dStd (mm) | 0.077824 | 0.1828 |
| E3dMean (mm) | 0.91237 | 37.77 |
| E3dStd (mm) | 1.90176 | 47.2239 |
As anticipated, the reprojection errors increased significantly when using all 236 data points. The average 2D error rose to 0.22613 mm, equivalent to about 149 pixels, and the average 3D error reached 37.77 mm, which spans approximately 3 to 4 grid cells.
This increase is expected because many of the marked points lie near the edges of the image, where radial distortion is strongest. These outer points introduce more error into the calibration, affecting most parameters, though the focal length estimate actually improved.
The focal length f was estimated at 3.55 mm (see Figure 10 in the Appendix), which is significantly closer to the actual value of 3.0 mm (Table 1 in the Appendix) compared to the earlier estimate of 5.38 mm (Table 6). This suggests that using a larger number of points spread across the image improves the accuracy of certain parameters like focal length.
7. Improvements
This experiment does not take image distortion into account, which limits the accuracy of the estimated camera parameters. For future studies, it is recommended to determine the radial distortion parameters using non-linear optimization techniques, such as the Newton method, gradient descent, or similar approaches. Once these distortion parameters are known, they can be used to correct the images and obtain undistorted coordinates for more precise calibration.
Additional experiments are also suggested to assess whether the intrinsic camera parameters obtained here remain valid when applied to objects located farther from the camera. In this study, the object was positioned roughly 62 mm from the camera. It would be useful to explore whether calibrating with objects at greater distances leads to reduced error.
Moreover, further investigation is needed to understand how the camera’s placement relative to the object influences calibration accuracy. In this experiment, Tx and Ty were assumed to be close to zero. However, other studies (such as Eric’s report) suggest that the camera should not lie on the XZ plane, implying that Ty should not be near zero, and that placement could have a significant impact on calibration results.
8. Conclusion
This report demonstrated the implementation of the Tsai calibration method to determine both intrinsic and extrinsic camera parameters. The primary calibration used 30 reference points, while an extended dataset of 236 points was included to support error analysis. The evaluation was based on mean reprojection errors, and the results indicate that the calibration was successfully performed.
9. References
[1] R. Tsai, “A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses,” IEEE Journal of Robotics and Automation, vol. 3, no. 4, pp. 323–344, 1987.
[2] Z. Zhang, “A flexible new technique for camera calibration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330–1334, 2000.
10. Appendix
Table 1 Camera parameters
| Make and Model | GoPro Hero5 Black |
| Sensor type | CMOS |
| Resolution | 4000×3000 pixels |
| Sensor size | 1/2.3’’ (6.17 x 4.55 mm) |
| Focal length multiplier | 5.64 |
| Distortion | Fish-eye |
| Focal length (photo) | 3.0 mm |
| Narrow FOV focal length | 28 mm |
Table 2 Environment parameters
| Origin of the object coordinate system (pix) | (2092, 1600) |
| Grid size (cm x cm) | 1×1 |
| Image center (pix) | (2000, 1500) |
| Camera to ground height (meters) | 0.092 |
| Object origin to ground height (meters) | 0.086 |
| Distance from camera to the object (m) | 0.062 |
Table 3 Calibration data points
| Xw mm | Yw mm | Zw mm | Xd pix | Yd pix |
| 0 | -20 | 10 | 2466 | 2625 |
| 0 | -10 | 10 | 2469 | 2118 |
| 0 | -10 | 20 | 2751 | 2043 |
| 0 | -10 | 30 | 2976 | 1980 |
| 0 | 0 | 10 | 2466 | 1572 |
| 0 | 0 | 20 | 2757 | 1557 |
| 0 | 0 | 30 | 2985 | 1548 |
| 0 | 0 | 40 | 3165 | 1527 |
| 0 | 0 | 50 | 3312 | 1524 |
| 0 | 10 | 10 | 2457 | 1029 |
| 0 | 10 | 20 | 2748 | 1071 |
| 0 | 10 | 30 | 2973 | 1107 |
| 0 | 20 | 10 | 2445 | 510 |
| 0 | -20 | 0 | 2104 | 2776 |
| 10 | -20 | 0 | 1708 | 2652 |
| 0 | -10 | 0 | 2096 | 2208 |
| 10 | -10 | 0 | 1696 | 2136 |
| 20 | -10 | 0 | 1364 | 2068 |
| 30 | -10 | 0 | 1100 | 2012 |
| 0 | 0 | 0 | 2092 | 1596 |
| 10 | 0 | 0 | 1690 | 1588 |
| 20 | 0 | 0 | 1096 | 1571 |
| 30 | 0 | 0 | 1092 | 1570 |
| 40 | 0 | 0 | 884 | 1560 |
| 0 | 10 | 0 | 2084 | 996 |
| 10 | 10 | 0 | 1684 | 1037 |
| 20 | 10 | 0 | 1355 | 1086 |
| 30 | 10 | 0 | 1094 | 1124 |
| 0 | 20 | 0 | 2080 | 392 |
| 10 | 20 | 0 | 1680 | 504 |
Table 4 Camera coordinates of the calibration data points (Xd mm, Yd mm)

Table 5 The calculated seven unknowns
| L1 | -0.45990048635890823 |
| L2 | -0.007413378887651565 |
| L3 | 0.43837657450074563 |
| L4 | 0.9931157539934758 |
| L5 | -0.0030168961818809164 |
| L6 | -0.6201225469638754 |
| L7 | -0.012380666232319315 |
Table 6 The calculated extrinsic and intrinsic parameters with 30 reference points
| R11 | 0.7237929659931259 |
| R12 | 0.011667201171291419 |
| R13 | -0.6899185595385168 |
| R21 | 0.004863973152674891 |
| R22 | 0.9997889347057825 |
| R23 | 0.019960656428251837 |
| R31 | 0.6900065164876475 |
| R32 | -0.017803145868804843 |
| R33 | 0.7235841728518912 |
| Tx (mm) | -1.562968986722217 |
| Ty (mm) | -1.6122441275531048 |
| Tz (mm) | 59.1900819439242 |
| F (mm) | 5.387079764033248 |
| Sx | 1.024425344302128 |
Table 7a. Evaluated metrics for 30 reference points
| K1 first estimate (mm-2) | -0.016682349122165107 |
| Determinant 1 (step 2) | 8.57040865828559e+20 |
| Determinant 2 (step 6) | 777.8818643908144 |
| Rotation magnitude avg | 0.9999999999451749 |
| Yaw angle (deg) | 43.62366249575263 |
| Pitch angle (deg) | 1.5800896496011994 |
| Tilt angle (deg) | 0.9235009554590418 |
| Distance (mm) | 59.232659941190775 |
| E2dMean (mm) | 0.028470959838563046 |
| E2dStd (mm) | 0.0778247536594827 |
| E3dMean (mm) | 0.912371035599039 |
| E3dStd (mm) | 1.9017664667890135 |
Table 7b. Evaluated metrics for 236 reference points
| K1 first estimate (mm-2) | 0.3864329251755696 |
| Determinant 1 (step 2) | 7.410289143106744e+35 |
| Determinant 2 (step 6) | 8271398.775090234 |
| Rotation magnitude avg | 0.9999984398244774 |
| Yaw angle (deg) | 44.481314909942604 |
| Pitch angle (deg) | 6.367712556665941 |
| Tilt angle (deg) | 4.469331695512939 |
| Distance (mm) | 31.758136709013204 |
| E2dMean (mm) | 0.2261378112461626 |
| E2dStd (mm) | 0.1828498237220668 |
| E3dMean (mm) | 37.770069752484446 |
| E3dStd (mm) | 47.22391182233289 |
10.1. Experimental Protocol
The Experimental Protocol outlines the procedure for conducting the experiment and specifies which parameters will be recorded or measured.
There are three categories of parameters to be measured: those related to the calibration object, the camera, and the environment, including the relative positioning of the camera with respect to the object.
Object:
It must be possible to determine the model coordinates of each point on the object from the image. Key parameters to define include:
- A reference point marking the origin of the model coordinate system
- The physical dimensions of the pattern present on the object
Camera:
The camera’s intrinsic parameters need to be documented to calculate the physical distance between pixels. These include:
- Camera make and model
- Megapixel resolution (pixel grid dimensions)
- Physical pixel size (in units of length)
Environment:
An approximate estimation of rotation angles and translation components of the transformation matrix is required, such as:
- Translation parameters (tx, ty, tz) or distance relative to the object origin
- Distance from the camera center to the ground or table
- Distance from the image center to the ground
While the placement of the camera origin relative to the object origin should not affect the calibration outcome, the camera should be positioned to minimize tilt and pitch angles to simplify the validation of calibration parameters.
Depending on the camera and whether the uncertainty scale factor (sx) is known beforehand, different types of calibration objects are required. If sx is known, a simple, single-view coplanar point set suffices. If sx is unknown, a non-coplanar calibration object made up of two or more orthogonal planes should be used.
Table 8 2D error Euclidean distances in mm

Table 9 3D Error Euclidean distances in mm

