Abstract
This report outlines a practical implementation of the Tsai camera calibration technique, excluding the effects of radial distortion. It explains the calibration procedure, presents the results, evaluates calibration accuracy using reprojection errors, and concludes with suggestions for improvement and future experimentation.
Keywords: Calibration

1. Introduction
Camera calibration is a fundamental task in computer vision, aimed at determining both intrinsic (e.g., focal length, sensor offset) and extrinsic (e.g., camera position and orientation) parameters. This process typically involves capturing images of a calibration object (often a chessboard), identifying known 3D points and their 2D projections, and solving a system of equations.
Calibration enables applications such as 3D measurements, robotic navigation, and stereo imaging. Some camera parameters are fixed (intrinsic) and need calibration once, while others (extrinsic) may vary and require frequent recalibration. Calibration is especially crucial when using two cameras together (stereo vision), as both must be precisely aligned to extract depth.
Real-world cameras often introduce radial distortion, particularly with wide-angle or fish-eye lenses. Distortion can be either barrel-type (curving inward) or pincushion-type (curving outward), each affecting how light is bent and how images appear.
This report first reviews key calibration techniques, then walks through the Tsai calibration method using a real experiment and ends with validation and suggestions for further enhancement.
2. Previous work
Two pivotal methods in camera calibration are Tsai's [1] and Zhang's [2]:
- Tsai’s method (1987): Made camera calibration more accessible by enabling use with standard consumer-grade cameras.
- Zhang’s method (2000): Introduced a more flexible, easy-to-use paper-based calibration target and helped shift calibration from lab setups to real-world environments.
Tsai’s method remains foundational in modern calibration workflows.
3. Camera Calibration Setup
The experiment follows the Tsai calibration steps (see Appendix Figure 6). World coordinates Xw, Yw and Zw are first transformed into camera coordinates Xcam, Ycam and Zcam using rotation and translation matrices. These coordinates are then projected onto the image plane, resulting in undistorted 2D coordinates Xu and Yu (still measured in millimeters). The homogeneous transform requires a division by the last component in the vector. In this case fXcam and fYcam are divided by Zcam. For simplicity, distortion is ignored in this process.
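The projection pipeline just described can be sketched as follows; `R`, `T`, and `f` are placeholders for the calibrated rotation, translation, and focal length, and the function name is illustrative rather than taken from the report's script:

```python
import numpy as np

def project(world_pt, R, T, f):
    """Project a world point (mm) to undistorted image-plane coords (mm).

    Distortion is ignored, as in this report.
    """
    cam = R @ np.asarray(world_pt, dtype=float) + T   # world -> camera frame
    # Homogeneous normalization: divide f*Xcam and f*Ycam by Zcam.
    return f * cam[0] / cam[2], f * cam[1] / cam[2]
```

For example, with R the identity, T = (0, 0, 10) mm, and f = 2 mm, the world point (1, 2, 0) projects to (0.2, 0.4) mm on the image plane.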
The calibration estimates both the intrinsic and the extrinsic parameters, followed by a first approximation of the radial distortion coefficient kappa 1. Finally, the estimated parameters are used to compute two types of positioning error, which together indicate the calibration's accuracy. The 2D error analysis evaluates the parameters that bring object coordinates into camera coordinates, while the 3D error analysis evaluates the parameters that project world-frame coordinates onto the image plane, such as the focal length f and the translation parameters tx, ty, tz.
4. Camera Calibration Experiment
This section summarizes the six steps of Tsai’s calibration (without distortion correction):
- Step 1 – Convert image pixel coordinates into physical camera coordinates (mm).
- Step 2 – Estimate the seven transformation parameters L1 through L7.
- Steps 3, 4 – Determine the scale factor sx, the translation parameter Ty, and first estimates of r11, r12, r13, r21, r22, r23, and Tx.
- Step 5 – Complete the rotation matrix with r31, r32, r33 and refine the estimates of r11, r12, r13, r21, r22, r23, and Tx.
- Step 6 – Calculate the focal length f and the translation Tz along the Z axis.
The derivation of the formulas of all calibration steps can be found in Figures 7 and 8 in the Appendix.
4.1. Preliminary
A GoPro Hero5 Black was used (see Appendix Table 1). The focal length, read from the image metadata, was 3.0 mm. The camera's narrow field-of-view (FOV) setting reports a 28 mm equivalent focal length, which differs from the physical value because of internal mode changes.
4.2. Step 1
Chessboard corners on a cube were used as reference points. Their coordinates in pixels were matched with physical positions (in mm). To map pixels to real-world distances, pixel spacing (dx, dy) was calculated based on sensor size and resolution.
Image center was defined at (2000, 1500) pixels, with the origin of the camera coordinate system placed there. A sample conversion showed correct mapping of pixels to millimeter positions.
Image points were recorded in pixels, using a coordinate system where the x-axis extends to the right and the y-axis extends downward. However, the overdetermined systems used in calibration steps 1 through 6 do not consider the transformations occurring after the projection from world coordinates to camera coordinates. As a result, pixel-based image points need to be manually converted into camera coordinates, using millimeters as the unit of measurement. Figure 1 illustrates the orientation of the coordinate axes and provides the transformation equations for converting between the two point types.

As illustrated in Figure 1, the values dx and dy represent the center-to-center spacing between adjacent sensor elements along the X and Y directions, respectively. These are calculated by dividing the physical dimensions of the camera sensor (in millimeters) by its resolution (in pixels) along each axis. For this experiment, dx equals 0.0015425 mm/pixel and dy equals 0.0015166 mm/pixel.
The coordinates Cx and Cy indicate the pixel location of the image center, which are 2000 and 1500 pixels, respectively, in this setup.
To verify the accuracy of the conversion, consider the pixel location (2466, 2625), which lies to the right and below the image center:
Pixel (2466, 2625) → Millimeters (-0.718805, -1.70625)
Because the resulting point has negative x and y values, consistent with the camera axes being reversed relative to the pixel axes (a point to the right of and below the image center maps to the negative-x, negative-y quadrant), the conversion was carried out correctly.
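The Step 1 conversion can be sketched as follows. The constants come from Tables 1 and 2; the axis convention (both camera axes pointing opposite to the pixel axes) matches the worked example above, and the function name is illustrative:

```python
SENSOR_W_MM, SENSOR_H_MM = 6.17, 4.55   # 1/2.3" sensor dimensions
RES_X, RES_Y = 4000, 3000               # image resolution (pixels)
CX, CY = 2000, 1500                     # image center (pixels)

dx = SENSOR_W_MM / RES_X                # 0.0015425 mm/pixel
dy = SENSOR_H_MM / RES_Y                # ~0.0015167 mm/pixel

def pixel_to_camera_mm(xf, yf):
    """Convert a pixel location to camera-plane coordinates in mm.

    Both axes are reversed relative to the pixel axes, so a point right
    of and below the image center maps to negative x and negative y.
    """
    return -dx * (xf - CX), -dy * (yf - CY)
```

Running `pixel_to_camera_mm(2466, 2625)` reproduces the sample conversion above, approximately (-0.718805, -1.70625) mm.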
4.3. Step 2
This step involves calculating seven parameters, with their values listed in Table 5 of the Appendix. These parameters are essentially combinations of values used to convert world coordinates into undistorted camera coordinates.
A total of 30 calibration reference points were used, resulting in an overdetermined system of equations. To solve this system, the least squares method was applied, which required computing the inverse of the matrix MtM. The determinant of MtM was approximately 8.57 × 10²⁰ — a value significantly different from zero — indicating that a valid solution exists.
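A sketch of this least-squares solve is shown below. The row construction assumes Tsai's radial alignment constraint for a non-coplanar target; the report's exact matrix layout may differ, and variable names are illustrative:

```python
import numpy as np

def solve_step2(world, cam):
    """Least-squares solve for the seven Step 2 unknowns L1..L7.

    world: (N, 3) world coordinates (mm); cam: (N, 2) camera-plane
    coordinates (mm). Each point contributes one row to M and one
    entry to b.
    """
    Xw, Yw, Zw = world[:, 0], world[:, 1], world[:, 2]
    xd, yd = cam[:, 0], cam[:, 1]
    M = np.column_stack([yd * Xw, yd * Yw, yd * Zw, yd,
                         -xd * Xw, -xd * Yw, -xd * Zw])
    b = xd
    # Normal equations, as in the report: a determinant of M^T M far
    # from zero (8.57e20 in this experiment) means a unique solution.
    MtM = M.T @ M
    if abs(np.linalg.det(MtM)) < 1e-12:
        raise ValueError("degenerate calibration geometry")
    return np.linalg.solve(MtM, M.T @ b)
```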
4.4. Steps 3, 4, 5 and 6
These steps resulted in the determination of both the camera’s extrinsic and intrinsic parameters. The computed values can be found in Tables 6 and 7 of the Appendix.
For instance, the calibrated uncertainty scale factor sx was 1.024425. Since this factor is typically close to 1 for most cameras, the result suggests that the calibration identified it accurately.
4.5. Validation of results
Different validation steps were taken to confirm the accuracy of results:
- First, we check that the rotation matrix is an orthonormal matrix:
| R11 | R12 | R13 |
| R21 | R22 | R23 |
| R31 | R32 | R33 |
=
| 0.7237929659931259 | 0.011667201171291419 | -0.6899185595385168 |
| 0.004863973152674891 | 0.9997889347057825 | 0.019960656428251837 |
| 0.6900065164876475 | -0.017803145868804843 | 0.7235841728518912 |
The cross product of the first two rows is [0.69000583, -0.01780313, 0.72358345], which matches the third row of the matrix.
Then, finding the length of each of the columns and rows leads to the following output:
| Mag1 | 0.9999999999999998 |
| Mag2 | 1.0 |
| Mag3 | 0.9999999999999999 |
| Mag4 | 1.000004454315767 |
| Mag5 | 1.0000154946530182 |
| Mag6 | 0.9999800507022648 |
| MagAvg | 0.9999999999451749 |
where Mag1, Mag2, Mag3 are the lengths of the row vectors and Mag4, Mag5, Mag6 are the lengths of the column vectors. The average of all magnitudes is extremely close to one.
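Both orthonormality checks above can be reproduced numerically, using the calibrated values from Table 6:

```python
import numpy as np

# Calibrated rotation matrix (Table 6).
R = np.array([
    [0.7237929659931259,   0.011667201171291419, -0.6899185595385168],
    [0.004863973152674891, 0.9997889347057825,    0.019960656428251837],
    [0.6900065164876475,  -0.017803145868804843,  0.7235841728518912],
])

# Row 3 should equal the cross product of rows 1 and 2 (right-handed frame).
cross = np.cross(R[0], R[1])          # ≈ [0.690006, -0.017803, 0.723583]

# Every row and column of an orthonormal matrix has unit length.
mags = np.concatenate([np.linalg.norm(R, axis=1), np.linalg.norm(R, axis=0)])
```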
- Then we compute estimates of the pitch, yaw and tilt angles in degrees:
| Angle | Degrees |
| Yaw | 43.62366249575263 |
| Pitch | 1.5800896496011994 |
| Tilt | 0.9235009554590418 |
The formulas were as follows:
Yaw (radians) = asin(-r13)
Pitch (radians) = asin(r23 / cos(yaw))
Tilt (radians) = asin(r12 / cos(yaw))
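Applying these formulas to the calibrated entries r12, r13, r23 from Table 6 reproduces the angles in the table above:

```python
import math

# Rotation-matrix entries from Table 6.
r12, r13, r23 = 0.011667201171291419, -0.6899185595385168, 0.019960656428251837

yaw = math.asin(-r13)
pitch = math.asin(r23 / math.cos(yaw))
tilt = math.asin(r12 / math.cos(yaw))
# degrees(yaw) ≈ 43.62, degrees(pitch) ≈ 1.58, degrees(tilt) ≈ 0.92
```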
For the yaw angle we observe that the camera and object orientations were roughly as follows:

Since the z-axis of the camera reference frame forms an approximate 45-degree angle with the z-axis of the object reference frame, the object frame must rotate 45 degrees counter-clockwise about its y-axis. Hence, the estimated yaw angle (43.62 degrees) is consistent with the orientations of the camera and the object as seen in Figures 4 and 5 of the Appendix.
For the pitch angle the configuration was roughly as follows:

In class, we used three measurements to estimate the pitch angle:
- The height from the center of the camera to the ground/table: 92 mm
- The height from the object’s origin to the ground: 86 mm
- The horizontal distance between the camera and the object: 62 mm
Using these values, the pitch angle was calculated as arcsin((92 – 86) / 62) = 5.55 degrees. This result differs from the calibrated pitch angle of 1.58 degrees, but both values are relatively small and close to zero.
Because the camera was positioned higher than the object, the object’s z-axis had to rotate downward (about its x-axis) to align with the camera’s z-axis. According to the right-hand rule, this rotation corresponds to a positive pitch angle, which confirms that the classroom estimate and the calibration result are directionally consistent.
The tilt angle was considered to be 0 degrees, as both the camera’s XZ plane and the cube’s XZ plane were aligned—resting flat on the table. The calibrated tilt angle was 0.9235 degrees, which is also close to zero, though minor discrepancies likely stem from distortion or measurement inaccuracies.
- We now look at the translation coefficients, which are measured in millimeters:
| Tx = -1.56296899 |
| Ty = -1.61224413 |
| Tz = 59.1900819 |
| Distance = 59.23266 |
The translation from the object’s reference frame to the camera’s reference frame occurs after their coordinate axes have been aligned—meaning the rotation is applied first. In our case, the camera is primarily offset from the object along the z-axis, which is why Tz is the dominant component.
The Tx and Ty values are small and negative, reflecting the minor shifts needed to align the object’s z-axis (represented by a blue dot) with the camera’s z-axis (red dot). We estimated Ty using the ratio between pixel distances on the image and the known size of the grid cells (1×1 cm). The calibrated value of Ty was –1.61 mm, which is close to our estimate of –1.92 mm. A similar approach can be used to estimate Tx.

The calibrated distance between the camera and the object was 59.23 mm, which closely matches our manually measured distance of 62 mm.
Lastly, the focal length f was calibrated to be approximately 5.387 mm. Compared to the actual focal length of 3.0 mm at the time the image was taken, this suggests that while the calibration captured a reasonable approximation, it wasn’t entirely precise.
5. Analysis
To estimate the kappa value, the process begins by projecting the world coordinates into undistorted camera coordinates (in millimeters) and then comparing them to the distorted camera coordinates (also in millimeters) derived from distorted pixel positions.
The relationship between distorted and undistorted coordinates, incorporating kappa₁, is shown in the image below.

The method for estimating kappa₁ is outlined in the accompanying Python script. It works by computing kappa₁ for each reference point pair and then averaging those values to produce an overall estimate.
In this experiment, the initial estimate of kappa₁ was:
| Kappa 1 (mm-2) | -0.016682349122165107 |
The negative sign of kappa₁ is expected. The GoPro Hero5 Black exhibits fisheye distortion, a type of lens distortion similar to barrel distortion, in which a negative kappa₁ pulls pixels toward the center of the image. In this context the image center coincides with the origin of the camera coordinate system: unlike standard image coordinates, which are anchored at the top-left corner, the origin is shifted to the center and the axes are reversed.
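The per-point averaging can be sketched as follows. The specific per-point formula, comparing radial magnitudes under the first-order model Xu = Xd(1 + κ₁r²), is an assumption about the script's internals rather than a verbatim copy:

```python
import math

def estimate_kappa1(undist, dist):
    """Average per-point estimate of the first radial distortion term.

    undist: projected undistorted camera coords (mm); dist: measured
    distorted camera coords (mm). Assumes the first-order model
    Xu = Xd * (1 + k1 * r^2) with r^2 = Xd^2 + Yd^2.
    """
    estimates = []
    for (xu, yu), (xd, yd) in zip(undist, dist):
        r2 = xd * xd + yd * yd
        if r2 > 1e-12:                                  # skip points at the optical center
            ratio = math.hypot(xu, yu) / math.sqrt(r2)  # = 1 + k1 * r^2
            estimates.append((ratio - 1.0) / r2)
    return sum(estimates) / len(estimates)
```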
To evaluate the accuracy of the calibration, two types of error were measured:
- Mean 2D Error:
The 2D error is the Euclidean distance between the measured distorted camera coordinate and the estimated one.
To estimate this, world coordinates in homogeneous form are multiplied by the relevant transformation and projection matrices. A list of the resulting 2D errors (in mm) can be found in Table 8 of the Appendix.
Here is the mean and standard deviation of the 2D errors:
| E2dMean (mm) | 0.02847 |
| E2dStd (mm) | 0.07782 |
With 30 reference points, the average 2D error is about 0.02847 mm. Given the pixel-to-mm ratios (dx = 0.0015425 mm/pixel, dy = 0.0015166 mm/pixel), this corresponds to roughly 18 pixels. On a 4000×3000 resolution image, an 18-pixel discrepancy is minimal and barely noticeable (e.g., shifting an image by 18 pixels in GIMP results in almost no visible change).
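A minimal sketch of this error measure, including the mm-to-pixel conversion quoted above (function and parameter names are illustrative):

```python
import numpy as np

def error_2d(measured, estimated, dx=0.0015425):
    """2D reprojection error between measured and estimated points.

    measured, estimated: (N, 2) camera-plane coordinates (mm).
    Returns (mean mm, std mm, mean pixels), using the Step 1 pixel
    spacing dx to convert mm to pixels.
    """
    d = np.linalg.norm(np.asarray(measured) - np.asarray(estimated), axis=1)
    return d.mean(), d.std(), d.mean() / dx
```

With the report's result, 0.02847 mm / 0.0015425 mm per pixel gives the quoted ~18-pixel figure.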
- Mean 3D Error:
The 3D error is the Euclidean distance between the actual world coordinate and the estimated world coordinate.
This is computed by tracing a ray through the camera’s reference frame and finding its intersection with the world’s X or Z plane. The full method is detailed in the Python script. The list of 3D errors is included in Table 9 of the Appendix.
Here is the mean and standard deviation of the 3D errors:
| E3dMean (mm) | 0.912371 |
| E3dStd (mm) | 1.90176646 |
With 30 reference points, the average 3D error is approximately 0.9 mm. Considering the grid cells used in the setup measure 10×10 mm, this error margin is relatively small and indicates a fairly accurate estimation.
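The core of the 3D-error idea can be sketched as follows: cast a ray from the camera center through the back-projected image point, intersect it with a world plane, and measure the distance to the true point. The helper names are illustrative, and intersection with the Z = 0 plane stands in for the script's X-or-Z-plane handling:

```python
import numpy as np

def intersect_plane_z0(origin, direction):
    """Return the point where the ray origin + t*direction meets Z = 0."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    t = -origin[2] / direction[2]
    return origin + t * direction

def error_3d(actual_world, estimated_world):
    """Euclidean distance (mm) between actual and estimated world points."""
    return float(np.linalg.norm(np.asarray(actual_world, dtype=float)
                                - np.asarray(estimated_world, dtype=float)))
```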
6. Comparison with 236 data points
This section presents the results of the same calibration experiment, but this time using all detected chessboard points as input, and compares them to the earlier results based on just 30 reference points.
| Metric | 30 data points | 236 data points |
| E2dMean (mm) | 0.02847 | 0.22613 |
| E2dStd (mm) | 0.077824 | 0.1828 |
| E3dMean (mm) | 0.91237 | 37.77 |
| E3dStd (mm) | 1.90176 | 47.2239 |
As anticipated, the reprojection errors increased significantly when using all 236 data points. The average 2D error rose to 0.22613 mm, equivalent to about 149 pixels, and the average 3D error reached 37.77 mm, which spans approximately 3 to 4 grid cells.
This increase is expected because many of the marked points lie near the edges of the image, where radial distortion is strongest. These outer points introduce more error into the calibration, affecting most parameters, though the focal length estimate actually improved.
The focal length f was estimated at 3.55 mm (see Figure 10 in the Appendix), which is significantly closer to the actual value of 3.0 mm (Table 1 in the Appendix) compared to the earlier estimate of 5.38 mm (Table 6). This suggests that using a larger number of points spread across the image improves the accuracy of certain parameters like focal length.
7. Improvements
This experiment does not take image distortion into account, which limits the accuracy of the estimated camera parameters. For future studies, it is recommended to determine the radial distortion parameters using non-linear optimization techniques, such as the Newton method, gradient descent, or similar approaches. Once these distortion parameters are known, they can be used to correct the images and obtain undistorted coordinates for more precise calibration.
Additional experiments are also suggested to assess whether the intrinsic camera parameters obtained here remain valid when applied to objects located farther from the camera. In this study, the object was positioned roughly 62 mm from the camera. It would be useful to explore whether calibrating with objects at greater distances leads to reduced error.
Moreover, further investigation is needed to understand how the camera’s placement relative to the object influences calibration accuracy. In this experiment, Tx and Ty were assumed to be close to zero. However, other studies (such as Eric’s report) suggest that the camera should not lie on the XZ plane, implying that Ty should not be near zero, and that placement could have a significant impact on calibration results.
8. Conclusion
This report demonstrated the implementation of the Tsai calibration method to determine both intrinsic and extrinsic camera parameters. The primary calibration used 30 reference points, while an extended dataset of 236 points was included to support error analysis. The evaluation was based on mean reprojection errors, and the results indicate that the calibration was successfully performed.
9. References
[1] R. Tsai, “A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses,” IEEE Journal of Robotics and Automation, vol. 3, no. 4, pp. 323–344, 1987.
[2] Z. Zhang, “A flexible new technique for camera calibration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330–1334, 2000.
10. Appendix
Table 1 Camera parameters
| Make and Model | GoPro Hero5 Black |
| Sensor type | CMOS |
| Resolution | 4000×3000 pixels |
| Sensor size | 1/2.3’’ (6.17 x 4.55 mm) |
| Focal length multiplier | 5.64 |
| Distortion | Fish-eye |
| Focal length (photo) | 3.0 mm |
| Narrow FOV focal length | 28 mm |
Table 2 Environment parameters
| Origin of the object coordinate system (pix) | (2092, 1600) |
| Grid size (cm x cm) | 1×1 |
| Image center (pix) | (2000, 1500) |
| Camera to ground height (meters) | 0.092 |
| Object origin to ground height (meters) | 0.086 |
| Distance from camera to the object (m) | 0.062 |
Table 3 Calibration data points
| Xw mm | Yw mm | Zw mm | Xd pix | Yd pix |
| 0 | -20 | 10 | 2466 | 2625 |
| 0 | -10 | 10 | 2469 | 2118 |
| 0 | -10 | 20 | 2751 | 2043 |
| 0 | -10 | 30 | 2976 | 1980 |
| 0 | 0 | 10 | 2466 | 1572 |
| 0 | 0 | 20 | 2757 | 1557 |
| 0 | 0 | 30 | 2985 | 1548 |
| 0 | 0 | 40 | 3165 | 1527 |
| 0 | 0 | 50 | 3312 | 1524 |
| 0 | 10 | 10 | 2457 | 1029 |
| 0 | 10 | 20 | 2748 | 1071 |
| 0 | 10 | 30 | 2973 | 1107 |
| 0 | 20 | 10 | 2445 | 510 |
| 0 | -20 | 0 | 2104 | 2776 |
| 10 | -20 | 0 | 1708 | 2652 |
| 0 | -10 | 0 | 2096 | 2208 |
| 10 | -10 | 0 | 1696 | 2136 |
| 20 | -10 | 0 | 1364 | 2068 |
| 30 | -10 | 0 | 1100 | 2012 |
| 0 | 0 | 0 | 2092 | 1596 |
| 10 | 0 | 0 | 1690 | 1588 |
| 20 | 0 | 0 | 1096 | 1571 |
| 30 | 0 | 0 | 1092 | 1570 |
| 40 | 0 | 0 | 884 | 1560 |
| 0 | 10 | 0 | 2084 | 996 |
| 10 | 10 | 0 | 1684 | 1037 |
| 20 | 10 | 0 | 1355 | 1086 |
| 30 | 10 | 0 | 1094 | 1124 |
| 0 | 20 | 0 | 2080 | 392 |
| 10 | 20 | 0 | 1680 | 504 |
Table 4 Camera coordinates of the calibration data points (Xd mm, Yd mm)

Table 5 The calculated seven unknowns
| L1 | -0.45990048635890823 |
| L2 | -0.007413378887651565 |
| L3 | 0.43837657450074563 |
| L4 | 0.9931157539934758 |
| L5 | -0.0030168961818809164 |
| L6 | -0.6201225469638754 |
| L7 | -0.012380666232319315 |
Table 6 The calculated extrinsic and intrinsic parameters with 30 reference points
| R11 | 0.7237929659931259 |
| R12 | 0.011667201171291419 |
| R13 | -0.6899185595385168 |
| R21 | 0.004863973152674891 |
| R22 | 0.9997889347057825 |
| R23 | 0.019960656428251837 |
| R31 | 0.6900065164876475 |
| R32 | -0.017803145868804843 |
| R33 | 0.7235841728518912 |
| Tx (mm) | -1.562968986722217 |
| Ty (mm) | -1.6122441275531048 |
| Tz (mm) | 59.1900819439242 |
| F (mm) | 5.387079764033248 |
| Sx | 1.024425344302128 |
Table 7a. Evaluated metrics for 30 reference points
| K1 first estimate (mm-2) | -0.016682349122165107 |
| Determinant 1 (step 2) | 8.57040865828559e+20 |
| Determinant 2 (step 6) | 777.8818643908144 |
| Rotation magnitude avg | 0.9999999999451749 |
| Yaw angle (deg) | 43.62366249575263 |
| Pitch angle (deg) | 1.5800896496011994 |
| Tilt angle (deg) | 0.9235009554590418 |
| Distance (mm) | 59.232659941190775 |
| E2dMean (mm) | 0.028470959838563046 |
| E2dStd (mm) | 0.0778247536594827 |
| E3dMean (mm) | 0.912371035599039 |
| E3dStd (mm) | 1.9017664667890135 |
Table 7b. Evaluated metrics for 236 reference points
| K1 first estimate (mm-2) | 0.3864329251755696 |
| Determinant 1 (step 2) | 7.410289143106744e+35 |
| Determinant 2 (step 6) | 8271398.775090234 |
| Rotation magnitude avg | 0.9999984398244774 |
| Yaw angle (deg) | 44.481314909942604 |
| Pitch angle (deg) | 6.367712556665941 |
| Tilt angle (deg) | 4.469331695512939 |
| Distance (mm) | 31.758136709013204 |
| E2dMean (mm) | 0.2261378112461626 |
| E2dStd (mm) | 0.1828498237220668 |
| E3dMean (mm) | 37.770069752484446 |
| E3dStd (mm) | 47.22391182233289 |
10.1. Experimental Protocol
The Experimental Protocol outlines the procedure for conducting the experiment and specifies which parameters will be recorded or measured.
There are three categories of parameters to be measured: those related to the calibration object, the camera, and the environment, including the relative positioning of the camera with respect to the object.
Object:
It must be possible to determine the model coordinates of each point on the object from the image. Key parameters to define include:
- A reference point marking the origin of the model coordinate system
- The physical dimensions of the pattern present on the object
Camera:
The camera’s intrinsic parameters need to be documented to calculate the physical distance between pixels. These include:
- Camera make and model
- Megapixel resolution (pixel grid dimensions)
- Physical pixel size (in units of length)
Environment:
An approximate estimation of rotation angles and translation components of the transformation matrix is required, such as:
- Translation parameters (tx, ty, tz) or distance relative to the object origin
- Distance from the camera center to the ground or table
- Distance from the image center to the ground
While the placement of the camera origin relative to the object origin should not affect the calibration outcome, the camera should be positioned to minimize tilt and pitch angles to simplify the validation of calibration parameters.
Depending on the camera and whether the uncertainty scale factor (sx) is known beforehand, different types of calibration objects are required. If sx is known, a simple, single-view coplanar point set suffices. If sx is unknown, a non-coplanar calibration object made up of two or more orthogonal planes should be used.
Table 8 2D error Euclidean distances in mm

Table 9 3D Error Euclidean distances in mm

