FocalPose overview. Given a single in-the-wild RGB input image \(I\) and a known 3D model \(\mathcal{M}\) of the depicted object, the parameters \(\theta^k\), composed of the focal length \(f^k\) and the 6D object pose (3D translation \(t^k\) and 3D rotation \(R^k\)), are iteratively updated using our render-and-compare approach. At iteration \(k\), the rendering \(R\) of the model under the current parameters and the input image \(I\) are given to a deep neural network \(F\), which predicts an update \(\Delta \theta^k\). This prediction is then converted into the updated parameters \(\theta^{k+1}\) using a non-linear update rule \(U\).
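The loop described in this overview can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation behind the figure: `render` is a hypothetical stand-in that projects model points with a pinhole camera instead of rasterizing a mesh, `predict_update` is a hypothetical placeholder for the network \(F\) that always returns the identity update, and `apply_update` is one possible non-linear rule \(U\) (multiplicative focal length and depth updates, image-space shift for the lateral translation) chosen for illustration, since the caption does not specify the form of \(U\).

```python
import numpy as np


def render(points, f, R, t):
    """Hypothetical stand-in renderer: pinhole projection of (N, 3) model points."""
    cam = points @ R.T + t                 # model frame -> camera frame
    return f * cam[:, :2] / cam[:, 2:3]    # perspective division, scaled by focal length


def predict_update(image, rendering):
    """Hypothetical placeholder for the network F.

    A trained network would compare `image` and `rendering`; this stub
    returns the identity update (v_f = 0, dR = I, v = (0, 0, 1)).
    """
    return 0.0, np.eye(3), np.array([0.0, 0.0, 1.0])


def apply_update(f, R, t, v_f, dR, v):
    """Assumed non-linear update rule U (illustrative, not taken from the caption)."""
    f_new = np.exp(v_f) * f                          # focal length stays positive
    z_new = v[2] * t[2]                              # multiplicative depth update
    x_new = (v[0] / f_new + t[0] / t[2]) * z_new     # shift expressed in image space
    y_new = (v[1] / f_new + t[1] / t[2]) * z_new
    return f_new, dR @ R, np.array([x_new, y_new, z_new])


def refine(image, points, f, R, t, n_iters=3, network=predict_update):
    """Render-and-compare loop: render, predict an update, apply it, repeat."""
    for _ in range(n_iters):
        rendering = render(points, f, R, t)
        v_f, dR, v = network(image, rendering)
        f, R, t = apply_update(f, R, t, v_f, dR, v)
    return f, R, t
```

Under this assumed rule, the update is non-linear in the parameters: the focal length is scaled by an exponential, and the lateral translation depends on both the new focal length and the new depth. Passing a trained model as `network` in place of the stub would turn the sketch into an actual refinement loop.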