Martin Cífka

I am a Ph.D. student in the Intelligent Machine Perception group at the Czech Institute of Informatics, Robotics and Cybernetics (CIIRC, CTU Prague) under the supervision of Josef Šivic. I obtained my M.Sc. degree in Visual Computing at the Faculty of Mathematics and Physics, Charles University in Prague (MFF, CUNI) in 2024. My research interests include 6D pose estimation and object detection in uncontrolled environments.

selected publications

6D Object Pose Tracking in Internet Videos for Robotic Manipulation

Georgy Ponimatkin, Martin Cífka, Tomas Soucek, Médéric Fourmy, Yann Labbé, Vladimir Petrik, and Josef Sivic

In The Thirteenth International Conference on Learning Representations, 2025

We seek to extract a temporally consistent 6D pose trajectory of a manipulated object from an Internet instructional video. This is a challenging set-up for current 6D pose estimation methods due to uncontrolled capturing conditions, subtle but dynamic object motions, and the fact that the exact mesh of the manipulated object is not known. To address these challenges, we present the following contributions. First, we develop a new method that estimates the 6D pose of any object in the input image without prior knowledge of the object itself. The method proceeds by (i) retrieving a CAD model similar to the depicted object from a large-scale model database, (ii) 6D aligning the retrieved CAD model with the input image, and (iii) grounding the absolute scale of the object with respect to the scene. Second, we extract smooth 6D object trajectories from Internet videos by carefully tracking the detected objects across video frames. The extracted object trajectories are then retargeted via trajectory optimization into the configuration space of a robotic manipulator. Third, we thoroughly evaluate and ablate our 6D pose estimation method on the YCB-V and HOPE-Video datasets as well as on a new dataset of instructional videos manually annotated with approximate 6D object trajectories. We demonstrate significant improvements over existing state-of-the-art RGB 6D pose estimation methods. Finally, we show that the 6D object motion estimated from Internet videos can be transferred to a 7-axis robotic manipulator both in a virtual simulator and in a real-world set-up. We also successfully apply our method to egocentric videos taken from the EPIC-KITCHENS dataset, demonstrating potential for Embodied AI applications.
@inproceedings{ponimatkin2025d,
  title = {{{6D}} {{Object}} {{Pose}} {{Tracking}} in {{Internet}} {{Videos}} for {{Robotic}} {{Manipulation}}},
  author = {Ponimatkin, Georgy and C{\'i}fka, Martin and Soucek, Tomas and Fourmy, M{\'e}d{\'e}ric and Labb{\'e}, Yann and Petrik, Vladimir and Sivic, Josef},
  booktitle = {The Thirteenth International Conference on Learning Representations},
  year = {2025},
}

FocalPose++: Focal Length and Object Pose Estimation via Render and Compare

Martin Cífka, Georgy Ponimatkin, Yann Labbé, Bryan Russell, Mathieu Aubry, Vladimir Petrik, and Josef Sivic

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

We introduce FocalPose++, a neural render-and-compare method for jointly estimating the camera-object 6D pose and camera focal length given a single RGB input image depicting a known object. The contributions of this work are threefold. First, we derive a focal length update rule that extends an existing state-of-the-art render-and-compare 6D pose estimator to address the joint estimation task. Second, we investigate several different loss functions for jointly estimating the object pose and focal length. We find that a combination of direct focal length regression with a reprojection loss disentangling the contribution of translation, rotation, and focal length leads to improved results. Third, we explore the effect of different synthetic training data on the performance of our method. Specifically, we investigate different distributions used for sampling the object’s 6D pose and the camera’s focal length when rendering the synthetic images, and show that a parametric distribution fitted to real training data works best. We show results on three challenging benchmark datasets that depict known 3D models in uncontrolled settings. We demonstrate that our focal length and 6D pose estimates have lower error than the existing state-of-the-art methods.
@article{cifka2024focalpose++,
  title = {{F}ocal{P}ose++: {F}ocal {L}ength and {O}bject {P}ose {E}stimation via {R}ender and {C}ompare},
  author = {C{\'i}fka, Martin and Ponimatkin, Georgy and Labb{\'e}, Yann and Russell, Bryan and Aubry, Mathieu and Petrik, Vladimir and Sivic, Josef},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year = {2025},
  volume = {47},
  number = {2},
  pages = {755-772},
  doi = {10.1109/TPAMI.2024.3475638},
}