BestMoment Annotation Example using our pipeline and GPT4o API, which is the best in action retrieval.
This dataset aims to address the issues of high cost (¥0.03/character) in manual annotation and low accuracy (accuracy manual:GPT:ours=95%:70%:95%) in GPT4o annotation for pose description. It is divided into two versions: (a) single-frame pose description and (b) dual-frame pose change description, used for training text-to-frame models and image editing models.