Generate 5 thoughts, prune the 3 weakest, branch from the survivors, repeat. I think that's what o1 pro and o3 do.
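For anyone who wants the loop spelled out, here is a minimal sketch of generate/prune/branch as a small beam search. It is an illustration only, not a claim about how o1 pro or o3 work internally. The names `thought_search`, `toy_propose`, and `toy_score` are hypothetical, and pruning is assumed to be global across all candidates at each depth (5 generated per surviving path, best 2 kept overall). In a real setup `propose` would sample continuations from a model and `score` would be some learned or heuristic evaluator.

```python
import heapq
from typing import Callable, List


def thought_search(
    root: str,
    propose: Callable[[str, int], List[str]],
    score: Callable[[str], float],
    n_generate: int = 5,
    n_prune: int = 3,
    depth: int = 3,
) -> str:
    """Expand each surviving path into n_generate thoughts, drop the weakest, repeat."""
    keep = n_generate - n_prune  # 5 generated, 3 pruned -> 2 survivors (assumed global)
    frontier = [root]
    for _ in range(depth):
        # Branch: every surviving path proposes n_generate continuations.
        candidates = [c for path in frontier for c in propose(path, n_generate)]
        if not candidates:
            break
        # Prune: keep only the highest-scoring candidates for the next round.
        frontier = heapq.nlargest(keep, candidates, key=score)
    return max(frontier, key=score)


# Toy stand-ins (hypothetical): a "thought" is a digit string, and a
# higher digit sum counts as a better line of reasoning.
def toy_propose(path: str, n: int) -> List[str]:
    return [path + str(i) for i in range(n)]


def toy_score(path: str) -> float:
    return float(sum(int(ch) for ch in path))


best = thought_search("", toy_propose, toy_score)
```

With the toy functions, `best` is the digit string the search considers strongest after three rounds. Swapping the global cut for per-parent pruning would make the frontier double every round instead of staying at two paths.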

  • artificialfish@programming.devOP
    3 hours ago

    All theoretical, but I would cut the decoder off a very smart chat model, then fine-tune the encoder to output a score on the rationality test dataset under chain-of-thought (CoT) prompting.