4 Comments

Great post. I didn't expect the wheels to fall off the training compute risk threshold this quickly!

What do they mean by this: "We also do not want to make an unaligned chain of thought directly visible to users."? It sounds very much like "We don't want to show you the reasoning, because in some cases the reasoning will be spurious." But surely those are the exact cases in which a user should be able to audit the reasoning?


I think there are a mixture of reasons for hiding the chain of thought. One is that it might be more expensive to reveal it, since compute would need to be expended to produce the tokens (maybe, not totally sure on this). Another is the safety reason they mention - I suppose the ‘unaligned chain of thought’ may reveal undesirable reasoning steps that OpenAI would not want others to use for training their own models. Linked to this is a third possible reason, which is IP-related: maintaining competitive advantage by keeping the secret sauce secret, since everything else about these LLMs is pretty much replicable (at least for the big tech developers with enough resources, like training data, compute, talent etc).

But you’re right, the chain of thought maybe ought to be accessible so that others can evaluate performance and safety; for now we are just relying on OpenAI’s internal testing and trusting that the model is carrying out the ‘correct’ reasoning steps…


Yeah, the ability to see the chain of reasoning seems like it'd be helpful in addressing explainability/auditability/human oversight concerns. If you can't see inside the black box, at least you can break it down into smaller ones and see the connections between them, look at the inputs and outputs, and perhaps spot where things are going wrong. A rough sketch of what I mean is below.
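For what it's worth, here's a minimal sketch of that decomposition idea: instead of one opaque end-to-end call, the task is split into explicit steps and the input/output of each hand-off is logged so a reviewer can audit it. The `call_model` function is a hypothetical placeholder, not any real API.

```python
from dataclasses import dataclass, field


@dataclass
class AuditTrail:
    """Keeps a record of every intermediate input and output."""
    steps: list = field(default_factory=list)

    def record(self, name: str, prompt: str, output: str) -> str:
        self.steps.append({"step": name, "input": prompt, "output": output})
        return output


def call_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; swap in whatever client you use."""
    return f"<model output for: {prompt[:40]}...>"


def answer_with_visible_steps(question: str, trail: AuditTrail) -> str:
    # Step 1: have the model restate the problem in its own words.
    restated = trail.record(
        "restate", question, call_model(f"Restate the problem: {question}")
    )
    # Step 2: ask for an outline of the reasoning before the final answer.
    plan = trail.record(
        "plan", restated, call_model(f"Outline the reasoning steps for: {restated}")
    )
    # Step 3: produce the final answer from the outline.
    return trail.record(
        "answer", plan, call_model(f"Using this outline, answer: {plan}")
    )


if __name__ == "__main__":
    trail = AuditTrail()
    answer_with_visible_steps("Why might hiding chain of thought hurt auditability?", trail)
    # Every hand-off is now inspectable, which is where a reviewer
    # might spot a step going wrong.
    for step in trail.steps:
        print(step["step"], "->", step["output"])
```

Obviously the hidden chain of thought inside a single model call isn't the same thing as these explicit hand-offs, but the same principle applies: the more of the intermediate reasoning you can see, the easier oversight becomes.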


Yes definitely
