How will we know if AI is smart enough to do science?

New tests gauge whether large language models can use their deep troves of knowledge to actually make discoveries