Soon after OpenAI released GPT-4o last Monday, some Chinese speakers started to notice that something seemed off about this newest version of the chatbot: the tokens it uses to parse text were full of spam and porn phrases.
Humans read in words, but LLMs read in tokens, which are distinct units in a sentence that have consistent and significant meanings. GPT-4o is supposed to be better than its predecessors at handling multi-language tasks, and many of the advances were achieved through a new tokenization tool that does a better job of compressing texts in non-English languages.
But at least when it comes to the Chinese language, the new tokenizer used by GPT-4o has introduced a disproportionate number of meaningless phrases, and experts say that is likely due to insufficient data cleaning and filtering before the tokenizer was trained. If left unresolved, it could lead to hallucinations, poor performance, and misuse. Read the full story.
—Zeyi Yang
Astronomers are enlisting AI to prepare for a data downpour
In deserts across Australia and South Africa, astronomers are planting forests of metallic detectors that will together scour the cosmos for radio signals. When it boots up in five years or so, the Square Kilometer Array Observatory will look for new information about the universe’s first stars and the different stages of galactic evolution.
But after synching hundreds of thousands of dishes and antennas, astronomers will quickly face a new challenge: combing through some 300 petabytes of cosmological data a year, enough to fill a million laptops. So in preparation for the information deluge, astronomers are turning to AI for assistance. Read the full story.