Mistral releases ‘Pixtral 12B,’ its first multimodal AI model
French AI startup Mistral has released its first multimodal model, Pixtral 12B, which can handle both text and images, according to TechCrunch. The model has 12 billion parameters and is built on Mistral's Nemo 12B text model. Pixtral 12B can answer questions about images supplied either as URLs or as base64-encoded data, such as how many copies of a certain object are visible.
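To illustrate the base64 approach, the sketch below encodes raw image bytes and embeds them in a chat-style JSON payload. The model name, message schema, and data-URL convention here are assumptions for illustration; the exact request format of Mistral's API may differ.

```python
import base64
import json


def build_image_question_payload(image_bytes: bytes, question: str) -> dict:
    # Encode the raw image bytes as base64 so they can be embedded
    # directly in the JSON request body instead of referenced by URL.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # Hypothetical chat-style payload -- schema assumed, not taken
    # from Mistral's documentation.
    return {
        "model": "pixtral-12b",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": f"data:image/png;base64,{encoded}",
                    },
                ],
            }
        ],
    }


payload = build_image_question_payload(
    b"\x89PNG...", "How many apples are visible in this image?"
)
print(json.dumps(payload)[:40])
```

The same payload shape works whether the image comes from disk or over the network; only the bytes passed to the encoder change.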
Most generative AI (genAI) models have been partially trained on copyrighted material, which has led to lawsuits from copyright owners. (AI companies claim that the tactic should be classified as fair use.)
It is unclear what image data Mistral used to develop the Pixtral 12B.
The multimodal model checks in at about 24 gigabytes, can be downloaded via GitHub and the Hugging Face machine learning platform, and can be used and modified without restriction under an Apache 2.0 license.