Nils Reimers on Cohere Embedding Models
Weaviate Podcast - A podcast by Weaviate
Weaviate Podcast #33. Thank you so much for watching the 33rd Weaviate Podcast! This episode features one of the heroes of Deep Learning for Search, Nils Reimers! Nils' work on SentenceBERT is one of the foundational works for applying Deep Representation Learning to text search, and it is the idea that personally inspired me to work in this field. Having seen the successes of Contrastive Representation Learning in Computer Vision, I was amazed by the possibility of applying it to NLP and text search. In addition to this scientific foundation, the software development behind the Sentence Transformers library and the BEIR benchmark has been enormously impactful!

It was an honor to ask Nils the questions I had about these topics, from the role of Data Quality to Intent, Sparse Vectors, Long Document Encoding, Distribution Shift, and many more. I really hope you enjoy the podcast! We are so excited about the Cohere Multilingual embedding model and can't wait to see what else comes out of Cohere and their amazing team!

Cohere Multilingual ML Models with Weaviate: https://weaviate.io/blog/2022/12/Cohe...
Nils Reimers: https://scholar.google.com/citations?...
Mentioned in the podcast, Cross-Encoders: https://weaviate.io/blog/2022/08/Usin...
How to choose a Sentence Transformer from HuggingFace: https://weaviate.io/blog/2022/10/How-...

Chapters
0:00 Cohere X Weaviate
0:22 Welcome Nils Reimers!
1:18 Origin Story
3:15 Learning Text Embeddings
6:54 Positive and Negative Sampling in Contrastive Learning
13:32 1 Billion Pairs for Text Embedding Optimization
15:44 Impact of Data Quality
18:40 New Cohere Multilingual Model!
24:50 Challenge of Debugging Multilingual Models
28:30 Intent in Search
30:40 Thoughts on ColBERT
33:50 Sparse Vectors in Search
36:17 Long Documents and Multi-Discourse
43:40 Entity Parsing in Query Understanding
46:08 Unknown Words and Distribution Shift
50:07 Re-Vectorizing with Fine-Tuning
53:07 More on Search Interfaces and Intent in Search
55:15 Thank you Nils!