#36 Paris Women in Machine Learning & Data Science: joint meetup with Dakar
We loved meeting chapters abroad during our European tour last year, with Poznan, Dresden and Milan among others. Why not go a little bit further? On February 8th, we were glad to have a joint meetup with the Dakar chapter!
The ball started with a presentation of WiMLDS: did you know that there are more than 100 chapters in the world now? This is awesome 🔥
Our co-founder Chloé Azencott presented the Paris chapter. We have big news: after more than two years of thinking about organizing a scikit-learn sprint, it is finally becoming real! It will happen on March 12, as a fully on-site event.
On Dakar’s side, Mariame Drame presented their chapter. It was founded in 2020: you couldn’t tell they are so new from how perfectly they handled the organization!
This meetup was truly global, as our speaker Azade Farshad is a Research Scientist at the Technical University of Munich. She came to present her recent paper accepted at the British Machine Vision Conference 2021, “Meta Image Generation from Scene Graphs”.
Scene graphs are semantic graphs that describe a scene. The nodes are the objects (e.g. bike, tee-shirt, person), the edges represent the relationships between objects (e.g. geographical link, comparison, action), and the nodes have attributes, which represent their properties (e.g. color). The goal is to create an image based on a scene graph.
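To make the structure concrete, here is a minimal sketch of a scene graph as a plain Python data structure. The class and method names (`SceneObject`, `Relationship`, `SceneGraph`, `add_object`, `relate`, `triples`) and the example scene are hypothetical, chosen for illustration only; this is not the representation used in Azade's paper.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """A node of the scene graph: one object plus its attributes."""
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed edge: subject --predicate--> object (indices into objects)."""
    subject: int
    predicate: str
    obj: int


@dataclass
class SceneGraph:
    """Hypothetical container holding the nodes and edges of one scene."""
    objects: list = field(default_factory=list)
    relationships: list = field(default_factory=list)

    def add_object(self, name, **attributes):
        # Store the node and return its index so edges can refer to it.
        self.objects.append(SceneObject(name, attributes))
        return len(self.objects) - 1

    def relate(self, subject, predicate, obj):
        self.relationships.append(Relationship(subject, predicate, obj))

    def triples(self):
        # Human-readable (subject, predicate, object) view of the edges.
        return [
            (self.objects[r.subject].name, r.predicate, self.objects[r.obj].name)
            for r in self.relationships
        ]


# Made-up example scene: a person wearing a red tee-shirt and riding a blue bike.
graph = SceneGraph()
person = graph.add_object("person")
shirt = graph.add_object("tee-shirt", color="red")
bike = graph.add_object("bike", color="blue")
graph.relate(person, "wearing", shirt)
graph.relate(person, "riding", bike)
print(graph.triples())
```

An image generation model would take such a graph as input and produce a picture consistent with its objects, attributes and relationships.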
Meta-learning is learning from several datasets at the same time. It’s a useful method in several situations, including when one has few training samples. Azade and her colleagues used this method to generate images. Although most of the time the generated images wouldn’t fool anyone (as Azade honestly showed), the results are still impressive!
Sokhar Samb, Data Scientist at Theolex, took over for the next part of the event. She presented the challenge of doing Neural Machine Translation for low-resource languages, taking English–Wolof as a use case.
There are about 7,000 languages in the world, but only about 20 of them have a corpus large enough to perform machine learning on! Wolof is one of these 6,980 low-resource languages, and it is the one Sokhar focused her talk on.
To train her translation model, Sokhar used the JW300 parallel corpus, a dataset that contains 17,945 sentences in both English and Wolof (and many other languages). This dataset is incredibly heterogeneous, as the sentences come from the Bible… and the GNOME user manual! It is also a little on the small side for training an NLP model. Therefore, she used data augmentation techniques to extend her original dataset with synthetic data, and compared the results with a baseline trained on the original dataset only. And the results were indeed better! The perplexity score decreased a little, and the BLEU score increased by more than 4 points!
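For readers unfamiliar with these two metrics, here is a small sketch of what they measure. Lower perplexity is better (the model is less "surprised" by the reference text), while higher BLEU is better (the output shares more n-grams with the reference). The function names and the toy inputs are made up for illustration; they are not Sokhar's code or results, and the second function is only the clipped unigram precision that BLEU builds on, not the full score.

```python
import math
from collections import Counter


def perplexity(token_probs):
    """Perplexity = exp(mean negative log-probability of the reference tokens).

    Assumes a non-empty list of probabilities in (0, 1].
    """
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


def modified_unigram_precision(candidate, reference):
    """Clipped unigram precision, the 1-gram building block of BLEU.

    Each candidate word counts at most as often as it appears in the
    reference, so repeating a correct word does not inflate the score.
    Assumes whitespace tokenization and a non-empty candidate.
    """
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    clipped = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Toy inputs: a model that gives each of 4 reference tokens probability 0.25
# is as uncertain as a uniform choice among 4 options (perplexity close to 4).
print(perplexity([0.25, 0.25, 0.25, 0.25]))

# "the" appears 3 times in the candidate but only once in the reference,
# so only 2 of the 4 candidate words are credited.
print(modified_unigram_precision("the the the cat", "the cat sat"))  # 0.5
```

The full BLEU score combines such clipped precisions for n-grams of length 1 to 4 with a geometric mean and a brevity penalty, and is usually reported on a 0 to 100 scale, which is the scale on which a gain of "4 points" is meaningful.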
The meetup was recorded! You can find it here:
See you in March!
You can also:
📑 check our Google spreadsheet if you want to speak 📣, host 💙, help 🌠
📍 join our Slack channel for more discussions about machine learning, data science, and diversity in tech!
📩 send an email to the Paris WiMLDS team to keep in touch: email@example.com
🔥 share your company or lab’s job positions for free on WiMLDS’ website.