#18 Paris Women in Machine Learning & Data Science: Knowledge Graph, Language Detection & Twitter

WiMLDS Paris
5 min read · Aug 1, 2019
Marina Vinyes, Morgane Dalbergue, Caroline Chavier, Natalie Cernecka, Sanghamitra Deb, Cécile Chailloux & Mathilde Kurzawa

Our 18th meetup at Heetch will remain unforgettable. After a snowy ⛄ meetup in January, we picked the hottest day of the year 🌞 to organize our 4th “Hors-série” (special edition) meetup!

After Etienne Dancoisne, Head of Data Science & Analytics at Heetch (see his slides below ⬇️), kicked off the meetup, our very own Natalie Cernecka shared a few announcements:

Kate Crawford, Co-Founder of the AI Now Institute, will be giving a unique lecture at ENS for the launch of the #AbeonaENSchair in #AI_Justice. Do not miss the opportunity to listen to her!

Register

💟 Several members of the WiMLDS Paris group will attend a summer camp organized by ESPCI to give young female students a taste of what a STEM career looks like (from June 29 to August 2, 2019).

💌 You can follow us on LinkedIn, join our Slack channel or reach out to the WiMLDS Paris team via paris@wimlds.org

📺 The talk has been recorded and is available on our YouTube channel 📺

1️⃣ All the way from California, Sanghamitra Deb kick-started the evening with a talk about how to build a knowledge graph using weak supervision. After a Ph.D. in astrophysics, Sangha switched to Machine Learning and is now a Data Scientist at Chegg, an edtech company.

Sangha presented a complete pipeline for building a knowledge graph. The top tips were:

  • Use weak supervision. Labeled data is very expensive to get. As an alternative, use other, noisier sources to generate your dataset. In NLP, a useful technique is to write rules and heuristics that label examples automatically. Sangha highlighted the Python package Snorkel, which can be used for weak supervision.
  • Use active learning to improve the accuracy of your model. For instance, if the classification probability of an example is below some threshold, send it to a human to be labeled.
  • Play with the threshold. If you only classify an example as positive when its probability is above some threshold, that threshold becomes a knob you can tune. A high threshold increases the share of predicted positives that are true positives (precision), at the cost of coverage: fewer examples get classified at all.
Sanghamitra Deb, Chief Data Scientist at Chegg
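
To make the three tips concrete, here is a minimal sketch in plain Python. It is not the pipeline shown in the talk and it does not use Snorkel's API: the labeling functions, the toy task (does a sentence state a definition?), the simple vote averaging, and the 0.8 threshold are all assumptions made for illustration. Snorkel itself learns how much to trust each labeling function rather than averaging votes like this.

```python
from typing import Callable, List, Optional

POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1

# Hypothetical labeling functions: each encodes one noisy heuristic
# and abstains when the heuristic does not apply.
def lf_is_a_pattern(sentence: str) -> int:
    return POSITIVE if " is a " in sentence.lower() else ABSTAIN

def lf_defined_as(sentence: str) -> int:
    return POSITIVE if "defined as" in sentence.lower() else ABSTAIN

def lf_question(sentence: str) -> int:
    return NEGATIVE if sentence.strip().endswith("?") else ABSTAIN

LABELING_FUNCTIONS: List[Callable[[str], int]] = [
    lf_is_a_pattern,
    lf_defined_as,
    lf_question,
]

def positive_score(sentence: str) -> Optional[float]:
    """Share of non-abstaining votes that say POSITIVE (None if all abstain)."""
    votes = [lf(sentence) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return None
    return sum(votes) / len(votes)

def triage(score: Optional[float], threshold: float = 0.8) -> str:
    """Keep confident labels; route everything else to a human annotator."""
    if score is None:
        return "needs_label"
    confidence = max(score, 1.0 - score)
    if confidence < threshold:
        return "needs_label"  # active-learning step: ask for a human label
    return "positive" if score >= 0.5 else "negative"

if __name__ == "__main__":
    for text in [
        "A photon is a quantum of light.",
        "What is a photon?",
        "Photons travel fast.",
    ]:
        print(text, "->", triage(positive_score(text)))
```

Raising `threshold` sends more examples to annotators and keeps only the labels the heuristics agree on, which is the precision-versus-coverage trade-off described above.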

2️⃣ We continued the evening with Cécile Chailloux, who presented another NLP task: High-performance language detection of web pages. Cécile is a semantic engineer at Dashlane, a company that secures and manages your passwords.

The task consists of detecting the language of “tiny” webpages (only tens of words) in a fast and secure way. For privacy and performance reasons, the solution needs to avoid external dependencies (no APIs).

Their solution for recognizing language is based on n-grams. Cécile stressed that “Ad hoc solutions are worth it. Don’t be too ambitious and be good at your specific task.” To illustrate this point she gave specific examples:

  • Use your own dataset (and not Wikipedia) to reduce your scope. A narrower scope improves your model on the pages you actually care about, and training is faster.
  • Many pages are multilingual, which makes detecting the language much harder. After a quick analysis, they could see that most multilingual cases are the original language + English. The chosen solution is to label these pages as English. A pragmatic solution that worked.
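
As an illustration of the n-gram idea, here is a minimal, dependency-free sketch in Python. It is not Dashlane's implementation: the two training sentences, the choice of character trigrams, and the cosine-similarity scoring are assumptions picked to keep the example short. A real detector would build its profiles from a much larger, task-specific dataset, as recommended above.

```python
from collections import Counter
from math import sqrt

def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams of lowercased, whitespace-normalized text."""
    padded = f" {' '.join(text.lower().split())} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical, tiny training samples (one per language).
SAMPLES = {
    "en": "please enter your password to sign in to your account and continue",
    "fr": "veuillez saisir votre mot de passe pour vous connecter à votre compte",
}
PROFILES = {lang: char_ngrams(text) for lang, text in SAMPLES.items()}

def detect_language(text: str) -> str:
    """Return the language whose n-gram profile is closest to the text."""
    grams = char_ngrams(text)
    return max(PROFILES, key=lambda lang: cosine(grams, PROFILES[lang]))

if __name__ == "__main__":
    for snippet in ["enter your password", "mot de passe"]:
        print(snippet, "->", detect_language(snippet))
```

Because everything runs locally on in-memory counts, no page content has to leave the device, which matches the privacy and performance constraints of the task.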

Finally, Cécile talked about new challenges, such as recognizing languages written in non-alphabetic scripts, which will require character recognition.

Cécile Chailloux, Semantic Engineer at Dashlane

3️⃣ The evening concluded with a panel discussion moderated by our own Natalie Cernecka about Twitter and how to use it. Mathilde Kurzawa, Morgane Dalbergue, and our very own Caroline Chavier shared dos and don’ts with the audience.

Thread about the panel by Betty Moreschini
  • Don’t get lost. Remember you can use lists to separate subjects. You can track subjects that are important to you, like AI conferences.
  • Brand yourself. Authenticity is important. Also, don’t forget to highlight other people’s expertise. If you want to grow your follower base, visuals like emojis and GIFs help 😊.
  • Don’t be afraid to interact. People you are following in your field are usually nice and willing to interact if you message them. Morgane shared more tips in French here.
  • About negativity: block and report. Don’t hesitate to support the person being abused.
Mathilde Kurzawa, Morgane Dalbergue & Caroline Chavier

To conclude, the WiMLDS Paris team wishes you an amazing summer 💛

We will be back on the 12th of September 2019 with an exceptional joint meetup with the DDD meetup at vpTech! Plus, we will organize a second meetup in September on the 26th! #StayTuned ➡️ here ⬅️

If you want to stay posted about our activities, you are welcome to:

📑check our Google spreadsheet if you want to speak 📣, host 💙, help 🌠

🔗follow our Twitter account, Meetup page, Instagram or LinkedIn page

📍join our Slack channel for more discussions about machine learning, data science, and diversity in tech!

📩send an email to the Paris WiMLDS team to keep in touch: paris@wimlds.org

🎬 Follow our WiMLDS Paris YouTube channel

📸 WiMLDS has an Instagram account, a global LinkedIn page and a Facebook page!

🔥 You can post your company’s or lab’s job openings for free on the WiMLDS website.

A special thanks to Mathilde Kurzawa and the Heetch team for their warm welcome, Betty Moreschini for live-tweeting and Morgane Dalbergue for replacing Marie Langé on the fly for the panel discussion!

WiMLDS Paris is a community of women interested in Machine Learning & Data Science