Abdesselam - Data Scientist NLP
Ref: 191013B001
Location
91400 ORSAY
-
Profile
Data Scientist (37 years old)
-
Mobility: Fully mobile
-
Status: Registration in progress
-
R&D at SNCF and LIUM
Le Mans - Paris, Jan 2017 - Jan 2017
Project
- Text classification of SNCF documents. The task consists of assigning categories to a text according
to its content. The implemented system is based on LDA (Latent Dirichlet Allocation).
Participants: SNCF and LIUM (Le Mans University).
Technical environment: Scikit-learn, Python, TOM, NLTK, TreeTagger.
-
Teaching & Research Assistant
Le Mans Lannion, Jan 2015 - Jan 2017
-
R&D Engineer/PhD candidate (CIFRE contract)
Jan 2012 - Jan 2015
Projects
- Topic segmentation of TV Broadcast News. The task consists of splitting a document into thematically homogeneous fragments (i.e., each fragment covers a single subject).
Participants: Orange Labs and LIUM.
Technical environment: Perl, Java, Lia-tagg, Boilerpipe, Word2vec.
- Topic titling. The task consists of assigning a title to topic segments automatically extracted from
TV Broadcast News. The implemented system computes the similarity between a topic segment and
newspaper articles, then assigns to the segment the title of the article that maximizes this
similarity.
Participants: Orange Labs and LIUM.
Technical environment: Perl, Java, Lia-tagg, Boilerpipe, Word2vec.
Miscellaneous
- Wrote 11 scientific papers: LREC 2018 (Japan), Interspeech 2017 (Sweden), ICASSP 2016 (China),
Interspeech 2015 (Germany), ICASSP 2015 (Australia), TALN 2015 (France), Interspeech 2014
(Singapore), ICASSP 2014 (Italy), JEP 2014 (France), SLSP 2013 (Spain), TALN 2013 (France)
-
* Detecting emotions in textual conversations (to present). The task consists of classifying a given
textual dialogue into one of four emotion classes: Angry, Happy, Sad and Others. The implemented
system combines several deep neural network techniques. In particular,
we use Recurrent Neural Networks (LSTM, B-LSTM, GRU, B-GRU), Convolutional Neural
Networks (CNN) and Transfer Learning (TL) methods.
Participants: EPITA and ADAPT Centre (Ireland).
Technical environment: Keras, Python, NLTK, Gensim.
-
* Dialect identification (to present). The goal of this task is to classify a given text into one of
26 classes, corresponding to various dialects of the Arabic language. The implemented system is based on
Recurrent Neural Networks (BLSTM, BGRU) using hierarchical classification. We start with a
coarse level of classification (8 classes), followed by the finer-grained classification (26 classes).
Participants: EPITA and ADAPT Centre (Ireland).
Technical environment: Keras, Python, NLTK, Gensim, Scikit-learn, twitterscraper.
-
* Sentiment analysis (to present). The goal of this task is to classify a given tweet into one of seven
classes, corresponding to various levels of positive and negative sentiment intensity, that best
represents the mental state of the tweeter.
Technical environment: Keras, Python, NLTK, Gensim.
-
* Chatbots (in progress, to present). The goal of this task is to teach the machine to interact with
users (i.e., to simulate human conversation). Our baseline system is based on seq2seq. Currently, we are
applying the attention mechanism.
Participants: EPITA and ADAPT Centre (Ireland).
Technical environment: Keras, Python, NLTK, Gensim, NumPy.
Miscellaneous
- Teaching computer science courses for Bachelor's and Master's degree classes in: Deep Learning for
Natural Language Processing, Statistical Machine Learning.
- Interns mentored: Gael de Francony, Victor Guichard, Antoine Sainson, Hugo Linsenmair,
Alexandre Majed, Xavier Cadet (ING1 students, 3 months).
- Wrote scientific papers: SemEval 2019 (USA), WANLP 2019 (Italy), SemEval 2018 (USA).
-
Qualification from the French CNU 27 (informatics)
2017
-
Ph.D. in computer science, France
University of Maine, 2016
-
Master's in Intelligent Systems (Systèmes Intelligents), ranked 2nd
University Paris Dauphine, 2012
English: Intermediate
French: Bilingual
Arabic: Bilingual
Computer skills
Tools: Keras, TensorFlow, PyTorch, Multiboost, Scikit-learn, NLTK, Gensim, Lia-tagg, Lemur,
Lucene, Boilerpipe, JusText, Docker, spaCy.
Programming: Python, Java, C, Perl, Bash, HTML, PHP, JavaScript, XML.