Migrate Alma's CRM data from HubSpot to Salesforce
- Led the CRM migration project. Deployed PySpark data pipelines to clean, transform, restructure, and migrate all CRM objects to Salesforce
- Objects: 4M Accounts, 2M Contacts, 150K Leads, ~400K Emails, Notes, Tasks, Events
- Built HubSpot and Salesforce APIs in Python. Used Parquet for intermediate data storage
- Drafted and delivered a data strategy document for Prophesee, including a detailed one-year data roadmap
- Designed a new data architecture for Prophesee’s deep learning models
- Built a proof of concept (POC) of a new storage format for all of PSEE’s real-world and simulation image data and metadata, stored in Parquet and exposed to PyTorch through a Petastorm access layer. Introduced guidelines to ensure and enforce data uniformity and structure
- Re-architected the Blender image simulation pipeline with Airflow and Kubernetes
- Worked with the DevOps team to improve daily automation for the ML team, including CI/CD, observability, and alerting
Help advertisers collect and activate their owned, paid, and earned data across all digital channels through solutions such as DMP and data onboarding
- Managed the production audience processing and distribution engine of the RTB/DMP, activating hundreds of millions of CRM and web users daily on major DSP platforms such as Xandr/Microsoft, Google Display and Video, and Facebook. Java/Hive/SQL codebase with tens of external API connectors
- Ran a Kafka POC for a navigation and CRM data consolidation project. Used Kafka connectors in production to sink data from Datahub to HDFS/Hive, and wrote stream processors for various internal use cases
- Oversaw multiple projects to build, test, and maintain client data pipelines (data ingestion, ELT/ETL, warehousing, and analytics), with the goal of ensuring data integrity, accessibility, and performance. Collaborated closely with company stakeholders to address data-related problems and meet business needs
Research project on identifying international bypass fraud in telecom networks by analyzing CDR data within a social network framework
- Used Spark's GraphX component to model CDR data as a graph
- Deployed a Selective Naive Bayes model based on features extracted from the graph, such as connected components and neighbor depth frequency, to distinguish real human usage from robot usage
Work with public institutions in Kosovo and abroad to design software solutions in Health, Child Protection, and Youth and Adolescence
- Led the implementation of the data warehouse and reporting tools to improve UNICEF MENA Life Skills and Education programmes in Lebanon (2015)
- In collaboration with the Ministry of Internal Affairs of Kosovo and UNICEF Innovations, designed an SMS protocol and built an SMS-based platform in Django/Python that lets marginalized communities report unregistered children using a simple phone
- Implemented a Partner Cooperation Tracking tool for UNICEF Lebanon to track and monitor the performance of implementing partner agreements
- Oversaw the implementation of efficient data lakes for Sofinco, LCL (Crédit Agricole brands), Allianz, Groupe Seb, and other clients using Airflow, PySpark, and Hive
- Coordinated subject prioritization and resource allocation with Data Science leads to productionize machine learning pipelines (Spark ML, Airflow, PySpark). Implemented and deployed a cookie-email weighted scoring regression algorithm (MEW) that processes 13 months of historical conversion (order) data, over 500M records
- Worked closely with PMs and DevOps teams to design and implement part of Numberly’s Data Management Platform (DMP), used by 30+ clients in banking, finance, retail, and luxury
- Performed code reviews on Python data pipelines to ensure coding standards were met
- Ensured GDPR compliance