David is an enthusiast of new technologies and systems with more than 11 years of experience in the IT sector, specializing in Big Data and in designing and running hybrid environments (on-premises & cloud). Accomplished in architecting and implementing solutions for the energy and banking sectors, where processing data on time is crucial. Talented in leading healthcare IT projects involving clinical and molecular data, which require special attention to anonymization and ad hoc techniques. Recent projects include managing the software release lifecycle as well as taking care of multiple clusters on-premises and in a private cloud.
- Support and management of the bank's software release lifecycle for multiple projects.
– CI/CD pipeline technologies: Ansible, Jenkins, Kubernetes, Artifactory and GitLab.
– Runbook creation and release data testing with the Swagger API.
- ETL supervision and incident resolution for different CD clusters.
– Spark, Flink, Kafka and HDP/CDP clusters.
– ETL workflows implemented with AutoSys.
- Configuration and use of the HDP/CDP platform with Hive, Tez and HDFS on an on-premises cluster.
- Administration and monitoring of the Kafka cluster and its interaction with the Flink cluster using Splunk.
- Development of streaming jobs in Scala using Flink and Kafka for internal client clusters, integrating Kafka, IBM MQ, Cassandra and BI tools.
– Design and implementation of a Big Data system on AWS for processing daily electricity consumption data for Red Eléctrica de España. Technologies: AWS Glue, EMR, Lambda, PySpark, Zeppelin, Redshift, Spectrum, SFTP and S3.
– Data model design and ETL for an electricity retailer needing to perform analytical studies and forecasting. Technologies: Cloudera CDH on top of AWS and Apache Airflow.
– Definition and implementation of ETL processes, plus internal training for the client. The project objective was to migrate data from SQL Server databases to a Hadoop environment. Technologies: Cloudera CDH on-premises, Hive, Impala, Kudu, Spark Core and Spark SQL.
– Accountable for the governance of the company’s data.
• Clinical databases (MySQL, SQLAlchemy, PDI Kettle).
• Raw data files on the order of terabytes.
– IT provider selection and vendor relationship management, including status reporting and informal communications.
– Management of a big data project for patients’ historical data retrieval and analysis.
• Technology stack design for a big data project related to biological samples from donors. Technologies: GCP, OrientDB, GraphQL, Node.js, React, BigQuery, R, Cloud SQL, Kubernetes, Helm and Docker.
• Project documentation, control and advisory.
– Accountable for IT procurement and administration.
• IT material including desktop PCs and servers (Windows Server, Active Directory, ELK stack, LAMP stack, Jenkins).
• Cloud platforms (GCP and AWS), including budget control and general administration.
– General organizational consultancy for the startup, including HR and business model.
– Research into funding and definition of the client engagement process in a startup environment.
– CTO of the organization, responsible for IT and for managing the online content platform.
– Organization and execution of client campaigns using Mailchimp, Bitly and Google Analytics.
– Interdepartmental negotiation of the integration and scope of new printer functionalities, involving the customer experience, front-end, SWQA, and mechanical and electrical engineering groups.
– Design and implementation in C++ of new functionalities for latex-technology printers with resistive printheads.
– Identification of potential improvements in Latex 300 series printers.
– Design and evaluation of performance experiments using Matplotlib (Python) for pioneering processors targeted at the HPC market.
– Research results reporting to IBM Thomas J. Watson Research Center.
– First accepted publication in the ROMOL group: http://adapt-workshop.org/2015/program.htm
– Development of real-time mechanisms in C++ to speed up the IBM POWER7 microprocessor’s performance.
– Decision-making about the technologies used in a European project aimed at massively processing data from mobile computing.
– Proposal of new computer architectures to process streaming data from mobile devices, benefiting from mesh-topology processor microarchitectures.
– Design and integration of a mobile application to represent Wi-Fi signal quality data on a digital map using the Android SDK.
– Parallelization of machine learning algorithm kernels and their integration into a system.
– Creation and configuration of a server to store massive data coming from mobile devices using Apache Storm (Java), MongoDB and AWS EC2.
• IT and HPC lead for a group specialized in computational mechanics, including batch system administration, setup of computing libraries, and workstation and LAN administration.
• Modelling of solutions to numerical-methods problems applying high-performance computing (HPC) in C++, using parallel libraries such as OpenMP, Pthreads, MPI and PETSc.