Mukesh Arambakam

Senior Machine Learning Engineer and Software Developer

with 7+ years of Software Development experience, including 5 years specializing in Natural Language Processing, Neural Machine Translation, and Generative AI, backed by a Master's in Data Science.

Core Technologies

Python, Machine Learning, NLP, Generative AI, FastMCP, PyTorch, Prompt Engineering, HuggingFace, LangChain, LangGraph

Curriculum Vitae

Professional Summary

Senior Machine Learning Engineer with 7+ years of Software Development experience, including 5 years specializing in Natural Language Processing, Neural Machine Translation, and Generative AI, backed by a Master's in Data Science. Skilled in designing and delivering end-to-end ML solutions, including Data Preprocessing, Model Training, Evaluation and Optimization. Experience in fine-tuning LLMs with PEFT methods, building Agentic AI systems, and developing scalable, production-ready ML applications. Strong background in Machine Learning, Artificial Intelligence, Statistical Modeling, and Data Analytics.

Professional Experience

Senior Machine Learning Developer

Oracle Corporation • Dublin, Ireland

Nov. 2020 - Present

Launched an agentic AI system with LangGraph, supporting 10k+ LLM calls to Translation & LangDetect MCP servers.
Implemented 2 RAG systems with vector databases for domain-specific data extraction and translation enhancement.
Engineered 5+ prompt engineering solutions for back-translation, domain classification, post-editing, and data generation with sophisticated tool calling.
Used prompt chaining techniques for multi-step LLM workflows including translation evaluation, iterative improvement, pivot-translation and post-editing.
Developed a dataset analysis framework from scratch for MT training data statistics, working across 40+ language pairs.
Built a monolingual data generator for low-resource languages increasing synthetic data coverage by 30-40%.
Designed a cross-lingual vocabulary gap detection tool leveraging n-grams and embeddings, improving coverage by 20%.
Enhanced data-cleaning by reducing noise by 10% using alignment models, length ratios, and toxicity checks.
Used in-context learning methods: zero-shot and few-shot learning for domain classification & data categorization tasks.
Updated the training pipeline to generate word-alignment and length-ratio models for data cleaning and evaluation.
Developed a Monolingual Engine Training pipeline and de-prioritized the tagged monolingual data during training.
Fine-tuned translation models using PEFT techniques for efficient adaptation to domain-specific tasks.
Applied quantization strategies (QLoRA) for LLM deployment, cutting resource usage by 40% in production.
Re-engineered translation evaluation repository with modern architectures, boosting performance by 25%.
Designed custom translation quality rubrics with linguists and applied LLM-as-a-Judge via direct assessment.
Integrated G-Eval metric and LLM-as-a-Judge framework for comprehensive LLM-based translation assessment.
Created visualization for evaluation scores using custom-built tools.
Expanded evaluation framework with 5 new metrics: Comet, XComet, BERTScore, CometQE, and length-ratio checks.
Assessed LLM translation consistency with Translate-then-Evaluate and LLM-as-a-Judge, raising accuracy by 20%.
Expanded language detection coverage from 45 to 192 languages.
Introduced a 3-stage hierarchical language detection architecture to differentiate sub-cultural languages.
Improved language detection accuracy by 90% for very short strings (<12 characters).
Refactored the codebase using robust system design patterns and FastAPI, boosting performance by 3x.
Packaged and published ONNX models with architecture-aware configurations for cross-compatibility.
Used architecture-specific optimizations and model quantization to reduce memory footprint by 50%.

Applications Engineer

Oracle Corporation • Bangalore, India

Jun. 2019 - Aug. 2019

Coordinated with multiple global teams, business analysts, and stakeholders to develop a cross-platform application.
Led the overhaul of the product's project security flow and introduced 2 new features.
Implemented a micro-service which communicates between tenants in a Multi-Tenant Architecture.
Mentored 10 junior team members, and provided insights about the product's internal working and underlying tools.

Applications Developer

Oracle Corporation • Bangalore, India

Jul. 2017 - May 2019

Certificate of Appreciation: For outstanding contribution to the Product's Development.
Assumed a pivotal role in the development of 3 core features of the product along with my team members.
Improved the performance of an internal module's filter framework by 30% using dynamic SQL queries.
Improved the performance of search framework by 20%, using query optimization techniques.

Education

Master of Science in Data Science

Trinity College Dublin • Dublin, Ireland

2019 - 2020

Specialization in Data Science, Machine Learning and Artificial Intelligence

B.Tech in Computer Science and Engineering

Amrita School of Engineering • Coimbatore, India

2019 - 2020

Specialization in Data Structures, Algorithms, and System Design

Technical Skills

Programming Languages

Python, Java, R, JavaScript, C++, SQL, PL/SQL

Machine Learning Libraries

HuggingFace, PyTorch, NLTK, spaCy, transformers, OpenNMT, scikit-learn, TensorFlow, fasttext

Other Python Libraries

NumPy, SciPy, Pandas, FastAPI, matplotlib, elasticsearch, fastalign

Generative AI & LLM Technologies

LangChain, LangGraph, FastMCP, Agentic AI, Tool Calling, Prompt Engineering, RAG, LLM-as-a-Judge

LLM Fine-Tuning & Optimization

PEFT, LoRA (Low-Rank Adaptation), QLoRA (Quantized LoRA), In-Context Learning, ONNX

Infrastructure & DevOps

PEFT, LoRA (Low-Rank Adaptation), QLoRA (Quantized LoRA), In-Context Learning, ONNX

Projects

Home Lab Infrastructure

Sophisticated home lab using Proxmox and Docker LXCs, hosted over 10 applications including Home Assistant, Nextcloud, Immich, and Jellyfin. The environment features SSO authentication, custom DNS and DHCP for network management, and secure remote access via Tailscale VPN and Funnels.

Proxmox, Docker, OpenSource, Deployments, Networking

Image and Audio Captcha Solver

Parallelised captcha generator and solver with 93% accuracy built using TensorFlow’s CNN model.

Python, TensorFlow, CNN, TTS, OpenCV

Connect 4 - AI

A multi-agent implementation of the game Connect-4 using MCTS, Minimax and Expectimax algorithms.

Kubernetes, Docker, Prometheus, Grafana, GitLab CI, Istio, Ansible

Technical Blog

September 15, 2025 • 12 min read

Getting Started with Neural Machine Translation: A Complete Guide

Comprehensive guide to implementing neural machine translation systems from scratch using modern deep learning frameworks and best practices.

Machine Learning

September 10, 2025 • 10 min read

Building Production-Ready ML APIs with FastAPI and Docker

Best practices for creating robust, scalable machine learning APIs using FastAPI, Docker containerization, and cloud deployment strategies.

Software Engineering

September 5, 2025 • 15 min read

Efficient Fine-tuning of Large Language Models with LoRA

Deep dive into parameter-efficient fine-tuning techniques for large language models using LoRA and QLoRA methodologies.

Deep Learning

September 15, 2025 • 12 min read • Machine Learning

Getting Started with Neural Machine Translation: A Complete Guide

Neural Machine Translation (NMT) has revolutionized how we approach language translation tasks. In this comprehensive guide, we'll explore the fundamentals of NMT and walk through implementing a basic system from scratch.

What is Neural Machine Translation?

Neural Machine Translation is an approach to machine translation that uses artificial neural networks to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model.

Key Components

Encoder-Decoder Architecture: The backbone of most NMT systems
Attention Mechanisms: Allowing the model to focus on relevant parts of the input
Subword Tokenization: Handling out-of-vocabulary words effectively
Beam Search: Generating high-quality translations during inference
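Of the components listed above, beam search is the easiest to show in isolation. The following is a minimal sketch, not a production decoder: `step_fn` and the toy probability table are hypothetical stand-ins for a trained model, and scores are plain summed log-probabilities with no length normalization.

```python
import math

def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=10):
    """Return the highest-scoring token sequence found by beam search.

    step_fn(prefix) is a hypothetical stand-in for the decoder: it takes a
    list of tokens and returns a dict {next_token: probability}.
    """
    beams = [(0.0, [start_token])]  # (cumulative log-probability, tokens)
    finished = []

    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            for token, prob in step_fn(tokens).items():
                if prob > 0.0:
                    candidates.append((score + math.log(prob), tokens + [token]))
        if not candidates:
            break
        # Keep only the beam_width best hypotheses at this step
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            if tokens[-1] == end_token:
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:
            break

    # Fall back to unfinished hypotheses if nothing reached end_token
    pool = finished or beams
    return max(pool, key=lambda c: c[0])[1]

# Toy next-token distributions (illustrative values, not from a real model)
TOY_MODEL = {
    ("<s>",): {"a": 0.6, "b": 0.4},
    ("<s>", "a"): {"x": 0.5, "y": 0.5},
    ("<s>", "b"): {"z": 0.9, "w": 0.1},
}

def toy_step(prefix):
    # Any prefix not in the table simply ends the sentence
    return TOY_MODEL.get(tuple(prefix), {"</s>": 1.0})

print(beam_search(toy_step, "<s>", "</s>", beam_width=2))
```

With `beam_width=1` the search degenerates to greedy decoding and commits to the locally best first token; a wider beam keeps the alternative alive long enough to find the sequence with the higher overall probability.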

Implementation Example

Here's a basic example of how to implement a simple NMT model using TensorFlow:

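import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense, Embedding

class NMTModel(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, hidden_units):
        super().__init__()
        # One embedding shared by source and target, to keep the example small
        self.embedding = Embedding(vocab_size, embedding_dim)
        self.encoder = LSTM(hidden_units, return_state=True)
        self.decoder = LSTM(hidden_units, return_sequences=True)
        self.output_layer = Dense(vocab_size, activation='softmax')

    def call(self, inputs):
        # inputs is a (source_ids, target_ids) pair of integer token tensors
        source_ids, target_ids = inputs
        # Encoder: embed the source and keep only the final LSTM states
        _, state_h, state_c = self.encoder(self.embedding(source_ids))
        # Decoder: embed the shifted target tokens, starting from the encoder states
        decoder_outputs = self.decoder(
            self.embedding(target_ids), initial_state=[state_h, state_c]
        )
        # Per-position probability distribution over the vocabulary
        return self.output_layer(decoder_outputs)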

Training Considerations

When training NMT models, several factors are crucial for success:

Data preprocessing and cleaning
Proper tokenization strategies
Learning rate scheduling
Regularization techniques
Evaluation metrics (BLEU, METEOR, etc.)
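As one concrete instance of learning rate scheduling, the sketch below implements linear warmup followed by inverse-square-root decay, a schedule commonly used for sequence-to-sequence training. The function name and the default `peak_lr` and `warmup_steps` values are illustrative assumptions, not recommendations from this guide.

```python
def inverse_sqrt_lr(step, peak_lr=5e-4, warmup_steps=4000):
    """Linear warmup to peak_lr, then decay proportional to 1/sqrt(step).

    peak_lr and warmup_steps are illustrative defaults (assumptions).
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    if step < warmup_steps:
        # Warmup phase: ramp linearly from near zero up to peak_lr
        return peak_lr * step / warmup_steps
    # Decay phase: equals peak_lr exactly at step == warmup_steps
    return peak_lr * (warmup_steps / step) ** 0.5

for step in (1, 2000, 4000, 16000):
    print(step, inverse_sqrt_lr(step))
```

The warmup phase avoids large, unstable updates while the optimizer statistics are still noisy; the slow decay afterwards lets the model keep improving without overshooting.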

Conclusion

Neural Machine Translation represents a significant advancement in language processing. While this guide covers the basics, modern systems often incorporate transformer architectures, pre-trained models, and sophisticated training techniques for state-of-the-art performance.

September 10, 2025 • 10 min read • Software Engineering

Building Production-Ready ML APIs with FastAPI and Docker

Creating robust, scalable machine learning APIs is crucial for deploying ML models in production. This guide covers best practices using FastAPI, Docker, and cloud deployment strategies.

Why FastAPI for ML APIs?

FastAPI provides several advantages for ML applications:

High performance comparable to NodeJS and Go
Automatic API documentation with Swagger UI
Built-in data validation using Pydantic
Async support for concurrent requests

Basic ML API Structure

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI(title="ML Model API", version="1.0.0")

# Load model at startup
model = joblib.load("model.pkl")

class PredictionRequest(BaseModel):
    features: list[float]
    
class PredictionResponse(BaseModel):
    prediction: float
    confidence: float

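@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    # Plain `def`: FastAPI runs it in a worker thread, so the blocking
    # model call does not stall the event loop
    try:
        features = np.array(request.features).reshape(1, -1)
        # Cast NumPy scalars to built-in floats for the response model
        prediction = float(model.predict(features)[0])
        confidence = float(np.max(model.predict_proba(features)[0]))
    except Exception as e:
        # Treat malformed input (e.g. wrong feature count) as a client error
        raise HTTPException(status_code=400, detail=str(e))

    return PredictionResponse(prediction=prediction, confidence=confidence)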

Containerization with Docker

Dockerizing your ML API ensures consistent deployment across environments:

FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Scalability Considerations

Model Loading: Load models once at startup, not per request
Caching: Implement Redis for frequently requested predictions
Async Processing: Use background tasks for heavy computations
Load Balancing: Deploy multiple instances behind a load balancer
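The caching idea above can be sketched without any external service. The example below is a minimal in-memory stand-in, assuming predictions are deterministic for a given feature vector; `PredictionCache` and its methods are hypothetical names, and in production the dictionary would typically be replaced by a shared store such as Redis.

```python
import hashlib
import json
import time

class PredictionCache:
    """Tiny in-memory TTL cache keyed by a hash of the input features."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._store = {}     # key -> (expiry_time, value)

    @staticmethod
    def make_key(features):
        # Serialize deterministically so equal inputs map to the same key
        payload = json.dumps([float(x) for x in features])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_compute(self, features, compute):
        key = self.make_key(features)
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]            # cache hit
        value = compute(features)      # cache miss: run the model
        self._store[key] = (now + self.ttl_seconds, value)
        return value

# Usage with a stand-in "model" that just sums its inputs
cache = PredictionCache(ttl_seconds=60)
print(cache.get_or_compute([1.0, 2.0], sum))  # computed
print(cache.get_or_compute([1.0, 2.0], sum))  # served from the cache
```

Hashing the serialized features keeps keys short and uniform regardless of input size, and the expiry time bounds how long a stale prediction can be served after the model is replaced.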

Monitoring and Logging

Production ML APIs require comprehensive monitoring:

Request/response logging
Model performance metrics
Health checks and uptime monitoring
Alert systems for anomalies
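Request logging and basic performance metrics can be prototyped with the standard library alone. This is a rough sketch under stated assumptions: `LatencyStats` and `monitored` are hypothetical helpers, and a real deployment would more likely export these numbers to a metrics system than keep them in process memory.

```python
import logging
import time
from functools import wraps

logger = logging.getLogger("ml_api")

class LatencyStats:
    """Running counters for calls, errors, and total time spent."""

    def __init__(self):
        self.count = 0
        self.errors = 0
        self.total_seconds = 0.0

    @property
    def mean_seconds(self):
        return self.total_seconds / self.count if self.count else 0.0

def monitored(stats):
    """Decorator that logs each call and records its latency in stats."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                stats.errors += 1
                logger.exception("%s failed", func.__name__)
                raise
            finally:
                # Runs on success and on failure, so every call is counted
                elapsed = time.perf_counter() - start
                stats.count += 1
                stats.total_seconds += elapsed
                logger.info("%s took %.4fs", func.__name__, elapsed)
        return wrapper
    return decorator

stats = LatencyStats()

@monitored(stats)
def predict_stub(features):
    # Stand-in for a real model call
    return sum(features)

predict_stub([1.0, 2.0, 3.0])
print(stats.count, stats.errors)
```

An error rate (`stats.errors / stats.count`) or a rising mean latency computed this way is the kind of signal an alerting rule would watch.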

Deployment Strategies

Consider these deployment options based on your needs:

Kubernetes: For complex, multi-service deployments
AWS Lambda: For lightweight, serverless APIs
Google Cloud Run: Managed containerized deployments
Traditional VMs: For maximum control and customization

September 5, 2025 • 15 min read • Deep Learning

Efficient Fine-tuning of Large Language Models with LoRA

Parameter-efficient fine-tuning techniques like LoRA (Low-Rank Adaptation) have made it possible to adapt large language models for specific tasks without the computational overhead of full fine-tuning.

What is LoRA?

LoRA is a technique that reduces the number of trainable parameters for downstream tasks by learning pairs of rank-decomposition matrices while freezing the original weights. In the original LoRA paper's GPT-3 175B experiments, this reduced the number of trainable parameters by 10,000x and the GPU memory requirement by 3x compared with full fine-tuning using Adam.

How LoRA Works

Instead of updating the full weight matrix W, LoRA learns a low-rank decomposition:

W = W₀ + ΔW = W₀ + BA
Where B ∈ ℝᵈˣʳ and A ∈ ℝʳˣᵏ
r is the rank, typically much smaller than d and k
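The decomposition above can be checked with a few lines of plain Python. This is a minimal sketch using nested lists instead of tensors; the helper names are hypothetical, and the `alpha / r` scaling factor is an addition beyond the formula above that mirrors the `lora_alpha` setting used by LoRA implementations.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W0, A, B, x, alpha, r):
    """Compute (W0 + (alpha / r) * B A) x without forming the d x k update."""
    base = matvec(W0, x)                # frozen path: W0 x
    low_rank = matvec(B, matvec(A, x))  # trainable path: B (A x)
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, low_rank)]

def trainable_params(d, k, r):
    """Return (full fine-tuning params, LoRA params) for one d x k matrix."""
    return d * k, r * (d + k)

# d = 2, k = 3, r = 1 (tiny illustrative shapes)
W0 = [[1, 0, 0], [0, 1, 0]]   # frozen weights, d x k
A = [[1, 1, 1]]               # r x k
B_zero = [[0.0], [0.0]]       # d x r, zero-initialized
x = [1, 2, 3]

# With B at zero the adapted layer reproduces the frozen layer exactly
print(lora_forward(W0, A, B_zero, x, alpha=2, r=1))

# Parameter counts for a 1024 x 1024 matrix at rank 16
print(trainable_params(1024, 1024, 16))
```

Starting B at zero means training begins from the pre-trained model's behavior, and because only A and B are updated, the trainable parameter count grows with r * (d + k) rather than d * k.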

Implementation with Hugging Face PEFT

from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")

# Configure LoRA
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=16,  # Rank
    lora_alpha=32,  # LoRA scaling parameter
    lora_dropout=0.1,  # LoRA dropout
    target_modules=["c_attn", "c_proj"]  # Target modules
)

# Apply LoRA to model
model = get_peft_model(model, lora_config)

# Print trainable parameters
model.print_trainable_parameters()
# Prints the trainable parameter count, the total parameter count, and the trainable percentage

QLoRA: Quantized LoRA

QLoRA further reduces memory requirements by using 4-bit quantization:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load model with quantization
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/DialoGPT-medium",
    quantization_config=bnb_config,
    device_map="auto"
)

Training Configuration

Optimal training requires careful configuration:

Learning Rate: Typically 1e-4 to 5e-4 for LoRA
Batch Size: Start with smaller batches due to memory constraints
Rank Selection: Balance between performance and efficiency
Target Modules: Focus on attention and projection layers

Advantages of LoRA

Dramatically reduced memory requirements
Faster training, with no added inference latency once the adapter weights are merged into the base model
Preserves pre-trained knowledge
Easy to switch between different task adaptations
Can be combined with other techniques (QLoRA, AdaLoRA)

Best Practices

Start with rank values between 8 and 64
Use gradient checkpointing to save memory
Monitor validation loss to prevent overfitting
Consider task-specific hyperparameter tuning
Experiment with different target modules

Conclusion

LoRA and QLoRA have democratized fine-tuning of large language models, making it accessible to researchers and practitioners with limited computational resources. These techniques represent a significant step toward more efficient and sustainable AI development.

Get In Touch

Email

amukesh.mk@gmail.com

Phone

+353 89 4960 450

Location

Dublin, Ireland

Availability

Available for consultation
