Fahad (Md Golam Tawhid Fahad) is a Computer Science graduate of BRAC University and a software engineer based in Dhaka, Bangladesh. He works as a Founding Engineer at ClassTablet and a Junior Software Engineer at Nyntax, with experience in full-stack development, AI/ML, and computer vision research.

What does Fahad specialize in?

Fahad specializes in full-stack engineering (Next.js, NestJS, Django, Flask), backend architecture, TypeScript monorepos, computer vision, visual speech recognition, and biomedical ML. He builds multi-tenant SaaS platforms, secure web applications, and research-driven AI systems.

Where does Fahad work?

Fahad is Founding Engineer at ClassTablet (multi-tenant education platform, hybrid, Dhaka) and Junior Software Engineer at Nyntax (on-site, Dhaka). He previously conducted research at BIOSE, BRAC University (June 2024 – December 2025).

What is Fahad's education?

Fahad earned a B.Sc. in Computer Science from BRAC University (January 2022 – January 2026) with a CGPA of 3.70. He also completed his Higher Secondary Certificate (HSC) in Science at Amrita Lal Dey College, Barisal (GPA 5.00).

What research has Fahad done?

Fahad's research includes an undergraduate thesis on self-supervised multilingual visual speech recognition (Conformer-based VSR), a systematic review of vision-based fall detection, and a multimodal framework for stress detection using wearables and computer vision at BIOSE, BRAC University.

How can I contact Fahad?

You can contact Fahad via email at tawhidfahad199@gmail.com, LinkedIn at linkedin.com/in/g-t-fahad, or GitHub at github.com/Golam-Tawhid. His portfolio contact form is available at gtfahad.tech/#contact.

Is Fahad open to job opportunities?

Yes. Fahad is open to software engineering and machine learning roles. His focus areas include full-stack development, backend systems, computer vision, and applied AI in production systems.

Self-Supervised Learning for Multilingual Visual Speech Recognition

Visual Speech Recognition (VSR) aims to decode spoken words from lip movements alone. My thesis explores adaptive self-supervision for cross-language generalization in a multilingual Conformer-based architecture.

The problem

Many languages lack large labeled VSR datasets. Models trained on high-resource languages often fail to transfer to low-resource settings where labeled visual speech data is scarce.

Approach

Self-supervised pre-training learns representations from unlabeled video before fine-tuning on limited labeled data. The Conformer architecture combines convolution and self-attention, which suits VSR: convolution captures local lip dynamics, while self-attention captures longer temporal context.
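To make the local-versus-global distinction concrete, here is a toy sketch in plain Python. It is not the thesis model: a real Conformer block also includes feed-forward modules, normalization, multi-head attention, and learned weights. The function names (local_conv, self_attention, toy_block) and the scalar "frame features" are assumptions made purely for illustration.

```python
import math


def local_conv(frames, kernel):
    """1-D convolution over time with zero padding.

    Each output frame only sees neighbours inside the kernel window,
    which is the "local lip dynamics" view.
    """
    half = len(kernel) // 2
    out = []
    for t in range(len(frames)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t + k - half
            if 0 <= idx < len(frames):
                acc += weight * frames[idx]
        out.append(acc)
    return out


def self_attention(frames):
    """Scalar dot-product self-attention without learned projections.

    Each output frame is a softmax-weighted mix of every frame in the
    sequence, which is the "longer temporal context" view.
    """
    out = []
    for query in frames:
        scores = [query * key for key in frames]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append(sum(w / total * v for w, v in zip(weights, frames)))
    return out


def toy_block(frames, kernel):
    """Attention followed by convolution, each with a residual connection."""
    after_attn = [x + a for x, a in zip(frames, self_attention(frames))]
    conv_out = local_conv(after_attn, kernel)
    return [x + c for x, c in zip(after_attn, conv_out)]


if __name__ == "__main__":
    # Hypothetical per-frame mouth-opening values, not real data.
    frames = [0.1, 0.2, 0.9, 0.3, 0.1]
    print(toy_block(frames, kernel=[0.25, 0.5, 0.25]))
```

The convolution output at a given frame changes only when nearby frames change, while the attention output at that frame can change when any frame in the clip changes. Stacking the two gives each layer access to both kinds of evidence.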

What I'm measuring

Cross-language transfer after self-supervised pre-training
Robustness under varied lighting and speaker conditions
Comparison against fully supervised baselines per language
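The post does not name an evaluation metric, so the sketch below assumes word error rate (WER), a common choice for speech recognition, as one way the per-language comparison could be scored. The function names and the example sentences are hypothetical placeholders, not thesis data or results.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer, pretrained_wer):
    """Fraction of the supervised baseline's WER removed by pre-training."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - pretrained_wer) / baseline_wer


if __name__ == "__main__":
    # Placeholder transcripts for illustration only.
    reference = "open the door"
    print(word_error_rate(reference, "open a door"))  # one substitution in three words
    print(word_error_rate(reference, "open door"))    # one deletion in three words
```

Computing the same score per language for both the self-supervised model and a fully supervised baseline, then taking the relative reduction, gives one number per language that can be compared across high- and low-resource settings.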

Why it matters

Accessible speech interfaces shouldn't depend on whether your language has million-hour labeled corpora. Self-supervision is one path toward more equitable VSR systems.