Wednesday, October 29, 2025

Why Validating Your Dataset is Critical for AI Training

Lessons Learned

When working with AI and machine learning models, it’s easy to focus on the latest architecture, model size, or GPU optimizations. Often, however, the biggest performance bottleneck isn’t the model itself. It’s your data.

During recent experiments with fine-tuning a sentiment analysis model on the IMDb dataset, I ran into a classic, but instructive, mistake: after slicing 10% of the dataset for faster training, my model consistently predicted negative sentiment for all reviews, even when the text was clearly and obviously positive.

This post explores why validating your dataset is critical, how small mistakes can slow training, and some practical strategies for ensuring your model learns effectively.

Understand the Data First

Before any model sees your data, you need to understand it thoroughly:

  • What columns exist? (Text, labels, numerical features?)

  • How is the data distributed? Are classes balanced?

  • Are there ordering patterns? Many public datasets, including IMDb, store labels in sorted order; in my case, all the negative reviews came first.

I learned that slicing a sequential portion of IMDb without shuffling pulled mostly negative reviews. The model never saw enough positive examples, which led to completely biased predictions.

Validate Label Distribution

Checking the label distribution is a critical early step:

import pandas as pd
print(pd.Series(dataset['train']['label']).value_counts())

  • Imbalanced or skewed data can cause models to overfit the majority class.

  • For small slices, even minor imbalances can break predictions entirely.

Always shuffle before slicing, especially for ordered datasets.
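The effect is easy to demonstrate with a self-contained sketch (plain pandas, with a synthetic ordered dataset standing in for IMDb):

```python
import pandas as pd

# Synthetic stand-in for an ordered dataset: all negatives first, like IMDb's train split
df = pd.DataFrame({
    "text": [f"review {i}" for i in range(1000)],
    "label": [0] * 500 + [1] * 500,  # 0 = negative, 1 = positive, sorted
})

# Naive 10% slice: grabs only the first 100 rows -> all negative
naive = df.head(100)
print(naive["label"].value_counts().to_dict())  # {0: 100}

# Shuffle first, then slice: labels come out roughly balanced
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)
sliced = shuffled.head(100)
print(sliced["label"].value_counts().to_dict())
```

The naive slice never contains a single positive example, which is exactly the failure mode I hit: the model had nothing to learn from but negatives.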

Increase Variance to Improve Learning

Variance is key for models to generalize:

  • Shuffling: Randomizes the order of data to avoid sequences affecting learning.

  • Stratified sampling: Ensures each split contains proportional examples of all classes.

  • Data augmentation: For images, consider slight transformations; for text, synonyms or paraphrases.

Higher variance prevents the model from learning shortcuts (i.e., cheating) and improves prediction accuracy.
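Stratified sampling can be sketched with pandas alone by drawing the same fraction from each class:

```python
import pandas as pd

df = pd.DataFrame({
    "text": [f"review {i}" for i in range(1000)],
    "label": [0] * 700 + [1] * 300,  # imbalanced: 70% negative, 30% positive
})

# Stratified 10% sample: take the same fraction from each label group,
# so class proportions are preserved in the slice
sample = df.groupby("label", group_keys=False).sample(frac=0.1, random_state=42)

print(sample["label"].value_counts().to_dict())  # {0: 70, 1: 30}
```

The slice keeps the 70/30 split of the full dataset instead of drifting toward whichever class happens to dominate a random or sequential cut.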

Time Efficiency Through Proper Dataset Preparation

Skipping proper dataset validation costs more time than it saves:

  • Wasted epochs: Training on biased slices leads to poor results, requiring more retraining.

  • Misleading evaluation: Early metrics can be deceptive if the validation set is skewed.

  • Debugging complexity: You might chase device errors, parameters, or code bugs that are really just dataset issues.

A few minutes spent validating your dataset can save you hours, or even days, of wasted training.

Practical Checklist for Dataset Validation

  1. Inspect your dataset columns and types.

  2. Check the label counts and distribution.

  3. Shuffle the data before sampling small slices.

  4. Use stratified splits for train/test/validation.

  5. Balance the data where needed to increase variance.

  6. Confirm that tokenization, padding, and special tokens are applied consistently.
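The first two checklist items can be wrapped in a quick helper (a minimal sketch; the column names are hypothetical):

```python
import pandas as pd

def validate_dataset(df: pd.DataFrame, label_col: str = "label") -> None:
    """Print basic sanity checks before training: columns, dtypes, label balance."""
    print("Columns and dtypes:")
    print(df.dtypes)

    counts = df[label_col].value_counts()
    print("\nLabel distribution:")
    print(counts)

    # Flag heavy imbalance: smallest class under 10% of the largest
    if counts.min() < 0.1 * counts.max():
        print("WARNING: labels look heavily imbalanced - consider rebalancing.")

# Tiny illustrative dataset
df = pd.DataFrame({"text": ["good", "bad", "fine"], "label": [1, 0, 1]})
validate_dataset(df)
```

Running something like this on every new slice takes seconds and would have caught my all-negative IMDb slice immediately.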

Conclusion

A model is only as good as the data you feed it. Properly validating your dataset:

  • Ensures that your model actually learns meaningful patterns.

  • Reduces wasted training time on skewed or biased data.

  • Prevents misleading results during evaluation and testing.

By taking the time to understand your dataset, increase variance, and balance your splits, you maximize training efficiency and improve the reliability of predictions.

Remember, before you tune parameters, add GPUs, or try complex architectures: check your data first!

Sunday, October 26, 2025

Exploring the Foundations of Generative AI

My DeepLearning.AI & AWS Journey

This weekend, I completed the first module of the Generative AI with Large Language Models course, created by DeepLearning.AI and AWS. What an incredible start to the journey: it’s one thing to use AI tools every day, but it’s another to truly understand the science, structure, and evolution behind them.

The Evolution of Generative AI

The course began with a look back at how far we’ve come. From the early days of recurrent neural network (RNN) models to the revolutionary moment when self-attention was introduced, the change has been nothing short of extraordinary.

Self-attention changed everything. It allowed models to focus on relationships between words, regardless of their position in a sentence, giving rise to the Transformer architecture that underpins today’s generative AI models like GPT and BERT.

Inside the Transformer

I learned how a Transformer is structured and how each part plays a unique role in processing language:

  • Embedding Layer: Converts words (tokens) into numerical vectors so that models can “understand” them in mathematical form.

  • Self-Attention Layer: Determines which words are most relevant to each other in a given context.

  • Feed-Forward Network: Transforms and refines the information learned from attention to make predictions more accurate.

  • Softmax Output Layer: Produces the probability distributions, helping the model decide what the next most likely word or token should be.

Together, these layers form a system that can not only understand context but generate human-like language with astonishing fluency.
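To make the softmax output layer concrete, here is a minimal pure-Python version that turns raw scores (logits) into a probability distribution over candidate tokens:

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtract the max for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate next tokens
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)
print([round(p, 3) for p in probs])  # highest logit -> highest probability
```

The token with the highest logit ends up with the highest probability, and the model can then pick (or sample) the next token from that distribution.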

Understanding Model Types

This module also clarified the differences between various LLM architectures:

  • Encoder-only models (like BERT) are great at understanding and classifying text.

  • Decoder-only models (like GPT) excel at generating coherent and creative text.

  • Encoder-decoder models (like T5) handle tasks like translation and summarization where both understanding and generation are needed.

Each one uses a different learning style, either masked, causal, or sequence-to-sequence, and can be bidirectional or unidirectional, depending on whether it reads context from both sides or one direction at a time.

Prompt Engineering: Guiding the Model

Prompt engineering and pre-training turned out to be some of the most exciting concepts.

I learned how zero-shot, one-shot, and few-shot prompting can influence model outputs, just like giving a student examples before asking them to solve a problem. Through in-context learning, we can guide models toward more accurate and relevant responses without changing their underlying parameters.
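As an illustration, a few-shot sentiment prompt might be assembled like this (the reviews and labels are made up for the example):

```python
# Hypothetical few-shot examples: two labeled reviews shown to the model in-context
examples = [
    ("I loved this movie, the acting was superb.", "positive"),
    ("A dull, predictable plot. I nearly fell asleep.", "negative"),
]
query = "The soundtrack alone makes this film worth watching."

prompt = "Classify the sentiment of each review.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

print(prompt)
```

Dropping the examples gives a zero-shot prompt; keeping just one gives one-shot. The model's parameters never change, only the context it sees.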

Generation Techniques and Decoding

How do LLMs decide what to say?

I explored how generation configuration parameters like top-k, top-p, and temperature control randomness, diversity, and creativity in model responses.

It’s fascinating how these small tweaks can shift a model’s tone from factual to imaginative, or from repetitive to creative.
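These decoding knobs can be sketched in a few lines of plain Python (a simplified illustration, not any real library's decoder):

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, seed=None):
    """Sample a token index from logits with temperature scaling and optional top-k filtering."""
    # Temperature < 1 sharpens the distribution (more deterministic);
    # temperature > 1 flattens it (more random/creative)
    scaled = [x / temperature for x in logits]

    # top-k: keep only the k highest-scoring tokens as candidates
    indices = list(range(len(scaled)))
    if top_k is not None:
        indices = sorted(indices, key=lambda i: scaled[i], reverse=True)[:top_k]

    # Softmax over the surviving candidates
    m = max(scaled[i] for i in indices)
    exps = {i: math.exp(scaled[i] - m) for i in indices}
    total = sum(exps.values())
    probs = {i: e / total for i, e in exps.items()}

    # Draw one token according to the probabilities
    rng = random.Random(seed)
    r, cum = rng.random(), 0.0
    for i, p in probs.items():
        cum += p
        if r <= cum:
            return i
    return indices[-1]

logits = [3.0, 1.0, 0.5, -2.0]
# Low temperature + top_k=1 is effectively greedy decoding
print(sample_next_token(logits, temperature=0.1, top_k=1))  # 0
```

Top-p (nucleus) sampling works the same way, except the candidate set is the smallest group of tokens whose cumulative probability exceeds p, rather than a fixed k.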

From Idea to Production

I also gained insight into the typical project lifecycle of implementing an LLM in the real world. From data collection and fine-tuning to deployment and optimization, understanding how enterprises bring AI solutions to life was incredibly valuable. It will definitely help me connect the technical with the strategic, an essential skill for anyone aspiring to design or lead AI-driven projects.

The Power and Challenge of Optimization

The most intriguing part of the course was exploring optimization and compute performance.
I learned how hardware limitations, such as memory and compute capacity, can shape model design and training efficiency. Diving into CUDA, GPUs, and how tasks are distributed across devices opened up a new world of understanding.

Concepts like quantization (balancing precision and efficiency), BFLOAT16 (a breakthrough in numerical representation), and distributed training methods like DDP (Distributed Data Parallel) and FSDP (Fully Sharded Data Parallel) showed just how critical optimization is to modern AI.
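Quantization can be illustrated with a tiny symmetric int8 scheme (a simplified sketch, not any particular framework's implementation):

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats to [-127, 127] with one scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [x * scale for x in q]

# Hypothetical weights for illustration
weights = [0.52, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)

# Each recovered weight is close to the original; the small error
# is the precision side of the precision/efficiency tradeoff
print(max(abs(a - w) for a, w in zip(approx, weights)))
```

Each weight now fits in one byte instead of four, at the cost of a bounded rounding error; that is the tradeoff quantization is balancing.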

Reading about the Chinchilla scaling laws was the icing on the cake; it’s amazing to see how research continues to push efficiency and performance even further.

Final Thoughts

Even though this first module represented just one week of learning, it felt like a deep dive into the core of modern generative AI. The course helped me cover everything from the roots of generative models to the inner workings of Transformers and the realities of large-scale training.

I managed to complete the week’s module in just two days, so I’ll be revisiting my notes again for sure. A heads-up for those who are interested: the other two modules are not free. I will definitely be purchasing them to continue this learning; it has really benefitted me, as one of my goals is to be able to hold conversations about generative AI.

This was a huge step forward in my journey toward understanding and working confidently in the world of AI, and I’m incredibly excited to continue sharing what comes next.

The hands-on phase begins soon and I can’t wait to bring these ideas to life.

Saturday, October 25, 2025

Building My AI Learning Plan

From Foundation to Experience to Certification

Transitioning into AI architecture requires more than curiosity, it demands structure, clarity, and a long-term plan. Early in my journey, I realized that to go from “interested” to “competent,” I needed a roadmap, one that would not only build technical understanding but also position me for the kinds of roles shaping the AI landscape today.

Why a Plan Was Necessary

AI is moving fast, and it’s easy to get lost in the noise. I wanted to learn intentionally, not by chasing random tutorials, but by following a sequence that builds from fundamentals, to hands-on experience, to certification readiness.

My goal is to be able to hold meaningful conversations about how AI systems work, understand how to design or guide AI-powered architectures, and be credible in technical discussions that shape product or enterprise solutions.

The Structure of My Plan

This won't come easy. As a starting point, I have come up with a 12-week roadmap designed to evolve in three phases:

Building the Foundation

The journey begins with structured learning. I searched for courses that explain how generative AI evolved, what powers large language models, and how they process information. This phase ensures I’m fluent in the core concepts behind today’s AI systems. In my previous post, I discussed DeepLearning.AI's Coursera course (which I have already started). My next post will discuss what I learned.

Hands-On Experience

Learning theory isn’t enough. This phase is about doing: working with tokenization, datasets, model fine-tuning, and inference. There are lots of free ways to start experimenting and my research has uncovered a few gems. By experimenting in Jupyter and Colab, I will be able to see how models behave, where performance tradeoffs exist, and what choices matter when building real-world pipelines.
This is where abstract ideas start to connect to architecture thinking. It will help me understand how data flows, how compute is optimized, and how results are delivered efficiently.

Architecture and Certification

The final phase shifts toward design and validation. This phase will expose me to how multiple components (models, data layers, APIs, compute infrastructure) come together to form a complete AI solution. It’s about thinking like an architect: scalable systems, responsible AI principles, and aligning infrastructure with goals.

Ultimately, this will help me prepare for certifications like NVIDIA’s NCP-AIIO, which solidifies the practical and theoretical understanding needed to operate in production environments. However, I may end up being confident enough only for the NCA-AIIO; we will see.

How This Plan Aligns with AI-Driven Roles

Across AI and solutions-focused job descriptions, certain patterns appear again and again. These roles often require:

  • Conceptual depth — a solid understanding of AI principles, model architectures, and current trends.

  • Hands-on familiarity — the ability to experiment, evaluate, and guide engineering decisions based on real data.

  • Architectural vision — seeing the big picture: how systems fit together, scale, and serve business goals.

  • Communication fluency — translating complex AI ideas into clear insights for teams and stakeholders.

Each phase of my plan deliberately supports one or more of these traits. Foundational learning builds conceptual strength. The hands-on projects grow technical fluency. The architectural phase refines system-level thinking. And preparing for certification ensures credibility and alignment with industry standards.

And We Are Off

This plan is my way of staying focused in a fast-moving field. It gives me structure, but also room to explore. It is imperative to test, learn, and adapt as I grow. I am big on structure and discipline in such a complex, time-bound plan. I am shooting for a very aggressive next 12 weeks, but things do come up, and the timeline could certainly stretch.

The next post will dive into the learnings from the Coursera course. I truly believe understanding the foundations of Generative AI will be a crucial step towards the success of my plan. I look forward to sharing more of my journey with you in the coming days.

Friday, October 24, 2025

Starting My Journey Into AI Architecture

From Experience to the Cutting Edge

I have spent over a decade in performance engineering and a year in architecture, in the complicated and busy financial services industry. My current role is a mix of solutions and enterprise architecture. I’ve designed distributed systems, optimized performance, and helped teams solve complex technical problems. But recently, I decided it’s time for a new challenge: diving headfirst into the world of AI and AI architecture.

This blog post is the first of many in what I hope will be a transformative journey, documenting what I’m learning, the hurdles I face, and the plan I’ve put in place to get there.

I am warning you now, I am not the best writer!

Why AI, and Why Now?

AI is everywhere! It's transforming industries, creating new possibilities, and fundamentally changing how we interact with technology. I have come to realize that to remain at the forefront as an architect (and to satisfy the technology fanboy in me), I need to understand AI not just as a user, but as an architect.

I am particularly interested in large language models (LLMs) and generative AI, and I want to gain the skills to design AI systems that are scalable, secure, and effective. Real skills that leading organizations like NVIDIA, OpenAI, and Anthropic value.

Needing a Plan

To make this transition meaningful, I recognize I need a structured plan. I’m starting from zero in terms of hands-on AI experience, so the goal isn’t just to tinker but rather to build real understanding:

  • Understanding how AI works under the hood, so I can hold intelligent conversations about architecture and model design.

  • Gaining enough experience to navigate and influence engineering teams effectively.

  • Preparing to pursue certifications, such as NVIDIA's AI Infrastructure and Operations, to demonstrate and market my expertise in AI systems.

Learning through Courses and Labs

To get there, I need to map out a learning path that balances theory with practice:

  • I will definitely be taking the DeepLearning.AI Generative AI with LLMs Coursera course; based on what I have seen online, it provides a clear view into the evolution of generative AI, its capabilities, and the technologies driving it.

  • Beyond the course, I plan to engage with hands-on labs to visualize how AI models work, understand tokenization, training dynamics, and inference, and explore how models make decisions under the hood.

The aim is to be an architect who understands what the business and engineers need to succeed, bridging the gap between architecture, AI capabilities, and enterprise implementation.

The Journey Ahead

I’m currently putting together a detailed plan for the next several weeks, mapping out milestones, skills to acquire, and certifications to pursue. This journey is long, but it’s also incredibly exciting. I look forward to sharing the process, the wins, and the lessons learned along the way. I hope that others can benefit from my story.

This is just the beginning.