How to Become an AI Researcher

There’s a huge need for artificial intelligence (AI) researchers in the tech industry and other fields. As AI continues to grow, it’s creating an ongoing need for more experts to push the cutting edge of this technology. But how do you actually land a position as an AI researcher?

Before we get into it, we need to mention two points:

  1. First, plan to study. A lot. This is no easy undertaking, and there are no shortcuts.
  2. Second, plan to specialize… but not at the outset. After learning the broad foundations of AI, you’ll eventually need to pick a specialty.

It’s also important to understand that AI research takes place in two major areas: (a) academic settings and (b) private industry. There’s certainly overlap, as experts in one area tend to drift into the other; for example, professors working on AI in a university context might find themselves hired by a tech giant such as Google for their AI division. Where the research mainly takes place has a big impact on the type of research done.

Research done in academia tends to be less rushed, given its goal of advancing knowledge without an obvious profit motive. Such research is generally performed by people with at least a master’s degree, and usually a PhD. Research in private industry tends to be faster, with the goal of turning out a product and making money. While people doing research in private industry are usually encouraged to have at least a master’s degree, there are plenty of such researchers who only have a bachelor’s degree.

Let’s talk about the path to becoming an AI researcher.

Step 1: Learning the Foundations

Plan to study math, and lots of it. Today’s AI has deep roots in advanced mathematics, especially as part of machine learning. That means you’ll need to learn:

  • Linear algebra: This is the mathematics of vectors and matrices, which are fundamental to understanding how neural networks work.
  • Probability and statistics: This is key to modeling uncertainty and learning methods.
  • Calculus (especially multivariable calculus): This is crucial to understanding how neural networks learn.
  • Optimization theory: This is key to understanding how to improve model efficiency.
  • Information theory: This is vital to understanding deep learning’s role in compression and generalization.
  • Real analysis: This is important in the field of theoretical machine learning.

Plan to take college courses in the above, including likely three levels of calculus.
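To see why linear algebra sits at the top of that list, here is a minimal sketch of a single neural network layer written as a matrix-vector product followed by a sigmoid function. The function name `dense_layer` and all the numbers are assumptions made up for illustration; real frameworks do the same arithmetic with optimized matrix libraries.

```python
import math

def dense_layer(weights, bias, inputs):
    """Compute one layer's output: sigmoid(W x + b)."""
    outputs = []
    for row, b in zip(weights, bias):
        # Dot product of one row of the weight matrix with the input vector.
        z = sum(w * x for w, x in zip(row, inputs)) + b
        # The sigmoid squashes the result into the range (0, 1).
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs

# Hypothetical values chosen only to keep the arithmetic easy to follow.
W = [[0.5, -0.5],
     [1.0, 1.0]]   # 2x2 weight matrix
b = [0.0, -1.0]    # bias vector
x = [1.0, 1.0]     # input vector

activations = dense_layer(W, b, x)
print(activations)
```

The first output works out to exactly 0.5, because the weighted sum is zero and the sigmoid of zero is one half. Training a network means adjusting `W` and `b`, which is where calculus and optimization come in.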

Step 2: Learn About Machine Learning and Deep Learning

Machine learning (ML) is the basis of most AI, wherein developers teach computers to make predictions from data. Deep learning is a subset of ML that uses neural networks that are more-or-less modeled after the human brain. Here are topics within ML and DL that you need to learn:

  • Regression: This is a predictive modeling technique used in machine learning. It learns from existing data, and uses calculus and linear algebra to predict a numeric value. For example, in medicine, this type of AI can be used to predict drug dosages based on things like patient weight and metabolism. Or in finance, it can be used to predict credit risk based on a borrower’s financial history.
  • Classification: This involves attaching so-called labels to data. For example, an AI spam detector would attach a label “spam” or “not spam” based on the text of an email. Or in image classification, it might identify an image with the label “cat” or “dog.”
  • Clustering: This is a method of machine learning in which the AI groups data into categories that it develops on its own, as opposed to classification, where a human initially provides the labels.
  • How neural networks work: Although the concept of a neural network has been around for decades, today’s neural networks are built using special techniques that allow them to be trained quickly, given significant computing resources. While big and resource-intensive, they serve as the foundation of many of today’s AI models.
  • Backpropagation: This is a method of training a neural network by adjusting what are called weights. With this approach, the neural network starts with a prediction; the system then checks how far off the prediction was from the actual result, and works out how much each weight contributed to the error. The system then adjusts the weights, and repeats the process as it gradually improves the model.
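The training loop described in the last bullet above (predict, measure the error, adjust, repeat) can be sketched on the simplest possible model: a straight line fitted by gradient descent. This is only an illustration under stated assumptions. The function name `fit_line`, the learning rate, and the data are invented for the example, and the model has just two parameters. In a real neural network, backpropagation uses the chain rule to carry the same kind of gradient calculation back through many layers.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # 1. Predict with the current parameters.
        preds = [w * x + b for x in xs]
        # 2. Measure how far off each prediction was.
        errors = [p - y for p, y in zip(preds, ys)]
        # 3. Work out how much each parameter contributed to the error
        #    (the gradient of the mean squared error).
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # 4. Nudge each parameter in the direction that reduces the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

Because the toy data lies exactly on a line, the fitted values settle very close to a slope of 2 and an intercept of 1. This is also a small regression model, so it doubles as an example of the first bullet in the list.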

Some universities teach these topics, and you can also find books about them.

Step 3: Start Learning About the Key Subfields of AI

These include:

  • Natural Language Processing (NLP): This is the type of AI that allows computers to understand and generate human language. By now, millions of people around the world are familiar with ChatGPT, which is a great example of NLP at work. Behind the scenes, ChatGPT uses what’s called a Large Language Model (LLM) to perform its duties. Other areas of study that fall under NLP include translation (such as Google Translate), sentiment analysis (for example, Amazon deciding whether a review is positive or negative), question answering (such as chatbots), and speech to text.
  • Computer Vision: This is a broad field that includes image classification (such as determining what animal an image contains), facial recognition, and object detection (which is similar to image classification, except it determines what part of an image is a specific object, and what is just “background”).
  • Reinforcement Learning: This is a type of machine learning where AI effectively learns by trial and error. As the AI attempts various tasks, it receives either a reward or a penalty. It uses a method called sequential decision making, where each decision affects a future outcome. It’s an iterative process, and over time the AI learns.
  • Generative AI: This is a big part of ChatGPT and similar AI tools, in that the tool learns how to generate new content. It includes text generation, along with image generation, and even music generation.
  • AI Ethics: This is an important field within AI, as it deals with ethical questions surrounding the technology, including removing bias and encouraging fairness. (For example, there’s a well-known story about the AI used in early self-driving cars where the models were initially only recognizing people with light skin. That’s clearly a big problem that needed to be fixed.) The field also deals with making sure AI doesn’t give out harmful information, such as to people who are struggling with mental health issues. The general idea is known as “responsible AI.”
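To make the trial-and-error idea behind reinforcement learning a little more concrete, here is a minimal sketch of an "epsilon-greedy" agent choosing between three actions. Everything in it is an assumption for illustration: the function name `run_bandit`, the reward probabilities, and the exploration rate. It is also a deliberate simplification. This setup (often called a multi-armed bandit) has no changing state, so it shows the reward-or-penalty loop but not the sequential decision making that full reinforcement learning involves.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent that learns which action pays off most often."""
    rng = random.Random(seed)              # seeded so runs are repeatable
    estimates = [0.0] * len(reward_probs)  # estimated value of each action
    counts = [0] * len(reward_probs)       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(estimates)), key=estimates.__getitem__)
        # The environment responds with a reward (+1) or a penalty (-1).
        reward = 1.0 if rng.random() < reward_probs[action] else -1.0
        # Update the running average reward for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical chance that each of three actions earns a reward.
estimates, counts = run_bandit([0.2, 0.5, 0.9])
best_action = max(range(len(estimates)), key=estimates.__getitem__)
print(best_action, counts)
```

Over time the agent tries the third action far more often than the others, because its running estimate for that action ends up highest.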

It’s important to learn the foundations of each of these subfields in AI; you’ll start to get a feel for where your main interests lie. When you reach that point, you can start picking a specialty.

Step 4: Learn How the Different Models Work

There are many types of models used in AI, and currently the most well-known is the large language model (LLM). Regardless of your specialty, you’ll need to learn how models work. You’ll probably want to start with LLMs.

Start by reading the paper “Attention Is All You Need” by Vaswani et al. This is the original paper that laid the foundation for what’s known as the Transformer architecture, which revolutionized the field of AI, especially Natural Language Processing. Transformers power today’s models, enabling them to process and generate human-sounding text. ChatGPT uses this type of model, and as you’re certainly aware, it’s been a game changer in AI. Studying this paper will also give you insights into a technique known as self-attention. Make sure you read the entire paper, and understand every bit of it.
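As a taste of what the paper covers, here is a minimal sketch of scaled dot-product attention, which the paper defines as softmax(QKᵀ / √d_k)V. The toy token vectors and helper names are assumptions for illustration. A real Transformer first passes each token through learned projection matrices to get separate queries, keys, and values, and runs several attention "heads" in parallel; both are left out here to keep the core calculation visible.

```python
import math

def softmax(values):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(values)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this query against every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # The output is a weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Three hypothetical token vectors. In self-attention, the queries, keys,
# and values all come from the same sequence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)

for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` shows how much one token "attends" to every token in the sequence, including itself, and each row sums to 1.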

Next you’ll need to practice using LLMs until you understand how they work. You can play with OpenAI’s GPT models using their API, and you can use libraries such as Hugging Face Transformers (which runs on top of PyTorch) to download and run openly available LLMs locally on your own computer.

After you’re comfortable using LLMs, it’s time to start looking at them under the hood. You’re going to want to take time to learn how they’re built and pre-trained. After that, you’ll need to learn how they’re fine-tuned.

Some of the best resources for learning such topics are the courses created by Andrew Ng on Coursera. You can also find loads of information on the Hugging Face website, including this excellent blog post called “How to Train a New Language Model From Scratch Using Transformers and Tokenizers.” This is not an easy read, but it’s not very long, and you’ll want to take the time to study it as much as you can.

Step 5: Specializing in a Subfield

As you start to decide what specialty you’ll go into, remember that you’ll be taking a deep, deep dive into that subfield, and you won’t be working as much in the other subfields.

How do you pick an area to specialize in? In some cases, it will get picked for you; for example, if you work under a researcher and specialize in the same field. In other cases, you might find that one area of research excites you more than another. Be open-minded as you work towards your goal, try different areas, and see if one area jumps out. Remember, you’re going to essentially be devoting much of your career to this area you’ve chosen, so you’ll want to choose one you’re passionate about.

Step 6: Performing Research

By now you’re ready to start assisting in research. The first step is to find a research mentor. For example, if you’re studying at a university, you’ll want to meet with professors who are doing AI research, and see if you enjoy working with any one in particular.

If you’re outside a university setting, finding a suitable mentor can prove a little trickier, as you’ll likely have to apply for positions. But stay strong as you send out resumes and interview; know that the right job will find you. Become active on LinkedIn and various AI forums online, then make a point of meeting as many people as you can. Don’t be shy about saying that you’re an aspiring researcher looking to be mentored by an established researcher. Remember, researchers need assistants too, and will be glad to hear people are interested.

Next, once you’re working under a researcher, learn everything you can about how they do their job. Observe their critical thinking skills. (For example, what if they are determined to prove something that they ultimately discover is false? A good researcher, while possibly bummed out, will accept the findings.) Have daily discussions with them, focusing not only on how they think but on their soft skills as well. Do they go home and obsess over their work, or do they manage to find a good work-life balance? Find out what journals they read, and read the same journals. When speakers come to visit, professors usually go have dinner with them; you should go along too. Immerse yourself in the entire culture. This is your future career, after all.

And that’s where we end our advice, as we can’t teach you how to do research (we’re not researchers). You will learn it from the people who mentor you, and eventually they’ll cut you loose and you’ll be performing research yourself.

Step 7: Plan to Publish

There’s an old saying in academia, “Publish or perish,” and to this day, professors live by it. But it really is important both in academia and in private industry, because publishing your research helps establish your credibility in the field, and it helps advance the field by allowing others to continue with your research in other directions. It also opens opportunities by getting the attention of people that work for important companies, and in academia can even lead to grants. It’s all about sharing your findings.

These days “publish” can mean many things, including:

  • Submitting articles to professional research journals.
  • Posting the same articles to online sites, such as preprint servers like arXiv.
  • Writing blogs and articles about your work.

Many researchers do all three of these types of publishing. (And as a side note, that means also keeping up good writing skills.)

Conclusion

Becoming an AI researcher is an exciting, career-long endeavor. It’s not just about coding AI, but also about how much you’re willing to learn, explore, experiment, and contribute to the vast and growing body of AI knowledge.