By Ho (Jacques) Chan
When I began writing this guide, I wanted to educate the public about what AI means. After reading more about how others approach similar tasks and how they explain their work, I realized why there is almost no bridge between working scientists or engineers and the public. Explaining your work brings neither you nor the public any reward, whether in reputation or in experience. In reality, it opens your work to misinterpretation by readers who lack careful guidance or knowledge of the techniques involved. Recent events suggest that people in tech who speak openly to the public about scientific concepts risk losing their jobs or severely damaging their reputations. Look at Yann LeCun’s Twitter incident as a case study. But I have never broken my promise to try to make AI more explainable. I wish every reader a happy journey into exploring a better intelligence: human, machine and collective.
Machine Learning 101
To complete a machine learning project in the most time-effective manner, while controlling cost and performing well, establish clear objectives and tasks. Then provide a large training dataset and a testing schema. I first want to discuss the misconceptions that most commonly lead to mistakes in AI project development, and then summarize the major types of engineering objectives to give you a quick and dirty understanding of what building AI entails.
Before I dive into some jargon, let me give you the most human definition possible. Our relationship with AI can be likened to that of humans living in ancient times. Given enough time, we figured out that we can cut wood to make fire. We learned that we should stay close to fire because it is warm and rewarding to be near. But we also learned that we should probably stay away from the wolf, because we observed the unfortunate end that befell all those who went close to it. Similarly, a reinforcement learning system introduces rewards and penalties, just like any other machine learning system.
A quick glossary to acquaint you with, or remind you of, the jargon in this field:
- Expert system: The simplest and oldest way to build AI. The algorithm relies on a set of if-else rules to derive conclusions from shallowly processed data. Rules are preferably set up with experts who have domain knowledge.
- Statistical machine learning: Using statistical methods to learn relations among collected or processed features that lead to conclusions.
- Deep learning (DL): The definition of the term “deep” remains controversial. In general we know which models would be considered deep learning models, but it is less clear which models would not. The following mechanisms are generally considered to be deep learning, as they are widely studied as such, especially when you pair their structure with the abilities they are designed to provide:
- Convolutional neural net (CNN): automatically extracts useful features and learns feature correlations. A CNN is a deep learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects or objects in the image, and differentiate one from another.
- Attention mechanism: directing focus to the areas or features that matter most to the conclusion, weighting certain factors over others when processing data.
- Learning, convergence and stabilization under higher dimensions.
- Reinforcement learning (RL): describes a process where an agent learns the hidden rules of an environment by exploring and gathering rewards, unlike deep learning, where the agent only observes rewards and penalties from samples. RL works best as a training schema when you can fully define the simulated environment. For example, you can build a 100 percent accurate physical simulation with gravity, obstacles, and walls, and put an agent into the environment to explore how to get from point a to point b at the fastest speed. The result can be a walking robot like Boston Dynamics’ Spot robot.
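To make the "point a to point b" idea concrete, here is a minimal sketch of an agent learning by exploration. It assumes a toy one-dimensional corridor instead of a physical simulation, and it uses tabular Q-learning, which is one common RL method and not necessarily what any real robot uses. The function names, reward values and hyperparameters are all my own assumptions for illustration.

```python
import random

def train_corridor_agent(n_cells=6, episodes=300, alpha=0.5, gamma=0.9,
                         epsilon=0.2, seed=0):
    """Tabular Q-learning: start at cell 0 (point a), goal is the last cell (point b)."""
    rng = random.Random(seed)
    moves = (-1, +1)                      # action 0 = step left, action 1 = step right
    goal = n_cells - 1
    q = [[0.0, 0.0] for _ in range(n_cells)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):              # cap the episode length
            if rng.random() < epsilon:
                action = rng.randrange(2)                           # explore
            else:
                action = 0 if q[state][0] > q[state][1] else 1      # exploit
            next_state = min(max(state + moves[action], 0), goal)  # walls clamp the move
            # Assumed rewards: each step costs 1, reaching the goal pays 10.
            reward = 10.0 if next_state == goal else -1.0
            best_next = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
            if state == goal:
                break
    return q

def greedy_path(q):
    """Follow the learned values from point a and record the cells visited."""
    state, path, goal = 0, [0], len(q) - 1
    while state != goal and len(path) <= 2 * len(q):
        action = 0 if q[state][0] > q[state][1] else 1
        state = min(max(state + (-1, +1)[action], 0), goal)
        path.append(state)
    return path

q_table = train_corridor_agent()
print(greedy_path(q_table))
```

Nobody tells the agent that "right" is the correct direction. The step cost and the goal reward are the only signals, and the shortest path falls out of them.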
As we have seen, there is Reinforcement Learning (RL), Deep Learning (DL) and rule-based learning. A self-driving car is the best example application, because it can be trained with all three types of methods. In principle, any task can be trained with both RL and DL. For any task and training objective, some methods are better suited than others, but most likely a machine’s task can be trained with all three methods. Let us go back to the metaphor of fire. If you want an agent to learn not to get burned by the fire, which is a subtask, you can tell it not to go near fire (rule-based), let it observe an object getting burned by the fire (Deep Learning), or let it explore and get burned by the fire (Reinforcement Learning). In this case the fire is warm (gives rewards), but the agent should not stay too close or it will get burned (penalty). So a good agent should learn to stay close, but not too close, to the fire. A simple rule-based system can only tell the agent to always stay away from the fire, whereas both Deep Learning and Reinforcement Learning can learn the balance and enable effective task deployment as needed. This is the beauty of human evolution and machine intelligence.
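The fire metaphor can be sketched in a few lines. This is a toy under loud assumptions: the distance scale, the reward numbers and all function names are hypothetical, "learning from observation" is reduced to averaging logged samples rather than training a deep network, and "exploring" is a simple epsilon-greedy loop rather than a full RL algorithm.

```python
import random

DISTANCES = [0, 1, 2, 3, 4, 5]  # hypothetical scale: steps away from the fire

def reward(distance):
    """Assumed reward: warmth fades with distance, touching the fire burns."""
    warmth = max(0, 4 - distance)
    burn_penalty = 10 if distance == 0 else 0
    return warmth - burn_penalty

def rule_based():
    """Hand-written rule: always stay as far from the fire as possible."""
    return max(DISTANCES)

def learn_from_observation(samples):
    """Pick the distance with the best average reward in logged (distance, reward) samples."""
    totals, counts = {}, {}
    for distance, r in samples:
        totals[distance] = totals.get(distance, 0) + r
        counts[distance] = counts.get(distance, 0) + 1
    return max(totals, key=lambda d: totals[d] / counts[d])

def learn_by_exploring(trials=200, epsilon=0.2, seed=0):
    """Epsilon-greedy exploration: try distances, get warmed or burned, keep running averages."""
    rng = random.Random(seed)
    # Optimistic starting values push the agent to try every distance at least once.
    value = {d: 5.0 for d in DISTANCES}
    count = {d: 0 for d in DISTANCES}
    for _ in range(trials):
        if rng.random() < epsilon:
            d = rng.choice(DISTANCES)                    # explore
        else:
            d = max(DISTANCES, key=lambda x: value[x])   # exploit
        count[d] += 1
        value[d] += (reward(d) - value[d]) / count[d]    # running mean of observed reward
    return max(DISTANCES, key=lambda x: value[x])

observed = [(d, reward(d)) for d in DISTANCES]
print(rule_based(), learn_from_observation(observed), learn_by_exploring())
```

With these assumed rewards, the rule keeps the agent at distance 5, where it is safe but cold, while both learners settle on distance 1: close, but not too close.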
Knowing your Client Personas
“Exceeding customers’ expectations” is easier said than done. It begins with gaining a complete understanding of their business problems.
Just as in market research, your clients commonly fall into the following categories:
- Companies looking to promote themselves during digital transformation
- Overambitious and unrealistic business teams
- Conservative and old-school engineers or business people who are forced to adapt to new technologies
- Professional scientists or research-oriented engineering teams exploring new alternatives and solutions
Often, though, it is individuals or key stakeholders on the client side who will use or apply the solution offered by you or your data scientists. These stakeholders can be grouped by how advanced they are in their use of ML/AI techniques: low, medium or advanced.
Building a Better Data Engineering Team
As a company that builds and provides varied solutions, it is very important for you to understand what you can and cannot do within your constraints. There is no clearly good or bad answer to your position; however, some positions may be clearly good or bad for the clients you serve.
Running and constructing an AI project is much like running a medieval army. You can arrange for a few of the best equipped knights and mercenaries, or arm large numbers of your farmers with pitchforks. The results will be very different depending on the situation you are in.
A good and disciplined engineer or scientist is hard to come by, and even harder to keep, much like a knight. Their “upkeep” cost is commonly misunderstood to be limited to their salary expectations. In fact, the most common hindrances I have encountered in recruiting data scientists are the following:
- Lack of a good leader: the team requires a strong and compassionate leader for members to learn from and train under:
- Leaders are less likely to teach the next member in line if the project is short and technology is less of a consideration. Instead, the leader is more likely to use members as tools for tedious tasks, which decreases overall engineering efficiency.
- The leader acts as an “anchor” who keeps the best players on the team with minimal turnover. The best engineers and scientists tend to treat learning as their primary objective.
- Company and project attraction: companies that maintain a good engineering culture, with well-maintained and tested code, encouragement of research and productive tools for users, are more attractive places for engineers and scientists to stay.
- Sense of security: Top engineers and scientists adapt to R&D environments. Budgets and business objectives are constantly at odds in an ineffective bureaucratic structure. Scientists and engineers are less likely to put in the work if the project is not long-lasting.
Don’t evaluate engineers by “years of experience” but by their ability to learn, adapt and work within constraints to innovate and succeed, to share knowledge quickly, and to lead the way for others.
Understand the Non-Technical Parts of AI
- Machine Learning does not “create” intelligence
- Understand AI “Intuition”: separate AI fantasy from reality
- Understand normal human perceptions of AI
- Understand the conflict between what your customer needs versus technical superiority: yes, there’s a difference!
A common misconception is that machine learning creates intelligence. This is not the case. As machine learning scientists and engineers, we study and design models to do primarily two things. The first is to observe samples from datasets and learn useful patterns for inference.
- The model itself is designed to memorize as many patterns as possible so it can perform useful tasks at test time.
- The model itself does not create any information or knowledge.
The second is to transfer knowledge between “worlds”, domains and tasks.
- For more complex and smarter types of machine learning models, like deep learning, recognizing patterns in a brand new task the machine has never seen is almost impossible, because the machine would need to see an impractically large amount of data to learn the patterns. Therefore it is critical to study how to teach models to first adapt to a specific domain, then adapt that knowledge down to a specific task. The procedure commonly follows the order of pre-train (acquire general knowledge), domain adaptation (adapt to domain knowledge) and fine-tune (adapt to task).
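The pre-train, domain adaptation and fine-tune order can be sketched with a deliberately tiny stand-in. I assume a one-variable linear model trained by gradient descent in place of a real deep network, and three made-up datasets. The only point is that each stage starts from the parameters the previous stage left behind, so a handful of task examples and a few steps go much further than starting from scratch.

```python
def fit(params, data, lr=0.05, steps=200):
    """Gradient descent on mean squared error for y = w*x + b, starting from params."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def mse(params, data):
    """Mean squared error of the model on a dataset."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

xs = [i / 10 for i in range(-10, 11)]

# Hypothetical datasets: plenty of general and domain data, very little task data.
general_data = [(x, 2.0 * x) for x in xs]                      # general knowledge
domain_data = [(x, 2.0 * x + 1.0) for x in xs]                 # domain shifts the offset
task_data = [(x, 2.2 * x + 1.0) for x in (-1.0, 0.0, 1.0)]     # task tweaks the slope

pretrained = fit((0.0, 0.0), general_data)         # 1. pre-train
adapted = fit(pretrained, domain_data)             # 2. domain adaptation
fine_tuned = fit(adapted, task_data, steps=5)      # 3. fine-tune, only 5 steps

from_scratch = fit((0.0, 0.0), task_data, steps=5) # same budget, no prior knowledge

print(mse(fine_tuned, task_data), mse(from_scratch, task_data))
```

Given the same five steps on the same three task examples, the staged model ends with a far lower error than the one that starts from nothing.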
So, to drill those ideas in with laser clarity:
- We do not teach machines what the pattern is; we teach the machine how to find it:
- A calculator is an AI, as it has the ability to follow rules of calculation in a way that appears intelligent. It is not machine learning, because it does not learn any rules or patterns; it only follows them. In fact, teaching a machine logic, common sense and math is among the hardest tasks, and these remain things that humans can do better than machines.
- If the knowledge is not there, we have nothing to learn:
- We are engineers and scientists; you cannot pull a rabbit out of the hat if it is not there in the first place. This means that without sufficient data to teach the machine, it cannot achieve much no matter how smart it is. A machine learning project is much like teaching a 6-year-old: it requires many examples and already gathered knowledge to learn from. The child you teach cannot grow up to be a doctor, pilot, driver, or whatever you intend if you simply let them play in the backyard or sleep all day.
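The calculator comparison above can be shown side by side. In this sketch, `calculator_add` follows a rule a person wrote, while `learn_adder` is never told the rule and has to recover it from examples. Both names are hypothetical, the learner is a plain two-weight linear model trained by gradient descent, and the examples are generated with `a + b` only to keep the sketch self-contained.

```python
def calculator_add(a, b):
    """Rule following: a person wrote the rule, nothing is learned."""
    return a + b

def learn_adder(examples, lr=0.05, steps=200):
    """Learn weights (w1, w2) so that w1*a + w2*b matches the example answers."""
    w1, w2 = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        grad_w1 = sum(2 * (w1 * a + w2 * b - y) * a for a, b, y in examples) / n
        grad_w2 = sum(2 * (w1 * a + w2 * b - y) * b for a, b, y in examples) / n
        w1 -= lr * grad_w1
        w2 -= lr * grad_w2
    return w1, w2

# The "knowledge" lives in the examples: small sums the learner can observe.
examples = [(a, b, a + b) for a in range(4) for b in range(4)]
w1, w2 = learn_adder(examples)

print(calculator_add(10, 20))           # the rule, applied directly
print(round(w1 * 10 + w2 * 20, 3))      # the learned pattern, applied to unseen inputs

# With no examples there is nothing to learn: the weights never leave zero.
print(learn_adder([(0, 0, 0)]))
```

The last line is the empty hat: given an example that carries no information, the learner comes out exactly as it went in.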
To achieve “above par” performance for your task, the model itself is only a small piece of the puzzle. The upper limit of your model depends on many interconnected criteria:
- The task complexity: Are the patterns clear enough to be captured?
- The quality of your labels: Can the task be performed by humans with very high agreement with each other?
- The bias in your data: Machines learn fast, with exceptional memorization of your data. So with a small dataset to learn from, any biases in the data will be captured and magnified.
- The size and diversity of the dataset: Is the dataset big enough to cover most of the cases for the patterns?
- The “smartness” of the model: Is the model good enough to discover patterns and retrieve them for inference in reliable ways? The study of these problems can be roughly divided into multiple sub-categories:
- Knowledge preservation: The model requires much memory to store enough patterns, i.e., “hold the knowledge”.
- Extraction: The model extracts useful patterns related to your tasks from raw data streams.
- Robust retrieval: The model has the ability to retrieve robust patterns that can lead to task outputs at the time of need (inference).
- Adaptation and transfer: The model does not just learn or store existing knowledge; it has to transfer that knowledge to unseen samples. Handling the “purely unseen” is impossible; again, remember that rabbits can’t be pulled out of empty hats. But the model should be able to infer based on more than first-degree relationships for unseen samples.
- Existing knowledge in research and in the market.
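One item in the list above is easy to measure before any model is trained: whether humans agree with each other on the labels. A minimal sketch, assuming two annotators labeled the same samples, is Cohen's kappa, a standard agreement statistic that I am bringing in here for illustration. The label lists are made up.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for agreement expected by chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of samples")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both pick the same class independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical class throughout
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two annotators on the same eight samples.
annotator_1 = ["cat", "cat", "dog", "dog", "cat", "dog", "cat", "dog"]
annotator_2 = ["cat", "cat", "dog", "cat", "cat", "dog", "dog", "dog"]

print(cohens_kappa(annotator_1, annotator_2))
```

Here the annotators agree on 6 of 8 samples, but half of that agreement would be expected by chance, so kappa comes out at 0.5. If people cannot agree on the labels, the model's achievable ceiling on that task is limited accordingly.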
To summarize, the machine learning model itself does not create knowledge; it learns from knowledge and uses it for inference.
Why are we doing this? Why build models?
In short, remember the top takeaways for deploying your AI project:
- Understanding what machine learning is and how it relates to other approaches such as deep learning and reinforcement learning;
- Knowing your client personas, and understanding and categorizing their unique needs to serve them fully;
- Building a strong data engineering team with the right talent mix;
- Understanding the non-technical aspects of AI, including the qualitative considerations and implications, to succeed in this space.
We design models to learn and transfer information. We do not build or invent information out of thin air. Information, much like energy, does not come from nowhere. Scientists and engineers are not magicians. Doing well in one task may entail doing badly in another. Model training is designed to fit a task for a dataset, so make sure you define both clearly. The role of intuition? It is more like correlation. Building effective models relies on infinite self-play and deep learning, leveraging mission-critical systems and explainable AI. I must confess that in my analytical framework for building and deploying successful AI/ML projects, I have not discussed the role of consumer behaviour analytics in adding more depth and value to the insights and stories that are delivered to your clients, because that’s a whole separate sub-topic.
I hope, however, that I have inspired you enough (or intrigued you a little) to get a basic business sense for running machine learning projects, with all the logistics involved in setting yourself up for success, at least initially. Good luck building your unique path in what can be a much misunderstood and fairly recent industry!
Ho Chan has only been a researcher and software engineer for a couple of years, but has made an impression with high-value client deliverables and rapid software testing projects, and has been published and showcased at top global conferences designed to deploy AI-infrastructure at scale.