As developers, performance optimization is an unavoidable topic that we will inevitably encounter in daily development. Android performance optimization is by now quite mature, with well-established routines, methodologies, open source frameworks, and so on.
Developers with less performance optimization experience may have had few opportunities to learn or summarize these established routines, methodologies, and frameworks. So, as someone who has been doing performance optimization for many years, I will summarize some of these methodologies in this article for everyone's reference.
The Essence of Performance Optimization
First, let me introduce the essence of performance optimization. As I understand it: the essence of performance optimization is the reasonable and sufficient use of hardware resources to make the program perform better. And the purpose of making the program perform better is to gain more returns from users: retention, usage time, reputation, profit, and so on.
So based on the essence, the two most important things about performance optimization are:
- Reasonable and sufficient use of hardware resources
- Make the program perform better and gain returns
Let's talk about these two things below.
Reasonable and Sufficient Use of Hardware Resources
Sufficient means making full use of the hardware's resources, but sufficient is not necessarily reasonable. For example, suppose we suddenly spin up hundreds of threads at once: the CPU is fully utilized, but this is not reasonable. Reasonable means that the hardware resources we exploit have a positive effect on the program's performance.
Hardware resources include: CPU, memory, disk, battery, traffic (not a hardware resource, but also one of the resources that need to be used reasonably) and so on.
Here are some examples of reasonable and sufficient use of hardware resources:
- The CPU usage is high, but not overloaded, and CPU resources are mainly consumed by the current scenario rather than scattered across various businesses in the application. For example, when we optimize page loading speed, the speed is closely tied to the CPU, so first we need to ensure the CPU is fully utilized. We can use multi-threading, preload related resources before the page loads, and so on to maximize use of the phone's CPU. At the same time, while the page is loading we need to ensure that CPU resources are mainly spent on page-related logic, such as component creation, data acquisition, and page rendering. Logic that is less related to the current page-opening scenario, such as periodic tasks, monitoring, or some preloading, can be shut down or delayed to reduce CPU consumption by unrelated tasks.
- Memory resources are used sufficiently but also reasonably, keeping exceptions like OOM under control. For example, when we do memory optimization, less memory does not always mean better; on the contrary, occupying more memory may make the program faster. But memory usage should not be excessive either. We can tune memory usage to be both sufficient and reasonable based on the OOM rate of devices in different tiers: on low-end devices, reduce memory usage through feature degradation and other optimizations; on high-end devices, spend more memory to make the program perform better.
- ...
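As a rough illustration of the device-tier idea above, here is a minimal Java sketch that picks an in-memory cache budget from total device RAM. The tier thresholds and fractions are invented for illustration, not recommended values; on Android, the total RAM figure would come from `ActivityManager.MemoryInfo`.

```java
// Illustrative sketch: choose an in-memory cache budget from total device RAM.
// The tier boundaries and divisors below are made-up examples, not recommendations.
public class CacheBudget {
    // On Android, totalRamMb would be derived from ActivityManager.MemoryInfo.totalMem.
    public static int cacheBudgetMb(int totalRamMb) {
        if (totalRamMb <= 2048) {
            return totalRamMb / 16;  // low-end: degrade features, keep caches small
        } else if (totalRamMb <= 6144) {
            return totalRamMb / 8;   // mid-tier: moderate caches
        } else {
            return totalRamMb / 4;   // high-end: trade memory for speed
        }
    }
}
```

The point is not the specific numbers but the shape of the policy: one knob (the budget) driven by device tier, which the OOM rate per tier can then be used to calibrate.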
Make the Program Perform Better and Gain Returns
We have many direct metrics to measure the returns of performance optimization, such as PSS, Java heap usage, and native memory usage for memory optimization; launch speed and page-opening speed for speed optimization; frame rate for jank optimization; and so on. Mastering these metrics is important, and we need to know how to monitor them correctly and with low overhead.
In addition to the direct metrics above, we also need to understand the ultimate metrics that reflect optimization results: user retention rate, usage time, conversion rate, ratings, and so on. These are often the final data by which our optimization work is measured. For example, suppose we optimize memory usage and reduce PSS by 100 MB. The 100 MB reduction by itself does not bring much benefit; but if it shows up as longer app alive time and a higher conversion rate, then the optimization is worthwhile. It also makes it easier to get recognition when we report our results to leadership.
How to Do Performance Optimization Well
After explaining the essence of performance optimization, I will talk about how to do performance optimization well. I will explain it mainly from the following three aspects:
- Knowledge reserves
- Thinking perspectives and ways
- Form a complete closed loop
Knowledge Reserves
Doing performance optimization well, especially producing original, systematic, or highly effective optimizations, cannot be achieved simply by reading a few articles online and imitating them. We need solid knowledge reserves, and on top of them we analyze our application and find optimization points through in-depth thinking. Below I give some examples to illustrate how knowledge at the hardware, system, and software levels helps us do performance optimization well.
Hardware Level
At the hardware level, we need to have some understanding of processor architecture and memory hierarchy. If we don't know how many cores the CPU has, which are big cores, which are small cores, we won't come up with optimization solutions like binding core threads to big cores to improve performance. If we don't understand the storage structure design of registers, caches, main memory, etc., we won't be able to improve performance based on these features, such as keeping core data in caches as much as possible to improve speed-related performance.
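The cache point above can be made concrete with a small experiment: summing a 2-D array row by row touches memory sequentially (cache-friendly), while summing it column by column jumps across rows (cache-hostile). Both loops compute the same result, but on large arrays the first is typically much faster because of the memory hierarchy.

```java
// Illustrative sketch: the same sum computed with cache-friendly (row-major)
// versus cache-hostile (column-major) traversal. Java stores a 2-D array
// row by row, so the first loop walks memory sequentially.
public class CacheLocality {
    public static long sumRowMajor(int[][] m) {
        long sum = 0;
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[i].length; j++)
                sum += m[i][j];          // sequential access: high cache hit rate
        return sum;
    }

    public static long sumColumnMajor(int[][] m) {
        long sum = 0;
        for (int j = 0; j < m[0].length; j++)
            for (int i = 0; i < m.length; i++)
                sum += m[i][j];          // strided access: each read may miss cache
        return sum;
    }
}
```

Timing the two on a matrix of a few thousand rows and columns is a quick way to see the memory hierarchy at work on your own device.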
System Level
Familiarity with the operating system is also indispensable for doing performance optimization well. Here I list some of the knowledge that needs to be mastered at the system level, though the list is not exhaustive. Linux knowledge includes process management and scheduling, memory management, virtual memory, locks, IPC, etc. Android system knowledge includes the virtual machine, core services such as AMS and WMS, rendering, and core flows such as startup, opening an Activity, and installation.
If we don't understand Linux's process scheduling, we won't be able to fully utilize process priorities to help improve performance. If we are not familiar with Android's virtual machine, optimizations related to it, such as OOM optimization or GC optimization, cannot be carried out well.
Software Level
The software level refers to the app we develop. For performance optimization, we need to be as familiar as possible with our own application. For example, we need to know what threads the app has, what they do, their CPU consumption, how much memory they occupy, which businesses occupy it, cache hit rates, and so on. We need to know what businesses the app has, what they are used for, their usage rates, their resource consumption, and so on.
In addition to the three levels above, deep performance optimization requires even more knowledge, such as assembly language, compilers, programming languages, and reverse engineering. For example, the same logic written in C++ often runs faster than in Java, so we can improve performance by rewriting some Java business logic in C++. Compiler optimizations such as inlining and dead code elimination can reduce package size. Reverse engineering is also very useful in performance optimization: we can modify system logic through reverse engineering to make the program perform better.
As you can see, doing performance optimization well requires a large knowledge base, which is why it demonstrates both the depth and breadth of a developer's skills, and why performance optimization is almost always asked about in interviews. This knowledge base cannot be built overnight; we need to learn and accumulate it gradually.
Thinking Perspectives and Ways
After talking about knowledge reserves, let's talk about thinking perspectives and ways. Note that there is no strict ordering between the two: we don't have to wait until we have enough technical knowledge before practicing how to think. Thinking perspectives and ways show up at every stage of the development life cycle, and even junior developers can practice thinking from different perspectives and in different ways. Below I share some of my insights on thinking perspectives and ways in performance optimization, using startup optimization as a running example to make things more concrete.
Thinking Perspectives
Here I mainly introduce three perspectives - application layer, system layer, and hardware layer - to explain my thinking on startup speed optimization.
Application Layer Perspective
When optimizing startup speed from the application layer perspective, I would evaluate the loaded businesses along the business dimension: their usage rates, their necessity, and so on, and then prioritize them so that only first-screen or high-usage businesses are loaded at startup. Then I can design a startup framework to manage these tasks. The startup framework needs well-designed priorities, and it should be able to track how the initialization tasks are actually used, for example the probability that a task's output is used after initialization, or how much the initialization improves the business's performance afterwards.
Thinking from the application layer perspective mainly means improving performance through business control and business-level optimization.
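The priority idea above can be sketched as a tiny startup framework. The class names, priority tiers, and API below are hypothetical, invented for illustration, and not taken from any real library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of a priority-based startup framework.
// All names and tiers here are hypothetical examples.
public class StartupScheduler {
    public static final int FIRST_SCREEN = 0;  // needed before the first frame
    public static final int HIGH_USAGE   = 1;  // run right after launch
    public static final int LAZY         = 2;  // defer until idle or first use

    private static class Task {
        final String name;
        final int priority;
        final Runnable work;
        Task(String name, int priority, Runnable work) {
            this.name = name; this.priority = priority; this.work = work;
        }
    }

    private final List<Task> tasks = new ArrayList<>();

    public void register(String name, int priority, Runnable work) {
        tasks.add(new Task(name, priority, work));
    }

    // Run only the tasks the current phase needs, in priority order,
    // and report which ones executed (useful for usage-rate tracking).
    public List<String> runUpTo(int maxPriority) {
        List<String> executed = new ArrayList<>();
        tasks.stream()
             .filter(t -> t.priority <= maxPriority)
             .sorted(Comparator.comparingInt((Task t) -> t.priority))
             .forEach(t -> { t.work.run(); executed.add(t.name); });
        return executed;
    }
}
```

At launch we would call `runUpTo(HIGH_USAGE)` and leave `LAZY` tasks for an idle callback or first use; the returned list is a natural hook for the usage-rate statistics mentioned above.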
System Layer Perspective
There are also many things to consider when optimizing startup from the system layer perspective, such as the thread and thread-priority dimensions: how to properly control the number of threads during startup, how to raise the main thread's priority, how to reduce interference from unrelated threads such as the GC thread, and so on.
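As a plain-Java illustration of the thread-priority point: on Android you would typically call `android.os.Process.setThreadPriority(Process.THREAD_PRIORITY_BACKGROUND)` from inside the worker thread, since Java-level `Thread` priorities map poorly to the Linux scheduler. The hypothetical helper below only shows the idea in portable Java.

```java
// Illustrative sketch: keep background workers at low priority during startup
// so the scheduler favors the main/UI thread. On Android, prefer
// android.os.Process.setThreadPriority(THREAD_PRIORITY_BACKGROUND);
// plain java.lang.Thread priorities are only a weak hint.
public class StartupThreadPriority {
    public static Thread lowPriorityWorker(Runnable work) {
        Thread t = new Thread(work, "startup-worker");
        t.setPriority(Thread.MIN_PRIORITY); // hint: deprioritize vs. main thread
        t.setDaemon(true);
        return t;
    }
}
```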
Hardware Layer Perspective
When considering startup optimization from the hardware layer perspective, we can consider optimization in terms of CPU utilization, cache hit rate, etc.
Beyond the perspectives above, we can adopt even more. For example, think outside the device itself and see whether other machines can help accelerate startup. Google Play has a similar optimization: it aggregates execution profiles from devices that already run the app, and when another device installs the same app, these profiles are delivered alongside it so hot code can be compiled ahead of time. The common server-side rendering technique likewise lets the server pre-render the interface and directly display static modules to speed up page opening. Or think from the user's point of view: what kind of optimization will improve the user's perception? Sometimes when optimizing startup or page-opening speed, we show users a static placeholder page first so they perceive the page as opened, and then bind the real data.
Having more and more comprehensive thinking perspectives can help us come up with more optimization solutions when doing performance optimization.
Thinking Ways
In addition to exercising our ability to think from different perspectives, we can also practice different ways of thinking about problems - top-down and bottom-up.
Top-down
When doing startup optimization, the top-down approach starts directly from startup itself: analyze the links in the startup process, find time-consuming functions, and move them to child threads or lazy-load them. But this approach can leave the optimization incomplete. For example, moving time-consuming tasks to child threads did speed up startup on high-end devices, but on low-end devices it may actually slow startup down, because their CPUs are weaker and too many threads overload the CPU, leaving the main thread less running time. Also, viewed only from the top layer, a function that takes a long time to execute may not be slow in itself; it may simply not have been given CPU time for a long while.
Top-down thinking makes it easy to miss the root cause, leading to optimizations whose effects are insignificant or incomplete.
Bottom-up
Bottom-up thinking about startup optimization does not begin by walking the startup chain looking for slow functions; it starts by asking how to reasonably and sufficiently utilize CPU resources during startup. From there many solutions emerge. For example, we may realize that different device models have different CPU capabilities, so we optimize high-end and low-end devices separately: on high-end devices we push CPU utilization higher, while on low-end devices we avoid CPU overload, combining this with knowledge of slow functions, threads, locks, and so on to formulate a systematic and complete startup optimization plan.
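A minimal sketch of this bottom-up, device-aware approach: size the startup thread pool from the CPU core count and the device tier rather than from the number of tasks. The sizing rules below are made-up examples, not tuned values.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch: derive startup concurrency from the device's CPU,
// not from the task list. The formulas are invented examples.
public class StartupExecutor {
    public static int poolSize(int cores, boolean lowEndDevice) {
        if (lowEndDevice) {
            // Avoid CPU overload: leave headroom for the main thread.
            return Math.max(1, cores / 2);
        }
        // High-end: use the cores, but don't oversubscribe.
        return Math.max(2, cores - 1);
    }

    public static ExecutorService newStartupPool(boolean lowEndDevice) {
        int cores = Runtime.getRuntime().availableProcessors();
        return Executors.newFixedThreadPool(poolSize(cores, lowEndDevice));
    }
}
```

The same structure extends naturally: the `lowEndDevice` flag could fold in RAM, CPU frequency, or historical jank data, and the pool can be shut down once startup-critical work has drained.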
Complete Closed Loop
The above covers how to optimize. Optimization is important, but it is not everything. In real performance work we also need monitoring, optimization, anti-degradation, and collection of return data. Doing all of these well forms a complete closed loop. Let me explain each part:
- Monitoring: fully monitor the application's performance metrics. Metric monitoring alone is not enough; we also need attribution monitoring wherever possible. For example, besides monitoring the app's overall memory metrics, we should be able to monitor each business's share of memory usage, along with attribution items such as large collections, large images, and large objects. The monitoring itself should also be designed with its own performance overhead in mind. Comprehensive monitoring lets us discover and resolve anomalies more efficiently.
- Optimization: optimization is what I described earlier: reasonable and sufficient use of hardware resources to make the program perform better.
- Anti-degradation: much can be done here, including comprehensive offline performance testing and online monitoring with alerts. For example, for memory we can run daily offline monkey tests that detect memory leaks, and fix them before release; that is anti-degradation.
- Data collection of returns: learn to use A/B testing properly and pay attention to metrics that reflect core value. For example, when optimizing memory usage, blindly pursuing lower memory usage is not optimal; using more memory may make the program faster and improve the user experience. So we should look at crash rate, retention rate, and other metrics that reflect core experience value to decide whether memory needs further optimization, and by how much.
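A minimal sketch of the attribution idea from the monitoring point above: besides an app-wide total, track each business's share of a resource so regressions can be blamed on a specific owner. All class names and figures here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of attribution monitoring: record how much memory each
// business/feature accounts for, not just the app-wide total.
// All names and numbers are hypothetical.
public class MemoryAttribution {
    private final Map<String, Long> bytesByBusiness = new HashMap<>();

    public void record(String business, long bytes) {
        bytesByBusiness.merge(business, bytes, Long::sum);
    }

    public long totalBytes() {
        return bytesByBusiness.values().stream().mapToLong(Long::longValue).sum();
    }

    // Percentage of total tracked memory attributed to one business.
    public double sharePercent(String business) {
        long total = totalBytes();
        if (total == 0) return 0.0;
        return 100.0 * bytesByBusiness.getOrDefault(business, 0L) / total;
    }
}
```

In a real app the `record` calls would be fed by allocation hooks, image-cache accounting, or periodic heap snapshots; the per-business share is what turns a raw metric into something actionable.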
Summary
The above are the insights and methodologies I have distilled from many years of performance optimization work. Only with these methodologies in mind can we carry out performance optimization smoothly.
This article does not introduce specific optimization solutions because performance optimization solutions cannot be fully covered in one article.