Kunpeng BoostKit for ARM Native

Tuning Guide (Kunpeng 920)

Issue 09

Date 2021-06-30

HUAWEI TECHNOLOGIES CO., LTD.


Copyright © Huawei Technologies Co., Ltd. 2021. All rights reserved.

No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

HUAWEI and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice

The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.

Issue 09 (2021-06-30) Copyright © Huawei Technologies Co., Ltd. i


Contents

1 Introduction
1.1 Kunpeng BoostKit for ARM Native Introduction
1.2 Hardware Configuration
1.3 Tuning Guidelines
1.4 Tuning Flow

2 AVD Tuning
2.1 Changing the Maximum Number of Files Accessed by a Process
2.2 Optimizing the Boot Mode (Binding NUMA Cores)
2.3 Increasing the AVD Heap Memory Size

3 Android Container Tuning
3.1 Optimizing the Boot Mode (Binding NUMA Cores)
3.2 Restricting the Number of CPU Cores Visible to a Container
3.3 Configuring the Maximum Process ID
3.4 Configuring the Maximum Number of System Threads
3.5 Configuring the Maximum Number of inotify Instances

4 Rendering Performance Tuning
4.1 Reducing the AVD Resolution
4.2 Reducing the Frame Rate
4.3 Binding a Rendering Program to NUMA Nodes
4.4 Optimizing x11vnc
4.5 Improving the PCIe Performance on Huawei Kunpeng 920 Processors
4.6 Improving Periodic Frame Rate Decrease Caused by Tremendous Concurrent GPU Tasks

5 H.264 Encoding Performance Tuning
5.1 Eliminating Intermittent Performance Deterioration of H.264 Coding
5.2 Improving the Stream Obtaining Efficiency

A Appendix
A.1 Change History

Kunpeng BoostKit for ARM NativeTuning Guide (Kunpeng 920) Contents


1 Introduction

1.1 Kunpeng BoostKit for ARM Native Introduction

1.2 Hardware Configuration

1.3 Tuning Guidelines

1.4 Tuning Flow

1.1 Kunpeng BoostKit for ARM Native Introduction

The Kunpeng BoostKit for ARM Native enables Android apps to run on servers, that is, to migrate to the cloud. With the development of wireless networks, more and more mobile apps are being migrated to the cloud, and the Android cloud is becoming an important new direction. Typical scenarios include Android app test hosting, cloud gaming, and mobile office. The Kunpeng BoostKit for ARM Native has the following main components:

1. Hardware: TaiShan series servers

2. OSs: Ubuntu and CentOS (supported by the AVD solution)

3. Virtualization software: Google QEMU

4. Guest OS: Android

This document describes how to conduct performance tuning of the components.

1.2 Hardware Configuration

Table 1-1 Hardware configuration suggestions

Hardware | Suggestions
Drive | Use SSDs to store Android images and instance data. This improves the boot and loading speed of Android VMs and applications, especially during concurrent boots.
GPU | Use AMD GPUs for rendering-intensive programs to improve rendering performance and reduce the CPU load. GPUs provide higher rendering performance than CPUs.

1.3 Tuning Guidelines

Observe the following guidelines when tuning performance:

1. Analyze resource bottlenecks from multiple aspects to identify the root cause. Poor performance in one aspect may be caused by a problem in another. For example, high CPU usage may be caused by insufficient memory capacity, with the CPU resources being consumed by memory management.

2. Adjust only one performance-related parameter at a time. When multiple parameters are adjusted at the same time, it is difficult to determine which one caused the change in performance.

3. During performance analysis, the analysis tool itself occupies system resources, such as CPU and memory. Running the tool may therefore aggravate a resource bottleneck in some aspects of the system; take this overhead into account when interpreting the results.

1.4 Tuning Flow

Identify problems, find performance bottlenecks, and determine a tuning method based on the bottleneck level.


2 AVD Tuning

2.1 Changing the Maximum Number of Files Accessed by a Process

2.2 Optimizing the Boot Mode (Binding NUMA Cores)

2.3 Increasing the AVD Heap Memory Size

2.1 Changing the Maximum Number of Files Accessed by a Process

Purpose

In the QEMU solution, when the number of concurrent requests increases, the number of files opened by the Xorg process may easily reach the default upper limit of 1024. When this happens, the UNIX socket cannot accept new connections. To resolve this problem, raise the upper limit on the number of files a process can open. The recommended upper limit is 65535.

Procedure

The settings made using the following method take effect permanently.

Step 1 Open the configuration file.
vi /etc/security/limits.conf

Step 2 Add the following lines to the file and restart the system:
root soft nofile 65535
root hard nofile 65535
* soft nofile 65535
* hard nofile 65535

----End
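The edit in Step 2 can also be scripted so that running it twice does not duplicate entries. The sketch below is a hypothetical helper (add_nofile_limits is not part of any tool); it takes the target file as an argument, so it can be tried on a scratch file before being pointed at /etc/security/limits.conf as root.

```shell
#!/bin/sh
# Hypothetical helper: append the Step 2 nofile limits to a limits.conf-style
# file, skipping any line that is already present (safe to run twice).
# $1: target file (normally /etc/security/limits.conf; requires root)
# $2: limit value (defaults to the recommended 65535)
add_nofile_limits() {
    file=$1
    limit=${2:-65535}
    for domain in root '*'; do
        for kind in soft hard; do
            line="$domain $kind nofile $limit"
            # -x: match the whole line; -F: fixed string, so '*' is literal.
            grep -qxF "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
        done
    done
}
```

After the restart, the effective limit of the running Xorg process appears in the Max open files row of /proc/PID/limits. Keep in mind that limits.conf is applied through PAM to login sessions; if Xorg is started by a systemd service, that unit's LimitNOFILE= setting may govern it instead.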

2.2 Optimizing the Boot Mode (Binding NUMA Cores)

Purpose

To improve Android virtual device (AVD) performance, start the emulator through numactl so that its CPU and memory are bound to the same NUMA node.

Procedure

Step 1 Set NUMA core binding during the boot process.
numactl -N 0 -m 0 emulator -avd test_2 -no-window -cores 2 -writable-system -gpu host -qemu --enable-kvm -m 1024 -vnc :2

To bind the AVD to specific cores, run the following command:

numactl -C 0-3 -m 0 emulator -avd test_2 -no-window -cores 2 -writable-system -gpu host -qemu --enable-kvm -m 1024 -vnc :2

NOTE

● numactl -N 0 -m 0 binds the program to CPU node 0 and memory node 0.

● Four cores form a cluster. Binding cores by cluster helps improve performance. For example, cores 0 to 3 form a cluster, and cores 4 to 7 form another cluster.

● numactl -C 0-3 -m 0 binds the program to cores 0 to 3 and memory node 0. In this case, the program can use a maximum of four cores in a cluster.

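Table 2-1 numactl parameter description

Parameter | Description
-N | Binds the program to the CPUs of the specified NUMA node.
-m | Allocates memory only from the specified NUMA node.
-C | Binds the program to the specified CPU cores.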


Step 2 Query the system NUMA information.
numactl --show

In this example, the system has four NUMA nodes and 64 cores in total, with 16 cores per node. (numactl --hardware also lists the CPUs and memory size of each node.)

Step 3 Bind the VMs or containers based on the system configuration.

For example, the current system has 64 cores numbered from 0 to 63. You are advised to bind every four cores (one cluster) to a VM or container. Run the following commands, setting the value after -m to the NUMA node that owns the bound cores (cores 0 to 15 belong to node 0, cores 16 to 31 to node 1, and so on). The commands reuse the same AVD name and VNC display for brevity; in practice, give each instance its own AVD (-avd) and VNC display (-vnc).

numactl -C 0-3 -m 0 emulator -avd test_2 -no-window -cores 2 -writable-system -gpu host -qemu --enable-kvm -m 1024 -vnc :2
numactl -C 4-7 -m 0 emulator -avd test_2 -no-window -cores 2 -writable-system -gpu host -qemu --enable-kvm -m 1024 -vnc :2
numactl -C 8-11 -m 0 emulator -avd test_2 -no-window -cores 2 -writable-system -gpu host -qemu --enable-kvm -m 1024 -vnc :2
...
numactl -C 16-19 -m 1 emulator -avd test_2 -no-window -cores 2 -writable-system -gpu host -qemu --enable-kvm -m 1024 -vnc :2
numactl -C 32-35 -m 2 emulator -avd test_2 -no-window -cores 2 -writable-system -gpu host -qemu --enable-kvm -m 1024 -vnc :2
numactl -C 48-51 -m 3 emulator -avd test_2 -no-window -cores 2 -writable-system -gpu host -qemu --enable-kvm -m 1024 -vnc :2

CAUTION

The cores bound to a VM are not used exclusively by that VM; they can also be bound to other VMs. For example, after 16 VMs have been started (each bound to four cores), you can bind the 17th VM to cores 0 to 3 again.

----End
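The core ranges and memory nodes in Step 3 follow a simple pattern, sketched below as a hypothetical helper. It assumes the example topology (64 cores numbered contiguously, 16 cores per NUMA node, one 4-core cluster per instance) and wraps around after the last cluster as the CAUTION describes; confirm the real layout with numactl before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print the numactl options for instance $1 (0-based).
# Defaults follow the example topology: 64 cores numbered contiguously,
# 16 cores per NUMA node, and one 4-core cluster per instance.
numa_bind_args() {
    idx=$1
    total=${2:-64}
    per_node=${3:-16}
    per_vm=${4:-4}
    slots=$((total / per_vm))              # number of clusters (16 here)
    first=$(( (idx % slots) * per_vm ))    # wrap around, as in the CAUTION
    last=$((first + per_vm - 1))
    node=$((first / per_node))             # memory node that owns these cores
    printf '%s %d-%d %s %d\n' -C "$first" "$last" -m "$node"
}
```

For example, numa_bind_args 4 prints -C 16-19 -m 1, and numa_bind_args 16 wraps around to -C 0-3 -m 0. A launch line could then be built as numactl $(numa_bind_args "$i") emulator ..., giving each instance its own AVD name and VNC display.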

2.3 Increasing the AVD Heap Memory Size

Purpose

In Android, the available heap limit for an app can be specified as follows:

1. If android:largeHeap="true" is not set in the manifest file, the available heap limit is specified by dalvik.vm.heapgrowthlimit.

2. If android:largeHeap="true" is set in the manifest file, the available heap limit is specified by dalvik.vm.heapsize.


dalvik.vm.heapgrowthlimit and dalvik.vm.heapsize are properties in /system/build.prop.

The memory that can be used by an app in Android is limited by the heap size (the stack size is fixed). If the heap of the created AVD is insufficient, an OutOfMemoryError (OOM) may occur in memory-intensive apps. For apps that encounter an OOM, increase the heap size of the AVD.

For example, when an OOM occurs, the activity of com.netease.l10 is forcibly terminated and the app exits.

Guidelines

The VM heap size selected when an AVD is created may not take effect. To query the heap size, run the cat /system/build.prop command in adb shell. For example, the query may show a VM heap size of 256 MB.

Therefore, you may need to increase the heap properties in build.prop.

The parameters are described as follows:

● dalvik.vm.heapgrowthlimit specifies the maximum heap size that can be used by common apps.

● dalvik.vm.heapsize specifies the maximum heap size when android:largeHeap="true" is set in the manifest file.

Theoretically, you only need to modify dalvik.vm.heapsize, because memory-intensive apps are normally built with the largeHeap property. To ensure the best tuning effect, however, increase both values.

Procedure

Step 1 Obtain the build.prop file from the VM.
adb pull /system/build.prop


Step 2 Modify the values of dalvik.vm.heapgrowthlimit and dalvik.vm.heapsize described in Guidelines.

Step 3 Obtain root permissions.
adb root

Step 4 Remount /system as writable and send the build.prop file to the VM.
adb shell mount -o rw,remount /system
adb push build.prop /system

Step 5 Restart the AVD.

----End
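Step 2 can be done with a small script instead of an editor. The set_prop helper below is hypothetical: it rewrites an existing key=value line in the local build.prop copy, or appends the pair if the key is missing. The 512m value in the usage lines is illustrative only; choose values that suit your apps and the memory assigned to the AVD.

```shell
#!/bin/sh
# Hypothetical helper: set key=value in a local build.prop copy.
# An existing "key=..." line is rewritten in place; otherwise the pair is appended.
set_prop() {
    file=$1 key=$2 value=$3
    # Escape dots so the key is matched literally by grep and sed.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^$pattern=" "$file"; then
        sed -i "s|^$pattern=.*|$key=$value|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

For example: set_prop build.prop dalvik.vm.heapgrowthlimit 512m and set_prop build.prop dalvik.vm.heapsize 512m, followed by Steps 3 to 5. After the restart, adb shell getprop dalvik.vm.heapsize shows whether the new value took effect.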


3 Android Container Tuning

3.1 Optimizing the Boot Mode (Binding NUMA Cores)

3.2 Restricting the Number of CPU Cores Visible to a Container

3.3 Configuring the Maximum Process ID

3.4 Configuring the Maximum Number of System Threads

3.5 Configuring the Maximum Number of inotify Instances

3.1 Optimizing the Boot Mode (Binding NUMA Cores)

Purpose

After a container is started, the cgroups mechanism is used to bind the container and its processes to physical cores and to the NUMA memory node of those cores. This binding improves container performance and reduces the CPU resources used by games.

To accelerate the boot of a container, bind the subprocesses of the container to a single core.

Procedure

Bind containers to NUMA cores by adding the following parameters to the docker run command:

● --cpuset-cpus="0-3": binds cores 0 to 3.
● --cpuset-mems="0": binds the memory on NUMA node 0.
● --memory="3584M": limits the container to 3584 MB of memory.
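Combined into one command, the options above might be generated as in the following sketch. The docker_numa_opts helper is hypothetical, and IMAGE in the usage line is a placeholder. Note that --memory sets an upper limit on the container's memory rather than reserving it.

```shell
#!/bin/sh
# Hypothetical helper: print docker run options that pin a container to a
# core range and to the memory of one NUMA node, with a memory cap.
# $1: core list (e.g. 0-3), $2: NUMA node (e.g. 0), $3: memory limit (e.g. 3584M)
docker_numa_opts() {
    printf '%s\n' "--cpuset-cpus=$1" "--cpuset-mems=$2" "--memory=$3"
}
```

For example: docker run -d $(docker_numa_opts 0-3 0 3584M) IMAGE. The applied binding can be checked with docker inspect --format '{{.HostConfig.CpusetCpus}} {{.HostConfig.CpusetMems}}' CONTAINER.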


3.2 Restricting the Number of CPU Cores Visible to a Container

Purpose

Some games that have a large number of threads create threads according to the number of CPU cores they can see, so their CPU overhead grows with that number. To reduce the CPU overhead, decrease the number of CPU cores visible to the container.

Procedure

Step 1 If the container is bound to only one core, create a file named present with content 1 and map it to /sys/devices/system/cpu/present in the container.

The game starts a number of UnityWorker threads based on /sys/devices/system/cpu/present. Each container is bound to only one physical core, so /sys/devices/system/cpu/present in the container must be set to 1. Create a present file with content 1 on the host machine and map it into the container by adding -v /home/ubuntu/vpresent/present$cores:/sys/devices/system/cpu/present \ to the docker run command in the robox script.

Step 2 Map the following two files in the same way:

/sys/devices/system/cpu/possible

/sys/devices/system/cpu/online

----End
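Steps 1 and 2 can be combined in a hypothetical helper that writes all three files into one host directory and prints the matching -v options. The per-container directory layout is an assumption; the procedure above only names /home/ubuntu/vpresent/present$cores for the present file. Following the procedure, the content written is 1.

```shell
#!/bin/sh
# Hypothetical helper: create host-side files that replace the sysfs CPU
# files inside a container, and print the matching docker run -v options.
# $1: host directory for the files, $2: content to write (the procedure uses 1)
make_cpu_view() {
    dir=$1
    content=$2
    mkdir -p "$dir" || return 1
    for f in present possible online; do
        printf '%s\n' "$content" > "$dir/$f"
        printf '%s %s:/sys/devices/system/cpu/%s ' -v "$dir/$f" "$f"
    done
    echo
}
```

For example, docker run ... $(make_cpu_view /home/ubuntu/vpresent/c0 1) ... maps all three files; inside the container, cat /sys/devices/system/cpu/present should then print 1.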

3.3 Configuring the Maximum Process ID

Purpose

After a container is started, its processes and threads are also visible on the host machine, where each of them occupies a PID. In high-concurrency scenarios, the number of PIDs may reach the upper limit.

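Procedure

As the root user, run the echo 655360 > /proc/sys/kernel/pid_max command to set the maximum process ID. This setting is lost after a restart; to make it permanent, add kernel.pid_max = 655360 to /etc/sysctl.conf.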
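A hedged sketch of applying this setting safely: the hypothetical raise_limit helper writes the new value only if the current one is lower, so a larger value already configured by an administrator is left alone. The file argument lets the logic be tried on a scratch file.

```shell
#!/bin/sh
# Hypothetical helper: write $2 to the proc-style file $1 only if the current
# value is lower, so an existing larger setting is never reduced.
raise_limit() {
    file=$1 want=$2
    cur=$(cat "$file") || return 1
    if [ "$cur" -lt "$want" ]; then
        echo "$want" > "$file"
    fi
}
```

On a real host, run raise_limit /proc/sys/kernel/pid_max 655360 as root. To keep the value across restarts, put kernel.pid_max = 655360 in /etc/sysctl.conf or in a file under /etc/sysctl.d/.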


3.4 Configuring the Maximum Number of System Threads

Purpose

When multiple containers run concurrently, the threads of all containers are visible on the host machine and count toward its limits. In high-concurrency scenarios, the number of threads may reach the upper limit.

Procedure

Change the value of UserTasksMax to 655360 in the /etc/systemd/logind.conf file to set the maximum number of user threads. The change takes effect only after the system is restarted.
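The edit can be scripted as below. The set_user_tasks_max helper is hypothetical; it replaces an existing UserTasksMax line, including a commented-out default, and otherwise appends one, assuming [Login] is the last section of the file (as in the stock file). Newer systemd releases may handle the per-user task limit differently, so check logind.conf(5) for your version.

```shell
#!/bin/sh
# Hypothetical helper: set UserTasksMax in a logind.conf-style file.
# Replaces an existing (possibly commented-out) UserTasksMax line; otherwise
# appends one, assuming the file's last section is [Login].
set_user_tasks_max() {
    file=$1 value=$2
    if grep -q '^#\{0,1\}UserTasksMax=' "$file"; then
        sed -i "s/^#\{0,1\}UserTasksMax=.*/UserTasksMax=$value/" "$file"
    else
        printf 'UserTasksMax=%s\n' "$value" >> "$file"
    fi
}
```

On a real host, run set_user_tasks_max /etc/systemd/logind.conf 655360 as root, then restart the system as described above.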

3.5 Configuring the Maximum Number of inotify Instances

Purpose

When multiple containers run concurrently, the inotify instances of all containers are counted on the host machine. In high-concurrency scenarios, the number of inotify instances may reach the upper limit.

Procedure

Change the value of /proc/sys/fs/inotify/max_user_instances to 8192. With the default value of 128, only 64 Android containers can run.
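To see how close a host is to this limit, the open inotify instances can be counted from /proc: each instance is a file descriptor whose link target is anon_inode:inotify. The count in the sketch below is approximate (descriptors of other users' processes are visible only to root, and duplicated descriptors are counted twice), and because the limit applies per user, a host-wide count is an upper bound for any single user. The helper name is hypothetical; the sysctl lines in the comments are the usual way to apply 8192 now and after a restart.

```shell
#!/bin/sh
# Hypothetical helper: approximate number of open inotify instances on the host.
# Each instance is a file descriptor whose symlink target is "anon_inode:inotify".
# Run as root to see the descriptors of every user's processes.
count_inotify_instances() {
    find /proc/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l
}

limit=$(cat /proc/sys/fs/inotify/max_user_instances 2>/dev/null || echo unknown)
echo "inotify instances in use: $(count_inotify_instances); per-user limit: $limit"

# To apply the recommended value now (as root):
#   sysctl -w fs.inotify.max_user_instances=8192
# To keep it across restarts, add this line to a file under /etc/sysctl.d/:
#   fs.inotify.max_user_instances = 8192
```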


4 Rendering Performance Tuning

4.1 Reducing the AVD Resolution

4.2 Reducing the Frame Rate

4.3 Binding a Rendering Program to NUMA Nodes

4.4 Optimizing x11vnc

4.5 Improving the PCIe Performance on Huawei Kunpeng 920 Processors

4.6 Improving Periodic Frame Rate Decrease Caused by Tremendous Concurrent GPU Tasks

4.1 Reducing the AVD Resolution

Purpose

The rendering load increases with the resolution. The 320p or 480p resolution is recommended if users have no special requirements.

Procedure

When creating an AVD, use the --skin parameter to specify the resolution.

Example:

android create avd --name test_2 --target android-24 --abi arm64-v8a --device "Nexus 4" --skin 720x480 --sdcard 500M --force
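As a rough first-order estimate, assuming the per-frame rendering cost scales with the number of pixels (mainly true for fill-rate-bound workloads), the saving from a lower resolution can be computed directly. The helper is hypothetical, and 1280x720 serves only as a comparison point.

```shell
#!/bin/sh
# Hypothetical helper: pixel count of resolution 1 as an integer percentage of
# resolution 2, e.g. to compare a 720x480 skin with a 1280x720 one.
pixel_ratio_pct() {
    echo $(( ($1 * $2 * 100) / ($3 * $4) ))
}

pixel_ratio_pct 720 480 1280 720   # prints 37: about 37% of the pixels per frame
```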

4.2 Reducing the Frame Rate

Purpose

A higher frame rate increases the rendering overhead. Reduce the frame rate as long as user experience is not affected.


NOTE

Currently, the frame rate modification applies only to Kbox-related solutions.

Procedure

Step 1 Run the docker inspect container-name | grep -i merged command (container-name indicates the container name) to find the merged directory, which is the host path of the container's root file system.

Step 2 Change the value of ro.hardware.fps in the system/build.prop file under that directory (/system/build.prop in the container) to the required value.

Step 3 Run the docker restart XXX command (XXX indicates the container name) to restart the container. The modified Kbox frame rate takes effect after the container is restarted.

CAUTION

Currently, the frame rate specified by Kbox is 30±2 fps. If the frame rate is changed to another value, other problems may occur.

----End
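Steps 1 to 3 can be wrapped in a script. This is a sketch under assumptions: the helper names are hypothetical, the container uses Docker's overlay2 storage driver (so docker inspect reports a MergedDir), and build.prop sits at system/build.prop under that directory as described in Step 2. Per the CAUTION above, stay close to the 30 fps that Kbox specifies.

```shell
#!/bin/sh
# Hypothetical helper: set ro.hardware.fps in a build.prop file, rewriting an
# existing entry or appending one if the key is absent.
set_fps_prop() {
    file=$1 fps=$2
    if grep -q '^ro\.hardware\.fps=' "$file"; then
        sed -i "s/^ro\.hardware\.fps=.*/ro.hardware.fps=$fps/" "$file"
    else
        printf 'ro.hardware.fps=%s\n' "$fps" >> "$file"
    fi
}

# Hypothetical wrapper for Steps 1-3: locate the container's merged root
# directory (overlay2 storage driver assumed), edit build.prop, restart.
kbox_set_fps() {
    name=$1 fps=$2
    merged=$(docker inspect --format '{{.GraphDriver.Data.MergedDir}}' "$name") || return 1
    set_fps_prop "$merged/system/build.prop" "$fps" && docker restart "$name"
}
```

For example, kbox_set_fps CONTAINER 30 applies the procedure to one container (CONTAINER is a placeholder for the container name).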

4.3 Binding a Rendering Program to NUMA Nodes

Purpose

Bind the rendering program to the NUMA node to which the GPU is attached, to prevent the performance loss caused by non-local (cross-node) PCIe access.

Procedure

Step 1 Run the lspci command to obtain the bus ID (bus_id) of the GPU.

Step 2 Run the lspci -vvvs bus_id command and check the NUMA node field in the output to find the NUMA node where the GPU is located.


Step 3 Run the numactl command to bind the rendering program to the cores of that NUMA node. For details, see 2.2 Optimizing the Boot Mode (Binding NUMA Cores).

----End
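Steps 1 and 2 can also be answered from sysfs, which is convenient in scripts. The sketch assumes a full PCI address including the domain, as printed by lspci -D (for example, lspci -D | grep -i vga); the helper name and the SYSFS override (used only so the function can be tested against a fake directory tree) are hypothetical. A numa_node value of -1 means no NUMA affinity is reported for the device.

```shell
#!/bin/sh
# Hypothetical helper: print the NUMA node of a PCI device and the CPUs that
# belong to that node. $1: full PCI address, e.g. as printed by `lspci -D`.
# SYSFS may be set to another root (defaults to /sys), e.g. for testing.
gpu_numa_info() {
    sysfs=${SYSFS:-/sys}
    node=$(cat "$sysfs/bus/pci/devices/$1/numa_node") || return 1
    if [ "$node" -lt 0 ]; then
        echo "no NUMA affinity reported for $1" >&2
        return 1
    fi
    echo "node=$node cpus=$(cat "$sysfs/devices/system/node/node$node/cpulist")"
}
```

For example, if the GPU reports node=2, start the rendering program under numactl -N 2 -m 2, or use -C with a cluster from the printed CPU list, as described in 2.2.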

4.4 Optimizing x11vnc

Purpose

1. Symptom 1: When the server runs x11vnc and a client uses VNC to remotely access and operate an Android AVD running on Ubuntu, the AVD responds slowly. If a screen is simulated, the AVD responds normally. By default, no screen is simulated.

2. Symptom 2: VNC remote operations are not smooth when the network bandwidth is low.

3. Symptom 3: When x11vnc is used to connect to an Ubuntu desktop to operate an AVD, the connection is easily interrupted, resulting in poor user experience.

Procedure

1. If symptom 1 occurs, use the following method:

In the Monitor section of the xorg.conf file, set the Enable option to true (Option "Enable" "true").

2. If symptom 2 occurs, use the following method:

Decrease the desktop resolution or increase the VNC compression ratio to improve smoothness.

3. If symptom 3 occurs, use the following method:

Upgrade x11vnc to the latest version (download and compile https://github.com/LibVNC/x11vnc).
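For symptom 3, a minimal build sketch follows. It assumes the autotools flow used by the LibVNC/x11vnc repository (autoreconf, configure, make) and that git, autotools, and the LibVNCServer development files are already installed; on Ubuntu the latter are typically packaged as libvncserver-dev, but package names vary by release. The function name and install prefix are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: fetch, build and install x11vnc from the LibVNC/x11vnc
# repository. Runs in a subshell so the caller's working directory is unchanged.
# $1: working directory, $2: install prefix (defaults to /usr/local)
build_x11vnc() (
    workdir=$1
    prefix=${2:-/usr/local}
    git clone https://github.com/LibVNC/x11vnc.git "$workdir/x11vnc" || exit 1
    cd "$workdir/x11vnc" || exit 1
    autoreconf -fiv &&
        ./configure --prefix="$prefix" &&
        make &&
        make install    # installing under /usr/local normally requires root
)
```

After installation, x11vnc -version shows which build is found on the PATH.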

4.5 Improving the PCIe Performance on Huawei Kunpeng 920 Processors

Purpose

Huawei Kunpeng 920 processors support PCIe Write Combining (WC). In the kernel DRM driver stack, this feature is enabled by default for kernels earlier than 4.19.29. For example, if you use Ubuntu 18.04.1, whose default kernel version is 4.15, no modification is needed. If the kernel version is later than 4.19.29, enable the WC feature to improve DMA transfer performance.


Procedure

Step 1 Run the uname -a command to check the kernel version.

Step 2 If the OS kernel version is later than 4.19.29, modify the code as follows:
diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
index 97fc498d..bfe1639d 100644
--- a/include/drm/drm_cache.h
+++ b/include/drm/drm_cache.h
@@ -47,24 +47,6 @@ static inline bool drm_arch_can_wc_memory(void)
 	return false;
 #elif defined(CONFIG_MIPS) && defined(CONFIG_CPU_LOONGSON3)
 	return false;
-#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
-	/*
-	 * The DRM driver stack is designed to work with cache coherent devices
-	 * only, but permits an optimization to be enabled in some cases, where
-	 * for some buffers, both the CPU and the GPU use uncached mappings,
-	 * removing the need for DMA snooping and allocation in the CPU caches.
-	 *
-	 * The use of uncached GPU mappings relies on the correct implementation
-	 * of the PCIe NoSnoop TLP attribute by the platform, otherwise the GPU
-	 * will use cached mappings nonetheless. On x86 platforms, this does not
-	 * seem to matter, as uncached CPU mappings will snoop the caches in any
-	 * case. However, on ARM and arm64, enabling this optimization on a
-	 * platform where NoSnoop is ignored results in loss of coherency, which
-	 * breaks correct operation of the device. Since we have no way of
-	 * detecting whether NoSnoop works or not, just disable this
-	 * optimization entirely for ARM and arm64.
-	 */
-	return false;
 #else
 	return true;
 #endif


NOTE

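This patch enables the WC feature for DMA memory. Huawei Kunpeng 920 processors support this feature, but other ARM64-based CPUs do not necessarily support it. As the comment removed by the patch explains, enabling this optimization on a platform that ignores the PCIe NoSnoop attribute results in loss of coherency, so apply the patch only on platforms known to support it.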

----End
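Step 1 and the version condition in Step 2 can be checked with a short script. The helper names are hypothetical, and the check relies on GNU sort -V for version ordering. Treat the result as a heuristic only: distribution kernels often backport upstream changes, so confirm by looking at drm_arch_can_wc_memory() in include/drm/drm_cache.h of the kernel you actually build.

```shell
#!/bin/sh
# Hypothetical helpers: decide whether a kernel is later than 4.19.29, the
# threshold given in this section. Uses GNU sort -V for version ordering.
version_gt() {
    # Succeeds if $1 sorts strictly after $2 in version order.
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

kernel_needs_wc_patch() {
    v=${1:-$(uname -r)}
    v=${v%%-*}    # drop the distribution suffix, e.g. 4.15.0-20-generic -> 4.15.0
    version_gt "$v" 4.19.29
}
```

For example, if kernel_needs_wc_patch; then echo "check drm_cache.h"; fi tests the running kernel, while kernel_needs_wc_patch 4.15.0-20-generic fails, matching the Ubuntu 18.04.1 example above.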

4.6 Improving Periodic Frame Rate Decrease Caused by Tremendous Concurrent GPU Tasks

Purpose

When there are too many concurrent GPU tasks, GPU memory is used up and buffers are swapped out (evicted). The eviction path in the kernel GPU driver holds a spin lock, so when many tasks trigger eviction concurrently, many threads wait on the lock and the frame rate drops. Limiting the number of GPU memory swaps per second in the kernel driver solves this problem.

Procedure

Limit the number of times that GPU memory is swapped per second in the kernel driver.

Step 1 Modify the GPU driver in the Linux kernel code. The modified file and the added code are shown below. The change counts calls to ttm_mem_evict_first() per second and returns -EBUSY once more than 100 calls have been made within the same second.
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -708,6 +708,8 @@ static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
 	return ret;
 }
 
+static uint64_t _tsec = 0;
+static uint64_t _calls_per_sec = 0;
 static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
 			       uint32_t mem_type,
 			       const struct ttm_place *place,
@@ -720,6 +722,21 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
 	unsigned i;
 	int ret;
 
+	{
+		struct timeval t;
+		do_gettimeofday(&t);
+
+		if (_tsec != t.tv_sec) {
+			//printk(KERN_DEBUG "LL: lock: %p, time: %#llx, %#llx\n",
+			//	&glob->lru_lock, _tsec, _calls_per_sec);
+			_tsec = t.tv_sec;
+			_calls_per_sec = 0;
+		}
+
+		if (++_calls_per_sec > 100)
+			return -EBUSY;
+
+	}
 	spin_lock(&glob->lru_lock);
 	for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
 		list_for_each_entry(bo, &man->lru[i], lru) {


Step 2 Compile and install the modified kernel, and restart the system for the modification to take effect.

----End
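Step 2 follows the usual kernel build flow. The sketch below assumes the change from Step 1 has been saved as a patch file that applies with patch -p1 from the top of the kernel source tree, and that the tree already has a suitable .config (for example, the distribution's configuration); the function names are hypothetical, and some distributions prefer building kernel packages instead. Setting DRY_RUN=1 prints the commands without running them.

```shell
#!/bin/sh
# Hypothetical helpers: apply a patch to a kernel source tree, then build and
# install the kernel and modules. With DRY_RUN=1 the commands are only printed.
run() {
    if [ -n "$DRY_RUN" ]; then echo "+ $*"; else "$@"; fi
}

# $1: kernel source directory, $2: patch file
build_patched_kernel() {
    src=$1 patchfile=$2
    run patch -d "$src" -p1 -i "$patchfile" || return 1
    run make -C "$src" olddefconfig || return 1
    run make -C "$src" -j"$(nproc)" || return 1
    run make -C "$src" modules_install || return 1   # requires root
    run make -C "$src" install                       # requires root; then reboot
}
```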


5 H.264 Encoding Performance Tuning

5.1 Eliminating Intermittent Performance Deterioration of H.264 Coding

5.2 Improving the Stream Obtaining Efficiency

5.1 Eliminating Intermittent Performance Deterioration of H.264 Coding

Purpose

Dynamic power management, which the GPU enables by default, intermittently lowers the clock frequency of the GPU encoding module, which results in intermittent performance deterioration.

Procedure

Set the GPU performance mode to high to eliminate the large delays of several frames that occur during encoding.

echo high > /sys/class/drm/card1/device/power_dpm_force_performance_level
echo high > /sys/class/drm/card2/device/power_dpm_force_performance_level
echo high > /sys/class/drm/card3/device/power_dpm_force_performance_level

NOTE

This setting does not persist after a system restart; set the GPU performance mode again after each restart.
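The three echo commands above hard-code card1 to card3. A hedged alternative is to loop over every DRM card that exposes power_dpm_force_performance_level, which also makes the setting easy to reapply after a restart, for example from a boot-time script. The helper and its directory argument are hypothetical; the argument defaults to /sys/class/drm and exists so the loop can be tried on a scratch tree.

```shell
#!/bin/sh
# Hypothetical helper: set the DPM performance level of every DRM card that
# exposes power_dpm_force_performance_level. Run as root on a real system.
# $1: level (default high), $2: DRM class directory (default /sys/class/drm)
set_gpu_perf_level() {
    level=${1:-high}
    drm=${2:-/sys/class/drm}
    for card in "$drm"/card*; do
        case ${card##*/} in
            *-*) continue ;;    # skip connector entries such as card1-DP-1
        esac
        f=$card/device/power_dpm_force_performance_level
        [ -e "$f" ] || continue # card without this interface, or no match
        echo "$level" > "$f" && echo "$f: $(cat "$f")"
    done
}
```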

5.2 Improving the Stream Obtaining Efficiency

Purpose

OpenGL obtains the rendering result through the glReadPixels interface. This call forces the GPU to complete all pending tasks before it returns, which results in low efficiency.

Procedure

QEMU solution:


Do not enable VNC when starting the AVD (that is, omit the -vnc option from the startup command). This is a workaround.


A Appendix

A.1 Change History

Date | Description
2021-06-30 | This issue is the ninth official release. Deleted Anbox-related content.
2021-03-31 | This issue is the eighth official release. Changed the product name from "Kunpeng ARM native solution" to "Kunpeng BoostKit for ARM Native".
2021-03-11 | This issue is the seventh official release. Modified 4.3 Binding a Rendering Program to NUMA Nodes.
2020-12-30 | This issue is the sixth official release. Changed the solution name to "Kunpeng ARM native solution."
2020-12-17 | This issue is the fifth official release. Modified the format.
2020-09-21 | This issue is the fourth official release. Changed the solution name to "cloud native solution."
2020-08-06 | This issue is the third official release. Deleted the description of tuning the H.264 encoding delay performance. Removed the encTurbo Software User Guide (Ubuntu 18.04).
2020-04-30 | This issue is the second official release. Cloud Native Solution Tuning Guide (Kunpeng 920): Added the reference document links in the description of tuning the H.264 encoding delay performance, and optimized figures.
2020-03-20 | This issue is the first official release.
