HSA Kernel Code (KFD v0.6)


Posted on 28-May-2015


DESCRIPTION

A trace through the Linux kernel code of the AMD HSA KFD driver. Source code: https://github.com/HSAFoundation/HSA-Drivers-Linux-AMD

TRANSCRIPT

Page 1: HSA Kernel Code (KFD v0.6)

HSA Kernel Code (KFD v0.6)

Advisor: Prof. 徐慰中
Student: 黃昱儒
Date: 2014/7/25

Page 2

Agenda

● Introduction to HSA
  o hUMA
  o User Level Queueing
● HSA Driver
  o Concepts
    ▪ Flow Overview
    ▪ User & Hardware Queues
  o Source Code Detail
● IOMMU
  o Concepts
    ▪ GCR3
    ▪ PPR
  o Source Code Detail

Page 3

hUMA

Page 4

User Level Queuing - Before HSA

Page 5

User Level Queuing

Page 6

[Figure: Application 1 (Queue 1, Queue 2) and Application 3 (Queue 1) each own user-level queues that feed the HSA Device directly.]

● The HSA device accesses the application's ring.
● The application kicks the doorbell.
● The IOMMU performs address translation (VA->PA).

Components: 1. AQL packet, 2. Ring, 3. Doorbell
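The three pieces on this slide (AQL packet, ring, doorbell) can be sketched as a user-space enqueue routine. This is a minimal illustration under stated assumptions: `user_queue_t`, `aql_packet_t`, and `enqueue_and_kick` are hypothetical names, the packet is an opaque 64-byte slot rather than the real AQL field layout, and the doorbell is an ordinary variable instead of a mapped MMIO register. A real producer also publishes the packet header last with release ordering, which this sketch omits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-byte packet slot; NOT the HSA-defined AQL field layout. */
typedef struct {
    uint8_t bytes[64];
} aql_packet_t;

/* Hypothetical user-level queue: a ring in application memory plus a
 * doorbell pointer (an MMIO register in the real system, a variable here). */
typedef struct {
    aql_packet_t      *ring;        /* ring base address (user virtual address) */
    uint32_t           size;        /* number of slots, must be a power of two */
    uint64_t           write_index; /* next slot the application fills */
    uint64_t           read_index;  /* advanced by the device as it consumes */
    volatile uint64_t *doorbell;    /* simulated doorbell */
} user_queue_t;

/* Write one packet into the ring, then kick the doorbell.
 * Returns 0 on success, -1 if the ring is full. */
static int enqueue_and_kick(user_queue_t *q, const aql_packet_t *pkt)
{
    assert(q->size != 0 && (q->size & (q->size - 1)) == 0);
    if (q->write_index - q->read_index >= q->size)
        return -1;                                 /* ring full */
    uint64_t slot = q->write_index & (q->size - 1);
    memcpy(&q->ring[slot], pkt, sizeof *pkt);      /* packet goes into the ring */
    q->write_index++;
    /* Assumption: the doorbell receives the index of the packet just written. */
    *q->doorbell = q->write_index - 1;
    return 0;
}
```

No system call is involved in this path, which is the point of user-level queueing: the driver only sets the queue up.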

Page 7

HSA Software Stack

Page 8

HSA Software Stack

[Figure: layered stack of Application, Runtime Library, HSA-aware Kernel (KFD and IOMMU Driver), and hardware (HSA Device, IOMMU).]

The runtime library reaches KFD through:
● open("/dev/kfd")
● ioctl(KFD_IOC_SET_MEMORY_POLICY)
● ioctl(KFD_IOC_CREATE_QUEUE)
● ioctl(KFD_IOC_DESTROY_QUEUE)

Page 9

Agenda

(Same outline as Page 2.)

Page 10

Concepts - HSA Run Flow

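Phase          | Application                                           | KFD Driver
---------------+-------------------------------------------------------+--------------------------------------------
Initialization | Create user queues                                    | Create HW queue with user queue information
Computation    | Enqueue AQL packets, kick doorbell, and wait on signal | Nothing (user - HW interaction)
Finish         | Application finishes and destroys queues              | Release HW queue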

Page 11

Scheduling Policy

1. Hardware scheduling that allows oversubscription (more queues than HW slots)

2. Hardware scheduling without oversubscription, so create_queue requests fail when the HW slots run out

3. No hardware scheduling: the driver manually assigns queues to HW slots by programming registers
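The practical difference between the three policies shows up when create_queue asks for one more queue than there are HW slots. The sketch below is hypothetical: the enumerators reuse the policy names that appear in these slides, while `admit_queue` and returning -EBUSY for policy 2 are assumptions (the slides only state -EBUSY for KFD_SCHED_POLICY_NO_HWS).

```c
#include <assert.h>
#include <errno.h>

/* Modeled on the three policies above; not the driver's own definition. */
enum kfd_sched_policy_sketch {
    KFD_SCHED_POLICY_HWS,                     /* 1. HW scheduling, oversubscription allowed */
    KFD_SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION, /* 2. HW scheduling, fail when slots run out */
    KFD_SCHED_POLICY_NO_HWS                   /* 3. driver programs HW slots directly */
};

/* Hypothetical admission check: may one more queue be created? */
static int admit_queue(enum kfd_sched_policy_sketch policy,
                       unsigned active_queues, unsigned hw_slots)
{
    assert(hw_slots > 0);
    switch (policy) {
    case KFD_SCHED_POLICY_HWS:
        /* In this sketch no slot limit applies: the HW scheduler maps
         * queues in and out over time. */
        return 0;
    case KFD_SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION:
    case KFD_SCHED_POLICY_NO_HWS:
        /* Each queue needs its own HW slot for its whole lifetime. */
        return active_queues < hw_slots ? 0 : -EBUSY;
    }
    return -EINVAL;
}
```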

Page 12

Software Scheduler

[Figure: the HSA GPU's configuration registers, mapped at an MMIO physical address. The driver keeps a bitmap of free hardware queue_ids, each a (pipe, queue) pair, and selects a slot through the queue acquire register. Each occupied slot holds the doorbell and ring_base_address of one user queue: pasid=0 queue_id=0, pasid=0 queue_id=1, pasid=1 queue_id=0, pasid=1 queue_id=1.]

Page 13

Hardware Scheduler

[Figure: the HSA GPU's configuration registers, mapped at an MMIO physical address. A single slot, (pipe=4, queue=0), is selected through the queue acquire register and holds the doorbell and ring_base_address of the driver's kernel_queue.]

Page 14

Hardware Scheduler - No Oversubscription

[Figure: a runlist for 3 processes, built from PM4 Type-3 packets.]

● IT_RUN_LIST
  o Points to the run_list
● IT_MAP_PROCESS (per process)
  o pasid, page_table_base, sh_mem_config
● IT_MAP_QUEUES (per queue)
  o mqd_addr (Memory Queue Descriptor)
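Every box in the runlist is a PM4 Type-3 packet: a one-dword header followed by a payload. The helper below encodes that header with the layout used by the radeon headers (packet type in bits 31:30, payload dword count minus one in bits 29:16, IT opcode in bits 15:8). The opcode values are assumptions to check against the driver's PM4 opcode header, and `pm4_type3_header` and `pm4_emit` are hypothetical helpers, not KFD functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PM4_TYPE3 3u

/* Opcode values assumed, not verified against KFD v0.6. */
enum { IT_MAP_PROCESS = 0xA1, IT_MAP_QUEUES = 0xA2, IT_RUN_LIST = 0xA5 };

/* Type-3 header: type | (payload dwords - 1) | opcode. */
static uint32_t pm4_type3_header(uint8_t opcode, uint32_t payload_dwords)
{
    assert(payload_dwords >= 1 && payload_dwords <= 0x4000);
    return (PM4_TYPE3 << 30)
         | (((payload_dwords - 1) & 0x3FFFu) << 16)
         | ((uint32_t)opcode << 8);
}

/* Write header + payload into buf; returns the number of dwords written. */
static size_t pm4_emit(uint32_t *buf, uint8_t opcode,
                       const uint32_t *payload, uint32_t n)
{
    buf[0] = pm4_type3_header(opcode, n);
    for (uint32_t i = 0; i < n; i++)
        buf[1 + i] = payload[i];
    return 1 + (size_t)n;
}
```

A runlist is then just a buffer of such packets laid end to end: one IT_MAP_PROCESS per process followed by its IT_MAP_QUEUES packets.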

Page 15

Hardware Scheduler - Oversubscription

[Figure: a runlist built from PM4 Type-3 packets that ends with another IT_RUN_LIST packet.]

● IT_RUN_LIST
  o Points to the run_list
● IT_MAP_PROCESS
  o pasid, page_table_base, sh_mem_config
● IT_MAP_QUEUES
  o mqd_addr (Memory Queue Descriptor)
● IT_RUN_LIST
  o Points to the next run_list

Page 16

[Figure: KFD data structures grouped by scope: per application, per device, and per HW queue; some are used only for HW scheduling.]

Page 17

IOCTL Commands Provided by KFD

● KFD_IOC_CREATE_QUEUE
  o Create a hardware queue from the application's information (e.g., ring base address)
● KFD_IOC_DESTROY_QUEUE
  o Release a hardware queue
● KFD_IOC_UPDATE_QUEUE
● KFD_IOC_SET_MEMORY_POLICY
  o Set the cache coherence policy
● KFD_IOC_GET_CLOCK_COUNTERS
  o Get the GPU clock counter
● KFD_IOC_GET_PROCESS_APERTURES
  o Get the GPU's aperture information
● KFD_IOC_PMC_ACQUIRE_ACCESS
● KFD_IOC_PMC_RELEASE_ACCESS
  o Exclusive access to performance counters

Page 18

HSA Driver Flow

● System initialization
  ○ module_init
  ○ device_init (called by radeon)
● The application opens the "/dev/kfd" device
● The application sends ioctls
  ○ KFD_IOC_SET_MEMORY_POLICY
  ○ KFD_IOC_CREATE_QUEUE
● The application sends an ioctl
  ○ KFD_IOC_DESTROY_QUEUE
● Application termination

Page 19

module_init(kfd_module_init)

● radeon_kfd_pasid_init
  o Initialize the PASID bitmap
● radeon_kfd_chardev_init
  o register_chrdev: /dev/kfd
  o kfd_ops
    ▪ Defines the open and ioctl member functions

Page 20

kgd2kfd_device_init

● radeon_kfd_doorbell_init(kfd);
● radeon_kfd_interrupt_init(kfd);
● amd_iommu_set_invalidate_ctx_cb(kfd->pdev, iommu_pasid_shutdown_callback);
● device_queue_manager_init(kfd);
  o dqm->initialize
● dqm->start(kfd->dqm);

Page 21

dqm->initialize For KFD_SCHED_POLICY_NO_HWS*

● Prepare the (pipe, queue) bitmap

Page 22

kfd_open

● radeon_kfd_create_process(current)
  o Create kfd_process
  o Assign PASID
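radeon_kfd_pasid_init prepares a PASID bitmap at module load, and kfd_open takes a PASID from it for each new kfd_process. A minimal bitmap allocator illustrating that idea follows; the function names and the 1024-entry limit are hypothetical, and reserving PASID 0 is this sketch's choice.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define PASID_LIMIT   1024u   /* hypothetical; the real limit depends on device and IOMMU */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

static unsigned long pasid_bitmap[(PASID_LIMIT + BITS_PER_WORD - 1) / BITS_PER_WORD];

/* Module-init analogue: clear the bitmap and reserve PASID 0. */
static void pasid_init(void)
{
    memset(pasid_bitmap, 0, sizeof pasid_bitmap);
    pasid_bitmap[0] |= 1ul;                 /* PASID 0 is never handed out */
}

/* kfd_open analogue: take the lowest free PASID; returns 0 if none is left. */
static unsigned pasid_alloc(void)
{
    for (unsigned p = 1; p < PASID_LIMIT; p++) {
        unsigned long mask = 1ul << (p % BITS_PER_WORD);
        if (!(pasid_bitmap[p / BITS_PER_WORD] & mask)) {
            pasid_bitmap[p / BITS_PER_WORD] |= mask;
            return p;
        }
    }
    return 0;
}

/* Process-teardown analogue: return the PASID to the pool. */
static void pasid_free(unsigned p)
{
    assert(p > 0 && p < PASID_LIMIT);
    pasid_bitmap[p / BITS_PER_WORD] &= ~(1ul << (p % BITS_PER_WORD));
}
```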

Page 23

KFD_IOC_SET_MEMORY_POLICY

● Two policies
  o cache_policy_coherent
  o cache_policy_noncoherent
● Okra
  o default policy = cache_policy_coherent
  o alternate policy = cache_policy_noncoherent

Page 24

radeon_kfd_bind_process_to_device

● Called when the user application sends an ioctl command
● amd_iommu_bind_pasid()
  o Binds this kfd_process to the IOMMU

Page 25

KFD_IOC_CREATE_QUEUE

● Creates a queue with information from userspace
● pqm_create_queue
● Returns queue_id and doorbell_address to userspace
  o queue_id is per kfd_process
  o doorbell_address maps to a device MMIO address

Page 26

pqm_create_queue

● find_available_queue_slot
  o Assign qid (per kfd_process)
● dqm->register_process
  o Register the process with the dqm (device queue manager)
● create_cp_queue
  o Create the queue with the queue_properties obtained from the application
  o Map the doorbell MMIO address to the application
● dqm->create_queue
● dqm->execute_queue
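find_available_queue_slot hands out a per-process qid, and the doorbell address returned to userspace depends on that qid. The sketch assumes, for illustration only, 32-bit doorbells and one doorbell page per process indexed by PASID; with the 1024-queue limit stated on Page 29, that is exactly 4 KiB per process. `struct process_queues`, `find_free_qid`, and `doorbell_offset` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_QUEUES_PER_PROCESS 1024u  /* per-process queue limit from the slides */
#define DOORBELL_SIZE          4u     /* assumption: 32-bit doorbells */
#define DOORBELL_PAGE_SIZE     (MAX_QUEUES_PER_PROCESS * DOORBELL_SIZE)

struct process_queues {               /* hypothetical per-process state */
    unsigned pasid;
    uint8_t  qid_used[MAX_QUEUES_PER_PROCESS];
};

/* find_available_queue_slot analogue: claim the lowest unused qid, or -1. */
static int find_free_qid(struct process_queues *pq)
{
    for (unsigned i = 0; i < MAX_QUEUES_PER_PROCESS; i++) {
        if (!pq->qid_used[i]) {
            pq->qid_used[i] = 1;
            return (int)i;
        }
    }
    return -1;
}

/* Byte offset of a queue's doorbell in the doorbell aperture, assuming
 * one doorbell page per process indexed by PASID. */
static uint64_t doorbell_offset(unsigned pasid, unsigned qid)
{
    assert(qid < MAX_QUEUES_PER_PROCESS);
    return (uint64_t)pasid * DOORBELL_PAGE_SIZE + (uint64_t)qid * DOORBELL_SIZE;
}
```

Giving each process its own page means the driver can map just that page into the application, so one process cannot ring another's doorbells.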

Page 27

dqm->create_queue For KFD_SCHED_POLICY_NO_HWS

● init_mqd (memory queue descriptor)
  o Store the queue configuration from the application
● Find an unused (pipe, queue) in the dqm (device queue manager)
  o If there is none, return -EBUSY
  o Maximum = 56
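Under KFD_SCHED_POLICY_NO_HWS, the (pipe, queue) bitmap is prepared in dqm->initialize, consumed in dqm->create_queue, and restored on destroy. The sketch below assumes 7 pipes with 8 queues each, one split that yields the 56-slot maximum; the real split, search order, and names may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NUM_PIPES       7u   /* assumption: 7 x 8 = 56 slots */
#define QUEUES_PER_PIPE 8u

struct dqm_slots {                    /* hypothetical dqm state */
    uint8_t free_mask[NUM_PIPES];     /* bit q set => (pipe, q) is free */
};

/* dqm->initialize analogue: mark every (pipe, queue) slot free. */
static void dqm_slots_init(struct dqm_slots *d)
{
    for (unsigned p = 0; p < NUM_PIPES; p++)
        d->free_mask[p] = (uint8_t)((1u << QUEUES_PER_PIPE) - 1);
}

/* dqm->create_queue analogue: claim the lowest free slot or fail with -EBUSY. */
static int dqm_slots_alloc(struct dqm_slots *d, unsigned *pipe, unsigned *queue)
{
    for (unsigned p = 0; p < NUM_PIPES; p++) {
        for (unsigned q = 0; q < QUEUES_PER_PIPE; q++) {
            if (d->free_mask[p] & (1u << q)) {
                d->free_mask[p] &= (uint8_t)~(1u << q);
                *pipe = p;
                *queue = q;
                return 0;
            }
        }
    }
    return -EBUSY;
}

/* Destroy path ("restore queue, pipe bitmap"): give the slot back. */
static void dqm_slots_free(struct dqm_slots *d, unsigned pipe, unsigned queue)
{
    assert(pipe < NUM_PIPES && queue < QUEUES_PER_PIPE);
    assert(!(d->free_mask[pipe] & (1u << queue)));   /* must be in use */
    d->free_mask[pipe] |= (uint8_t)(1u << queue);
}
```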

Page 28

dqm->execute_queue For KFD_SCHED_POLICY_NO_HWS

● Write the queue configuration to the device
● load_mqd
  o ring_base_addr
  o doorbell_offset
  o queue_priority
  o ...
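The (pipe, queue) slots are reached through a banked set of hardware queue descriptor (HQD) registers, so load_mqd first selects the slot and then writes its fields. This simulation is hypothetical: the register bank is an in-memory array, the MQD keeps only the fields listed above, and `acquire_queue` here takes a fake device instead of the driver's kgd handle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, much-simplified MQD: only the fields named on the slide. */
struct mqd {
    uint64_t ring_base_addr;
    uint32_t doorbell_offset;
    uint32_t queue_priority;
};

#define NUM_PIPES       7u
#define QUEUES_PER_PIPE 8u

/* Simulated per-slot HQD registers. */
struct hqd_regs {
    uint64_t ring_base;
    uint32_t doorbell;
    uint32_t priority;
    uint32_t active;
};

struct fake_gpu {
    struct hqd_regs hqd[NUM_PIPES][QUEUES_PER_PIPE];
    unsigned sel_pipe, sel_queue;     /* models the queue select register */
};

/* Select which slot subsequent register writes go to. */
static void acquire_queue(struct fake_gpu *g, unsigned pipe, unsigned queue)
{
    assert(pipe < NUM_PIPES && queue < QUEUES_PER_PIPE);
    g->sel_pipe = pipe;
    g->sel_queue = queue;
}

/* load_mqd analogue: select the slot, then copy MQD fields into it. */
static void load_mqd(struct fake_gpu *g, unsigned pipe, unsigned queue,
                     const struct mqd *m)
{
    acquire_queue(g, pipe, queue);
    struct hqd_regs *r = &g->hqd[g->sel_pipe][g->sel_queue];
    r->ring_base = m->ring_base_addr;
    r->doorbell  = m->doorbell_offset;
    r->priority  = m->queue_priority;
    r->active    = 1;                 /* activate only after the queue is described */
}
```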

Page 29

[Figure: the HSA GPU's configuration registers, mapped at an MMIO physical address. The driver keeps a bitmap of free hardware queue_ids, each a (pipe, queue) pair, and selects a slot through the queue select register. Each occupied slot holds the doorbell and ring_base_address of one user queue: pasid=0 queue_id=0, pasid=0 queue_id=1, pasid=1 queue_id=0, pasid=1 queue_id=1.]

Each process can have up to 1024 queues.

Page 30

kgd2kfd_device_init

● radeon_kfd_doorbell_init(kfd);
● radeon_kfd_interrupt_init(kfd);
● device_iommu_pasid_init(kfd);
● kfd_topology_add_device(kfd);
● amd_iommu_set_invalidate_ctx_cb(kfd->pdev, iommu_pasid_shutdown_callback);
● device_queue_manager_init(kfd);
  o dqm->initialize
● dqm->start(kfd->dqm);

Page 31

dqm->start For KFD_SCHED_POLICY_HWS*

● pm_init (packet manager)
● kernel_queue_init
  o kernel_queue doorbell
  o kernel_queue ring address
  o load_mqd to write the kernel_queue configuration to the device

Page 32

pqm_create_queue

(Same steps as Page 26.)

Page 33

dqm->create_queue For KFD_SCHED_POLICY_HWS*

● init_mqd (memory queue descriptor)
  o Store the queue configuration from the application

Page 34

dqm->execute_queue For KFD_SCHED_POLICY_HWS*

● dqm->destroy_queues
● pm_send_runlist
  o pm_create_runlist_ib
    ▪ Construct PM4 packets of the MAP_PROCESS and MAP_QUEUES types
      ● The packets contain the application's ring address
  o pm->kernel_queue->acquire_packet_buffer
    ▪ Get an unused entry of the kernel_queue
  o pm_create_runlist
    ▪ Construct a PM4 packet of the RUN_LIST type
  o pm->kernel_queue->submit_packet
    ▪ Kick the kernel queue's doorbell
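pm_send_runlist ends by writing a RUN_LIST packet into the driver's kernel_queue and kicking its doorbell. The sketch models acquire_packet_buffer and submit_packet on a small in-memory ring. Its assumptions: the RUN_LIST payload (IB address low, IB address high, IB size) is simplified, the IT_RUN_LIST opcode 0xA5 is unverified, the doorbell receives the wrapped write pointer, and all names are hypothetical. Unlike a real driver, it refuses a packet that would wrap past the ring end instead of handling the wrap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KQ_DWORDS 64u

struct kernel_queue {                 /* hypothetical kernel queue */
    uint32_t ring[KQ_DWORDS];
    uint32_t wptr;                    /* free-running write pointer, in dwords */
    uint32_t rptr;                    /* free-running read pointer, advanced by the CP */
    uint32_t pending_wptr;            /* wptr after the packet being built */
    volatile uint32_t doorbell;       /* simulated doorbell register */
};

/* acquire_packet_buffer analogue: reserve n contiguous dwords or return NULL. */
static uint32_t *kq_acquire(struct kernel_queue *kq, uint32_t n)
{
    assert(n > 0);
    uint32_t free_dw = KQ_DWORDS - (kq->wptr - kq->rptr);
    uint32_t off = kq->wptr % KQ_DWORDS;
    if (n > free_dw || off + n > KQ_DWORDS)
        return NULL;                  /* full, or would wrap: caller retries */
    kq->pending_wptr = kq->wptr + n;
    return &kq->ring[off];
}

/* submit_packet analogue: publish the write pointer and kick the doorbell. */
static void kq_submit(struct kernel_queue *kq)
{
    kq->wptr = kq->pending_wptr;
    kq->doorbell = kq->wptr % KQ_DWORDS;
}

/* pm_send_runlist analogue: one RUN_LIST packet pointing at a prebuilt IB. */
static int send_runlist(struct kernel_queue *kq, uint64_t ib_addr, uint32_t ib_dwords)
{
    /* Type 3, 3 payload dwords (count field = 2), opcode assumed 0xA5. */
    const uint32_t header = (3u << 30) | ((3u - 1) << 16) | (0xA5u << 8);
    uint32_t *p = kq_acquire(kq, 4);
    if (!p)
        return -1;
    p[0] = header;
    p[1] = (uint32_t)ib_addr;
    p[2] = (uint32_t)(ib_addr >> 32);
    p[3] = ib_dwords;
    kq_submit(kq);
    return 0;
}
```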

Page 35

Hardware Scheduler - No Oversubscription

(Same diagram as Page 14.)

Page 36

Hardware Scheduler - Oversubscription

(Same diagram as Page 15.)

Page 37

Software Scheduling vs. Hardware Scheduling

Step                | Software Scheduling                                        | Hardware Scheduling
--------------------+------------------------------------------------------------+------------------------------------------------
dqm->initialize     | Prepare (pipe, queue) bitmap                               | -
dqm->start          | -                                                          | pm_init; kernel_queue_init
kfd_open            | Create kfd_process; assign PASID                           | Create kfd_process; assign PASID
ioctl(CREATE_QUEUE) | Get queue_id; map doorbell to application                  | Get queue_id; map doorbell to application
dqm->create_queue   | init_mqd; find unused (pipe, queue) to assign HW queue_id  | init_mqd
dqm->execute_queue  | Write queue configuration to device                        | Create PM4 packet; kick kernel_queue's doorbell

Page 38

Application Computation ...

● The HW has the userspace ring_base_addr
  o The application enqueues AQL packets and waits on a signal
● The application has the HW doorbell MMIO address
  o Used to kick the hardware
● The driver does nothing
  o Until the application sends ioctl(KFD_IOC_DESTROY_QUEUE) or finishes

Page 39

Hardware Queue Deactivation

1. The application sends ioctl(KFD_IOC_DESTROY_QUEUE)

2. Task exit notifier

Page 40

Hardware Queue Deactivation (1)

● ioctl(KFD_IOC_DESTROY_QUEUE)
● pqm_destroy_queue
  o dqm->destroy_queue
  o Restore the queue and pipe bitmaps
  o dqm->execute_queues(dqm);

Page 41

dqm->destroy_queue For KFD_SCHED_POLICY_NO_HWS

● destroy_mqd
  o acquire_queue(kgd, pipe_id, queue_id);
  o write_register(kgd, CP_HQD_DEQUEUE_REQUEST, DEQUEUE_REQUEST_DRAIN);

Page 42

dqm->destroy_queue For KFD_SCHED_POLICY_HWS*

● dqm->destroy_queues
  o pm_send_unmap_queue
    ▪ Send a PM4 packet of the UNMAP_QUEUES type
  o pm_send_query_status(KFD_FENCE_COMPLETED)
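After UNMAP_QUEUES, pm_send_query_status asks the CP to write KFD_FENCE_COMPLETED to a fence in memory, and the driver waits for that value before treating the queues as gone. The polling loop below is a hypothetical sketch: the fence constants, `wait_fence`, and the `fake_cp_step` stand-in for GPU progress are invented for illustration, and a real driver would bound the wait by time rather than by a poll count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FENCE_INIT      10u    /* hypothetical values */
#define FENCE_COMPLETED 100u

/* Poll the fence until the (simulated) CP writes FENCE_COMPLETED.
 * `step` stands in for the GPU making progress between polls.
 * Returns 0 on completion, -1 on timeout. */
static int wait_fence(volatile uint64_t *fence, unsigned max_polls,
                      void (*step)(volatile uint64_t *))
{
    assert(fence != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        if (*fence == FENCE_COMPLETED)
            return 0;
        if (step)
            step(fence);
    }
    return *fence == FENCE_COMPLETED ? 0 : -1;
}

/* Simulated CP: completes the fence after `fake_cp_delay` more polls. */
static unsigned fake_cp_delay;
static void fake_cp_step(volatile uint64_t *fence)
{
    if (fake_cp_delay == 0)
        *fence = FENCE_COMPLETED;
    else
        fake_cp_delay--;
}
```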

Page 43

Hardware Queue Deactivation (2)

● The task exit notifier calls iommu_pasid_shutdown_callback
  o Registered in kgd2kfd_device_init -> amd_iommu_set_invalidate_ctx_cb
  o Called from the mmu_notifier's release function (the mmu_notifier is registered in radeon_kfd_bind_process_to_device -> amd_iommu_bind_pasid)
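The task-exit path works through two registrations: the driver hands iommu_pasid_shutdown_callback to the IOMMU driver at device init, and the IOMMU driver calls it from its mmu_notifier release hook. A user-space model of that indirection follows; every type and function in it is hypothetical, and only the shape (store a function pointer, later invoke it with the device and PASID) mirrors the slide.

```c
#include <assert.h>
#include <stddef.h>

struct fake_pdev { int id; };
typedef void (*invalidate_ctx_cb_t)(struct fake_pdev *pdev, int pasid);

struct fake_dev_state {
    struct fake_pdev   *pdev;
    invalidate_ctx_cb_t inv_ctx_cb;   /* set once per device */
};

/* amd_iommu_set_invalidate_ctx_cb analogue: remember the callback. */
static void set_invalidate_ctx_cb(struct fake_dev_state *ds, invalidate_ctx_cb_t cb)
{
    assert(ds != NULL);
    ds->inv_ctx_cb = cb;
}

/* mmu_notifier release analogue: runs when the task's address space goes away. */
static void mn_release(struct fake_dev_state *ds, int pasid)
{
    if (ds->inv_ctx_cb)
        ds->inv_ctx_cb(ds->pdev, pasid);
}

/* Stand-in for iommu_pasid_shutdown_callback: record the PASID to tear down.
 * The real callback destroys that process's queues. */
static int last_shutdown_pasid = -1;
static void shutdown_cb(struct fake_pdev *pdev, int pasid)
{
    (void)pdev;
    last_shutdown_pasid = pasid;
}
```

The indirection lets the IOMMU driver notify KFD about a dying address space without depending on KFD's internals.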

Page 44

iommu_pasid_shutdown_callback

● pqm_destroy_queue
  o dqm->destroy_queue
  o Restore the queue and pipe bitmaps
  o dqm->execute_queues(dqm);

Page 45

Agenda

(Same outline as Page 2.)

Page 46

Introduction to IOMMU

● The user application writes AQL packets into the ring, whose address is a virtual address
● Device accesses therefore need VA-to-PA translation

[Figure: doorbell and ring address]

Page 47

[Figure: HSA GPU -> IOMMU device table -> GCR3 table. The GCR3 entry for PASID=2 is assigned kfd_process->mm->pgd, whose page tables lead to the physical address.]

Page 48

PRI & PPR

● The operating system is usually required to pin memory pages used for I/O.

● The IOMMU provides a mechanism that lets peripherals use unpinned pages for I/O.

● Only supported in AMD IOMMU_v2

Page 49

PRI & PPR

● PRI (Page Request Interface)
  o The peripheral requests memory management services from the host OS (e.g., page fault service for the peripheral)
  o Issued by the peripheral
● PPR (Peripheral Page Service Request)
  o When the IOMMU receives a valid PRI request, it creates a PPR message in the request log to request changes to the virtual address space
  o Issued by the IOMMU as an interrupt
● Used to request I/O page table changes
  o The IOMMU driver can register a PPR notifier

Page 50

module_init(amd_iommu_v2_init)

● amd_iommu_register_ppr_notifier(&ppr_nb);
  o PPR callback
    ▪ ppr_notifier function

Page 51

Set IOMMU With PASID

● amd_iommu_bind_pasid
● Called when a kfd_process is created
  o mmu_notifier_register(&pasid_state->mn, pasid_state->mm);
  o amd_iommu_domain_set_gcr3(dev_state->domain, pasid, __pa(pasid_state->mm->pgd));

Page 52

[Figure: HSA GPU -> IOMMU device table -> GCR3 table. The GCR3 entry for PASID=2 is assigned kfd_process->mm->pgd.]

Page 53

PRI & PPR Flow

1. The peripheral issues a PRI to the IOMMU

2. The IOMMU writes a PPR request to the PPR log (the log entry contains the fault address, pasid, device_id, tag, and flags)

3. The IOMMU sends an interrupt to the CPU

Page 54

PPR Flow: When the IRQ Arrives

● Read the IOMMU status register
  o status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
● if (status & MMIO_STATUS_PPR_INT_MASK)
  o Call ppr_notifier (registered in amd_iommu_v2_init)
● ppr_notifier
  o do_fault
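The IRQ path above checks a status bit, then hands each PPR log entry to the registered notifier. The drain loop below is a hypothetical model: the entry struct carries the fields listed on Page 53 but not the real packed log format, `STATUS_PPR_INT` and the head/tail handling are simplified, and `record_fault` merely stands in for the work that ppr_notifier queues for do_fault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PPR log entry with the fields from the slide
 * (NOT the hardware's packed layout). */
struct ppr_entry {
    uint64_t fault_addr;
    uint32_t pasid;
    uint16_t device_id;
    uint16_t tag;
    uint32_t flags;
};

#define PPR_LOG_ENTRIES 8u
#define STATUS_PPR_INT  (1u << 0)     /* hypothetical status bit */

struct fake_iommu {
    uint32_t status;                   /* models the MMIO status register */
    struct ppr_entry log[PPR_LOG_ENTRIES];
    uint32_t head, tail;               /* driver consumes at head, IOMMU produces at tail */
};

typedef void (*ppr_handler_t)(const struct ppr_entry *e, void *ctx);

/* IRQ-handler analogue: if the PPR bit is set, drain the log through the
 * notifier and acknowledge. Returns the number of entries handled. */
static unsigned handle_ppr_irq(struct fake_iommu *mmu, ppr_handler_t notify, void *ctx)
{
    assert(notify != NULL);
    unsigned n = 0;
    if (!(mmu->status & STATUS_PPR_INT))
        return 0;
    while (mmu->head != mmu->tail) {
        notify(&mmu->log[mmu->head], ctx);
        mmu->head = (mmu->head + 1) % PPR_LOG_ENTRIES;
        n++;
    }
    mmu->status &= ~STATUS_PPR_INT;    /* acknowledge (simplified) */
    return n;
}

/* Example notifier: remember the last faulting address (stand-in for do_fault). */
static void record_fault(const struct ppr_entry *e, void *ctx)
{
    *(uint64_t *)ctx = e->fault_addr;
}
```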

Page 55

do_fault

● The get_user_pages() API pins the faulting pages into memory
  o Uses the mm_struct and fault_addr

Page 56

Flow Review

(Same diagram as Page 8: HSA Software Stack.)

Page 57

Q&A

Thanks!

Page 58

Reference

● https://github.com/HSAFoundation/HSA-Drivers-Linux-AMD

● http://www.hsafoundation.com/standards/