Skip to content

Reverse-engineer

Compare memory contents from user-space driver

Observation: the GPU mem content captured from G71 did not work with G52.

Purpose: Investigate the differences of mem contents between two GPUs (Mali Bifrost G52 and G71)

To do this, we picked two devices (ODROID-N2 and Hikey960, equipped with G52 and G71, respectively) and used a simple OpenCL kernel, vector addition. The code is as follows.

__kernel void vector_add(__global const int *A, __global const int *B, __global int *C) {
    // Get the index of the current element to be processed
    int i = get_global_id(0);

    // Do the operation
    C[i] = A[i] + B[i];
}

1. GPU binary

We first take a look at the GPU binary (low-level GPU instruction) from each GPUs.

tgx_trans_va_to_pa: vaddr: 7f8b000000, pte: c6e294c3, paddr: c6e29000
tgx_dump_phys: =============== [Output] DUMP paddr 0xc6e29000 ==============
0x7f8b000000 | 29 80 00 30 04 24 30 87  37 E3 07 C0 48 00 40 29  |  )..0.$0.7...H.@) 
0x7f8b000010 | 21 4F 01 00 7C F0 88 9A  B9 65 17 0C 02 60 00 AF  |  !O..|....e...`.. 
0x7f8b000020 | 44 01 00 00 00 00 00 00  00 00 C8 E1 6E 0C 00 60  |  D...........n..` 
0x7f8b000030 | 4B 83 00 00 02 24 34 87  A7 31 00 C0 68 18 48 31  |  K....$4..1..h.H1 
0x7f8b000040 | 29 00 00 20 06 0C E6 3F  91 65 07 00 40 00 90 01  |  ).. ...?.e..@... 
0x7f8b000050 | 43 84 82 00 82 20 34 87  A7 B1 00 00 00 00 00 60  |  C.... 4........` 
0x7f8b000060 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7f8b000070 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 

Binary from G52 (using ODROID-N2)

tgx_trans_va_to_pa: vaddr: ffffa1000000, pte: 2041074c3, paddr: 204107000
tgx_dump_phys: =============== [Output] DUMP paddr 0x204107000 ==============
0xffffa1000000 | 29 80 00 30 04 24 30 87  37 E3 07 C0 48 00 40 29  |  )..0.$0.7...H.@) 
0xffffa1000010 | 21 4F 01 00 7C F0 88 9A  91 65 17 0C 02 40 00 A1  |  !O..|....e...@.. 
0xffffa1000020 | 44 01 00 00 00 00 00 00  00 00 C8 E1 6E 0C 00 60  |  D...........n..` 
0xffffa1000030 | 4B 83 00 00 02 24 34 87  A7 31 00 C0 68 18 48 31  |  K....$4..1..h.H1 
0xffffa1000040 | 29 00 00 20 06 0C E6 3F  91 65 07 00 40 00 90 01  |  ).. ...?.e..@... 
0xffffa1000050 | 43 84 82 00 82 20 34 87  A7 B1 00 00 00 00 00 60  |  C.... 4........` 
0xffffa1000060 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffa1000070 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 

Binary from G71 (using Hikey960)

The dumped memory from both sides seems same. At least the GPU kernel is not a cause.

CORRECT: two binaries are not same, see the 2nd line. B9 65 17 0C 02 60 00 AF vs 91 65 17 0C 02 40 00 A1

2. Job chain

tgx_trans_va_to_pa: vaddr: 7fa4f07000, pte: 400000c6e2a443, paddr: c6e2a000
tgx_dump_phys: =============== [Output] DUMP paddr 0xc6e2a000 ==============
0x7fa4f07000 | 00 70 F0 A4 7F 00 00 00  00 50 D8 9A 55 00 00 00  |  .p.......P..U... 
0x7fa4f07010 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07020 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07030 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 

0x7fa4f07040 | 01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07050 | 09 01 01 40 00 00 00 00  00 00 00 00 00 00 00 00  |  ...@............ 
0x7fa4f07060 | FF 7F 00 00 C6 18 CF 63  00 00 00 20 00 00 00 00  |  .......c... .... 
0x7fa4f07070 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07080 | 02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07090 | 00 00 00 00 00 00 00 00  00 71 F0 A4 7F 00 00 00  |  .........q...... 
0x7fa4f070a0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f070b0 | 50 71 F0 A4 7F 00 00 00  C0 71 F0 A4 7F 00 00 00  |  Pq.......q...... 
0x7fa4f070c0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f070d0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f070e0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f070f0 | 80 71 F0 A4 7F 00 00 00  00 00 00 00 00 00 00 00  |  .q.............. 
0x7fa4f07100 | 03 10 71 F0 A4 7F 00 00  00 00 00 00 00 00 00 00  |  ..q............. 
0x7fa4f07110 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07120 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07130 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07140 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07150 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07160 | 00 90 49 98 7F 00 00 00  00 90 45 98 7F 00 00 00  |  ..I.......E..... 
0x7fa4f07170 | 00 90 41 98 7F 00 00 00  00 00 00 00 00 00 00 00  |  ..A............. 
0x7fa4f07180 | 00 00 00 00 1F 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07190 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f071a0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f071b0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f071c0 | 00 00 00 8B 7F 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f071d0 | 01 20 00 08 00 00 00 00  00 00 00 00 00 00 00 00  |  . .............. 
0x7fa4f071e0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f071f0 | 00 90 02 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07200 | 00 72 F0 A4 7F 00 00 00  80 4F D8 9A 55 00 00 00  |  .r.......O..U... 
0x7fa4f07210 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07220 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07230 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 

0x7fa4f07240 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07250 | 09 01 02 40 00 00 00 00  00 00 00 00 00 00 00 00  |  ...@............ 
0x7fa4f07260 | FF 7F 00 00 C6 18 CF 63  00 00 00 20 00 00 00 00  |  .......c... .... 
0x7fa4f07270 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07280 | 02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07290 | 00 00 00 00 00 00 00 00  00 73 F0 A4 7F 00 00 00  |  .........s...... 
0x7fa4f072a0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f072b0 | 50 73 F0 A4 7F 00 00 00  C0 73 F0 A4 7F 00 00 00  |  Ps.......s...... 
0x7fa4f072c0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f072d0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f072e0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f072f0 | 80 73 F0 A4 7F 00 00 00  00 00 00 00 00 00 00 00  |  .s.............. 
0x7fa4f07300 | 03 10 73 F0 A4 7F 00 00  00 00 00 00 00 00 00 00  |  ..s............. 
0x7fa4f07310 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07320 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07330 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07340 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07350 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07360 | 00 90 49 98 7F 00 00 00  00 90 45 98 7F 00 00 00  |  ..I.......E..... 
0x7fa4f07370 | 00 90 41 98 7F 00 00 00  00 00 00 00 00 00 00 00  |  ..A............. 
0x7fa4f07380 | 00 00 00 00 1F 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07390 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f073a0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f073b0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f073c0 | 00 00 00 8B 7F 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f073d0 | 01 20 00 08 00 00 00 00  00 00 00 00 00 00 00 00  |  . .............. 
0x7fa4f073e0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f073f0 | 00 90 02 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0x7fa4f07400 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 

Job chain from G52 (using ODROID-N2)

tgx_trans_va_to_pa: vaddr: ffffab601000, pte: 400002040a8443, paddr: 2040a8000
tgx_dump_phys: =============== [Output] DUMP paddr 0x2040a8000 ==============
0xffffab601000 | 00 10 60 AB FF FF 00 00  00 DD 07 CC AA AA 00 00  |  ..`............. 
0xffffab601010 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601020 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601030 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 

0xffffab601040 | 01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601050 | 09 01 01 40 00 00 00 00  00 00 00 00 00 00 00 00  |  ...@............ 
0xffffab601060 | FF 7F 00 00 C6 18 CF 63  00 00 00 1C 00 00 00 00  |  .......c........ 
0xffffab601070 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601080 | 02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601090 | 00 00 00 00 00 00 00 00  00 11 60 AB FF FF 00 00  |  ..........`..... 
0xffffab6010a0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab6010b0 | 50 11 60 AB FF FF 00 00  C0 11 60 AB FF FF 00 00  |  P.`.......`..... 
0xffffab6010c0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab6010d0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab6010e0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab6010f0 | 80 11 60 AB FF FF 00 00  00 00 00 00 00 00 00 00  |  ..`............. 
0xffffab601100 | 03 10 11 60 AB FF FF 00  00 00 00 00 00 00 00 00  |  ...`............ 
0xffffab601110 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601120 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601130 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601140 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601150 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601160 | 00 C0 01 A3 FF FF 00 00  00 C0 03 A3 FF FF 00 00  |  ................ 
0xffffab601170 | 00 00 DC A2 FF FF 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601180 | 00 00 00 00 1F 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601190 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab6011a0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab6011b0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab6011c0 | 00 00 00 A1 FF FF 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab6011d0 | 01 00 00 08 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab6011e0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab6011f0 | 00 90 82 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601200 | 00 12 60 AB FF FF 00 00  80 DC 07 CC AA AA 00 00  |  ..`............. 
0xffffab601210 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601220 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601230 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 

0xffffab601240 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601250 | 09 01 02 40 00 00 00 00  00 00 00 00 00 00 00 00  |  ...@............ 
0xffffab601260 | FF 7F 00 00 C6 18 CF 63  00 00 00 1C 00 00 00 00  |  .......c........ 
0xffffab601270 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601280 | 02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601290 | 00 00 00 00 00 00 00 00  00 13 60 AB FF FF 00 00  |  ..........`..... 
0xffffab6012a0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab6012b0 | 50 13 60 AB FF FF 00 00  C0 13 60 AB FF FF 00 00  |  P.`.......`..... 
0xffffab6012c0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab6012d0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab6012e0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab6012f0 | 80 13 60 AB FF FF 00 00  00 00 00 00 00 00 00 00  |  ..`............. 
0xffffab601300 | 03 10 13 60 AB FF FF 00  00 00 00 00 00 00 00 00  |  ...`............ 
0xffffab601310 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601320 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601330 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601340 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601350 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601360 | 00 C0 01 A3 FF FF 00 00  00 C0 03 A3 FF FF 00 00  |  ................ 
0xffffab601370 | 00 00 DC A2 FF FF 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601380 | 00 00 00 00 1F 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601390 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab6013a0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab6013b0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab6013c0 | 00 00 00 A1 FF FF 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab6013d0 | 01 00 00 08 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab6013e0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab6013f0 | 00 90 82 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 
0xffffab601400 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................ 

Job chain from G71 (using Hikey960)

job chain format seems the same, but the content can differ.

Some parts are same but some are different. For example, FF 7F 00 00 C6 18 CF 63 always appear in the 2nd line of each job chain but the its subsequent from both sides are different (G52 shows 20 but G71 shows 1C ).

2.1 Job chain analysis referring to Panfrost

We further analyze job chain layout based on the formats of structures that already reverse-engineered by Panfrost. (refer to midgard.xml and generate header file (midgard_pack.hby using gen_pack.py ).

The jobchain format varies in job types. Here we focus on compute job as we are looking into OpenCL first. Although not yet fully digging out, we somewhat understood the layout of job chain and its size.

layout

Job chain and buffers around it. All the structures is in the same AS

draw

Job header in the job chain. First a few bits indicate job type

While the example is from OpenCL compute job, the job type seems tiler (based on the format described in the Panfrost). But the job chain format is same as the compute job of which size is 192B.

draw

Draw section in the job chain

In the draw section, some GPU addrs get involved, each of which points to one of the followed buffers. Although the buffers and their purpose are not fully unveiled yet, some buffers contains data (e.g. in/output) or GPU binary addrs.

comparison

Job chain comparison between G52 (odroid) and G71 (hikey)

The green boxes are the different parts of jobchain and buffers between two models. 0x20 and 0x1C parts specified task job split, seem to be about parallelism.

The third box is the last reference buffer of draw section.

Layout of the last reference buffer of job chain

Note: after quick recording of G31 (ODROID-C4), it turns out the job chain looks quite similar to the one from G52 (as they are using the same platform, user and kernel drivers). However, the G31 seems to have 32-bit MMU with support of LPAE. This makes some differences in the page table entry and MMU configuration and thus the recorded from G31 cannot be directly replayed with G71.

RnR performance difference

When replaying record captured from both G52 (ODROID-N2) and G71 (Hikey960), we observed that there is about 3 - 4x performance gap. We guess that comes from differences between job chains.

However tunning the green boxes (in the figure above) does not remove the perf gap between two replays. So we move to the other ASs and register I/O to see if there is difference.

Address space: almost same except for the 5th space which contains some GPU addresses. However this is not the cause of perf diff.

Register I/O: because we are using different kdrivers for G52 and G71, the dumped registers I/Os are different, especially, in the beginning part. Note that we need to focus on the MMU and job submission part. For instance, focus on AS_TRANSCFG_LO, AS_MEMATTR_LO, and JS_AFFINITY_NEXT_LO. Regarding AS_XXX, it seem not affecting the perf although the values are different. However, in terms of JS_XX, the affinity stands for the # of cores to use. Let's see the differences as follows.

# captured by G71 (8 cores)
216: W reg 0x000018d0 val 000000ff [   JOB_CTRL |  JOB_SLOT1 | JS_AFFINITY_NEXT_LO (RW)]

# captured by G52 (2 cores)
759: W reg 0x000018d0 val 00000003 [   JOB_CTRL |  JOB_SLOT1 | JS_AFFINITY_NEXT_LO (RW)]

Different JS affinity captured from two GPUs

Note that although we simply change the value from 0x3 to 0xff, it does not speed up the GPU computation. The power could be one of the possible causes since it is possible that the reg I/O captured from G52 does not turn on all the available cores in the G71. So we just tried copy all the first part (GPU soft reset, core on, etc) of G71 to the one in the G52 which works with same performance we've achieved from RnR G71.

static u64 kbase_job_write_affinity(struct kbase_device *kbdev,
                base_jd_core_req core_req, 
                int js)
{               
    u64 affinity;

    if ((core_req & (BASE_JD_REQ_FS | BASE_JD_REQ_CS | BASE_JD_REQ_T)) ==
            BASE_JD_REQ_T) {
        /* Tiler-only atom */
        /* If the hardware supports XAFFINITY then we'll only enable
         * the tiler (which is the default so this is a no-op),
         * otherwise enable shader core 0.
         */
        if (!kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_XAFFINITY))
            affinity = 1;
        else
            affinity = 0;
    } else if ((core_req & (BASE_JD_REQ_COHERENT_GROUP |
            BASE_JD_REQ_SPECIFIC_COHERENT_GROUP))) {
        unsigned int num_core_groups = kbdev->gpu_props.num_core_groups;
        struct mali_base_gpu_coherent_group_info *coherency_info =
            &kbdev->gpu_props.props.coherency_info;

        affinity = kbdev->pm.backend.shaders_avail &
                kbdev->pm.debug_core_mask[js];

        /* JS2 on a dual core group system targets core group 1. All
         * other cases target core group 0.
         */
        if (js == 2 && num_core_groups > 1)
            affinity &= coherency_info->group[1].core_mask;
        else
            affinity &= coherency_info->group[0].core_mask; // jin: fall here, core_mask = 0xff
    } else {
        /* Use all cores */
        EE("-- use all cores");
        affinity = kbdev->pm.backend.shaders_avail &
                kbdev->pm.debug_core_mask[js];
    }   

    kbase_reg_write(kbdev, JOB_SLOT_REG(js, JS_AFFINITY_NEXT_LO),
                    affinity & 0xFFFFFFFF);
    kbase_reg_write(kbdev, JOB_SLOT_REG(js, JS_AFFINITY_NEXT_HI),
                    affinity >> 32);

    return affinity;
}                   

Affinity varies depending on the core reqeust and GPU properties

The affinity is determined by kdriver. However, it depends on the core request from user runtime (e.g. coherent group) and GPU properties of client device's GPU which is loaded when the kdriver is initialized (e.g. # of cores).

Summary: if you want to replay the record captured from another device while using all the cores, check out if i) job chain is same, ii) all the available cores are on and iii) MMU and JS affinity are same.

NB: check out page table semantics. Different GPU model may have different pte configuration.