Seregon/PkgToolBox

Toolbox for analyzing and editing PKG application files for PSP, PS3, PS4 and PS5. It includes the most useful functions you might need.

# Technical Analysis of the Decryption Process for PlayStation 4 PKG and PFS File Formats

**Date:** May 23, 2025

**Analysis Author:** SeregonWar

**Primary Source:** C++ source code fragments from the [shadPS4](https://github.com/shadps4-emu/shadPS4) emulator project and its associated header files (`pkg.h`, `pfs.h`, `crypto.h`, `keys.h`, etc.).

## Abstract

This document outlines an in-depth technical analysis of the parsing and decryption mechanisms employed for the PKG (Package) and PFS (PlayStation File System) file formats specific to the PlayStation 4 console. Based on an examination of the provided source code, this paper details the fundamental data structures, the sequence of cryptographic operations, and the logical workflow required for the extraction and access of protected content. It highlights a multi-layered security system leveraging standard algorithms such as RSA-2048, AES-128-CBC, and AES-128-XTS, along with platform-specific key derivation processes. Python code examples, derived from the translation of the C++ logic, are used to illustrate crucial steps.

## 1. Introduction

PKG files serve as the primary distribution container for software on the PlayStation 4 platform, encapsulating games, applications, patches, and downloadable content. Within these containers, application data is frequently organized into a PFS filesystem image, which is itself subject to encryption. Accessing this content necessitates a detailed understanding of the PKG file format, the identification and decryption of a hierarchy of cryptographic keys, and the subsequent interpretation and decryption of the PFS image. This paper aims to dissect this process as it emerges from the analysis of the shadPS4 Emulator project's source code.

## 2. PKG File Format Analysis

The PKG file is a structured container format, its layout primarily defined by its header and a table of entries describing internal metadata.

### 2.1. PKG File Header (`PKGHeader`)

The PKG file header, conforming to the definition in `pkg.h`, typically occupies the first 4096 bytes (0x1000) of the file. It contains metadata crucial for interpreting the rest of the package. Among the most significant fields, expressed in Big Endian format, are:

* **`magic` (u32_be):** A fixed identifier, `0x7F434E54` (the byte `0x7F` followed by the ASCII string "CNT"), which validates the file format.
* **`pkg_table_entry_offset` (u32_be) and `pkg_table_entry_count` (u32_be):** Indicate the offset and number of entries in the PKG's file table, respectively. This table primarily lists metadata files (e.g., `param.sfo`, license files).
* **`pkg_content_id` (u8[0x24]):** A 36-byte unique content identifier that starts at file offset `0x40`. The `pkgTitleID` (9 characters, e.g., "CUSAXXXXX") is extracted from it by skipping the first 7 characters of the `content_id`, which places the title ID at file offset `0x47`.
* **`pfs_image_offset` (u64_be) and `pfs_image_size` (u64_be):** Locate the encrypted PFS filesystem image within the PKG file.
* **`pfs_cache_size` (u32_be):** A parameter used to size intermediate buffers during PFS processing, particularly for its initial decryption and the localization of the PFSC substructure.
* **Various SHA256 Digests:** The header contains SHA256 hashes of different PKG sections (e.g., `digest_table_digest`, `digest_body_digest`, `pfs_image_digest`), used for integrity verification.
* **`pkg_content_flags` (u32_be):** Flags providing contextual information about the content type (e.g., `PKGContentFlag.FIRST_PATCH`, `PKGContentFlag.REMASTER`).

```python
# Example of PKGHeader class definition in Python (simplified)
from dataclasses import dataclass


@dataclass
class PKGHeader:
    # ... (full field definition as per previous Python code) ...
    _FORMAT_FULL = ">IIIIHHI..."  # Big Endian notation for struct.unpack (elided here)

    magic: int
    pkg_table_entry_offset: int
    pkg_table_entry_count: int
    pkg_content_id: bytes  # 36 bytes
    pfs_image_offset: int
    pfs_image_size: int
    pfs_cache_size: int
    # ... other fields ...

    @classmethod
    def from_bytes(cls, data: bytes):
        # ... (unpacking logic that produces the `values` tuple) ...
        return cls(*values)
```
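
Because the class above elides its format string, the two header checks that the text spells out in full (the magic value and the title ID at file offset `0x47`) can be sketched on their own. This is a minimal illustration under stated assumptions: the function name `read_magic_and_title_id` and the synthetic header are hypothetical and are not taken from shadPS4 or PkgToolBox.

```python
import struct
from typing import Tuple

PKG_MAGIC = 0x7F434E54   # magic value given in the field list above
TITLE_ID_OFFSET = 0x47   # file offset of the title ID, as stated in the text
TITLE_ID_LENGTH = 9      # e.g. "CUSAXXXXX"


def read_magic_and_title_id(header: bytes) -> Tuple[int, str]:
    """Check the big-endian magic and return (magic, title_id).

    Hypothetical helper for illustration; it only touches the two
    fields whose positions the text states explicitly.
    """
    if len(header) < TITLE_ID_OFFSET + TITLE_ID_LENGTH:
        raise ValueError("header too short")
    (magic,) = struct.unpack_from(">I", header, 0)
    if magic != PKG_MAGIC:
        raise ValueError(f"unexpected magic 0x{magic:08X}")
    raw = header[TITLE_ID_OFFSET:TITLE_ID_OFFSET + TITLE_ID_LENGTH]
    return magic, raw.decode("ascii")


# Synthetic 0x1000-byte header, for demonstration only (not a real package).
demo = bytearray(0x1000)
struct.pack_into(">I", demo, 0, PKG_MAGIC)
# Seven leading characters, then the nine-character title ID at 0x47.
demo[0x40:0x40 + 19] = b"UP0000-CUSA00000_00"
magic, title_id = read_magic_and_title_id(bytes(demo))
```

Raising on a bad magic before reading anything else keeps later offsets from being interpreted against a file that is not a PKG at all.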

### 2.2. PKG Entry Table (`PKGEntry`)

Located at the offset specified by `pkg_header.pkg_table_entry_offset`, this table is an array of `PKGEntry` structures. Each `PKGEntry` (32 bytes, Big Endian, defined in `pkg.h`) describes a metadata file, generally destined for the virtual `sce_sys/` directory upon extraction.

* **`id` (u32_be):** A numerical identifier for the file type. A mapping, such as the one provided in `pkg_type.cpp`, translates these IDs into standard filenames (e.g., `0x1000` -> "param.sfo").
* **`filename_offset` (u32_be):** If applicable, an offset into a filename table (often an entry with `id = 0x0200`, "entry_names").
* **`offset` (u32_be):** The absolute offset, from the beginning of the PKG file, where the data for this entry is located.
* **`size` (u32_be):** The size in bytes of the entry's data.
* **`flags1` (u32_be), `flags2` (u32_be):** Contain flags that, among other things, can indicate if the entry is encrypted and which key index to use for decryption.
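
```python
# Example of PKGEntry class definition in Python
import struct
from dataclasses import dataclass


@dataclass
class PKGEntry:
    # id, filename_offset, flags1, flags2, offset, size, padding
    _FORMAT = ">IIIIIIQ"
    _SIZE = struct.calcsize(_FORMAT)  # 32 bytes

    id: int
    filename_offset: int
    flags1: int
    flags2: int
    offset: int
    size: int
    padding: int  # u64_be
    name: str = ""  # Added for convenience

    @classmethod
    def from_bytes(cls, data: bytes) -> "PKGEntry":
        # Minimal version of the elided method: unpack one 32-byte
        # record; `name` keeps its default and is filled in later.
        return cls(*struct.unpack(cls._FORMAT, data[:cls._SIZE]))
```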
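
Reading the whole table is then a loop over `pkg_table_entry_count` fixed-size records. The sketch below is self-contained and repeats the 32-byte layout from the class above. The function `read_entry_table`, the dictionary output, and the `unknown_0x....` fallback name are illustrative assumptions, and the name map lists only the IDs mentioned in this document rather than the full mapping in `pkg_type.cpp`.

```python
import struct

ENTRY_FORMAT = ">IIIIIIQ"  # id, filename_offset, flags1, flags2, offset, size, padding
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 32 bytes

# Only the IDs named in this document; not the complete pkg_type.cpp table.
KNOWN_NAMES = {
    0x0010: "entry_keys",
    0x0020: "image_key",
    0x0200: "entry_names",
    0x1000: "param.sfo",
}


def read_entry_table(pkg: bytes, table_offset: int, count: int) -> list:
    """Return one dict per PKGEntry record found at table_offset."""
    entries = []
    for i in range(count):
        record_pos = table_offset + i * ENTRY_SIZE
        fields = struct.unpack_from(ENTRY_FORMAT, pkg, record_pos)
        entry_id, _name_off, flags1, flags2, offset, size, _pad = fields
        entries.append({
            "id": entry_id,
            "name": KNOWN_NAMES.get(entry_id, f"unknown_0x{entry_id:04X}"),
            "flags1": flags1,
            "flags2": flags2,
            "offset": offset,
            "size": size,
            # Raw record bytes are kept because the key derivation in
            # section 3 hashes them together with dk3_.
            "raw": pkg[record_pos:record_pos + ENTRY_SIZE],
        })
    return entries


# Synthetic two-entry table placed at offset 0x100 (demonstration only).
table = (struct.pack(ENTRY_FORMAT, 0x0010, 0, 0, 0, 0x2000, 0x800, 0)
         + struct.pack(ENTRY_FORMAT, 0x1000, 0, 0, 0, 0x3000, 0x400, 0))
pkg = bytes(0x100) + table
entries = read_entry_table(pkg, 0x100, 2)
```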

## 3. PKG Entry Decryption Flow (`sce_sys` Content)

Several entries within the PKG, particularly those related to NPDRM (e.g., `nptitle.dat`, `npbind.dat`, with IDs like `0x0400`, `0x0401`, `0x0402`, `0x0403`), are encrypted. Their decryption follows these key steps, as implemented in `PKG::Extract` and `Crypto` (from `crypto.cpp` and `keys.h`):

### 3.1. Derivation of `dk3_` (Derived Key 3)

1. **Accessing "entry_keys":** The PKG entry with `id = 0x0010` (identified as "entry_keys") is read.
2. **Reading Encrypted Keys:** From this entry's offset, various digests and an array of seven keys (`key1` in `pkg.cpp`), each 256 bytes, are read.
3. **RSA Decryption of `key1[3]`:** The fourth key in this array (`key1[3]`) is decrypted using the RSA-2048 algorithm with the PKCS#1 v1.5 padding scheme. The RSA private key employed for this operation is specified as `PkgDerivedKey3Keyset` in the `keys.h` file.
    ```python
    # Conceptual example of RSA decryption in Python
    # self.dk3_ is a bytearray(32)
    # key1_list[3] are the 256 encrypted bytes
    # self.crypto._key_pkg_derived_key3 is the PyCryptodome RSA key object

    # In RealCrypto.RSA2048Decrypt:
    # key_to_use = self._key_pkg_derived_key3 if is_dk3 else self._key_fake
    # cipher_rsa = Cipher_PKCS1_v1_5.new(key_to_use)
    # decrypted_data = cipher_rsa.decrypt(ciphertext, None)
    # output_key_buffer[:bytes_to_copy] = decrypted_data[:bytes_to_copy]
    ```
4. **Resulting `dk3_`:** The output of this RSA decryption is a 32-byte key, denoted `dk3_`.
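
To make the RSA step concrete without depending on a third-party library, the following sketch performs textbook RSA decryption and strips PKCS#1 v1.5 encryption padding using only built-in integers. It is an illustration, not the project's code: the function names are hypothetical, the toy modulus is built from two Mersenne primes rather than from `PkgDerivedKey3Keyset`, and the arithmetic is not constant-time, so a vetted library such as the PyCryptodome calls quoted above remains the sensible choice for real use.

```python
def pkcs1_v15_unpad(em: bytes) -> bytes:
    """Strip PKCS#1 v1.5 encryption padding: 00 02 <nonzero PS, >= 8 bytes> 00 <message>."""
    if len(em) < 11 or em[0] != 0x00 or em[1] != 0x02:
        raise ValueError("bad padding header")
    sep = em.find(b"\x00", 2)  # first zero byte after the 00 02 prefix
    if sep < 10:               # also covers sep == -1 (separator missing)
        raise ValueError("padding string too short or separator missing")
    return em[sep + 1:]


def rsa_decrypt_pkcs1_v15(ciphertext: bytes, n: int, d: int) -> bytes:
    """Textbook RSA private-key operation followed by unpadding (illustrative only)."""
    k = (n.bit_length() + 7) // 8
    c = int.from_bytes(ciphertext, "big")
    if c >= n:
        raise ValueError("ciphertext out of range")
    em = pow(c, d, n).to_bytes(k, "big")
    return pkcs1_v15_unpad(em)


# Toy key for demonstration only: NOT the keyset from keys.h, and much
# shorter than the 2048-bit modulus the real format uses.
p = 2**521 - 1
q = 2**607 - 1
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+

# Encrypt a 32-byte stand-in for dk3_, then decrypt it again.
message = bytes(range(1, 33))
k = (n.bit_length() + 7) // 8
padded = b"\x00\x02" + b"\xff" * (k - 3 - len(message)) + b"\x00" + message
ciphertext = pow(int.from_bytes(padded, "big"), e, n).to_bytes(k, "big")
dk3 = rsa_decrypt_pkcs1_v15(ciphertext, n, d)
```

With the real 2048-bit keyset, `k` would be 256, which matches the 256-byte size of each `key1` element.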

### 3.2. Derivation of Image Key (`imgKey`) and EKPFS Key (`ekpfsKey`)

1. **Accessing "image_key":** The PKG entry with `id = 0x0020` ("image_key") is read.
2. **Reading Encrypted Data:** From this entry's offset, 256 bytes of encrypted data, termed `imgkeydata`, are read.
3. **Generating `ivKey`:**
    * A 64-byte buffer is prepared. The first 32 bytes are a copy of the raw bytes of the `PKGEntry` (ID `0x0020`) read from the table. The subsequent 32 bytes consist of the `dk3_` key (obtained in the previous step).
    * A SHA256 hash is computed over this 64-byte buffer. The resulting 32-byte digest is the `ivKey`.
    ```python
    # In RealCrypto.ivKeyHASH256:
    # h = SHA256.new()
    # h.update(cipher_input_64_bytes)
    # digest = h.digest() # 32 bytes
    # ivkey_result_buffer[:] = digest
    ```
4. **AES Decryption of `imgkeydata`:**
    * The `ivKey` (32 bytes) is split: the first 16 bytes serve as the AES Initialization Vector (IV), while the next 16 bytes constitute the AES-128 key.
    * The 256 bytes of `imgkeydata` are decrypted using AES-128 in CBC (Cipher Block Chaining) mode with the newly derived key and IV. The result of this operation is the `imgKey` (256 bytes).
    ```python
    # In RealCrypto.aesCbcCfb128Decrypt:
    # key_aes = ivkey[16:32]
    # iv_aes = ivkey[0:16]
    # cipher_aes = AES.new(key_aes, AES.MODE_CBC, iv_aes)
    # decrypted_data = cipher_aes.decrypt(ciphertext_256_bytes)
    # decrypted_buffer[:] = decrypted_data # self.imgKey
    ```
5. **RSA Decryption of `imgKey`:**
    * The `imgKey` (256 bytes) is further processed. It is decrypted using RSA-2048 (PKCS#1 v1.5), but this time with a different RSA private key, named `FakeKeyset` in `keys.h`.
    * The output of this second RSA decryption is the `ekpfsKey` (Entitlement Key for PFS), a 32-byte key crucial for the subsequent PFS filesystem decryption.
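
The `ivKey` construction in steps 3 and 4 needs nothing beyond the standard library, so it can be shown as runnable code. This is a minimal sketch assuming the two inputs are already available as byte strings; the helper name `derive_iv_and_key` and the placeholder inputs are hypothetical. The AES-128-CBC call itself is left to a cryptography library, as in the PyCryptodome fragment quoted in step 4.

```python
import hashlib
from typing import Tuple


def derive_iv_and_key(entry_bytes: bytes, dk3: bytes) -> Tuple[bytes, bytes]:
    """Return (iv, aes_key) from SHA256(raw 32-byte PKGEntry || 32-byte dk3_)."""
    if len(entry_bytes) != 32 or len(dk3) != 32:
        raise ValueError("both inputs must be exactly 32 bytes")
    iv_key = hashlib.sha256(entry_bytes + dk3).digest()
    # First half is the CBC IV, second half is the AES-128 key.
    return iv_key[:16], iv_key[16:]


# Placeholder inputs for demonstration only (not real key material).
demo_entry = bytes(range(32))
demo_dk3 = bytes(32)
iv, aes_key = derive_iv_and_key(demo_entry, demo_dk3)
```

Because the raw entry bytes feed the hash, every table entry yields its own key and IV even though `dk3_` is shared, which is why the same helper would also serve the per-entry derivation for NPDRM entries.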

### 3.3. Specific Decryption of NPDRM Entries

For NPDRM type entries (e.g., `nptitle.dat`, `npbind.dat`):

1. **Specific `ivKey` Generation:** Similar to step 3.2.3, a 64-byte buffer is constructed by concatenating the bytes of the current NPDRM `PKGEntry` with `dk3_`. A SHA256 hash of this buffer produces an `ivKey` specific to this entry.
2. **AES Decryption:** The encrypted data of the NPDRM entry is decrypted using AES-128-CBC, with the key and IV derived from this specific `ivKey`. The decrypted result overwrites the original entry's data in the extraction path.

## 4. PFS Image Decryption and Parsing

The PFS image, located via `pkg_header.pfs_image_offset` and `pkg_header.pfs_image_size`, contains the actual filesystem of the game/application.

### 4.1. Derivation of PFS Data and Tweak Keys (`dataKey`, `tweakKey`)

1. **Reading the PFS Seed:** A 16-byte cryptographic `seed` is read from a fixed offset (`0x370`) relative to the start of the PFS image (i.e., `pkg_header.pfs_image_offset + 0x370`).
2. **Key Generation via HMAC-SHA256:** The `PfsGenCryptoKey` function (in `Crypto`) uses the `ekpfsKey` (32 bytes, obtained in step 3.2.5) and the PFS `seed` (16 bytes) to generate two 16-byte keys: `dataKey` and `tweakKey`.
    * A 20-byte payload is constructed by concatenating a fixed index (`1`, as a `u32`) with the `seed`.
    * An HMAC-SHA256 of this payload is computed, using `ekpfsKey` as the HMAC key.
    * The resulting HMAC digest (32 bytes) is split: the first 16 bytes become the `tweakKey`, and the subsequent 16 bytes become the `dataKey`.
    ```python
    # In RealCrypto.PfsGenCryptoKey:
    # hmac_sha256 = HMAC.new(ekpfs_32_bytes, digestmod=SHA256)
    # index_bytes = struct.pack("<I", 1) # Little Endian u32 for index = 1
    # d_payload = index_bytes + seed_16_bytes # Total 20 bytes
    # hmac_sha256.update(d_payload)
    # data_tweak_key_digest = hmac_sha256.digest() # 32 bytes
    # tweakKey_buffer[:] = data_tweak_key_digest[0:16]
    # dataKey_buffer[:] = data_tweak_key_digest[16:32]
    ```
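
The commented fragment above relies on PyCryptodome. The same derivation can be sketched with only the standard library's `hmac` and `hashlib` modules. The function name `pfs_gen_crypto_key` mirrors the C++ name but is otherwise a hypothetical helper, and the inputs below are placeholders rather than real keys.

```python
import hashlib
import hmac
import struct
from typing import Tuple


def pfs_gen_crypto_key(ekpfs: bytes, seed: bytes, index: int = 1) -> Tuple[bytes, bytes]:
    """Return (tweak_key, data_key) derived via HMAC-SHA256 keyed with ekpfs."""
    if len(ekpfs) != 32 or len(seed) != 16:
        raise ValueError("expected a 32-byte ekpfs and a 16-byte seed")
    payload = struct.pack("<I", index) + seed  # 4 + 16 = 20 bytes
    digest = hmac.new(ekpfs, payload, hashlib.sha256).digest()
    # First 16 bytes: tweak key. Next 16 bytes: data key.
    return digest[:16], digest[16:]


# Placeholder inputs for demonstration only.
demo_ekpfs = bytes(32)
demo_seed = bytes(range(16))
tweak_key, data_key = pfs_gen_crypto_key(demo_ekpfs, demo_seed)
```

Returning the keys in digest order keeps the split easy to audit against the description in the list above.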

### 4.2. Initial PFS Image Decryption and PFSC Localization

1. **Partial Read and Decryption:** A portion of the encrypted PFS image is read from the PKG file. The size of this portion, according to the analyzed C++ code, is `pkg_header.pfs_cache_size * 2`. If `pfs_cache_size` is zero, this phase and subsequent PFS parsing are typically skipped.
2. **AES-XTS Decryption:** This portion is decrypted using the AES-128-XTS algorithm with the `dataKey` and `tweakKey`. XTS decryption operates on 0x1000-byte (4 KiB) blocks, and the sector number (starting from 0 for this initial decryption) is used to calculate the initial tweak for each XTS block.
3. **`PFSC_MAGIC` Search:** Within the (partially) decrypted PFS image buffer, the magic number `PFSC_MAGIC` (0x43534650, ASCII "PFSC", Little Endian) is searched to determine the `pfsc_offset_in_pfs_image`. This offset is relative to the start of the decrypted PFS image buffer and indicates the beginning of the PFSC data structure. The search typically occurs at 0x10000-byte intervals, starting from offset 0x20000.
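
The magic search in step 3 is a strided scan over the decrypted buffer. A minimal sketch follows, assuming the buffer has already been XTS-decrypted; the function name `find_pfsc_offset` and the `-1` return value for "not found" are illustrative choices, not the emulator's API.

```python
import struct

PFSC_MAGIC = struct.pack("<I", 0x43534650)  # little-endian bytes, i.e. b"PFSC"


def find_pfsc_offset(pfs_image: bytes, start: int = 0x20000, step: int = 0x10000) -> int:
    """Return the first aligned offset holding PFSC_MAGIC, or -1 if none is found."""
    for offset in range(start, len(pfs_image) - 3, step):
        if pfs_image[offset:offset + 4] == PFSC_MAGIC:
            return offset
    return -1


# Synthetic decrypted buffer (demonstration only).
image = bytearray(0x50000)
image[0x20004:0x20008] = b"PFSC"  # decoy that is not on a 0x10000 boundary
image[0x30000:0x30004] = b"PFSC"  # the aligned occurrence the scan should find
pfsc_offset = find_pfsc_offset(bytes(image))
```

Checking only aligned offsets keeps the scan cheap and avoids false positives from the four bytes appearing by chance inside file data.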

### 4.3. Parsing the PFSC Header (`PFSCHdrPFS`) and Sector Map (`sectorMap`)

Starting from `pfsc_offset_in_pfs_image` (within the decrypted PFS buffer), the `PFSCHdrPFS` structure (defined as `PFSCHdr` in `pfs.h` and used in `pkg.cpp`) is read and interpreted:

* **`magic` (s32):** Should match `PFSC_MAGIC`.
* **`data_length` (s64):** Total length of the data managed by this PFSC structure.
* **`block_sz2` (s64):** Size of the logical, decompressed data blocks (typically 0x10000 bytes or 64 KiB).
* **`block_offsets` (s64):** Offset (relative to the start of the PFSC structure) of the table (`sectorMap`) that maps logical blocks to their (compressed or uncompressed) data within the PFSC data area.

The number of logical data blocks is calculated: `num_data_blocks_in_pfsc = pfs_chdr_obj.data_length // pfs_chdr_obj.block_sz2`.
The `sectorMap` is then read: it is an array of `num_data_blocks_in_pfsc + 1` offsets (`u64`). Each `sectorMap[i]` indicates the start offset of logical block `i` within the PFSC data area, and `sectorMap[i+1] - sectorMap[i]` gives its (potentially compressed) size.
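
The block count and the per-block sizes follow directly from those definitions. The sketch below takes the three header values as already-parsed integers, since the byte positions of the header fields are not listed here, and assumes the `u64` offsets are stored little-endian like the magic. The function name `read_sector_map` is hypothetical.

```python
import struct
from typing import List, Tuple


def read_sector_map(pfsc: bytes, block_offsets: int,
                    data_length: int, block_sz2: int) -> Tuple[List[int], List[int]]:
    """Return (sector_map, stored_sizes) for a PFSC structure.

    `pfsc` starts at the PFSC magic; `block_offsets` is relative to it.
    Offsets are read as little-endian u64 values (an assumption).
    """
    num_blocks = data_length // block_sz2
    sector_map = list(struct.unpack_from(f"<{num_blocks + 1}Q", pfsc, block_offsets))
    # The stored size of block i is the distance to the next offset.
    sizes = [sector_map[i + 1] - sector_map[i] for i in range(num_blocks)]
    return sector_map, sizes


# Synthetic example: three logical blocks, two stored compressed (0x800 bytes each).
demo_pfsc = bytearray(0x1000)
struct.pack_into("<4Q", demo_pfsc, 0x400, 0x10000, 0x10800, 0x20800, 0x21000)
sector_map, sizes = read_sector_map(bytes(demo_pfsc), 0x400, 0x30000, 0x10000)
```

The extra trailing offset is what allows the size of the last block to be computed in the same way as all the others.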

### 4.4. Extraction and Decompression of PFSC Logical Data Blocks

An iteration is performed from `0` to `num_data_blocks_in_pfsc - 1`. For each logical block:

1. The offset (`sector_offset_in_pfsc_data`) and size (`sector_data_size`) of the data block are retrieved from the `sectorMap`. This data is relative to the start of the PFSC data area (i.e., after `pfsc_offset_in_pfs_image` in the decrypted PFS buffer, and after the `PFSCHdrPFS` header if `block_offsets` points after it, within the `pfsc_content_actual_bytes` buffer).
2. The data block is extracted.
3. If its size (`sector_data_size`) is less than `pfs_chdr_obj.block_sz2` (e.g., < 0x10000), the data block is considered compressed and is decompressed using zlib (function `DecompressPFSC`) into a temporary buffer of size `block_sz2`. Otherwise, it is copied directly.
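
The compressed-or-not decision in step 3 depends only on the stored size. A minimal sketch using Python's `zlib` module is shown below; the helper name `decompress_pfsc_block` is hypothetical, and zero-padding a short result up to `block_sz2` is an assumption made to imitate the fixed-size temporary buffer described above.

```python
import zlib


def decompress_pfsc_block(block: bytes, block_sz2: int = 0x10000) -> bytes:
    """Return one logical block of exactly block_sz2 bytes."""
    if len(block) >= block_sz2:
        # Stored uncompressed: copy as-is.
        return bytes(block[:block_sz2])
    data = zlib.decompress(block)
    if len(data) > block_sz2:
        raise ValueError("decompressed block larger than block_sz2")
    # Assumption: pad short output with zeros to fill the fixed-size buffer.
    return data.ljust(block_sz2, b"\x00")


# Round trip on synthetic data (demonstration only).
plain = b"ABCD" * 0x4000               # exactly 0x10000 bytes
stored = zlib.compress(plain)          # much smaller than 0x10000
restored = decompress_pfsc_block(stored)
untouched = decompress_pfsc_block(plain)
```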

### 4.5. Parsing Inodes (`Inode`)

The first decompressed logical PFSC block (`i_block_pfsc = 0`) acts as the PFS "superblock" and contains, among other information, the total number of inodes (`ndinode_total_count`) at offset `0x30`. Subsequent logical blocks (from 1 up to a calculated `occupied_inode_blocks` based on `ndinode_total_count` and `sizeof(Inode)`) contain the inode table.

Each `Inode` structure (defined in `pfs.h`, 0xA8 bytes in size) provides metadata for a file or directory:

* **`Mode` (u16):** File type (directory, regular file, etc., as specified by bits in `InodeModePfs`) and permissions.
* **`Size` (s64):** Actual size of the file in bytes.
* **`Blocks` (u32):** Number of logical data blocks (of size `block_sz2`) occupied by the file.
* **`loc` (u32):** Index of the file's first data block in the `sectorMap`. The index counts logical blocks from the start of the data area managed by PFSC, so block `j` of the file is described by `sectorMap[loc + j]`.
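
The number of blocks holding the inode table is a ceiling division of the table size by the block size. The sketch below assumes that the count at offset `0x30` is a 32-bit little-endian integer, which the text does not state explicitly; the function name `inode_table_blocks` is hypothetical.

```python
import struct
from typing import Tuple

INODE_SIZE = 0xA8  # bytes per Inode, as stated above


def inode_table_blocks(superblock: bytes, block_sz2: int = 0x10000) -> Tuple[int, int]:
    """Return (ndinode_total_count, occupied_inode_blocks)."""
    # Assumption: the inode count is stored as a little-endian u32.
    (ndinode,) = struct.unpack_from("<I", superblock, 0x30)
    table_bytes = ndinode * INODE_SIZE
    occupied = (table_bytes + block_sz2 - 1) // block_sz2  # ceiling division
    return ndinode, occupied


# Synthetic superblock with 400 inodes: 400 * 0xA8 = 67200 bytes, so two blocks.
demo_superblock = bytearray(0x10000)
struct.pack_into("<I", demo_superblock, 0x30, 400)
ndinode, occupied_inode_blocks = inode_table_blocks(bytes(demo_superblock))
```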

### 4.6. Parsing Directory Entries (`Dirent`)

Logical PFSC blocks following the inode blocks contain directory entries (`Dirent`, from `pfs.h`). The presence of "." and ".." entries is a common indicator of a dirent block.

Each `Dirent` provides:

* **`ino` (s32):** The inode number to which this entry refers.
* **`type` (s32):** Entry type, with values mapping to `PFSFileType` (e.g., `PFSFileType.PFS_FILE = 2`, `PFSFileType.PFS_DIR = 3`).
* **`namelen` (s32):** Length of the file/directory name.
* **`entsize` (s32):** Total size of this `Dirent` structure, used to advance to the next entry.
* **`name` (char[512]):** The null-terminated file/directory name (actual length given by `namelen`).

During this parsing stage, an `fs_table` (list of `FSTableEntry`) and an `extract_paths` map (from inode number to `pathlib.Path`) are constructed to rebuild the filesystem hierarchy. The management of `current_dir_pfs` (the current PFS directory during parsing) and the correct determination of the PFS root path (influenced by the `uroot_reached` logic or the first "." entry) are crucial. Directories are created on disk as they are identified.
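
Walking a dirent block amounts to reading the four 32-bit fields, slicing `namelen` bytes of name, and advancing by `entsize`. The sketch below assumes that the fields are little-endian, that they appear in the order listed above, and that a record with `ino == 0` marks the end of the used area; none of these details is confirmed by the text, and `iter_dirents` and `make_dirent` are hypothetical helpers.

```python
import struct

DIRENT_HEADER = struct.Struct("<iiii")  # ino, type, namelen, entsize (assumed order)
PFS_FILE, PFS_DIR = 2, 3                # values quoted in the list above


def iter_dirents(block: bytes):
    """Yield (ino, type, name) tuples until a zero inode or an invalid size is seen."""
    pos = 0
    while pos + DIRENT_HEADER.size <= len(block):
        ino, ftype, namelen, entsize = DIRENT_HEADER.unpack_from(block, pos)
        # Assumption: ino == 0 ends the list; the size check prevents an endless loop.
        if ino == 0 or entsize < DIRENT_HEADER.size:
            break
        name_start = pos + DIRENT_HEADER.size
        name = block[name_start:name_start + namelen].decode("utf-8", "replace")
        yield ino, ftype, name
        pos += entsize


def make_dirent(ino: int, ftype: int, name: str) -> bytes:
    """Build a synthetic record; the 8-byte alignment is a demo choice only."""
    raw = name.encode("utf-8")
    entsize = (DIRENT_HEADER.size + len(raw) + 1 + 7) & ~7
    return (DIRENT_HEADER.pack(ino, ftype, len(raw), entsize) + raw).ljust(entsize, b"\x00")


demo_block = make_dirent(2, PFS_DIR, "sce_sys") + make_dirent(3, PFS_FILE, "eboot.bin") + bytes(64)
dirents = list(iter_dirents(demo_block))
```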

## 5. Actual File Extraction from PFS

Once the PFS structure has been parsed (inodes and dirents are available), individual files are extracted:

1. For each `FSTableEntry` representing a file:
    * The corresponding `Inode` object is retrieved (the indexing of `iNodeBuf` using `fs_entry.inode` is a critical point requiring correct mapping or assumptions about inode compactness).
    * From the inode, `loc` (index of the file's first block in `sectorMap`) and `Blocks` (number of blocks composing the file) are obtained.
2. An iteration is performed for `Blocks` times, processing one logical block of the file at a time:
    * **Locating Encrypted Data:** The current block's index in `sectorMap` is `loc + j_block_in_file`. From this, `sector_offset_in_pfsc_img` (offset of the data block within the PFSC data area) and `sector_data_actual_size` (size of this block, compressed or not) are obtained.
    * **Calculating PKG Offset:** The absolute offset (`absolute_sector_data_offset_in_pkg`) of this data block within the original PKG file is determined.
    * **Identifying XTS Block:** The XTS block number (0x1000 bytes, `xts_block_num_in_pfs_image`) containing the start of this sector's data, and the offset (`offset_of_sector_in_its_xts_block`) of the sector's data within that XTS block, are calculated.
    * **Reading and XTS Decryption:** A chunk of data (0x11000 bytes in the C++ code, `read_chunk_from_pkg`) is read from the PKG file starting at the identified XTS block's beginning. This read chunk is decrypted using `decryptPFS` (AES-128-XTS) with `dataKey`, `tweakKey`, and `xts_block_num_in_pfs_image` as the initial XTS sector number. The result is stored in `decrypted_chunk_from_pkg`.
    * **Extracting and Decompressing Sector Data:** The actual sector data (of size `sector_data_actual_size`) is extracted from `decrypted_chunk_from_pkg` using `offset_of_sector_in_its_xts_block`. If compressed, it is decompressed using zlib.
    * **Writing File:** The decompressed block (0x10000 bytes or `block_sz2`) is written to the output file. For the final block of the file, only the amount of data needed to reach the total file size specified in the inode is written.

```python
# Concept of PFS file extraction (simplified)
# In PKG.extract_pfs_files:
# for fs_entry in self.fs_table:
#     if fs_entry.type == PFSFileType.PFS_FILE:
#         inode_obj = self.iNodeBuf[fs_entry.inode - 1]  # Assuming 1-based compact
#         output_path = self.extract_paths[fs_entry.inode]
#
#         with open(output_path, "wb") as out_f:
#             for j_block in range(inode_obj.Blocks):
#                 # ... locate sector_offset_in_pfsc_img, sector_data_actual_size from sectorMap ...
#                 # ... calculate read_start_pos_in_pkg, xts_block_num_in_pfs_image, offset_of_sector_in_its_xts_block ...
#
#                 pkg_file.seek(read_start_pos_in_pkg)
#                 read_chunk = pkg_file.read(0x11000)
#
#                 # C++ uses fixed 0x11000 buffers for decryptPFS. A short read
#                 # is zero-padded up to the next multiple of 0x1000 so that
#                 # decryptPFS always sees whole XTS blocks.
#                 effective_read_chunk_len = (len(read_chunk) + 0xFFF) & ~0xFFF
#                 padded_read_chunk = bytearray(effective_read_chunk_len)
#                 padded_read_chunk[:len(read_chunk)] = read_chunk
#                 decrypted_chunk = bytearray(effective_read_chunk_len)
#
#                 self.crypto.decryptPFS(self.dataKey, self.tweakKey,
#                                        padded_read_chunk,  # Or unpadded chunk if decryptPFS handles it
#                                        decrypted_chunk,
#                                        xts_block_num_in_pfs_image)
#
#                 sector_data = decrypted_chunk[offset_of_sector_in_its_xts_block:
#                                               offset_of_sector_in_its_xts_block + sector_data_actual_size]
#
#                 if sector_data_actual_size == 0x10000:  # block_sz2
#                     decompressed_data = sector_data
#                 else:
#                     decompressed_data = decompress_pfsc(sector_data, 0x10000)  # block_sz2
#
#                 # ... write decompressed_data to out_f, handling the last block ...
```
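
The two elided calculations in the loop (where to start reading, and how much of the last block to write) are plain integer arithmetic. The sketch below assumes that a `sectorMap` offset is relative to the start of the PFSC structure, so that its position inside the PFS image is `pfsc_offset + sector_offset`. Both helper names and the numbers in the demonstration are hypothetical.

```python
from typing import Tuple

XTS_BLOCK_SIZE = 0x1000


def locate_sector(pfs_image_offset: int, pfsc_offset: int,
                  sector_offset: int) -> Tuple[int, int, int]:
    """Return (read_start_pos_in_pkg, xts_block_num, offset_in_xts_block)."""
    # Assumption: sectorMap offsets are relative to the PFSC structure start.
    pos_in_pfs_image = pfsc_offset + sector_offset
    xts_block_num = pos_in_pfs_image // XTS_BLOCK_SIZE
    offset_in_block = pos_in_pfs_image % XTS_BLOCK_SIZE
    # Reads begin at the start of the containing XTS block so that
    # decryption can use xts_block_num as its initial sector number.
    read_start = pfs_image_offset + xts_block_num * XTS_BLOCK_SIZE
    return read_start, xts_block_num, offset_in_block


def bytes_to_write(file_size: int, block_index: int, block_sz2: int = 0x10000) -> int:
    """Number of bytes of logical block `block_index` that belong to the file."""
    remaining = file_size - block_index * block_sz2
    return max(0, min(block_sz2, remaining))


read_start, xts_block_num, offset_in_block = locate_sector(0x80000, 0x30000, 0x10234)
last_block_len = bytes_to_write(0x12345, 1)
```

Reading 0x11000 bytes from `read_start`, as the C++ code does, then covers a full 0x10000-byte sector plus up to 0xFFF bytes of leading misalignment.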

## 6. Cryptographic Algorithm Summary

The security system employs a combination of standard algorithms:

* **RSA-2048 (PKCS#1 v1.5 Padding):** Used for the asymmetric decryption of critical intermediate keys. The private keys (`PkgDerivedKey3Keyset` for `dk3_`, `FakeKeyset` for `ekpfsKey`) are defined via their components (n, e, d, p, q, coefficient).
* **SHA256:** Employed as a hash function for deriving `ivKey` and for integrity digests in the PKG header.
* **AES-128-CBC:** Used for symmetric decryption of `imgKey` (using `ivKey`) and NPDRM entries (using an entry-specific `ivKey`). The key and IV (both 16 bytes) are typically derived from a 32-byte SHA256 hash.
* **HMAC-SHA256:** Utilized in the `PfsGenCryptoKey` function to derive PFS `dataKey` and `tweakKey` from `ekpfsKey` and a `seed`.
* **AES-128-XTS (XEX-based Tweaked-codebook mode with ciphertext Stealing):** The primary algorithm for sector-level decryption of the PFS image and files within it. It requires a `dataKey` and a `tweakKey`. The "tweak" (a value that modifies the cipher for each block, based on the sector number) is encrypted with `tweakKey` (AES-ECB). This encrypted tweak is then updated for each successive 16-byte AES block within the XTS sector by multiplying it by `x` in the Galois Field GF(2^128), reduced modulo the polynomial `x^128 + x^7 + x^2 + x + 1`. In practice this is a one-bit left shift, with `0x87` XORed into the lowest byte whenever the highest bit is carried out.
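
The tweak update described for AES-128-XTS is simple enough to show in isolation. The sketch below follows the usual XTS convention of treating the 16-byte tweak as a little-endian integer, and assumes the sector number is likewise encoded little-endian before being encrypted with `tweakKey`. The AES-ECB encryption itself is not shown, and both function names are hypothetical.

```python
MASK_128 = (1 << 128) - 1


def sector_tweak_plaintext(sector_number: int) -> bytes:
    """16-byte little-endian sector number, before AES-ECB encryption with tweakKey."""
    return sector_number.to_bytes(16, "little")


def xts_next_tweak(tweak: bytes) -> bytes:
    """Multiply a 16-byte tweak by x in GF(2^128) modulo x^128 + x^7 + x^2 + x + 1."""
    value = int.from_bytes(tweak, "little") << 1
    if value >> 128:
        # The top bit was carried out: drop it and fold in the reduction term.
        value = (value & MASK_128) ^ 0x87
    return value.to_bytes(16, "little")


# One 0x1000-byte XTS block holds 256 AES blocks, so the tweak is advanced
# 255 times after the first block. Starting value here is a placeholder.
tweak = bytes([0x01]) + bytes(15)
tweaks = [tweak]
for _ in range(255):
    tweaks.append(xts_next_tweak(tweaks[-1]))
```

Each 16-byte ciphertext block is XORed with its tweak, decrypted with `dataKey`, and XORed with the same tweak again, which is the "XEX" part of the construction.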

## 7. Conclusions and Considerations

An implementation of the process described here can be found in [ShadPKG](https://github.com/seregonwar/ShadPKG). This technical paper was written to explain the technical vulnerabilities of PKG files and to give more detail to those who have asked me over the last year whether the files can be decrypted and modified. I am still implementing this system in PkgToolBox. Once it is implemented, anyone will be able to decrypt most of the applications and games currently available in the PS4 landscape (a PS5 version may also come out sooner or later).