Optimizing calibration settings for accurate water equivalent path length assessment using flat panel proton radiography

Objective: Proton range uncertainties can compromise the effectiveness of proton therapy treatments. Water equivalent path length (WEPL) assessment by flat panel detector proton radiography (FP-PR) can provide a means of detecting range uncertainties. Since WEPL accuracy intrinsically relies on the FP-PR calibration parameters, the purpose of this study is to establish an optimal calibration procedure that ensures high accuracy of WEPL measurements. To that end, several calibration settings were investigated. Approach: FP-PR calibration datasets were obtained by simulating PR fields with different proton energies, directed towards water-equivalent material slabs of increasing thickness. The parameters investigated were the spacing between energy layers (ΔE) and the increment in thickness of the water-equivalent material slabs (ΔX) used for calibration. Thirty calibrations were simulated by combining ΔE = 9, 7, 5, 3, 1 MeV and ΔX = 10, 8, 5, 3, 2, 1 mm. FP-PRs through a CIRS electron density phantom were simulated, and WEPL images corresponding to each calibration were obtained. Ground truth WEPL values were provided by range probing multi-layer ionization chamber simulations on each insert of the phantom. Relative WEPL errors between FP-PR simulations and ground truth were calculated for each insert. Mean relative WEPL errors and standard deviations across all inserts were computed for WEPL images obtained with each calibration. Main results: Large mean errors and standard deviations were found in WEPL images obtained with large ΔE values (ΔE = 9 or 7 MeV), for any ΔX. WEPL images obtained with ΔE ≤ 5 MeV and ΔX ≤ 5 mm resulted in a WEPL accuracy with mean values within ±0.5% and standard deviations around 1%. Significance: An optimal FP calibration in the framework of this study was established, characterized by 3 MeV ≤ ΔE ≤ 5 MeV and 2 mm ≤ ΔX ≤ 5 mm. Within these boundaries, highly accurate WEPL acquisitions using FP-PR are feasible and practical, holding the potential to assist future online range verification quality control procedures.


Introduction
Range probing and proton radiography (PR) have been proposed as tools to detect and mitigate sources of range uncertainty (Mumot et al 2010). Based on the principle that the same particle is used for treatment and for imaging, PR enables a direct measurement of relative stopping power of tissues, overcoming the uncertainties arising from the conversion of CT numbers into relative stopping power (Schneider and Pedroni 1994, Schneider et al 2005, Knopf and Lomax 2013, Doolan et al 2015).
PR solutions, classified as list mode or integration detector configurations, were first developed in the context of double scattering proton therapy systems (Poludniowski et al 2015). List mode detector configurations are composed of upstream and/or downstream particle trackers, as well as a residual energy detector (Talamonti et al 2010, Johnson 2018). Integrating systems rely on a single detector such as diode arrays (Gottschalk et al 2011, Testa et al 2013, Doolan et al 2015), scintillators with charge-coupled devices (Zygmanski et al 2000, Ryu et al 2008), or flat panels (FP) (Jee et al 2017a, Zhang et al 2018), which are typically calibrated to the water equivalent path length (WEPL) experimentally or via Monte Carlo simulations (Poludniowski et al 2015, Würl et al 2020). Given the growing prevalence of pencil beam scanning over double scattering systems, new PR integrating solutions compatible with pencil beam scanning were proposed (Mumot et al 2010, Telsemeyer et al 2012, Bentefour et al 2016).
Multiple studies have shown the suitability of PR for range verification with a multi-layer ionization chamber (MLIC-PR), which measures the integral depth-dose profiles of pencil beams (Mumot et al 2010, Farace et al 2016b). MLIC-PR enabled the detection of patient misalignments, range uncertainty assessment in different types of tissues, as well as in vivo range verification in head and neck cancer patients (Farace et al 2016b, Hammi et al 2018, Meijers et al 2021).
Other investigations with pencil beam scanning systems focus on PR imaging with flat panel detectors (FP-PR), which provide dose measurements in a two-dimensional detector array and offer larger readout areas than MLIC. The WEPL of proton beams can be obtained using energy-resolved dose functions (ERDFs), first proposed by Bentefour et al as a solution to measure WEPL by FP-PR with pencil beam scanning systems. An ERDF represents the change in the FP signal as a function of the initial pencil beam energies composing the PR field (Bentefour et al 2016). The WEPL can be retrieved by comparing the ERDFs obtained from a PR acquisition against a set of calibrated ERDFs acquired with slabs of known water equivalent thickness (Bentefour et al 2016, Huo et al 2019, Alaka et al 2020, Harms et al 2020). WEPL determination by means of FP-PR using ERDFs was investigated in silico and verified experimentally with an electron density phantom, achieving relative stopping power accuracy below 1.5% in silico and 2.65% experimentally (Huo et al 2019, Harms et al 2020). For a head and neck phantom, FP-PR simulations were performed, and FP-PR image acquisitions were evaluated qualitatively (Huo et al 2019, Harms et al 2020). Even though WEPL accuracy relies intrinsically on the sparseness of the FP calibration dataset (Harms et al 2020), research to date has not yet provided an optimal FP calibration procedure, which is essential for accurate WEPL assessment using FP-PR. In this work, WEPL accuracy was assessed in silico as a function of different calibration parameters, with the aim of finding an optimal setting for FP calibration.

FP calibration settings
In this study, different calibration settings were explored. Each simulated calibration contained a collection of ERDFs obtained by repeatedly delivering a PR field, composed of multiple energy layers, towards water-equivalent material slabs of increasing thickness. The calibration parameters subject to investigation were the spacing between energy layers in the PR field (ΔE), and the slab thickness increments (ΔX).
FP-PR simulations were performed using openREGGUI (openreggui.org). All simulations were performed with PR fields covering an area of 30 × 30 cm² at the isocenter in the x-y plane, with a spot spacing of 5 mm, delivered at initial energies ranging from 70 to 225 MeV, from a gantry angle of 270 degrees.
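The calibration grid described above can be enumerated in a few lines. The following is a minimal sketch rather than the procedure used in the study: it takes the ΔE and ΔX values listed in the abstract and the 70 to 225 MeV energy range, and it assumes that slabs span 0 to 80 mm and that both sequences simply start at their lower bound and stop at the last value not exceeding the upper bound. How the study handled spacings that do not divide the range evenly is not stated.

```python
from itertools import product

# Values quoted in the abstract and in the text above.
DELTA_E_MEV = [9, 7, 5, 3, 1]
DELTA_X_MM = [10, 8, 5, 3, 2, 1]
E_MIN_MEV, E_MAX_MEV = 70, 225

# Assumption: slab thicknesses run from 0 to 80 mm.
X_MAX_MM = 80


def energy_layers(delta_e):
    """Energy layers from E_MIN upwards; the last layer may fall short of E_MAX."""
    return list(range(E_MIN_MEV, E_MAX_MEV + 1, delta_e))


def slab_thicknesses(delta_x):
    """Slab thicknesses from 0 upwards; the last one may fall short of X_MAX."""
    return list(range(0, X_MAX_MM + 1, delta_x))


# Every (dE, dX) combination is one calibration setting.
settings = list(product(DELTA_E_MEV, DELTA_X_MM))

for delta_e, delta_x in settings:
    n_layers = len(energy_layers(delta_e))
    n_slabs = len(slab_thicknesses(delta_x))
    print(f"dE={delta_e} MeV, dX={delta_x} mm: "
          f"{n_slabs} ERDFs x {n_layers} layers = {n_slabs * n_layers} points")
```

Under these assumptions the grid contains 30 settings, and ΔX = 2 mm and ΔX = 10 mm give 41 and 9 slab thicknesses respectively, which agrees with the ERDF counts reported for the two exemplary datasets. The printed point counts also give a rough sense of how quickly the acquisition effort grows for the finest settings.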
For each energy layer in the PR field, the FP signal was extracted by integrating the FP dose along the beam direction (over the z-axis), thus obtaining a two-dimensional array in the x-y plane corresponding to the FP signal. For the calibration datasets, the FP signal assigned to each energy layer and slab thickness, i.e. each data point in every ERDF, was obtained by averaging the FP signal over all the pixels covered by the PR field in the x-y plane. Figure 2 shows two exemplary calibration datasets: the first is composed of 41 ERDFs (ΔX=2 mm and ΔE=3 MeV), and the second contains 9 ERDFs (ΔX=10 mm and ΔE=9 MeV).
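The signal extraction described above amounts to an integration along z followed by a masked average. A minimal sketch with NumPy is given below. The array layout (x, y, z), the function names and the toy dose values are assumptions made for illustration and do not come from openREGGUI.

```python
import numpy as np


def fp_signal_map(dose, dz_mm):
    """Integrate the FP dose along the beam axis (last axis, z) -> 2D map in x-y."""
    return dose.sum(axis=-1) * dz_mm


def erdf_point(dose, dz_mm, field_mask):
    """Average the 2D FP signal over the pixels covered by the PR field."""
    return float(fp_signal_map(dose, dz_mm)[field_mask].mean())


# Toy example: uniform dose of 2.0 (arbitrary units) inside a central field,
# on a 6 x 6 pixel detector that is 4 voxels deep along z.
dose = np.zeros((6, 6, 4))
dose[1:5, 1:5, :] = 2.0
field_mask = np.zeros((6, 6), dtype=bool)
field_mask[1:5, 1:5] = True

signal = erdf_point(dose, dz_mm=1.0, field_mask=field_mask)
print(signal)  # 8.0
```

Repeating `erdf_point` for every energy layer at a fixed slab thickness would give one calibration ERDF. For a phantom image, the averaging step would be skipped so that each pixel keeps its own ERDF.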

WEPL obtained via FP-PR
In order to evaluate the WEPL accuracy achievable with each calibration setting, FP-PR simulations were performed using an electron density phantom (model 062M by Computerized Imaging Reference Systems, Inc.).
The phantom consists of a large and a small ring, containing 16 inserts of 8 different tissue equivalent materials representing the following tissue types: lung (exhale), adipose, muscle, dense bone, lung (inhale), breast, liver and trabecular bone.
Figure 2 (caption fragment). ERDFs are represented in different colors, corresponding to thicknesses from 0 to 80 mm. For each plot, the left-most ERDF corresponds to X=0 mm and the right-most ERDF corresponds to X=80 mm. The legend in the left side plot is omitted for readability.
An ERDF was obtained for each pixel in the FP-PR images of the phantom. WEPL values were obtained by minimizing the squared difference between each ERDF in a phantom FP-PR image and the ERDFs in a chosen calibration dataset. To allow comparison between ERDFs in the FP-PR images and ERDFs in the calibration, all ERDFs were normalized over their area. A cubic spline interpolation was applied to all ERDFs with ΔE >1 MeV, in order to have data points every 1 MeV in all calibration datasets and imaging PR fields. A linear interpolation across ERDFs corresponding to slab thicknesses not present in the calibration dataset was performed during the minimization process.
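The matching step can be illustrated with a short sketch. It is not the implementation used in the study. It assumes that all ERDFs are already resampled to a 1 MeV grid (the cubic spline step is omitted), it replaces a general optimizer with a brute-force search over candidate thicknesses in 0.1 mm steps, and it uses a made-up ERDF shape (`toy_erdf`) purely to generate test data.

```python
import numpy as np


def normalize_area(erdfs, energies):
    """Divide each ERDF (last axis = energy) by its trapezoidal area."""
    widths = np.diff(energies)
    area = (0.5 * (erdfs[..., 1:] + erdfs[..., :-1]) * widths).sum(axis=-1)
    return erdfs / area[..., None]


def match_wepl(measured, cal_erdfs, cal_x_mm, energies, step_mm=0.1):
    """Return the thickness whose (interpolated) calibration ERDF is closest.

    cal_erdfs has shape (n_thicknesses, n_energies); cal_x_mm is increasing.
    """
    cal = normalize_area(cal_erdfs, energies)
    meas = normalize_area(measured, energies)
    n_steps = int(round((cal_x_mm[-1] - cal_x_mm[0]) / step_mm))
    candidates = np.linspace(cal_x_mm[0], cal_x_mm[-1], n_steps + 1)
    # Linear interpolation between the two calibrated ERDFs that bracket
    # each candidate thickness.
    lo = np.clip(np.searchsorted(cal_x_mm, candidates, side="right") - 1,
                 0, len(cal_x_mm) - 2)
    t = (candidates - cal_x_mm[lo]) / (cal_x_mm[lo + 1] - cal_x_mm[lo])
    blended = (1 - t)[:, None] * cal[lo] + t[:, None] * cal[lo + 1]
    # Sum of squared differences between each candidate and the measurement.
    sse = ((blended - meas) ** 2).sum(axis=1)
    return float(candidates[np.argmin(sse)])


def toy_erdf(energies, x_mm):
    """Made-up ERDF shape: a peak plus a plateau that shift with thickness."""
    centre = 80.0 + 1.2 * x_mm
    peak = np.exp(-0.5 * ((energies - centre) / 4.0) ** 2)
    plateau = 0.3 / (1.0 + np.exp(-(energies - centre) / 2.0))
    return peak + plateau


energies = np.arange(70.0, 226.0, 1.0)      # 1 MeV grid after resampling
cal_x_mm = np.arange(0.0, 81.0, 5.0)        # calibration with dX = 5 mm
cal_erdfs = np.stack([toy_erdf(energies, x) for x in cal_x_mm])

wepl_on_grid = match_wepl(toy_erdf(energies, 20.0), cal_erdfs, cal_x_mm, energies)
wepl_off_grid = match_wepl(toy_erdf(energies, 23.0), cal_erdfs, cal_x_mm, energies)
print(wepl_on_grid, wepl_off_grid)
```

A thickness present in the calibration is recovered exactly, whereas a thickness between two calibrated slabs is only approximated, because a linear blend of two shifted curves is not itself a shifted curve. This is the interpolation error that a finer ΔX reduces.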

WEPL obtained via MLIC-PR (ground truth)
Ground truth WEPL values were provided by a range probing MLIC simulation (MLIC-PR) performed for each insert of the phantom. In the simulations, the MLIC was represented in the CT image by a water block of 30 cm of thickness at the exit of the phantom in the beam direction. The energy of each range probe was 210 MeV, and an isotropic dose grid of 1 mm was used in all directions. Integral depth dose profiles were obtained by integrating the dose in the dimensions perpendicular to the beam direction. The WEPL value corresponding to each insert was obtained using the Bragg peak pull-back method, with respect to an MLIC simulation in air (Huo et al 2019, Harms et al 2020).

Calibration assessment
WEPL accuracy was quantified in terms of WEPL relative errors (%), to determine the suitability of each calibration setting. WEPL relative errors between the ground truth WEPL values obtained from MLIC-PR simulations and the values obtained from FP-PR simulations in each insert were calculated (Harms et al 2020). In the WEPL images obtained by means of FP-PR, regions of interest of 10 mm were selected to extract the mean WEPL value in each insert.
The mean and standard deviation of the relative WEPL errors across all inserts were reported for the images obtained with each calibration setting. Furthermore, the variability of the WEPL accuracy was reported as a function of ΔX for a fixed ΔE, as well as a function of ΔE for a fixed ΔX.
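The ground truth and error computations described above can be sketched as follows. Several choices here are assumptions, not details taken from the study: the range is read at the distal 80% level of the integral depth dose (IDD) curve, the relative error is signed as (FP-PR minus ground truth) divided by ground truth, and the population standard deviation is used. The depth-dose shape, the ranges and the FP-PR readings are made-up values for illustration only.

```python
import numpy as np
from statistics import mean, pstdev


def distal_r80(depth_mm, idd):
    """Depth on the distal edge where the IDD falls to 80% of its maximum."""
    i_peak = int(np.argmax(idd))
    level = 0.8 * idd[i_peak]
    # First sample beyond the peak that is below the 80% level.
    below = np.nonzero(idd[i_peak:] < level)[0]
    j = i_peak + int(below[0])
    # Linear interpolation between samples j-1 (above) and j (below).
    frac = (idd[j - 1] - level) / (idd[j - 1] - idd[j])
    return depth_mm[j - 1] + frac * (depth_mm[j] - depth_mm[j - 1])


def pullback_wepl(depth_mm, idd_reference, idd_object):
    """WEPL as the shift of the distal edge relative to the reference IDD."""
    return float(distal_r80(depth_mm, idd_reference)
                 - distal_r80(depth_mm, idd_object))


def relative_error_percent(wepl_fp, wepl_truth):
    """Signed relative WEPL error in percent (sign convention is an assumption)."""
    return 100.0 * (wepl_fp - wepl_truth) / wepl_truth


def toy_idd(depth_mm, range_mm):
    """Made-up depth-dose curve: plateau plus a peak located at range_mm."""
    plateau = 0.3 / (1.0 + np.exp((depth_mm - range_mm) / 1.5))
    peak = np.exp(-0.5 * ((depth_mm - range_mm) / 3.0) ** 2)
    return plateau + peak


depth_mm = np.arange(0.0, 301.0, 1.0)           # 1 mm grid over a 30 cm block
idd_air = toy_idd(depth_mm, 284.0)              # hypothetical reference range
idd_insert = toy_idd(depth_mm, 247.0)           # same curve pulled back by 37 mm
wepl_truth = pullback_wepl(depth_mm, idd_air, idd_insert)

# Hypothetical FP-PR readings (mm) compared against this ground truth.
errors = [relative_error_percent(w, wepl_truth) for w in (36.8, 37.1, 37.3)]
print(round(wepl_truth, 3), round(mean(errors), 2), round(pstdev(errors), 2))
```

In the study, one relative error is obtained per insert and the mean and standard deviation are taken across all inserts of one WEPL image. The three readings above only stand in for that list.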

Results
Thirty WEPL images of the electron density phantom were obtained, one for each calibration setting. Figure 3 shows two example WEPL images, obtained with the two calibration datasets depicted in figure 2. Figure 4 shows the mean and standard deviation of the relative WEPL errors extracted from each WEPL image, corresponding to each calibration setting. Means and standard deviations are greatest for calibration settings with the largest ΔX and ΔE. Furthermore, figure 4 shows that large deviations are found for large ΔE (ΔE=9 or 7 MeV), regardless of the selected ΔX.
The lowest means and standard deviations are found for settings with the smallest ΔX and ΔE. Generally, settings with ΔX ≤ 5 mm and ΔE ≤ 5 MeV show mean values within ±0.5% and standard deviations around 1%. Figure 5 shows the variability of the means and standard deviations (error bars) as a function of varying ΔE or ΔX separately. Standard deviations decrease markedly with decreasing ΔE, from values spanning −15% to 15% for ΔE=9 MeV to values within ±1% for ΔE=1 MeV. Standard deviations decrease moderately with decreasing ΔX, ranging from −2% to 1% for ΔX=10 mm and from −1.2% to 0.5% for ΔX=1 mm.

Discussion
The suitability of multiple FP-PR calibration settings was assessed by means of relative WEPL errors, to determine an optimal calibration setting in terms of ΔE and ΔX that enables accurate WEPL measurements. As shown in figure 4, WEPL images of an electron density phantom obtained with ΔE ≤ 5 MeV and ΔX ≤ 5 mm resulted in a WEPL accuracy with mean values within ±0.5% and standard deviations around 1%. Figure 4 shows that WEPL accuracy strongly depends on the sparseness of the calibration dataset (Harms et al 2020). WEPL images obtained with the sparsest calibration settings (largest ΔE and ΔX) resulted in the largest deviations, especially for lung and bone equivalent tissue inserts (see table s1 (available online at stacks.iop.org/PMB/66/21NT02/mmedia) and figure s1 in supplementary material). For calibration settings with ΔE ≤ 5 MeV and ΔX ≤ 5 mm, relative WEPL errors were reduced across all inserts, although higher relative WEPL errors were found in inserts corresponding to lung equivalent tissues than in other inserts (see figure s1) (Harms et al 2020). Lung equivalent inserts have the lowest densities, meaning that a sub-millimeter absolute WEPL error can result in a relative WEPL error of up to −2.5%. The ground truth WEPL values used to calculate relative WEPL errors were also obtained with sub-millimeter accuracy, making use of the pull-back method (Farace et al 2016a, 2016b, Meijers et al 2021).
ΔE and ΔX were investigated separately in figure 5, showing that ΔE has a stronger impact than ΔX on the WEPL accuracy. This is because, for large ΔE, the characteristic steep dose increase in an ERDF is smoothed out by the cubic interpolation performed between data points in an ERDF (across the energy dimension). In that case, the optimization process in which ERDFs in the calibration dataset are compared against ERDFs from an FP-PR image of the phantom is less accurate. In contrast, ΔX does not show a strong impact on WEPL accuracy. Linear interpolation between ERDFs corresponding to different slab thicknesses performs well because all ERDFs in a calibration dataset have a similar shape.
Mean and standard deviation values are comparable for calibration settings with ΔE=3 MeV or ΔE=1 MeV, as well as for settings with ΔX=2 mm or ΔX=1 mm. However, a calibration dataset with ΔE=1 MeV or ΔX=1 mm would make the acquisition of the FP calibration dataset highly time consuming. For practicability, optimal calibration settings within the framework of this study were restricted to 3 MeV ≤ ΔE ≤ 5 MeV and 2 mm ≤ ΔX ≤ 5 mm. Table 1 compares the WEPL accuracy achieved in other studies with the WEPL accuracy obtained in this study for an exemplary FP calibration setting chosen within the optimality boundaries. Huo et al chose small ΔE and ΔX, and obtained a WEPL accuracy similar to the one achieved in this study with ΔE=3 MeV and ΔX=5 mm. Harms et al opted for an experimental acquisition of a calibration dataset with large ΔX, resulting in larger errors in bone and lung equivalent materials.
The implemented procedure to assign a WEPL value to an ERDF extracted from the FP-PR of the phantom was previously described by other studies (Huo et al 2019, Harms et al 2020). As shown in table 1, the accuracy achieved in this study is comparable to that of previous studies.
In this study, an optimal FP calibration procedure in terms of ΔE and ΔX was determined, which is essential to bring FP-PR acquisitions towards a clinical application. However, acquisition time and imaging dose remain limitations of FP-PR (Harms et al 2020). Parameters like the spot spacing, the number of energy layers or the energy range remain to be optimized to preserve high WEPL accuracy while reducing the acquisition time and the imaging dose. In this study, FP-PR fields had energies from 70 to 225 MeV, which resulted in many pencil beams stopping inside the phantom. Therefore, it is imperative to develop a methodology that excludes the lowest energy layers that would be absorbed in a patient (Huo et al 2019, Harms et al 2020). Pencil beams in the PR fields directed at the electron density phantom traversed homogeneous tissue equivalent materials. However, range mixing will certainly impact FP-PR images acquired for patients, where pencil beams intersect a wide variety of tissues, resulting in ERDFs with a less steep dose increase and a slower dose fall-off (Huo et al 2019). Range mixing can potentially hamper the optimization process in which ERDFs in the calibration dataset and ERDFs acquired from a patient are compared. Therefore, the performance of the optimization process when ERDFs are subject to range mixing should be investigated. Furthermore, a methodology to include range mixing in the calibration dataset or in the optimization process could be developed, for instance by means of signal deconvolution (Hammi et al 2018) or artificial intelligence (van der Heyden et al 2021).
In this work, high WEPL accuracy with optimal calibration parameters was achieved by means of FP-PR, which suggests that FP-PR could serve as an online range verification tool. FP-PR could be employed for the detection of setup errors, CT calibration curve errors or anatomical variations. Furthermore, a simultaneous detection of multiple sources of range uncertainty using FP-PR could be automated and integrated into adaptive proton therapy workflows (Seller Oria et al 2020).

Conclusion
An optimal FP calibration procedure in the framework of this study has been established, characterized by 3 MeV ≤ ΔE ≤ 5 MeV and 2 mm ≤ ΔX ≤ 5 mm. Within these boundaries, highly accurate WEPL acquisitions by means of FP-PR are feasible and practical, which could assist future online range verification quality control procedures.