Streaming Multigrid for Gradient-Domain Operations on Large Images
Michael Kazhdan, Johns Hopkins University
Hugues Hoppe, Microsoft Research
SIGGRAPH 08
Abstract
• Develop a streaming multigrid solver – 2 passes over out-of-core data
1. Solve the Poisson eq. with unconstrained boundary conditions
2. Construct the relaxation, restriction and prolongation operators of the multigrid method from the B-spline basis
3. Build a framework that pipelines the multigrid computation over a window of image rows
Outline
1. Introduction
2. Related work
3. Review of finite-difference multigrid
4. Our finite-element multigrid approach
5. Streaming multigrid solver
6. Efficient convergence of 2nd-order elements
7. Implementation
8. Application results
9. Conclusions & future work
Gradient-domain problem as Poisson solution
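Given a target gradient field G, solve the following problem:
• Find U to minimize $\lVert \nabla U - G \rVert^2$
• The minimizer satisfies the Poisson equation $\Delta U = \nabla \cdot G$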
Image processing on gradient-domain
• Lighting removal by zeroing small gradients [Horn 74]
• HDR image tone-mapped by adaptively attenuating luminance gradients [Weiss 01]
• Overlapping images stitched seamlessly by merging gradients [Pérez et al. 03; Agarwala et al. 04; Levin et al. 04]
• Shadow removal by zeroing large luminance gradients in regions of constant chromaticity [Finlayson et al. 02]
• Undesirable reflections removed in flash and ambient image pairs [Agrawal et al. 05]
• Photographic tone management is improved using gradient constraints [Bae et al. 06]
Painting on gradient-domain
• Painting with interactive gradient-domain modeling [McCann & Pollard 08]
• Diffusion curves [Orzan et al. 08]
Large images on gradient-domain
• Processing of large images [Kopf et al. 07]
• Gigapixel images are too large to fit in main memory
• Direct solution (e.g., Cholesky factorization) is impractical
• Iterative techniques (e.g., conjugate gradients and the multigrid method) are inefficient
– They require many iterations over out-of-core data
Standard multigrid V-cycle
Contributions
• A general, accurate and efficient solver for the Poisson eq. over out-of-core images
– Sufficient accuracy in 2 V-cycles on GB images
  • 2nd-order B-spline finite elements give more accuracy than traditional finite elements in the multigrid method
– Efficiency through temporal locality
  • Small moving windows of data in memory
  • Pipelined V-cycle: 3 Gauss-Seidel relaxations, restriction and prolongation
Contributions
• On a single CPU core
– Solve a 3-channel, 16-MB gradient-domain problem with an rms error on the order of $10^{-5}$ in 15 seconds
• High ratio of local computation to memory bandwidth
– Temporal locality in L1 cache
Solvers of Poisson equation
• Iterative solvers
– Gauss-Seidel and conjugate gradient
  • Memory-efficient
  • Require many iterations
– Reduce # of iterations using multi-resolution preconditioners [Gortler and Cohen 95; Szeliski 06]
– Reduce # of iterations using multigrid solvers [Brandt 77; Briggs et al. 00]
– GPU + multigrid [Bolz et al. 03; Goodnight et al. 03; Göddeke et al. 08]
Out-of-core
• Multigrid methods are difficult to schedule out-of-core [Toledo 99]
• Solve the out-of-core problem on a coarser-resolution grid and then upsample the resulting approximation [Kopf et al. 07a]
– Cannot robustly maintain sharp features
• Adaptive partitioning to solve the Poisson eq.
– In the image seams, for image stitching [Agarwala 07]
– In the context of fluid flow simulation [Losasso et al. 04]
– In surface reconstruction [Kazhdan et al. 06]
• Our method addresses the general case where
1. The Poisson eq. is solved accurately everywhere
2. The problem size cannot be reduced
3. No initial solution guess is available
Gradient-domain image processing
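$$\nabla U = G: \qquad \nabla U(x,y) = \left( \frac{\partial U}{\partial x}, \frac{\partial U}{\partial y} \right) = G(x,y)$$
$$\Delta U = f: \qquad \Delta U(x,y) = \nabla \cdot G(x,y), \qquad \frac{\partial^2 U}{\partial x^2} + \frac{\partial^2 U}{\partial y^2} = f(x,y)$$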
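Poisson Equation
∇u (1D forward difference on a grid with spacing h):
$$u'_i = \frac{\partial U(i)}{\partial x} = \frac{u_{i+1} - u_i}{h}$$
[Figure: 1D grid points $x_0, x_1, \dots, x_i, \dots, x_N$ with samples $u_0, u_1, \dots, u_i, u_{i+1}, \dots, u_N$ and spacing $h$]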
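ΔU (1D second difference):
$$u'_i = \frac{\partial U(i)}{\partial x} = \frac{u_{i+1} - u_i}{h}$$
$$u''_i = \frac{\partial^2 U(i)}{\partial x^2} = \frac{u'_i - u'_{i-1}}{h} = \frac{(u_{i+1} - u_i) - (u_i - u_{i-1})}{h^2} = \frac{u_{i+1} - 2u_i + u_{i-1}}{h^2}$$
[Figure: 1D grid points $x_{i-1}, x_i, x_{i+1}, \dots, x_N$ with samples $u_{i-1}, u_i, u_{i+1}, \dots, u_N$ and spacing $h$]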
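ΔU = f (2D)
$$\frac{\partial^2 U(i,j)}{\partial x^2} + \frac{\partial^2 U(i,j)}{\partial y^2} = \frac{u_{i+1,j} - 2u_{i,j} + u_{i-1,j}}{h^2} + \frac{u_{i,j+1} - 2u_{i,j} + u_{i,j-1}}{h^2} = \frac{u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4u_{i,j}}{h^2} = f_{i,j}$$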
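As a quick check of the 5-point formula, the sketch below applies it with NumPy to samples of U = x² + y², whose Laplacian is exactly 4. The function name `laplacian_5pt` and the test grid are illustrative assumptions, not part of the slides.

```python
import numpy as np

def laplacian_5pt(u, h):
    """Apply the 5-point finite-difference Laplacian to the interior of u."""
    return (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
            - 4.0 * u[1:-1, 1:-1]) / h**2

h = 0.1
x, y = np.meshgrid(np.arange(8) * h, np.arange(8) * h, indexing="ij")
u = x**2 + y**2            # analytic Laplacian is 4 everywhere
lap = laplacian_5pt(u, h)  # second differences are exact for quadratics
```

Only interior samples receive a value, because each output needs all four neighbors.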
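Lu = f
$$\frac{u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4u_{i,j}}{h^2} = f_{i,j}$$
For a 3 x 3 grid with the unknowns ordered row by row (values outside the grid taken as zero), the system is
$$\frac{1}{h^2}
\begin{pmatrix}
-4 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & -4 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & -4 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & -4 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & -4 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & -4 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & -4 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & -4 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & -4
\end{pmatrix}
\begin{pmatrix} u_{1,1} \\ u_{1,2} \\ u_{1,3} \\ u_{2,1} \\ u_{2,2} \\ u_{2,3} \\ u_{3,1} \\ u_{3,2} \\ u_{3,3} \end{pmatrix}
=
\begin{pmatrix} f_{1,1} \\ f_{1,2} \\ f_{1,3} \\ f_{2,1} \\ f_{2,2} \\ f_{2,3} \\ f_{3,1} \\ f_{3,2} \\ f_{3,3} \end{pmatrix}$$
Grid of unknowns U:
u1,1 u1,2 u1,3
u2,1 u2,2 u2,3
u3,1 u3,2 u3,3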
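The matrix for the 3 x 3 grid can be assembled from the 1D second-difference matrix with Kronecker products. This is a minimal sketch; the helper name `poisson_matrix`, the row-by-row ordering and the zero values outside the grid are assumptions made for illustration.

```python
import numpy as np

def poisson_matrix(n, h=1.0):
    """Dense 5-point Laplacian for an n x n grid, unknowns ordered row by row.

    Values outside the grid are taken as zero (an assumption of this sketch).
    """
    T = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)  # 1D second difference
    I = np.eye(n)
    # kron(I, T) couples neighbors within a grid row, kron(T, I) across rows.
    return (np.kron(I, T) + np.kron(T, I)) / h**2

L = poisson_matrix(3)
```

Note that `L[2, 3]` is zero: u1,3 and u2,1 are adjacent in the vector but are not neighbors on the grid.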
Lu=f
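• Gauss-Seidel relaxation on u
Lu = f
$(L_- + D + L_+)\,u = f$, where $L = L_- + D + L_+$ is split into its strictly lower, diagonal and strictly upper parts
$D\,u = f - (L_- + L_+)\,u$
$u^{(k)} = D^{-1}\left(f - (L_- + L_+)\,u^{(k-1)}\right) = D^{-1} f + R_J\,u^{(k-1)}$, with $R_J = -D^{-1}(L_- + L_+)$
Component-wise: $u_i^{(k)} = \frac{1}{L_{i,i}} \left( f_i - \sum_{j \ne i} L_{i,j}\, u_j^{(k-1)} \right)$
Note: as written, with every $u_j$ taken from iteration $k-1$, this is the Jacobi form of the update; Gauss-Seidel uses the already-updated values $u_j^{(k)}$ for $j < i$.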
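The following sketch contrasts one sweep that uses only values from the previous iterate (Jacobi) with one that reuses entries updated earlier in the same sweep (Gauss-Seidel). The 1D model problem, the function names and the sweep count are assumptions chosen for illustration.

```python
import numpy as np

def jacobi_sweep(L, f, u):
    """One Jacobi sweep: every entry uses only values from the previous iterate."""
    D = np.diag(L)
    return (f - (L @ u - D * u)) / D

def gauss_seidel_sweep(L, f, u):
    """One Gauss-Seidel sweep: entries updated earlier in the sweep are reused."""
    u = u.copy()
    for i in range(len(u)):
        off_diag = L[i] @ u - L[i, i] * u[i]
        u[i] = (f[i] - off_diag) / L[i, i]
    return u

# 1D model problem (u[i+1] - 2 u[i] + u[i-1]) / h^2 = f[i] with zero boundary values.
n, h = 15, 1.0 / 16
L = (np.eye(n, k=1) - 2.0 * np.eye(n) + np.eye(n, k=-1)) / h**2
u_true = np.sin(np.pi * h * np.arange(1, n + 1))
f = L @ u_true

u_j = np.zeros(n)
u_gs = np.zeros(n)
for _ in range(50):
    u_j = jacobi_sweep(L, f, u_j)
    u_gs = gauss_seidel_sweep(L, f, u_gs)

err_jacobi = np.linalg.norm(u_true - u_j)
err_gs = np.linalg.norm(u_true - u_gs)
```

On this problem the Gauss-Seidel error ends up smaller than the Jacobi error after the same number of sweeps, and both remain far from converged, which is the slow convergence on smooth error that motivates multigrid.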
Error, residual
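• Consider Lu = f
Error: $e_k = u - u^{(k)}$
Residual: $r_k = f - L u^{(k)} = L e_k$
• $u^{(k)} = D^{-1} f + R_J\, u^{(k-1)}$
$e_k = u - u^{(k)}$
$= (D^{-1} f + R_J\, u) - (D^{-1} f + R_J\, u^{(k-1)})$
$= R_J\, (u - u^{(k-1)})$
$= R_J\, e_{k-1}$
$= R_J^k\, e_0$
• $e_k$ depends only on $R_J$ and the initial error $e_0$ (with $r_0 = L e_0$)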
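The error recurrence can be checked numerically. The sketch below runs the iteration $u^{(k)} = D^{-1} f + R_J u^{(k-1)}$ on a small 1D Laplacian and compares the resulting error with $R_J^k e_0$; the matrix size, random seed and variable names are arbitrary assumptions.

```python
import numpy as np

n = 8
L = np.eye(n, k=1) - 2.0 * np.eye(n) + np.eye(n, k=-1)  # 1D Laplacian, h = 1
D_inv = np.diag(1.0 / np.diag(L))
R_J = np.eye(n) - D_inv @ L          # equals -D^{-1} (L - D)

rng = np.random.default_rng(0)
u_exact = rng.standard_normal(n)
f = L @ u_exact

u = np.zeros(n)                      # u^(0)
e0 = u_exact - u
k = 5
for _ in range(k):
    u = D_inv @ f + R_J @ u          # u^(k) = D^{-1} f + R_J u^(k-1)

e_k = u_exact - u
predicted = np.linalg.matrix_power(R_J, k) @ e0   # R_J^k e_0
residual = f - L @ u                              # r_k, which should equal L e_k
```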
Limits of Gauss-Seidel
$e_k = R_J^k\, e_0$
• Gauss-Seidel relaxation smooths the error quickly (high-frequency components) but converges slowly on low-frequency components
• Why use coarse grids? [Briggs et al. 00]
– Coarse grids can be used to compute an improved initial guess for the fine-grid relaxation
– Relaxation on the coarse grid is much cheaper
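A small experiment illustrates the smoothing behavior: Gauss-Seidel sweeps are applied to the 1D error equation L e = 0, starting once from a low-frequency and once from a high-frequency sine mode. The grid size, the two mode numbers and the number of sweeps are assumptions chosen for illustration.

```python
import numpy as np

def gs_error_sweep(e):
    """Gauss-Seidel sweep on the 1D error equation L e = 0 (zero boundary values)."""
    e = e.copy()
    n = len(e)
    for i in range(n):
        left = e[i - 1] if i > 0 else 0.0
        right = e[i + 1] if i < n - 1 else 0.0
        e[i] = 0.5 * (left + right)
    return e

n = 63
x = np.arange(1, n + 1) / (n + 1)
ratios = {}
for name, freq in (("low", 1), ("high", 48)):
    e = np.sin(freq * np.pi * x)     # initial error: a single sine mode
    start = np.linalg.norm(e)
    for _ in range(20):
        e = gs_error_sweep(e)
    ratios[name] = np.linalg.norm(e) / start   # fraction of the error that remains
```

The high-frequency mode is damped far more strongly than the low-frequency one, which barely changes. On a coarser grid the same low-frequency error is more oscillatory relative to the grid spacing, so relaxation there becomes effective again.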
coarse-grid correction
1. Relax k times on $L_h u_h = f_h$ on the fine grid, starting from an arbitrary $u^{(0)}$: $u^{(k)} = D^{-1} f + R_J\, u^{(k-1)}$
2. Compute the residual on the fine grid: $r_h = f_h - L_h u_h^{(k)}$
3. Restrict the residual to the coarse grid: $r_H = R\, r_h$, where $H = 2h$ and R is the restriction operator
4. Compute the error on the coarse grid: $L_H e_H = r_H$, where $L_H = R\, L_h P$
5. Prolongate (interpolate) the error to the fine grid: $e_h = P\, e_H$, where P is the prolongation operator
6. Correct the fine-grid solution: $u_h^{(k+1)} = u_h^{(k)} + e_h$
[Figure: steps 1-6 shown on the fine grid and the coarse grid]
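The six steps can be sketched for a 1D model problem as follows. This is an illustrative two-grid example, not the solver from the talk: the linear-interpolation prolongation, the full-weighting restriction R = P^T / 2, the dense matrices and all function names are assumptions.

```python
import numpy as np

def laplacian_1d(n, h):
    """1D second-difference operator with zero boundary values."""
    return (np.eye(n, k=1) - 2.0 * np.eye(n) + np.eye(n, k=-1)) / h**2

def prolongation_1d(nc):
    """Linear interpolation from nc coarse points to 2*nc + 1 fine points."""
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        P[2 * j:2 * j + 3, j] = (0.5, 1.0, 0.5)
    return P

def gauss_seidel(L, f, u, sweeps):
    u = u.copy()
    for _ in range(sweeps):
        for i in range(len(u)):
            u[i] += (f[i] - L[i] @ u) / L[i, i]
    return u

nc = 31
n = 2 * nc + 1
h = 1.0 / (n + 1)
Lh = laplacian_1d(n, h)
P = prolongation_1d(nc)
R = 0.5 * P.T                        # full-weighting restriction
LH = R @ Lh @ P                      # coarse operator L_H = R L_h P

u_true = np.sin(np.pi * h * np.arange(1, n + 1))
fh = Lh @ u_true

u = gauss_seidel(Lh, fh, np.zeros(n), sweeps=3)   # step 1: relax
err_relaxed = np.linalg.norm(u_true - u)
rh = fh - Lh @ u                                  # step 2: residual
rH = R @ rh                                       # step 3: restrict
eH = np.linalg.solve(LH, rH)                      # step 4: coarse solve
u = u + P @ eH                                    # steps 5-6: prolongate and correct
err_corrected = np.linalg.norm(u_true - u)
```

Relaxation alone leaves the smooth error almost untouched, while the coarse-grid correction removes most of it in a single step.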
Restriction & Prolongation
Restriction operator
• Fine grid → coarse grid
• Typically uses local weighted averaging
Prolongation operator
• Coarse grid → fine grid
• Typically uses bilinear interpolation
[Figures: restriction (1D) and prolongation (1D)]
$R = P^T$ (up to a constant scale factor)
2D stencils of the multigrid
2D Restriction:
1/16  1/8  1/16
1/8   1/4  1/8
1/16  1/8  1/16
2D Prolongation:
1/4  1/2  1/4
1/2  1    1/2
1/4  1/2  1/4
2D Relaxation (Laplacian):
0   1  0
1  -4  1
0   1  0
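The 2D restriction and prolongation stencils are separable: each is the outer product of a 1D stencil with itself. A minimal NumPy sketch, where the variable names are assumptions:

```python
import numpy as np

p1 = np.array([0.5, 1.0, 0.5])   # 1D prolongation (linear interpolation) weights
r1 = 0.5 * p1                    # 1D restriction (full-weighting) weights

P2 = np.outer(p1, p1)            # 2D prolongation stencil: 1/4, 1/2, 1 entries
R2 = np.outer(r1, r1)            # 2D restriction stencil: 1/16, 1/8, 1/4 entries
```

The restriction weights sum to 1, so restriction is a weighted average, and R2 equals P2 / 4.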
Standard multigrid V-cycle
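[Figure: V-cycle diagram]
Restriction (descending): $f^{l-1} = R_l^{l-1}\, r^l$
Prolongation (ascending): $u^l = P_{l-1}^{l}\, u_P^{l-1} + u_R^l$
Base solution (coarsest level): $L^{l_{min}}\, u^{l_{min}} = f^{l_{min}}$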
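A recursive V-cycle for the 1D model problem can be sketched as below. It is a simplified illustration under stated assumptions: the operator is rediscretized on every level, relaxation is plain Gauss-Seidel with 3 sweeps before and after the coarse correction, and all names are hypothetical. It is not the streaming solver described in the talk.

```python
import numpy as np

def laplacian_1d(n):
    h = 1.0 / (n + 1)
    return (np.eye(n, k=1) - 2.0 * np.eye(n) + np.eye(n, k=-1)) / h**2

def prolongation_1d(nc):
    """Linear interpolation from nc coarse points to 2*nc + 1 fine points."""
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        P[2 * j:2 * j + 3, j] = (0.5, 1.0, 0.5)
    return P

def relax(L, f, u, sweeps):
    """In-place Gauss-Seidel relaxation."""
    for _ in range(sweeps):
        for i in range(len(u)):
            u[i] += (f[i] - L[i] @ u) / L[i, i]
    return u

def v_cycle(f, u, sweeps=3, n_min=3):
    n = len(f)
    L = laplacian_1d(n)
    if n <= n_min:
        return np.linalg.solve(L, f)          # base solution on the coarsest level
    u = relax(L, f, u, sweeps)                # pre-relaxation
    nc = (n - 1) // 2
    P = prolongation_1d(nc)
    R = 0.5 * P.T
    f_coarse = R @ (f - L @ u)                # restrict the residual
    e_coarse = v_cycle(f_coarse, np.zeros(nc), sweeps, n_min)
    u = u + P @ e_coarse                      # prolongate and correct
    return relax(L, f, u, sweeps)             # post-relaxation

n = 127                                       # 2^7 - 1, so every level halves cleanly
x = np.arange(1, n + 1) / (n + 1)
u_true = np.sin(np.pi * x) + 0.3 * np.sin(9 * np.pi * x)
f = laplacian_1d(n) @ u_true

u = np.zeros(n)
errors = []
for _ in range(3):
    u = v_cycle(f, u)
    errors.append(np.max(np.abs(u_true - u)))
```

Each V-cycle reduces the maximum error by a large factor, starting from a zero initial guess.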
Representing the Poisson equation with 1D B-spline basis
• Solving the Poisson eq. reduces to solving Lu = f
– L is the N x N matrix with $L_{i,j} = \langle \Delta B_i(x), B_j(x) \rangle = \langle \nabla B_i(x), -\nabla B_j(x) \rangle$
– f is the vector with $f_j = \langle F(x), B_j(x) \rangle$
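The entries $L_{i,j} = \langle \nabla B_i, -\nabla B_j \rangle$ can be evaluated numerically. The sketch below does this in 1D with unit spacing, assuming that "2nd order" refers to the quadratic B-spline, whose derivative is the difference of two shifted linear (hat) B-splines. The quadrature grid and the function names are assumptions.

```python
import numpy as np

def hat(x):
    """First-order (linear) B-spline centred at 0 with support [-1, 1]."""
    return np.maximum(0.0, 1.0 - np.abs(x))

def dB2(x):
    """Derivative of the quadratic B-spline with support [-1.5, 1.5]."""
    return hat(x + 0.5) - hat(x - 0.5)

x = np.linspace(-4.0, 4.0, 80001)
dx = x[1] - x[0]

# L_{i,j} depends only on the offset s = j - i, so one row of L is a 5-tap stencil.
stencil = np.array([-np.sum(dB2(x) * dB2(x - s)) * dx for s in range(-2, 3)])
```

Under these assumptions the 1D Laplacian stencil comes out as (1/6, 1/3, -1, 1/3, 1/6), which is wider than the 3-tap finite-difference stencil (1, -2, 1) and also sums to zero.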
Fitting forward-difference gradient constraints
• Gradient t of unknown image v
• Difference of B-splines
2nd-order B-spline: 1D case
2nd-order B-spline: 2D case
2D stencils of the multigrid operators using B-splines
• Prolongation
• Restriction
• Laplacian
How to pipeline?
• Phases of each row
– $A_l$ R $A_{l-1}$ R $A_{l-2}$ P $A_{l-1}$ P $A_l$
– Does the neighborhood needed by the Laplacian stencil exist?
Gauss-Seidel Relaxation (A), Restriction (R), Prolongation (P)
[Figure: pipeline schedule, row vs. time. Rows $R_{n-1}$, $R_n$ and $R_{n+1}$ each pass through the phases $A_l$, R, $A_{l-1}$, R, $A_{l-2}$, P, $A_{l-1}$, P, $A_l$, each row starting later than the row before it]
[Figure: rows $R_{n-1}$, $R_n$ and $R_{n+1}$ each relaxed three times ($A_l$, $A_l$, $A_l$), row vs. time]
Perform the 3 relaxations (A) as a single streaming operation
Temporally blocked relaxation
• Maintain a window of rows [i-1, i+2k+1] to perform a skipping, counter-current relaxation sweep, updating pixels in rows {i+2k-1, i+2k-3, …, i+1}
• Row 4 {11}
– Relaxed twice: row 3 {02, 06}, row 2 {03, 08}
– Relaxed once: row 5 {7}, row {10}
[Figure: rows 1 to 6. Legend: "the pixel row is memory-resident"; a number i in a row marks the i-th relaxation performed globally]
Perform k = 3 relaxations (A) as a single streaming operation
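The idea of folding k relaxation sweeps into one pass over the rows can be sketched as follows. This illustration uses the 5-point finite-difference stencil rather than the B-spline stencils, keeps the whole image in memory for simplicity, and uses its own row indexing (at step i, sweep s touches row i - 2s); all function names are assumptions. It checks that the single streamed pass reproduces k separate full sweeps.

```python
import numpy as np

def relax_row(u, f, r):
    """In-place Gauss-Seidel update of interior row r (5-point stencil, h = 1)."""
    for c in range(1, u.shape[1] - 1):
        u[r, c] = 0.25 * (u[r - 1, c] + u[r + 1, c]
                          + u[r, c - 1] + u[r, c + 1] - f[r, c])

def relax_k_passes(u, f, k):
    """Reference: k full sweeps, each one a separate pass over all rows."""
    for _ in range(k):
        for r in range(1, u.shape[0] - 1):
            relax_row(u, f, r)

def relax_streamed(u, f, k):
    """Single pass: at step i, sweep s (0-based) updates row i - 2*s."""
    last = u.shape[0] - 2
    for i in range(1, last + 2 * (k - 1) + 1):
        for s in range(k):
            r = i - 2 * s
            if 1 <= r <= last:
                relax_row(u, f, r)

rng = np.random.default_rng(1)
f = rng.standard_normal((12, 9))
u_ref = np.zeros((12, 9))
u_str = np.zeros((12, 9))
relax_k_passes(u_ref, f, k=3)
relax_streamed(u_str, f, k=3)
```

At each step only rows i - 2(k - 1) - 1 through i + 1 are read or written, so a streaming implementation would need to keep just that band of rows resident. The stride of 2 between simultaneously updated rows mirrors the skipping pattern {i+2k-1, i+2k-3, …, i+1} quoted on the slide.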
Full data pipeline for gradient-domain processing
Memory analysis
• Implement the active windows on $u_R^l$ and $f^l$ as circular memory buffers of image rows
• Window size (w, h)
– w: the image width at the coarser level
– h: 2k+3 (restriction), 2k+5 (prolongation)
• Memory usage is O(Nx) for an Nx x Ny image
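A circular buffer of rows can be sketched as below. The class name `RowWindow`, its methods and the toy 100-row stream are hypothetical; the point is only that memory stays proportional to the window height times the image width, independent of the number of rows streamed.

```python
import numpy as np

class RowWindow:
    """Fixed-height window of image rows stored in a circular buffer."""

    def __init__(self, height, width):
        self.buf = np.zeros((height, width))
        self.height = height
        self.next_row = 0            # index of the next image row to be pushed

    def push(self, row):
        """Store the next image row, overwriting the oldest resident row."""
        self.buf[self.next_row % self.height] = row
        self.next_row += 1

    def get(self, r):
        """Return image row r if it is still resident in the window."""
        if r < 0 or not (self.next_row - self.height <= r < self.next_row):
            raise IndexError(f"row {r} is not resident")
        return self.buf[r % self.height]

k = 3
window = RowWindow(height=2 * k + 3, width=16)   # 2k+3 rows, as quoted for restriction
for r in range(100):                              # stream a 100-row image
    window.push(np.full(16, float(r)))
```

After the loop only rows 91 to 99 are resident; asking for row 90 raises `IndexError`.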
Parameters (n,k,v)
• n: order of the finite element (n-th order B-spline)
• k: number of Gauss-Seidel relaxations
• v: number of V-cycle passes
2nd-order B-spline basis (2, 3, 2)
(n, v)
Plot of the rms and max errors vs. # of multigrid V-cycles
2nd-order elements give the fastest convergence!
(n, k=2, v=1)
Parameter selection
• Sufficiently accurate solution with a minimum number of V-cycles
– 8-bit channel image processing
– Max error < 1/256
(2,5,1): 2nd-order B-spline basis, 5 Gauss-Seidel updates with a single V-cycle in all our applications
Implementation
• Maximizing disk throughput
– Larger block (4 MB) transfers to minimize disk latency
– 2X speed improvement
• Storing the intermediate u, f floating-point values on disk at half precision
• Relaxation optimization
– Leverage the vertical and horizontal symmetries of the stencil
– Make use of CPU SSE2 4-vector instructions
• Non-power-of-two image
– Pad the input image to $2^{l_{max}-l_{min}}$ times the coarsest level
• Multi-channel image
– Stitching requires full-color gradients
– Interleave the per-channel solutions to reduce the total # of passes
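The padding rule for non-power-of-two images can be sketched as a small helper. The function name `padded_size` and the level range used in the example are hypothetical; the slides do not state which levels were used.

```python
def padded_size(n, l_min, l_max):
    """Smallest size >= n that is 2**(l_max - l_min) times an integer coarsest size."""
    factor = 2 ** (l_max - l_min)
    coarsest = -(-n // factor)       # ceiling division
    return coarsest * factor

# Example: a 19,588-pixel-wide image with a hypothetical range of 5 levels.
width = padded_size(19588, l_min=0, l_max=5)
```

With these assumed levels the width is padded from 19,588 to 19,616 = 613 x 32, so every restriction step halves the width exactly.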
Environment
• NB
– 2.2 GHz Core 2 Duo processor
– 4 GB RAM
• Timing
– I/O to read the gradient field from disk
– Write JPEG-compressed output to disk
• Initial solution
– u = 0 in all experiments
• (2,5,1): 2nd-order B-spline basis, 5 Gauss-Seidel updates with a single V-cycle
Image stitching
19,588 x 4,457 (87 MB) panorama from 9 photos
Copy the image gradients and solve the Poisson eq.
Image stitching
SM: streaming multigrid solver
QT: quadtree solver of Agarwala [07]
Tone-mapping (HDR → normal tone)
before after
Tone-mapping (HDR → normal tone)
Streaming multigrid is the first to solve the Poisson eq. in time that is linear in the # of pixels
GB stitching and tone-mapping
May not capture the true scene contrast
GB stitching and tone-mapping
Conclusion
• Streaming multigrid
– Out-of-core technique for solving large global linear systems
  • Local access
  • A few passes of sequential I/O
– 2nd-order B-spline finite element formulation is compatible with traditional multigrid
  • Efficient, accurate solution in a single V-cycle
Future work
• Dirichlet boundary condition
– Modify 2D stencils
– Construct f
• Soft constraint to match some original image u0
• Weighted minimization, where w(x,y) is a spatially varying 2x2 diagonal matrix that weights differences in x and y independently
• Bilaplacian
Future work
• Reduce disk bandwidth by using compression/decompression of the streamed temporary data
• Parallelization
– Many-core CPUs or GPUs, for instance by partitioning image rows