Searching natural environments, as for example, when foraging or looking for a landmark, combines reasoning under uncertainty, planning and visual search. Existing paradigms for studying search in humans focus on step-by-step information sampling, without examining advance planning. We propose and evaluate a Bayesian model of how people search in a naturalistic maze-solving task. The model encodes environment exploration as a sequential process of acquiring information modeled by a Partially Observable Markov Decision Process (POMDP), which maximizes the information gained. We show that the search policy averaged across participants is optimal. Individual solutions, however, are highly variable and can be explained by two heuristics: thinking and guessing. Self-report and inference, a Gaussian Mixture Model over inverse POMDP, consistently assign most subjects to one style or the other. By analyzing individual participants' decision times we show that individuals solve partial POMDPs and plan their search a limited number of steps in advance.