Our goal is to strengthen the foundations of metaproteomics as a microbial community analysis tool that links the functional identity of actively expressed gene products with host phylogeny. We used shotgun metaproteomics to survey waters in six disparate aquatic habitats (Cayuga Lake, NY; Oneida Lake, NY; Gulf of Maine; Chesapeake Bay, MD; Gulf of Mexico; and the South Pacific). Peptide pools prepared from filter-gathered microbial biomass were analyzed by nano-liquid chromatography-tandem mass spectrometry (MS/MS), generating 9,693 ± 1,073 mass spectra and identifying 326 ± 107 bacterial proteins per sample. The distribution of proteobacterial (Alpha and Beta) and cyanobacterial (Prochlorococcus and Synechococcus spp.) protein hosts across all six samples was consistent with the previously published biogeography of these microorganisms. Marine samples were enriched in transport proteins (TRAP-type for dicarboxylates and ATP-binding cassette (ABC)-type for amino acids and carbohydrates) compared with the freshwater samples. We were able to match the in situ expression of many key proteins catalyzing C-, N-, and S-cycle processes with their bacterial hosts across all six habitats. Pelagibacter was identified as the host of ABC-type sugar-, organic polyanion-, and glycine betaine-transport proteins; this extends previously published studies of Pelagibacter's in situ biogeochemical role in marine C- and N-metabolism. Proteins matched to Ruegeria confirmed this organism's role in oxidizing both carbon monoxide and sulfide in marine waters. By documenting both the processes expressed in situ and the identity of the host cells, metaproteomics tested several existing hypotheses about ecophysiological processes and provided a basis for new ones.