Flood inundation mapping (FIM) can help protect human lives and reduce property damage by providing timely damage assessments for efficient planning of relief work. At the current stage, FIM models rely heavily on temporal and spatial data received from in-situ gauging and remote sensing to estimate flood extent, magnitude, and risk. However, these sources of flood information have serious shortcomings. This study introduces a vision-based framework for measuring water levels in rivers using computer vision and deep learning techniques. Time-lapse images captured by terrestrial and surveillance cameras during flood events are analyzed to extract numerical information about the inundated areas. For this purpose, a 3D point cloud of the region of interest was first constructed using an iPhone LiDAR sensor. In the second step, water was detected and segmented by a deep learning model in the images captured by surveillance cameras during floods. Finally, using a projection matrix, the 3D point cloud was projected onto the 2D image plane. The resulting 2D pixel coordinates were intersected with the water lines detected by the deep learning model to estimate water depth along the banklines of the river. Four deep-learning-based models, including Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), were trained and tested in this study. SegFormer-B5 outperformed the other models, achieving 99.55% IoU on the captured images. During the August 19, 2022 deployment at Rocky Branch Creek in Columbia, South Carolina, the proposed framework achieved an R-squared of 0.9071, a Nash-Sutcliffe Efficiency of 0.9053, a Mean Squared Error of 0.0009, and a Percent Bias of 1.0185 for the right bankline, comparing the ground-truth data collected by an ultrasonic sensor with the water levels estimated from the camera. Such a vision-based framework can substitute for conventional in-situ methods and inform the decision-making process by providing faster and more accurate combinations of numerical and visual information about floods.
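
The abstract does not detail how the LiDAR point cloud is mapped onto the camera images; the sketch below is only an illustration of the standard pinhole-camera projection that such a step could use, assuming known intrinsics `K` and a world-to-camera pose `(R, t)`. The function name and arguments are hypothetical, not the authors' implementation.

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Project 3D world points onto the 2D image plane using a
    pinhole-camera projection matrix P = K [R | t].

    points_3d : (N, 3) array of LiDAR point-cloud coordinates (world frame)
    K         : (3, 3) camera intrinsic matrix
    R, t      : (3, 3) rotation and (3,) translation of the world-to-camera pose
    Returns   : (N, 2) array of pixel coordinates (u, v)
    """
    # Build the 3x4 projection matrix
    P = K @ np.hstack([R, t.reshape(3, 1)])

    # Convert the points to homogeneous coordinates and project
    ones = np.ones((points_3d.shape[0], 1))
    pts_h = np.hstack([points_3d, ones])   # (N, 4)
    proj = (P @ pts_h.T).T                 # (N, 3)

    # Normalize by the depth (third) component to obtain pixel coordinates
    return proj[:, :2] / proj[:, 2:3]
```

In a workflow like the one described, the projected bankline pixels would then be intersected with the segmented water mask to read off the water level at each bank.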
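
For reference, the reported goodness-of-fit statistics (R-squared, NSE, MSE, and Percent Bias) follow standard definitions. The snippet below is a minimal sketch of those formulas, assuming the ultrasonic-sensor record as the observed series and the camera-derived levels as the simulated series; note that the sign convention for Percent Bias varies across references.

```python
import numpy as np

def evaluation_metrics(observed, simulated):
    """Common goodness-of-fit metrics between observed water levels
    (e.g., from an ultrasonic sensor) and model/camera estimates."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    residuals = obs - sim

    mse = np.mean(residuals ** 2)
    # Nash-Sutcliffe Efficiency: 1 means a perfect match
    nse = 1.0 - np.sum(residuals ** 2) / np.sum((obs - obs.mean()) ** 2)
    # R-squared as the squared Pearson correlation between the two series
    r2 = np.corrcoef(obs, sim)[0, 1] ** 2
    # Percent bias: average tendency of the estimates to over- or under-predict
    pbias = 100.0 * np.sum(sim - obs) / np.sum(obs)
    return {"R2": r2, "NSE": nse, "MSE": mse, "PBIAS": pbias}
```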