Current trends towards edge computing on smart portable devices require ultra-low-power circuits capable of performing feature extraction and pattern classification tasks. This manuscript proposes a novel approach to feature extraction for speech recognition and voice activity detection tasks, suitable for portable devices. Whereas conventional approaches are based on either fully analog or fully digital structures, we propose a "hybrid" approach based on voltage-controlled oscillators (VCOs). Our proposal uses a bank of band-pass filters implemented with ring oscillators to extract the features (the energy within different frequency bands) of input audio signals and to digitize them. These data are then fed into a digital classification stage, such as a neural network. Ring oscillators are inherently digital structures, which makes them highly scalable and allows them to be designed with minimum-length devices. Additionally, owing to their inherent phase integration, low-frequency band-pass filters can be implemented without large capacitors. Consequently, the approach yields significant savings in both power consumption and area. Finally, our proposal can incorporate the analog-to-digital converter into the structure of the feature extractor itself, so that the full conversion of the raw data is performed when triggered. This constitutes a unique advantage with respect to other approaches. The architecture is described and proposed at system level, and behavioral simulations are used to verify that its performance meets expectations. The structure is then designed in a 65-nm CMOS process to estimate the power consumption and area of a silicon implementation. The results show that our solution is very promising in terms of occupied area, with power consumption competitive with other state-of-the-art solutions.
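The phase-integration property mentioned above can be illustrated with a minimal, idealized behavioral sketch. It is not the authors' filter design: it assumes a generic linear VCO whose instantaneous frequency is `f0 + kvco * v`, integrates that frequency into phase (in cycles) with a forward-Euler step at sample rate `fs`, and digitizes each frame by counting completed oscillator cycles. The function name `vco_counts` and all parameter values are hypothetical. Because the residual (fractional) phase is carried from one frame to the next, the quantization error of one frame is compensated in the following one, which is the first-order noise-shaping behavior commonly attributed to VCO-based quantizers.

```python
import math

def vco_counts(samples, fs, f0, kvco, frame_len):
    """Idealized VCO-based integrating quantizer (hypothetical model).

    samples   -- input voltages, one per sampling instant
    fs        -- simulation sample rate in Hz
    f0        -- assumed free-running VCO frequency in Hz
    kvco      -- assumed VCO gain in Hz per volt
    frame_len -- number of samples per counting frame

    Returns the number of oscillator cycles completed in each full frame.
    """
    counts = []
    phase = 0.0      # accumulated phase in cycles, never reset between frames
    prev_edges = 0   # cycles completed up to the end of the previous frame
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        for v in samples[start:start + frame_len]:
            # Voltage-to-frequency conversion, then phase integrates frequency.
            phase += (f0 + kvco * v) / fs
        edges = math.floor(phase)           # digital output: whole cycles only
        counts.append(edges - prev_edges)   # cycles counted in this frame
        prev_edges = edges                  # fractional phase carries over
    return counts
```

For example, a constant 0.5 V input with `f0 = 100.0`, `kvco = 50.0`, `fs = 1000.0` and 100-sample frames corresponds to 12.5 cycles per frame; the per-frame counts alternate between 12 and 13, so their running average tracks the exact value. Subtracting the free-running baseline `f0 * frame_len / fs` from a count gives, up to quantization error, `kvco` times the time integral of the input over that frame.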