algorithmic information theory, data compression, document representation, information filtering, word removal, information theory, document segmentation, substring, information retrieval systems