For my PhD project, I am looking for an audio database that contains speech (preferably English) aligned with its transcription at the syllable or word level. Ideally, the database contains full sentences, but isolated words or single syllables could also work. For example, a recording of the sentence 'how are you?' would come with a transcription that marks when 'how', 'are' and 'you' are each spoken in the recording.
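To make the target format concrete, here is a minimal sketch of the kind of word-level alignment I have in mind. The timings are made up, and `WordAlignment` and `words_at` are hypothetical names used only for illustration, not part of any existing corpus or tool.

```python
from dataclasses import dataclass


@dataclass
class WordAlignment:
    """One word of the transcription with its time span in the recording."""
    word: str
    start: float  # onset in seconds from the beginning of the recording
    end: float    # offset in seconds


# Hypothetical timings for a recording of 'how are you?'
alignment = [
    WordAlignment("how", 0.00, 0.21),
    WordAlignment("are", 0.21, 0.38),
    WordAlignment("you", 0.38, 0.70),
]


def words_at(alignment, t):
    """Return the words whose interval [start, end) contains time t."""
    return [w.word for w in alignment if w.start <= t < w.end]


print(words_at(alignment, 0.30))  # ['are']
```

Any format that carries the same information (a word or syllable label plus its start and end time) would be fine.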
Does anyone know of such a (preferably freely available) database? And if not, what tools would you recommend for (semi-)automatically splitting data that is annotated only at the sentence level?