strange Dos metacharacters
I have a text file of .srt subtitles downloaded in DOS format,
and want to convert it to Unix text style. I have no problem
removing the ^M characters with sed (using Ctrl-V Ctrl-M).
But every other character in the file is this "^@":
^@m^@a^@^@p^@i^@c^@c^@o^@l^@o^@^@t^@a^@g^@l^@i^@o^@.^@^@
I can't seem to produce this control sequence on the keyboard.
Does anyone know what it is?
Cheers,
Simon
--
Spectral Horse Poems
ww.spectralhorse.com
Coins in the Void
Re: strange Dos metacharacters
* simonp [at] nospam.com <simonp [at] nospam.com>:
> I have a text file of .srt subtitles downloaded in DOS format,
> and want to convert it to Unix text style. I have no problem
> removing the ^M characters with sed (using Ctrl-V Ctrl-M).
>
> But every other character in the file is this "^@":
>
> ^@m^@a^@^@p^@i^@c^@c^@o^@l^@o^@^@t^@a^@g^@l^@i^@o^@.^@^@
>
> I can't seem to produce this control sequence on the keyboard.
> Does anyone know what it is?
>
> Cheers,
> Simon
Looks as if you have a UTF-16 encoded file as opposed to ASCII, Latin1,
or UTF-8. You'll need something like iconv to convert it.
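
For reference, a conversion along those lines might look like the sketch below. It assumes the file starts with a byte-order mark (BOM), so that iconv can work out the byte order by itself; without one you would probably have to name UTF-16LE or UTF-16BE explicitly. The sample file and its name are made up for the demonstration.

```shell
# Scratch directory and file names are hypothetical, for illustration only.
tmp=$(mktemp -d)

# Build a tiny UTF-16LE sample: BOM (FF FE), then "ma" and a DOS line ending,
# with every character stored as two bytes.
printf '\377\376m\000a\000\r\000\n\000' > "$tmp/sample.srt"

# Decode UTF-16 to UTF-8, then delete the carriage returns left over
# from the DOS line endings.
iconv -f UTF-16 -t UTF-8 "$tmp/sample.srt" | tr -d '\r' > "$tmp/sample.unix.srt"

# cat -v should now show plain text with no ^@ or ^M.
cat -v "$tmp/sample.unix.srt"
```

Doing the decoding first and the line-ending cleanup second matters: in UTF-16 the carriage return is itself two bytes, so stripping ^M from the raw file leaves a stray NUL behind.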
--
James Michael Fultz <xyzzy [at] sent.as.invalid>
Remove this part when replying ^^^^^^^^
Re: strange Dos metacharacters
On my Slackware xterm, running bash:
$ for i in $(seq 0 255)
> do printf "$i \x`printf %x $i`\n"
> done|cat -v|grep @
0 ^@
64 @
128 M-^@
192 M-@
$
$ echo -e "\x00"|cat -v
^@
$
$ echo -e "\000"|cat -v
^@
$
$ echo -e "m\000a\rc"|cat -v
m^@a^Mc
$ echo -e "m\000a\rc"|tr -d '\000\r'|cat -v
mac
$
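
One caveat with deleting the NUL bytes by hand: it only gives sensible text when every character in the file is plain ASCII, and it leaves the UTF-16 byte-order mark at the start of the file untouched. The sketch below, using a made-up two-letter sample, shows the leftover BOM bytes.

```shell
# UTF-16LE sample: BOM (FF FE), then "ma" and a CRLF, two bytes per character.
# Strip NULs and carriage returns, then dump what is left as hex.
stripped=$(printf '\377\376m\000a\000\r\000\n\000' \
    | tr -d '\000\r' \
    | od -An -tx1 \
    | tr -d ' \n')

# The leading "fffe" is the BOM, which tr did not remove.
echo "$stripped"    # prints fffe6d610a
```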
Re: strange Dos metacharacters
James Michael Fultz <xyzzy [at] sent.as.invalid> wrote:
> * simonp [at] nospam.com <simonp [at] nospam.com>:
>> I have a text file of .srt subtitles downloaded in DOS format,
>> and want to convert it to Unix text style. I have no problem
>> removing the ^M characters with sed (using Ctrl-V Ctrl-M).
>>
>> But every other character in the file is this "^@":
>>
>> ^@m^@a^@^@p^@i^@c^@c^@o^@l^@o^@^@t^@a^@g^@l^@i^@o^@.^@^@
>>
>> I can't seem to produce this control sequence on the keyboard.
>> Does anyone know what it is?
>>
>> Cheers,
>> Simon
>
> Looks as if you have a UTF-16 encoded file as opposed to ASCII, Latin1,
> or UTF-8. You'll need something like iconv to convert it.
>
Thanks for the tip, that was exactly the problem.
The _file_ utility (which I just discovered) identified it as
UTF-16, and iconv converted easily to ASCII.
(Turns out the subtitles are in Italian though.)
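
In case it helps anyone else with Italian subtitles: accented letters such as "è" have no ASCII equivalent, so a UTF-8 target is probably the safer choice. The sketch below is only an illustration with a made-up one-character file; the exact wording printed by file, and whether the ASCII conversion fails or substitutes a placeholder, may differ between systems.

```shell
# Scratch directory and file names are hypothetical.
tmp=$(mktemp -d)

# UTF-16LE sample: BOM, then "è" (U+00E8, bytes E8 00) and a newline.
printf '\377\376\350\000\n\000' > "$tmp/it.srt"

# If the file utility is installed, it should report some form of UTF-16 text.
command -v file >/dev/null && file "$tmp/it.srt"

# ASCII cannot represent "è", so this conversion is expected to fail
# on many systems; report the outcome instead of assuming it.
if iconv -f UTF-16 -t ASCII "$tmp/it.srt" > "$tmp/it.ascii" 2>/dev/null; then
    echo "ASCII conversion reported success"
else
    echo "ASCII conversion failed"
fi

# UTF-8 can hold every character, accented or not.
iconv -f UTF-16 -t UTF-8 "$tmp/it.srt" > "$tmp/it.utf8"

# Hex dump of the result: "è" becomes the two bytes c3 a8.
utf8_hex=$(od -An -tx1 "$tmp/it.utf8" | tr -d ' \n')
echo "$utf8_hex"
```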
Cheers,
Simon