strange Dos metacharacters

I have a text file of .srt subtitles downloaded in DOS format,
and want to convert it to Unix text style. I have no problems
removing the ^M character with sed (using ctrl-V ctrl-M).

But every other character in the file is this "^@":

^@m^@a^@^@p^@i^@c^@c^@o^@l^@o^@^@t^@a^@g^@l^@i^@o^@.^@^@

I can't seem to produce this control sequence on the keyboard.
Does anyone know what it is?

Cheers,
Simon

--
Spectral Horse Poems
ww.spectralhorse.com
Coins in the Void
simonp [ Thu, 03 April 2008 04:28 ] [ ID #1934424 ]

Re: strange Dos metacharacters

* simonp [at] nospam.com <simonp [at] nospam.com>:
> I have a text file of .srt subtitles downloaded in DOS format,
> and want to convert it to Unix text style. I have no problems
> removing the ^M character with sed (using ctrl-V ctrl-M).
>
> But every other character in the file is this "^@":
>
> ^@m^@a^@^@p^@i^@c^@c^@o^@l^@o^@^@t^@a^@g^@l^@i^@o^@.^@^@
>
> I can't seem to produce this control sequence on the keyboard.
> Does anyone know what it is?
>
> Cheers,
> Simon

Looks as if you have a UTF-16 encoded file as opposed to ASCII, Latin1,
or UTF-8. You'll need something like iconv to convert it.
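
A minimal sketch of such a conversion, assuming the file is little-endian UTF-16 without a byte-order mark and that UTF-8 is the desired target. The function name utf16le_to_unix is hypothetical; it only wraps iconv's -f/-t options and tr -d.

```shell
# Hypothetical helper: read UTF-16LE text on stdin, write UTF-8 with
# Unix line endings on stdout.
utf16le_to_unix() {
    # iconv re-encodes the text; tr then drops the DOS carriage returns.
    iconv -f UTF-16LE -t UTF-8 | tr -d '\r'
}

# Demo input: "ma" plus CR LF, each character followed by a NUL byte.
printf 'm\000a\000\r\000\n\000' | utf16le_to_unix | cat -v
```

For a real file this would be run as utf16le_to_unix < input.srt > output.srt, where both file names are placeholders. If the file starts with a byte-order mark, -f UTF-16 should let iconv pick the byte order from it instead.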

--
James Michael Fultz <xyzzy [at] sent.as.invalid>
Remove this part when replying ^^^^^^^^
James Michael Fultz [ Thu, 03 April 2008 05:04 ] [ ID #1934425 ]

Re: strange Dos metacharacters

On my slackware xterm bash:

$ for i in $(seq 0 255)
> do printf "$i \x`printf %x $i`\n"
> done|cat -v|grep @
0 ^@
64 @
128 M-^@
192 M-@
$
$ echo -e "\x00"|cat -v
^@
$
$ echo -e "\000"|cat -v
^@
$
$ echo -e "m\000a\rc"|cat -v
m^@a^Mc
$ echo -e "m\000a\rc"|tr -d '\000\r'|cat -v
mac
$
mop2 [ Thu, 03 April 2008 05:26 ] [ ID #1934427 ]

Re: strange Dos metacharacters

James Michael Fultz <xyzzy [at] sent.as.invalid> wrote:
> * simonp [at] nospam.com <simonp [at] nospam.com>:
>> I have a text file of .srt subtitles downloaded in DOS format,
>> and want to convert it to Unix text style. I have no problems
>> removing the ^M character with sed (using ctrl-V ctrl-M).
>>
>> But every other character in the file is this "^@":
>>
>> ^@m^@a^@^@p^@i^@c^@c^@o^@l^@o^@^@t^@a^@g^@l^@i^@o^@.^@^@
>>
>> I can't seem to produce this control sequence on the keyboard.
>> Does anyone know what it is?
>>
>> Cheers,
>> Simon
>
> Looks as if you have a UTF-16 encoded file as opposed to ASCII, Latin1,
> or UTF-8. You'll need something like iconv to convert it.
>

Thanks for the tip, that was exactly the problem.

The _file_ utility (which I just discovered) identified it as
UTF-16, and iconv converted it easily to ASCII.

(Turns out the subtitles are in Italian though.)
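
Since the subtitles are Italian, accented letters such as "è" matter: they have no ASCII equivalent, so an ASCII target would typically make iconv stop with a conversion error on them, and simply deleting the NUL bytes leaves Latin-1 bytes behind. A small sketch comparing the two approaches, assuming little-endian UTF-16 input; the sample function is hypothetical and only produces test bytes.

```shell
# "è" is U+00E8: bytes E8 00 in UTF-16LE, C3 A8 in UTF-8.
# Octal escapes used below: \350 = 0xE8, \000 = NUL.
sample() { printf '\350\000\r\000\n\000'; }

# Deleting NULs and CRs leaves the lone byte e8 (plus the newline 0a),
# which is Latin-1 and not valid UTF-8.
sample | tr -d '\000\r' | od -An -tx1

# iconv re-encodes the character as the two UTF-8 bytes c3 a8.
sample | iconv -f UTF-16LE -t UTF-8 | tr -d '\r' | od -An -tx1
```

Converting to UTF-8 rather than ASCII keeps every character; for a file with no accented letters at all, both targets give the same bytes.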

Cheers,
Simon

simonp [ Thu, 03 April 2008 06:20 ] [ ID #1934428 ]