Post by Jean-Nicolas JolivetPost by Daniel Fischer-- assuming that only characters > 'i' (chr 105) are escaped (or the
escape character itself, but that should be dropped regardless).
Sorry, this is my fault for not being clearer about it: I am decoding
text that is already encoded with a Binary to ASCII encoding... the
1. Fetch a character from the input stream.
2. Increment the character's ASCII value by 42, modulo 256
3. If the result is a critical character ('\NUL', '\r', '\n' or '='),
write the escape character ('=') to the output stream and increment
character's ASCII value by 64, modulo 256. 4. Output the character to
the output stream.
Okay, so there are no two escape characters in succession.
Post by Jean-Nicolas JolivetI am writing a decoder here so obviously I am reversing the process....
(I remove 42 for regular characters, 106 for special characters...and if
the result is < 0, I add 256 to it...)
Just adding (yet again) more context information here ;) Still trying to
fully understand your last suggestions Daniel! :) Thanks again!
Also, I wouldn't want anyone to think I just want someone to write the
algorithm for me! :) Like I said, I already have a fully working
algorithm, but being new to Haskell, I'm looking for ways to optimize it
(or, really just different ways that I could do it!)
Here's how I am doing it right now (I am using ByteString here,
Then the foldr is not the best option.
Post by Jean-Nicolas Jolivetbut it should be pretty straightforward anyway)...
I'm warning you, this is pretty ugly :)
-- Decode an encoded ByteString
-- the zip + tail method removes the first character so I am adding it afterward (ugly)
decodeByteString :: L.ByteString -> L.ByteString
decodeByteString str = do
let str1 = mapMaybe decodepair (L.zip str(L.tail str))
let firstChar = decodechar (ord(L.head str) - 42)
L.pack (firstChar:str1)
Is it really necessary to pack it? That's a relatively expensive operation,
it may be better to have it return a String if you're not doing anything
with it after decoding except writing it to a file or so.
The zipping is also not the optimal choice, it takes a lot of checking for
the chunk-boundaries (I assume you're using lazy ByteStrings, since you
chose the prefix L), and you construct pairs only to deconstruct them
immediately (the compiler *may* optimise them away, but I'm skeptical).
Post by Jean-Nicolas Jolivet-- Decode a pair of character, returning either the
-- decoded character or Nothing
decodepair :: (Char, Char) -> Maybe Char
decodepair cs
| snd(cs) == '=' = Nothing
| fst(cs) == '=' = Just (decodechar(ord(snd cs) - 106))
| otherwise = Just (decodechar(ord(snd cs) - 42))
-- Reverse the modulo 256...
decodechar :: Int -> Char
decodechar i
| i < 0 = chr (i + 256)
| otherwise = chr i
Since you're doing arithmetic modulo 256, that stuff can be done faster and
simpler with Word8.
----------------------------------------------------------------------
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString as S
import Data.ByteString.Unsafe (unsafeAt)
escape :: Word8 -> Word8
escape = (+150)
normal :: Word8 -> Word8
normal = (+214)
decodeW :: L.ByteString -> [Word8]
decodeW = dec False . L.toChunks
where
dec _ [] = []
dec esc (str:more) = go esc 0
where
!len = S.length str
{-# INLINE charAt #-}
charAt :: Int -> Word8
charAt i = unsafeAt str i
go !b !i
| i == len = dec b more
| b = escape (charAt i) : go False (i+1)
| otherwise = case charAt i of
61 -> go True (i+1)
c -> normal c : go False (i+1)
word8ToChar :: Word8 -> Char
word8ToChar = toEnum . fromIntegral
decodeC :: L.ByteString -> String
decodeC = map word8ToChar . decodeW
decodeBS :: L.ByteString -> L.ByteString
decodeBS = L.pack . decodeW
----------------------------------------------------------------------
Post by Jean-Nicolas JolivetJean-Nicolas Jolivet
Post by Daniel FischerPost by Jean-Nicolas JolivetFirst I would like to thank everyone for the very interesting replies
and suggestions I got so far!...
I tried to implement (and at the very least understand) most of them!...
-apply a "transformation" to a character (in my case, subtracting 42
to its ASCII value, which I obtain with chr(ord(c) - 42) -if the
character is preceded by a specific character (that would be, an
escape character, in this case '=') then subtract 106 to its value
instead of 42... -if the character is the escape character itself,
'=', then skip it altogether
Ah, that complicates matters a little.
- What happens if ord c < 42 (ord c < 106, if c is preceded by the
escape character?)
- What about escaped escape characters?
However,
foo xs = catMaybes $ zipWith f (' ':xs) xs
where
f _ '=' = Nothing
f '=' c = Just (chr $ ord c - 106)
f _ c = Just (chr $ ord c - 42)
is still pretty simple, as is the direct recursion
foo = go ' '
where
go _ ('=' :cs) = go '=' cs
go '=' (c:cs) = chr (ord c - 106) : go c cs
go _ (c:cs) = chr (ord c - 42) : go c cs
go _ _ = []
-- assuming that only characters > 'i' (chr 105) are escaped (or the
escape character itself, but that should be dropped regardless).
fooGen :: Char -> (Char -> Char) -> (Char -> Char) -> String -> String
fooGen e esc norm str = catMaybes $ zipWith f (d:str) str
where
d = if e == maxBound then pred e else succ e
f x y
| y == e = Nothing
| x == e = Just (esc y)
| otherwise = Just (norm y)
is an easy generalisation.
Post by Jean-Nicolas Jolivet(keeping in mind that the next character needs to be
escaped)...
I managed to do it, however I'm not totally satisfied in the way I
did it... the problem was that... as I just explained, in some cases,
the character that is being processed has to be "skipped" (and by
that I mean, not added to the resulting string). This happens when
the processed character IS the escape character...
What I did was to build a List of Maybe Char.... my function does the
proper operation on the character and returns a "Just Char" when the
character is processed, or Nothing when it is the escaped
character... so basically I would end up with something like: [Just
'f', Just 'o', Just 'o', Nothing]... I am mapping this using mapMaybe
to end up with a proper String...
Would there be any more efficient way of doing this?
That is already pretty efficient. The direct recursion is probably a
bit more efficient, but I don't think the difference will be large.
Post by Jean-Nicolas JolivetConsidering that
the escape character should NOT be added to the resulting string, is
there any way I can avoid using the Maybe monad?
Sure, apart from the direct recursion,
fooGen e esc norm str = tail $ foldr f [] (d:str)
where
d = if e == maxBound then pred e else succ e
f x (y:zs)
| y == e = x:zs
| x == e = x:esc y:zs
| otherwise = x:norm y:zs
f x [] = [x]
catMaybes and zipWith is clearer, though, and I don't think the foldr
will perform better.
Post by Jean-Nicolas JolivetOnce again, thanks everyone for all the suggestions!
Jean-Nicolas Jolivet