Everything You Can Do with Python’s textwrap Module | by Martin Heinz | Feb, 2024


Learn about all the things you can do with Python’s textwrap module, including formatting, text wrapping, trimming and more


Python has many options for formatting strings and text, including f-strings, the format() function, templates and more. There is, however, one module that few people know about, and it’s called textwrap.

This module is built specifically to help you with line-wrapping, indentation, trimming and more, and in this article we will look at all the things you can use it for.

Let’s start with a very simple, yet very useful function from the textwrap module, called shorten:

from textwrap import shorten

shorten("This is a long text or sentence.", width=10)
# 'This [...]'
shorten("This is a long text or sentence.", width=15)
# 'This is a [...]'
shorten("This is a long text or sentence.", width=15, placeholder=" <...>")
# 'This is a <...>'

As the name suggests, shorten allows us to trim text to a certain length (width) if the specified string is too long. By default, the placeholder for the trimmed text is [...], but that can be overridden with the placeholder argument.
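A small sketch of two details worth knowing about shorten: it first collapses all runs of whitespace (including newlines) into single spaces, and the placeholder counts towards the width. The sample strings below are made up for illustration.

```python
from textwrap import shorten

# Runs of whitespace (including newlines) are collapsed to single spaces
# before the length check, so text that already fits comes back normalized.
short = shorten("Hello    world,\n  again", width=40)
print(short)  # Hello world, again

# Text that is too long is cut at a word boundary; the placeholder
# " [...]" is included in the 15-character limit.
trimmed = shorten("This is a long text or sentence.", width=15)
print(trimmed)  # This is a [...]
```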

A more interesting function from this module is wrap. The obvious use-case for it is to split long text into lines of the same length, but there are more things we can do with it:

from textwrap import wrap
s = '1234567890'
wrap(s, 3)
# ['123', '456', '789', '0']

In this example we split a string into equal chunks, which can be useful for batch processing rather than just formatting.
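As a quick sanity check, for input with no whitespace and no hyphens (which wrap treats as possible break points), the chunks from wrap match plain slicing in fixed steps. Slicing is a different technique, not part of textwrap, but it can be the simpler choice when you need strictly fixed-size batches.

```python
from textwrap import wrap

s = "1234567890"

# With no whitespace or hyphens, wrap() simply cuts the string into width-sized pieces...
chunks = wrap(s, 3)

# ...which matches slicing the string in steps of 3.
sliced = [s[i:i + 3] for i in range(0, len(s), 3)]
print(chunks == sliced)  # True
```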

Using this function, however, has some caveats:

s = '12\n3  45678\t9\n0'
wrap(s, 3)
# ['12', '3  ', '456', '78', '9 0']
# the first element ("12") "includes" the newline
# the 4th element ("78") "includes" the tab
wrap(s, 3, drop_whitespace=False, tabsize=1)
# ['12 ', '3  ', '456', '78 ', '9 0']

You need to be careful with whitespace when using wrap: above you can see the behaviour with newline, tab and space characters. The first element (12) "includes" the newline and the 4th element (78) "includes" the tab; these are, however, dropped by default, so those elements have only 2 characters instead of 3.

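We can specify the drop_whitespace keyword argument to preserve them and keep the chunks at the proper length (tabsize=1 additionally makes the tab expand to a single space instead of jumping to the next 8-column tab stop).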

It might be obvious, but wrap is also great for reformatting whole files to a certain line width:

from textwrap import wrap, fill

with open("some-text.md", "r", encoding="utf-8") as f:
    content = f.read()  # read once; a second f.read() would return ""

formatted = wrap(content, width=80)  # List of lines
formatted = fill(content, width=80)  # Single string that includes line breaks
# ... write it back

We can also use the fill function, which is shorthand for "\n".join(wrap(text, ...)). The difference between the two is that wrap gives us a list of lines that we would need to concatenate ourselves, while fill gives us a single string that is already joined using newlines.
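A minimal check of that relationship, using an arbitrary sample sentence:

```python
from textwrap import fill, wrap

text = "The quick brown fox jumps over the lazy dog"

lines = wrap(text, width=15)
print(lines)  # ['The quick brown', 'fox jumps over', 'the lazy dog']

# fill() returns the same lines, already joined with newlines
print(fill(text, width=15) == "\n".join(lines))  # True
```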

The textwrap module also includes a more powerful version of the wrap function, the TextWrapper class:

import textwrap

w = textwrap.TextWrapper(width=120, placeholder=" <...>")
for s in list_of_strings:
    w.wrap(s)
    # ...

This class and its wrap method are great if we need to call wrap with the same parameters multiple times, as shown above.
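One detail about the snippet above: placeholder only takes effect when max_lines is also set, because it marks text dropped after the last allowed line. A small sketch of a reusable wrapper that does truncate (the width, line limit and sample texts are arbitrary):

```python
from textwrap import TextWrapper

# One configured wrapper, reused for every string
wrapper = TextWrapper(width=20, max_lines=2, placeholder=" <...>")

texts = [
    "Short line",
    "This sentence is long enough to need more than two lines of output",
]
results = [wrapper.wrap(t) for t in texts]
print(results)
# [['Short line'], ['This sentence is', 'long enough to <...>']]
```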

And while we’re looking at the TextWrapper, let’s also take a look at some more keyword arguments:

from textwrap import TextWrapper

user = "John"
prefix = user + ": "
width = 50
wrapper = TextWrapper(initial_indent=prefix, width=width, subsequent_indent=" " * len(prefix))
messages = ["...", "...", "..."]
for m in messages:
    print(wrapper.fill(m))

# John: Lorem Ipsum is simply dummy text of the
#       printing and typesetting industry. Lorem
# John: Ipsum has been the industry's standard dummy
#       text ever since the 1500s, when an
# John: unknown printer took a galley of type and
#       scrambled it to make a type specimen

Here we can see the use of initial_indent and subsequent_indent for indenting the first line of a paragraph and the subsequent ones, respectively. There are a couple more options, which you can find in the docs.
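As an illustration of one of those other options, break_long_words controls what happens to a word longer than width, such as a URL. The sample string is made up; this is a sketch of the option's effect, not a recommendation for a particular width.

```python
from textwrap import wrap

url_text = "see https://example.com/a/very/long/path"

# Default (break_long_words=True): the URL is chopped into width-sized pieces.
default_lines = wrap(url_text, width=10)

# break_long_words=False keeps the URL intact on its own line, even though
# that line is longer than width.
intact_lines = wrap(url_text, width=10, break_long_words=False)
print(intact_lines)  # ['see', 'https://example.com/a/very/long/path']
```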

Additionally, because TextWrapper is a class, we can also extend it and completely override some of its methods:

from textwrap import TextWrapper

class DocumentWrapper(TextWrapper):

    def wrap(self, text):
        # Wrap each existing line separately so original line breaks survive
        split_text = text.split('\n')
        lines = [line for par in split_text for line in TextWrapper.wrap(self, par)]
        return lines

text = """First line,

Another, much looooooonger line of text and/or sentence"""
d = DocumentWrapper(width=50)
print(d.fill(text))

# First line,
# Another, much looooooonger line of text and/or
# sentence

This is a good example of changing the wrap method to preserve existing line breaks and print them properly.
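Note that the DocumentWrapper output loses the blank line between the two paragraphs: TextWrapper.wrap returns an empty list for an empty string, so empty lines contribute nothing. A minimal variation, assuming you want blank lines kept (ParagraphWrapper is a hypothetical name):

```python
from textwrap import TextWrapper

class ParagraphWrapper(TextWrapper):
    """Hypothetical variant of DocumentWrapper that also keeps blank lines."""

    def wrap(self, text):
        lines = []
        for par in text.split("\n"):
            # TextWrapper.wrap("") returns [], so emit an empty line explicitly
            lines.extend(super().wrap(par) or [""])
        return lines

text = "First line,\n\nAnother paragraph."
print(ParagraphWrapper(width=50).fill(text))
```

A plain for loop is used here rather than a list comprehension because zero-argument super() does not work inside comprehensions on Python versions before 3.12.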

For a more complete example of handling multiple paragraphs with TextWrapper, check out this article.

Finally, textwrap also includes two functions for indentation, the first one being dedent:

# Ugly formatting:
multiline_string = """
First line
Second line
Third line
"""

from textwrap import dedent

multiline_string = """
    First line
    Second line
    Third line
"""

print(dedent(multiline_string))

# First line
# Second line
# Third line

# Notice the leading blank line...
# You can use:
multiline_string = """\
    First line
    Second line
    Third line
"""

# or
from inspect import cleandoc
cleandoc(multiline_string)
# 'First line\nSecond line\nThird line'

By default, multiline strings in Python keep any indentation used within the string, so we would have to use the ugly formatting shown in the first variable in the snippet above. Instead, we can use the dedent function to improve formatting: we simply indent the variable value however we like and then call dedent on it before using it.

Alternatively, we could use inspect.cleandoc, which additionally strips the leading newline (and any trailing blank lines). The \n in its output above is only how the REPL displays the returned string (its repr); the string itself contains ordinary newlines, so print() shows it as separate lines.
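A short sketch comparing the two on an indented block of code (the snippet being dedented is arbitrary). Both remove only the common leading whitespace, so relative indentation survives; the difference is in the surrounding blank lines.

```python
from inspect import cleandoc
from textwrap import dedent

snippet = """
    def greet():
        print("hi")
"""

# dedent removes the common 4-space prefix; the body keeps its extra
# indentation, and the leading/trailing newlines are left untouched.
dedented = dedent(snippet)
print(repr(dedented))  # '\ndef greet():\n    print("hi")\n'

# cleandoc does the same de-indentation and also drops the leading and
# trailing blank lines.
cleaned = cleandoc(snippet)
print(repr(cleaned))  # 'def greet():\n    print("hi")'
```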

Naturally, where there is dedent, there must also be an indent function:

from textwrap import indent

# Indent every line by 4 spaces, except the line(s) containing the first line's text
indented = indent(text, "    ", lambda line: text.splitlines()[0] not in line)

We simply supply the text and the string that each line will be indented with (here just 4 spaces; we could, for example, use >>> to make it look like a REPL). Additionally, we can supply a predicate that decides whether a line should be indented or not. In the example above, the lambda function makes sure that the first line of the string (paragraph) is not indented.
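A small sketch of both variants on a made-up string. Without a predicate, indent only prefixes lines that contain non-whitespace characters; a custom predicate replaces that default entirely, so blank lines get the prefix too unless the predicate excludes them.

```python
from textwrap import indent

text = "First line\n\nSecond line\n"

# Default predicate: blank lines are left alone.
repl_style = indent(text, ">>> ")
print(repr(repl_style))  # '>>> First line\n\n>>> Second line\n'

# Custom predicate (same idea as the example above): skip the first line.
# The blank line now receives the 4-space prefix as well.
first = text.splitlines()[0]
custom = indent(text, "    ", lambda line: first not in line)
print(repr(custom))  # 'First line\n    \n    Second line\n'
```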

textwrap is a simple module with just a few functions and methods, but it once again shows that Python really comes with “batteries included” for things that don’t necessarily need to be in the standard library, but that can save you a lot of time when you happen to need them.

If you happen to do a lot of text processing, I also recommend checking out the whole docs section dedicated to working with text. There are many more modules and little functions that you didn’t know you needed. 😉

This article was originally posted at martinheinz.dev
