regex - Why does grep show me different output depending on my input file size?
I'm a little puzzled by the output of the grep command, which seems to be truncating results based on the size of the -f file. For instance, consider a 1000-line file of strings, patterns.txt, e.g.:
adkgjwofjdjglkadjglkjasdfahdg
dsklfjsldkfjaghwioeghsdlkjfld
sdkljfsdkljghsdlfhkwhfklshdfo
...
sdklfjsdklfjsdklfjslkjghdfkjj
and a 1 GB queryfile.txt to search for those patterns. When I run
grep -F -o -f patterns.txt queryfile.txt | grep -c adkgjwofjdjglkadjglkjasdfahdg
in this case, the command reports 0 matches for the 1st line (adkgjwofjdjglkadjglkjasdfahdg) of patterns.txt, even though there are 35 occurrences in queryfile.txt. I verified this by reducing the patterns.txt file to its first 10 lines. Rerunning
grep -F -o -f patterns_reduced-list.txt queryfile.txt | grep -c adkgjwofjdjglkadjglkjasdfahdg
properly reports 35 occurrences of adkgjwofjdjglkadjglkjasdfahdg.
What's happening?
This shouldn't happen unless... the patterns overlap.
Check this example:
echo "xyxx" | grep -o -F yx$'\n'xy # output: xy
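This finds the second pattern (xy) and, because of that, it won't find the first pattern (yx). With -o, grep prints only non-overlapping matches, so once xy has consumed the first two characters, the yx that starts at the second character is skipped. Searching for yx alone does find it: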
echo "xyxx" | grep -o -F yx # output: yx
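If per-pattern counts are what you need, one possible workaround is to run grep once per pattern so that patterns cannot hide each other's matches. The following is only a sketch, assuming GNU grep; the temporary files and their contents are made up for the demo and stand in for patterns.txt and queryfile.txt.

```shell
# Demo data is hypothetical; only the xy/yx overlap mirrors the example above.
tmpdir=$(mktemp -d)
printf 'xyxx\nxyx\n' > "$tmpdir/query.txt"
printf 'yx\nxy\n' > "$tmpdir/patterns.txt"

# All patterns in one run: every "yx" overlaps an earlier "xy", so none is printed.
# "|| true" keeps the exit status clean, since grep -c exits 1 when the count is 0.
combined_yx=$(grep -o -F -f "$tmpdir/patterns.txt" "$tmpdir/query.txt" | grep -c -x -F 'yx' || true)

# One grep run per pattern: each pattern is counted independently of the others.
separate_counts=$(
  while IFS= read -r pattern; do
    count=$(grep -o -F -e "$pattern" "$tmpdir/query.txt" | wc -l | tr -d ' ')
    printf '%s=%s\n' "$pattern" "$count"
  done < "$tmpdir/patterns.txt"
)

echo "combined yx: $combined_yx"   # combined yx: 0
echo "$separate_counts"            # yx=2, then xy=2
rm -r "$tmpdir"
```

Note the trade-off: this reads the query file once per pattern, so with 1000 patterns and a 1 GB file it will be far slower than a single grep -f run.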