2537 Commits

Author SHA1 Message Date
John MacFarlane
4a47e6a34c Fix unnecessary variable assignment. 2020-05-13 15:40:42 -07:00
John MacFarlane
7486f5fff3 New lint target using clang-tidy. 2020-05-13 12:30:30 -07:00
John MacFarlane
35535d5822 CI: Another attempt to add clang-tidy. 2020-05-13 09:47:25 -07:00
John MacFarlane
5a13e06b02 Remove appveyor build. 2020-05-13 09:36:06 -07:00
John MacFarlane
57dc0a9824 CI: change shared to cmark_opts in matrix, add linter. 2020-05-13 09:35:38 -07:00
John MacFarlane
99bb5523b7 CI: test with different shared library configs. 2020-05-13 09:18:38 -07:00
John MacFarlane
8ab7c6a73c CI: try setting CPP in matrix. 2020-05-13 09:14:13 -07:00
John MacFarlane
62539423f8 Fix syntax for matrix. 2020-05-13 08:58:06 -07:00
John MacFarlane
a2ab3e8613 Linux CI: use both gcc and clang. 2020-05-13 08:56:47 -07:00
John MacFarlane
5bc4802795 Add .gitattributes to ensure that line endings are normalized. 2020-05-13 08:03:37 -07:00
John MacFarlane
57f814d944 Windows CI: ensure UTF-8. 2020-05-12 23:24:39 -07:00
John MacFarlane
9e5209054c Revert "spec_tests.py: ignore line endings on diff."
This reverts commit 54c990d173.
2020-05-12 23:21:53 -07:00
John MacFarlane
54c990d173 spec_tests.py: ignore line endings on diff. 2020-05-12 23:17:07 -07:00
John MacFarlane
9411fe7146 Revert "spec_tests.py: don't keep line endings (for windows CI)."
This reverts commit abc45c57d3.
2020-05-12 23:09:51 -07:00
John MacFarlane
abc45c57d3 spec_tests.py: don't keep line endings (for windows CI). 2020-05-12 23:07:09 -07:00
John MacFarlane
76ddef4b1b Add CI badge. 2020-05-12 22:57:45 -07:00
John MacFarlane
2a42369f42 Windows CI: try installing msvc tools. 2020-05-12 22:54:36 -07:00
John MacFarlane
8cf576367b Revert "Revert "CI: avoid using nmake.bat.""
This reverts commit 745b877835.
2020-05-12 22:49:16 -07:00
John MacFarlane
314b5a1ee2 Windows CI: specify cmd shell. 2020-05-12 22:47:40 -07:00
John MacFarlane
745b877835 Revert "CI: avoid using nmake.bat."
This reverts commit c5732b26bb.
2020-05-12 22:47:20 -07:00
John MacFarlane
c5732b26bb CI: avoid using nmake.bat. 2020-05-12 22:43:30 -07:00
John MacFarlane
640d2f3491 CI: install valgrind for linux. 2020-05-12 22:39:16 -07:00
John MacFarlane
59ff258ea1 Fix CI. 2020-05-12 22:36:02 -07:00
John MacFarlane
97f36568fb Diagnostic for CI. 2020-05-12 22:30:15 -07:00
John MacFarlane
84ab15c9f0 Revert "Setup python."
This reverts commit 9f760cefde.
2020-05-12 22:27:14 -07:00
John MacFarlane
9f760cefde Setup python. 2020-05-12 22:22:46 -07:00
John MacFarlane
5c77b675b5 Add GitHub actions CI. 2020-05-12 22:16:14 -07:00
data-man
b467630d73 Update to Unicode 13.0 2020-05-12 22:08:12 -07:00
John MacFarlane
ef20bfbd5b Add uninstall target to Makefile. 2020-03-20 17:25:23 -07:00
Leo Neat
855361ca1c Adding CIFuzz 2020-03-19 09:07:03 -07:00
John MacFarlane
74e8f638ad Skip UTF-8 BOM if present at beginning of buffer.
Closes #334.
2020-03-03 15:05:32 -08:00
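The BOM handling described in commit 74e8f638ad can be sketched as follows. This is a minimal illustration rather than cmark's actual code: the function name `skip_utf8_bom` is hypothetical, and the only fact relied on is that the UTF-8 byte order mark is the three-byte sequence EF BB BF.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of cmark's API): returns the number of
 * leading bytes to skip, i.e. 3 if the buffer starts with the UTF-8 BOM
 * and 0 otherwise. */
size_t skip_utf8_bom(const unsigned char *buf, size_t len) {
  static const unsigned char bom[3] = {0xEF, 0xBB, 0xBF};
  if (len >= 3 && memcmp(buf, bom, 3) == 0)
    return 3;
  return 0;
}
```

A caller would advance its read position by the returned count before parsing the first line, so the BOM never ends up in the document text.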
John MacFarlane
67ec0eef4b Add casts for MSVC10.
This is kivikakk's commit 62166fe3b6b07068ed4c4207113e3c4b060ad4a8
in cmark-gfm.
2020-02-16 08:54:19 -08:00
John MacFarlane
b2378e459b Fix #220 (hash collisions for references).
This commit ports Vicent Marti's fix in cmark-gfm.
(384cc9db4cd7a90f59c0751e58eb7b3023d38b85)

His commit message follows:

As explained on the previous commit, it is trivial to DoS the CMark
parser by generating a document where all the link reference names hash
to the same bucket in the hash table.

This will cause the lookup process for each reference to take time
linear in the number of references in the document, and with enough link
references to look up, the end result is a pathological O(N^2) that
causes medium-sized documents to take 5+ minutes to parse.

To avoid this issue, we propose the present commit.

Based on the fact that all reference lookup/resolution in a Markdown
document is always performed as a last step during the parse process,
we've reimplemented reference storage as follows:

1. New references are always inserted at the end of a linked list. This
is an O(1) operation, and does not check whether an existing (duplicate)
reference with the same label already exists in the document.

2. Upon the first call to `cmark_reference_lookup` (when it is expected
that no further references will be added to the reference map), the
linked list of references is written into a fixed-size array.

3. The fixed size array can then be efficiently sorted in-place in O(n
log n). This operation only happens once. We perform this sort in a
_stable_ manner to ensure that the earliest link reference in the
document always has preference, as the spec dictates. To accomplish
this, every reference is tagged with a generation number when initially
inserted in the linked list.

4. The sorted array is then compacted in O(n). Since it was sorted in a
stable way, the first reference for each label is preserved and the
duplicates are removed, matching the spec.

5. We can now simply perform a binary search for the current
`cmark_reference_lookup` query in O(log n). Any further lookup calls
will also be O(log n), since the sorted references table only needs to
be generated once.

The resulting implementation is notably simple (as it uses standard
library builtins `qsort` and `bsearch`), whilst performing better than
the fixed size hash table in documents that have a high number of
references and never becoming pathological regardless of the input.
2020-02-16 08:50:54 -08:00
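The five steps in the commit message for b2378e459b can be sketched roughly as below. This is an illustration under stated assumptions, not the code from cmark or cmark-gfm: the names `ref`, `ref_map`, `ref_map_add`, `ref_map_lookup` and `ref_map_free` are hypothetical, labels are assumed to be already normalized, and the strings are assumed to outlive the map. Because `qsort` is not guaranteed to be stable, the insertion counter (`age`) is used as a tie-breaker, which is the role the message gives to the generation number.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical types; cmark's real structures differ. */
typedef struct ref {
  struct ref *next;
  const char *label; /* assumed normalized, not owned */
  const char *url;   /* not owned */
  unsigned age;      /* insertion order ("generation number") */
} ref;

typedef struct {
  ref *head, *tail; /* insertion-order linked list */
  ref **sorted;     /* built lazily on first lookup */
  unsigned count;   /* references inserted */
  unsigned unique;  /* entries left in sorted after compaction */
} ref_map;

/* Step 1: O(1) append with no duplicate check. Returns 0 on failure. */
int ref_map_add(ref_map *m, const char *label, const char *url) {
  ref *r = malloc(sizeof *r);
  if (!r) return 0;
  r->next = NULL; r->label = label; r->url = url; r->age = m->count++;
  if (m->tail) m->tail->next = r; else m->head = r;
  m->tail = r;
  free(m->sorted); /* a late insert invalidates the sorted table */
  m->sorted = NULL;
  return 1;
}

/* Order by label, then by age, so the earliest duplicate sorts first. */
static int cmp_sort(const void *a, const void *b) {
  const ref *x = *(ref *const *)a, *y = *(ref *const *)b;
  int c = strcmp(x->label, y->label);
  return c ? c : (x->age > y->age) - (x->age < y->age);
}

static int cmp_search(const void *key, const void *elem) {
  return strcmp((const char *)key, (*(ref *const *)elem)->label);
}

/* Steps 2-4: copy the list into an array, sort it, drop later duplicates. */
static int build(ref_map *m) {
  unsigned i = 0, last = 0;
  ref *r;
  ref **arr = malloc(m->count * sizeof *arr);
  if (!arr) return 0;
  for (r = m->head; r; r = r->next) arr[i++] = r;
  qsort(arr, m->count, sizeof *arr, cmp_sort);
  for (i = 1; i < m->count; i++)
    if (strcmp(arr[i]->label, arr[last]->label) != 0)
      arr[++last] = arr[i];
  m->sorted = arr;
  m->unique = last + 1;
  return 1;
}

/* Step 5: O(log n) binary search; the table is built on first use. */
const ref *ref_map_lookup(ref_map *m, const char *label) {
  ref **hit;
  if (m->count == 0) return NULL;
  if (!m->sorted && !build(m)) return NULL;
  hit = bsearch(label, m->sorted, m->unique, sizeof *m->sorted, cmp_search);
  return hit ? *hit : NULL;
}

void ref_map_free(ref_map *m) {
  ref_map zero = {0};
  ref *r = m->head, *next;
  while (r) { next = r->next; free(r); r = next; }
  free(m->sorted);
  *m = zero;
}
```

With this layout, adding `foo`, `bar` and a second `foo` and then looking up `foo` yields the first definition, since the duplicate with the larger `age` is discarded during compaction.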
John MacFarlane
04936d6323 Add pathological test for reference collisions (see #220).
This is taken from GitHub's fix:
66a0836dc9
2020-02-16 08:40:39 -08:00
John MacFarlane
9d6697f9d3 Update date on cmark.1. 2020-02-11 20:49:21 -08:00
John MacFarlane
72a2d2f755 cmark.1 - Document --unsafe instead of --safe.
Closes #332.
2020-02-11 17:40:01 -08:00
John MacFarlane
ea16f66469 cmark.1: remove docs for --normalize which no longer exists.
See #332
2020-02-11 17:37:03 -08:00
John MacFarlane
fdbdbf7ecc Add cmark_get_default_mem_allocator().
API change: This adds a new exported function in cmark.h.

Closes #330.
2020-02-09 09:03:16 -08:00
Nick Wellnhofer
71ef02503e Fix URL check in is_autolink
In a recent commit, the check was changed to strcmp, but we really
have to use strncmp.
2020-01-25 09:39:32 -08:00
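The difference that commit 71ef02503e turns on can be shown with a small sketch. The actual check inside `is_autolink` is not reproduced in this log, so the helper name `starts_with` and the example strings are assumptions; the point is only that `strcmp` tests whole-string equality while `strncmp` with the prefix length tests for a prefix.

```c
#include <string.h>

/* Hypothetical helper: nonzero if url begins with prefix.
 * strcmp(url, prefix) == 0 would only hold when the two strings are
 * identical, so a URL with anything after the prefix would be rejected. */
int starts_with(const char *url, const char *prefix) {
  return strncmp(url, prefix, strlen(prefix)) == 0;
}
```

For example, `starts_with("mailto:a@b.c", "mailto:")` is nonzero, whereas `strcmp("mailto:a@b.c", "mailto:")` is nonzero in the other sense: it reports the strings as different.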
Nick Wellnhofer
1f09bfd091 Fix null pointer deref in is_autolink
Introduced by a recent commit. Found by OSS-Fuzz.
2020-01-25 09:39:32 -08:00
Saleem Abdulrasool
242e277a66 build: substitute the path into the generated files
This resorts to variable substitution to ensure that the embedded path
is correct.  Without this, the path at the time of the configuration is
used.  In the case of the Swift project, this ended up searching in the
*source* directory rather than the *build* directory.  This will ensure
that we export the file to an absolute location and we use the same
location in the `cmarkConfig.cmake` file by means of CMake's
`configure_file` substitution.
2020-01-24 17:27:22 -08:00
Saleem Abdulrasool
14622a194c build: use absolute path for cmarkTargets.cmake
Adjust the include of the CMake file to use a location relative to
cmarkConfig.cmake, which enables use regardless of the path.
2020-01-23 08:32:43 -08:00
Nick Wellnhofer
f3f50b29d6 Rearrange struct cmark_node
Introduce multi-purpose data/len members in struct cmark_node. This
is mainly used to store literal text for inlines, code and HTML blocks.

Move the content strbuf for blocks from cmark_node to cmark_parser.
When finalizing nodes that allow inlines (paragraphs and headings),
detach the strbuf and store the block content in the node's data/len
members. Free the block content after processing inlines.

Reduces size of struct cmark_node by 8 bytes.
2020-01-23 08:25:54 -08:00
Nick Wellnhofer
3ef0718f9f Improve packing of struct cmark_list
Allows the size of struct cmark_node to be reduced later.
2020-01-23 08:25:54 -08:00
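As a rough illustration of what "improving packing" means in commit 3ef0718f9f: reordering members so that small fields sit next to each other removes alignment padding. The two structs below are hypothetical and are not the real `struct cmark_list`; the member names are invented for the example, and exact sizes depend on the platform ABI.

```c
#include <stddef.h>

/* Hypothetical layout with small and wide members interleaved: each
 * 1-byte field is typically followed by padding up to int alignment. */
struct list_loose {
  unsigned char tight;
  int start;
  unsigned char delim;
  int padding;
  unsigned char bullet;
};

/* Same members, widest first, so the three 1-byte fields can share
 * the space that would otherwise be padding. */
struct list_packed {
  int start;
  int padding;
  unsigned char tight;
  unsigned char delim;
  unsigned char bullet;
};

/* Bytes saved per instance on the current platform. */
size_t bytes_saved(void) {
  return sizeof(struct list_loose) - sizeof(struct list_packed);
}
```

On common ABIs with a 4-byte, 4-byte-aligned `int`, the second layout is smaller, and a struct embedded in every node passes that saving on to the node itself.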
Nick Wellnhofer
68a3f24d93 Use C string instead of chunk in renderer
Fix another place where an "allocated" cmark_chunk was used.
2020-01-23 08:25:54 -08:00
Nick Wellnhofer
b0a4cfa36e Use C string instead of chunk for literal text
Use zero-terminated C strings and a separate length field instead of
cmark_chunks. Literal inline text will now be copied from the parent
block's content buffer, slowing the benchmark down by 10-15%.

The node struct never references memory of other nodes now, fixing #309.
Node accessors don't have to check for delayed creation of C strings,
so parsing and iterating all literals using the public API should
actually be faster than before.
2020-01-23 08:25:54 -08:00
Nick Wellnhofer
75b48c5938 Use C string instead of chunk for custom block contents
Reduces size of struct cmark_node by 8 bytes.
2020-01-23 08:25:54 -08:00
Nick Wellnhofer
b237924585 Use C string instead of chunk for link URL and title
Use zero-terminated C strings instead of cmark_chunks without storing
the length. This introduces a few additional strlen computations,
but overhead should be low.

Allows the size of struct cmark_node to be reduced later.
2020-01-23 08:25:54 -08:00
Nick Wellnhofer
3acbdf0965 Use C string instead of chunk for code info and literal
Use zero-terminated C strings instead of cmark_chunks without storing
the length. The length of code literals will be re-added in a later
commit. strlen overhead for code info should be negligible.

Reduces size of struct cmark_node by 8 bytes.
2020-01-23 08:25:54 -08:00
Nick Wellnhofer
df7ef9ed7b Helper function to set C strings in nodes 2020-01-23 08:25:54 -08:00