AI Coding Tools Ranked

AI Coding Tools Ranked — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Linguatec

    Linguatec Sprachtechnologien GmbH is a language technology provider specializing in machine translation, speech synthesis and speech recognition. Linguatec was founded in Munich in 1996 and its headquarters are in Pasing. Linguatec has won the European Information Society Technologies Prize three times. Its website uses the online service Voice Reader Web, so that the information can be read out in every language by means of a text-to-speech function. == Core areas == Machine translation: the different versions of Personal Translator (seven language pairs) can be used at home or for professional business use in a company network. In addition, specialist dictionaries are offered to extend the standard vocabulary. Speech synthesis: the Voice Reader text-to-speech program reads in twelve languages: German, British English, American English, French, Quebec French, Spanish, Mexican Spanish, Italian, Dutch, Portuguese, Czech, Chinese. Speech recognition: Voice Pro is based on ViaVoice technology from IBM. There are special software programs for doctors and lawyers. == Patents == 2005: pending patent application for a newly developed hybrid technology that uses the intelligence of neural networks for machine translation. == Awards == 2004: European IT Prize for Beyond Babel; 2004: test winner, Stiftung Warentest (best voice recognition); 1998: European IT Prize (applied voice recognition); 1996: European IT Prize (automated translation). == Studies == 2005: University of Regensburg, Voice Reader user test; 2002: Fraunhofer Institute for Industrial Engineering and Organization IAO, user study on the efficiency of machine translation.

  • AI Photo Editors Reviews: What Actually Works in 2026

    Curious about the best AI photo editor? An AI photo editor is software that uses machine learning to help you get more done — it combines speed, accuracy, and an interface that just works. Hands-on testing shows real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI photo editor slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

  • Tagged Deterministic Finite Automaton

    In automata theory, a tagged deterministic finite automaton (TDFA) is an extension of the deterministic finite automaton (DFA). In addition to solving the recognition problem for regular languages, a TDFA is also capable of submatch extraction and parsing. While a canonical DFA can determine whether a string belongs to the language defined by a regular expression, a TDFA can also extract substrings that match specific subexpressions. More generally, a TDFA can identify positions in the input string that match tagged positions in a regular expression (tags are meta-symbols similar to capturing parentheses, but without the pairing requirement). == History == TDFA were first described by Ville Laurikari in 2000. Prior to that it was unknown whether it is possible to perform submatch extraction in one pass on a deterministic finite-state automaton, so Laurikari's paper was an important advancement. Laurikari described TDFA construction and gave a proof that the determinization process terminates; however, the algorithm did not handle disambiguation correctly. In 2007 Chris Kuklewicz implemented TDFA in the Haskell library Regex-TDFA with POSIX longest-match semantics. Kuklewicz gave an informal description of the algorithm and answered the principal question of whether TDFA are capable of POSIX longest-match disambiguation, which had been doubted by other researchers. In 2017 Ulya Trafimovich described TDFA with one-symbol lookahead. The use of a lookahead symbol reduces the number of registers and register operations in a TDFA, which makes it faster and often smaller than a Laurikari TDFA. Trafimovich called TDFA variants with and without lookahead TDFA(1) and TDFA(0), by analogy with the LR parsers LR(1) and LR(0). The algorithm was implemented in the open-source lexer generator RE2C. Trafimovich also formalized Kuklewicz's disambiguation algorithm. In 2018 Angelo Borsotti worked on an experimental Java implementation of TDFA; it was published later, in 2021.
In 2019 Borsotti and Trafimovich adapted POSIX disambiguation algorithm by Okui and Suzuki to TDFA. They gave a formal proof of correctness of the new algorithm and showed that it is faster than Kuklewicz algorithm in practice. In 2020 Trafimovich published an article about TDFA implementation in RE2C. In 2022 Borsotti and Trafimovich published a paper with a detailed description of TDFA construction. The paper incorporated their past research and presented multi-pass TDFA that are better suited to just-in-time determinization. They also compared TDFA against other algorithms and provided benchmarks. == Formal definition == TDFA have the same basic structure as ordinary DFA: a finite set of states linked by transitions. In addition to that, TDFA have a fixed set of registers that hold tag values, and register operations on transitions that set or copy register values. The values may be scalar offsets, or offset lists for tags that match repeatedly (the latter can be represented efficiently using a trie structure). There is no one-to-one mapping between tags in a regular expression and registers in a TDFA: a single tag may need many registers, and the same register may hold values of different tags. The following definition is according to Trafimovich and Borsotti. The original definition by Laurikari is slightly different. 
A tagged deterministic finite automaton $F$ is a tuple $(\Sigma, T, S, S_f, s_0, R, R_f, \delta, \varphi)$, where: $\Sigma$ is a finite set of symbols (alphabet); $T$ is a finite set of tags; $S$ is a finite set of states with initial state $s_0$ and a subset of final states $S_f \subseteq S$; $R$ is a finite set of registers with a subset of final registers $R_f$ (one per tag); $\delta : S \times \Sigma \rightarrow S \times O^{*}$ is a transition function; $\varphi : S_f \rightarrow O^{*}$ is a final function, where $O$ is a set of register operations of the following types: set register $i$ to nil or to the current position: $i \leftarrow v$, where $v \in \{\mathbf{n}, \mathbf{p}\}$; copy register $j$ to register $i$: $i \leftarrow j$; copy register $j$ to register $i$ and append history: $i \leftarrow j \cdot h$, where $h$ is a string over $\{\mathbf{n}, \mathbf{p}\}$. === Example === Figure 0 shows an example TDFA for regular expression $(1a2)^{*}3(a|4b)5b^{*}$ with alphabet $\Sigma = \{a, b\}$ and a set of tags $T = \{1, 2, 3, 4, 5\}$ that matches strings of the form $a \dots a b \dots b$ with at least one symbol. The TDFA has four states $S = \{0, 1, 2, 3\}$, three of which are final: $S_f = \{1, 2, 3\}$.
The set of registers is $R = \{r_1, r_2, r_3, r_4, r_5\}$ with a subset of final registers $R_f = \{r_1, r_2, r_3, r_4, r_5\}$, where register $r_i$ corresponds to the $i$-th tag. Transitions have operations defined by the $\delta$ function, and final states have operations defined by the $\varphi$ function (marked with a wide-tipped arrow). For example, to match string $aab$, one starts in state 0, matches the first $a$ and moves to state 1 (setting registers $r_1, r_2$ to undefined and $r_3$ to the current position 0), matches the second $a$ and loops to state 1 (register values are now $r_1 = 0, r_2 = r_3 = 1$), matches $b$ and moves to state 2 (register values are now $r_1 = 1, r_2 = r_3 = r_4 = 2$), executes the final operations in state 2 (register values are now $r_1 = 1, r_2 = r_3 = r_4 = 2, r_5 = 3$) and finally exits the TDFA. == Complexity == Canonical DFA solve the recognition problem in linear time. The same holds for TDFA, since the number of registers and register operations is fixed and depends only on the regular expression, but not on the length of input. The overhead on submatch extraction depends on tag density in a regular expression and nondeterminism degree of each tag (the maximum number of registers needed to track all possible values of the tag in a single TDFA state). On one extreme, if there are no tags, a TDFA is identical to a canonical DFA. On the other extreme, if every subexpression is tagged, a TDFA effectively performs full parsing and has many operations on every transition.
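To make the register mechanics and the linear-time claim concrete, here is a minimal Python sketch of a TDFA runner. It is an illustration under stated assumptions, not the output of the determinization algorithm: the automaton is hand-built for the simplified expression a*tb* (a single tag t, one register), the names `DELTA`, `FINAL`, `run_tdfa` and `r1` are hypothetical, and "current position" is taken to mean the number of symbols consumed before the transition fires.

```python
# Hand-built tagged automaton for a* t b* (assumed example, one tag t).
# DELTA maps (state, symbol) to (next_state, register operations).
# An operation ("r1", "p") sets r1 to the current position,
# ("r1", "n") would set it to nil (None).
DELTA = {
    (0, "a"): (0, []),
    (0, "b"): (1, [("r1", "p")]),  # first b: the tag sits right before it
    (1, "b"): (1, []),
}
# FINAL plays the role of the final function: operations run on exit.
FINAL = {
    0: [("r1", "p")],  # no b was seen: the tag sits at the end of input
    1: [],
}


def apply_ops(regs, ops, pos):
    """Execute set operations; copy operations are omitted in this sketch."""
    for reg, value in ops:
        regs[reg] = pos if value == "p" else None


def run_tdfa(text):
    """Return the register values on a match, or None if text is rejected."""
    regs = {"r1": None}
    state = 0
    for pos, symbol in enumerate(text):  # one pass, constant work per symbol
        if (state, symbol) not in DELTA:
            return None
        state, ops = DELTA[(state, symbol)]
        apply_ops(regs, ops, pos)
    if state not in FINAL:
        return None
    apply_ops(regs, FINAL[state], len(text))
    return regs


matched = run_tdfa("aab")
rejected = run_tdfa("aba")
```

Each input symbol triggers one table lookup and a bounded number of register writes, which is why the running time stays linear in the input length.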
In practice for real-world regular expressions with a few submatch groups the overhead is negligible compared to matching with canonical DFA. == TDFA construction == TDFA construction is performed in a few steps. First, a regular expression is converted to a tagged nondeterministic finite automaton (TNFA). Second, a TNFA is converted to a TDFA using a determinization procedure; this step also includes disambiguation that resolves conflicts between ambiguous TNFA paths. After that, a TDFA can optionally go through a number of optimizations that reduce the number of registers and operations, including minimization that reduces the number of states. Algorithms for all steps of TDFA construction with pseudocode are given in the paper by Borsotti and Trafimovich. This section explains TDFA construction on the example of a regular expression a ∗ t b ∗ | a b {\displaystyle a^{}tb^{}|ab} , where t {\displaystyle t} is a tag and { a , b } {\displaystyle \{a,b\}} are alphabet symbols. === Tagged NFA === TNFA is a nondeterministic finite automaton with tagged ε-transitions. It was first described by Laurikari, although similar constructions were known much earlier as Mealy machines and nondeterministic finite-state transducers. TNFA construction is very similar to Thompson's construction: it mirrors the structure of a regular expression. Importantly, TNFA preserves ambiguity in a regular expression: if it is possible to match a string in two different ways, then TNFA for this regular expression has two different accepting paths for this string. TNFA definition by Borsotti and Trafimovich differs from the original one by Laurikari in that TNFA can have negative tags on transitions: they are needed to make the absence of match explicit in cases when there is a bypass for a tagged transition. Figure 1 shows TNFA for the example regu

  • AI Writing Assistants Reviews: What Actually Works in 2026

    Looking for the best AI writing assistant? An AI writing assistant is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI writing assistant slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

  • Machine unlearning

    Machine unlearning is a branch of machine learning focused on removing specific undesired elements, such as private data, wrong or manipulated training data, outdated information, copyrighted material, harmful content, dangerous abilities, or misinformation, without needing to rebuild models from the ground up. Large language models, like the ones powering ChatGPT, may be asked not just to remove specific elements but also to unlearn a "concept," "fact," or "knowledge," which aren't easily linked to specific examples. New terms such as "model editing," "concept editing," and "knowledge unlearning" have emerged to describe this process. == History == Early research efforts were largely motivated by Article 17 of the GDPR, the European Union's privacy regulation commonly known as the "right to be forgotten" (RTBF), introduced in 2014. The GDPR did not anticipate that the development of large language models would make data erasure a complex task. This issue has since led to research on "machine unlearning," with a growing focus on removing copyrighted material, harmful content, dangerous capabilities, and misinformation. Just as early experiences in humans shape later ones, some concepts are more fundamental and harder to unlearn. A piece of knowledge may be so deeply embedded in the model's knowledge graph that unlearning it could cause internal contradictions, requiring adjustments to other parts of the graph to resolve them. Researchers have now also started studying unlearning in the context of removing incorrect or adversarially manipulated training data such as systematically biased labels or poisoning attacks. == Motivations == At present, machine unlearning is motivated by a growing range of concerns that extend well beyond the field's original focus on data privacy. A widely used taxonomy in the literature distinguishes two high-level categories of motivation.
Access revocation covers cases where a data subject or rights holder requests the removal of data they own or control. This is most commonly associated with RTBF established by the European Union's General Data Protection Regulation (GDPR) and analogous legislation such as the California Consumer Privacy Act (CCPA). These regulations grant individuals the legal right to request erasure of their personal data from any system that has processed it, including models that were trained on it. Access revocation also encompasses the removal of copyrighted or pay-walled content that was incorporated into training corpora without the necessary licenses, a concern that has become prominent with the widespread use of largely web-scraped pre-training datasets. Model correction covers cases where the model exhibits undesirable behavior arising from the training data, regardless of any individual's request. This includes: Removal of toxic, biased, or unsafe outputs introduced by harmful content in the training set Correction of stale or factually incorrect associations, such as outdated knowledge encoded in a deployed model Removal of dangerous capabilities, such as detailed knowledge of the synthesis of chemical or biological agents Correction of the influence of data poisoning or adversarial attacks that have corrupted model behavior This second category has been formalized as corrective machine unlearning, which frames unlearning as a post-training mechanism for repairing the effects of bad or harmful training data. It is closely related to the AI safety literature, where data filtering alone has been found insufficient to prevent hazardous knowledge from being encoded in model weights, motivating unlearning as a complementary risk mitigation strategy. 
A further distinction has been drawn in the literature between removal (eliminating the influence of specific training data on model parameters) and suppression (preventing the model from generating specific outputs regardless of how that knowledge is encoded). These two goals are not equivalent: removing training data does not guarantee meaningful output suppression, and suppressing outputs does not constitute removal of the underlying training data's influence. == SISA Training == SISA is a training strategy consisting of four mechanisms designed to make machine unlearning more efficient by structuring how models are trained and updated. Its goal is to allow a system to remove the influence of specific data points without retraining an entire model from scratch. By reorganizing training data and workflows, SISA reduces the computational burden of unlearning requests. Sharding divides the training dataset into multiple disjoint subsets, or shards. Each shard is used to train a separate model instance. This ensures that a single data point affects only one shard, so unlearning it requires updating only the corresponding shard rather than the full model. Isolation refers to training each shard independently, with nothing shared across shards during the training process. This separation prevents cross-contamination between shards, ensuring that forgetting data in one shard does not require adjustments to any others. Slicing breaks the data within each shard into sequential slices and stores model states after each slice is trained on. When an unlearning request targets a piece of data, the system can roll back to the checkpoint before the point was seen and retrain only from that slice forward. This reduces retraining time even within a shard. Aggregation occurs at inference, when the model is queried. It combines the outputs of each shard to determine the output of the overall model. This is often through majority voting or averaging.
This allows SISA-trained systems to behave like a single model despite being composed of multiple shard-level models. Together, these mechanisms enable machine learning systems to forget specific data points with far lower computational cost than full retraining. The trade-off is that sharding and slicing can lead to reduced model accuracy, worse generalization, and increased storage requirements for the intermediate checkpoints. This can be tolerable based on the needs of the individual or organization to comply with "right to be forgotten" or efficiently recover from backdoor attacks. == Algorithms == Machine unlearning algorithms are broadly categorized into exact and approximate methods, reflecting a fundamental trade-off between formal guarantees and computational tractability. === Exact Unlearning === Exact unlearning methods produce a model that is statistically indistinguishable from one retrained from scratch on the dataset with the forget data removed. The canonical framework for exact unlearning is SISA Training (Sharded, Isolated, Sliced, and Aggregated), introduced by Bourtoule et al. (2021). SISA partitions the training dataset into disjoint shards and trains a separate sub-model on each. At inference time, predictions are aggregated across sub-models. When an unlearning request is received, only the sub-model corresponding to the shard containing the target data requires retraining, reducing computational overhead proportionally to the number of shards. Exact methods provide the strongest guarantees but become prohibitively expensive for large pre-trained neural networks and are generally limited to settings where training can be structured in advance. === Approximate Unlearning === Approximate unlearning methods seek to produce a model whose behavior is sufficiently close to an exactly unlearned model without the cost of full retraining. These methods dominate practical applications. 
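The sharding, isolation and aggregation steps of SISA described above can be sketched in a few lines of Python. This is a toy illustration under assumptions, not the implementation from Bourtoule et al.: the per-shard model is a hypothetical one-dimensional nearest-centroid classifier (`CentroidModel`), points are assigned to shards round-robin, aggregation is a majority vote, and slicing (per-slice checkpoints) is left out for brevity.

```python
from collections import Counter


class CentroidModel:
    """Toy 1-D nearest-centroid classifier standing in for a shard's sub-model."""

    def fit(self, points):
        sums, counts = {}, Counter()
        for x, label in points:
            sums[label] = sums.get(label, 0.0) + x
            counts[label] += 1
        self.centroids = {label: sums[label] / counts[label] for label in sums}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda label: abs(x - self.centroids[label]))


class SisaEnsemble:
    def __init__(self, data, num_shards):
        # Sharding: disjoint subsets, here assigned round-robin (an assumption).
        self.shards = [list(data[i::num_shards]) for i in range(num_shards)]
        # Isolation: every sub-model sees only its own shard.
        self.models = [CentroidModel().fit(shard) for shard in self.shards]
        self.retrained = []  # indices of shards retrained by unlearning requests

    def unlearn(self, point):
        """Drop one training point and retrain only the shard that held it."""
        for index, shard in enumerate(self.shards):
            if point in shard:
                shard.remove(point)
                self.models[index] = CentroidModel().fit(shard)
                self.retrained.append(index)
                return index
        raise ValueError("point is not in the training data")

    def predict(self, x):
        # Aggregation: majority vote over the sub-models.
        votes = Counter(model.predict(x) for model in self.models)
        return votes.most_common(1)[0][0]


data = [(0.0, "a"), (1.0, "a"), (0.5, "a"), (10.0, "b"), (11.0, "b"),
        (10.5, "b"), (0.2, "a"), (9.8, "b"), (0.8, "a")]
ensemble = SisaEnsemble(data, num_shards=3)
untouched = [ensemble.models[0], ensemble.models[2]]
shard_hit = ensemble.unlearn((11.0, "b"))
```

After the request, only the sub-model of the affected shard is a new object; the other two are exactly the models trained before, which is the saving SISA aims for. A real system would also need to handle a shard that becomes empty or loses a class.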
Common approaches include: Gradient Ascent: The model is fine-tuned by maximizing the loss on the forget set, directly degrading its performance on targeted data. This is the most direct approach but risks destabilizing performance on retained data. Random Labelling: The model is fine-tuned on the forget set using randomly shuffled labels, confusing its associations with the targeted data while producing a less aggressive weight shift than pure gradient ascent. Gradient Difference: Combines gradient ascent on the forget set with simultaneous gradient descent on the retain set, using the retain objective as a regularizer to preserve general model utility. KL Divergence Regularization: Minimizes the KL divergence between the outputs of the unlearned model and the original model on the retain set, anchoring behavior on data the model should remember. Weight Pruning and Fine-tuning: Parameters with the smallest L1-norm are pruned — targeting weights most weakly associated with general knowledge and potentially most associated with the forget set — followed by fine-tuning on the retain set to restore utility. Layer Reset and Fine-tuning: The first or last k layers are re-initialized to random weights and the model is subsequently fine-tuned on the retain set. This is a coarse but computationally simple approach. Selective Synaptic Dampening: Uses influence functions to estimate the effect of individual trainin

  • How to Choose an AI Chatbot

    Looking for the best AI chatbot? An AI chatbot is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI chatbot slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

  • Raymond J. Mooney

    Raymond J. Mooney is an American computer scientist, professor of computer science, and director of the Artificial Intelligence laboratory at the University of Texas at Austin. His research focuses on machine learning and natural language processing. He was educated at O'Fallon Township High School in O'Fallon, Illinois and earned a BS, MS, and Ph.D. in computer science at the University of Illinois at Urbana-Champaign, where he was advised by Gerald DeJong. He is a fellow of the Association for Computing Machinery (ACM), Association for Computational Linguistics (ACL), and Association for the Advancement of Artificial Intelligence (AAAI).

  • AI Coding Assistants Reviews: What Actually Works in 2026

    Comparing the best AI coding assistant? An AI coding assistant is software that uses machine learning to help you get more done — it lowers the barrier so anyone can produce professional output. Privacy matters too: check whether your data trains the model and whether a no-log or enterprise tier is available. Whether you are a beginner or a pro, the right AI coding assistant slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.

  • ReactiveX

    ReactiveX (Rx, also known as Reactive Extensions) is a software library originally created by Microsoft that allows imperative programming languages to operate on sequences of data regardless of whether the data is synchronous or asynchronous. It provides a set of sequence operators that operate on each item in the sequence. It is an implementation of reactive programming and provides a blueprint for the tools to be implemented in multiple programming languages. == Overview == ReactiveX is an API for asynchronous programming with observable streams. Asynchronous programming allows programmers to call functions and then have the functions "callback" when they are done, usually by giving the function the address of another function to execute when it is done. Programs designed in this way often avoid the overhead of having many threads constantly starting and stopping. Observable streams (i.e. streams that can be observed) in the context of Reactive Extensions are like event emitters that emit three events: next, error, and complete. An observable emits next events until it either emits an error event or a complete event. However, at that point it will not emit any more events, unless it is subscribed to again. The examples below use the RxJS implementation of Reactive Extensions for the JavaScript programming language. === Motivation === For sequences of data, it combines the advantages of iterators with the flexibility of event-based asynchronous programming. It also works as a simple promise, eliminating the pyramid of doom that results from multiple layers of callbacks. === Observables and observers === ReactiveX is a combination of ideas from the observer and the iterator patterns and from functional programming. An observer subscribes to an observable sequence. The sequence then sends the items to the observer one at a time, usually by calling the provided callback function. The observer handles each one before processing the next one. 
If many events come in asynchronously, they must be stored in a queue or dropped. In ReactiveX, an observer will never be called with an item out of order or (in a multi-threaded context) called before the callback has returned for the previous item. Asynchronous calls remain asynchronous and may be handled by returning an observable. It is similar to the iterators pattern in that if a fatal error occurs, it notifies the observer separately (by calling a second function). When all the items have been sent, it completes (and notifies the observer by calling a third function). The Reactive Extensions API also borrows many of its operators from iterator operators in other programming languages. Reactive Extensions is different from functional reactive programming as the Introduction to Reactive Extensions explains: It is sometimes called "functional reactive programming" but this is a misnomer. ReactiveX may be functional, and it may be reactive, but "functional reactive programming" is a different animal. One main point of difference is that functional reactive programming operates on values that change continuously over time, while ReactiveX operates on discrete values that are emitted over time. (See Conal Elliott's work for more-precise information on functional reactive programming.) === Reactive operators === An operator is a function that takes one observable (the source) as its first argument and returns another observable (the destination, or outer observable). Then for every item that the source observable emits, it will apply a function to that item, and then emit it on the destination Observable. It can even emit another Observable on the destination observable. This is called an inner observable. An operator that emits inner observables can be followed by another operator that in some way combines the items emitted by all the inner observables and emits the item on its outer observable. 
Examples include: switchAll – subscribes to each new inner observable as soon as it is emitted and unsubscribes from the previous one. mergeAll – subscribes to all inner observables as they are emitted and outputs their values in whatever order it receives them. concatAll – subscribes to each inner observable in order and waits for it to complete before subscribing to the next observable. Operators can be chained together to create complex data flows that filter events based on certain criteria. Multiple operators can be applied to the same observable. Some of the operators that can be used in Reactive Extensions may be familiar to programmers who use functional programming language, such as map, reduce, group, and zip. There are many other operators available in Reactive Extensions, though the operators available in a particular implementation for a programming language may vary. ==== Reactive operator examples ==== Here is an example of using the map and reduce operators. We create an observable from a list of numbers. The map operator will then multiply each number by two and return an observable. The reduce operator will then sum up all the numbers provided to it (the value of 0 is the starting point). Calling subscribe will register an observer that will observe the values from the observable produced by the chain of operators. With the subscribe method, we are able to pass in an error-handling function, called whenever an error is emitted in the observable, and a completion function when the observable has finished emitting items. ==== Usage in stream-oriented programming ==== Certain RxJS primitives such as BehaviorSubject make it possible to create pure stateful streams to track application state of arbitrary complexity in simple terms. The button below will feed an event to the stream, which in turn will re-emit the next natural number every time, back into the tag that follows and displays the count of clicks detected. 
Libraries such as Rimmel.js, designed around RxJS Observables, enable integration between reactive streams and the HTML DOM: == History == Reactive Extensions was created by the Cloud Programmability Team at Microsoft around 2011, as a byproduct of a larger effort called Volta. It was originally intended to provide an abstraction for events across different tiers in an application to support tier splitting in Volta. The project's logo represents an electric eel, which is a reference to Volta. The extensions suffix in the name is a reference to the Parallel Extensions technology which was invented around the same time; the two are considered complementary. The initial implementation of Rx was for .NET Framework and was released on June 21, 2011. Later, the team started the implementation of Rx for other platforms, including JavaScript and C++. The technology was released as open source in late 2012, initially on CodePlex. Later, the code moved to GitHub and has been ported to several other languages, including Go, Java, Kotlin, PHP and Rust.
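The map and reduce chain described under "Reactive operator examples" can be sketched without any library. The following is a toy Python stand-in, not RxJS and not an official ReactiveX port: the class `Observable`, its methods `from_iterable`, `map`, `reduce` and `subscribe`, and the input numbers are all assumptions chosen to mirror the description (next, error and complete callbacks, a doubling map, a sum starting at 0).

```python
class Observable:
    """Minimal push-based sequence with next, error and complete callbacks."""

    def __init__(self, subscribe_fn):
        self._subscribe_fn = subscribe_fn

    def subscribe(self, on_next, on_error=None, on_complete=None):
        self._subscribe_fn(
            on_next,
            on_error or (lambda exc: None),
            on_complete or (lambda: None),
        )

    @staticmethod
    def from_iterable(items):
        def subscribe_fn(on_next, on_error, on_complete):
            try:
                for item in items:
                    on_next(item)  # items are delivered one at a time, in order
            except Exception as exc:
                on_error(exc)      # after an error, nothing else is emitted
            else:
                on_complete()
        return Observable(subscribe_fn)

    def map(self, fn):
        """Operator: apply fn to every item and emit the result downstream."""
        def subscribe_fn(on_next, on_error, on_complete):
            self.subscribe(lambda item: on_next(fn(item)), on_error, on_complete)
        return Observable(subscribe_fn)

    def reduce(self, fn, seed):
        """Operator: fold all items and emit the single result on completion."""
        def subscribe_fn(on_next, on_error, on_complete):
            acc = [seed]

            def step(item):
                acc[0] = fn(acc[0], item)

            def done():
                on_next(acc[0])
                on_complete()

            self.subscribe(step, on_error, done)
        return Observable(subscribe_fn)


events = []
(
    Observable.from_iterable([1, 2, 3, 4, 5])
    .map(lambda x: x * 2)
    .reduce(lambda total, x: total + x, 0)
    .subscribe(
        events.append,
        lambda exc: events.append(("error", exc)),
        lambda: events.append("complete"),
    )
)
```

Nothing runs until `subscribe` is called; each operator returns a new observable that subscribes to its source, so the chain is built first and executed only when an observer is attached.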

  • Structured prediction

    Structured prediction or structured output learning is an umbrella term for supervised machine learning techniques that involve predicting structured objects, rather than discrete or real values. Similar to commonly used supervised learning techniques, structured prediction models are typically trained by means of observed data in which the predicted value is compared to the ground truth, and this is used to adjust the model parameters. Due to the complexity of the model and the interrelations of predicted variables, the processes of model training and inference are often computationally infeasible, so approximate inference and learning methods are used. == Applications == An example application is the problem of translating a natural language sentence into a syntactic representation such as a parse tree. This can be seen as a structured prediction problem in which the structured output domain is the set of all possible parse trees. Structured prediction is used in a wide variety of domains including bioinformatics, natural language processing (NLP), speech recognition, and computer vision. === Example: sequence tagging === Sequence tagging is a class of problems prevalent in NLP in which input data are often sequential, for instance sentences of text. The sequence tagging problem appears in several guises, such as part-of-speech tagging (POS tagging) and named entity recognition. In POS tagging, for example, each word in a sequence must be 'tagged' with a class label representing the type of word. The main challenge of this problem is to resolve ambiguity: for instance, the English words "sentence" and "tagged" can also be verbs. While this problem can be solved by simply performing classification of individual tokens, this approach does not take into account the empirical fact that tags do not occur independently; instead, each tag displays a strong conditional dependence on the tag of the previous word.
This fact can be exploited in a sequence model such as a hidden Markov model or conditional random field that predicts the entire tag sequence for a sentence (rather than just individual tags) via the Viterbi algorithm. == Techniques == Probabilistic graphical models form a large class of structured prediction models. In particular, Bayesian networks and random fields are popular. Other algorithms and models for structured prediction include inductive logic programming, case-based reasoning, structured SVMs, Markov logic networks, Probabilistic Soft Logic, and constrained conditional models. The main techniques are: conditional random fields; structured support vector machines; structured k-nearest neighbours; recurrent neural networks, in particular Elman networks; and transformers. === Structured perceptron === One of the easiest ways to understand algorithms for general structured prediction is the structured perceptron by Collins. This algorithm combines the perceptron algorithm for learning linear classifiers with an inference algorithm (classically the Viterbi algorithm when used on sequence data) and can be described abstractly as follows: First, define a function $\phi(x,y)$ that maps a training sample $x$ and a candidate prediction $y$ to a vector of length $n$ ($x$ and $y$ may have any structure; $n$ is problem-dependent, but must be fixed for each model). Let $GEN$ be a function that generates candidate predictions. 
Then: (1) let $w$ be a weight vector of length $n$; (2) for a predetermined number of iterations, for each sample $x$ in the training set with true output $t$: (a) make a prediction $\hat{y} = \operatorname{arg\,max}_{y \in GEN(x)} \, w^{T}\phi(x,y)$; (b) update $w$ (from $\hat{y}$ towards $t$): $w = w + c\,(-\phi(x,\hat{y}) + \phi(x,t))$, where $c$ is the learning rate. In practice, finding the argmax over $GEN(x)$ is done using an algorithm such as Viterbi or max-sum, rather than an exhaustive search through an exponentially large set of candidates. The idea of learning is similar to that for multiclass perceptrons.
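
The structured perceptron steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Collins's implementation: the tag set, the feature map `phi` (emission and transition counts), and the toy training data are all hypothetical; a sparse dictionary stands in for the fixed-length weight vector; and `gen` enumerates every candidate exhaustively, which is only feasible for tiny inputs. A practical tagger would replace the exhaustive `predict` with Viterbi.

```python
from collections import defaultdict
from itertools import product

TAGS = ["DT", "NN", "VB"]  # hypothetical toy tag set


def phi(x, y):
    """Feature map phi(x, y): counts of (word, tag) and (previous tag, tag) pairs."""
    feats = defaultdict(float)
    prev = "<s>"
    for word, tag in zip(x, y):
        feats[("emit", word, tag)] += 1.0
        feats[("trans", prev, tag)] += 1.0
        prev = tag
    return feats


def gen(x):
    """GEN(x): every tag sequence with the same length as x (exhaustive, toy-sized only)."""
    return product(TAGS, repeat=len(x))


def score(w, feats):
    """Inner product of the sparse weight vector w with a sparse feature vector."""
    return sum(w[k] * v for k, v in feats.items())


def predict(w, x):
    """y_hat = argmax over GEN(x) of w . phi(x, y)."""
    return max(gen(x), key=lambda y: score(w, phi(x, y)))


def train(data, epochs=10, c=1.0):
    """Perceptron updates: w = w + c * (phi(x, t) - phi(x, y_hat))."""
    w = defaultdict(float)
    for _ in range(epochs):
        for x, t in data:
            y_hat = predict(w, x)
            if y_hat != tuple(t):
                for k, v in phi(x, t).items():
                    w[k] += c * v
                for k, v in phi(x, y_hat).items():
                    w[k] -= c * v
    return w


# Hypothetical training data: "run" is a noun after "the" and a verb after "dogs".
data = [
    (["the", "dog", "runs"], ["DT", "NN", "VB"]),
    (["the", "run"], ["DT", "NN"]),
    (["dogs", "run"], ["NN", "VB"]),
]
w = train(data)
for x, _ in data:
    print(x, predict(w, x))
```

The transition features are what let the model tag "run" differently in the two contexts, which is the tag-dependence point made in the sequence tagging example.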

    Read more →
  • RE/flex

    RE/flex

    RE/flex (or RE-flex) is a computer program that generates lexical analyzers, also known as "scanners" or "lexers". Lexical analysis is the process of converting an input character stream into a sequence of tokens, a task known as lexical tokenization. == Overview == Most notable lexer generators used in practice, including Flex, Ragel, and RE/flex, are based on deterministic finite automata (DFA) for efficient pattern matching, despite the theoretical possibility of an exponential increase in DFA size. In practice, lexer specifications typically use deterministic regular expressions, which makes substantial DFA blowup uncommon. RE/flex translates a POSIX-compliant lexer specification directly into a DFA using standard construction techniques described in the compiler literature, extending the techniques to handle lazy matching and indentation detection applicable to specific programming language tokenization tasks. Like Flex, RE/flex generates efficient DFA-based scanners, but it shares no code with Flex and is implemented as a complete rewrite in C++. In addition to its native DFA-based engine, RE/flex can also be combined with external regular expression libraries that are not DFA-based, such as the C++ standard library regex engine, PCRE, and boost.regex. This is achieved by systematically rewriting the set of lexer patterns into a form suitable for tokenization with the selected external library. RE/flex performs this rewriting automatically using translation rules that are specific to each supported regular expression library. A lexer specification defines a set of regular expression patterns $\{p_i : i = 1, \ldots, n\}$ corresponding to different token classes, such as identifiers, keywords, literals, and operators. These patterns can be combined into a single regular expression $R = (p_1) \mid (p_2) \mid \ldots \mid (p_n)$. 
When applied to an input string, a regular expression engine repeatedly matches $R$, returning the index $i$ of the matched subpattern $(p_i)$, thereby decomposing the input into a sequence of tokens. Example use cases include: compiler construction, such as the use of RE/flex in the Tiger Compiler project within the EPITA compiler construction curriculum; compiler-compiler systems, including its use in Ox, an attribute-grammar–based compiling system; and pattern matching and search tools, such as grep-like utilities, including the use of RE/flex in ugrep.
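
The combined-pattern idea described in the overview (one regular expression built from all token patterns, with the index of the matching subpattern identifying the token class) can be sketched with Python's standard `re` module. This is only an illustration of the general technique, not RE/flex's own rewriting rules or generated code: the token classes and the `tokenize` helper are hypothetical, and Python's engine is a backtracking matcher that takes the first alternative that matches rather than the longest one, so the order of the patterns matters here.

```python
import re

# Hypothetical token classes p_1..p_n; none of them contains a capturing group.
PATTERNS = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]

# R = (p_1)|(p_2)|...|(p_n): each pattern becomes one top-level capturing group.
R = re.compile("|".join(f"({pattern})" for _, pattern in PATTERNS))


def tokenize(text):
    """Repeatedly match R at the current position and record which group matched."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = R.match(text, pos)
        if m is None:
            raise ValueError(f"no pattern matches at position {pos}")
        i = m.lastindex  # 1-based index of the subpattern (p_i) that matched
        name = PATTERNS[i - 1][0]
        if name != "SKIP":  # whitespace is consumed but not emitted
            tokens.append((name, m.group()))
        pos = m.end()
    return tokens


print(tokenize("x = 42 + y1"))
```

Because every alternative is a top-level group and no pattern matches the empty string, `m.lastindex` identifies the token class directly and the loop always advances.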

    Read more →
  • Jiliang Tang

    Jiliang Tang

    Jiliang Tang is a Chinese-born computer scientist and a University Foundation Professor of Computer Science and Engineering at Michigan State University, where he is the director of the Data Science and Engineering (DSE) Lab. His research expertise is in data mining and machine learning. == Education and career == He received his BEng in software engineering (2008) and MSc in computer science (2010) from the Beijing Institute of Technology, Beijing, China. His PhD is from Arizona State University (2015), under the direction of Huan Liu. After gaining his PhD, he worked as a research scientist at Yahoo Labs (2015–16) before joining Michigan State University as an assistant professor (2016). His research has mostly been published jointly with Huan Liu. It has received over thirteen thousand citations documented by Google Scholar, and has received coverage in the media. == Awards == He has received the 2020 ACM SIGKDD Rising Star Award, which "aims to celebrate the early accomplishments of the SIGKDD communities' brightest new minds", an NSF Career Award, and Michigan State University's Distinguished Withrow Research Award. == Selected publications == === Books === Jiliang Tang, Huan Liu. Trust in Social Media (Synthesis digital library of engineering and computer science; Synthesis lectures on information security, privacy, and trust, # 13). Morgan & Claypool, 2015. ISBN 9781627054058. === Peer reviewed journal articles === Shu K, Sliva A, Wang S, Tang J, Liu H. Fake news detection on social media: A data mining perspective. ACM SIGKDD explorations newsletter. 2017 Sep 1;19(1):22-36. Tang J, Alelyani S, Liu H. Feature selection for classification: A review. Data classification: Algorithms and applications. 2014:37. Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, Liu H. Feature selection: A data perspective. ACM Computing Surveys (CSUR). 2017 Dec 6;50(6):1-45. Chang S, Han W, Tang J, Qi GJ, Aggarwal CC, Huang TS. 
Heterogeneous network embedding via deep architectures. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining 2015 Aug 10 (pp. 119–128). Gao H, Tang J, Hu X, Liu H. Exploring temporal effects for location recommendation on location-based social networks. In Proceedings of the 7th ACM conference on Recommender systems 2013 Oct 12 (pp. 93–100). Hu X, Tang J, Gao H, Liu H. Unsupervised sentiment analysis with emotional signals. In Proceedings of the 22nd international conference on World Wide Web 2013 May 13 (pp. 607–618).

    Read more →
  • Clubhouse (app)

    Clubhouse (app)

    Clubhouse is an American social audio app for iOS and Android developed by Alpha Exploration Co. that enables users to participate in real-time, audio-only communication within virtual "rooms". Launched in March 2020 by Paul Davison and Rohan Seth, the platform is characterized by its "drop-in" nature, where users can join live discussions on a wide range of topics as either listeners or speakers. The application gained attention in early 2021, operating on an invite-only model and featuring appearances from public figures such as Elon Musk, Oprah Winfrey, and Mark Zuckerberg. During this period, Clubhouse reached a reported valuation of approximately $4 billion and contributed to the expansion of similar social audio features like Twitter Spaces and Spotify Greenroom. The app later expanded to Android in May 2021 and removed its waitlist in July 2021, opening access to the general public. == History == Clubhouse began as an invite-only social media startup founded by Paul Davison and Rohan Seth in fall 2019. Originally designed for podcasts under the name Talkshow, the app was rebranded as "Clubhouse" and officially released for iOS in March 2020 and for Android in May 2021. Clubhouse was valued at $100 million after receiving funding from notable angel investors. These investors included Ryan Hoover (Founder, Product Hunt), Balaji Srinivasan (Former CTO, Coinbase), and James Beshara (Co-Founder, Tilt.com), along with several venture capitalists; the venture capital firm Andreessen Horowitz made a $12 million Series A investment in May 2020. The app gained popularity in the early months of the COVID-19 pandemic. It had 600,000 registered users by December 2020. In January 2021, CEO Paul Davison announced that the app had approximately 2 million weekly active users. The company announced that it would start working on an Android version of the app. 
In that month, the app became widely used in Germany when German podcast hosts Philipp Klöckner and Philipp Gloeckler began an invite chain over a Telegram group, which brought German influencers, journalists, and politicians to the platform. Clubhouse raised its Series B at a $1 billion valuation. On February 1, 2021, Clubhouse had an estimated 3.5 million downloads globally, which grew rapidly to 8.1 million by February 15. This significant growth in popularity was driven by appearances on the app from celebrities such as Elon Musk and Mark Zuckerberg. In the same month, Clubhouse hired an Android software developer. A year after the app's release, the number of weekly active users was greater than 10 million, but the user base declined 21% during three weeks from late February to early March. This decline was reportedly caused by users leaving after the app's initial release. During its initial rollout, the app was accessible only by invitation, and invitation codes were selling on eBay for up to $400. On April 5, 2021, Clubhouse partnered with Stripe to launch its first monetization feature, called Clubhouse Payments. Although testing began with only 1,000 users, after a week the company rolled out the functionality to another 60,000 or more users in the US. In the same month, Twitter entered into discussions to purchase Clubhouse for $4 billion. The talks ended with no acquisition. Later, the company raised its Series C round of funding at a $4 billion valuation. The app also attracted partnership interest, with the National Football League announcing a content deal that month; Twitter Spaces later poached Clubhouse's exclusive NFL deal, with 20 official NFL Spaces scheduled for the 2021-22 season. Finally, on May 9, 2021, Clubhouse launched a beta version of the Android app for users in the US, and on May 21, 2021, Clubhouse became available worldwide for Android users. 
In July 2021, Clubhouse announced a partnership with TED to offer exclusive talks, and on July 21, 2021, the company discarded its invitation system and made the application available to all, though a wait list for registration was still applied in order to manage new traffic. At the time of the announcement, the company stated it had 10 million users on the wait list. On September 23, 2021, the company announced a new feature named "Wave". In October 2021, Clubhouse rolled out new features called "Replays" and "Clips". In April 2023, the company announced it was reducing its staff by half amid a "resetting" due to post-pandemic market shifts. == Features == === Rooms === The primary feature of Clubhouse is real-time virtual "rooms" in which users can communicate with each other via audio. Rooms are divided into different categories based on levels of privacy. Moderator roles are denoted by a green star that appears next to the user's name. When a user joins a room, they are initially assigned the role of "listener" and cannot unmute themselves. Listeners can notify the moderators of their intent to join the stage and speak by clicking on the "raise hand" icon. Users who are invited to the stage become "speakers" and can unmute themselves. Users can exit a room by tapping the "leave quietly" button or the peace sign emoji. === Houses === In August 2022, Clubhouse announced a feature called Houses, an invite-based version of rooms. === Events === Many conversations on Clubhouse are spontaneous. However, users can schedule conversations by creating events. While scheduling an event, users first name the event and then set the date and time at which the conversation will begin. Users can also add co-hosts to help moderate the event. Once the event has been created, it is added to the Clubhouse "bulletin". 
The bulletin shows upcoming scheduled events and allows users to set notifications for events by clicking the bell icon corresponding to the event. Users can access the bulletin by clicking on the calendar icon at the top of the home page. === Clubs === On Clubhouse, clubs are user communities that regularly discuss a common interest. Clubhouse hosts many clubs covering a wide array of topics. Users can find clubs by name under the search tab. A club consists of three categories of users: "Admin", "Leader", and "Member". Members can create private rooms and invite more users into the club. Leaders have all the privileges of a member and are also authorized to create or schedule club-branded open rooms. An admin can modify club settings, add or delete users, change user privileges, and create or schedule any type of room. There are three types of club membership: "Open", "By Approval", and "Closed". Any user can join an open club by pressing the "Join The Club" button on the club profile. For a "By Approval" club, users need to apply by clicking the "Apply To Join" button on the club profile and wait for membership. The club's admins can accept or reject the user's request. In a closed club, membership is limited to users selected by the club admin. All users of a club are notified when a public room within the club is created. Club creation is restricted to active users, and whoever creates the club becomes the club admin. Eligible users can create a club by going to their profile and pressing the "+" sign in the "Member of" section. Clubs in which a user is a member are shown on their profile page. The first club to reach half a million members was the Human Behavior Club, founded by The Digital Doctor (Dr. Sohaib Imtiaz). === Backchannel === Backchannel is the messaging function that allows users to interact individually or within a group via text. 
The Backchannel feature was initially leaked on June 18, 2021, in response to the launch of Spotify Greenroom. This was a notable step because, until this point, Clubhouse was voice-only with no way to hyperlink or message. It was entirely dependent on Instagram and Twitter for text messaging. The feature was initially leaked in the App Store, which the company said on Twitter was an accident. A month later, after multiple failed attempts, the Clubhouse Backchannel finally launched on July 14, 2021. === Explore === The homepage of Clubhouse provides access to ongoing chat rooms, which are recommended based on the people and clubs that the user follows. When users tap the magnifying glass icon, they are redirected to the explore page. On that page, users can search for people and clubs to follow and also find conversations categorized by topic. === Clubhouse Payments === This is the direct payment service provided by the app, which allows users to send money to content creators. Payments can be sent to users who have enabled this functionality in their profile. Users send money to a creator by clicking on the creator's profile, pressing "Send Money", and entering the amount they want to send. When a user does this for the first time, they'll be prompted to reg

    Read more →
  • The Best Free AI Bug Finder for Beginners

    The Best Free AI Bug Finder for Beginners

    Shopping for the best AI bug finder? An AI bug finder is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI bug finder slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.

    Read more →
  • Tree transducer

    Tree transducer

    In theoretical computer science and formal language theory, a tree transducer (TT) is an abstract machine taking as input a tree, and generating output, generally other trees, although models producing words or other structures exist. Roughly speaking, tree transducers extend tree automata in the same way that word transducers extend word automata. Manipulating tree structures instead of words enables TT to model syntax-directed transformations of formal or natural languages. However, TT are not as well-behaved as their word counterparts in terms of algorithmic complexity, closure properties, and so on. In particular, most of the main classes are not closed under composition. The main classes of tree transducers are: == Top-Down Tree Transducers (TOP) == A TOP T is a tuple (Q, Σ, Γ, I, δ) such that: Q is a finite set, the set of states; Σ is a finite ranked alphabet, called the input alphabet; Γ is a finite ranked alphabet, called the output alphabet; I is a subset of Q, the set of initial states; and δ is a set of rules of the form $q(f(x_1,\dots,x_n)) \to u$, where f is a symbol of Σ, n is the arity of f, q is a state, and u is a tree on Γ and $Q \times 1..n$, such pairs being nullary. 
=== Examples of rules and intuitions on semantics === For instance, $q(f(x_1,\dots,x_3)) \to g(a, q'(x_1), h(q''(x_3)))$ is a rule (one customarily writes $q(x_i)$ instead of the pair $(q,x_i)$), and its intuitive semantics is that, under the action of q, a tree with f at the root and three children is transformed into $g(a, q'(x_1), h(q''(x_3)))$ where, recursively, $q'(x_1)$ and $q''(x_3)$ are replaced, respectively, with the application of $q'$ on the first child and with the application of $q''$ on the third. === Semantics as term rewriting === The semantics of each state of the transducer T, and of T itself, is a binary relation between input trees (on Σ) and output trees (on Γ). A way of defining the semantics formally is to see $\delta$ as a term rewriting system, provided that in the right-hand sides the calls are written in the form $q(x_i)$, where states q are unary symbols. Then the semantics $[\![q]\!]$ of a state q is given by $[\![q]\!] = \{u \mapsto v \mid u \text{ is a tree on } \Sigma,\ v \text{ is a tree on } \Gamma, \text{ and } q(u) \to_{\delta}^{*} v\}$. The semantics of T is then defined as the union of the semantics of its initial states: $[\![T]\!] = \bigcup_{q \in I} [\![q]\!]$. === Determinism and domain === As with tree automata, a TOP is said to be deterministic (abbreviated DTOP) if no two rules of δ share the same left-hand side, and there is at most one initial state. 
In that case, the semantics of the DTOP is a partial function from input trees (on Σ) to output trees (on Γ), as are the semantics of each of the DTOP's states. The domain of a transducer is the domain of its semantics. Likewise, the image of a transducer is the image of its semantics. === Properties of DTOP === DTOP are not closed under union: this is already the case for deterministic word transducers. The domain of a DTOP is a regular tree language. Furthermore, the domain is recognisable by a deterministic top-down tree automaton (DTTA) of size at most exponential in that of the initial DTOP. That the domain is DTTA-recognizable is not surprising, considering that the left-hand sides of DTOP rules are the same as for DTTA. As for the reason for the exponential explosion in the worst case (that does not exist in the word case), consider the rule $q(f(x_1,x_2)) \to g(p_1(x_1), p_2(x_1), p_3(x_2))$. In order for the computation to succeed, it must succeed for both children. That means that the right child must be in the domain of $p_3$. As for the left child, it must be in the domain of both $p_1$ and $p_2$. Generally, since subtrees can be copied, a single subtree can be evaluated by multiple states during a run, despite the determinism, and unlike DTTA. Thus the construction of the DTTA recognising the domain of a DTOP must account for sets of states and compute the intersections of their domains, hence the exponential. In the special case of linear DTOP, that is to say DTOP where each $x_i$ appears at most once in the right-hand side of each rule, the construction is linear in time and space. The image of a DTOP is not a regular tree language. 
Consider the transducer coding the transformation $f(x) \to g(x,x)$; that is, duplicate the child of the input. This is easily done by a rule $q(f(x_1)) \to g(p(x_1), p(x_1))$, where p encodes the identity. Then, absent any restrictions on the first child of the input, the image is a classical non-regular tree language. However, the domain of a DTOP cannot be restricted to a regular tree language. That is to say, given a DTOP T and a language L, one cannot in general build a DTOP $T'$ such that the semantics of $T'$ is that of T, restricted to L. This property is linked to the reason deterministic top-down tree automata are less expressive than bottom-up automata: once you go down a given path, information from other paths is inaccessible. Consider the transducer coding the transformation $f(x,y) \to y$; that is, output the right child of the input. This is easily done by a rule $q(f(x_1,x_2)) \to p(x_2)$, where p encodes the identity. Now let's say we want to restrict this transducer to the finite (and thus, in particular, regular) domain $\{f(c,a),\ f(c,b)\}$. We must use the rules $q(f(x_1,x_2)) \to p(x_2),\ p(a) \to a,\ p(b) \to b$. But in the first rule, $x_1$ does not appear at all, since nothing is produced from the left child. Thus, it is not possible to test that the left child is c. In contrast, since we produce from the right child, we can test that it is a or b. In general, the criterion is that DTOP cannot test properties of subtrees from which they do not produce output. DTOP are not closed under composition. 
However, this problem can be solved by the addition of a lookahead: a tree automaton, coupled to the transducer, that can perform tests on the domain which the transducer is incapable of. Non-closure under composition follows from the point about domain restriction: composing the DTOP encoding identity on $\{f(c,a),\ f(c,b)\}$ with the one encoding $f(x,y) \to y$ must yield a transducer with the semantics $\{f(c,a) \mapsto a,\ f(c,b) \mapsto b\}$, which we know is not expressible by a DTOP. The typechecking problem (testing whether the image of a regular tree language is included in another regular tree language) is decidable. The equivalence problem (testing whether two DTOP define the same functions) is decidable. == Bottom-Up Tree Transducers (BOT) == As in the simpler case of tree automata, bottom-up tree transducers are defined similarly to their top-down counterparts, but proceed from the leaves of the tree to the root, instead of from the root to the leaves. Thus the main difference is in the form of the rules, which are of the form $f(q_1(x_1),\dots,q_n(x_n)) \to q(u)$.
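
The two DTOP examples discussed above (duplicating a child, and outputting the right child) can be run with a small interpreter. This is a hedged sketch, not a reference implementation: the tuple encoding of trees, the `Call` type, the `run` and `build` helpers, and the extra symbols `s` and `d` are assumptions introduced for illustration. Rules are keyed by (state, input symbol), so each left-hand side has at most one right-hand side, matching the determinism condition.

```python
from typing import NamedTuple


class Call(NamedTuple):
    """A state call q(x_i) inside a right-hand side; index is 1-based."""
    state: str
    index: int


# Trees are tuples (symbol, child_1, ..., child_n); a leaf is a 1-tuple.
# State p encodes the identity on trees built from s (unary), a and b (nullary).
IDENTITY = {
    ("p", "s"): ("s", Call("p", 1)),  # p(s(x1)) -> s(p(x1))
    ("p", "a"): ("a",),               # p(a) -> a
    ("p", "b"): ("b",),               # p(b) -> b
}

# q(f(x1)) -> g(p(x1), p(x1)): duplicate the child of the input.
DUP = {**IDENTITY, ("q", "f"): ("g", Call("p", 1), Call("p", 1))}

# q(f(x1, x2)) -> p(x2): output the right child of the input.
RIGHT = {**IDENTITY, ("q", "f"): Call("p", 2)}


def run(rules, state, tree):
    """Apply `state` to `tree` top-down; fail if the tree is outside the domain."""
    symbol, *children = tree
    if (state, symbol) not in rules:
        raise ValueError(f"no rule for state {state!r} on symbol {symbol!r}")
    return build(rules, rules[(state, symbol)], children)


def build(rules, rhs, children):
    """Instantiate a right-hand side, replacing each call q(x_i) by its result."""
    if isinstance(rhs, Call):
        return run(rules, rhs.state, children[rhs.index - 1])
    symbol, *subterms = rhs
    return (symbol, *(build(rules, sub, children) for sub in subterms))


print(run(DUP, "q", ("f", ("s", ("a",)))))
print(run(RIGHT, "q", ("f", ("c",), ("a",))))
print(run(RIGHT, "q", ("f", ("d",), ("a",))))  # left child is never inspected
```

The last two calls produce the same output even though the left children differ, which illustrates the point that a DTOP cannot test properties of a subtree from which it produces no output.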

    Read more →